2021-04-12 Biopolymer working session meeting notes

Date

Apr 12, 2021

Participants

@Jeffrey Wagner
@Lily Wang
@Pavan Behara
@Iván Pulido
@Simon Boothroyd

Discussion topics

Item	Notes

Overall goals

LW –
- Thought about what pipeline I want to make. Ideally would begin by defining the monomers and how they’re joined, and make a FF for it.
- Trying to define linkages by SMARTS+indexes
- Some functionality for capping
  - Is there a meaningful difference between a cap and a residue? Is a cap just a residue with one attachment point? Is this important?
- Some problems with attachments/radicals/valences
SB – Have you looked at the constructure package? I’ve done something similar with building large things from small things. This could use the vanilla SMARTS/SMIRKS language to enumerate polymers.
LW – There seems to be some overlap here, but maybe not totally.
SB – I had some trouble defining a backbone pattern in a way that disambiguates, e.g., backbone and sidechain.
JW – Patterns that group detailed monomers?
- Do the patterns used to generate/enumerate polymers from monomers need to be related to the patterns that will appear in the final FF? Let’s say there are two pieces of infrastructure:
  - Training data enumerator – Consumes some definition of monomers and how they can grow, outputs a bunch of molecules that will be submitted to QM calcs
  - Parameter maker – Consumes some definition of what the substructures in the final FF should look like (should they contain entire monomers? Or just minimal chemical context to apply the right parameters to any valid polymer?)
- Seems like there are three options:
  - Inputs are exactly/almost entirely the same
  - Inputs overlap, but each has additional context
  - Inputs can be completely distinct
SB – What’s the original question? Is it “make good bespoke parameters for this polymer”?
- LW – Yes
- SB – We may not have enough context. Where will we tell this machinery whether we want every carbonyl to be the same, or for each one to be unique? Josh Horton may have useful input for how to fit to a single molecule. But I’m not sure how we’d look at intermediate cases
- LW – Define “general FF”?
- SB – Eg, whether C=C in ethylene/polystyrene are all treated the same.
- LW – Based on my conversations with polymer chemists, it would be useful to make a FF that behaves well for a set of known building blocks.
- SB – So, there’s a question that we may be able to answer – Could enumerate dimers+trimers of building blocks+caps, then do a general bespoke fit. Could compare this to refitting Parsley parameters.
- JW – So, in the context of the division above, the “training data enumerator” would make all N-length permutations of building blocks, and then the final parameters would come from bespokefit/fragmenter, without being informed by the original building blocks.
- SB – Yes
- JW – Does the architecture above look good (with plugin interfaces for training data enumerator+parameter maker)?
- SB – It’s probably premature to define architecture this early on
- LW – Agree with SB
- JW – Good point. This is unnecessary at this time.
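A minimal sketch of the “training data enumerator” idea discussed above, not an agreed design. It assumes “N-length permutations” means ordered sequences with repetition, treats building blocks and caps as opaque labels, and leaves actual molecule construction (RDKit/OE) out of scope. The function name and the labels “A”, “B” and “cap” are hypothetical.

```python
from itertools import product


def enumerate_oligomers(building_blocks, caps, lengths=(2, 3)):
    """Yield (start_cap, sequence, end_cap) for every capped oligomer.

    Assumption: sequences are ordered and blocks may repeat, so a
    sequence and its reverse are both emitted (no symmetry filtering).
    """
    for n in lengths:
        for sequence in product(building_blocks, repeat=n):
            for start_cap, end_cap in product(caps, repeat=2):
                yield (start_cap, sequence, end_cap)


# Hypothetical labels standing in for real monomer/cap definitions.
blocks = ["A", "B"]
caps = ["cap"]

oligomers = list(enumerate_oligomers(blocks, caps))
print(len(oligomers))  # 2**2 dimers + 2**3 trimers = 12
```

The count grows as (number of blocks)^N times (number of caps)^2, so the set of dimers+trimers stays small only for a short list of building blocks.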
JW – Ideas for today’s session
- Working on RDKit polymerization machinery
- Set up one-on-one session for QC dataset submission
  - LW – Do I have to use openff QC*?
  - SB – This is something that can be flexible. QCArchive is relatively easy to spin up locally/on your own resources.
- Set up session for bespoke fitting
  - LW – I’ve followed bespokefit’s examples but I don’t know what’s going on.
  - SB – We can set up a meeting with Josh Horton on how to use this.
  - (General) – Can join Wednesday’s bespoke fitting meeting to set up a one-on-one.
LW – Should I use OE early on instead of RDKit? Will that be better/easier?
- JW – OE is generally faster/easier than RDKit. OpenFF as an org is required to do stuff open-source, so we’ll have to refactor OE functionality to RDKit.
- SB – OE is much easier to experiment with. But it’s kinda hard to port OE-based functionality back to RDKit. Eg, I was working with substructures in OE and used built-in functionality to satisfy their valences, and it was hard to port this code back to RDKit. It’s hard to say where all of these cases will appear, but they are there.
- LW – I’ll start building new functionality using OE.
LW – What is OpenFF looking to build in the direction of polymers?
- JW – Not too many solid plans, we want to support experimentation.
- IP – Way to study mutations/post translational modifications/generally graph changes to polymers.
- LW – This would be of interest to my previous research group as well.
- JW – Is there much overlap between this and LW’s current plans?
- LW – It could be viewed as “FF generation with one new monomer/building block”
- LW – Also thinking about refitting existing FF to a polymer. Would I use ForceBalance or BespokeFit for this?
- JW – You’d use ForceBalance, though BespokeFit can be used to easily generate default inputs for ForceBalance, making it easier to automate.
- LW – If I run ForceBalance, does it refit the entire FF, or just newly added terms?
- JW – You can label which parameters you want refit in a SMIRNOFF forcefield. This is probably true to some extent for other formats as well.
- PB – That’s right, you can choose which parameters to refit by labeling them with the “parameterize” keyword in the input file. I usually call ForceBalance directly, instead of using BespokeFit.
- JW – For training new terms, it’s probably helpful to have some overlap with the existing training data, but I’m not sure how much. If only the new parameters are allowed to vary, they may wind up being lower quality than the original FF parameters that were all co-optimized together.
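A rough sketch of the “parameterize” labeling mentioned above, assuming the label is stored as a `parameterize` attribute on each parameter element of the SMIRNOFF XML and lists the attributes to refit. The XML snippet, its numeric values, the parameter ids, and the helper `tag_for_refit` are all made up for illustration; only the standard-library XML parser is used, not ForceBalance or the OpenFF toolkit.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; ids and values are not from a released FF.
OFFXML = """<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
  <Bonds version="0.3" potential="harmonic">
    <Bond smirks="[#6X4:1]-[#6X4:2]" id="b1" length="1.52 * angstrom" k="530.0 * angstrom**-2 * mole**-1 * kilocalorie"/>
    <Bond smirks="[#6X4:1]-[#8X2:2]" id="b2" length="1.43 * angstrom" k="670.0 * angstrom**-2 * mole**-1 * kilocalorie"/>
  </Bonds>
</SMIRNOFF>"""


def tag_for_refit(xml_text, targets):
    """Add a parameterize attribute to the parameters named in targets.

    targets maps a parameter id to the list of attributes to refit.
    Parameters not listed are left untouched, i.e. held fixed.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        attrs = targets.get(elem.get("id"))
        if attrs:
            elem.set("parameterize", ", ".join(attrs))
    return root


# Only the hypothetical "new" bond b2 is opened up for fitting.
root = tag_for_refit(OFFXML, {"b2": ["k", "length"]})
tagged = {e.get("id"): e.get("parameterize") for e in root.iter("Bond")}
print(tagged)  # {'b1': None, 'b2': 'k, length'}
```

Tagging only the new terms corresponds to the “just newly added terms” case LW asked about; tagging existing parameters as well would let them co-optimize, which is the trade-off JW raises.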
Would protonation states be enumerated in the building blocks, or right before submission for QC computation?
- (General) – Probably best for protonation states to be enumerated during submission, otherwise could end up with triply-deprotonated trimers or other silly things.
How do conformers of polymers get enumerated? Which bonds are subjected to torsiondrives?
- JW – We don’t have rules/standards for this yet. Would be a good area of research.
“Selecting prior widths and weights” item on roadmap
- LW – Looking in GH repo, they’re between 1.4-1.7
- PB – Currently, prior widths and weights are using a single set of values. Prior widths are around 1-5 kcal/mol. This item on the roadmap is basically “done” – We’ve picked good values for these.
  - v1.3.0 release note | 4. Determination of priors width of valence parameters
- Weights are different from priors – Not sure if we have general recommendations for weights, or if they’ll necessarily vary by project/dataset.
- https://chemrxiv.org/articles/preprint/Development_and_Benchmarking_of_Open_Force_Field_v1_0_0_the_Parsley_Small_Molecule_Force_Field/13082561/1
Making a propertyestimator plugin for known experimental/measured polymer properties:
Should the same level of QM theory be used for polymers as small molecules?
Bayesian stuff?
- PB – Bayesian stuff largely deals with uncertainty quantification.
- (General) – Does the Bayesian stuff rely on analytical forms of everything (e.g., a differentiable version of the system)? Or can we already do Bayesian stuff in fitting?
- Continuously differentiable view of system:
