2023-01-19 Force Field Release Meeting notes

 

 Date

Jan 19, 2023

 Participants

  • @Chapin Cavender

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Lily Wang

  • @Trevor Gokey

  • @Matt Thompson

  • @John Chodera

 

 

MEETING RECORDING

 Discussion topics

Item

Presenter

Notes

GNN charges in force field releases and changes to SMIRNOFF spec

LW/JW

  • Rosemary 3.0.0 will be released with LibraryCharges

  • Rosemary 3.1.0 will have GNN models

    • JW: this is a change from previous plans. Previously, we planned on using ToolkitWrappers so GNN models could take over charge prediction behind the scenes. Now we plan to actually have different force field contents that specify the backend of the charge model (leads into next topic)

    • JC: seems like a good idea to get rid of ambiguity and heterogeneity

    • PB: are you planning to include something like openff-x.x.x-gnn.offxml to differentiate the force field? If we kept releasing a LibraryCharges FF line, then folks could iteratively improve on that

    • JC: it makes sense to continue on with GNN charges to gracefully handle novel chemistry / non-standard molecules and amino acids, e.g. posttranslational modifications

    • JW: LibraryCharges may be more difficult to maintain in the long run, and branching the force fields would also make maintenance costs grow exponentially

    • JC – Could call the GNN FF “thyme”.

Planned changes to SMIRNOFF spec

JW

  • JW: should think of a way to define the GNN in the force field in a way that’s future-proofed. Espaloma_charge refers to a URL, but URLs are not timeless

  • JC:

    • How do we featurize the system, and how do we record the featurization scheme (could call it “openff featurization model 1”)

    • What’s the model that the features get fed into (this can have a DOI). Could use serialized PyTorch, though ONNX offers some better options.

  • YQ: what’s wrong with referring to a release?

  • JC – Still need to record features

  • YQ – Can serialize PyTorch models. Serialized model could have a version check.

  • JC: currently the tag in SMIRNOFF tells you exactly how to do the calculation

  • LW – Assuming you can do that, I’m still confused as to how referring to a release number wouldn’t work.

    • JC – Like specifying a GH repo and release number?

    • LW – Yeah

    • JC – It would mean that each version could be a totally different program

    • LW – NAGL tries to have the model weights and features BOTH be part of a version.

    • NAGL has a YAML file that specifies how to featurize and extract results.

    • LW –

    • YW –

    • JW – Would YAML files be future-proof? What if the YAML spec changes?

    • JC – Spec could use descriptors from OFFMol representation.

    • JW – Need to avoid “the implementation is the spec” - This is what we’d effectively have if we did repo+version

    • JC – ONNX would be portable, the big thing to do is define the featurization.

    • MT – Wonder if there’s a fundamental wall we’re hitting - The current parts of the SMIRNOFF spec manage to define things using a few sentences of English. It’s possible that this is extremely hard to do with ML models. So it’s possible that we’ll need a less precise spec here.

    • JW – I’ll go ahead and prepare a proposal for this spec section.

    • JW – LW, is aromaticity a possible descriptor in NAGL?

      • LW – Yes
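One way to picture the two pieces JC lists (a named featurization scheme plus a separately identified model) is the minimal sketch below. It is illustrative only and not a proposed spec: the class `ChargeModelRecord`, its field names, the feature names, and the use of a SHA-256 content hash in place of a URL are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeModelRecord:
    """Hypothetical record of what a force field could store about a GNN charge model."""

    featurization: str        # named scheme, e.g. "openff featurization model 1"
    atom_features: tuple      # descriptors fed to the model (names are illustrative)
    model_format: str         # serialization format, e.g. "onnx"
    weights_sha256: str       # content hash of the serialized model (assumption: replaces a URL)


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def weights_match(record: ChargeModelRecord, weights: bytes) -> bool:
    """Check that the supplied serialized model is the one the record refers to."""
    return sha256_hex(weights) == record.weights_sha256


# Placeholder bytes standing in for a real serialized model file.
weights = b"placeholder bytes standing in for a serialized model"

record = ChargeModelRecord(
    featurization="openff featurization model 1",
    atom_features=("element", "formal_charge", "is_aromatic"),
    model_format="onnx",
    weights_sha256=sha256_hex(weights),
)

print(weights_match(record, weights))
```

A content hash identifies the exact serialized model regardless of where it is hosted, but it does not by itself define the featurization, which is the part JC and JW flag as still needing a written spec.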


Update on current performance and GNN roadmap discussion

LW

  • Integrate GNNs for partial charge calculation into OpenFF toolkit if performance lies within AmberTools range across SPICE and OpenFF benchmark datasets

  • Improve NAGL usability and extensibility

  • JC (chat, slide 3) – The training strategy / hyperparameters also impacts the resulting model.

  • JC (ESP error slide) – We’re looking at the same sort of thing.

  • JC – Did the GNN determine hardness and electronegativity for a charge-equilibration model?

    • LW – Yes

  • JC – Can/did you use regularization to keep outliers from going off the rails?

    • LW – Looking into that next

  • JC – Are you seeing the same sorts of outliers in the training set for the same chemistry?

    • LW – These are kinda exotic chemistries, they may only appear a few times in training set.

    • JC – Fitting to charges or ESP in training?

    • LW – charges. But ELF10 is pretty stable over different conformers.

    • JC – Since these are all sterically congested, it’s possible that this is an artifact from a tricky partition of e- density/ESPs, and GNNs are getting rightfully confused trying to fit them

  • JC – Do you expand protonation and tautomer states before training?

    • LW – Yes

    • JC – OE makes weird tautomers/protomers sometimes.

    • LW – We make our own ELF conformers, too, which may introduce differences.

  • MT – I’d heard earlier goals as being “reproduce charges at least as well as AmberTools”. Then I heard something slightly different today.

    • LW – Nailing down the exact finish line has been hard. I’m leaning toward “better in every category”. Like, if we could do better on most chemistries except that we do much worse on a few others, I don’t think that’s good enough for us.

    • JC – We initially wanted a drop-in replacement that’s improvable. But at this point you’ve accomplished that goal and this is ready as a starting point. The current starting point for library charges are ff14sb (not AM1BCC), right?

      • CC – The Rosemary LibraryCharges will be AM1BCC-ELF10 calculated on 5-mers, not ff14sb charges.

    • JC – Can determine the locality by the number of message passing rounds.

    • JW – Not sure this is answered

    • LW – Upcoming HFE calcs will determine whether this model is ready to go

  • JC – Has anyone looked at the Riniker charge model?

    • LW – Not past checking out the paper.

    • JC – They have an interesting idea for looking at multipoles. So I wonder what future architectural changes we might make and how we could prepare for that.

    • LW – May be more of a SMIRNOFF spec question

    • JW – If we use NNs for other terms, we may need a third item in the spec covering “how do we match up the model outputs to the slots in the topology where they apply?”

  • JC (chat) Was just wondering if we expect to use an external package (e.g. NAGL) to assign charges, or if the GNN would be baked into the toolkit.

    • MT (chat) Mostly the latter - it’ll be a ToolkitWrapper and NAGL will be installed with the toolkit by default

  • TG – How could I query a quality/error measure? Can this produce a confidence value for the outputs?

    • LW – Could have a population analysis of the training set. I’m not sure how the model itself would go about knowing its own confidence level.

    • TG – Like, you’ve identified outliers. I wonder if the NN could be trained to know when it’s making an outlier.

    • JC – There are ways to train such that you predict both the value and the error of the outputs. Then the loss function could also score how well it predicted the error.

    • JW – I’d heard of other models that fingerprint the input and compare it to the fingerprints of the training set. If the input doesn’t look like anything in the training set then it predicts low confidence.

    • JC – This shouldn’t be a requirement though.

    • TG – This would be handy for me doing experimentation.

    • JW – Agree with JC – Would be useful but isn’t a requirement. We have AT in production right now which can also make big mistakes sometimes.

    • LW – Keep in mind that this uses AT ELF10.

  • JC (chat) Do we have AmberTools ELF10 in toolkit?

    • JW – Not yet
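On the confidence question raised by TG: JC's suggestion of predicting both a value and its error can be sketched with a Gaussian negative log-likelihood loss. This is a generic illustration, not how NAGL is trained; the function name `gaussian_nll` and the example numbers are made up for this sketch. The model would output a charge and a log-variance per atom, and the loss penalizes both the error itself and a mismatch between the error and the predicted uncertainty.

```python
import math


def gaussian_nll(y_true, y_pred, log_var):
    """Mean Gaussian negative log-likelihood.

    y_true:  reference values (e.g. reference partial charges)
    y_pred:  predicted values
    log_var: predicted log-variance for each prediction
    """
    total = 0.0
    for y, mu, s in zip(y_true, y_pred, log_var):
        # Squared error is scaled by the predicted variance; the log-variance
        # term stops the model from claiming infinite uncertainty.
        total += 0.5 * (s + (y - mu) ** 2 / math.exp(s) + math.log(2 * math.pi))
    return total / len(y_true)


# Made-up numbers: the second prediction is an outlier (error of 0.2).
reference = [0.10, -0.30]
predicted = [0.12, -0.10]

# "Honest" uncertainty: predicted variance equals the actual squared error.
honest = [math.log(0.02 ** 2), math.log(0.20 ** 2)]
# "Overconfident": claims the outlier is as accurate as the good prediction.
overconfident = [math.log(0.02 ** 2), math.log(0.02 ** 2)]

loss_honest = gaussian_nll(reference, predicted, honest)
loss_overconfident = gaussian_nll(reference, predicted, overconfident)
print(loss_honest < loss_overconfident)
```

With the same predicted charges, the loss is lower when the outlier is flagged with a large variance, which is the behavior TG was asking about.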


 

 

 

 Action items

 Decisions