GNN charges in force field releases and changes to SMIRNOFF spec | LW/JW | |
Planned changes to SMIRNOFF spec | JW |
JW – We should think of a way to define the GNN in the force field that is future-proofed. Espaloma_charge refers to a URL, but URLs are not timeless.
JC – Two things need to be recorded: how we featurize the system (the featurization scheme, which could be called "openff featurization model 1"), and the model that the features get fed into (this can have a DOI). Could use serialized PyTorch; there may be better options in ONNX.
YQ – What's wrong with referring to a release?
JC – Still need to record the features.
YQ – Can serialize PyTorch models. A serialized model could have a version check.
JC – Currently the tag in SMIRNOFF tells you exactly how to do the calculation.
LW – Assuming you can do that, I'm still confused as to how referring to a release number wouldn't work.
JC – Like specifying a GitHub repo and release number?
LW – Yeah.
JC – It would mean that each version could be a totally different program.
LW – NAGL tries to have the model weights and features BOTH be part of a version. NAGL has a YAML file that specifies how to featurize and extract results.
JW – Would YAMLs be future-proof? What if the YAML spec changes?
JC – The spec could use descriptors from the OFFMol representation.
JW – Need to avoid "the implementation is the spec", which is what we'd effectively have if we did repo+version.
JC – ONNX would be portable; the big thing to do is define the featurization (see the sketch below).
MT – Wonder if there's a fundamental wall we're hitting: the current parts of the SMIRNOFF spec manage to define things using a few sentences of English. It's possible that this is extremely hard to do with ML models, so we may need a less precise spec here.
JW – I'll go ahead and prepare a proposal for this spec section.
JW – (to LW) Is aromaticity a possible descriptor in NAGL?
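As a rough illustration of the ONNX route, exporting a PyTorch charge model might look like the following sketch. The `ChargeGNN` module and file name are invented for this example; this is not NAGL's actual API, and the featurization scheme would still need to be specified separately.

```python
import torch

class ChargeGNN(torch.nn.Module):
    """Stand-in for a trained charge model (hypothetical, not NAGL's class)."""

    def __init__(self, n_features: int = 32):
        super().__init__()
        self.readout = torch.nn.Linear(n_features, 1)

    def forward(self, atom_features: torch.Tensor) -> torch.Tensor:
        # One scalar partial charge per atom.
        return self.readout(atom_features).squeeze(-1)

model = ChargeGNN()
model.eval()

# Dummy input matching the (to-be-specified) featurization:
# one 32-dimensional feature vector per atom for a 10-atom molecule.
example_atom_features = torch.randn(10, 32)

# The exported file bundles architecture and weights in a portable,
# framework-independent format; the featurization scheme (e.g.
# "openff featurization model 1") would be recorded alongside it.
torch.onnx.export(
    model,
    (example_atom_features,),
    "openff-gnn-charges-v1.onnx",
    input_names=["atom_features"],
    output_names=["partial_charges"],
    dynamic_axes={"atom_features": {0: "n_atoms"}},
)
```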
|
Update on current performance and GNN roadmap discussion | LW |
Goals: integrate GNNs for partial charge calculation into the OpenFF toolkit if performance lies within the AmberTools range across the SPICE and OpenFF benchmark datasets; improve NAGL usability and extensibility.
JC (chat, slide 3) – The training strategy / hyperparameters also impact the resulting model.
JC (ESP error slide) – We're looking at the same sort of thing.
JC – Did the GNN determine hardness and electronegativity for a charge equilibration model?
JC – Can/did you use regularization to keep outliers from going off the rails?
JC – Are you seeing the same sorts of outliers in the training set for the same chemistry?
LW – These are kind of exotic chemistries; they may only appear a few times in the training set.
JC – Are you fitting to charges or ESPs in training?
LW – Charges. But ELF10 is pretty stable over different conformers.
JC – Since these are all sterically congested, it's possible that this is an artifact from a tricky partitioning of electron density/ESPs, and the GNNs are getting rightfully confused trying to fit them.
JC – Do you expand protonation and tautomer states before training?
LW – Yes.
JC – OE makes weird tautomers/protomers sometimes.
LW – We make our own ELF conformers, too, which may introduce differences.
MT – I'd heard earlier goals as being "reproduce charges at least as well as AmberTools". Then I heard something slightly different today.
LW – Nailing down the exact finish line has been hard. I'm leaning toward "better in every category". Like, if we did better on most chemistries but much worse on a few others, I don't think that's good enough for us.
JC – We initially wanted a drop-in replacement that's improvable. But at this point you've accomplished that goal and this is ready as a starting point. The current starting point for library charges is ff14SB (not AM1-BCC), right?
JC – You can determine the locality of the model from the number of message-passing rounds (sketched below).
JW – Not sure this was answered.
LW – Upcoming hydration free energy (HFE) calculations will determine whether this model is ready to go.
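A toy sketch of JC's locality point, using a generic message-passing update rather than NAGL's actual architecture: after N rounds, information has propagated at most N bonds, so N bounds the chemical environment each predicted charge can see.

```python
import torch

def message_passing_round(features: torch.Tensor,
                          adjacency: torch.Tensor,
                          weights: torch.Tensor) -> torch.Tensor:
    """One round: each atom aggregates features from bonded neighbors."""
    messages = adjacency @ features  # sum over direct neighbors
    return torch.relu((features + messages) @ weights)

# Toy molecular graph: propane (C-C-C), 2 invented features per atom.
adjacency = torch.tensor([[0., 1., 0.],
                          [1., 0., 1.],
                          [0., 1., 0.]])
features = torch.randn(3, 2)
weights = torch.randn(2, 2)

# After n_rounds rounds, each atom's representation (and hence its
# predicted charge) depends only on atoms within n_rounds bonds,
# so the number of rounds sets the model's locality.
n_rounds = 2
for _ in range(n_rounds):
    features = message_passing_round(features, adjacency, weights)
```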
JC – Has anyone looked at the Riniker charge model?
LW – Not past checking out the paper.
JC – They have an interesting idea for looking at multipoles. So I wonder what future architectural changes we might make and how we could prepare for that.
LW – May be more of a SMIRNOFF spec question.
JW – If we use NNs for other terms, we may need a third item in the spec covering how we match up the model outputs to the slots in the topology where they apply.
JC (chat) – Was just wondering if we expect to use an external package (e.g. NAGL) to assign charges, or if the GNN would be baked into the toolkit.
TG – How could I query a quality/error measure? Can this produce a confidence value for the outputs?
LW – We could have a population analysis of the training set. I'm not sure how the model itself would go about knowing its own confidence level.
TG – Like, you've identified outliers. I wonder if the NN could be trained to know when it's making an outlier.
JC – There are ways to train such that you predict both the value and the error of the outputs. Then the loss function could also score how well it predicted the error (sketched below).
JW – I'd heard of other models that fingerprint the input and compare it to the fingerprints of the training set. If the input doesn't look like anything in the training set, the model predicts low confidence.
JC – This shouldn't be a requirement though.
TG – This would be handy for me when doing experimentation.
JW – Agree with JC: this would be useful but isn't a requirement. We have AmberTools in production right now, which can also make big mistakes sometimes.
LW – Keep in mind that this uses AmberTools ELF10.
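One way JC's value-plus-error training could look, as a minimal sketch: a two-headed readout (invented here, not an existing NAGL model) predicts both a charge and its variance, and a Gaussian negative log-likelihood loss scores both.

```python
import torch

class ChargeWithUncertainty(torch.nn.Module):
    """Hypothetical two-headed readout: a charge and the variance
    (i.e. the estimated error) of that prediction."""

    def __init__(self, n_features: int = 32):
        super().__init__()
        self.charge_head = torch.nn.Linear(n_features, 1)
        self.variance_head = torch.nn.Linear(n_features, 1)

    def forward(self, atom_embeddings: torch.Tensor):
        charge = self.charge_head(atom_embeddings).squeeze(-1)
        # softplus keeps the predicted variance positive
        variance = torch.nn.functional.softplus(
            self.variance_head(atom_embeddings)
        ).squeeze(-1)
        return charge, variance

model = ChargeWithUncertainty()
atom_embeddings = torch.randn(10, 32)  # per-atom GNN output (toy data)
target_charges = torch.randn(10)       # e.g. ELF10 labels (toy data)

charge, variance = model(atom_embeddings)
# The Gaussian negative log-likelihood penalizes both a bad charge
# prediction and a miscalibrated error estimate, so the model is
# trained to know how wrong it is likely to be.
loss = torch.nn.GaussianNLLLoss()(charge, target_charges, variance)
loss.backward()
```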
JC (chat) – Do we have AmberTools ELF10 in the toolkit?
|