GNNCharges spec draft | Context:
The next FF release, openff-2.3.0, is planned to be trained to NAGL-produced charges.
Important stakeholders have ruled out making NAGL “just another AM1BCC provider” under the ToolkitAM1BCC tag.
We’d like at least a release candidate of this FF to be available at the annual meeting in May.
MT started a proposal in this PR (though it’s easier to read the rendered version).
(10 mins) Read over the current draft EP.
What are natural “cut points” in the level of detail where we can stop describing behavior (or refer to external specs)?
JC: I like this PR; it could be the germ of something that’s quite flexible and suffices for several generations of NN. There’s still a lot of ambiguity that could be simplified. Two main things: the PT file contains the model architecture definition. Could instead use ONNX files to contain the model and architecture.
DM – Are you proposing that some of the detail in the draft spec can live in the PyTorch model file itself?
JC – Yes. The PyTorch file can contain the architecture.
LW – Some details make this less straightforward: I tried porting to ONNX but was unable to because of DGL, and there’s still some logic (static vs. dynamic compute graphs) that doesn’t port. I also agree with maybe outsourcing to PyTorch, but that raises a philosophical question of how much we want to export.
DM – If other folks can reasonably use the PyTorch file and get correct results, then I’m in favor of keeping those details out of the spec.
JC – I think there’s also tooling to convert PyTorch models to other things.
LW – Are PyTorch files just pickle files? Are they NAGL-specific?
JC – If the model has NAGL-specific things in it, then it might not be unpicklable by vanilla PyTorch.
JW – Details?
LW – Right now our pt files are nonstandard and wouldn’t be loadable because we cram some other stuff in there. But even if we didn’t, it would still have some specific classes.
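To make the pickle concern above concrete, here is a minimal sketch using only the standard library. It relies on the fact that pickle stores a reference to a class (module path plus name) rather than the class definition, and that `torch.save` uses pickle by default when given a whole object. The module `fake_model_lib` and class `TinyModel` are hypothetical stand-ins, not real NAGL or PyTorch names.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library that defines a model class.
module = types.ModuleType("fake_model_lib")
exec(
    "class TinyModel:\n"
    "    def __init__(self, weights):\n"
    "        self.weights = weights\n",
    module.__dict__,
)
sys.modules["fake_model_lib"] = module

# Pickling records "fake_model_lib.TinyModel" by name, plus the instance state.
payload = pickle.dumps(module.TinyModel([0.1, 0.2]))
module_name_in_payload = b"fake_model_lib" in payload

# With the defining library importable, loading works.
restored = pickle.loads(payload)

# Simulate an environment where the library is not installed.
del sys.modules["fake_model_lib"]
try:
    pickle.loads(payload)
    load_failed = False
except ImportError:
    load_failed = True

print(module_name_in_payload, restored.weights, load_failed)
```

This is why a file holding library-specific classes is only loadable where that library is importable; saving plain tensors or a state dict avoids the class reference but then the architecture has to be described somewhere else.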
JC – There’s also the question of whether charge equilibration could be made part of the model, or if it needs to be a separately defined postprocessing stage.
DM – I’m thinking that we should strip out things at the level of defining “ReLU”, since that’s too domain-specific. So I’d be in favor of a simplified version of this spec that allows loading a pt file with NAGL. And as we proceed scientifically, we think about how to make this easier to re-implement in future versions.
JC – Another thing from last time would be defining a more constrained NAGL version requirement/model now, and then implementing a more portable standard in the future. That would be a tradeoff for a short time to delivery.
JW: Formal charge enumeration reactions?
LW – Enumerating resonance forms to get a noninteger formal charge. Defined in Gilson’s VCharge paper.
DM – If it’s defined in Gilson’s paper, then can we refer to that?
JC – …
LW – …
JW – I think having it defined in a paper is a very good case.
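As a rough illustration of how enumerating resonance forms can yield a noninteger formal charge, the sketch below averages per-atom formal charges over a list of forms. Equal weighting of the forms is an assumption made here for illustration; which forms get enumerated, and how, is what the paper defines and is not reproduced. The function name and the toy carboxylate-like input are hypothetical.

```python
def average_formal_charges(resonance_forms):
    """Average per-atom formal charges over equally weighted resonance forms.

    resonance_forms: list of lists, one list of integer formal charges per
    form, all in the same atom order. Equal weighting is an assumption.
    """
    if not resonance_forms:
        raise ValueError("need at least one resonance form")
    n_atoms = len(resonance_forms[0])
    if any(len(form) != n_atoms for form in resonance_forms):
        raise ValueError("all forms must list the same atoms")
    n_forms = len(resonance_forms)
    return [
        sum(form[i] for form in resonance_forms) / n_forms
        for i in range(n_atoms)
    ]


# Toy carboxylate-like fragment, atom order [C, O1, O2]:
# the negative charge sits on O1 in one form and on O2 in the other.
forms = [[0, -1, 0], [0, 0, -1]]
averaged = average_formal_charges(forms)
print(averaged)  # [0.0, -0.5, -0.5]
```

The total charge is preserved by the averaging, which is a cheap sanity check for any implementation of this feature.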
LW – Standardization reactions might not actually be part of the spec, since I consider them an implementation detail.
JC – Would be good to define feature types (ints, floats, etc.).
JC – Would be good to define BondFeatures.
JC – Would be good to define model outputs (what do they mean, size + shape?). That is, given a pt file, how to interpret what comes out (units, meaning; e.g. hardness and such from first layer).
LW – We get 3 values: initial charge, electronegativity, hardness. Then equilibration happens, and the final layer gives the charge.
JC + LW – … (see recording 36 mins)
DM – Thinking of this as being black-boxish.
JC – If we want the OFFXML contents to be 1:1 for NAGL inputs, we should take the NAGLCharges approach. But if we want to be more general, then it could be GNNCharges.
(general) – So for now we will not do the Model section and instead say that the details are stored in the custom pt file. In the future we will make a new spec version that is designed for additional portability.
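For readers unfamiliar with the equilibration step LW mentions, here is a minimal sketch of a generic charge-equilibration solve. It assumes the textbook form: minimize the sum over atoms of e_i*q_i + 0.5*s_i*q_i^2 subject to the charges summing to the total charge, with e the electronegativity and s the hardness. This is not confirmed to be NAGL's exact formula; in particular, how the initial charge enters is not stated in these notes and is left out. The function name is hypothetical.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Closed-form charge equilibration under a total-charge constraint.

    Solves e_i + s_i * q_i = lam for every atom, with lam chosen so that
    sum(q_i) == total_charge. Assumed generic form, not a NAGL API.
    """
    if len(electronegativity) != len(hardness):
        raise ValueError("need one electronegativity and hardness per atom")
    if any(s <= 0 for s in hardness):
        raise ValueError("hardness must be positive")
    inv_s = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]
    # Lagrange multiplier that enforces the total charge.
    lam = (total_charge + sum(e_over_s)) / sum(inv_s)
    return [lam * inv - eos for inv, eos in zip(inv_s, e_over_s)]


# Toy inputs (arbitrary numbers, not model outputs).
charges = equilibrate_charges([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], total_charge=-1.0)
symmetric = equilibrate_charges([0.5, 0.5], [2.0, 2.0], total_charge=1.0)
print(charges, symmetric)
```

Whatever the exact formula, writing it down with units for each of the three outputs is the kind of detail a future, more portable spec version would need, since a postprocessing step like this lives outside the network weights unless it is baked into the model file.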
JC – Permutational invariance? Molecule fragments/multiple mols?
LW – The lookup table is also in the .pt file. This likely won’t be part of the next generation but should be documented in this one.
JC – Could be implemented as LibraryCharges.
JW –
LW – The lookup table contains several thousand mols; it could be trimmed to hundreds if we remove unrealistic mols.
JW – I’m slightly in favor of leaving these in the pt file, but neither solution is good.
JC – If we’re not promising it’s AM1BCC, then do we really need a lookup table?
(general) – We’re playing it very safe.
JC –
JW – Actually, maybe best to do LibraryCharges?
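A small sketch of the lookup-table behavior discussed above, including the permutation concern JC raises: stored charges are in one atom order, so a query in a different order needs an atom map. Everything here is an assumption for illustration: the string key, the `query_to_stored` map, the fallback `model_fn`, and the toy charge values are hypothetical and do not describe how NAGL or LibraryCharges actually match molecules.

```python
def assign_charges(key, query_to_stored, lookup_table, model_fn):
    """Return (charges, source), preferring the lookup table over the model.

    key: hypothetical canonical identifier for the molecule.
    query_to_stored: for each query atom, the index of the matching atom
        in the stored entry (a permutation of 0..n-1).
    """
    stored = lookup_table.get(key)
    if stored is None:
        # No table entry: fall back to the model.
        return model_fn(len(query_to_stored)), "model"
    if sorted(query_to_stored) != list(range(len(stored))):
        raise ValueError("atom map must be a permutation of stored indices")
    # Reorder stored charges into the query's atom order.
    return [stored[j] for j in query_to_stored], "lookup"


def placeholder_model(n_atoms):
    """Stand-in for a GNN prediction; returns zeros."""
    return [0.0] * n_atoms


# Toy table entry stored in atom order [O, H, H]; values are illustrative.
table = {"water": [-0.8, 0.4, 0.4]}

# Query lists atoms as [H, O, H], so query atom 0 maps to stored atom 1.
hit = assign_charges("water", [1, 0, 2], table, placeholder_model)
miss = assign_charges("methane", [0, 1, 2, 3, 4], table, placeholder_model)
print(hit, miss)
```

The same reordering problem exists whether the table lives in the pt file or is expressed as LibraryCharges; the difference is only where the matching logic is specified.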
|