2025-04-01 SMIRNOFF committee meeting notes

Participants

  • @David Mobley

  • @Lily Wang

  • @Matt Thompson

  • @Jeffrey Wagner

  • @John Chodera

Goals

  • Decide how much technical detail the GNNCharges section of the SMIRNOFF spec should specify

Discussion topics

Item

Notes

GNNCharges spec draft

  • Context:

    • The next FF release, openff-2.3.0, is planned to be trained to NAGL-produced charges

    • Important stakeholders have ruled out making NAGL “just another AM1BCC provider” under the ToolkitAM1BCC tag

    • We’d like at least a release candidate of this FF to be available at the annual meeting in May.

  • MT started a proposal in this PR (though it’s easier to read the rendered version)

  • (10 mins) Read over the current draft EP

  • What are natural “cut points” in the level of detail where we can stop describing behavior (or refer to external specs)?

  • JC: I like this PR. It could be the germ of something quite flexible that suffices for several generations of NN. There’s still a lot of ambiguity that could be reduced.

    • Two main things:

      • The PT file contains the model architecture definition. We could instead use ONNX files to hold the model and architecture.

        • DM – Are you proposing that some of the detail in the draft spec can live in the PyTorch model file itself?

        • JC – Yes. Pytorch file can contain the architecture.

        • LW – Some details make this less straightforward: I tried porting to ONNX but was unable to because of DGL, and some logic (static vs. dynamic compute graphs) still doesn’t port. I agree with maybe outsourcing to PyTorch, but that raises a philosophical question of how much we want to export.

        • DM – If other folks can reasonably use the PyTorch file and get correct results, then I’m in favor of keeping those details out of the spec.

        • JC – I think there’s also tooling to convert PyTorch models to other things.

        • LW – Are pytorch files just pickle files? Are they NAGL specific?

        • JC – If the model has nagl-specific things in it, then it might not be unpickleable by vanilla pytorch.

        • JW – Details?

        • LW – Right now our pt files are nonstandard and wouldn’t be loadable because we cram some other stuff in there. But even if we didn’t it still has some specific classes.

      • JC – There’s also the question of whether charge equilibration could be made part of the model, or whether it needs to be a separately defined postprocessing stage.

        • LW – In one world, everything could live in the pytorch model, and in another we could define everything in spec.

    • DM – I’m thinking that we should strip out things at the level of defining “ReLU”, since that’s too domain-specific. So I’d be in favor of a simplified version of this spec that allows loading a pt file with NAGL. As we proceed scientifically, we can think about how to make this easier to re-implement in future versions.

    • JC – Another thing from last time would be defining a more constrained NAGL version requirement/model now, and implementing a more portable standard in the future. That trades portability for a shorter time to delivery.

      • DM – Then the first version could be NAGLCharges, before generalizing to GNNCharges.

  • JW: formal charge enumeration reactions?

    • LW – Enumerating resonance forms to get a non-integer formal charge. It’s defined in Gilson’s VCharge paper.

    • DM – If it’s defined in Gilson’s paper then can we refer to that?

    • JC – …

    • LW – …

    • JW – I think having it defined in a paper makes a very good case for referring to it.

  • LW – Standardization reactions might not actually be part of the spec, since I consider it an implementation detail.

    • JW – If we later upstream standardization into toolkit behavior, it might be weird for this to start in the GNNCharges spec.

  • JC – Would be good to define feature types (ints, floats, etc.).

  • JC – Would be good to define BondFeatures.

  • JC – Would be good to define the model outputs (what do they mean, size and shape?). That is, given a pt file, how to interpret what comes out (units, meaning; e.g. hardness and such from the first layer).

    • LW – We get 3 values: initial charge, electronegativity, and hardness. Then equilibration happens, and the final layer gives the charge.

    • JC + LW – … (see recording 36 mins)

    • DM – Thinking of this being black-boxish.

    • JC – If we want the OFFXML contents to be 1:1 with NAGL inputs, we should take the NAGLCharges approach. But if we want to be more general, then it could be GNNCharges.

    • (general) – So for now we will not write the Model section, and will instead say that the details are stored in the custom pt file. In the future we will make a new spec version designed for additional portability.

  • JC – Permutational invariance? Molecule fragments/multiple mols?

  • LW – Lookup table is also in the .pt file. This likely won’t be part of the next generation but should be documented in this one.

    • JC – Could be implemented as librarycharges

    • JW –

    • LW – Lookup table contains several thousand mols, could be trimmed to hundreds if we remove unrealistic mols

    • JW – I’m slightly in favor of leaving these in pt file, but neither solution is good.

    • JC – If we’re not promising it’s AM1BCC, then do we really need a lookup table?

    • (general) – We’re playing it very safe

    • JC –

    • JW – Actually maybe best to do librarycharges?
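The equilibration step discussed above (per-atom initial charge, electronegativity, and hardness, followed by redistribution so that the charges sum to the molecule's total charge) can be sketched as follows. This is a minimal illustration of a generic electronegativity-equalization solve, not NAGL's actual implementation: the function name `equilibrate_charges`, the quadratic energy form, and the example numbers are all assumptions made here for illustration.

```python
from typing import List, Sequence


def equilibrate_charges(
    initial_charges: Sequence[float],
    electronegativities: Sequence[float],
    hardnesses: Sequence[float],
    total_charge: float,
) -> List[float]:
    """Hypothetical helper: shift charges so they sum to total_charge.

    Assumed model: minimize sum_i (e_i * dq_i + 0.5 * s_i * dq_i**2)
    subject to sum_i (q0_i + dq_i) == total_charge, which gives
    dq_i = (lam - e_i) / s_i with a single Lagrange multiplier lam.
    """
    n = len(initial_charges)
    if len(electronegativities) != n or len(hardnesses) != n:
        raise ValueError("all per-atom inputs must have the same length")
    if any(s <= 0 for s in hardnesses):
        raise ValueError("hardness must be positive for every atom")

    inverse_hardness = [1.0 / s for s in hardnesses]
    e_over_s = [e / s for e, s in zip(electronegativities, hardnesses)]

    # Lagrange multiplier fixed by the total-charge constraint.
    lam = (total_charge - sum(initial_charges) + sum(e_over_s)) / sum(inverse_hardness)

    return [
        q0 + (lam - e) / s
        for q0, e, s in zip(initial_charges, electronegativities, hardnesses)
    ]


# Made-up three-atom example; the values carry no physical meaning.
charges = equilibrate_charges(
    initial_charges=[0.0, 0.0, 0.0],
    electronegativities=[1.0, 0.5, 0.5],
    hardnesses=[2.0, 1.0, 1.0],
    total_charge=0.0,
)
```

Because the solve depends only on sums over atoms, reordering the atoms reorders the output charges in the same way, which is one simple way to check the permutational invariance raised above.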

To do

  • Add reference to VCharge or charge enumeration

  • Point to implementation instead of defining Model section

  • Name this version NAGLCharges or something like that

  • Decide whether to split lookups into a big LibraryCharges section

  • Fill out bond features

  • Specify data type of features (int, float, etc)
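For the charge-enumeration item above, the idea of getting a non-integer formal charge from resonance forms can be sketched as a per-atom average. This is a minimal sketch that assumes every enumerated form is weighted equally and that the forms are supplied by hand; the authoritative definition is the one in the VCharge paper, and the function name `average_formal_charges` is hypothetical. It also illustrates why the feature data type matters: the inputs are ints, but the resulting feature is a float.

```python
from typing import List, Sequence


def average_formal_charges(resonance_forms: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical helper: average integer formal charges over resonance forms.

    Each form lists one integer formal charge per atom, in the same atom order.
    Equal weighting of forms is an assumption of this sketch.
    """
    if not resonance_forms:
        raise ValueError("at least one resonance form is required")
    n_atoms = len(resonance_forms[0])
    if any(len(form) != n_atoms for form in resonance_forms):
        raise ValueError("all resonance forms must cover the same atoms")

    n_forms = len(resonance_forms)
    return [
        sum(form[i] for form in resonance_forms) / n_forms
        for i in range(n_atoms)
    ]


# Carboxylate-like fragment with atoms ordered (C, O1, O2):
# the negative charge sits on O1 in one form and on O2 in the other.
forms = [
    [0, -1, 0],
    [0, 0, -1],
]
averaged = average_formal_charges(forms)  # [0.0, -0.5, -0.5]
```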

Action items

Decisions
