2025_03_04
JM/LW/JW
JM – Do we want to use ff_optimize? It's a huge amount of code that will need a lot of documentation. If all we want from BeSMARTS is splitting and optimizing, that's far smaller.
LW – I'm not super interested in using TG's exact code. From what TG's walked me through, the fitting and splitting are super intertwined, but it doesn't seem like it'll scale.
JM –
LW – I'm happy leaving splitting to BeSMARTS and doing the fitting ourselves.
JM – I'm interested in knowing the high-level API point(s) that we want; I can document that pathway/work it into an example.
LW – I can't lay that out off the top of my head, but it would be helpful to talk through it today. BeSMARTS currently … If we could have a workflow that splits the split/combine step and the actual parameter fitting into two steps, BeSMARTS would handle split/combine with:
Inputs: [starting BeSMARTS / last set of candidate fits]
Inputs: an FF score for each candidate fit
Outputs: either the next candidate fits to try, or convergence to a final answer
And we handle the actual fitting + scoring of each FF with our choice of backend (smee/FB).
Re: outputting the next candidates: BeSMARTS outputs ALL candidates (though it then does subsequent rounds of elimination based on increasingly expensive benchmarks).
Are output candidate SMARTS ranked?
Is there “memory” between iterations?
Do outputs have physics parameters?
Does BeSMARTS do something clever to avoid trying all splits, or does it really start with all of them and narrow down?
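The proposed division of labour (BeSMARTS does split/combine, we do fitting and scoring) can be sketched as an ask/tell loop. This is a minimal sketch under stated assumptions: `Candidate`, `Splitter.propose`, `optimize`, and `ToySplitter` are hypothetical names, not BeSMARTS API, and `fit_and_score` stands in for a smee or ForceBalance fit returning a lower-is-better score. Passing the full score history into `propose` is one way to make the "memory between iterations" question explicit; whether BeSMARTS actually keeps such state is still open.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Candidate:
    """A candidate parameter set, reduced to its ordered SMARTS patterns (hypothetical)."""
    smarts: tuple[str, ...]


class Splitter(Protocol):
    """Hypothetical interface for the split/combine half that BeSMARTS would own."""

    def propose(self, current: Candidate, scores: dict[Candidate, float]) -> list[Candidate]:
        """Return the next candidates to try; an empty list means converged."""
        ...


def optimize(splitter: Splitter, start: Candidate,
             fit_and_score: Callable[[Candidate], float],
             max_rounds: int = 20) -> Candidate:
    """Alternate between proposing candidates (splitter) and fitting/scoring them (us)."""
    best, best_score = start, fit_and_score(start)
    scores: dict[Candidate, float] = {start: best_score}  # explicit "memory" across rounds
    for _ in range(max_rounds):
        candidates = splitter.propose(best, scores)
        if not candidates:
            break  # splitter reports convergence
        for cand in candidates:
            if cand not in scores:  # never refit a candidate we have already scored
                scores[cand] = fit_and_score(cand)  # a smee/FB fit would go here
        round_best = min(candidates, key=scores.__getitem__)
        if scores[round_best] >= best_score:
            break  # no candidate beats the current best: stop
        best, best_score = round_best, scores[round_best]
    return best


class ToySplitter:
    """Toy splitter (hypothetical): proposes every way of appending one unused pattern."""

    def __init__(self, pool: list[str]):
        self.pool = pool

    def propose(self, current: Candidate, scores: dict[Candidate, float]) -> list[Candidate]:
        return [Candidate(current.smarts + (p,)) for p in self.pool if p not in current.smarts]
```

With a toy scorer such as `lambda c: abs(len(c.smarts) - 3)`, the loop keeps the first best candidate of each round and stops once a round fails to improve, returning a three-pattern candidate. Keeping the scorer as a plain callable is the point of the design: the fitting backend can be swapped without touching the splitter.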
What is the scope of this work? Do we want to aim to reimplement, or use it as-is?
JW – I think we'll want to reimplement if we need to use this seriously.
LW – The code currently isn't very Pythonic, and it's hard to delve into to assess code quality.
User stories
Science team picks up what we made next month and starts doing parameter splitting.
We all get hit by a bus, none of our memories survive, and next year the new OpenFF staff pick up the thing we made and start doing automated parameter splitting for FF fits.
JM – So should I try to understand the splitting code rather than the optimization code, since we'll use our own optimization?
LW – I think roughly yes. What was the deal with the interface that was being discussed at the meeting?
JM – The idea was to write a wrapper to read SMARTS/OFFXMLs using the OpenFF stack. TG also thought we'd want to make an OE wrapper, but we don't, and there's some confusion around aromaticity models that is likely a bug on his end.
JW – I think there's a high likelihood of a molecule processing bug. And if there's not a bug, then we should use the RDKit MDL implementation instead of the OE one.
JM – We’ll also want to use the OpenFF ForceField object, which will require a wrapper.
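As a starting point for the wrapper, it may help to pin down what it needs to pull out of an OFFXML: parameter ids, their SMIRKS, and the declared aromaticity model. The real wrapper would presumably go through `openff.toolkit.ForceField` (for example `ForceField.get_parameter_handler("ProperTorsions")`); this sketch swaps in the standard library's `xml.etree.ElementTree` so it runs without the OpenFF stack. The OFFXML fragment below is illustrative (made-up SMIRKS, ids, and values), and `read_smirks` and `aromaticity_model` are hypothetical helper names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative OFFXML fragment; SMIRKS, ids and values are made up for the example.
OFFXML = """<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
  <ProperTorsions version="0.4" potential="k*(1+cos(periodicity*theta-phase))" default_idivf="auto">
    <Proper smirks="[*:1]-[#6X4:2]-[#6X4:3]-[*:4]" periodicity1="3" phase1="0.0 * degree" k1="0.2 * mole**-1 * kilocalorie" id="t_demo_1"/>
    <Proper smirks="[*:1]~[#6X3:2]-[#6X4:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="0.1 * mole**-1 * kilocalorie" id="t_demo_2"/>
  </ProperTorsions>
</SMIRNOFF>"""


def read_smirks(offxml_text: str, handler: str, tag: str) -> dict[str, str]:
    """Map parameter id -> SMIRKS for one handler, preserving file order (order matters)."""
    root = ET.fromstring(offxml_text)
    section = root.find(handler)  # direct child of <SMIRNOFF>, e.g. "ProperTorsions"
    if section is None:
        return {}  # handler not present in this file
    return {p.get("id"): p.get("smirks") for p in section.findall(tag)}


def aromaticity_model(offxml_text: str) -> str | None:
    """Return the aromaticity model declared on the root element, if any."""
    return ET.fromstring(offxml_text).get("aromaticity_model")
```

Surfacing the declared aromaticity model alongside the SMIRKS makes it easy to check which model a file expects before comparing BeSMARTS output against the OpenFF-stack interpretation.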
Desired outcomes (high to low priority):
1. Want to know fairly rigorously how splitting works, especially gotchas and assumptions, with links to code in BeSMARTS if possible.
2. Document conceptually how splitting links to the ff_optimize code (loosely).
3. Documenting the initial guess logic would be good (TG's guesses seem to beat MSM values and can be applied to torsions).
4. Use this on problems reported by Cresset.
5. If it becomes clear or already exists, document a specification for the splitting behavior.
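One gotcha worth pinning down in the splitting docs: whatever BeSMARTS does internally, once a split is written to an OFFXML it is subject to SMIRNOFF's rule that the last matching parameter in a handler wins. A child parameter therefore has to sit after its parent, or the more general parent shadows it. The toy below illustrates only this assignment rule, using plain Python predicates in place of SMARTS matching; `Torsion`, `assign`, and `split` are hypothetical helpers, not BeSMARTS or OpenFF functions.

```python
from __future__ import annotations

from typing import Callable, NamedTuple, Optional


class Torsion(NamedTuple):
    """Toy stand-in for a matched torsion: a label plus its central-bond elements."""
    name: str
    central: tuple[str, str]


# A "pattern" here is a plain predicate, not a SMARTS match (toy assumption).
Pattern = Callable[[Torsion], bool]


def assign(params: list[tuple[str, Pattern]], torsions: list[Torsion]) -> dict[str, Optional[str]]:
    """SMIRNOFF-style assignment: scan parameters in order; the LAST match wins."""
    assigned: dict[str, Optional[str]] = {}
    for t in torsions:
        assigned[t.name] = None  # unmatched torsions stay None
        for pid, pattern in params:
            if pattern(t):
                assigned[t.name] = pid
    return assigned


def split(params: list[tuple[str, Pattern]], parent_id: str, child_id: str,
          child_pattern: Pattern, after_parent: bool = True) -> list[tuple[str, Pattern]]:
    """Return a new list with a child parameter inserted next to its parent."""
    i = next(k for k, (pid, _) in enumerate(params) if pid == parent_id)  # StopIteration if absent
    j = i + 1 if after_parent else i
    return params[:j] + [(child_id, child_pattern)] + params[j:]
```

For example, splitting a generic parent `t1` with an oxygen-specific child `t1a` reassigns a C–O torsion to `t1a` when the child is inserted after the parent, but leaves it on `t1` when the child is inserted before it.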
Products
Start writing docs in JM's fork. Focus on function over form, even if it means that changes may not get upstreamed.
JM tries scripting up a toy system (maybe debugging Cresset FF issues?) as a means of learning to use BeSMARTS. This would need a pretty well laid-out problem.
LW – The current Cresset issue is a bit subtle and may not be great here; it's complicated by similar parameters and symmetry issues.
LW – I actually sent Trevor some info about this, and I think he's already running it.
JM – I'll ask TG about this tomorrow, and I'll work on learning the splitting code generally.
Agenda for tomorrow