2025-03-04 LW/JM/JW Check-in meeting notes

Participants

  • @Josh Mitchell

  • @Lily Wang

  • @Jeffrey Wagner

Discussion topics

Notes

  • 2025_03_04 JM/LW/JW

    JM – Do we want to use ff_optimize? It's a huge amount of code that will need a lot of documentation. If all we want from BeSMARTS is splitting and optimizing, that’s far smaller.

    LW – I’m not super interested in using TG’s exact code. From what TG has walked me through, the fitting and splitting are tightly intertwined, and it doesn’t seem like it’ll scale.

    LW – I’m happy to leave splitting to BeSMARTS and do the fitting ourselves.

    JM – Interested in knowing high-level API point(s) that we want - I can document that pathway/work it into an example.

    LW – Can’t lay that out off the top of my head, but it would be helpful to talk through it today. BeSMARTS currently … Ideally we’d have a workflow that separates the splitting/combining from the actual parameter fitting as two steps, where BeSMARTS handles the splitting/combining with:

    • Inputs of [starting besmarts / last set of candidate fits]

    • Inputs of a FF score for each candidate fit

    • Outputs of either the next candidate fits to try, or converges to a final answer

  • And we handle the actual fitting + scoring of each FF with our choice of backend (smee/FB)
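The two-step workflow LW describes can be sketched as a simple loop. This is only an illustration of the proposed division of labor, not an existing BeSMARTS, smee, or ForceBalance interface: `Candidate`, `ProposeFn`, `ScoreFn`, and `run_split_fit_loop` are hypothetical names, and the sketch assumes a lower score is better.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one candidate fit (a set of SMARTS-typed parameters)."""
    name: str
    smarts: Tuple[str, ...]


# Hypothetical callables; neither corresponds to a real library API.
# The splitter (BeSMARTS's role) sees the last candidates and their scores, and
# returns the next candidates to try, or None once it has converged.
ProposeFn = Callable[[Sequence[Candidate], Sequence[float]], Optional[List[Candidate]]]
# The fitter (our role, e.g. backed by smee or FB) fits one candidate and returns its score.
ScoreFn = Callable[[Candidate], float]


def run_split_fit_loop(
    start: Sequence[Candidate],
    propose: ProposeFn,
    fit_and_score: ScoreFn,
    max_rounds: int = 10,
) -> Optional[Tuple[Candidate, float]]:
    """Alternate between fitting/scoring and split/combine proposals.

    Returns the best (candidate, score) pair seen, assuming lower scores are better.
    """
    candidates = list(start)
    best: Optional[Tuple[Candidate, float]] = None
    for _ in range(max_rounds):
        # Step 1 (ours): fit and score every candidate with the chosen backend.
        scores = [fit_and_score(c) for c in candidates]
        for cand, score in zip(candidates, scores):
            if best is None or score < best[1]:
                best = (cand, score)
        # Step 2 (splitter): hand the scores back and ask for the next candidates.
        proposed = propose(candidates, scores)
        if proposed is None:  # the splitter reports convergence
            break
        candidates = proposed
    return best
```

The point of the sketch is that the splitter never needs to know how a score was produced, so the fitting backend could be swapped without touching the splitting logic.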

  • Re: output the next candidates - BeSMARTS outputs ALL candidates (though it then does subsequent rounds of elimination based on increasingly expensive benchmarks)

    • Are output candidate SMARTS ranked?

      • Yes. Complex/additional parameters are penalized

    • Is there “memory” between iterations?

    • Do outputs have physics parameters?

      • Some sort of initial guess for these is output; it’s not MSM but something else

    • Does BeSMARTS do something clever to avoid trying all splits, or does it really start with all of them and narrow down?
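As a rough illustration of the "complex/additional parameters are penalized" answer above, candidates could be ranked by an objective plus a per-parameter penalty. The notes only state that a penalty exists; the additive form, the weight, and the names `ScoredCandidate` and `rank_candidates` are assumptions made for this sketch, not BeSMARTS behavior.

```python
from typing import List, NamedTuple


class ScoredCandidate(NamedTuple):
    """Hypothetical record for one candidate split."""
    smarts: str
    objective: float    # fit objective, lower is better (assumed)
    n_new_params: int   # number of parameters this candidate adds


def rank_candidates(
    candidates: List[ScoredCandidate],
    penalty_per_param: float = 0.1,  # assumed weight, chosen only for illustration
) -> List[ScoredCandidate]:
    """Sort candidates by objective plus a linear complexity penalty, best first."""
    return sorted(
        candidates,
        key=lambda c: c.objective + penalty_per_param * c.n_new_params,
    )
```

With this kind of ranking, a split has to improve the objective by more than its penalty to move ahead of a simpler candidate.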

  • What is scope of this work? Do we want to aim to reimplement or use as-is?

    • JW – I think we’ll want to reimplement if we need to use this seriously.

    • LW – The code currently isn’t very pythonic, which makes it hard to dig in and assess code quality.

  • User stories

    • Science team picks up what we made next month and starts doing parameter splitting

      • LW – TG has something like this in progress but maybe not on GitHub - The “t17” example

        • Use case: splitting t17 to <100 TorsionDrives + optimizations (from Cresset)

    • We all get hit by a bus and none of our memories survive, and next year the new OpenFF staff pick up the thing we made and start doing automated parameter splitting for FF fits

      • JM – Is this example sufficient?

        • LW – I think I looked at this and determined it wasn’t for that particular use case… One example was when I was trying to find a SMARTS that would split alkanes and non-alkanes.

        • JM – BeSMARTS has a concept called a “topology” that maps parameter indices to some higher-level idea of how they’re related to each other (e.g. the bond between particles 1 and 2 is the same as the bond between 2 and 1)…
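The index-equivalence idea JM mentions (bond 1-2 is the same as bond 2-1) can be illustrated with a small canonicalization helper. This is not how BeSMARTS implements its topology concept; `canonical_key` is a hypothetical helper, and treating a reversed index tuple as equivalent is an assumption that fits bonds and other chain-like terms only.

```python
from typing import Dict, Sequence, Tuple


def canonical_key(indices: Sequence[int]) -> Tuple[int, ...]:
    """Return one representative for an index tuple and its reverse.

    (2, 1) and (1, 2) both map to (1, 2), so they address the same parameter.
    """
    forward = tuple(indices)
    backward = forward[::-1]
    return min(forward, backward)


# Usage: store a label under the canonical key, look it up from either direction.
assigned: Dict[Tuple[int, ...], str] = {}
assigned[canonical_key((2, 1))] = "b1"  # "b1" is a made-up parameter label
label = assigned[canonical_key((1, 2))]
```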

  • JM – So should I try to understand splitting code rather than optimization, since we’ll use our own opt?

    • LW – I think roughly yes. What was the deal with the interface that was being discussed at the meeting?

      • JM – The idea was to write a wrapper to read SMARTS/OFFXMLs using the OpenFF stack. TG also thought we’d want to make an OE wrapper, but we don’t, and there’s some confusion around aromaticity models that is likely a bug on his end.

      • JW – I think there’s a high likelihood of a molecule processing bug. And if there’s not a bug then we should use the RDKit MDL implementation instead of the OE one.

    • JM – We’ll also want to use the OpenFF ForceField object, which will require a wrapper.

  • Desired outcomes (high to low priority)

    • Want to know fairly rigorously how splitting works, especially gotchas and assumptions, and links to code in BeSMARTS if possible

    • Documenting conceptually how splitting links to ff_optimize code (loosely)

    • Documenting initial guess logic would be good (TG’s guesses seem to beat MSM values and can be applied to torsions)

    • Using this on problems reported by Cresset

    • If it becomes clear or already exists, documenting a specification for the splitting behavior.

  • Products

    • Start writing docs in JM’s fork. Focus on function over form, even if it means that changes may not get upstreamed.

    • JM tries scripting up a toy system (maybe debugging Cresset FF issues?) as a means of learning to use BeSMARTS. Would need a pretty well laid-out problem.

      • LW – The current Cresset issue is a bit subtle and may not be great here - it’s complicated by similar parameters and symmetry issues.

        • Data:

        • Case 2, torsion drives are a subset of these molecules

      • LW – I actually sent Trevor some info about this, and I think he’s already running it.

      • JM – I’ll ask TG about this tomorrow, and I’ll work on learning splitting code generally.

  • Agenda for tomorrow

    • Catch up with TG about in-progress t17 fits and see if there’s a way to have JM use it as a learning example

      • Status of source code changes?

    • Find task that will help JM practice learning splitting logic

Action items

Decisions