2025-03-05 BeSMARTS Check-in meeting notes

Participants

@Trevor Gokey
@Lily Wang
@Josh Mitchell
@Jeffrey Wagner

Recording of last 40 mins: https://drive.google.com/file/d/1tNiAFmYACNutDRsFjGb1vljLyuXD0ndO/view?usp=sharing

Discussion topics

Notes

Catch up with TG about in-progress t17 fits and see if there's a way to have JM use them as a learning example
Status of source code changes? Those are merged now and available on PyPI (TG: not tested well!)
TG – Two lines of progress are happening here.
First: the t17 fit looks like example 11, just with different data and hyperparameters. The energy-fitting code should be generally working; with an appropriate driver you should be able to fit energies similarly to what CC is doing. Fitting to torsiondrives doesn't use the old "ForceBalance"-style approach, which runs lots of MM minimizations; instead it uses the "Chapin" approach, where the torsiondrive conformations are treated as single-point targets.
JM – That uses ff_optimize?
TG – Yes.
Second: fitting LJ parameters using a different objective function and SAPT (less OpenFF infrastructure involved here).
LW – Does the code look like the t17 example?
TG – In terms of the actual fitting, it's the same. But a lot of code is needed to format and check datasets: things need to be pruned, parsed, and filtered into a graph DB, and that process is fairly dataset-specific right now. Torsion fits and nonbonded fits follow the same sort of parsing path.
JM – Additional docs for ff_optimize? I opened a PR to add some.
TG – You would mostly need to refer to the code. I have changes I'd like to see in the docs and will relay those.
JM – I think we'll need roughly that level of documentation through the whole function to get it to a transferable state.
JW – Some pairing sessions to go through it together and help JM make an outline could be useful here.
TG – Trying to think about the best use of time moving forward… There are two levels to this:
- A "document everything to the point of being able to fix/maintain/develop it" push (the original goal)
- More pointed user-guide-style docs/examples
JM – I think documentation of ff_optimize is necessary in either case. I'd be able to make progress much faster on a pairing call.
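To make the torsiondrive distinction concrete, here is a minimal sketch of a single-point torsion objective. It assumes each torsiondrive conformation carries a fixed geometry and a QM reference energy, and reduces the geometry to a single dihedral angle for brevity. `toy_torsion_energy`, `relative`, and `single_point_objective` are hypothetical names, not BeSMARTS or ff_optimize APIs, and referencing each energy set to its own minimum is an assumed convention. The point it illustrates: the MM energy is evaluated once per conformation at the QM geometry, with no MM minimization inside the loop.

```python
import numpy as np


def toy_torsion_energy(params, phi):
    """Hypothetical one-term torsion energy: k * (1 + cos(n * phi - phase))."""
    k, n, phase = params
    return k * (1.0 + np.cos(n * phi - phase))


def relative(energies):
    """Shift a set of energies so the lowest one sits at zero (assumed convention)."""
    energies = np.asarray(energies, dtype=float)
    return energies - energies.min()


def single_point_objective(params, angles, qm_energies, mm_energy):
    """Sum of squared errors between relative MM and relative QM energies.

    Each torsiondrive conformation is a single-point target: the MM energy is
    evaluated once at the fixed QM geometry (reduced here to one dihedral
    angle), and no MM minimization is performed.
    """
    mm = np.array([mm_energy(params, phi) for phi in angles])
    diff = relative(mm) - relative(qm_energies)
    return float(np.sum(diff ** 2))


# Synthetic "QM" scan generated from known parameters, for demonstration only.
angles = np.deg2rad(np.arange(-180, 180, 15))
qm = toy_torsion_energy((2.0, 3, 0.0), angles)

perfect = single_point_objective((2.0, 3, 0.0), angles, qm, toy_torsion_energy)
too_soft = single_point_objective((1.0, 3, 0.0), angles, qm, toy_torsion_energy)
```

With the generating parameters the objective is essentially zero; halving the barrier height gives a positive error, which an optimizer would then drive back down.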
TG – Re: types of docs, everything should be type hinted, with docstrings describing what the arguments mean and what shape they should be.
JM – That's great. Also, forceFieldOptimizationStrategy has lots of attributes that could use documentation for users.
TG – Kind of; some of this is evident from the examples.
LW – It would be good to have additional info that goes beyond the examples.
TG – It's worth noting that a lot of these things are not set in stone; it's hard to make statements about what things mean when they change so much.
JM – It could be handy to identify a subset of things as being "in the API" and hold them more constant, while letting other things change.
JM – Some question as to whether we want to focus on ff_optimize …
TG – ff_optimize is hard-coded to use SciPy, OpenMM in some places and native in others, etc., so it is very specifically designed for my use cases.
LW – We eventually want to use BeSMARTS with our fitting machinery, but if it's still in flux we might not want to tackle it immediately.
TG – Uncertain about the stability of ff_optimize. There is also some hard-codedness in the objective definition/evaluation.
Find a task that will help JM practice learning the splitting and clustering logic
TG – The high-level view is that the touchpoints are the splitter config and extender config, as in examples 7 and 11. Then turn those into SMARTS and you should be good to go. Could add more to Read the Docs here. The issue is that there's no code on the OpenFF side to handle the candidates that come out.
JM + LW – Could link into existing functionality that takes a candidate FF and returns an objective function… but it depends on what we want here.
… generate_candidates is used once in the middle of the ff_optimize function… (see recording) …
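As an illustration of the docstring style TG describes (type hints everywhere, with each argument's meaning and shape spelled out), here is a hypothetical function. It is not part of BeSMARTS, and the NumPy-style section layout is an assumed convention rather than one the project has adopted. The explicit shape check turns the documented contract into a clear error message.

```python
from typing import Sequence, Tuple

import numpy as np


def pairwise_distances(xyz: np.ndarray, pairs: Sequence[Tuple[int, int]]) -> np.ndarray:
    """Compute distances between selected atom pairs (hypothetical example).

    Parameters
    ----------
    xyz : np.ndarray, shape (n_atoms, 3)
        Cartesian coordinates of one conformation, in angstrom.
    pairs : sequence of (int, int), length n_pairs
        Zero-based atom indices; each tuple names one pair.

    Returns
    -------
    np.ndarray, shape (n_pairs,)
        Euclidean distance for each pair, in the same units as ``xyz``.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Enforce the documented shape instead of failing later with an opaque error.
    if xyz.ndim != 2 or xyz.shape[1] != 3:
        raise ValueError(f"xyz must have shape (n_atoms, 3), got {xyz.shape}")
    idx = np.asarray(pairs, dtype=int).reshape(-1, 2)
    return np.linalg.norm(xyz[idx[:, 0]] - xyz[idx[:, 1]], axis=1)


xyz = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
d = pairwise_distances(xyz, [(0, 1), (0, 2)])
# -> array([5., 1.])
```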
Initially (goal for the end of this effort): understand BeSMARTS well enough to move to "Eventually"
Eventually (mainline FF fitting in 2026):
- Clustering
- Splitting
- Initial value estimation
`for iteration in range(10): candidate_ffs = besmarts.propose_splits`
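The `for iteration in range(10)` line is only a placeholder. One possible reading is a greedy loop over candidate splits: propose candidate force fields, score each one, keep the best only if it lowers the objective, and stop otherwise. The sketch below assumes that reading. `greedy_split_search`, `toy_propose`, and `toy_score` are hypothetical helpers written for illustration, not BeSMARTS functions, and the toy splits groups of numbers at their median rather than splitting SMARTS patterns. The per-parameter penalty is an assumed stand-in for whatever keeps a real search from splitting indefinitely.

```python
from typing import Callable, Sequence, Tuple, TypeVar

FF = TypeVar("FF")  # stand-in for whatever object represents a candidate force field


def greedy_split_search(
    initial: FF,
    propose_splits: Callable[[FF], Sequence[FF]],
    score: Callable[[FF], float],
    max_iterations: int = 10,
) -> FF:
    """Hypothetical helper: accept the best candidate each round until none improves."""
    best, best_score = initial, score(initial)
    for _ in range(max_iterations):
        candidates = propose_splits(best)
        if not candidates:
            break  # nothing left to split
        scored = [(score(c), c) for c in candidates]
        top_score, top = min(scored, key=lambda pair: pair[0])
        if top_score >= best_score:
            break  # no candidate improves the objective; stop early
        best, best_score = top, top_score
    return best


# Toy stand-in: a "force field" is a tuple of groups of reference values, each
# group sharing one parameter (its mean). Splitting a group reduces the error
# but costs a fixed penalty per extra parameter.
Groups = Tuple[Tuple[float, ...], ...]


def toy_score(groups: Groups, penalty: float = 1.0) -> float:
    sse = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        sse += sum((x - mean) ** 2 for x in g)
    return sse + penalty * len(groups)


def toy_propose(groups: Groups) -> list:
    """Propose one candidate per group by splitting that group at its median."""
    candidates = []
    for i, g in enumerate(groups):
        if len(g) < 2:
            continue
        s = sorted(g)
        half = len(s) // 2
        candidates.append(groups[:i] + (tuple(s[:half]), tuple(s[half:])) + groups[i + 1:])
    return candidates


data: Groups = ((1.0, 1.1, 0.9, 5.0, 5.1, 4.9),)
result = greedy_split_search(data, toy_propose, toy_score)
# -> ((0.9, 1.0, 1.1), (4.9, 5.0, 5.1))
```

The first split separates the two obvious clusters; a second split would lower the error only slightly, which the penalty outweighs, so the loop stops after accepting one split.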
Idea for starting the OpenFF wrapper "top down": write a function that calls ff_optimize, discover pain points, then write more functions to alleviate those pain points.
JM – Would want guidance from LW on example inputs and the general types of inputs.
TG – There are inputs in the examples; if you want a new type of input, you'll need to write a parser for it yourself.
LW – Slightly different perspective/approach: it might be useful if I gave you a set of inputs, almost a science project that I'd like you to execute using BeSMARTS and smee; similar to, but not the same as, the t17 thing.
JM – That'd work, and I could use it to guide implementing a wrapper.
JM – Or I could implement an API specified by LW.
LW – I don't think I have the experience with BeSMARTS and smee to define that right now.
TG – The closest thing I have to an API is core/graph/configs.py. If you wanted to take a bottom-up approach to defining an API, this might be a good place to start.
TG – PR feedback: the current strategy with docstrings should be to leave the types documented, but not try to speculate too much about what functions do, and leave TODOs.
JM – It's hard to work with concepts like ChemicalSystem (which is analogous to our ForceField) without noting that in the docs. Also, the TODOs/questions are for you when reviewing; you can answer them or remove them.
TG – I'm time constrained right now, but would generally advise you to remove those.
… Future meetings will be mostly dedicated to TG + JM pairing, especially on PR review
Next steps
- TG + JM will meet next week to do a rapid, line-by-line review of the docs PR
- LW will send JM inputs for a small science project using BeSMARTS and smee (will put them up as a GH repo)
- JM will write an OpenFF wrapper that fits the needs of the science project above
