2025-07-08 FF Fitting Meeting

Participants

  • @Jennifer Clark

  • @Chapin Cavender

  • @Jeffrey Wagner

  • @Pavan Behara

  • @David Mobley

  • Bill Swope

  • @Lily Wang

Discussion topics

 

Recording: https://us06web.zoom.us/rec/share/jITr1x4IH506fxXwa2i7kZRXVBBpWo4zik-JlInGbuzuZrSGx7NsLOtKfTYg3ndk.VNyMZSBzeFjeX-Ij

Passcode: h@E8KZ1#

Notes

JW – IA, MT, and I will be working on infrastructure for having SMIRNOFF proteins in OpenFE calcs (maybe not public-facing, but some blessed infrastructure where we can submit these jobs for our internal benchmarking). We have two weeks of work planned on this for late July/early August

Sage 2.3.0 updates

  • PB – Slide 8 – What does ratio mean?

    • LW – Ratio of chemistries in molecules that have bad props relative to their prevalence in the general dataset

  • PB (10) – 2.3.0rc1 is vdW and valence retrained?

    • LW – Right, valence trained after vdW

  • BS (17) – Chlorides and bromides are places where Schrödinger would put a vsite

    • LW – Potentially, alkyl bromides improve in performance across several benchmarks.

  • CC (28) – Why not both equilibrate the slow ones for longer and loosen the restriction on …?

    • LW – We could do both, however we’re GPU-time limited at the moment. Let’s discuss this further at the end

  • DM (43) – If FF already disagrees with enthalpy of mixing, how would changing the experimental mixtures help?

    • LW – My hypothesis is that having two points at different concentrations with the same experimental value gives a poor signal-to-noise ratio, which is confounding the optimization.

    • DM – Is the experimental data usually regularly spaced? (Yes.) So maybe the automation that selects the points could pick the extrema

    • LW – Yes, this would require a little work in evaluator but wouldn’t be too hard.

  • JC (50) – Given that you’re seeing a tradeoff where fitting an outlier pulls lots of other points out, do you think another parameter split would help relieve that?

    • LW – Maybe, but the split would be really hard to encode in SMARTS since the outlier is so similar to the other points

  • BS (52) – When you change FF a little bit, do you use some info from the last iteration or do you re-equilibrate every time?

    • LW – Previously we didn’t reuse equilibrated boxes from previous iterations. But now we do. Doesn’t actually save much on runtime but the values are more converged.

  • CC (52) – Understand constraints from compute, if that’s the limit then that makes sense. I am in favor of doing more thorough training on primary amines.

    • LW – Yeah, would be in favor of bringing in some members of the validation set into the training set and using some of the slower-equilibrating amines. The added computational expense of the latter is probably worth it.

    • CC – So not saving time …

    • LW – Previous workflow … Now we do longer timescale sims for initial box equil and then save it for subsequent property calcs which can be quicker.

    • CC – So the new protocol is not enough for the primary amines?

    • LW – 10 or 20 ns was not enough time for the primary amines. But these are small enough that we could just let them run longer. One of my big concerns is that changes in parameters would require more equil time (?)

  • PB – Equil time is a force field issue, or intrinsic to those molecules?

    • LW – Good Q, I think it’s a starting condition issue. Packmol packs boxes loosely so it takes a long time for density to equilibrate, then enthalpy takes a while after that. But the lack of equil is more evident in densities.

    • LW – Could equilibrate with a different FF. One other option is starting from previous iteration for the next iteration. But evaluator changes take a lot longer than I plan every time, so that’s a bottleneck with our resources.

    • JW – Team cross training might help get more hands/eyes on evaluator performance issues

  • DM – (hydroxyl H something something)

    • CC – GAFF has a 0 radius H, is that so much of a problem?

    • DM – Yes, I showed that certain conditions with benzoic acid dimer can lead to atom overlap/energy explosion. This appears sometimes in simulation. There are some host-guest systems that you just can’t simulate with 0 radius hydroxyls.

    • LW – We did see this going to zero in our fitting, and had to set it back. But if we can train the rest of the FF around this not being allowed to be 0 then we should be ok.

  • LW – Ok, so we’ll loosen restrictions on training set membership and pull some validation set mixtures into training. Regarding the compute bottleneck: I’ll do benchmarks on vdW and valence fits, so we’ll know a bit about the amine problem. If those simple changes don’t work, then we’ll move on to bigger projects like … and co-training.
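For reference, a minimal sketch of the slide 8 "ratio" as LW described it in answer to PB: the fraction of poorly performing molecules that contain a given chemistry, divided by the fraction of the general dataset that contains it. The function names, the per-molecule counting, and the example labels are all assumptions made for illustration; the actual analysis behind the slide may count differently.

```python
from collections import Counter


def chemistry_fractions(molecules):
    """Fraction of molecules containing each chemistry label.

    `molecules` is a list of sets of labels, one set per molecule
    (a hypothetical representation, not the format used on the slides).
    """
    counts = Counter(label for labels in molecules for label in set(labels))
    n = len(molecules)
    return {label: count / n for label, count in counts.items()}


def enrichment_ratios(flagged, dataset):
    """Ratio of each chemistry's prevalence in `flagged` to its prevalence in `dataset`.

    A ratio above 1 means the chemistry is over-represented among the
    flagged (poorly performing) molecules.
    """
    flagged_frac = chemistry_fractions(flagged)
    dataset_frac = chemistry_fractions(dataset)
    return {
        label: flagged_frac.get(label, 0.0) / frac
        for label, frac in dataset_frac.items()
    }


# Illustrative labels only; not data from the meeting.
dataset = [{"alcohol"}, {"alcohol", "amine"}, {"amine"}, {"ether"}]
flagged = [{"amine"}, {"alcohol", "amine"}]
ratios = enrichment_ratios(flagged, dataset)
```

With these made-up inputs, amines come out over-represented among the flagged molecules, alcohols appear at their baseline rate, and ethers do not appear at all.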

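DM's suggestion of having the point-selection automation pick the extrema could look roughly like the sketch below. It assumes each mixture's experimental enthalpy-of-mixing data is available as (mole fraction, value) pairs and reads "extrema" as the lowest and highest experimental values. This is plain Python, not Evaluator code; the function name and the numbers are hypothetical.

```python
def pick_extrema(points):
    """Return the points with the lowest and highest experimental value.

    `points` is a list of (mole_fraction, value) tuples for one mixture.
    If every point has the same value, a single point is returned.
    """
    if not points:
        return []
    lowest = min(points, key=lambda p: p[1])
    highest = max(points, key=lambda p: p[1])
    if lowest == highest:
        return [lowest]
    # Sort by mole fraction so the output order is stable.
    return sorted([lowest, highest])


# Made-up values for illustration only.
points = [(0.2, -0.31), (0.4, -0.52), (0.6, -0.50), (0.8, -0.28)]
selected = pick_extrema(points)
```

Selecting this way avoids keeping two concentrations with nearly identical experimental values, which is the situation LW hypothesized was confounding the optimization.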

 

 

Action items

Decisions