2025-08-12 FF Fitting Meeting

  1. @Lily Wang

  2. @Christopher Bayly

  3. @Michael Shirts

  4. Bill Swope

  5. @Chapin Cavender

  6. @David Mobley

  7. @Jennifer Clark

  8. Julia Rice

  9. @Julianne Hoeflich

  10. @Pavan Behara

  11. @Jeffrey Wagner

Discussion topics

Passcode: wUp7&pZ7

 

Notes

LW – 2.3.0 refit is ongoing, about 1/3rd done. Expect it to be ready after I return from leave in two weeks

Minimal torsion fits – MS

  • MS – Alkane work showed some issues with torsions. We added extra QC data but torsions still weren’t matching QM.

  • JH – Torsions did move, went to expected profile (we had been underestimating gauche population). But what didn’t work was when I took the alkane QC data we’d generated and threw it in with the regular training pipeline with Sage 2.2 data. That got the population wrong. But when I only trained to the new alkane data (no Sage 2.2 data) it got better.

  • MS – https://docs.google.com/document/d/1gYrOclF1Ffq6UWj27eVeEeA6wgyK_W7gm31XinbKqh4/edit?usp=sharing

  • LW – smee vs. FB - DC tried fits starting with k=0 - Do you think this would have as much of an effect with smee?

    • MS – This hypothesis says that if FB is getting stuck in minima and smee is not… then …

    • LW –

    • MS – Did we run on that dataset? Does smee do better than sage 2.3?

    • DM – About comparable.

    • CBy – What makes you think we’re stuck in a local minimum? Are you confident that this is a multiple-minimum problem?

      • MS: not entirely, when we look, we see some alkanes have really large energies because CCCC torsions are mixed up with rings. Maybe what I’m saying is it’s the wrong minimum. This strategy seems to get our alkanes wrong and that’s a problem.

      • MS – Propose starting from parameters where alkanes are “right”, then throw in the whole dataset, and see if it moves back to the “bad” region we’re seeing

      • CBy: maybe explanation is single parameter (CCCC) needs to be split, e.g. in-ring, not in-ring. Other way to approach this is to determine whether there’s a symptom that indicates whether it needs to be split

      • LW – Finlay ran an experiment that looked into this. He split torsions into 1000s of individual parameters. Didn’t see much improvement. Possible that he didn’t split enough (only looked at atom bond primitives, not eg ring membership)

      • CC – Similar conclusion from TG’s work on besmarts - It’s highly nontrivial to figure out what right splits are to improve performance

      • DM – Since the beginning there’s been interest in starting from very simple molecules and building up. PB led one effort at it, but we haven’t ever really gotten over the hurdle of making the minimal molecule datasets.

      • CBy – One thing I used for BCC work (to decide when to split BCC params) is to look at whether groups of molecules drove the parameter gradients in opposite directions. This indicated that a split was favorable. TG says he couldn’t get this to work early on in his PhD.

      • MS: in theory that data is in Evaluator.

    • MS – Kinda two questions: One in general about training, another specifically about how we’ll deal with observed problems with alkane params. CC also saw problems with alkane params.

      • CC – Right

    • MS – … This feels like multiple minima problem. JH could rerun an opt using an optimized alkane FF as a starting point using the whole Sage 2.2 dataset and see if it moves alkane params back to being bad.

    • JR – Energy difference between alkane minima? Are we in the range of 0.6 kcal/mol (kT)? Quantum won’t have that. Properties of alkanes will have temperature dependence. Also there’s polymer persistence length, which should be pretty small for alkanes (6 Å-ish), so there are some things that you’ll expect to see there.

      • MS: we’ve also looked at alkane properties as a function of chain length. We have tried a number of things; in some cases getting closer to QM sent things in the wrong direction. It looks like we’ll need to co-tune the vdW.

      • MS – Re kT, when dealing with bulk, you’ll see bigger effects.

      • JR – Means it’s important to look at dispersion component.

      • MS – JH is looking into this, with MT and LW’s help.

  • MS: Could also look into how some of the smee-fit FFs do on these alkane profiles.

    • JR: QM may not be the right thing to look into because it doesn’t have temperature.

    • MS – Yeah, condensed phase data includes T though, and we use that.

  • MS – So next steps could be

    • Run some benchmarking on lipid torsiondrives

      • MS – … could be this isn’t the right experiment to do… but good to see if it works on QM torsiondrives… If we have a general procedure that has trouble on common functionality, that’s important… or procedure needs to be f

      • LW – smee-fit FFs were fit to different level of theory (SPICE2)

      • JR – I’d fit to quantum… Getting relative energies right, cis-trans

      • (general) SPICE dataset is QM

      • PB – I think ωB97M-D3(BJ)/def2-TZVPPD

    • CBy – With alkanes and larger training set, is steric congestion an issue? I ask because, with alkanes and their torsions, you start to get a lot of 1-5 and 1-6 energies, and that is steep with the r^12 component. And you’ll see that more with gauche confs… I wonder if there are other things that interact with the fit and… Should we split out some torsions/data for trans vs. gauche? Wondering how to test hypothesis. This is a general problem with bespoke fitting, where the results end up not being very generalizable.

    • MS – FFs should be weighted toward things that come up a lot

    • CC – There are other FFs that get alkanes right without needing to scale 1-5s. Question is why aren’t we getting that solution with our optimizer?

    • CBy – Will MS’s hypothesis distinguish between a multiple-minimum problem vs. a prior-restraint problem?

      • MS + JH – JH weakened prior by factor of ~100 and it didn’t help.

  • BS: if you split this into two different CCCC torsions? What patterns would you use?

    • CBy: would look at gradients to split population in half

  • JW: it sounds like we do well on pure alkane data if we train only to it.

    • (mixed discussion) CBy: you could also try titrating data in to evaluate multiple minima.

    • (mixed discussion on experiment hypotheses)

    • (in chat): As a sanity check, what if we added a torsion even more specific to alkanes than "[#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4]"  - Like expecting an additional [#6X4] on either side - and then did a retraining to (sage 2.2 dataset + alkane dataset)? This would help determine whether something in the sage training set is "contaminating" the parameter that we think should only be hitting pure alkanes

    • MS: I would guess that cyclohexane is really messing things up.

    • DM: you could ensure you’re just capturing non-ring bonds, either by excluding it from the training data, or by specifying the SMIRKS.

    • MS – We can do that expt after determining whether the opt is getting “stuck”. Possible that torsiondrives have really high barriers that are messing things up.

  • JR: wouldn't the angle term be highly coupled to the torsion term for the rings?

    • MS: yes

  • CBy: for brainstorming new torsion types, you could also try regulating substitution around the central bond carbons. Demand they be [#6X4H2], and could regulate substituents at the end to ensure there aren’t weird sterics happening.

    • JW – Agree

  • MS (shows alkane torsion energy profiles, see recording ~45 mins)

    • CC: ForceBalance attenuates torsion weights above a particular barrier

  • JH – (shows torsion energy profiles, see recording ~47 mins) – FF fit to alkanes improves energies

    • CBy – Mols in sage 2.2 training set look quite congested.

    • MS – So this may suggest we should start fits without sterically congested mols.

    • … (see recording for remainder of meeting)
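The gradient-based split test CBy described (checking whether groups of molecules drive a parameter's gradient in opposite directions) could be sketched roughly as below. This is only an illustration under stated assumptions: the per-molecule gradient values, the molecule names, and the `opposition_score` helper are all hypothetical, and nothing here reflects how ForceBalance, smee, or Evaluator actually expose gradients.

```python
def split_by_gradient_sign(gradients):
    """Partition molecules by the sign of their objective gradient
    with respect to one shared parameter (e.g. a CCCC torsion k)."""
    pushes_up = {mol: g for mol, g in gradients.items() if g > 0}
    pushes_down = {mol: g for mol, g in gradients.items() if g < 0}
    return pushes_up, pushes_down


def opposition_score(gradients):
    """Return a value in [0, 1].

    0 means every molecule pushes the parameter the same way;
    1 means the positive and negative pushes cancel exactly,
    which would hint that the parameter may be worth splitting.
    """
    pos = sum(g for g in gradients.values() if g > 0)
    neg = -sum(g for g in gradients.values() if g < 0)
    total = pos + neg
    if total == 0:
        return 0.0
    return 2 * min(pos, neg) / total


# Hypothetical per-molecule gradients for a single CCCC torsion parameter.
example_gradients = {
    "butane": -0.8,
    "pentane": -0.6,
    "cyclohexane": 0.9,
    "methylcyclohexane": 0.5,
}

up, down = split_by_gradient_sign(example_gradients)
score = opposition_score(example_gradients)
print(sorted(up), sorted(down), round(score, 3))
```

In this made-up example the acyclic and ring molecules fall into opposite groups, which is the kind of symptom that would motivate an in-ring vs. not-in-ring split. Real gradients would need to come from the fitting software.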

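To put JR's kT remark in numbers, here is a minimal sketch of how the gauche population of a single CCCC torsion responds to the trans-gauche energy difference. It assumes a simple three-state model (trans, gauche+, gauche-) with Boltzmann weights and no other entropy or solvent effects. The energy differences used are hypothetical, not values from the fits discussed above.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def gauche_population(delta_e, temperature=298.15):
    """Fraction of gauche conformers in a three-state model.

    delta_e is E(gauche) - E(trans) in kcal/mol; the factor of 2
    accounts for the two equivalent gauche wells (g+ and g-).
    """
    kt = R_KCAL * temperature  # about 0.59 kcal/mol at room temperature
    weight = 2.0 * math.exp(-delta_e / kt)
    return weight / (1.0 + weight)


# Hypothetical energy differences (kcal/mol), for illustration only.
for delta_e in (0.3, 0.6, 0.9):
    print(f"dE = {delta_e:.1f} kcal/mol -> gauche fraction {gauche_population(delta_e):.2f}")
```

Under this simplified model, an error of a few tenths of a kcal/mol in the gauche-trans difference shifts the gauche fraction by roughly ten percentage points, which is consistent with the concern that sub-kT errors in torsion profiles can matter for populations.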

 

 

 

 

Action items

Decisions