2023-10-26 Force Field Release Meeting notes

 Date

Oct 26, 2023

 Participants

  • @Lily Wang

  • @Michael Shirts

  • Bill Swope

  • @Brent Westbrook

  • @Chapin Cavender

  • @David Mobley

  • @Alexandra McIsaac

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Trevor Gokey

  • @Michael Gilson

  • @Anika Friedman

  • @Christopher Bayly

  • @Willa Wang

Recording: https://docs.google.com/presentation/d/1nSz_Um-mFuntYz4NKAMleydzxMXHQ9jEr9lQuCRyYTM/edit?usp=drive_link

 Discussion topics

Item

Notes

Short GNN update

 

  • DM – For acetonitrile, since that’s the bad case we’d looked at first, is it the case that nitriles are all bad?

    • LW – They didn’t pop up as an outlier when I was previously looking at bad functional groups in molecules.

    • MS – Should we issue a warning when doing very small molecules? I recently had horrible results for chloroform.

  • CBy – For the outliers in the ESP RMSE slide, do we know whether they come from polar or nonpolar parts? It’s particularly important to get the ESPs around the polar parts right. I expect that SFEs will look fine if the errors are coming from the non-maximal parts of the surface. We could see this by making a scatter plot of ESP points.

  • BS – Do you know how the espaloma charges were fitted? I know they could have longer distance charge transfer. How were those parameters determined?

    • LW – Not totally sure for the newest version. For the old version they were trained to AM1BCC point charges.

    • CBy – In a discussion at CUP 4 or 5 years ago, the idea was to train to predict hardness+electronegativity and then do an equilibration, so they’d be like AM1BCC charges

  • MG – If you look at the outliers, they’re less than .01 charge - Are these significant?

    • LW – Mind that these are ESP points, and units are atomic units.

  • CBy – Again, the maximum and minimum points on the surface are the most important ones to match.

    • MS – So is this like, “if it’s a small magnitude ESP, it’s less important to match. But if it’s high magnitude, it’s important to match”? Can it be quantified?

    • CBy – Generally yes. But even the absolute magnitude can be misleading, it’s really the maximum/minimum points on the surface for a particular molecule.

    • MG – Evidence for this?

    • CBy – Some work I’d done in the Kollman group - Changing ESP 2% in high magnitude regions has a big effect on calculated properties.

    • MS – Right, major terms in free energy scale as charge^2, so that seems plausible.

  • DM – So plan to include very small mols in training data is a good idea?

    • (General) – Yes
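
CBy’s suggestion above, comparing the ESP error over the whole surface with the error at the most extreme points of each molecule, could be sketched roughly as follows. This is a minimal illustration and not code shown at the meeting: the function names, the `fraction` cutoff, and the toy numbers are all assumptions, and ESP values are taken to be in atomic units, as LW noted.

```python
import math


def rmse(ref, pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def extreme_point_indices(ref, fraction=0.1):
    """Indices of the most positive and most negative reference ESP points.

    `fraction` (hypothetical cutoff) is the share of points taken from
    each end of the sorted reference ESP values; at least one per end.
    """
    n = max(1, int(len(ref) * fraction))
    order = sorted(range(len(ref)), key=lambda i: ref[i])
    return sorted(set(order[:n] + order[-n:]))


def esp_error_summary(ref, pred, fraction=0.1):
    """Return (RMSE over all points, RMSE over the extreme points only)."""
    idx = extreme_point_indices(ref, fraction)
    return rmse(ref, pred), rmse([ref[i] for i in idx], [pred[i] for i in idx])


# Toy data (made up): the error is concentrated at the most negative point.
ref_esp = [-0.08, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.06]
pred_esp = [-0.06, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.06]
all_rmse, extreme_rmse = esp_error_summary(ref_esp, pred_esp)
print(f"all points: {all_rmse:.4f}  extremes: {extreme_rmse:.4f}")
```

A whole-surface RMSE that looks small can hide a much larger error at the maxima/minima, which is the case CBy flagged as mattering most for calculated properties.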

Short Vsites update

  • DM – This looks good, glad to see the training coming out differently here.

  • JW – The Cl sigma hole vsite distance (1.4 A) still seems kinda big

    • MS + CBy – That does seem big

    • BS – Could look at MacKerell vsite numbers

    • DM – DCole had pretty hefty vsites - around 1 e- charge

    • BS – Since the vsites don’t have LJ params, this could be dangerous/unstable.

      • CBy – Agree

    • PB – MacKerell paper has a sigma hole vsite at 1.64A https://www.sciencedirect.com/science/article/abs/pii/S0968089616304576

    • (General) – Weird, that seems really big

    • DM – Previously I’d modeled sigma holes by putting an opposite charge on the inside of the Cl (on the Cl-C bond). I’d done another experiment with 0-LJ hydrogens where I pulled two acetic acids together very close in OpenMM and checked when they’d become unstable…

    • CBy – …

    • DM – So, try fitting with “negative” distance?

    • MG – If we’re confident that we don’t want the charge so far out, we could regularize to prevent that.

  • PB – IIRC, you had the best results with chlorine?

    • LW – Right, this refit was the same as the original fit, since the Cl did so well that we didn’t need to change it. Though it’s possible that this is a flat minimum and we can get similar performance for a number of values.

  • CBy – Were any of these systems involving water?

    • LW – No, but did include carbonyl.

  • LW – So I’ll rerun benchmarks on pyridine vsites and try different starting distance values for Cl vsites
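
The “negative distance” idea DM raised and the regularization MG suggested can be illustrated with a small sketch. It assumes the virtual site sits on the line through the C and Cl atoms, with its distance measured from Cl and positive values pointing away from C. That sign convention, the function names, the toy geometry, and the penalty weight are assumptions for illustration, not the convention of any particular toolkit.

```python
import math


def bond_vsite_position(parent, neighbor, distance):
    """Place a virtual site on the line through `neighbor` and `parent`.

    Assumed convention: positive `distance` puts the site beyond the
    parent atom (away from the neighbor); negative `distance` puts it
    inside the bond, between the two atoms.
    """
    bond = [p - n for p, n in zip(parent, neighbor)]
    length = math.sqrt(sum(c * c for c in bond))
    if length == 0.0:
        raise ValueError("parent and neighbor atoms coincide")
    return [p + distance * c / length for p, c in zip(parent, bond)]


def distance_penalty(distance, target=0.0, weight=1.0):
    """Harmonic restraint that discourages sites far from `target`.

    `weight` is a hypothetical regularization strength.
    """
    return weight * (distance - target) ** 2


# Toy geometry in angstroms: C at the origin, Cl 1.77 A along +x.
carbon = [0.0, 0.0, 0.0]
chlorine = [1.77, 0.0, 0.0]

outside = bond_vsite_position(chlorine, carbon, 1.4)   # beyond Cl
inside = bond_vsite_position(chlorine, carbon, -0.5)   # on the C-Cl bond
print(outside, inside)
print(distance_penalty(1.4), distance_penalty(-0.5))
```

A penalty of this kind, added to the fitting objective, would push the optimizer away from large distances unless the data strongly favor them. If the minimum is as flat as LW suspects, even a weak restraint might be enough.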

XFF discussion (cont)

  • PB – smee repo is here: https://github.com/SimonBoothroyd/smee/blob/main/examples/parameter-gradients.ipynb

  • DM – Are there things we can do to make investigating these ideas easier? Should I reach out to XFF authors for datasets?

    • LW – That’d be great.

  • LW – Would be good to have way more parameters - TG’s project is promising on this front. But we’re still short of XFF’s scale. For example, is there interest in massively expanding the number of parameters?

    • MS – I’d expect it to be asymptotic - We probably don’t need huge numbers of parameters, but something like a few percent more would almost certainly be an improvement. Do TG/BW have suggestions for parameters to split?

  • CBy – What about split-stage fitting?

    • LW – Right now everything’s done together - We fit all valence terms to minimum energy geometries.

    • CBy – Could be large benefit in areas eg. non-ring-specific parameters including strained rings in training, hypervalent sulfur sharing terms with normal sulfur, etc.

  • JW – Maybe we could fit on larger scale if we didn’t need to do minimizations in torsion profile fits - Those seem to be the most expensive things.

    • PB – LPW did some fitting straight to single points in ff15. Did well enough for proteins but I’m still experimenting with small mols.

    • LW – Would this be faster?

    • PB – Yes, this should make it much faster. Currently torsionprofile fits take ~200 sec

  • CBy – Did XFF people look at other kinds of datasets?

    • DM – They fragmented a bunch of ChEMBL compounds. They tried to keep all the unique fragments.

    • CBy – Are we looking at totally different sorts of targets?

    • LW – Pretty much the same stuff, just larger scale.

    • PB – They did include a “combined conformer” thing where, if you have 3 torsions, they add data where they constrain two dihedrals and include a 2-ish D torsion

    • LW – Yeah, they had kinda “1.5 D” torsiondrives, where multiple torsions were scanned sparsely.

    • CBy – Do we think they got something from that that we’re missing?

    • LW – Unsure.

    • CBy – They have a lot more data and a lot more parameters, but my feeling is that they may be overfitting. I really think we can do better with far fewer parameters.

    • MS – Looking forward to BW’s comparison of Sage to Espaloma.

  • DM – Happy to provide additional compute time or undergrads if those are of interest.

  • LW – So possibilities for future work:

    • Asking for XtalPi’s dataset

    • Looking at smee and seeing whether orders of magnitude more data helps

    • “Do more parameters help?” - BW and LM are working in this direction

    • “Split stage training” - Should we action this?

      • DM – Depends on whether we have personnel-time available.

      • LW – Should be fairly straightforward.
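
The speed-up PB describes for fitting torsions straight to single points follows from the functional form: at fixed geometries a cosine-series torsion energy is linear in the force constants, so the fit reduces to linear least squares with no minimizations. The sketch below shows this for one torsion with hypothetical periodicities 1 to 3 and zero phases. The reference energies are synthetic and the helper names are made up; this is not the ForceBalance or smee implementation.

```python
import math


def torsion_basis(phi, periodicities):
    """Values of 1 + cos(n * phi) for each periodicity n (zero phase)."""
    return [1.0 + math.cos(n * phi) for n in periodicities]


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; basis functions are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_force_constants(phis, energies, periodicities=(1, 2, 3)):
    """Least-squares force constants from single-point energies (normal equations)."""
    rows = [torsion_basis(phi, periodicities) for phi in phis]
    m = len(periodicities)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    aty = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(m)]
    return solve_linear(ata, aty)


# Synthetic reference: energies generated from known (made-up) force constants
# on a 15-degree grid, so the fit should recover them.
true_k = [0.5, 1.2, 0.3]
grid = [math.radians(a) for a in range(-180, 180, 15)]
ref = [sum(k * b for k, b in zip(true_k, torsion_basis(p, (1, 2, 3)))) for p in grid]
fitted_k = fit_force_constants(grid, ref)
print([round(k, 6) for k in fitted_k])
```

In a real fit the reference would presumably be QM energies with the remaining MM terms subtracted at the same geometries. The trade-off is that the geometries are never relaxed with the trial parameters, which is what PB is still testing for small molecules.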

 Action items

 Decisions