2023-10-12 Force Field Release Meeting notes

 Date

Oct 12, 2023

 Participants

  • @Pavan Behara

  • @Chapin Cavender

  • @John Chodera

  • @Anika Friedman

  • @Michael Gilson

  • @Trevor Gokey

  • @Alexandra McIsaac

  • @David Mobley

  • @Michael Shirts

  • @Matt Thompson

  • @Jeffrey Wagner

  • @Lily Wang

  • @Brent Westbrook

Slides:

Recording: https://drive.google.com/file/d/1xAvRUaH_XMeVp9muKaZq9Srl9qyEBi1k/view?usp=drive_link

 Discussion topics

Item

Presenter

Notes

GNN update

LW

LW will link slides here

  • JW - I’m worried that log(RMSD) < -1 or -2 isn’t informative, so it’s hard to tell if these differences between FFs are meaningful

  • JW - Is the conclusion that the valence refit starting from NAGL is similar to the valence refit starting from Sage?

  • LW - Yes. I’m also trying vanilla NAGL with no refit, but that benchmark is still running

  • MS – Would be interesting to run a bunch of small molecules and quantify the difference from OE charges as a function of molecular weight.

    • LW – We pulled from a number of databases when we made the training/test data. I profiled the differences as a function of molecular weight inside and outside the training set, and they are different outside the size range of the training set.

    • MS – I remember looking at chloroform and seeing that Sage 2.0 got the density quite wrong. I think chloroform and tetrachloromethane were bad. Not sure if due to electrostatics.

    • JC - Does this matter? RMSE to Sage/AM1-BCC is tiny.

    • LW - Yes, the mean difference is small, but we’re worried that some chemistries like cyanide might be particularly bad

    • DM – My position is “is this error sufficiently large to justify retraining the whole model?”

    • JC - Is the geometry of these bad dimers thermodynamically relevant for a FF?

    • LW - I don’t know. I think these are high energy conformers.

    • JC - It might be better to sample conformers from an MD simulation of the dimers. Could also check reaction field energies with ZAP.

    • JW - This seems like a bad problem to me. The NAGL vs OpenEye charges are very different.

    • LW - NAGL does better on nitriles in larger molecules

    • MS - Small molecules might be too small to have a meaningful graph for the message passing algorithm

    • CC – As a workaround, could we use a size cutoff and run explicit AM1-BCC on small mols?

    • LW - Yes, but that gives up on the goal of having a self-consistent solution to replace OpenEye/AmberTools

    • MS – Would like to see how much this matters for solvation free energies

    • LW – Yeah, would like to do this with solvents other than water.

    • JW - I still think it’s useful to retrain with small molecules. How much human time would this take?

    • LW - Probably an hour. Mostly computer time, probably a week to train and more weeks to benchmark.

    • MG - If MS’s concern that small molecules don’t have a big enough graph is true, retraining on small molecules may not help much

    • MS – Wonder why it has trouble with small molecules when big molecules are OK. Were there small mols in the training set?

    • LW – We’d filtered them out, following the method of Riniker. We wanted to make sure we got enough chemical complexity to make the model useful.

    • MG - Does the Riniker method work for small molecules?

    • LW - I don’t know. We could ask her.

    • MT - Are there any GCNN methods that can handle small molecules up to biopolymers?

    • MS – Not sure. Have we tried espaloma-charge?

    • JC – Make sure to use espaloma-0.3

    • PB – How does the comparison to interaction energies look?

    • LW – Error is high; there are factors in the QM that make the dimer geometries very high energy in MM

  • LW – Action items:

    • Give ZAP a try

    • See how espaloma-0.3 performs on small molecules (see the charge-comparison sketch below)

    • See how the Riniker method performs on small molecules

    • Solvation free energy benchmarks

    • Get a dataset for training on small molecules

  • JW - I don’t know why we need to check these things to decide to retrain NAGL on small molecules, which seems like less work.

    • LW – Benchmarking the other methods is less work than retraining NAGL with small mols in the training set.

  • CC – If the riniker method and espaloma perform poorly on small mols, what’s the plan then?

    • LW – If neither of them performs well, then we have to ask whether small mols aren’t well-suited for graph methods.
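
A minimal sketch of the small-molecule charge comparison from the action items above (not something run during the meeting). It assumes openff-toolkit (>= 0.11, pint-based charges), openff-nagl, and espaloma_charge are installed; the NAGL model filename is a placeholder rather than a released artifact name, and the compute_property return shape is an assumption.

```python
# Hedged sketch: compare per-atom partial charges from AM1-BCC, NAGL, and
# espaloma-0.3 on a few small molecules flagged in the discussion above.
import numpy as np
from openff.toolkit import Molecule
from openff.nagl import GNNModel
from espaloma_charge.openff_wrapper import EspalomaChargeToolkitWrapper

SMALL_MOLECULES = {
    "acetonitrile": "CC#N",
    "chloroform": "ClC(Cl)Cl",
    "tetrachloromethane": "ClC(Cl)(Cl)Cl",
}

nagl_model = GNNModel.load("openff-gnn-am1bcc.pt")  # placeholder model file name
espaloma_registry = EspalomaChargeToolkitWrapper()

for name, smiles in SMALL_MOLECULES.items():
    molecule = Molecule.from_smiles(smiles)

    # Reference AM1-BCC charges (OpenEye or AmberTools, whichever is installed).
    molecule.assign_partial_charges("am1bcc")
    am1bcc = molecule.partial_charges.m_as("elementary_charge")

    # NAGL charges; assumed to come back as one value per atom, in atom order.
    nagl = np.asarray(nagl_model.compute_property(molecule))

    # espaloma-charge via its OpenFF toolkit wrapper.
    molecule.assign_partial_charges(
        "espaloma-am1bcc", toolkit_registry=espaloma_registry
    )
    espaloma = molecule.partial_charges.m_as("elementary_charge")

    print(
        f"{name:>20s} ({molecule.n_atoms:2d} atoms): "
        f"max|NAGL - AM1-BCC| = {np.abs(nagl - am1bcc).max():.3f} e, "
        f"max|espaloma - AM1-BCC| = {np.abs(espaloma - am1bcc).max():.3f} e"
    )
```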

VSite update

LW

LW will link slides here

  • MG – “Training data” = “vsites were trained on this”?

    • LW – Vsites were trained to ESPs, then we refit vdW parameters to the mols with vsites.

    • MG – So the retraining with a better potential that includes virtual sites makes it harder to fit the LJ parameters.

    • LW – The left two plots don’t have vsites, but the rightmost plot has vsites and ?underwent retraining?

  • JW - Slide “Directions”, what are the colors on these plots?

    • LW - Identity of the dimer

    • LW – Takeaway is that all the bad points have non-aromatic nitrogens

  • MS – So, by introducing vsites, we took errors that were quite small, and made them very big. And what we’re seeing in all of these is that the mixture energies become a lot more favorable compared to pure compounds.

  • MG – It sounds like what may be happening is that, if you have charges too close to the LJ envelope, you get interactions that are too strong.

  • DM – But when you do this, …

  • MG – What are the magnitudes of the vsites and how far are they from the atom centers?

    • LW – ~0.3 Å.

    • MG – Hm, that’s still pretty close to the parent.

    • LW – magnitude >1 electron charge

    • (General) – That’s quite large

    • DM – Where did the prior for that come from?

      • LW – DCole’s work.

    • DM – We want something like BCCs that moves a small fraction of the charge from the parent atom to the virtual site

    • MS – The mixing favorability that we see would be because of the large dipole that forms.

    • MG – When you add the vsite, does the fit to QM ESP improve significantly?

      • LW – Yes, it improves a lot.

    • DM – I wonder if there’s a distant shell in the QM potential where we’d catch how incorrect this is

      • MS – Yeah, that could catch bad dipole energies.

    • MG – Could be good to check against the dipole moment/include it in fitting to keep things in check (a point-charge dipole check is sketched below).

    • MG – Does DCole use dipole moment constraints?

  • DM - Are all LJ types being fit?

    • LW - All LJ types that show up in these molecules

  • DM - Slide “H_mix benchmarks: Cl”. This is great. If we can fit to dimer energies and do well on physical properties, that’s better than fitting to physical properties.

  • DM - Since Br is worse than Cl, is the charge magnitude of the virtual site larger for Br?

    • LW – Br and Cl vsite magnitudes are small - 0.06 and 0.07 respectively

  • LW – Some interesting ideas in CGenFF for improving vsite fitting that we could look at.

  • (Future work slide)

    • DM – This largely looks good. I’d also add the checks on vsite charge magnitude/inclusion of dimer data to keep things on track. It may be something that doesn’t appear in single mol/dimer QM, but it would appear in a solvent sim.
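
A minimal point-charge version of the dipole check suggested above. The positions, charges, and particle ordering are hypothetical inputs; the only fixed quantity is the e·angstrom-to-Debye conversion factor.

```python
# Hedged sketch: |dipole| in Debye from point charges (atoms plus vsites),
# for comparison against the QM dipole when vsite charges look large.
# Inputs are hypothetical: positions in angstrom, charges in elementary charge,
# with atoms and vsite particles listed in the same order.
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*angstrom expressed in Debye

def point_charge_dipole(positions_angstrom: np.ndarray, charges_e: np.ndarray) -> float:
    """Return |mu| in Debye. Origin-independent only if the net charge is zero."""
    mu = (charges_e[:, None] * positions_angstrom).sum(axis=0)  # e*angstrom
    return float(np.linalg.norm(mu) * E_ANGSTROM_TO_DEBYE)

# Made-up example: a two-atom fragment with a vsite 0.3 A beyond the second atom
# carrying a charge of magnitude > 1 e, roughly the situation discussed above.
positions = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.5, 0.0, 0.0]])
charges = np.array([+0.9, +0.2, -1.1])  # net neutral
print(f"|mu| = {point_charge_dipole(positions, charges):.2f} D")
```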


XFF preprint

LW

LW will link slides here

  • JW - Are they using GAFF with RESP, i.e. the “correct” way?

    • LW - Yes for Figures 1 and 2 but AM1-BCC for Figure 3 since the latter was pulled from QCPortal

  • JW - Surprised how well XtalPi did given that they didn’t change nonbonded parameters from GAFF

  • JW - Can we run their BFE benchmark with our force fields?

    • (General) - We can ask and should pursue this

  • DM – I wonder how much of their improvement came from their number of parameters vs the amount of their training data.

    • LW – They’re using about 100x the training data we are. And ours isn’t as systematic as fragmenting from ChEMBL.

    • JC – I think the current issue is the sparsity of data that we have, and the fact that our fitting to optimizations requires us to rerun MM optimizations. I think we could be using non-optimum geometries.

    • MS – This does beg the question of whether we should be considering having ~50% more parameters. Looking at the number of parameters vs. data, are we over/underdetermined in our current fitting?

      • LW – I think things are pretty well balanced for now.

      • JC – Could TG’s method be used to split every parameter into, say, 2 parameters with roughly equal population?

      • MS – One thing we were gonna do was to compare espaloma to some of these to see if we saw parameters with big distributions. Did that ever happen?

      • LW – BW’s been working on the comparison to espaloma parameter distribution.

      • JC – We did the experiment where we fitted an espaloma model to the Sage training set and showed that it did quite well (?)

      • MS – Espaloma-guided parameter splitting could be really useful.

      • BW – No conclusions yet, no clear bimodal distribution. There are cases where the espaloma value distribution isn’t even centered around the Sage values, which is interesting. But more work to do before I report back.

      • LW – Another thing BW’s been working on is splitting out some torsions to cover more chemistries.

      • PB – We have a gen3 dataset where Hyesu and Simon combinatorially combined functional groups and did torsion scans about central bonds. Also, for S and N, we can clearly bin chemistries that go into tetrahedral vs. trigonal planar. And we can look at functional group-wise parameter assignment, and use that to guide making things tetrahedral vs. planar.

      • LW – With the way some of the SMIRKS are currently specified, not all the parameters have enough data to get trained well, so in some directions we are actually short on data (a parameter-coverage counting sketch is included below).

    • MG – So broadly, would you say there’s nothing fundamentally new in the paper? Just more data and parameters?

      • LW – Their training is done somewhat differently. And we start with mod seminario whereas they end with it. And some difference in fitting torsions using energy vs. geometry

      • PB – They’re just using MSM derived values for bonds and angles, and…

    • DM – One thing I want to figure out is which specific experiments we want to do as a followup. Eg. I think BW’s work to make torsions better split would be valuable.

    • JC – We should make larger datasets comparable to what they did. Eg. Optimization trajectories.

      • LW – Agree, we should have larger training sets available.

      • PB – We’re stuck with ForceBalance fitting, which takes weeks for the protein FF and days for the small molecule FF.

      • JW – I think that we should have the compute power to make larger datasets once I’m done reviving QCSubmit. We should strongly consider a higher-throughput fitting tool (“forcebalance replacement”) for the next roadmap year.
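
A small sketch of the parameter-coverage counting touched on above, using the OpenFF Toolkit’s label_molecules call. The SMILES list is a stand-in for the real training set and openff-2.1.0.offxml is just an example force field.

```python
# Hedged sketch: count how often each proper-torsion parameter is exercised by a
# set of training molecules, to flag parameters that see little or no data.
from collections import Counter

from openff.toolkit import ForceField, Molecule

force_field = ForceField("openff-2.1.0.offxml")  # example force field
training_smiles = ["CCO", "c1ccccc1", "CC(=O)N", "CC#N"]  # placeholder training set

coverage = Counter()
for smiles in training_smiles:
    molecule = Molecule.from_smiles(smiles)
    # label_molecules returns, per molecule, a dict of handler tag -> {atom tuple: parameter}
    labels = force_field.label_molecules(molecule.to_topology())[0]
    for parameter in labels["ProperTorsions"].values():
        coverage[parameter.id] += 1

torsion_ids = [p.id for p in force_field.get_parameter_handler("ProperTorsions").parameters]
uncovered = [pid for pid in torsion_ids if coverage[pid] == 0]
print(f"{len(uncovered)} of {len(torsion_ids)} proper-torsion parameters unused by this set")
for pid, count in coverage.most_common(10):
    print(pid, count)
```

The same counting could be repeated per handler (Bonds, Angles, etc.) to put numbers on the over/under-determination question raised above.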



 Action items

 Decisions