2025-03-26 FF fitting meeting

Participants

  • Bill Swope

  • @Jennifer Clark

  • @Pavan Behara

  • @Chapin Cavender

  • @Trevor Gokey

  • Peter Eastman

  • @Jeffrey Wagner

  • @Lily Wang

  • Barbara Morales

 

Recording: Video Conferencing, Web Conferencing, Webinars, Screen Sharing

Passcode: #tN@1%11

Discussion topics

Item

Presenter

Notes

QM methods for FF data generation

Peter Eastman

  • PE – Following up from the meeting a few weeks ago, where I was told you were planning to move to a more accurate method eventually. Generally there is a tradeoff between accuracy and cost. The current OpenFF level of theory is fast but inaccurate. All the 3c methods give a nice tradeoff between accuracy and cost. ωB97X-3c would be quite accurate. All the 3c methods were designed by starting with an existing method, making it faster by shrinking the basis set, then adding corrections where the simplifications cost accuracy. The most accurate would be ωB97X-3c, which picks a small basis set, uses ECPs and empirical corrections, and gets good accuracy in the end.

  • JC – Where are these supported?

  • PE – Latest version of Psi4.

  • LW – PB, do you remember if your paper a few years ago looked at 3c methods?

    • PB – Psi4 didn’t have them implemented at the time. Only HF-3c.

  • BS – What are they trained against?

    • PE – They are all trained against an existing method, trying to make it faster without losing accuracy.

  • PB (in chat): ωB97X-3c: A composite range-separated hybrid DFT method with a molecule-optimized polarized valence double-ζ basis set

  • JC: so the -3c class seems good?

    • PE: good tradeoff between speed and accuracy. But you should run your own benchmarks

  • JC: do they involve transition metals?

    • PE: there’s a particular paper that benchmarks basis sets. I haven’t done much with transition metals, but my understanding is you generally want a bigger basis set than with organics.

    • JC: my review suggests the opposite with some specific functionals, which is interesting.

  • JC: presents a set of slides showing that B3LYP works surprisingly well in comparison to other methods

    • PE: you really want one method that you use for everything. You can’t compare energies between different levels, it’s hard to mix and match data. You want to look at both transition metals and organics.

  • PB – Is this in the context of singlepoints or opts…?

    • JC – Good Q - Most of these are comparing opt geos.

    • PB – The TM datasets you’re generating are singlepoints or opts?

    • JC – Probably a mix - right now have some of each. Initial opts were with xtb so we’ll be going through another round of geometry opt with a better method+basis

    • LW – Right, we’ll aim to do opts with a reasonably good level of theory, but then singlepoints with a very accurate level of theory.

  • LW – Overall somewhat surprised that the answer is still B3LYP

  • BS – Torsion scans are really important too. Focusing only on the lowest-energy structure might not make for good training data.

  • CC – Peter, re: you want one level of theory for everything, you could compare relative energies, no? E.g. if X and Y are conformers and A and B are QC methods, it is valid to compare E_A(Y) - E_A(X) with E_B(Y) - E_B(X).

    • PE – Maybe. It’s something you have to be careful about though.

    • JC: even grid spacing would alter ability to compare

    • CC – Agree that there will be some error there, but may be smaller than error from the approximations we’re already making. So we may be able to do different levels of theory for different classes of compounds.

    • JC – Yeah, for training purposes you could do one level of theory across the board, or you could do different levels and hope that things average out. But that would require validation to justify the choices. I wouldn’t want to apply different levels of theory to confs of the same mol, though.

    • PE: forces should be significantly more robust than energies, which have an arbitrary zero point. A force has an absolute meaning in a way that an energy doesn’t.

  • TG – It’d be interesting to take the review and see if there’s progress on long range corrections (like CAM method). I think that’s a reason omegab97d works well.

  • TG – Also, B3LYP is a favorite because it’s fast and stable (nearly always converges). So if we change methods, we have to consider cases where another method might not converge for a mol that B3LYP does converge for.

    • JC – Good point

  • CC – If we’re going to spend more compute time on QM, we should consider doing implicit solvent. That may have a bigger effect than better basis.

  • LW – Good point, we had a start on this for NAGL2 dataset, but had some trouble with dataset.
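
A minimal sketch of the point CC and PE discussed above, using made-up numbers and hypothetical helper names (`relative_energies`, `force`). It assumes two methods A and B whose absolute energies differ by a large offset, and shows that conformer energy differences can still be compared, and that a force (the negative gradient) is unchanged by a constant energy offset. It says nothing about how large the residual disagreement is for real methods; that would need benchmarking.

```python
# Hypothetical conformer energies in kcal/mol; the values are invented for illustration.
energies_a = {"X": -100.0, "Y": -98.5, "Z": -97.0}    # method A
energies_b = {"X": -250.3, "Y": -248.6, "Z": -247.5}  # method B, different zero point


def relative_energies(energies, reference):
    """Return E(conf) - E(reference) for every conformer."""
    ref = energies[reference]
    return {name: e - ref for name, e in energies.items()}


rel_a = relative_energies(energies_a, "X")
rel_b = relative_energies(energies_b, "X")

# Disagreement between methods on relative energies, E_B(Y) - E_B(X) vs E_A(Y) - E_A(X).
disagreement = {name: rel_b[name] - rel_a[name] for name in rel_a}
max_abs_disagreement = max(abs(d) for d in disagreement.values())


def energy(x, offset=0.0):
    """Toy 1D harmonic potential with an arbitrary constant offset."""
    return 0.5 * 2.0 * (x - 1.0) ** 2 + offset


def force(energy_fn, x, h=1e-5):
    """Central finite-difference force, F = -dE/dx."""
    return -(energy_fn(x + h) - energy_fn(x - h)) / (2.0 * h)


f_plain = force(lambda x: energy(x), 1.5)
f_shifted = force(lambda x: energy(x, offset=123.4), 1.5)

print("max |ddE| between methods:", round(max_abs_disagreement, 3))
print("force without / with offset:", round(f_plain, 6), round(f_shifted, 6))
```

The offset of roughly 150 kcal/mol between the two invented methods drops out of both the relative energies and the forces; what remains is the per-conformer disagreement, which is the quantity that would need validation.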

vdW update

TG

  • JW: what are the odds that the next split is triple-bonded carbons?

    • … (eventually): Sage 2.0 already has a 6X2 type IIRC

  • LW: how much do the parameters change in the first initial fit?

    • TG – They don’t change much but some get set to minimum values

    • LW – Somewhat interested in ones that weren’t fit in sage, like S and F.

    • TG – There aren’t many halogens in the training set, so they may need more representation for the objective contribution to drive a change.

    • TG – Also seeing oscillatory behavior in subsequent steps where vdW params all go up in one epoch, and down in the next.

  • JC – If it is oscillatory, could that be a metric that tells us that splitting won’t help?

    • TG – I think the correspondence is that it’ll oscillate to converge on some value, but not necessarily that further splitting doesn’t have value.
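
One way the oscillation TG describes could be quantified, as a rough sketch only. The function names (`sign_flip_fraction`, `is_damped`) and the parameter trajectories are hypothetical, not taken from the fitting code. The idea is to count how often a parameter reverses direction between epochs and to check whether the step size is shrinking, which would be consistent with oscillating toward a converged value.

```python
def sign_flip_fraction(values):
    """Fraction of consecutive epoch-to-epoch steps that reverse direction."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    deltas = [d for d in deltas if d != 0.0]  # ignore epochs with no change
    if len(deltas) < 2:
        return 0.0
    flips = sum(1 for d0, d1 in zip(deltas, deltas[1:]) if d0 * d1 < 0)
    return flips / (len(deltas) - 1)


def is_damped(values):
    """True if the last step is smaller in magnitude than the first step."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return len(steps) >= 2 and steps[-1] < steps[0]


# Invented per-epoch values of a single vdW parameter.
oscillating = [0.10, 0.14, 0.11, 0.13, 0.115, 0.125]
monotone = [0.10, 0.11, 0.115, 0.118, 0.119]

print(sign_flip_fraction(oscillating), is_damped(oscillating))
print(sign_flip_fraction(monotone), is_damped(monotone))
```

A high flip fraction together with shrinking steps would match the "oscillate to converge" picture; as noted in the discussion, it would not by itself say whether further splitting has value.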

Equilibration update

LW

  • JW – So this is bigger than first round?

    • LW – Yes

  • BM – What’s the limit for viscosity?

    • LW – 0.3 Pa*s

    • BM – Great, MS and I talked about 0.1 Pa*s, so 0.3 is pretty good

  • BM – Any special treatment for mixtures with water?

    • LW – No

  • BS – When you check for equil, are you checking whether mixtures are miscible, and whether they stay well mixed in sim? I’ve seen sims begin to form globules/separated phases

    • LW – I assume that they are miscible since they’re in ThermoML. When we were looking into triethanolamine I looked manually and didn’t see phase separations. I’ll check into this.

  • PB – …

  • BM – And all this was just Sage with new ThermoML data?

    • LW – Yes

  • CC – Using NAGL for charge model?

    • LW – For validation we’re doing it both with AM1BCC and NAGL.
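
A rough sketch of how the mixing check BS asked about could be automated instead of inspected by eye. This is an assumption about one possible approach, not the check used in the equilibration workflow: the function name `max_slab_deviation`, the slab count, and the coordinates are all hypothetical. It bins molecule positions along one box axis and reports the largest deviation of the local fraction of component A from the overall fraction. It would miss globules that are not aligned with that axis.

```python
def max_slab_deviation(z_a, z_b, box_length, n_slabs=4):
    """Largest |local fraction of A - overall fraction of A| over occupied slabs.

    z_a, z_b: z coordinates of molecule centers for components A and B.
    """
    counts_a = [0] * n_slabs
    counts_b = [0] * n_slabs
    for coords, counts in ((z_a, counts_a), (z_b, counts_b)):
        for z in coords:
            # Wrap into the box, then map to a slab index.
            index = int((z % box_length) / box_length * n_slabs) % n_slabs
            counts[index] += 1

    overall = len(z_a) / (len(z_a) + len(z_b))
    deviations = [
        abs(a / (a + b) - overall)
        for a, b in zip(counts_a, counts_b)
        if a + b > 0  # skip empty slabs
    ]
    return max(deviations)


# Invented coordinates in a box of length 4.0 (arbitrary units).
mixed = max_slab_deviation([0.5, 1.5, 2.5, 3.5], [0.6, 1.6, 2.6, 3.6], 4.0)
separated = max_slab_deviation([0.2, 0.4, 0.6, 0.8], [2.2, 2.4, 2.6, 2.8], 4.0)

print("well mixed:", mixed)
print("phase separated:", separated)
```

A value near zero suggests uniform composition along the chosen axis, while a value approaching the overall fraction (or one minus it) suggests slabs of nearly pure component. Any threshold for flagging a simulation would need to be chosen and validated separately.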

Action items

Decisions