QM methods for FF data generation | Peter Eastman | PE – Following up from the meeting a few weeks ago: I was told you were planning to move to a more accurate method eventually. Generally things are a tradeoff between accuracy and cost. The current OpenFF level of theory is fast but inaccurate. All the 3c methods give a nice tradeoff between accuracy and cost. ωB97X-3c would be quite accurate. All the 3c methods were designed by starting with an existing method, making it faster by shrinking the basis set, then adding corrections where the simplification hurts. The most accurate would be ωB97X-3c: it picks a small basis set, uses ECPs and empirical corrections, and gets good accuracy in the end. JC – Where are these supported? PE – The latest version of Psi4. LW – PB, do you remember if your paper a few years ago looked at 3c methods? BS – What are they trained against? PB (in chat): ωB97X-3c: A composite range-separated hybrid DFT method with a molecule-optimized polarized valence double-ζ basis set. JC: So the -3c class seems good? JC: Do they cover transition metals? PE: There's a particular paper that benchmarks basis sets. I haven't done much with transition metals, but my understanding is you generally want a bigger basis set than with organics. JC: My review suggests the opposite with some specific functionals, which is interesting.
JC: Presents a set of slides showing that B3LYP works surprisingly well in comparison to other methods. PB – Is this in the context of singlepoints or opts? JC – Good question. Most of these are comparing optimized geometries. PB – Are the TM datasets you're generating singlepoints or opts? JC – Probably a mix; right now we have some of each. Initial opts were with xtb, so we'll be going through another round of geometry optimization with a better method and basis set. LW – Right, we'll aim to do opts with a reasonably good level of theory, but then singlepoints with a very accurate level of theory.
LW – Overall somewhat surprised that the answer is still B3LYP. BS – Torsion scans are really important too. Focusing only on the lowest-energy structure might not make for good training data. CC – Peter, re: wanting one level of theory for everything, you could compare relative energies, no? E.g. if X and Y are conformers and A and B are QC methods, it is valid to compare E_A(Y) - E_A(X) with E_B(Y) - E_B(X). PE – Maybe. It's something you have to be careful about though. JC: Even grid spacing would affect the ability to compare. CC – Agree that there will be some error there, but it may be smaller than the error from the approximations we're already making. So we may be able to use different levels of theory for different classes of compounds. JC – Yeah, for training purposes you could do one level of theory across the board, or you could do different levels and hope that things average out. But that would require validation to justify the choices. And I don't like applying different levels of theory to conformers of the same molecule. PE: Forces should be significantly more robust than energies, which have an arbitrary zero point. A force has an absolute meaning in a way that an energy doesn't.
TG – It'd be interesting to take the review and see if there's progress on long-range corrections (like the CAM method). I think that's a reason omegab97d works well. TG – Also, B3LYP is a favorite because it's fast and stable (it nearly always converges). So if we change methods, we have to consider cases where another method might not converge for a molecule that B3LYP does converge for. CC – If we're going to spend more compute time on QM, we should consider doing implicit solvent. That may have a bigger effect than a better basis set. LW – Good point, we had a start on this for the NAGL2 dataset, but had some trouble with the dataset.
|
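Note on CC's relative-energy point above: the comparison of E_A(Y) - E_A(X) with E_B(Y) - E_B(X) can be sketched as below. This is only an illustration under stated assumptions. The energies are made-up numbers, `relative_energies` is a hypothetical helper, and "A"/"B" do not stand for any particular method discussed in the meeting. It also shows PE's point that a constant shift of the zero of energy drops out of relative energies.

```python
HARTREE_TO_KCAL = 627.5094740631  # conversion factor, hartree -> kcal/mol


def relative_energies(energies, reference):
    """Return energies relative to one reference conformer, in kcal/mol.

    `energies` maps conformer label -> absolute energy in hartree.
    """
    ref = energies[reference]
    return {name: (e - ref) * HARTREE_TO_KCAL for name, e in energies.items()}


# Made-up absolute energies (hartree) for conformers X and Y with two methods.
method_a = {"X": -232.100000, "Y": -232.098400}
method_b = {"X": -231.950000, "Y": -231.948100}

rel_a = relative_energies(method_a, "X")
rel_b = relative_energies(method_b, "X")

# Absolute energies differ by ~0.15 hartree between methods, which is
# meaningless to compare; the relative energies are what can be compared.
disagreement = rel_b["Y"] - rel_a["Y"]  # kcal/mol

# Shifting every energy of one method by a constant (an arbitrary zero point)
# leaves its relative energies unchanged.
shifted_a = {name: e + 5.0 for name, e in method_a.items()}
rel_shifted_a = relative_energies(shifted_a, "X")

print(f"dE_A(Y-X) = {rel_a['Y']:.3f} kcal/mol")
print(f"dE_B(Y-X) = {rel_b['Y']:.3f} kcal/mol")
print(f"difference between methods = {disagreement:.3f} kcal/mol")
```

Whether a residual disagreement of a given size is acceptable is exactly the validation question JC raises; the sketch does not answer it.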
vdW update | TG | JW: What are the odds that the next split is triple-bonded carbons? LW: How much do the parameters change in the initial fit? TG – They don't change much, but some get set to minimum values. LW – Somewhat interested in ones that weren't fit in Sage, like S and F. TG – There aren't many halogens in the training set, so they may need more representation for the objective contribution to drive a change. TG – Also seeing oscillatory behavior in subsequent steps, where the vdW params all go up in one epoch and down in the next.
JC – If it is oscillatory, could that be a metric that tells us that splitting won’t help?
|
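Note on JC's question above: one possible way to turn "oscillatory" into a number is the fraction of consecutive per-epoch updates that reverse sign. This is a hypothetical metric sketched for discussion, not something the fitting workflow is known to compute; `sign_flip_fraction` and the parameter trajectories below are made up.

```python
def sign_flip_fraction(values):
    """Fraction of consecutive update pairs that point in opposite directions.

    `values` is one parameter's value at each epoch. A result near 1.0 means
    the parameter goes up one epoch and down the next; near 0.0 means it
    moves steadily in one direction.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    deltas = [d for d in deltas if d != 0.0]  # ignore epochs with no change
    if len(deltas) < 2:
        return 0.0
    flips = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    return flips / (len(deltas) - 1)


# Made-up epsilon trajectories over six epochs.
oscillating = [0.100, 0.112, 0.101, 0.113, 0.102, 0.114]
converging = [0.100, 0.108, 0.113, 0.116, 0.118, 0.119]

print(sign_flip_fraction(oscillating))
print(sign_flip_fraction(converging))
```

Whether a high value actually predicts that a further split won't help would need to be checked against real fits.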