DM – For acetonitrile, since that’s the bad case we’d looked at first, is it the case that nitriles are all bad?
LW – They didn’t pop up as an outlier when I was previously looking at bad functional groups in molecules.
MS – Should we issue a warning when doing very small molecules? I recently had horrible results for chloroform.
CBy – For the outliers in the ESP RMSE slide, do we know whether they come from polar or nonpolar parts? It’s particularly important to get the ESPs around the polar parts right. I expect that SFEs will look fine if the errors are coming from the non-maximal parts of the surface. We could see this by making a scatter plot of ESP points.
BS – Do you know how the espaloma charges were fitted? I know they could have longer distance charge transfer. How were those parameters determined?
LW – Not totally sure for the newest version. For the old version they were trained to AM1BCC point charges.
CBy – In a discussion at CUP 4 or 5 years ago, the idea was to train to predict hardness+electronegativity and then do an equilibration, so they’d be like AM1BCC charges.
MG – If you look at the outliers, they’re less than 0.01 charge - Are these significant?
LW – Mind that these are ESP points, and units are atomic units.
CBy – Again, the maximum and minimum points on the surface are the most important ones to match.
MS – So is this like, “if it’s a small magnitude ESP, it’s less important to match. But if it’s high magnitude, it’s important to match”? Can it be quantified?
CBy – Generally yes. But even the absolute magnitude can be misleading, it’s really the maximum/minimum points on the surface for a particular molecule.
MG – Evidence for this?
CBy – Some work I’d done in the Kollman group - Changing ESP 2% in high magnitude regions has a big effect on calculated properties.
MS – Right, major terms in free energy scale as charge^2, so that seems plausible.
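A rough sketch of the charge^2 scaling argument, assuming a single-ion Born model (a charged sphere in a dielectric continuum). The model, the 2 Å radius, and the water-like dielectric constant are illustrative assumptions, not something discussed in the meeting. Under this model a 2% change in charge gives roughly a 4% change in solvation free energy.

```python
# Born model: solvation free energy of a charged sphere in a dielectric continuum.
# Assumption: this simple ion model stands in for the "major terms" mentioned above.
COULOMB_KCAL_ANGSTROM = 332.0637  # e^2 / (4*pi*eps0) in kcal*Angstrom/mol


def born_solvation_energy(charge, radius, dielectric=78.4):
    """Born solvation free energy in kcal/mol.

    charge is in units of e, radius in Angstrom; dielectric=78.4 is a
    water-like value chosen for illustration.
    """
    return -0.5 * COULOMB_KCAL_ANGSTROM * charge**2 / radius * (1.0 - 1.0 / dielectric)


def relative_change(scale, charge=1.0, radius=2.0):
    """Fractional change in the Born energy when the charge is multiplied by scale."""
    base = born_solvation_energy(charge, radius)
    scaled = born_solvation_energy(scale * charge, radius)
    return scaled / base - 1.0


if __name__ == "__main__":
    # A 2% change in charge: 1.02**2 - 1 = 0.0404
    print(f"{relative_change(1.02):.4f}")
```

The fractional change is independent of the radius and dielectric chosen here, since both cancel in the ratio.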
…
DM – So the plan to include very small mols in training data is a good idea?
(General) – Yes
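One possible way to check CBy's point about maximum/minimum surface points is to report ESP RMSE separately for the most extreme points of each molecule. The sketch below is only an illustration: the function names and the 10% cutoff are hypothetical, and the inputs are assumed to be per-molecule lists of reference and predicted ESP values in atomic units.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(reference) == 0:
        return float("nan")
    squared = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


def extreme_indices(reference, fraction=0.1):
    """Indices of the most negative and most positive reference ESP points.

    fraction is the share of points taken from each end (hypothetical cutoff).
    """
    n_extreme = max(1, round(fraction * len(reference)))
    order = sorted(range(len(reference)), key=lambda i: reference[i])
    return set(order[:n_extreme]) | set(order[-n_extreme:])


def esp_rmse_summary(reference, predicted, fraction=0.1):
    """RMSE over all points, over the extreme points, and over the rest."""
    extreme = extreme_indices(reference, fraction)
    rest = [i for i in range(len(reference)) if i not in extreme]
    extreme = sorted(extreme)
    return {
        "all": rmse(reference, predicted),
        "extremes": rmse([reference[i] for i in extreme], [predicted[i] for i in extreme]),
        "rest": rmse([reference[i] for i in rest], [predicted[i] for i in rest]),
    }


if __name__ == "__main__":
    # Toy data: errors placed only at the minimum and maximum ESP points.
    reference = [-0.05 + 0.1 * i / 19 for i in range(20)]
    predicted = list(reference)
    predicted[0] += 0.01
    predicted[-1] -= 0.01
    print(esp_rmse_summary(reference, predicted))
```

In the toy data the overall RMSE looks small while the error at the extremes is larger, which is the situation the discussion flags as most likely to matter for computed properties.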
Short Vsites update
DM – This looks good, glad to see the training coming out differently here.
JW – The Cl sigma hole vsite distance (1.4 Å) still seems kinda big.
MS + CBy – That does seem big
BS – Could look at MacKerell’s vsite numbers.
DM – DCole had pretty hefty vsites - around 1 e- charge
BS – Since the vsites don’t have LJ params, this could be dangerous/unstable.
DM – Previously I’d modeled sigma holes by putting an opposite charge on the inside of the Cl (on the Cl-C bond). I’d done another experiment with 0-LJ hydrogens where I pulled two acetic acids together very close in OpenMM and checked when they’d become unstable…
CBy – …
DM – So, try fitting with “negative” distance?
MG – If we’re confident that we don’t want the charge so far out, we could regularize to prevent that.
PB – IIRC, you had the best results with chlorine?
LW – Right, this refit was the same as the original fit, since the Cl did so well that we didn’t need to change it. Though it’s possible that this is a flat minimum and we can get similar performance for a number of values.
CBy – Were any of these systems involving water?
LW – No, but did include carbonyl.
LW – So I’ll rerun benchmarks on pyridine vsites and try different starting distance values for Cl vsites
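MG's suggestion to regularize the vsite distance could look something like the sketch below. This is not the fitting code used for the vsite work: the flat-bottomed harmonic form, the 1.0 Å threshold, the weight, and the toy fit-error curve are all assumptions chosen to illustrate how a penalty breaks the tie when the fit error has a flat minimum.

```python
def distance_penalty(distance, max_distance=1.0, weight=10.0):
    """Flat-bottomed harmonic penalty: zero up to max_distance, quadratic beyond.

    max_distance (Angstrom) and weight are hypothetical values.
    """
    excess = max(0.0, distance - max_distance)
    return weight * excess**2


def toy_fit_error(distance):
    """Hypothetical, nearly flat fit error with its minimum at 1.4 Angstrom."""
    return 0.01 * (distance - 1.4) ** 2


def regularized_objective(distance, max_distance=1.0, weight=10.0):
    """Toy fit error plus the distance penalty."""
    return toy_fit_error(distance) + distance_penalty(distance, max_distance, weight)


if __name__ == "__main__":
    # Candidate vsite distances from 0.5 to 1.5 Angstrom in 0.05 steps.
    candidates = [round(0.5 + 0.05 * i, 2) for i in range(21)]
    print("unregularized best:", min(candidates, key=toy_fit_error))
    print("regularized best:", min(candidates, key=regularized_objective))
```

Because the toy fit error is so flat, a modest penalty moves the preferred distance back to the threshold at almost no cost in fit quality, which is the behaviour one would hope for if the real Cl minimum is as flat as LW suspects.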
DM – Are there things we can do to make investigating these ideas easier? Should I reach out to XFF authors for datasets?
LW – That’d be great.
LW – Would be good to have way more parameters - TG’s project is promising on this front. But we’re still short of XFF’s scale. For example, is there interest in massively expanding the number of parameters?
MS – I’d expect it to be asymptotic - We probably don’t need huge numbers of parameters, but something like a few percent more would almost certainly be an improvement. Do TG/BW have suggestions for parameters to split?
CBy – What about split-stage fitting?
LW – Right now everything’s done together - We fit all valence terms to minimum energy geometries.
CBy – Could be a large benefit in some areas, e.g. non-ring-specific parameters that include strained rings in their training, hypervalent sulfur sharing terms with normal sulfur, etc.
JW – Maybe we could fit on a larger scale if we didn’t need to do minimizations in torsion profile fits - Those seem to be the most expensive things.
PB – LPW did some fitting straight to single points in ff15. Did well enough for proteins but I’m still experimenting with small mols.
LW – Would this be faster?
PB – Yes, this should make it much faster. Currently torsionprofile fits take ~200 sec.
CBy – Did XFF people look at other kinds of datasets?
DM – They fragmented a bunch of Chembl compounds. They tried to keep all the unique fragments.
CBy – Are we looking at totally different sorts of targets?
LW – Pretty much the same stuff, just larger scale.
PB – They did include a “combined conformer” thing where, if you have 3 torsions, they add data where they constrain two dihedrals and include a 2-ish D torsion scan.
LW – Yeah, they had kinda “1.5 D” torsiondrives, where multiple torsions were scanned sparsely.
CBy – Do we think they got something from that that we’re missing?
LW – Unsure.
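To give a sense of scale for the “1.5 D” idea, the sketch below compares a full 2D torsion grid with a sparse scan that is dense along one torsion and coarse along a second. The notes do not say how XFF actually laid out these scans, so the layout, the 15° and 120° steps, and the function names are all assumptions.

```python
from itertools import product


def torsion_grid(step_degrees):
    """Torsion angles from -180 (inclusive) to 180 (exclusive) at a fixed step."""
    return list(range(-180, 180, step_degrees))


def full_2d_scan(step=15):
    """Every combination of two torsions on the same dense grid."""
    angles = torsion_grid(step)
    return list(product(angles, angles))


def sparse_scan(dense_step=15, coarse_step=120):
    """Dense along the first torsion, only a few values of the second.

    This is one possible reading of a "1.5 D" scan, not a documented layout.
    """
    return list(product(torsion_grid(dense_step), torsion_grid(coarse_step)))


if __name__ == "__main__":
    print("full 2D grid points:", len(full_2d_scan()))
    print("sparse scan points:", len(sparse_scan()))
```

With these assumed steps the sparse scan needs 72 constrained geometries against 576 for the full grid, which is why such scans are attractive when each grid point requires a QM optimization.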
CBy – They have a lot more data and a lot more parameters, but my feeling is that they may be overfitting. I really think we can do better with far fewer parameters.
MS – Looking forward to BW’s comparison of Sage to Espaloma.
DM – Happy to provide additional compute time or undergrads if those are of interest.
LW – So possibilities for future work:
Asking for XtalPi’s dataset
Looking at smee and seeing whether orders of magnitude more data helps
“Do more parameters help?” - BW and LM are working in this direction
“Split stage training” - Should we action this?
DM – Depends on whether we have personnel-time available.