DM – For acetonitrile, since that’s the bad case we’d looked at first, is it the case that all nitriles are bad?
LW – They didn’t pop up as an outlier when I was previously looking at bad functional groups in molecules.
MS – Should we issue a warning when doing very small molecules? I recently had horrible results for chloroform.
CBy – Do we know whether the outliers on the ESP RMSE slide are in polar or nonpolar regions? It’s particularly important to get the ESPs around the polar parts right. I expect that SFEs will look fine if the errors are coming from the non-maximal parts of the surface. We could check this by making a scatter plot of ESP points.
BS – Do you know how the espaloma charges were fitted? I know they could have longer distance charge transfer. How were those parameters determined?
LW – Not totally sure for newest version. For the old version they were trained to Am1BCC point charges.
CBy – In a discussion at CUP 4 or 5 years ago, the idea was to train a model to predict hardness and electronegativity and then do a charge equilibration, so they’d be like AM1BCC charges.
MG – If you look at the outliers, they’re less than 0.01 charge. Are these significant?
LW – Keep in mind that these are ESP points, and the units are atomic units.
CBy – Again, the maximum and minimum points on the surface are the most important ones to match.
MS – So is this like, “if it’s a small magnitude ESP, it’s less important to match. But if it’s high magnitude, it’s important to match”? Can it be quantified?
CBy – Generally yes. But even the absolute magnitude can be misleading, it’s really the maximum/minimum points on the surface for a particular molecule.
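One way to quantify CBy’s point is to report the error at the surface extrema alongside the overall RMSE. The sketch below is a minimal illustration under stated assumptions: the function name `esp_error_summary` is hypothetical, only the global maximum and minimum of the reference ESP are used (a real analysis would probably look at local extrema too), and the toy arrays are made up rather than taken from any benchmark.

```python
import numpy as np


def esp_error_summary(esp_ref, esp_model):
    """Compare the overall ESP RMSE with the errors at the surface extrema.

    esp_ref, esp_model: 1-D arrays of ESP values (atomic units) evaluated
    on the same set of molecular-surface grid points.
    """
    esp_ref = np.asarray(esp_ref, dtype=float)
    esp_model = np.asarray(esp_model, dtype=float)
    diff = esp_model - esp_ref
    i_max = int(np.argmax(esp_ref))  # most positive reference point
    i_min = int(np.argmin(esp_ref))  # most negative reference point
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "error_at_max": float(diff[i_max]),
        "error_at_min": float(diff[i_min]),
    }


# Toy data (not from any real molecule): two models with nearly equal RMSE.
ref = np.array([0.05, 0.01, 0.0, -0.01, -0.06])
model_a = ref + np.array([0.0, 0.01, -0.01, 0.01, 0.0])  # errors in weak regions
model_b = ref + np.array([0.0, 0.0, 0.0, 0.0, 0.017])    # error at the minimum

summary_a = esp_error_summary(ref, model_a)
summary_b = esp_error_summary(ref, model_b)
print(summary_a)
print(summary_b)
```

Both toy models have an RMSE of about 0.0076 a.u., but only `model_b` misses the most negative point, which is the case CBy flagged as mattering most.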
MG – Evidence for this?
CBy – Some work I’d done in the Kollman group: changing the ESP by 2% in high-magnitude regions has a big effect on calculated properties.
MS – Right, major terms in free energy scale as charge^2, so that seems plausible.
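To make the charge² scaling concrete, assume a linear-response solvent model (e.g. Poisson or generalized Born; this is an assumption, since no model was named in the discussion). The electrostatic solvation free energy is then a quadratic form in the partial charges, with $G_{ij}$ the symmetric reaction-field response between atoms $i$ and $j$, so a uniform fractional change $\delta$ in the charges changes it by roughly $2\delta$:

```latex
\Delta G_{\mathrm{elec}} = \tfrac{1}{2}\sum_{i,j} q_i\, q_j\, G_{ij}
\quad\Rightarrow\quad
q_i \to (1+\delta)\, q_i
\;\Rightarrow\;
\Delta G_{\mathrm{elec}} \to (1+\delta)^2\, \Delta G_{\mathrm{elec}} \approx (1 + 2\delta)\, \Delta G_{\mathrm{elec}}
```

With $\delta = 0.02$ this gives $1.02^2 - 1 \approx 4\%$. This only covers a uniform scaling of all charges; the localized ESP changes CBy described need not map onto it one-to-one.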
…
DM – So plan to include very small mols in training data is a good idea?
(General) – Yes
Short Vsites update
DM – This looks good, glad to see the training coming out differently here.
JW – The Cl sigma hole vsite distance (1.4 Å) still seems kinda big
MS + CBy – That does seem big
BS – Could look at MacKerell’s vsite numbers
DM – DCole had pretty hefty vsites, around 1 e of charge
BS – Since the vsites don’t have LJ params, this could be dangerous/unstable.
DM – Previously I’d modeled sigma holes by putting an opposite charge on the inside of the Cl (on the Cl-C bond). I’d done another experiment with 0-LJ hydrogens where I pulled two acetic acids together very close in OpenMM and checked when they’d become unstable…
CBy – …
DM – So, try fitting with “negative” distance?
MG – If we’re confident that we don’t want the charge so far out, we could regularize to prevent that.
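A minimal sketch of MG’s regularization idea, assuming a simple harmonic penalty on the vsite distance added to the fitting loss. The function names, the prior distance of 1.0 Å, the weight, and the nearly flat toy loss (mimicking the flat minimum LW mentions) are all hypothetical and chosen only for illustration.

```python
import numpy as np


def regularized_objective(distance, data_loss, prior_distance, weight):
    """Fitting loss plus a harmonic penalty that pulls a vsite distance
    (angstrom) toward a prior value."""
    penalty = weight * (np.asarray(distance) - prior_distance) ** 2
    return data_loss(distance) + penalty


def toy_flat_loss(distance):
    """Hypothetical, nearly flat data loss with its minimum at 1.4 angstrom."""
    return 0.01 * (np.asarray(distance) - 1.4) ** 2


distances = np.linspace(0.2, 2.0, 1801)  # 0.001 angstrom grid
unregularized = toy_flat_loss(distances)
regularized = regularized_objective(
    distances, toy_flat_loss, prior_distance=1.0, weight=0.1
)
best_unregularized = distances[np.argmin(unregularized)]
best_regularized = distances[np.argmin(regularized)]
print(best_unregularized, best_regularized)
```

When the data loss is this flat, even a modest penalty moves the optimum most of the way to the prior (here to about 1.036 Å), so the choice of prior and weight would need to be justified separately.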
PB – IIRC, you had the best results with chlorine?
LW – Right, this refit was the same as the original fit, since the Cl did so well that we didn’t need to change it. Though it’s possible that this is a flat minimum and we can get similar performance for a number of values.
CBy – Were any of these systems involving water?
LW – No, but did include carbonyl.
LW – So I’ll rerun benchmarks on pyridine vsites and try different starting distance values for Cl vsites
DM – Are there things we can do to make investigating these ideas easier? Should I reach out to the XFF authors for datasets?
LW – That’d be great.
LW – It would be good to have way more parameters. TG’s project is promising on this front, but we’re still short of XFF’s scale. For example, is there interest in massively expanding the number of parameters?
MS – I’d expect it to be asymptotic. We probably don’t need huge numbers of parameters, but something like a few percent more would almost certainly be an improvement. Do TG/BW have suggestions for parameters to split?
CBy – What about split-stage fitting?
LW – Right now everything’s done together - We fit all valence terms to minimum energy geometries.
CBy – There could be a large benefit in areas like non-ring-specific parameters that include strained rings in training, hypervalent sulfur sharing terms with normal sulfur, etc.
JW – Maybe we could fit on a larger scale if we didn’t need to do minimizations in torsion profile fits. Those seem to be the most expensive things.
PB – LPW did some fitting straight to single points in ff15. Did well enough for proteins but I’m still experimenting with small mols.
LW – Would this be faster?
PB – Yes, this should make it much faster. Currently torsion profile fits take ~200 sec.
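Why fitting to single points can be cheaper: if the geometries are held fixed, a torsion energy of the form $\sum_n k_n (1 + \cos n\phi)$ is linear in the force constants, so they can be solved for directly instead of re-minimizing every constrained geometry at each optimizer step. The sketch below is illustrative only; it is not how ForceBalance or the ff15 single-point fitting PB mentions is implemented, and the function name, periodicities, and synthetic coefficients are assumptions.

```python
import numpy as np


def fit_torsion_terms(phi, residual_energy, periodicities=(1, 2, 3)):
    """Fit k_n and an offset c in
        E(phi) = sum_n k_n * (1 + cos(n * phi)) + c
    by linear least squares.

    phi: dihedral angles (radians) at fixed, pre-computed geometries.
    residual_energy: QM energy minus the MM energy without this torsion term,
    evaluated at the same geometries (no MM minimization is performed).
    A negative k_n is equivalent to a 180 degree phase, up to the offset c.
    """
    phi = np.asarray(phi, dtype=float)
    columns = [1.0 + np.cos(n * phi) for n in periodicities]
    columns.append(np.ones_like(phi))  # constant QM/MM energy offset
    design = np.column_stack(columns)
    target = np.asarray(residual_energy, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs[:-1], coeffs[-1]


# Synthetic check: recover known (made-up) coefficients from noiseless data.
phi = np.radians(np.arange(-180, 180, 15))
true_k = np.array([0.5, -1.2, 0.3])
true_c = 2.0
energy = sum(k * (1.0 + np.cos(n * phi)) for n, k in zip((1, 2, 3), true_k)) + true_c
k_fit, c_fit = fit_torsion_terms(phi, energy)
print(np.round(k_fit, 6), round(float(c_fit), 6))
```

In a real fit the other force-field terms also change between iterations, so the problem is only fully linear in this simplified setting where everything except the torsion term is frozen.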
CBy – Did the XFF people look at other kinds of datasets?
DM – They fragmented a bunch of ChEMBL compounds and tried to keep all the unique fragments.
CBy – Are we looking at totally different sorts of targets?
LW – Pretty much the same stuff, just larger scale.
PB – They did include a “combined conformer” thing where, if you have 3 torsions, they add data where they constrain two dihedrals and include a 2-ish D torsion
LW – Yeah, they had kinda “1.5 D” torsiondrives, where multiple torsions were scanned sparsely.
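One guess at what a “1.5 D” scan could look like: one dihedral is driven on a dense grid while a second is constrained to only a few coarse values. The function name and step sizes below are hypothetical and not taken from the XFF work; the point is just the cost comparison with a full 2-D torsiondrive.

```python
import itertools


def sparse_2d_scan(primary_step=15, secondary_step=60):
    """Return (phi1, phi2) dihedral constraint pairs in degrees.

    phi1 is scanned densely; phi2 is held at only a few coarse values,
    giving a "1.5 D" scan rather than a full 2-D grid.
    """
    primary = range(-180, 180, primary_step)
    secondary = range(-180, 180, secondary_step)
    return list(itertools.product(primary, secondary))


sparse = sparse_2d_scan()
full = sparse_2d_scan(primary_step=15, secondary_step=15)
print(len(sparse), len(full))  # 144 576
```

With these settings the sparse scan needs a quarter of the constrained optimizations of a full 15° × 15° grid while still sampling how the first torsion profile shifts with the second.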
CBy – Do we think they got something from that that we’re missing?
LW – Unsure.
CBy – They have a lot more data and a lot more parameters, but my feeling is that they may be overfitting. I really think we can do better with far fewer parameters.
MS – Looking forward to BW’s comparison of Sage to Espaloma.
DM – Happy to provide additional compute time or undergrads if those are of interest.
LW – So possibilities for future work:
- Asking for XtalPi’s dataset
- Looking at smee and seeing whether orders of magnitude more data helps
- “Do more parameters help?”: BW and LM are working in this direction
- “Split stage training”: should we action this?
DM – Depends on whether we have personnel-time available.