MS – Would be interesting to run a bunch of small mols, then quantify the difference from OE as a function of mol weight.
LW – We pulled from a number of databases when we made the training/test data. I profiled the differences in mol weight inside and outside the training set, and they are different outside the size range of the training set.
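The comparison MS suggests could be sketched roughly as below: compute a per-molecule charge RMSE between two charge models and average it within molecular-weight bins. This is only an illustrative sketch. The record layout `(mol_weight, model_charges, reference_charges)`, the function names, and the 50 Da bin width are all assumptions, not anything taken from the NAGL or OpenEye tooling.

```python
import math
from collections import defaultdict


def charge_rmse(charges_a, charges_b):
    """Root-mean-square difference between two per-atom charge lists."""
    if len(charges_a) != len(charges_b):
        raise ValueError("charge lists must have the same number of atoms")
    squared = sum((a - b) ** 2 for a, b in zip(charges_a, charges_b))
    return math.sqrt(squared / len(charges_a))


def rmse_by_weight_bin(records, bin_width=50.0):
    """Average per-molecule charge RMSE within molecular-weight bins.

    records: iterable of (mol_weight, model_charges, reference_charges).
    Returns a dict mapping the lower edge of each bin to the mean RMSE.
    The bin width is an arbitrary illustrative choice.
    """
    bins = defaultdict(list)
    for mol_weight, model_charges, reference_charges in records:
        lower_edge = math.floor(mol_weight / bin_width) * bin_width
        bins[lower_edge].append(charge_rmse(model_charges, reference_charges))
    return {edge: sum(vals) / len(vals) for edge, vals in sorted(bins.items())}
```

Plotting the resulting dictionary against the weight range covered by the training set would show whether the disagreement grows for molecules below that range.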
MS – I remember looking at chloroform and seeing that Sage 2.0 got the density quite wrong. I think chloroform and tetrachloromethane were bad. Not sure if due to electrostatics.
JC - Does this matter? RMSE to Sage/AM1-BCC is tiny.
LW - Yes, the mean difference is small, but we’re worried that some chemistries like cyanide might be particularly bad
DM – My position is “is this error sufficiently large to justify retraining the whole model?”
JC - Is the geometry of these bad dimers thermodynamically relevant for a FF?
LW - I don’t know. I think these are high energy conformers.
JC - It might be better to sample conformers from an MD simulation of the dimers. Could also check reaction field energies with ZAP.
JW - This seems like a bad problem to me. The NAGL vs OpenEye charges are very different.
LW - NAGL does better on nitriles in larger molecules
MS - Small molecules might be too small to have a meaningful graph for the message passing algorithm
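One way to make this point concrete is to look at the graph diameter, the largest number of bonds separating any two atoms. Once the number of message-passing rounds reaches the diameter, every atom has already seen the whole molecule, so further rounds add no new neighborhood information. The sketch below computes the diameter with a breadth-first search. The adjacency-dict representation, the atom labels, and the function names are illustrative assumptions and say nothing about how NAGL is actually configured.

```python
from collections import deque


def eccentricity(adjacency, start):
    """Largest bond-count distance from `start` to any reachable atom (BFS)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return max(distance.values())


def graph_diameter(adjacency):
    """Largest eccentricity over all atoms; assumes a connected graph."""
    return max(eccentricity(adjacency, atom) for atom in adjacency)


# Chloroform, CHCl3: one carbon bonded to H and three Cl (labels are illustrative).
chloroform = {
    "C": ["H", "Cl1", "Cl2", "Cl3"],
    "H": ["C"],
    "Cl1": ["C"],
    "Cl2": ["C"],
    "Cl3": ["C"],
}

# Heavy-atom chain of hexane, C1-C2-C3-C4-C5-C6, for comparison.
hexane_chain = {
    "C1": ["C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "C4"],
    "C4": ["C3", "C5"],
    "C5": ["C4", "C6"],
    "C6": ["C5"],
}

print(graph_diameter(chloroform), graph_diameter(hexane_chain))
```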
CC – As a workaround, could we use a size cutoff and run explicit AM1-BCC on small mols?
LW - Yes, but that gives up on the goal of having a self-consistent solution to replace OpenEye/AmberTools
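The cutoff workaround discussed above could look something like the sketch below: a wrapper that sends molecules under a heavy-atom threshold to an explicit charge method and everything else to the ML model. All three callables are hypothetical placeholders supplied by the caller, and the default cutoff of 6 heavy atoms is an arbitrary illustrative value, not a number proposed in the meeting.

```python
from typing import Callable, Sequence

# A charge function maps a molecule identifier (here a SMILES string) to per-atom charges.
ChargeFn = Callable[[str], Sequence[float]]


def make_hybrid_charger(
    ml_charges: ChargeFn,
    explicit_charges: ChargeFn,
    count_heavy_atoms: Callable[[str], int],
    heavy_atom_cutoff: int = 6,  # assumed value, for illustration only
) -> ChargeFn:
    """Build a charge function that falls back to an explicit method for small molecules.

    ml_charges, explicit_charges, and count_heavy_atoms are hypothetical
    caller-supplied callables; this sketch only implements the dispatch.
    """

    def assign(smiles: str) -> Sequence[float]:
        # Molecules below the cutoff go to the explicit (e.g. AM1-BCC) backend.
        if count_heavy_atoms(smiles) < heavy_atom_cutoff:
            return explicit_charges(smiles)
        # Everything else is charged by the ML model.
        return ml_charges(smiles)

    return assign
```

As LW notes, the explicit backend still has to be installed for this to work, so the dependency on OpenEye/AmberTools remains.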
MS – Would like to see how much this matters for solvation free energies
LW – Yeah, would like to do this with solvents other than water.
JW - I still think it’s useful to retrain with small molecules. How much human time would this take?
LW - Probably an hour. Mostly computer time, probably a week to train and more weeks to benchmark.
MG - If MS’s concern that small molecules don’t have a big enough graph is true, retraining on small molecules may not help much
MS – Wonder why it has trouble with small molecules when big molecules are OK. Were there small mols in the training set?
LW – We’d filtered them out, following the method of Riniker. We wanted to make sure we got enough chemical complexity to make the model useful.
MG - Does the Riniker method work for small molecules?
LW - I don’t know. We could ask her.
MT - Are there any GCNN methods that can handle small molecules up to biopolymers?
MS – Not sure. Have we tried espaloma-charge?
JC – Make sure to use espaloma-0.3
PB – How is comparison to interaction energies?
LW – Error is high; there are factors in QM that make the dimer geometries very high energy in MM.