Torsion multiplicity update slides (BW)
- BW: Added new training and benchmarking data to support the new parameters. The TM FF fit, based on Sage 2.2, performs equally well or better than 2.2 on the existing industry benchmark dataset. Hard to know what to make of the new benchmarks; it is a small dataset, so the distributions are weird.
- AMI: Found that one good way to visualize changes is plotting the QM vs MM values.
- So far, just duplicating the parent torsion, but may need to change periodicity etc. for the new parameters.
- BS: Slide 21, top left figure is just the torsion parameter, but the rest are contributions from bonds, angles, etc.?
- BW: Yes.
- BS: The torsion energy term should be a small correction to the torsion profile provided by the nonbonded terms, so sometimes the FF torsion term doesn't look at all like the actual torsion drive, which shows the full energy. Most of the torsion energy comes from the 1,4 interactions, so the actual "torsion" term can be negative, have the wrong phase, etc.
- BW: Yeah, that's been an issue with looking at these, because I don't really know what to do with the torsion parameter plots.
- BS: You can look at the torsion energy through the non-bonded parameters (see the decomposition sketch below).
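A minimal sketch of the kind of decomposition BS describes, assuming the MM system and scanned conformers already come from an existing torsion-drive workflow (`system` and `conformers` are placeholders, not part of the actual analysis scripts): it assigns each OpenMM force to its own force group so the torsion term can be separated from the full MM energy and from the nonbonded remainder along the scan.

```python
# Sketch only: decompose MM energy per force along a torsion scan using OpenMM
# force groups. "system" and "conformers" are assumed inputs from an existing
# torsion-drive workflow; they are not defined here.
import openmm
import openmm.unit as unit


def energy_components(system, conformers):
    # Put each force in its own group so its energy can be queried separately.
    for i, force in enumerate(system.getForces()):
        force.setForceGroup(i)

    integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
    context = openmm.Context(system, integrator)

    results = []
    for positions in conformers:  # one conformer per scanned torsion angle
        context.setPositions(positions)
        total = context.getState(getEnergy=True).getPotentialEnergy()

        per_force = {}
        for i, force in enumerate(system.getForces()):
            state = context.getState(getEnergy=True, groups={i})
            per_force[force.__class__.__name__] = state.getPotentialEnergy()

        torsion = per_force.get(
            "PeriodicTorsionForce", 0.0 * unit.kilojoules_per_mole
        )
        results.append(
            {
                "total": total,
                "torsion_term": torsion,
                # Everything except the torsion term; along a torsion drive this
                # is dominated by the 1,4 nonbonded interactions.
                "total_minus_torsion": total - torsion,
            }
        )
    return results
```

Plotting `torsion_term` and `total_minus_torsion` against the scanned angle, next to the QM profile, shows how much of the profile the torsion parameter itself is actually responsible for.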
- PB: Are you going to look at residuals as a next step? E.g. take out the torsion term and look at the total energy (MM - torsion).
- LW: What were the huge outliers in 2.1?
- BS: Has anyone looked at how many molecules provide training data for each parameter? (see the coverage-counting sketch below)
- BW: Yes, that's how I came up with the new training data: I looked at torsions that had low coverage. There are still some where there is only one or a few molecules.
- BS: Is there a rule of thumb for how many data points you need to really train it?
- LW: It's also complicated because more data means the fit takes longer, so we're being conservative.
- PB: Don't train a parameter if it has fewer than maybe 5 records.
- PB: The gen 3 torsion training set was constructed by hand to minimize the number of molecules.
- BW: I fragmented ChEMBL for this; it would be cool to construct molecules like that.
- LW: Similar to how XFF constructed their training data.
- AMI: Yes; looking at some of the parameters added in 2.1, especially some of the bonds and angles are also missing training data.
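A rough sketch of the coverage count BS asks about, assuming the training molecules are available as SMILES (the force field file name and the `training_smiles` list are placeholders, not the real training set): it uses the OpenFF Toolkit's `label_molecules` to count how many molecules exercise each proper torsion parameter, and flags those below PB's rough five-record threshold.

```python
# Sketch only: count how many training molecules hit each proper torsion
# parameter. The force field name and SMILES list are illustrative placeholders.
from collections import Counter

from openff.toolkit import ForceField, Molecule

force_field = ForceField("openff-2.2.0.offxml")  # assumed Sage 2.2 release file
training_smiles = ["CCO", "c1ccccc1O"]           # placeholder training molecules

molecules_per_parameter = Counter()
for smiles in training_smiles:
    molecule = Molecule.from_smiles(smiles, allow_undefined_stereo=True)
    labels = force_field.label_molecules(molecule.to_topology())[0]
    # Count each parameter once per molecule, regardless of how many matches.
    parameter_ids = {parameter.id for parameter in labels["ProperTorsions"].values()}
    molecules_per_parameter.update(parameter_ids)

# Flag parameters with fewer than ~5 contributing molecules.
for parameter_id, count in sorted(molecules_per_parameter.items()):
    if count < 5:
        print(f"{parameter_id}: only {count} molecule(s) in the training set")
```

The same loop works for bonds and angles (AMI's point) by swapping `"ProperTorsions"` for `"Bonds"` or `"Angles"`; note this counts molecules that match a parameter, not the number of QC records actually fit against it.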