10 min? | Eventually separate out ring torsions | David Mobley and @Pavan Behara | YTZ suggests we eventually separate ring and non-ring torsions; this is probably a good idea, if not in general then at least for some torsions. This way, during somewhat naive ML-based training, we can treat the torsions differently and not wind up fitting the … Slides: Slide 1. DM – Is number 2 always a ring torsion? CBy – It wouldn’t have to be, but it’d be super unusual to have two cyclopropyls separated by one bond. DM – It may be good to label these as “always in ring” vs. “never in ring”.
CB – Are there any torsions that COULDN’T be in a ring? Wouldn’t it be safer to just look for !@ (the SMARTS non-ring bond primitive) on the central bond? DM – That may be too strict. CB – Anything with a triple bond also can’t be in a ring. DM – Or anything with two connections, where one of them is a hydrogen or a halogen.
JW – We’re describing two approaches here: one with no false positives and one with no false negatives. Which is more important? How does each weigh on our outcomes? CBy – Is this torsion binning really a priority for OpenFF, and not just for YTZ? Do we really expect a dramatic improvement in FF quality from separating them out? I think that EXOCYCLIC torsions from rings are very important, but ENDOCYCLIC ones aren’t so important. (DM shows “Figure 1: Buckling on 1,2,3,4-tetraphenylbenzene”.) Isn’t this a failure of endocyclic torsions? CBy – That’s a chemical perception problem: the central bond of that torsion is being treated as a single bond. CBy – One of PB’s parameters has a =,: ( ? ) and I don’t think that’s… (lost the thread here) DM – ? CBy – I think you’re right; I think you’ve shown that in-ring parameters aren’t important… But I’m still not sure that there’s a need to split out in-ring torsions to improve FF quality, or that it’s worth the added complexity. DM – I’m not sure, but it’d be useful for YTZ, and it’s a useful experiment for us to do. CBy – I think it’s critically important to treat EXOCYCLIC ring torsions differently. Substituents off of 3- and 4-membered rings will have unusual valence angles. I’m not sure how much they DO deviate; maybe it’s a small, unimportant amount, but they definitely do deviate. So we could split out torsions where the 1-2 atoms are in a 3- or 4-membered ring. DM – I’m wondering whether a fitting experiment where ring torsions are split out would be a good use of time at all. CBy – I don’t think it will be a great use of time, except for 3- and 4-membered rings. Are we thinking of splitting based on the CENTRAL bond of the torsion, or the TERMINAL ones? Otherwise I don’t think this is a good plan. DM – I can ask YTZ whether he has more data on this. CBy – I’m interested to see the case where being in a ring has more of an effect than just …
CC – GLYCAM fixed a puckering problem by adjusting charges, not torsions. JW – Should we draw a distinction between 3- and 4-membered rings, SATURATED 5- and 6-membered rings, and UNSATURATED 5- and 6-membered rings? It seems like the arguments here would apply differently to each of these. CBy – For some 5- and 6-membered rings, I’d think that standard (ring-agnostic) parameters should be fine. But if we start slicing at this level, it seems like we’ll begin approaching something like a protein FF, where we split things off from general physics-based terms at the cost of extensibility. SB – Agree with CBy; I don’t think there should be separate terms for some rings until we have clear evidence from our data. DM – So, you’re suggesting we look at structural deviations/violations for ring members and substituents? SB – Yes. This would inform both our fitting approach and our datasets. CC – Agree. CBy – Generally, I think using in-ring torsions to parameterize out-of-ring torsions is pathological (bad), whereas using out-of-ring torsions to fit in-ring torsions is good. PB – We looked at linear and other (?) alkanes while we were debugging the sulfonamide issue, and didn’t see evidence of a need for ring/non-ring splitting. SB – I have some bandwidth and can look at this in the next few days. SB will look into whether torsions that appear in a mix of in-ring and out-of-ring contexts have a large number of violations as a result of their use in both contexts.
SB – Generally, I’m interested in answering “what sort of data should torsions be fit to?” Right now we take training data BOTH from torsiondrives AND from optgeo. I wonder whether we should fit only to torsiondrives, or to a third kind of data. DM – I’m not sure. In general I’d think that the training should be agnostic (“physics is physics”). I can see the argument, though, that taking minimized structures would overrepresent the torsion minima and could overwhelm the specific torsiondrive data. SB – That’s a great way to put it. PB will be looking into this, and I’m wondering whether we should do single-stage fits vs. multi-stage fits. This may be where the vibfreq fitting went wrong, since it added noise to the torsion parameters. CBy – We need to watch out for the phenomenon where everyone blames torsions for FF inaccuracies because they’re trained last. By the time they’re trained, errors from vdW and electrostatics are already baked in, and the torsions just have to clean them up. In the case of something like a steric clash in a torsiondrive, the error in the sterics will likely dominate and mislead the torsion fit. CC – The coupling that CBy described may be the REASON we SHOULD split out ring vs. non-ring torsions. CBy – It’s important to keep track of which part of the vdW range is being fit: in physical property calculations it’s the soft part of the vdW curve, but for torsions it’s the hard part. SB – Two comments. First, JHorton is looking into double-exponential potentials, which have additional degrees of freedom that can be co-optimized with torsions to get a quality fit. Second, a strawman: should we change the dihedral functional form so that it isn’t periodic, and instead directly parameterize the 1-4 LJ interaction within it? That is, the LJ 1-4 scale could be literally 0, with all of it baked into the torsion profile. Did we ever investigate that?
CBy – Philosophically, an endpoint of a great FF would be that its 1-4 scale factors tend toward 1.0. But in reality we know that the 1-4 scale is always under 1. Why is it never greater than 1? I think it’s because the LJ potential is ALWAYS too repulsive. I also think there’s a contribution from charge penetration that could be better modeled. SB – That makes sense. To rephrase one of my points: “We assume that the chemical environment we define for our long-range interactions is also the chemical environment that is appropriate for short-range/torsion interactions.” There are other places where we decouple our chemical environments; for example, we could end up with BCC parameters whose chemical environments differ from those of our vdW parameters or torsions, and we accept that that’s OK. CBy – When stuff is crowded around a partially negative oxygen atom, we recognize that there are big errors arising from treating that oxygen as a point charge. I’m not sure whether to handle this on the physics side or the cheminformatics side.
JW – Is there an important connection between SB saying “we already effectively have different atom types for torsions vs. vdW vs. BCCs, and that’s OK” and CBayly saying “the ideal 1-4 scaling factor is 1.0”?
|
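The discussion touched on three related ideas. DM and CB talked about labeling torsions “always in ring” vs. “never in ring”, for example by checking the central bond with SMARTS !@. CBy suggested splitting out exocyclic torsions whose terminal bonds sit in 3- or 4-membered rings. SB plans to look for torsions used in mixed in-ring/out-of-ring contexts.

The block below is a minimal pure-Python sketch of that bookkeeping, not OpenFF code. It assumes molecules are given as hypothetical heavy-atom adjacency maps rather than through a cheminformatics toolkit. The functions `graph_from_bonds`, `smallest_ring_with_bond`, `classify_torsion` and `label_parameters`, and the parameter IDs `t1`/`t2`, are illustrative names only.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical representation: heavy-atom index -> set of bonded atom indices.
Graph = dict[int, set[int]]


def graph_from_bonds(bonds: list[tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from a list of bonds."""
    graph: dict[int, set[int]] = defaultdict(set)
    for a, b in bonds:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def smallest_ring_with_bond(graph: Graph, a: int, b: int) -> Optional[int]:
    """Return the size of the smallest ring containing bond a-b, or None.

    The bond is in a ring exactly when b is still reachable from a without
    using the a-b bond; the shortest such detour plus a-b closes the ring.
    """
    seen = {a}
    queue = deque([(a, 0)])  # (atom, number of bonds from a)
    while queue:
        node, dist = queue.popleft()
        for nbr in graph[node]:
            if node == a and nbr == b:
                continue  # skip the bond under test
            if nbr == b:
                return dist + 2  # atoms on the detour a..node, plus b
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def classify_torsion(graph: Graph, torsion: tuple[int, int, int, int]) -> str:
    """Label torsion i-j-k-l by ring membership of its central and terminal bonds."""
    i, j, k, l = torsion
    central = smallest_ring_with_bond(graph, j, k)
    if central is not None:
        return f"in ring ({central}-membered)"
    # Central bond is acyclic: flag terminal bonds in 3- or 4-membered rings,
    # where substituent valence angles are expected to deviate most.
    terminal = [smallest_ring_with_bond(graph, i, j), smallest_ring_with_bond(graph, k, l)]
    small = [size for size in terminal if size in (3, 4)]
    if small:
        return f"exocyclic to {min(small)}-membered ring"
    return "central bond not in ring"


def label_parameters(observations: list[tuple[str, bool]]) -> dict[str, str]:
    """Label each parameter from (parameter_id, central_bond_in_ring) observations."""
    contexts: dict[str, set[bool]] = defaultdict(set)
    for param_id, in_ring in observations:
        contexts[param_id].add(in_ring)
    labels = {}
    for param_id, seen in contexts.items():
        if seen == {True}:
            labels[param_id] = "always in ring"
        elif seen == {False}:
            labels[param_id] = "never in ring"
        else:
            labels[param_id] = "mixed"  # the case SB plans to check for violations
    return labels


# Ethylcyclopropane heavy atoms: ring 0-1-2, ethyl chain 0-3-4.
ethylcyclopropane = graph_from_bonds([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4)])
butane = graph_from_bonds([(0, 1), (1, 2), (2, 3)])
print(classify_torsion(ethylcyclopropane, (3, 0, 1, 2)))  # in ring (3-membered)
print(classify_torsion(ethylcyclopropane, (1, 0, 3, 4)))  # exocyclic to 3-membered ring
print(classify_torsion(butane, (0, 1, 2, 3)))  # central bond not in ring
print(label_parameters([("t1", True), ("t1", False), ("t2", False)]))
# {'t1': 'mixed', 't2': 'never in ring'}
```

In practice, a toolkit's SMARTS matching would replace the hand-rolled breadth-first search. The relevant primitives are @ and !@ for ring and non-ring bonds, and r3/r4 for atoms in small rings. The search is spelled out here only to make the ring test explicit.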
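SB’s strawman and CBy’s reply both concern how the energy of a 1-4 pair is split between the torsion term and the scaled nonbonded terms. As a hedged sketch, assume the usual periodic cosine series and a 12-6 LJ pair term. Here $k_n$, $n$ and $\phi_{0,n}$ are the torsion force constant, periodicity and phase. $\epsilon_{il}$, $\sigma_{il}$, $q_i$ and $q_l$ are the pair’s LJ and charge parameters, $s_{\mathrm{LJ}}$ and $s_{\mathrm{C}}$ are the 1-4 scale factors, and $r_{il}(\phi)$ is the 1-4 distance. Under these assumptions, the $\phi$-dependent energy of torsion i-j-k-l can be written as:

$$
E_{il}(\phi) = \sum_{n} k_n\left[1 + \cos\left(n\phi - \phi_{0,n}\right)\right]
+ s_{\mathrm{LJ}}\, 4\epsilon_{il}\left[\left(\frac{\sigma_{il}}{r_{il}(\phi)}\right)^{12} - \left(\frac{\sigma_{il}}{r_{il}(\phi)}\right)^{6}\right]
+ s_{\mathrm{C}}\, \frac{q_i q_l}{4\pi\varepsilon_0\, r_{il}(\phi)}
$$

In this notation, SB’s strawman corresponds to setting $s_{\mathrm{LJ}} = 0$ and replacing the periodic sum with a general fitted term $V(\phi)$ that absorbs the steric contribution:

$$
E_{il}(\phi) = V(\phi) + s_{\mathrm{C}}\, \frac{q_i q_l}{4\pi\varepsilon_0\, r_{il}(\phi)}
$$

CBy’s ideal endpoint instead corresponds to $s_{\mathrm{LJ}} = s_{\mathrm{C}} = 1$. In that case the torsion term corrects only what the nonbonded terms miss.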