...
Torsion parameters in OpenFF describe the energy of rotating around the central bond of the torsion. A single central bond can have many different torsions applied to it, and the number of torsions applied around that bond is its multiplicity. In general, a torsion parameter should only apply to bonds of one particular multiplicity. However, previous analysis of our parameters (see “Analysis of Torsion multiplicity”) has uncovered many torsion parameters that are not specific enough in this respect.
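To make the definition above concrete, here is a minimal sketch in plain Python (no OpenFF dependency) that enumerates the proper torsions i-j-k-l around each central bond j-k of a molecular graph and counts them. The function name and the bond-list input format are hypothetical, chosen only for illustration; they are not part of the OpenFF toolkit.

```python
from collections import defaultdict
from itertools import product


def torsion_multiplicities(bonds):
    """Count proper torsions i-j-k-l around each central bond j-k.

    bonds: iterable of (a, b) atom-index pairs describing a molecular graph
    (a hypothetical input format used only for this sketch).
    Returns a dict mapping each sorted central bond (j, k) to the number of
    torsions that share it. Bonds with no torsions (e.g. bonds to a terminal
    hydrogen) are left out.
    """
    bonds = list(bonds)
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    counts = {}
    for a, b in bonds:
        j, k = sorted((a, b))
        # Outer atoms: neighbors of j other than k, and neighbors of k other than j.
        outer_j = neighbors[j] - {k}
        outer_k = neighbors[k] - {j}
        # i == l would close a three-membered ring, which is not a proper torsion.
        n_torsions = sum(1 for i, l in product(outer_j, outer_k) if i != l)
        if n_torsions:
            counts[(j, k)] = n_torsions
    return counts
```

For ethane, with carbons 0 and 1 and three hydrogens on each carbon, the only central bond with torsions is the C-C bond, and it carries 3 × 3 = 9 of them, so its multiplicity is 9; an ethylene C=C bond, with two hydrogens on each carbon, has multiplicity 4.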
The attached tar file is a copy of the Google Drive folder linked in “Analysis of Torsion multiplicity”, excluding the very large “old_images” directory.
Goal
We should split our torsion parameters so that each one applies to a single multiplicity. Even if this does not substantially improve benchmarks, letting one parameter cover several multiplicities is a philosophical error. If the split makes benchmarks substantially worse, we should reassess.
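Deciding which parameters need splitting amounts to grouping torsion matches by parameter and checking whether any parameter matches bonds of more than one multiplicity. The sketch below assumes the matches have already been collected as hypothetical (parameter id, multiplicity) pairs; the OpenFF toolkit's `ForceField.label_molecules` is one possible source of the per-torsion parameter assignments, but that collection step is not shown, and the ids used in the test are placeholders rather than real Sage parameters.

```python
from collections import defaultdict


def multi_multiplicity_parameters(assignments):
    """Find torsion parameters that match central bonds of several multiplicities.

    assignments: iterable of (parameter_id, multiplicity) pairs, one per torsion
    match (a hypothetical format assumed for this sketch).
    Returns {parameter_id: sorted list of distinct multiplicities}, restricted to
    parameters that match more than one multiplicity and so would need splitting.
    """
    seen = defaultdict(set)
    for param_id, multiplicity in assignments:
        seen[param_id].add(multiplicity)
    return {
        param_id: sorted(mults)
        for param_id, mults in seen.items()
        if len(mults) > 1
    }
```

A parameter that only ever matches one multiplicity is left out of the result, so an empty dict means no splitting is needed under this criterion.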
...
Similarly, restricting the results to the records that are strictly unaffected by the new parameters shows the same trend. If anything, these results may look slightly farther from the original Sage 2.1.0 values.
...
However, as the tables above demonstrate, many of the new parameters have higher average errors than their parents. For example, t122 is applied 9572 times with Sage 2.1.0 and has an average error of 0.16 kcal/mol. Its child parameters in the TM force field, t122b, t122c, and t122f, are applied a total of 9572 times and have average errors of 0.21, 0.17, and 0.10 kcal/mol, respectively. Only t122f is lower than the original t122 value, and it also has the lowest count of the three, so the count-weighted average over the children is higher overall, at 0.20 kcal/mol. The trends are less clear for the Sage vs. Sage-TM comparison: Table 3 shows that adding the TM training data decreases the average error for t122, t130, and t143 but increases it for t164 and t142.
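The weighted-average argument above can be written as a small helper: each child's average error is weighted by how many times that child is applied. The per-child counts for t122b, t122c, and t122f are not listed in this section, so the sketch does not try to reproduce the 0.20 kcal/mol figure; the function name is hypothetical and the test uses made-up numbers.

```python
def weighted_mean_error(counts, errors):
    """Count-weighted mean of per-parameter average errors.

    counts: number of times each child parameter is applied.
    errors: average error (e.g. kcal/mol) of each child parameter, same order.
    """
    if len(counts) != len(errors):
        raise ValueError("counts and errors must have the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total count must be positive")
    return sum(n * e for n, e in zip(counts, errors)) / total
```

With made-up counts of 3 and 1 and errors of 0.2 and 0.1, the result is about 0.175: the low-error child pulls the average down only in proportion to its small count, which is the same effect described for t122f.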
Another point these tables demonstrate is that many of the new child parameters are not covered by the benchmarking data set. For example, t143 has a, b, c, d, e, and f variants in the TM force field, but only t143a, t143b, and t143e appear in the benchmark data. In contrast, only t143c was not covered by the training set.
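A coverage check like the one above can be sketched by tallying the parameter ids assigned across a data set and reporting which variants never appear. This assumes the assigned ids have already been collected into a flat list; the function name and input format are hypothetical. The variant and coverage lists in the test are the t143 examples from this section.

```python
from collections import Counter


def coverage_report(variants, observed_ids):
    """Count how often each child parameter is applied in a data set.

    variants: all child parameter ids defined for one parent, in a chosen order.
    observed_ids: iterable of parameter ids assigned across the data set.
    Returns (counts, uncovered): counts maps every variant to its number of
    applications (0 if absent), and uncovered lists the zero-count variants
    in the order they appear in `variants`.
    """
    tally = Counter(observed_ids)
    # Counter returns 0 for ids it has never seen.
    counts = {v: tally[v] for v in variants}
    uncovered = [v for v in variants if counts[v] == 0]
    return counts, uncovered
```

Run against the benchmark coverage described above (only a, b, and e present), it would flag t143c, t143d, and t143f; against the training-set coverage it would flag only t143c.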
Conclusion
Overall, I think the results look comparable to the Sage 2.1.0 benchmarks, at least on the DDE, RMSD, and TFD metrics. Combined with the philosophical argument for handling torsions with different multiplicities separately, this suggests to me that the new torsion parameters are ready for inclusion in the main Sage force field. Further, augmenting the training data with additional coverage for the new (and existing) parameters should improve the quality of the resulting force field further.
...