...
Similarly, restricting the results to the records strictly not affected by the new parameters shows the same trend. If anything, these results may look slightly farther from the original Sage 2.1.0 values.
...
However, as the tables above demonstrate, many of the new parameters have higher average errors than their parents. For example, t122 is applied 9572 times with Sage 2.1.0 and has an average error of 0.16 kcal/mol. Its child parameters in the TM force field, t122b, t122c, and t122f, are applied a total of 9572 times and have average errors of 0.21, 0.17, and 0.10 kcal/mol, respectively. Only the last of these is lower than the original t122 value, and it is also the least frequently applied of the three, so the count-weighted average gives a higher overall error of 0.20 kcal/mol.
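To make the weighting explicit, here is a minimal sketch of the count-weighted average described above. The three average errors come from the text, but the per-child application counts are not given in this section, so the counts below are hypothetical placeholders chosen only to sum to 9572. The helper name `weighted_mean_error` is also illustrative and not part of any library.

```python
def weighted_mean_error(errors, counts):
    """Return the count-weighted mean of per-parameter average errors."""
    if len(errors) != len(counts):
        raise ValueError("errors and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one parameter must be applied")
    return sum(e * n for e, n in zip(errors, counts)) / total


# Average errors (kcal/mol) for t122b, t122c, and t122f, taken from the text.
child_errors = [0.21, 0.17, 0.10]

# HYPOTHETICAL application counts: placeholders, not the real values.
# The only constraint taken from the text is that they sum to 9572.
child_counts = [7000, 2000, 572]

overall = weighted_mean_error(child_errors, child_counts)
print(f"weighted average error: {overall:.2f} kcal/mol")
```

Because the most frequently applied children dominate the sum, a low error on a rarely applied child such as t122f does little to pull the overall value down.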
The tables also show that many of the new child parameters are not covered by the benchmarking data set. t143, for example, has a, b, c, d, e, and f variants in the TM force field, but only t143a, t143b, and t143e appear in the benchmarking data. In contrast, t143c is the only variant missing from the training set.
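The coverage comparison for t143 amounts to a set difference, sketched below. The parameter IDs are the ones named in the text. The helper `uncovered` and the variable names are illustrative assumptions and do not reflect how the tables above were actually generated.

```python
def uncovered(children, observed):
    """Return the child parameter IDs that never appear in a data set."""
    return sorted(set(children) - set(observed))


# The six t143 variants defined in the TM force field.
t143_children = [f"t143{suffix}" for suffix in "abcdef"]

# Variants that appear in the benchmarking data, per the text.
benchmark_seen = {"t143a", "t143b", "t143e"}

# The training set covers every variant except t143c, per the text.
training_seen = set(t143_children) - {"t143c"}

print(uncovered(t143_children, benchmark_seen))  # ['t143c', 't143d', 't143f']
print(uncovered(t143_children, training_seen))   # ['t143c']
```

Running the same check over every parent parameter would give a quick list of children whose benchmark errors cannot be assessed with the current data.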
Conclusion
Overall, I think the results look comparable to the Sage 2.1.0 benchmarks, at least on the DDE, RMSD, and TFD metrics. Combining this with the philosophical argument of separating the handling of torsions with different multiplicity values, I think these new torsion parameters are ready for inclusion in the main Sage force field. Further, augmenting the training data with additional coverage for the new (and existing) parameters should only improve the quality of the resulting force field.
...