These three records, 36967197, 36998992, and 36998994, have the following SMILES strings, respectively (in a file because CONFLUENCE wouldn’t stop converting them to links):
[Attached file: SMILES strings for the three records]
Other differences
Similar trends appear in my run of the Sage 2.1.0 refit, which used Pavan’s original input files but my updated environment, with ForceBalance 1.9.5 instead of the 1.9.3 used in Pavan’s environment. The figure below shows the DDE differences greater than 8 kcal/mol, which account for 176 of the 68837 records (about 0.26%).
On the other hand, 8222 entries (about 12%) have differences greater than 1 kcal/mol. I think this means that many molecules have small differences rather than a few molecules exhibiting huge differences. The differences are split fairly evenly between being better in the new and old versions, however. The absolute value of the DDE is lower in the new data in 27111 cases (about 39%) and lower in the old data in 35071 cases (about 51%). The two are equal in the remaining 6655 cases (about 10%). Restricting the plot to the cases where the old data is better produces a very similar distribution, but with lower counts. Notably, this also removes the greatest outlier, which has a deviation greater than 100 kcal/mol.
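The counts above can be reproduced with a short tally like the one below. This is a minimal sketch, not the script used for this analysis: the function name `summarize_dde` and the two input lists are hypothetical, the "difference" is assumed to be the absolute value of new minus old, and "equal" is assumed to mean exact floating-point equality of the absolute values.

```python
def summarize_dde(old_dde, new_dde, thresholds=(1.0, 8.0)):
    """Tally how two sets of DDE values (kcal/mol) compare, record by record.

    old_dde, new_dde: equal-length sequences of floats, one entry per record
    (hypothetical inputs; the ordering of records must match).
    """
    if len(old_dde) != len(new_dde):
        raise ValueError("old_dde and new_dde must have the same length")

    summary = {"n_records": len(old_dde), "new_better": 0, "old_better": 0, "equal": 0}
    for t in thresholds:
        summary[f"diff_gt_{t:g}"] = 0

    for old, new in zip(old_dde, new_dde):
        # A smaller |DDE| is treated as "better".
        if abs(new) < abs(old):
            summary["new_better"] += 1
        elif abs(old) < abs(new):
            summary["old_better"] += 1
        else:
            summary["equal"] += 1

        # Assumption: the difference is |new - old|.
        diff = abs(new - old)
        for t in thresholds:
            if diff > t:
                summary[f"diff_gt_{t:g}"] += 1

    return summary


if __name__ == "__main__":
    # Toy data only: the first two pairs reuse values quoted on this page,
    # the last two are made up for illustration.
    old = [-7.8, -6.0, 0.5, 2.0]
    new = [-96.6, -94.7, 0.5, 1.5]
    print(summarize_dde(old, new))
```

Dividing each count by `n_records` gives the percentages quoted above, for example 8222/68837 for the 1 kcal/mol threshold.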
The records for the cluster around 80 kcal/mol are 36975868, 36983564, and 36997441. These are clearly much worse in the new force field: the original Sage values are -7.8, -6.0, and -11.6, compared to the new values of -96.6, -94.7, and -101.1, respectively.
I also plotted a CDF for these, but there wasn’t much to gain from it. The CDF for the old force field rises slightly faster than the one for the new force field, as expected given that the old data has the lower absolute DDE in more cases.
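For reference, an empirical CDF of this kind can be computed as sketched below. This is only an illustration under assumptions: the helper names `empirical_cdf` and `fraction_below` are hypothetical, and the CDF is assumed to be taken over the absolute DDE values.

```python
def empirical_cdf(values):
    """Return (sorted |values|, cumulative fractions) for an empirical CDF."""
    xs = sorted(abs(v) for v in values)
    n = len(xs)
    # The i-th sorted point (0-based) has (i + 1) / n of the data at or below it.
    fractions = [(i + 1) / n for i in range(n)]
    return xs, fractions


def fraction_below(values, cutoff):
    """Fraction of records with |value| <= cutoff, i.e. the CDF evaluated at cutoff."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(abs(v) <= cutoff for v in values) / len(values)


if __name__ == "__main__":
    # Made-up values for illustration only.
    toy = [-2.0, 0.5, 1.0, -4.0]
    print(empirical_cdf(toy))
    print(fraction_below(toy, 1.0))
```

A curve that rises faster means a larger fraction of records falls below any given cutoff, which is the sense in which one force field's CDF can be compared to the other's.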