The YAMMBS benchmarks were calculated as follows. The MoleculeStore object was generated with this commit of the YAMMBS package, which optimizes the molecules using a convergence threshold of 5e-9 kJ/mol/nm^2. The ddE’s, TFD’s, and ICRMSD’s were calculated using this commit of the YAMMBS package, which has revised the ddE definition to be consistent with the old benchmarking script (defined with respect to the lowest energy QM conformer, and ddE’s between the lowest energy conformers are excluded) and uses mapped SMILES to create molecules for TFD analysis. The RMSD’s have been calculated using the same commit as the ddE and TFD, but with a modified RMSD function that uses all atoms to calculate the RMSD, in order to be consistent with the RMSD definition in the old benchmarks. YAMMBS typically uses a heavy-atom-only RMSD, which is much faster to compute.
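The ddE definition described above can be sketched for a single molecule as follows. This is a minimal illustration, not the YAMMBS implementation: the function name is hypothetical, the sign convention (MM minus QM) is an assumption, and it reads "excluded" as dropping the reference conformer, whose ddE is zero by construction.

```python
import numpy as np

def dde_per_conformer(qm_energies, mm_energies):
    """ddE for each conformer of one molecule, relative to the lowest-energy
    QM conformer. The reference conformer itself is dropped from the result.
    Hypothetical helper; sign convention (MM - QM) is an assumption."""
    qm = np.asarray(qm_energies, dtype=float)
    mm = np.asarray(mm_energies, dtype=float)
    ref = int(np.argmin(qm))            # index of the lowest-energy QM conformer
    rel_qm = qm - qm[ref]               # QM energies relative to the reference
    rel_mm = mm - mm[ref]               # MM energies relative to the same conformer
    dde = rel_mm - rel_qm
    keep = np.arange(len(qm)) != ref    # exclude the reference conformer
    return dde[keep]

# Toy energies in kcal/mol, invented for illustration only.
example = dde_per_conformer([0.0, 1.5, 3.0], [10.0, 12.0, 12.5])
```

With this convention, a conformer that the force field places too high relative to the QM minimum gets a positive ddE.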

Molecules identified as “problematic” were removed: inverted 7-membered rings with bad QM structures, and molecules containing S with 4 single bonds, which are not covered correctly.

The code and files used in this comparison are available at https://github.com/amcisaac/compare_benchmarks

...

The overall ddE’s are nearly identical between YAMMBS and the old benchmark (“og” in the figures).

...

[Image: dde_zoom.jpg]

[Image: dde.jpg]

...

However, examining individual records reveals a number of cases where the ddE’s are very different. There are 563 records with a ddE difference > 5 kcal/mol.
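A record-by-record count like this can be reproduced with a simple threshold on the absolute difference. The sketch below assumes each code's results are available as a mapping from record ID to metric value; the function name, record IDs, and values are hypothetical.

```python
def records_with_large_difference(og, yammbs, cutoff):
    """Return the IDs of records whose metric differs by more than `cutoff`
    between the two codes. Only records present in both mappings are compared.
    Hypothetical helper for illustration."""
    shared = og.keys() & yammbs.keys()
    return {rid for rid in shared if abs(og[rid] - yammbs[rid]) > cutoff}

# Toy ddE values in kcal/mol, invented for illustration only.
og_dde = {"rec1": 1.0, "rec2": -2.0, "rec3": 0.5}
yammbs_dde = {"rec1": 1.2, "rec2": 4.5, "rec3": -6.0}
large = records_with_large_difference(og_dde, yammbs_dde, cutoff=5.0)
```

The same function applies to the other metrics by swapping in the relevant values and cutoff (for example 0.4 Å for RMSD or 0.1 for TFD).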

...

When comparing the all-atom RMSD’s between the two codes, the overall RMSD benchmarks are nearly identical.
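The difference between an all-atom and a heavy-atom-only RMSD comes down to which atoms enter the average. The sketch below is an illustration only: it assumes the two structures are already aligned and share atom ordering, and it ignores symmetry, so it is not the RMSD routine of either code.

```python
import numpy as np

def rmsd(coords_a, coords_b, atomic_numbers, heavy_only=False):
    """RMSD between two pre-aligned coordinate sets of shape (n_atoms, 3).
    If heavy_only is True, hydrogens (atomic number 1) are left out.
    Hypothetical helper; no alignment or symmetry handling."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if heavy_only:
        mask = np.asarray(atomic_numbers) != 1
        a, b = a[mask], b[mask]
    sq_dist = ((a - b) ** 2).sum(axis=1)   # squared displacement per atom
    return float(np.sqrt(sq_dist.mean()))

# Toy two-atom system (C, H) where only the hydrogen moves.
numbers = [6, 1]
conf_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
conf_b = [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]]
all_atom = rmsd(conf_a, conf_b, numbers)
heavy = rmsd(conf_a, conf_b, numbers, heavy_only=True)
```

In this toy case the heavy-atom RMSD is zero while the all-atom RMSD is not, which is why the two definitions cannot be compared directly.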

...

...

...

However, there are still large deviations when examining individual records. There are 219 records with a difference in RMSD > 0.4 Å.

...

The overall TFD metric is nearly the same between the two sets of scripts.

...

[Image: tfd_smiles_zoom.jpg]

[Image: tfd_smiles.jpg]

...

However, there are a few records with large deviations in TFD between scripts. There are 55 records with a TFD difference > 0.1.

[Image: tfd_smiles_dif.jpg]

ICRMSD

Bonds

...

...

Actually, the bond RMSE is pretty consistent on an individual-record basis, with the exception of the one outlier molecule identified above.

[Image: bondrmse_smiles_dif.jpg]

Angles

...

[Image: anglermse_smiles_zoom.jpg]

[Image: anglermse_smiles.jpg]

...

The record-by-record comparison for angle RMSE is also relatively good, though 18 records have an angle RMSE difference larger than 1 degree.

[Image: anglermse_smiles_dif.jpg]

Dihedrals

[Image: dihedralrmse_smiles_ecdf.jpg]

[Image: dihedralrmse_smiles_zoom.jpg]

[Image: dihedralrmse_smiles.jpg]

...

The dihedral angles show large differences when comparing individual records. There are 229 records with a difference in dihedral angle RMSE of > 10 deg.

[Image: dihedralrmse_smiles_dif.jpg]

Impropers

The overall behavior of the impropers is concerning, with YAMMBS showing significantly worse results. I couldn’t find a reason in the code for this. Lily suggested it could be a difference in the number of times the impropers are applied during evaluation.

[Image: improperrmse_smiles_ecdf.jpg]

[Image: improperrmse_smiles_zoom.jpg]

[Image: improperrmse_smiles.jpg]

...

The same is true for impropers: 301 records have a difference of > 10 deg.

...

Comparing individual records with large deviations

As seen above, many individual records have large deviations between the two codes for each metric. One might expect that if a given record has bad agreement in RMSD, it would also have bad agreement in other metrics, and vice versa. However, there isn’t much overlap between the records that have large deviations in different metrics. Below are the overlaps between different sets. For example, “Overlap DDE/RMSD” reports the number of records that have both a large discrepancy in DDE and a large discrepancy in RMSD between the two codes.

DDE big dif 563
RMSD big dif 219
TFD big dif 55
Bond big dif 5
Angle big dif 18
Dihedral big dif 229
Improper big dif 301

Overlap DDE/RMSD 36
Overlap DDE/TFD 10
Overlap RMSD/TFD 28

Overlap DDE/bond 0
Overlap DDE/angle 0
Overlap DDE/proper 43
Overlap DDE/improper 43

Overlap RMSD/bond 0
Overlap RMSD/angle 10
Overlap RMSD/proper 106
Overlap RMSD/improper 52

Overlap TFD/bond 0
Overlap TFD/angle 0
Overlap TFD/proper 22
Overlap TFD/improper 21
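Overlap counts of this kind are set intersections over the record IDs flagged for each metric. The sketch below is a minimal illustration with invented IDs; the function name and input layout are hypothetical and not taken from the comparison scripts.

```python
from itertools import combinations

def pairwise_overlaps(flagged):
    """Given a mapping of metric name -> set of flagged record IDs, return the
    number of records shared by each pair of metrics.
    Hypothetical helper for illustration."""
    return {
        (a, b): len(flagged[a] & flagged[b])
        for a, b in combinations(flagged, 2)
    }

# Toy record IDs, invented for illustration only.
flagged = {
    "DDE": {1, 2, 3, 4},
    "RMSD": {3, 4, 5},
    "TFD": {4, 6},
}
overlaps = pairwise_overlaps(flagged)
```

Each key is a pair of metric names in the order they appear in the input mapping, so `overlaps[("DDE", "RMSD")]` corresponds to a line such as “Overlap DDE/RMSD”.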

After looking into some of the large deviations for each metric, it seems like molecules with a large DDE deviation typically have nearly identical structural metrics between the old and new codes, suggesting perhaps the charges or something to do with the energy evaluation could be the cause, rather than optimizing to a different structure.

For molecules with a large deviation in TFD, RMSD, or ICRMSD’s, other structural metrics and the ddE were typically also off, although not always by enough to be considered “large” and appear in the overlap above. For example, a molecule with a large difference in dihedral RMSE between OG and YAMMBS has a ddE difference of ~3 kcal/mol and an RMSD difference of 0.1 Å. Both are below the cutoffs for “very different” (5 kcal/mol for the ddE and 0.4 Å for the RMSD), but they still suggest we may be looking at a different structure.

However, the same is not true for the Improper RMSE, which often has identical ddE and structural metrics but only a difference in improper RMSE. This caused Lily to suggest it could be a difference in accounting, e.g. how many times the improper parameter was applied during evaluation of the RMS error.
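To see how an accounting difference alone could change an RMSE, consider a purely hypothetical case in which one code includes a single improper angle per central atom and the other includes several orderings of the same atoms. Neither code is confirmed to work this way; the numbers below are invented, and the sketch only shows that the reported value can shift while the structure is unchanged.

```python
import math

def rmse(errors):
    """Root-mean-square of a flat list of angle errors (degrees)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical angle errors (degrees) for two improper centers, with three
# atom orderings each. Values are invented for illustration only.
per_center_errors = [
    [1.0, 3.0, 5.0],
    [2.0, 2.0, 2.0],
]

# Accounting A (assumed): one ordering per center.
one_per_center = rmse([errs[0] for errs in per_center_errors])

# Accounting B (assumed): every ordering of every center.
all_orderings = rmse([e for errs in per_center_errors for e in errs])
```

Here the two accountings give different RMSE values for the same geometry, which would be consistent with records that differ only in improper RMSE.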