v1.2.0-RC1 fitting result summary

Fitting Data and Results

Optimized geometries

Hessians

Torsion scans

Optimized geometries

Hessians

Torsion scans

4,745

1189

710

  • Input force field : SMIRNOFF99Frosst w/ some parameter modification (same initial guess with v1.1.0)

  • The objective function decreased from 3.619e+04 to 6.877e+03 in 57 steps.

*Note: forcebalance v1.7.1 and openforcefield v0.6.0 were used for the target generation and the optimization.

Benchmark Results

Benchmark data

For the calculation, full benchmark set was used (25168 optimized geometries, plus relative energies of 2005 molecules). Detailed of the molecule selection can be found here: release-1-benchmarking/QM_molecule_selection

(1) Comparison of objective values from single point calculations on benchmark full set

Two types of benchmarks were done to compare the performances: (1) QM vs MM optimized geometries and (2) the relative energies between conformers at “QM optimized geometries”. The final objective function value(X2) from FB single point calculation gives a brief overview of the agreement between QM and MM. The lower X2 is, the better the force field reproduces QM structures and energetics.

 

objective value (X2)

 

objective value (X2)

initial guess

29,469

v1.1.0

20,097

v1.2.0-preliminary (link: http://doi.org/10.5281/zenodo.3781313 )

16,939

v1.2.0-RC1

16,713

(2) QM vs MM optimized geometries

To investigate the improved performance in reproducing QM optimized geometries, the weighted root-mean-square error (WRMSE) of each molecule, which is weighted root-mean-square deviation of internal coordinates of MM optimized geometry from QM optimized geometry was calculated and compared.

( Metrics for bond, angle, improper torsion are set to be 0.05 Angstrom, 8 degrees and 20 degrees respectively and torsion contributions were intentionally excluded.

y values in the plots(Δ WRMSE) are the change in the WRMSE before and after valence parameter optimization; Negative y value indicates improved performance after the optimization. Red dots below the black curve indicate better performances of v1.2.0-RC1 over v1.1.0 in reproducing QM optimized geometries.

y values in the plots(Δ WRMSE) are the differences in the WRMSE between v1.2.0-RC1 and v1.1.0; Negative y value indicates better reproduction of QM optimized geometries with v1.2.0-RC1 compared to v1.1.0. The average change in WRMSE is -0.027, indicating that overall, v1.2.0-RC1 better performs in reproducing QM optimized geometry than v1.1.0.

Especially, molecules having deprotonated phosphonate group(RP(=O)(OH)(O^-)) showed significant improvement in v1.2.0-RC1, fixing a known issue with protonated phosphorus-connected oxygens which has plagued AMBER-family force fields.

QM optimized geometry of CC(O)([P@@](=O)(O)[O-])[P@](=O)(O)[O-]. ( transparent orange: MM optimized geometry with v1.1.0 force field, transparent green: v1.2.0-RC1 force field)

Here’s one of molecules which has two deprotonated phosphonate groups. In the optimized geometry from the v1.1.0 force field(transparent orange in the figure), the hydroxyl hydrogen is located much closer to the negatively charged oxygen (1.10 Å) than in the QM optimized geometry (> 2.4 Å), forming an overly strong intramolecular hydrogen bond. The v1.2.0-RC1 force field corrects this error and gives a closer agreement with QM optimized geometry by having a larger equilibrium angle for phosphorus-centered angle term(a38: [*:1]~[#15:2]~[*:3]).

QM optimized geometry of CC1(O[C@@H]2CO[C@@]3([C@H]([C@@H]2O1)OC(O3)(C)C)COS(=O)(=O)N)C. ( transparent orange in the left figure: MM optimized geometry with v1.1.0 force field, transparent green in the right figure: MM optimized geometry with v1.2.0-RC1 force field)

For some conformers of CC1(O[C@@H]2CO[C@@]3([C@H]([C@@H]2O1)OC(O3)(C)C)COS(=O)(=O)N)C, v1.1.0 introduced a weird spatial orientation of covalent bonds to tetravalent sulfur, resulting in a much smaller angle for N-S-O(63.3 degree) than in QM optimized geometry (97.0 degree). This issue is fixed in v1.2.0-RC1.

(3) the relative energies between conformers at QM optimized geometries

To investigate the improved performance of the new parameter set in reproducing QM relative energies between conformers, QM vs MM relative energies between conformers “at QM optimized geometries” were calculated. Two different ways to calculate MM relative energies were used. For the left figure, MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. And for the right figure, MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at QM minimum. Both distribution plots show lower MAD(mean absolute deviation) compared to v1.1.0, indicating a slight, but not notable improvement of the new parameter set in reproducing QM energetics.

(4) Improvement in reproducing QM optimized geometries of molecules having #7X2-#7X3

Benchmark analysis done by @Victoria Lim (Deactivated)(For further details, please see http://doi.org/10.5281/zenodo.3774680) shows that the v1.1.0 fails to describe (1) molecules having #7X2-#7X3(single bond between divalent nitrogen and trivalent nitrogen); (2) azetidines; and (3) octahydrotetracenes.

We speculated that the poor performances on the chemical moieties are due to the poor chemical coverage of the training set used in Parsley fitting. In the previous release, periodicities for some N-N rotations were modified to properly describe N-N bonds which are under conjugation effect and the change included the modification of components t128. However since there were no exocyclic #7X2-#7X3 rotations in the training set, a proper fitting of t128 couldn’t be carried out, resulting in unphysical optimized k values.

ddE vs TFD, ddE vs RMSD scatter plots of exocyclic divalent N- trivalent N containing molecules

Our new training set for this release now includes exocyclic rotations about #7X2-#7X3 and the new parameter set fitted to this new set shows clear improvement in lowering TFD and RMSD which indicates it is improved in reproducing QM optimized geometries while showing no significant improvement in ddE.

QM optimized geometry of Cc1c(sc(n1)N/N=C/c2ccccn2)C. ( transparent orange: MM optimized geometry with v1.1.0 force field, transparent green: MM optimized geometry with v1.2.0-RC1 force field)

One example of molecules with exocyclic #7X2-#7X3. v1.2.0-RC1 could successfully reproduce planar structure of Cc1c(sc(n1)N/N=C/c2ccccn2)C.

Conclusion

Overall, preliminary benchmarking of our Parsley v1.2.0-RC1 indicates significantly improved performance generally, as well as especially for certain phosphonate-containing molecules and for exocyclic divalent and trivalent nitrogen-containing molecules. We are excited to see how this impacts performance on a broader range of benchmarks and properties once released.