v1.2.0 summary

Comparison between `openff-1.2.0-RC1` vs. `openff-1.2.0-RC2`

Direct parameter comparison: There are some angle (a3, a6, a15) terms with noticeably different k/ angle value between RC1 and RC2. Further inspection on the parameters will be carried out after 1.2.0 release.
Comparison of RMSD for each parameter: RC2 decreases RMSDs for t135, t146, t34 and t97 while increasing RMSDs for t54 and t55.

1. Benchmarking with benchmark full set

For the details about the test set: release-1-benchmarking/QM_molecule_selection

Two types of benchmarks were done using ForceBalance to compare the performances: (1) QM vs MM optimized geometries and (2) the relative energies between conformers at “QM optimized geometries”. The final objective function value(X2) from FB single point calculation gives a brief overview of the agreement between QM and MM. The lower X2 is, the better the force field reproduces QM structures and energetics. Objective value from RC1 is lower than the objective value from RC2.

	objective value (X2)

	objective value (X2)
v1.1.0	20,097
v1.2.0-RC1	16,713
v1.2.0-RC2	16,910

(1) QM vs MM optimized geometries

To investigate the improved performance in reproducing QM optimized geometries, the residual of each molecule, a sum of squared weighted residual of internal coordinates was calculated and plotted.

scatter plots of (a) residual from v1.2.0-RC1 - residual from v1.1.0 and (b)residual from v1.2.0-RC2 - residual from v1.1.0 for benchmark full set

Each y value in the plot (delta residual) is residual from v1.2.0-RC1 (v1.2.0-RC2) - residual from v1.1.0. Negative y value means residual from v1.2.0-RC1 (v1.2.0-RC2) is lower than that from v1.1.0, indicating an improved performance of the candidate parameter set over v1.1.0. The mean of residual difference is -1.364 for v1.2.0-RC1, -1.264 for v1.2.0-RC2. Overall, both candidates show improvement in reproducing QM optimized geometries over v1.1.0.

(2) QM vs MM relative energies between conformers at QM optimized geometries

Distributions of QM vs MM relative energies between conformers at QM optimized geometries for benchmark full set. (a) MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. (b) MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at QM minimum.

Both distribution plots show lower MAD(mean absolute deviation) compared to v1.1.0. For the full set, RC2 gives a slightly lower MAD compared to RC1.

2. Benchmarking with FDA-approved inhibitor set

To check performance of the parameter sets on more pharmaceutically relevant molecules, the same benchmark calculation was carried out with FDA-approved inhibitor set. To see the details of the set, please check here (https://github.com/openforcefield/qca-dataset-submission/tree/master/2019-09-08-fda-optimization-dataset-1)

(1) QM vs MM optimized geometries

scatter plots of (a) residual from v1.2.0-RC1 - residual from v1.1.0 and (b)residual from v1.2.0-RC2 - residual from v1.1.0 for FDA-approved inhibitor set

The result shows a similar trend with the result from benchmark full set. Overall v1.2.0-RCs performed better in reproducing QM optimized geometries over v1.1.0.

(2) QM vs MM relative energies between conformers at QM optimized geometries

Distributions of QM vs MM relative energies between conformers at QM optimized geometries for FDA-approved inhibitor set. (a) MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. (b) MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at QM minimum.

Both distribution plots show lower MAD(mean absolute deviation) compared to v1.1.0. For the FDA set, RC1 gives a slightly lower MAD compared to RC2.

Conclusion

Overall, both candidates, v1.2.0-RC1 and v1.2.0-RC2, show improvement in reproducing QM optimized geometries over v1.1.0, fixing some optimization issues that v1.1.0 have with certain functional groups(phosphonates, N-N, sulfamates).
There are some slight differences in parameter values (some angle terms) and in performance for the test sets (benchmark full set, FDA-approved inhibitor set) between the two candidates.
For benchmark full set, v1.2.0-RC1 gives a slightly better performance in reproducing QM optimized geometries than RC2, while RC2 gives lower MAD in reproducing QM energetics.
For FDA-approved inhibitor set, RC1 performs slightly better in both reproducing QM optimized geometries and QM relative energies between conformers.
Overall, RC1 performs better than RC2 in reproducing QM optimized geometries.

v1.2.0 Summary ( for release note)

link to the release note: https://github.com/openforcefield/openforcefield-forcebalance/releases/tag/v1.2.0

Description

We are pleased to announce the release of Parsley version 1.2.0.

The major change we made in the new release is a new quantum chemical dataset, which is utilized in training valence parameters in the force field.

Parsley 1.2.0 has been fitted to the new, carefully designed quantum chemical dataset and the successive benchmark results showed significantly improved performance in reproducing QM optimized geometries especially for certain phosphonate-containing molecules and for exocyclic divalent and trivalent nitrogen-containing molecules over Parsley 1.1.0 .

Fitting Data and Results

Fitting targets: 2nd generation training sets (link for the details of the training set generation scheme: http://doi.org/10.5281/zenodo.3777278)

Optimized geometries	Hessians	Torsion scans

Optimized geometries	Hessians	Torsion scans
4,745	1189	710

Input force field : SMIRNOFF99Frosst w/ some parameter modification (same initial guess with v1.1.0)
The objective function decreased from 3.619e+04 to 6.877e+03 in 57 steps. After the optimization, additional Removal of redundancy in t108 SMIRKS pattern was done: [#6X3:1]-@[#16X2,#16X1-1,#16X3+1:2]-@[#6X3,#7X2;r5:3]=@[#6,#7;r5:4] → [#6X3:1]-@[#16X2,#16X3+1:2]-@[#6X3,#7X2;r5:3]=@[#6,#7;r5:4]

Note: forcebalance v1.7.1 and openforcefield v0.6.0 were used for the target generation and the optimization.

Benchmark Results

Benchmark data

For the calculation, full benchmark set was used (25168 optimized geometries, plus relative energies of 2005 molecules). Detailed of the molecule selection can be found here: release-1-benchmarking/QM_molecule_selection

(1) Comparison of objective values from single point calculations on benchmark full set

Two types of benchmarks were done to compare the performances: (1) QM vs MM optimized geometries and (2) the relative energies between conformers at “QM optimized geometries”. The final objective function value(X2) from FB single point calculation gives a brief overview of the agreement between QM and MM. The lower X2 is, the better the force field reproduces QM structures and energetics.

(2) QM vs MM optimized geometries

To investigate the improved performance in reproducing QM optimized geometries, we computed and plotted the residual for each molecule. This gives a sum of squared weighted deviations for internal coordinates:

Equation of residual

scatter plot of residual from `v1.2.0` - residual from `v1.1.0`

Each y value in the plot (delta residual) is residual from v1.2.0 - residual from v1.1.0. Negative y value means residual from v1.2.0 is lower than that from v1.1.0, indicating an improved performance of the candidate parameter set over v1.1.0. The mean of residual difference is -1.364, indicating improvement in reproducing QM optimized geometries over v1.1.0.

Especially, molecules having deprotonated phosphonate group(RP(=O)(OH)(O^-)) showed significant improvement in v1.2.0, fixing a known issue with protonated phosphorus-connected oxygens which has plagued AMBER-family force fields.

QM optimized geometry of CC(O)([P@@](=O)(O)[O-])[P@](=O)(O)[O-]. ( transparent orange: MM optimized geometry with v1.1.0 force field, transparent green: v1.2.0 force field)

Here’s one of molecules which has two deprotonated phosphonate groups. In the optimized geometry from the v1.1.0 force field(transparent orange in the figure), the hydroxyl hydrogen is located much closer to the negatively charged oxygen (1.10 Å) than in the QM optimized geometry (> 2.4 Å), forming an overly strong intramolecular hydrogen bond. The v1.2.0 force field corrects this error and gives a closer agreement with QM optimized geometry by having a larger equilibrium angle for phosphorus-centered angle term(a38: [*:1]~[#15:2]~[*:3]).

QM optimized geometry of CC1(O[C@@H]2CO[C@@]3([C@H]([C@@H]2O1)OC(O3)(C)C)COS(=O)(=O)N)C. ( transparent orange in the left figure: MM optimized geometry with v1.1.0 force field, transparent green in the right figure: MM optimized geometry with v1.2.0 force field)

For some conformers of CC1(O[C@@H]2CO[C@@]3([C@H]([C@@H]2O1)OC(O3)(C)C)COS(=O)(=O)N)C, v1.1.0 introduced a weird spatial orientation of covalent bonds to tetravalent sulfur, resulting in a much smaller angle for N-S-O(63.3 degree) than in QM optimized geometry (97.0 degree). This issue is fixed in v1.2.0.

(3) the relative energies between conformers at QM optimized geometries

To investigate the improved performance of the new parameter set in reproducing QM relative energies between conformers, QM vs MM relative energies between conformers “at QM optimized geometries” were calculated. Two different ways to calculate MM relative energies were used. For the left figure, MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. And for the right figure, MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at the MM minimum. Both distribution plots show lower MAD(mean absolute deviation) compared to v1.1.0, indicating a slight, but not notable improvement of the new parameter set in reproducing QM energetics.

Distributions of QM vs MM relative energies between conformers at QM optimized geometries. (a) MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. (b) MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at QM minimum.

Benchmark analysis done by @Victoria Lim (Deactivated)(For further details, please see http://doi.org/10.5281/zenodo.3774680) shows that the v1.1.0 fails to describe (1) molecules having #7X2-#7X3(single bond between divalent nitrogen and trivalent nitrogen); (2) azetidines; and (3) octahydrotetracenes.

We speculated that the poor performances on the chemical moieties are due to the poor chemical coverage of the training set used in Parsley fitting. In the previous release, periodicities for some N-N rotations were modified to properly describe N-N bonds which are under conjugation effect and the change included the modification of components t128. However since there were no exocyclic #7X2-#7X3 rotations in the training set, a proper fitting of t128 couldn’t be carried out, resulting in unphysical optimized k values.

ddE w.r.t. RMSD and ddE w.r.t. TFD of divalent and trivalent nitrogen-containing molecules

Our new training set for this release now includes exocyclic rotations about #7X2-#7X3 and the new parameter set fitted to this new set shows clear improvement in lowering TFD and RMSD which indicates it is improved in reproducing QM optimized geometries while showing slight sacrifice in ddE.

QM optimized geometry of Cc1c(sc(n1)N/N=C/c2ccccn2)C. ( transparent orange: MM optimized geometry with v1.1.0 force field, transparent green: MM optimized geometry with v1.2.0 force field)

One example of molecules with exocyclic #7X2-#7X3. v1.2.0 could successfully reproduce planar structure of Cc1c(sc(n1)N/N=C/c2ccccn2)C.