Geometry optimization settings for benchmarks
A crucial first step of forcefield benchmarking is optimizing the geometry of a given conformer with the force field, starting with a conformer structure that has already been optimized with QM. This is to compare not only how well the MM-optimized structure agrees with its QM-optimized parent, but also to ensure the structure is relaxed before assessing the relative energies. Ideally, we would like to do a local minimization, as there are usually many other minima on the PES that correspond to other conformers, though it can be challenging to ensure a minimizer is not finding a more global minimum.
While working on benchmarking energy ordering of conformers, it became clear that the geometry optimization settings used for our local minimizations during benchmarking can have a dramatic effect on the structures (and energies) produced. In attempting to reproduce the structures generated during the industry benchmarking project https://openforcefield.atlassian.net/wiki/x/EQBOLQ, I found that using OpenMM’s optimizer produced structures that were much closer to the QM parent than were obtained during that project, though the energies were much lower for the structures produced by the industry benchmarking project, suggesting OpenMM was finding more local minima.
Subset of 500 “problem” molecules
First, I investigated this problem by selecting a subset of 500 conformers that did not result in the same structure using Sage 2.0.0 + AmberTools AM1BCC charges + OpenMM’s default geometry optimizer (L-BFGS algorithm, Cartesian coordinates, convergence criterion is RMS gradient < 10 kJ/mol/nm) and using Sage 2.0.0 + AmberTools AM1BCC charges + GeomeTRIC (Levenberg-Marquardt algorithm, delocalized internal coordinates, and Gaussian’s convergence criteria detailed here, including RMS gradient < 14.74 kJ/mol/nm as well as criteria for energy, max gradient component, and displacement of atoms).
Between OpenMM and GeomeTRIC, there are several notable differences: the optimization algorithm, the coordinate system, and the convergence criteria. Neither package has the option to change the algorithm, so this was not explored. First, I switched GeomeTRIC to Cartesian coordinates, as delocalized coordinates are believed to more easily locate global minima. I also modified GeomeTRIC’s convergence criteria, removing the energy, max gradient component, and displacement criteria but leaving the default RMS gradient value. The logic behind this choice was that the GeomeTRIC default criteria are too strict to locate a local minimum, as not all criteria may be satisfied, leading the optimizer to continue until a more global solution is found. Finally, I tried a combination of GeomeTRIC with both Cartesian coordinates and using only the RMS gradient as the convergence criterion. I conducted these experiments on 500 conformers, but only 368 of the conformers converged with all methods. The results of this experiment are below.
Method | Number of structures that match QM | Number of structures that disagree between the method and the Industry Benchmarking Project | Number of conformers where the Industry Benchmarking Project result is closer to QM structure | Number of conformers where the new method result is closer to the QM structure | Number of conformers where the Industry Benchmarking Project result is lower energy | Number of conformers where the new method result is lower energy |
---|---|---|---|---|---|---|
OpenMM defaults | 324 | 345 | 0 | 345 | 329 | 16 |
GeomeTRIC with Cartesian coordinates | 266 | 266 | 0 | 285 | 228 | 57 |
GeomeTRIC with RMS gradient convergence | 226 | 212 | 0 | 212 | 172 | 40 |
GeomeTRIC with Cartesian coordinates and RMS gradient convergence | 351 | 344 | 0 | 344 | 291 | 53 |
GeomeTRIC with Cartesian coordinates and using only the RMS gradient as a convergence criterion produced the best result, however OpenMM integrates more easily into our existing workflow and provides results that are nearly as good. After this study, we decided to use OpenMM with its default convergence settings.
Comparison between OpenMM defaults and Internal Benchmarking scripts
For our internal benchmarking scripts we do not use GeomeTRIC, we use OpenMM with a convergence criterion of RMS gradient < 5e-9 kJ/mol/nm and a max number of steps of 1500. These settings were initially selected due to an error in the OpenMM documentation, which stated that the tolerance
parameter for minimizeEnergy
is an energy cutoff in units of kJ/mol, when in reality it is a cutoff in the RMS gradient in units of kJ/mol/nm, making 5e-9 extremely tight. As a result, I compared the OpenMM default optimization settings, previously identified as working well on a subset of the dataset, to our existing workflow using the much tighter convergence. The results can be seen in the table below. It was not clear to me whether the unconstrained or constrained version of Sage 2.0.0 was used in the industry benchmarking project, so for the OpenMM default settings, both are presented. These results are for about 9600 molecules with 73000 conformers.
Method | Total number of structural matches | Molecules with >50% of conformers matching | Molecules with 100% of conformers matching |
---|---|---|---|
Industry benchmarking project (GeomeTRIC defaults) | 36692 | 5495 | 2328 |
OpenMM, 5e-9 tolerance | 45393 | 6398 | 3229 |
OpenMM, 10 tolerance, unconstrained | 55800 | 8418 | 4256 |
OpenMM, 10 tolerance, constrained | 53510 | 8128 | 3867 |
Based on this analysis, it seems that OpenMM’s default convergence performs much better than OpenMM with a very tight convergence criterion.
We further investigated this to ensure that the looser convergence criterion was genuinely leading to better local minima, rather than just cutting off the optimization early. Looking at about 5 example case studies revealed that OpenMM with the default settings was converging to a local minimum, and that the tighter convergence criterion typically locates a more global minimum, though it usually does not converge and just ends after 1500 iterations. Results from one representative case study are below:
tolerance=5e-9, maxIter=1500
. The molecule shown is molID = 1, conformer 6 using the YAMMBs benchmarking code and industry dataset.tolerance=5e-9, maxIter=1500
. The default settings lead to a local minimum structure that agrees well with QM, whereas the tight convergence criterion leads the optimizer to find a more global solution.Based on this analysis, we have decided to stick with OpenMM’s default convergence setting of 10 kJ/mol/nm in the RMS gradient for future benchmarking endeavors.