Sage differences between FB 1.9.3, FB 1.9.5, and 1.9.6
Background
As our software stack updates, our goal is to update from the previous software stack for fitting (ForceBalance 1.9.3 + OpenFF Toolkit 0.10.x) to versions that support Interchange (ForceBalance 1.9.5 + OpenFF 0.13.x, to begin with). This page documents differences between fits using these two stack and diagnoses the causes. Please see Reproducibility for more information.
All files were uploaded to . As this is not intended to be a public work, however, documentation is minimal.
Summary
I attempted to figure out the reasons for all differences between FB 1.9.3, FB 1.9.5, and FB 1.9.6. While I refer to these environments under those names throughout this investigation, they are shorthand for the environments described in the section below, and most differences originate outside ForceBalance. These differences can be summed up below:
The biggest and most influential change was Difference 1: a bug where Interchange’s charge caching was mis-applying charges to subsequent conformers of the same molecule when the atom order changed.
In Difference 2, the geometries optimize to the same or very similar structures, but the objective functions are very different. This suggests that we could improve the ForceBalance target by using the best RMSD measure instead of just RMSD.
In Difference 3, re-ordering the forces in OpenMM systems can result in surprisingly large changes in geometry. From the OpenFF toolkit 0.10.x to Interchange we changed the order in which forces are output.
In Difference 4, differences in precision explain the remaining differences between the FB 1.9.3 and FB 1.9.5 stacks.
In Difference 5, all remaining differences between 1.9.5 and 1.9.6 can be explained by a minor change in tolerance in the ForceBalance geometry optimization.
Investigation
This next section below is not comprehensive in detail, and is partly a working document I used to keep track of what other differences to look at. This is meant to be largely a one-off investigation, although it suggests that having a suite of fitting comparisons for each new software stack may be a good idea. Likely this will be most use in the future in copying Python scripts and looking at individual targets for an idea of what their objective terms used to be.
Details
The inputs used are those available at .
The environments used are specified in the attached input files. For each zipped experiment, the full environment is included.
Environments:
fb-193: fb-193-tk-010-oe-2022 environment
ForceBalance 1.9.3
OpenFF Toolkit 0.10.4
fb-193-reordered: fb-193-tk-010-oe-2022-reordered environment
The Toolkit code was edited to output forces in the same order as Interchange
create_openmm_systems
was edited here:return_topology = kwargs.pop("return_topology", False) # 2023-11-22 edit original = self._parameter_handlers order = [ 'vdW', 'Electrostatics', 'LibraryCharges', 'ToolkitAM1BCC', 'ProperTorsions', 'ImproperTorsions', 'Angles', 'Bonds', 'Constraints' ] self._parameter_handlers = {} for key in order: self._parameter_handlers[key] = original[key] for key in original: if key not in order: self._parameter_handlers[key] = original[key] # end 2023-11-22 # Make a deep copy of the topology so we don't accidentally modify it topology = copy.deepcopy(topology)
fb-195: fb-195-tk-013-oe-2022-interchange-replace-cache environment
Interchange 0.3.10, edited:
to modify the charge caching
to turn off the switching function, as this was a systematic difference between environments ( )
fb-195-switching: basically the fb-195-tk-013-oe-2022-interchange-replace-cache environment
Interchange 0.3.10, edited:
to modify the charge caching
to turn switching back on
fb-196: fb-196-ic-0318-oe-2022-crit environment
ForceBalance 1.9.6
Interchange 0.3.18
fb-196-crit: fb-196-ic-0318-oe-2022-crit environment
ForceBalance 1.9.6, with
crit
modified back to earlier 1e-4 version.Everything else the same as fb-196
ForceBalance 1.9.3 vs 1.9.5
Difference 1: Switching to Interchange with cached atom charges
The vast majority of differences originated from the opt-geo-target fits and were because of this bug:
Running a single-point calculation of the entire target set brought the objective function differences down substantially:
ForceBalance | OpenFF Toolkit | Interchange | Total objective function | Time for calculation |
---|---|---|---|---|
1.9.3 | 0.10.4 | N/A | 10926.15026275 | 1462.2 s |
1.9.5 | 0.13.0 | 0.3.10 + charge caching modification | 10921.48679230 | 920.6 s |
1.9.6 | 0.14.5 | 0.3.18 | 10909.6117273 | 2915.5 s |
1.9.6 (crit switched to 1.9.5 version) | 0.14.5 | 0.3.18 | 10921.48679230 | 963.2 s |
Remaining differences after Difference 1
Investigating individual targets highlighted these differences, still all geometry optimization targets:
Batch | Target name | Term_193 | Term_195 | Diff |
opt-geo-batch-109 | 19095414-4 | 973.288 | 678.565 | -294.723 |
opt-geo-batch-113 | 19095588-8 | 800.755 | 488.803 | -311.952 |
opt-geo-batch-123 | 19095890-3 | 432.754 | 424.808 | -7.946 |
opt-geo-batch-136 | 110312530-8 | 796.962 | 1886.142 | 1089.18 |
opt-geo-batch-19 | 18434757-17 | 1352.305 | 1744.044 | 391.739 |
opt-geo-batch-20 | 18434815-6 | 83.565 | 126.754 | 43.189 |
opt-geo-batch-20 | 18434816-9 | 71.022 | 169.281 | 98.259 |
opt-geo-batch-48 | 18437974-17 | 2097.572 | 442.121 | -1655.451 |
opt-geo-batch-71 | 19094129-20 | 470.715 | 117.418 | -353.297 |
opt-geo-batch-78 | 19094395-8 | 327.768 | 862.083 | 534.315 |
opt-geo-batch-83 | 95602606-12 | 352.848 | 609.256 | 256.408 |
opt-geo-batch-83 | 19094564-15 | 662.573 | 411.628 | -250.945 |
(also attached: , where single_point_all compares all targets, and single_point_differences highlights all differences over 1. These CSVs also include the associated SMILES which were too long to paste here).
Pasted below is the RMSD comparisons of all targets with different objective functions. The first three values columns show atom-to-atom RMSD (via MDAnalysis’s rmsd
); the last three atoms show symmetry-corrected RMSDs (via rdkit's AllChem.GetBestRMS
).
Opt batch | Target | 1.9.3-QM | 1.9.5-QM | 1.9.3-1.9.5 | 1.9.3-QM Best | 1.9.5-QM Best | 1.9.3–1.9.5 Best |
---|---|---|---|---|---|---|---|
48 | 18437974-17 | 0.9 | 0.75 | 0.55 | 0.69 | 0.69 | 0 |
71 | 19094129-20 | 1.31 | 0.91 | 0.79 | 1.26 | 0.91 | 0.7 |
113 | 19095588-8 | 3.02 | 1.85 | 2.25 | 2.98 | 1.83 | 2.18 |
109 | 19095414-4 | 1.64 | 1.12 | 1.19 | 1.64 | 1.11 | 1.18 |
83 | 19094564-15 | 1.69 | 1.69 | 0.21 | 1.68 | 1.69 | 0.21 |
123 | 19095890-3 | 2.26 | 2.26 | 0.27 | 2.26 | 2.26 | 0.27 |
20 | 18434815-6 | 0.96 | 0.99 | 0.26 | 0.96 | 0.99 | 0.26 |
20 | 18434816-9 | 0.96 | 0.97 | 0.26 | 0.96 | 0.97 | 0.26 |
83 | 95602606-12 | 1.56 | 1.55 | 0.22 | 1.56 | 1.55 | 0.22 |
19 | 18434757-17 | 1.73 | 1.78 | 0.31 | 1.67 | 1.68 | 0.3 |
78 | 19094395-8 | 4 | 4 | 0.37 | 4 | 4 | 0 |
136 | 110312530-8 | 1.09 | 1.13 | 0.69 | 1.09 | 1.09 | 0 |
Difference 2: Different interpretations of internal coordinate RMSDs
The yellow rows highlighted apparently optimize to structures with symmetry-corrected RMSDs of 0, but with atom-wise RMSD values that are somewhat high. I did not look into this further as the objectives weren’t systematically better or worse. However, this suggests that the FB target should do its best to look for the best symmetry-corrected RMSD and do its best.
Difference 3: Section ordering
In several cases, re-ordering the output of the XML changed the optimization and objective function substantially. These are highlighted in blue above.
Case study: opt-geo-batch-113: 19095588-8
Wrote intermediate OpenMM XMLs out of the OpenMM system (by modifying ForceBalance to do so)
Running through the ForceBalance target, the OpenMM XML written out by FB 1.9.3 minimizes to the 1.9.3 structure, in the 1.9.5 environment; the XML written by FB 1.9.5/Interchange minimizes to the 1.9.5 structure, in the 1.9.3 environment
In Interchange, the forces are ordered: NBForce, Torsions, Angles, Bonds
In 1.9.3, the forces are ordered: Torsions, NBForce, Bonds, Angles
I can get the 1.9.3 environment to reproduce the 1.9.5 objective by reordering the OpenMM system creation using the Interchange order, i.e. using the fb-193-reordered environment to write forces out like Interchange.
Comparing remaining differences with reordering
I have highlighted in gray the differences between these four systems where the more modern stacks (FB 1.9.5, 1.9.6) align with either fb-193 or fb-193-reordered. As the order of the forces is the only difference between the 193 stacks, to me this implies that these systems are prone to different optimization pathways given the slightest difference, so they weren’t investigated further.
Batch | Target | fb-193 | fb-193-reordered | fb-195 | fb-196 |
---|---|---|---|---|---|
108 | 19095393-10 | 85.713 | 85.713 | 85.713 | 46.043 |
109 | 19095414-4 | 973.288 | 678.567 | 678.565 | 678.593 |
113 | 19095588-8 | 800.755 | 488.803 | 488.803 | 488.806 |
115 | 19095648-2 | 241.176 | 592.293 | 241.176 | 241.209 |
123 | 19095890-3 | 432.754 | 859.172 | 424.808 | 424.811 |
125 | 19095964-29 | 557.615 | 557.896 | 557.836 | 527.51 |
127 | 95602811-22 | 828.404 | 371.009 | 828.402 | 828.471 |
131 | 95602486-3 | 169.287 | 395.607 | 169.287 | 169.288 |
132 | 110312272-7 | 44.947 | 44.947 | 44.947 | 19.051 |
132 | 110312279-12 | 19.782 | 19.782 | 19.782 | 11.507 |
134 | 110312364-2 | 35.291 | 35.291 | 35.291 | 24.732 |
134 | 110312437-22 | 536.681 | 831.782 | 536.684 | 541.105 |
136 | 110312530-8 | 796.962 | 1886.145 | 1886.142 | 1885.838 |
142 | 1760118-25 | 32.291 | 32.291 | 32.291 | 5.104 |
142 | 1760122-29 | 37.563 | 37.565 | 37.563 | 10.582 |
158 | 1760623-13 | 52.588 | 52.588 | 52.588 | 10.515 |
160 | 1760689-13 | 10.233 | 10.233 | 10.233 | 2.921 |
162 | 1760764-24 | 14.742 | 14.742 | 14.742 | 6.441 |
162 | 1760765-26 | 16.341 | 16.341 | 16.341 | 4.872 |
165 | 1760859-25 | 184.378 | 184.378 | 184.378 | 8.331 |
166 | 1760895-28 | 16.898 | 16.898 | 16.898 | 5.847 |
181 | 36784452-26 | 85.997 | 85.997 | 85.997 | 43.28 |
19 | 18434757-17 | 1352.305 | 2589.068 | 1744.044 | 1743.914 |
20 | 18434815-6 | 83.565 | 83.565 | 126.754 | 126.789 |
20 | 18434816-9 | 71.022 | 71.022 | 169.281 | 169.253 |
21 | 18434921-7 | 1805.074 | 1389.509 | 1805.075 | 1805.138 |
28 | 18435721-28 | 333.911 | 333.91 | 333.91 | 320.96 |
36 | 18436551-7 | 20.981 | 20.981 | 20.981 | 9.047 |
48 | 18437974-17 | 2097.572 | 442.1 | 442.121 | 441.091 |
5 | 18433236-22 | 654.976 | 819.474 | 654.976 | 654.977 |
51 | 18438135-12 | 794.575 | 1409.209 | 794.575 | 794.554 |
51 | 18438149-18 | 100.882 | 108.452 | 100.882 | 100.882 |
58 | 18438954-4 | 661.058 | 661.059 | 661.06 | 9.408 |
64 | 18439704-15 | 17.706 | 17.706 | 17.706 | 7.98 |
71 | 19094129-20 | 470.715 | 470.715 | 117.418 | 117.415 |
75 | 19094250-10 | 480.939 | 150.779 | 480.94 | 480.97 |
78 | 19094395-8 | 327.768 | 327.768 | 862.083 | 862.083 |
79 | 19094430-15 | 119.851 | 119.701 | 120.066 | 101.136 |
79 | 19094441-17 | 84.345 | 84.096 | 84.232 | 69.599 |
83 | 19094564-15 | 662.573 | 411.628 | 411.628 | 411.699 |
83 | 95602606-12 | 352.848 | 609.252 | 609.256 | 609.277 |
91 | 19094843-7 | 523.362 | 1019.271 | 523.364 | 523.564 |
96 | 19095011-20 | 1561.274 | 2592.095 | 1561.278 | 1561.667 |
96 | 19095014-23 | 239.718 | 1177.13 | 239.723 | 239.679 |
Determining thresholds for investigation
It is common for structures to optimize to very slightly different geometries, especially due to differences in precision. I found the following trends comparing the 1.9.3 environment with 1.9.5:
Difference in objective term | RMSD difference | Best RMSD difference |
---|---|---|
0.000-0.100 | 8e-08 to 6e-04 | 0e+00 to 6e-04 |
0.100-0.200 | 6e-05 to 8e-04 | 6e-05 to 8e-04 |
0.200-0.300 | 5e-04 to 8e-04 | 5e-04 to 8e-04 |
0.300-0.400 | 4e-04 to 4e-04 | 4e-04 to 4e-04 |
7.500-7.600 | 1e-01 to 1e-01 | 1e-01 to 1e-01 |
ForceBalance 1.9.5 remaining differences
Batch | Target | fb-193 | fb-193-reordered | fb-195 | fb-196 |
---|---|---|---|---|---|
123 | 19095890-3 | 432.754 | 859.172 | 424.808 | 424.811 |
19 | 18434757-17 | 1352.305 | 2589.068 | 1744.044 | 1743.914 |
20 | 18434815-6 | 83.565 | 83.565 | 126.754 | 126.789 |
20 | 18434816-9 | 71.022 | 71.022 | 169.281 | 169.253 |
71 | 19094129-20 | 470.715 | 470.715 | 117.418 | 117.415 |
78 | 19094395-8 | 327.768 | 327.768 | 862.083 | 862.083 |
Difference 4: differences in precision
This difference, as detailed in the case study below, originates from minor errors in precision between the systems. Every difference in the above remaining table is explained by this difference. An example of the differences is on GitHub at . This file compares the case study system below (opt-geo-batch-20, 18434815-6) generated from fb-193-reordered and compared to fb-196. (fb-195 and fb-196 generate identical systems; in the 1.9.6 diff, however, I had modified the switching function manually to be turned off for another experiment).
Case study: opt-geo-batch-20, 18434815-6
The main differences between the 1.9.3 and 1.9.5 systems here are again at the level of precision
Minimizing the 1.9.5 system in the 1.9.3 environments, minimized to the 1.9.5 geometry
Minimizing the 1.9.3 systems in the 1.9.5 environment, minimized to the 1.9.3 geometry
Broad diagnosis:
I copied the ForceBalance opt-geo target to run-through one-by-one, and minimized each system (as output by an edited version of ForceBalance) in each environment.
Scripts:
Python:
Submission scripts:
Outputs:
ForceBalance 1.9.6 remaining differences
Batch | Target | fb-193 | fb-193-reordered | fb-195 | fb-196 | fb-196-crit |
---|---|---|---|---|---|---|
108 | 19095393-10 | 85.713 | 85.713 | 85.713 | 46.043 | 85.713 |
125 | 19095964-29 | 557.615 | 557.896 | 557.836 | 527.51 | 557.836 |
127 | 95602811-22 | 828.404 | 371.009 | 828.402 | 828.471 | 828.402 |
132 | 110312272-7 | 44.947 | 44.947 | 44.947 | 19.051 | 44.947 |
132 | 110312279-12 | 19.782 | 19.782 | 19.782 | 11.507 | 19.782 |
134 | 110312364-2 | 35.291 | 35.291 | 35.291 | 24.732 | 35.291 |
134 | 110312437-22 | 536.681 | 831.782 | 536.684 | 541.105 | 536.684 |
136 | 110312530-8 | 796.962 | 1886.145 | 1886.142 | 1885.838 | 1886.142 |
142 | 1760118-25 | 32.291 | 32.291 | 32.291 | 5.104 | 32.291 |
142 | 1760122-29 | 37.563 | 37.565 | 37.563 | 10.582 | 37.563 |
158 | 1760623-13 | 52.588 | 52.588 | 52.588 | 10.515 | 52.588 |
160 | 1760689-13 | 10.233 | 10.233 | 10.233 | 2.921 | 10.233 |
162 | 1760764-24 | 14.742 | 14.742 | 14.742 | 6.441 | 14.742 |
162 | 1760765-26 | 16.341 | 16.341 | 16.341 | 4.872 | 16.341 |
165 | 1760859-25 | 184.378 | 184.378 | 184.378 | 8.331 | 184.378 |
166 | 1760895-28 | 16.898 | 16.898 | 16.898 | 5.847 | 16.898 |
181 | 36784452-26 | 85.997 | 85.997 | 85.997 | 43.28 | 85.997 |
19 | 18434757-17 | 1352.305 | 2589.068 | 1744.044 | 1743.914 | 1744.044 |
20 | 18434815-6 | 83.565 | 83.565 | 126.754 | 126.789 | 126.754 |
20 | 18434816-9 | 71.022 | 71.022 | 169.281 | 169.253 | 169.281 |
21 | 18434921-7 | 1805.074 | 1389.509 | 1805.075 | 1805.138 | 1805.075 |
28 | 18435721-28 | 333.911 | 333.91 | 333.91 | 320.96 | 333.91 |
36 | 18436551-7 | 20.981 | 20.981 | 20.981 | 9.047 | 20.981 |
48 | 18437974-17 | 2097.572 | 442.1 | 442.121 | 441.091 | 442.121 |
5 | 18433236-22 | 654.976 | 819.474 | 654.976 | 654.977 | 654.976 |
51 | 18438135-12 | 794.575 | 1409.209 | 794.575 | 794.554 | 794.575 |
58 | 18438954-4 | 661.058 | 661.059 | 661.06 | 9.408 | 661.06 |
64 | 18439704-15 | 17.706 | 17.706 | 17.706 | 7.98 | 17.706 |
75 | 19094250-10 | 480.939 | 150.779 | 480.94 | 480.97 | 480.94 |
79 | 19094430-15 | 119.851 | 119.701 | 120.066 | 101.136 | 120.066 |
79 | 19094441-17 | 84.345 | 84.096 | 84.232 | 69.599 | 84.232 |
83 | 19094564-15 | 662.573 | 411.628 | 411.628 | 411.699 | 411.628 |
83 | 95602606-12 | 352.848 | 609.252 | 609.256 | 609.277 | 609.256 |
91 | 19094843-7 | 523.362 | 1019.271 | 523.364 | 523.564 | 523.364 |
96 | 19095011-20 | 1561.274 | 2592.095 | 1561.278 | 1561.667 | 1561.278 |
96 | 19095014-23 | 239.718 | 1177.13 | 239.723 | 239.679 | 239.723 |
Difference 5: optimization energy tolerance
In ForceBalance 1.9.6, the energy tolerance in geometry optimization was tweaked from a default of 1e-4 to 1e-2 :
# The unit of force is small, so using the same criterion as for energy minimization can lead to convergence failure.
if 'force' in LocalEnergyMinimizer.minimize.__doc__:
crit = max(crit, 1e-2)
else:
crit = max(crit, 1e-8)
Reverting this change (in fb-196-crit) gave back all the 1.9.5 results.
Case study: opt-geo-batch-58, 18438954-4
Optimizing the 1.9.5 system in 1.9.6 (with 1.9.5 energy tolerance) re-optimized to the 1.9.5 geometry
Optimizing the 1.9.5 system in 1.9.6 (with 1.9.6 energy tolerance) re-optimized to the 1.9.6 geometry
Optimizing the 1.9.6 system in 1.9.5 (with 1.9.5 energy tolerance) re-optimized to the 1.9.5 geometry
Optimizing the 1.9.6 system in 1.9.5 (with 1.9.6 energy tolerance) re-optimized to the 1.9.6 geometry
This change covered all cases in the 1.9.6 table above.