Sage differences between FB 1.9.3, FB 1.9.5, and 1.9.6

1 Background
2 Summary
3 Investigation
- 3.1 Details
- 3.2 Environments:
4 ForceBalance 1.9.3 vs 1.9.5
5 Comparing remaining differences with reordering
- 5.1 Determining thresholds for investigation
6 ForceBalance 1.9.5 remaining differences
- 6.1 Difference 4: differences in precision
  - 6.1.1 Case study: opt-geo-batch-20, 18434815-6
7 ForceBalance 1.9.6 remaining differences
- 7.1 Difference 5: optimization energy tolerance
  - 7.1.1 Case study: opt-geo-batch-58, 18438954-4

Background

As our software stack updates, our goal is to update from the previous software stack for fitting (ForceBalance 1.9.3 + OpenFF Toolkit 0.10.x) to versions that support Interchange (ForceBalance 1.9.5 + OpenFF 0.13.x, to begin with). This page documents differences between fits using these two stack and diagnoses the causes. Please see Reproducibility for more information.

All files were uploaded to . As this is not intended to be a public work, however, documentation is minimal.

Summary

I attempted to figure out the reasons for all differences between FB 1.9.3, FB 1.9.5, and FB 1.9.6. While I refer to these environments under those names throughout this investigation, they are shorthand for the environments described in the section below, and most differences originate outside ForceBalance. These differences can be summed up below:

The biggest and most influential change was Difference 1: a bug where Interchange’s charge caching was mis-applying charges to subsequent conformers of the same molecule when the atom order changed.
In Difference 2, the geometries optimize to the same or very similar structures, but the objective functions are very different. This suggests that we could improve the ForceBalance target by using the best RMSD measure instead of just RMSD.
In Difference 3, re-ordering the forces in OpenMM systems can result in surprisingly large changes in geometry. From the OpenFF toolkit 0.10.x to Interchange we changed the order in which forces are output.
In Difference 4, differences in precision explain the remaining differences between the FB 1.9.3 and FB 1.9.5 stacks.
In Difference 5, all remaining differences between 1.9.5 and 1.9.6 can be explained by a minor change in tolerance in the ForceBalance geometry optimization.

Investigation

This next section below is not comprehensive in detail, and is partly a working document I used to keep track of what other differences to look at. This is meant to be largely a one-off investigation, although it suggests that having a suite of fitting comparisons for each new software stack may be a good idea. Likely this will be most use in the future in copying Python scripts and looking at individual targets for an idea of what their objective terms used to be.

Details

The inputs used are those available at .

The environments used are specified in the attached input files. For each zipped experiment, the full environment is included.

Environments:

fb-193: fb-193-tk-010-oe-2022 environment
- ForceBalance 1.9.3
- OpenFF Toolkit 0.10.4
fb-193-reordered: fb-193-tk-010-oe-2022-reordered environment
- The Toolkit code was edited to output forces in the same order as Interchange
- create_openmm_systems was edited here:
- return_topology = kwargs.pop("return_topology", False) # 2023-11-22 edit original = self._parameter_handlers order = [ 'vdW', 'Electrostatics', 'LibraryCharges', 'ToolkitAM1BCC', 'ProperTorsions', 'ImproperTorsions', 'Angles', 'Bonds', 'Constraints' ] self._parameter_handlers = {} for key in order: self._parameter_handlers[key] = original[key] for key in original: if key not in order: self._parameter_handlers[key] = original[key] # end 2023-11-22 # Make a deep copy of the topology so we don't accidentally modify it topology = copy.deepcopy(topology)
fb-195: fb-195-tk-013-oe-2022-interchange-replace-cache environment
- Interchange 0.3.10, edited:
  - to modify the charge caching
  - to turn off the switching function, as this was a systematic difference between environments ( )
fb-195-switching: basically the fb-195-tk-013-oe-2022-interchange-replace-cache environment
- Interchange 0.3.10, edited:
  - to modify the charge caching
  - to turn switching back on
fb-196: fb-196-ic-0318-oe-2022-crit environment
- ForceBalance 1.9.6
- Interchange 0.3.18
fb-196-crit: fb-196-ic-0318-oe-2022-crit environment
- ForceBalance 1.9.6, with crit modified back to earlier 1e-4 version.
- Everything else the same as fb-196

ForceBalance 1.9.3 vs 1.9.5

Difference 1: Switching to Interchange with cached atom charges

The vast majority of differences originated from the opt-geo-target fits and were because of this bug:

Running a single-point calculation of the entire target set brought the objective function differences down substantially:

ForceBalance	OpenFF Toolkit	Interchange	Total objective function	Time for calculation

ForceBalance	OpenFF Toolkit	Interchange	Total objective function	Time for calculation
1.9.3	0.10.4	N/A	10926.15026275	1462.2 s
1.9.5	0.13.0	0.3.10 + charge caching modification	10921.48679230	920.6 s
1.9.6	0.14.5	0.3.18	10909.6117273	2915.5 s
1.9.6 (crit switched to 1.9.5 version)	0.14.5	0.3.18	10921.48679230	963.2 s

Remaining differences after Difference 1

Investigating individual targets highlighted these differences, still all geometry optimization targets:

Batch	Target name	Term_193	Term_195	Diff
opt-geo-batch-109	19095414-4	973.288	678.565	-294.723
opt-geo-batch-113	19095588-8	800.755	488.803	-311.952
opt-geo-batch-123	19095890-3	432.754	424.808	-7.946
opt-geo-batch-136	110312530-8	796.962	1886.142	1089.18
opt-geo-batch-19	18434757-17	1352.305	1744.044	391.739
opt-geo-batch-20	18434815-6	83.565	126.754	43.189
opt-geo-batch-20	18434816-9	71.022	169.281	98.259
opt-geo-batch-48	18437974-17	2097.572	442.121	-1655.451
opt-geo-batch-71	19094129-20	470.715	117.418	-353.297
opt-geo-batch-78	19094395-8	327.768	862.083	534.315
opt-geo-batch-83	95602606-12	352.848	609.256	256.408
opt-geo-batch-83	19094564-15	662.573	411.628	-250.945

(also attached: , where single_point_all compares all targets, and single_point_differences highlights all differences over 1. These CSVs also include the associated SMILES which were too long to paste here).

Pasted below is the RMSD comparisons of all targets with different objective functions. The first three values columns show atom-to-atom RMSD (via MDAnalysis’s rmsd); the last three atoms show symmetry-corrected RMSDs (via rdkit's AllChem.GetBestRMS).

Opt batch	Target	1.9.3-QM	1.9.5-QM	1.9.3-1.9.5	1.9.3-QM Best	1.9.5-QM Best	1.9.3–1.9.5 Best

Opt batch	Target	1.9.3-QM	1.9.5-QM	1.9.3-1.9.5	1.9.3-QM Best	1.9.5-QM Best	1.9.3–1.9.5 Best
48	18437974-17	0.9	0.75	0.55	0.69	0.69	0
71	19094129-20	1.31	0.91	0.79	1.26	0.91	0.7
113	19095588-8	3.02	1.85	2.25	2.98	1.83	2.18
109	19095414-4	1.64	1.12	1.19	1.64	1.11	1.18
83	19094564-15	1.69	1.69	0.21	1.68	1.69	0.21
123	19095890-3	2.26	2.26	0.27	2.26	2.26	0.27
20	18434815-6	0.96	0.99	0.26	0.96	0.99	0.26
20	18434816-9	0.96	0.97	0.26	0.96	0.97	0.26
83	95602606-12	1.56	1.55	0.22	1.56	1.55	0.22
19	18434757-17	1.73	1.78	0.31	1.67	1.68	0.3
78	19094395-8	4	4	0.37	4	4	0
136	110312530-8	1.09	1.13	0.69	1.09	1.09	0

Difference 2: Different interpretations of internal coordinate RMSDs

The yellow rows highlighted apparently optimize to structures with symmetry-corrected RMSDs of 0, but with atom-wise RMSD values that are somewhat high. I did not look into this further as the objectives weren’t systematically better or worse. However, this suggests that the FB target should do its best to look for the best symmetry-corrected RMSD and do its best.

Difference 3: Section ordering

In several cases, re-ordering the output of the XML changed the optimization and objective function substantially. These are highlighted in blue above.

Case study: opt-geo-batch-113: 19095588-8

Wrote intermediate OpenMM XMLs out of the OpenMM system (by modifying ForceBalance to do so)
Running through the ForceBalance target, the OpenMM XML written out by FB 1.9.3 minimizes to the 1.9.3 structure, in the 1.9.5 environment; the XML written by FB 1.9.5/Interchange minimizes to the 1.9.5 structure, in the 1.9.3 environment
In Interchange, the forces are ordered: NBForce, Torsions, Angles, Bonds
In 1.9.3, the forces are ordered: Torsions, NBForce, Bonds, Angles
I can get the 1.9.3 environment to reproduce the 1.9.5 objective by reordering the OpenMM system creation using the Interchange order, i.e. using the fb-193-reordered environment to write forces out like Interchange.

Comparing remaining differences with reordering

I have highlighted in gray the differences between these four systems where the more modern stacks (FB 1.9.5, 1.9.6) align with either fb-193 or fb-193-reordered. As the order of the forces is the only difference between the 193 stacks, to me this implies that these systems are prone to different optimization pathways given the slightest difference, so they weren’t investigated further.

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196
108	19095393-10	85.713	85.713	85.713	46.043
109	19095414-4	973.288	678.567	678.565	678.593
113	19095588-8	800.755	488.803	488.803	488.806
115	19095648-2	241.176	592.293	241.176	241.209
123	19095890-3	432.754	859.172	424.808	424.811
125	19095964-29	557.615	557.896	557.836	527.51
127	95602811-22	828.404	371.009	828.402	828.471
131	95602486-3	169.287	395.607	169.287	169.288
132	110312272-7	44.947	44.947	44.947	19.051
132	110312279-12	19.782	19.782	19.782	11.507
134	110312364-2	35.291	35.291	35.291	24.732
134	110312437-22	536.681	831.782	536.684	541.105
136	110312530-8	796.962	1886.145	1886.142	1885.838
142	1760118-25	32.291	32.291	32.291	5.104
142	1760122-29	37.563	37.565	37.563	10.582
158	1760623-13	52.588	52.588	52.588	10.515
160	1760689-13	10.233	10.233	10.233	2.921
162	1760764-24	14.742	14.742	14.742	6.441
162	1760765-26	16.341	16.341	16.341	4.872
165	1760859-25	184.378	184.378	184.378	8.331
166	1760895-28	16.898	16.898	16.898	5.847
181	36784452-26	85.997	85.997	85.997	43.28
19	18434757-17	1352.305	2589.068	1744.044	1743.914
20	18434815-6	83.565	83.565	126.754	126.789
20	18434816-9	71.022	71.022	169.281	169.253
21	18434921-7	1805.074	1389.509	1805.075	1805.138
28	18435721-28	333.911	333.91	333.91	320.96
36	18436551-7	20.981	20.981	20.981	9.047
48	18437974-17	2097.572	442.1	442.121	441.091
5	18433236-22	654.976	819.474	654.976	654.977
51	18438135-12	794.575	1409.209	794.575	794.554
51	18438149-18	100.882	108.452	100.882	100.882
58	18438954-4	661.058	661.059	661.06	9.408
64	18439704-15	17.706	17.706	17.706	7.98
71	19094129-20	470.715	470.715	117.418	117.415
75	19094250-10	480.939	150.779	480.94	480.97
78	19094395-8	327.768	327.768	862.083	862.083
79	19094430-15	119.851	119.701	120.066	101.136
79	19094441-17	84.345	84.096	84.232	69.599
83	19094564-15	662.573	411.628	411.628	411.699
83	95602606-12	352.848	609.252	609.256	609.277
91	19094843-7	523.362	1019.271	523.364	523.564
96	19095011-20	1561.274	2592.095	1561.278	1561.667
96	19095014-23	239.718	1177.13	239.723	239.679

Determining thresholds for investigation

It is common for structures to optimize to very slightly different geometries, especially due to differences in precision. I found the following trends comparing the 1.9.3 environment with 1.9.5:

Difference in objective term	RMSD difference	Best RMSD difference

Difference in objective term	RMSD difference	Best RMSD difference
0.000-0.100	8e-08 to 6e-04	0e+00 to 6e-04
0.100-0.200	6e-05 to 8e-04	6e-05 to 8e-04
0.200-0.300	5e-04 to 8e-04	5e-04 to 8e-04
0.300-0.400	4e-04 to 4e-04	4e-04 to 4e-04
7.500-7.600	1e-01 to 1e-01	1e-01 to 1e-01

ForceBalance 1.9.5 remaining differences

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196
123	19095890-3	432.754	859.172	424.808	424.811
19	18434757-17	1352.305	2589.068	1744.044	1743.914
20	18434815-6	83.565	83.565	126.754	126.789
20	18434816-9	71.022	71.022	169.281	169.253
71	19094129-20	470.715	470.715	117.418	117.415
78	19094395-8	327.768	327.768	862.083	862.083

Difference 4: differences in precision

This difference, as detailed in the case study below, originates from minor errors in precision between the systems. Every difference in the above remaining table is explained by this difference. An example of the differences is on GitHub at . This file compares the case study system below (opt-geo-batch-20, 18434815-6) generated from fb-193-reordered and compared to fb-196. (fb-195 and fb-196 generate identical systems; in the 1.9.6 diff, however, I had modified the switching function manually to be turned off for another experiment).

Case study: opt-geo-batch-20, 18434815-6

The main differences between the 1.9.3 and 1.9.5 systems here are again at the level of precision
Minimizing the 1.9.5 system in the 1.9.3 environments, minimized to the 1.9.5 geometry
Minimizing the 1.9.3 systems in the 1.9.5 environment, minimized to the 1.9.3 geometry

Broad diagnosis:

I copied the ForceBalance opt-geo target to run-through one-by-one, and minimized each system (as output by an edited version of ForceBalance) in each environment.

Scripts:

Python:
Submission scripts:
Outputs:

ForceBalance 1.9.6 remaining differences

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196	fb-196-crit

Batch	Target	fb-193	fb-193-reordered	fb-195	fb-196	fb-196-crit
108	19095393-10	85.713	85.713	85.713	46.043	85.713
125	19095964-29	557.615	557.896	557.836	527.51	557.836
127	95602811-22	828.404	371.009	828.402	828.471	828.402
132	110312272-7	44.947	44.947	44.947	19.051	44.947
132	110312279-12	19.782	19.782	19.782	11.507	19.782
134	110312364-2	35.291	35.291	35.291	24.732	35.291
134	110312437-22	536.681	831.782	536.684	541.105	536.684
136	110312530-8	796.962	1886.145	1886.142	1885.838	1886.142
142	1760118-25	32.291	32.291	32.291	5.104	32.291
142	1760122-29	37.563	37.565	37.563	10.582	37.563
158	1760623-13	52.588	52.588	52.588	10.515	52.588
160	1760689-13	10.233	10.233	10.233	2.921	10.233
162	1760764-24	14.742	14.742	14.742	6.441	14.742
162	1760765-26	16.341	16.341	16.341	4.872	16.341
165	1760859-25	184.378	184.378	184.378	8.331	184.378
166	1760895-28	16.898	16.898	16.898	5.847	16.898
181	36784452-26	85.997	85.997	85.997	43.28	85.997
19	18434757-17	1352.305	2589.068	1744.044	1743.914	1744.044
20	18434815-6	83.565	83.565	126.754	126.789	126.754
20	18434816-9	71.022	71.022	169.281	169.253	169.281
21	18434921-7	1805.074	1389.509	1805.075	1805.138	1805.075
28	18435721-28	333.911	333.91	333.91	320.96	333.91
36	18436551-7	20.981	20.981	20.981	9.047	20.981
48	18437974-17	2097.572	442.1	442.121	441.091	442.121
5	18433236-22	654.976	819.474	654.976	654.977	654.976
51	18438135-12	794.575	1409.209	794.575	794.554	794.575
58	18438954-4	661.058	661.059	661.06	9.408	661.06
64	18439704-15	17.706	17.706	17.706	7.98	17.706
75	19094250-10	480.939	150.779	480.94	480.97	480.94
79	19094430-15	119.851	119.701	120.066	101.136	120.066
79	19094441-17	84.345	84.096	84.232	69.599	84.232
83	19094564-15	662.573	411.628	411.628	411.699	411.628
83	95602606-12	352.848	609.252	609.256	609.277	609.256
91	19094843-7	523.362	1019.271	523.364	523.564	523.364
96	19095011-20	1561.274	2592.095	1561.278	1561.667	1561.278
96	19095014-23	239.718	1177.13	239.723	239.679	239.723

Difference 5: optimization energy tolerance

In ForceBalance 1.9.6, the energy tolerance in geometry optimization was tweaked from a default of 1e-4 to 1e-2 :

        # The unit of force is small, so using the same criterion as for energy minimization can lead to convergence failure.
        if 'force' in LocalEnergyMinimizer.minimize.__doc__:
            crit = max(crit, 1e-2)
        else:
            crit = max(crit, 1e-8)

Reverting this change (in fb-196-crit) gave back all the 1.9.5 results.

Case study: opt-geo-batch-58, 18438954-4

Optimizing the 1.9.5 system in 1.9.6 (with 1.9.5 energy tolerance) re-optimized to the 1.9.5 geometry
Optimizing the 1.9.5 system in 1.9.6 (with 1.9.6 energy tolerance) re-optimized to the 1.9.6 geometry
Optimizing the 1.9.6 system in 1.9.5 (with 1.9.5 energy tolerance) re-optimized to the 1.9.5 geometry
Optimizing the 1.9.6 system in 1.9.5 (with 1.9.6 energy tolerance) re-optimized to the 1.9.6 geometry
This change covered all cases in the 1.9.6 table above.