Benchmarking experimental fits

Background

This page compares the experimental fits (with interpolated parameters) that start from openff-1.3.0 and are trained on a dataset of 500+ molecules from the Gen1 and Gen2 datasets, along with substituted phenyls and a few others (listed here). The training data exclude:

the benchmark set of molecules (link),
sterics with an LJ energy greater than 3.4 kcal/mol,
in-ring torsions, which are in the parameters 1a, 1b

Fit4: Parameters optimized

['TIG0', 'TIG1a', 'TIG1b'] - General torsions
['TIG1c', 'TIG1d', 'TIG2', 'TIG3', 'TIG4', 'TIG5a', 'TIG5b', 'TIG6', 'TIG7', 'TIG8'] - interpolated
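
As a rough illustration of what "interpolated" means here: in the SMIRNOFF format, an interpolated torsion stores force constants at integer bond orders and scales the barrier height linearly with the fractional bond order of the central bond. The sketch below assumes two anchor points (bond orders 1 and 2). The function name and the numeric values are hypothetical and are not taken from fit4.

```python
def interpolate_k(k_bondorder1, k_bondorder2, bond_order):
    """Linearly interpolate (or extrapolate) a torsion force constant.

    k_bondorder1 and k_bondorder2 are the force constants assigned at
    bond orders 1 and 2; bond_order is the fractional bond order of the
    central bond of the torsion.
    """
    slope = k_bondorder2 - k_bondorder1  # change in k per unit of bond order
    return k_bondorder1 + slope * (bond_order - 1.0)


# Hypothetical anchor values in kcal/mol, chosen only for illustration.
k_single, k_double = 1.0, 9.0
for wbo in (1.0, 1.25, 1.5, 2.0):
    k = interpolate_k(k_single, k_double, wbo)
    print(f"bond order {wbo:.2f} -> k = {k:.2f} kcal/mol")
```

A non-interpolated (general) torsion, by contrast, uses a single force constant regardless of the bond order of the central bond.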

Fit4.1: For each interpolated parameter, a general torsion parameter is created in which the central bond can be a single, aromatic, or double bond. Because there is not enough training data, only a subset of these is trained; the optimized parameters are listed here.

Parameters optimized

['TIG0', 'TIG1a', 'TIG1b',
'TIG3p', 'TIG3r', 'TIG4p', 'TIG5ap', 'TIG5bp', 'TIG1cp', 'TIG6p', 'TIG7p', 'TIG8p', 'TIG2p', 'TIG2r', 'TIG1dp'] - General torsions

So fit4.1 has two more parameters than fit4, and all of them are general torsion parameters. The total numbers of torsion parameters in [openff-1.3.0, fit4, fit4.1] are [167, 170, 172], respectively. The parameters that are replaced with interpolated parameters, based on SMARTS patterns, are [t43, t44, t45, t48, t69, t69a, t70d, t76, t77, t78].
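
The parameter counts above can be cross-checked with a few lines of arithmetic. This sketch assumes that the ten listed t-parameters are removed from openff-1.3.0 and that every TIG parameter in the lists above is added as a new entry; the helper name is hypothetical.

```python
N_TORSIONS_BASE = 167  # torsion parameters in openff-1.3.0

# Parameters of openff-1.3.0 that are replaced.
replaced = ["t43", "t44", "t45", "t48", "t69", "t69a", "t70d", "t76", "t77", "t78"]

# Parameters optimized in each fit, copied from the lists above.
fit4 = ["TIG0", "TIG1a", "TIG1b",
        "TIG1c", "TIG1d", "TIG2", "TIG3", "TIG4",
        "TIG5a", "TIG5b", "TIG6", "TIG7", "TIG8"]
fit4_1 = ["TIG0", "TIG1a", "TIG1b",
          "TIG3p", "TIG3r", "TIG4p", "TIG5ap", "TIG5bp", "TIG1cp",
          "TIG6p", "TIG7p", "TIG8p", "TIG2p", "TIG2r", "TIG1dp"]


def total_after_swap(n_base, removed, added):
    """Count parameters after removing some and adding others (assumed bookkeeping)."""
    return n_base - len(removed) + len(added)


print("fit4:  ", total_after_swap(N_TORSIONS_BASE, replaced, fit4))
print("fit4.1:", total_after_swap(N_TORSIONS_BASE, replaced, fit4_1))
```

Under this assumption the totals come out to 170 and 172, which matches the numbers quoted above.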

On the training set the objective function values are:

Force field	Obj. function value
fit4 (interpolated)	881.18
fit4.1 (non-interpolated)	899.60
openff-1.3.0 (iteration 0)	1292.62
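
To put these values on a common scale, the reduction relative to the openff-1.3.0 starting point can be computed as below. This is only a convenience calculation on the three numbers in the table; the function name is hypothetical.

```python
# Objective function values on the training set, from the table above.
objective = {
    "fit4 (interpolated)": 881.18,
    "fit4.1 (non-interpolated)": 899.60,
    "openff-1.3.0 (iteration 0)": 1292.62,
}

baseline = objective["openff-1.3.0 (iteration 0)"]


def percent_reduction(value, reference):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


for name, value in objective.items():
    print(f"{name:28s} {value:8.2f}  {percent_reduction(value, baseline):5.1f}% lower")
```

Both fits lower the training objective by roughly 30% relative to iteration 0, and the two fits differ from each other by about 18 units.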

Performance on Benchmark set of molecules

Taking these optimized force fields (only the newly introduced parameters were optimized; all other torsion parameter values remain the same as in 1.3.0), performance on the benchmark set of molecules is evaluated using Lim and Hahn's set of scripts. The resulting plots are added below, and the key takeaways are:

1.3.0 is better in TFD than 1.2.0, fit4, and fit4.1
1.2.0 is slightly better in ddE; 1.3.0, fit4, and fit4.1 almost coincide
1.2.0 is better in RMSD, followed by 1.3.0; fits 4 and 4.1 are close to each other, with no significant difference
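
For readers unfamiliar with the ddE metric, here is a minimal sketch of one common definition: the MM relative conformer energy minus the QM relative conformer energy, with both measured against the conformer that is lowest in QM energy. This definition and the function names are assumptions for illustration; the benchmark scripts may differ in details such as the choice of reference conformer. The energies are made-up numbers.

```python
def relative_energies(energies, ref_index):
    """Shift energies so that the reference conformer sits at zero."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def dde(qm_energies, mm_energies):
    """Per-conformer ddE = MM relative energy - QM relative energy.

    The reference conformer is assumed to be the QM minimum.
    """
    ref_index = min(range(len(qm_energies)), key=qm_energies.__getitem__)
    qm_rel = relative_energies(qm_energies, ref_index)
    mm_rel = relative_energies(mm_energies, ref_index)
    return [mm - qm for mm, qm in zip(mm_rel, qm_rel)]


# Made-up energies (kcal/mol) for three conformers of one molecule.
qm = [0.0, 1.2, 3.5]
mm = [10.0, 11.0, 14.5]
for i, value in enumerate(dde(qm, mm)):
    print(f"conformer {i}: ddE = {value:+.2f} kcal/mol")
```

A ddE close to zero means the force field reproduces the QM energy ordering for that conformer; TFD and RMSD are the corresponding geometric measures.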

For clarity, more plots with a subset of the data are added below: