Fitting TIG* parameters

Simpler Fits

Simple_fit1

This fit replaces t43, t44 and t45 with a single interpolated torsion parameter for the SMARTS pattern "[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" and fits the resulting FF to 170 targets from 14 datasets (listed below under Fit 0), each of which has a dihedral matching this pattern. The objective function value is compared to the zeroth iteration of various other FFs and fits:

| FF | X2 (obj. fn value) |
|---|---|
| simple_fit1 | 1.656500e+02 |
| openff_unconstrained-1.3.0 | 1.97777e+02 |
| openff_unconstrained-1.2.0 | 2.13936e+02 |
| fit4 | 1.71601e+02 |
| fit4.1 | 1.83293e+02 |
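
To make the comparison easier to read, the objective function values above can be expressed as a relative change against a chosen baseline. The following is a minimal sketch and not part of the original fitting workflow; the dictionary name `chi2` and the helper `relative_change` are hypothetical, and the numbers are copied from the values listed above. A negative value means a lower objective function than the baseline.

```python
# Objective function values (X2) copied from the comparison above.
chi2 = {
    "simple_fit1": 1.656500e+02,
    "openff_unconstrained-1.3.0": 1.97777e+02,
    "openff_unconstrained-1.2.0": 2.13936e+02,
    "fit4": 1.71601e+02,
    "fit4.1": 1.83293e+02,
}


def relative_change(values, baseline):
    """Return the percent change of each entry relative to values[baseline]."""
    ref = values[baseline]
    return {name: 100.0 * (x - ref) / ref for name, x in values.items()}


# Hypothetical choice of baseline: the 1.3.0 release.
changes = relative_change(chi2, "openff_unconstrained-1.3.0")

# Print from largest reduction to largest increase.
for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} {pct:+6.1f} %")
```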


Fit7

With slight changes to Fit4, TIG0 is converted to an interpolated parameter, and TIG1a and TIG1b are removed. Number of targets: 928. Starting parameters are taken from the simple_fit1 and fit4 values. The parameters optimized are:

['TIG0', 'TIG1c', 'TIG1d', 'TIG2', 'TIG3', 'TIG4', 'TIG5a', 'TIG5b', 'TIG6', 'TIG7', 'TIG8'] - interpolated

| FF | X2 (obj. fn value) |
|---|---|
| Fit7 | 1.547049e+03 |
| Fit4 (zeroth iter) | 1.65230e+03 |
| simple_fit1 (zeroth iter) | 1.96850e+03 |
| openff_1.3.0 | 2.00101e+03 |


Fit4: Parameters optimized

  • ['TIG0', 'TIG1a', 'TIG1b'] - General torsions

  • ['TIG1c', 'TIG1d', 'TIG2', 'TIG3', 'TIG4', 'TIG5a', 'TIG5b', 'TIG6', 'TIG7', 'TIG8'] - interpolated
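
For readers unfamiliar with the distinction drawn above: a general torsion has a fixed force constant, while an interpolated torsion derives its force constant from the bond order of the central bond. The sketch below assumes the linear scheme used by SMIRNOFF-style interpolated torsions, where k is defined at bond orders 1 and 2 and interpolated (or extrapolated) linearly. The function name and the numerical values are hypothetical and are not taken from any of the fits on this page.

```python
def interpolate_k(bond_order, k_bondorder1, k_bondorder2):
    """Linearly interpolate a torsion force constant between bond orders 1 and 2.

    Bond orders outside [1, 2] are extrapolated along the same line.
    """
    slope = (k_bondorder2 - k_bondorder1) / (2.0 - 1.0)
    return k_bondorder1 + slope * (bond_order - 1.0)


# Hypothetical force constants (kcal/mol) at bond orders 1 and 2.
k_single, k_double = 1.0, 5.0

for wbo in (1.0, 1.25, 1.5, 2.0):
    print(wbo, interpolate_k(wbo, k_single, k_double))
```

With this scheme a single interpolated parameter covers single, aromatic and double central bonds, which is why the non-interpolated fits need separate terms for each bond type.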

On the training set the objective function values are:

| FF | Obj. function value |
|---|---|
| Fit 4 (interpolated) | 881.18 |
| Fit 4.1 (non-interpolated) | 899.60 |
| Openff-1.3.0 (iteration 0) | 1292.62 |
| Fit 7 (iteration 0) | 909.87 |

Fit4.1: For each interpolated parameter, a general torsion parameter is created for each case where the central bond can be a single, aromatic or double bond (denoted by the letters p, q, r at the end of the parameter id). Because there is not enough training data matching those patterns, only a subset of these parameters is trained; the parameters optimized are listed here.

Parameters optimized

  • ['TIG0', 'TIG1a', 'TIG1b', 'TIG3p', 'TIG3r', 'TIG4p', 'TIG5ap', 'TIG5bp', 'TIG1cp', 'TIG6p', 'TIG7p', 'TIG8p', 'TIG2p', 'TIG2r', 'TIG1dp'] - General torsions

Fit 0

Input FF:

Parameters to optimize:

  • ['TIG0', 'TIG1a', 'TIG1b'] - General torsions

  • ['TIG1c', 'TIG1d', 'TIG2', 'TIG3', 'TIG4', 'TIG5a', 'TIG5b', 'TIG6', 'TIG7', 'TIG8'] - interpolated

Targets:

  1. 'Fragment Stability Benchmark'

  2. 'OpenFF Gen 2 Torsion Set 1 Roche 2'

  3. 'OpenFF Gen 2 Torsion Set 2 Coverage 2'

  4. 'OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2'

  5. 'OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2'

  6. 'OpenFF Gen 2 Torsion Set 5 Bayer 2'

  7. 'OpenFF Gen 2 Torsion Set 6 Supplemental 2'

  8. 'OpenFF Group1 Torsions'

  9. 'OpenFF Group1 Torsions 2'

  10. 'OpenFF Group1 Torsions 3'

  11. 'OpenFF Rowley Biaryl v1.0'

  12. 'OpenFF Substituted Phenyl Set 1'

  13. 'OpenFF-benchmark-ligand-fragments-v1.0'

  14. 'SMIRNOFF Coverage Torsion Set 1'

Total number of targets excluding Lim Mobley benchmarks = 2746

QCA tdr_objects to exclude are in this file

Fit 1

Input FF:

Without excluding the in-ring torsions

Parameters to optimize:

  • ['TIG0'] - General torsion

  • ['TIG1c', 'TIG1d', 'TIG2', 'TIG3', 'TIG4', 'TIG5a', 'TIG5b', 'TIG6', 'TIG7', 'TIG8'] - interpolated

Targets: same as in Fit 0


Fit 2

The interpolated parameters are broken up into general torsion terms for single, aromatic and (wherever possible) double central bonds. These are named as extensions of the earlier TIG parameters, with p, q or r appended for single, aromatic and double bonds respectively. Wherever a carbonyl carbon is implied on the central bond there can be no central double bond, so not all parameters have an 'r' extension. The high-torsion-barrier filters TIG1a and TIG1b are excluded so that double and aromatic bonds are not filtered out.

Input FF:

Parameters to optimize:

  • ['TIG0', 'TIG1cp', 'TIG1cq', 'TIG1dp', 'TIG1dq', 'TIG1dr', 'TIG2p', 'TIG2q', 'TIG2r', 'TIG3p', 'TIG3q', 'TIG3r', 'TIG4p', 'TIG4q', 'TIG4r', 'TIG5ap', 'TIG5aq', 'TIG5bp', 'TIG5bq', 'TIG5br', 'TIG6p', 'TIG6q', 'TIG6r', 'TIG7p', 'TIG7q', 'TIG7r', 'TIG8p', 'TIG8q']

Targets: same as in Fit 0
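
The naming scheme described for Fit 2 can be written down compactly. This sketch rebuilds the list of split parameter ids from the base TIG ids. The set of bases without an 'r' term is read off the Fit 2 parameter list above rather than derived from the chemistry, and the names `BASES`, `NO_DOUBLE` and `split_ids` are hypothetical.

```python
# Suffixes for the central-bond type: p = single, q = aromatic, r = double.
SUFFIXES = ("p", "q", "r")

# Interpolated TIG parameters that are split into general torsions in Fit 2.
BASES = ["TIG1c", "TIG1d", "TIG2", "TIG3", "TIG4",
         "TIG5a", "TIG5b", "TIG6", "TIG7", "TIG8"]

# Bases that have no 'r' (double bond) term in the Fit 2 parameter list.
NO_DOUBLE = {"TIG1c", "TIG5a", "TIG8"}


def split_ids(bases, no_double):
    """Return the p/q/r parameter ids, skipping 'r' where it does not apply."""
    ids = []
    for base in bases:
        for suffix in SUFFIXES:
            if suffix == "r" and base in no_double:
                continue
            ids.append(base + suffix)
    return ids


# TIG0 stays a general torsion and is not split.
fit2_params = ["TIG0"] + split_ids(BASES, NO_DOUBLE)
print(len(fit2_params), fit2_params)
```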

Fit 3

Corrected the phase of non-interpolated parameters (from Fit 2)


Results of fits

| Fit | Objective fn. (Full) |
|---|---|
| Fit 0: TIG* | 5.4766E+03 |
| Fit 1: TIG* without filtering ring-torsions | 5.4807E+03 |
| Fit 2: non-interpolated with 2 phases | 3.0181E+05 |
| Fit 3: non-interpolated with 1 phase | 6.6621e+03 |
| Chaya's dataset only using fit0-FF | 3.2455E+02 |
| OpenFF_1.3.0 (Iter 0 on TIG dataset) | 5.9620E+03 |
| Iter 0 with CN, or CC central bonds only | |
| CN only TIGs [1a, 1c, 1d, 2, 6, 7, 8] + [t43, 44, 45] | 5.4844E+03 |
| CC only TIGs [0, 1b, 3, 4, 5a, 5b] + [t69, 69a, 76, 77, 78] | 5.8771E+03 |

Judging by the objective function values in the table above, Fit 0 is better than 1.3.0. Comparing the CN and CC central bonds, CN has the lower objective function value, so the CC terms have the more dominant effect on the overall objective function.

Comparing MM Fits 0, 3 and 1.3.0 with QM

Fit 0 uses all the TIG* parameters; Fit 3 is the non-interpolated version, i.e., the interpolated TIG parameters split into single, double and aromatic terms. Both are compared with 1.3.0_unconstrained and with QM data.

The comparison is done on the training set of molecules, after removing those with in-ring torsions, and the table is sorted by the average absolute difference in conformer energies between QM and MM_fit0. A full list of molecules sorted in ascending order of (QM - MM_fit0) can be seen in compare_forcefields.ipynb on the main branch of MobleyLab/wbointerpolation.
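
The sorting metric described above can be sketched as follows. This is an illustration rather than the notebook's actual code: it assumes each torsion scan is a list of conformer energies in kcal/mol and that both QM and MM energies are made relative to their own minimum before comparison. The function names and example energies are hypothetical.

```python
def relative(energies):
    """Shift energies so that the lowest conformer is at zero."""
    e_min = min(energies)
    return [e - e_min for e in energies]


def avg_abs_diff(qm, mm):
    """Average absolute difference between relative QM and MM energies."""
    if len(qm) != len(mm):
        raise ValueError("QM and MM scans must have the same number of points")
    qm_rel, mm_rel = relative(qm), relative(mm)
    return sum(abs(q - m) for q, m in zip(qm_rel, mm_rel)) / len(qm_rel)


# Hypothetical four-point scans (kcal/mol), keyed by a made-up torsion label.
scans = {
    "A": {"qm": [0.0, 1.0, 3.0, 1.0], "mm": [0.5, 1.5, 3.0, 1.5]},
    "B": {"qm": [0.0, 2.0, 4.0, 2.0], "mm": [0.0, 4.0, 8.0, 4.0]},
}

# Sort torsions from best to worst agreement with QM.
ranked = sorted(
    scans, key=lambda tid: avg_abs_diff(scans[tid]["qm"], scans[tid]["mm"])
)
print(ranked)
```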

Here are the top 5 molecules, for which the MM energies from the fit0 interpolated-parameter FF agree very well with the QM energies:

| | Torsion ID | Avg. abs(QM - MM_fit0) kcal/mol | Avg. abs(QM - MM_fit3) kcal/mol | Avg. abs(QM - MM_1.3.0) kcal/mol | Chemical Structure | QM-MM relative energies |
|---|---|---|---|---|---|---|
| 491 | {'tid': '1762178', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't47'}} | 0.023835 | 0.405296 | 0.866851 | | |
| 6 | {'tid': '21272427', 'assigned_params': {'fit0': 'TIG4', 'fit3': 'TIG4p', 'openff_unconstrained-1.3.0': 't43'}} | 0.051240 | 0.397015 | 0.125842 | | |
| 76 | {'tid': '21272438', 'assigned_params': {'fit0': 'TIG5b', 'fit3': 'TIG5bp', 'openff_unconstrained-1.3.0': 't43'}} | 0.062916 | 0.274251 | 0.597345 | | |
| 628 | {'tid': '21272422', 'assigned_params': {'fit0': 'TIG5b', 'fit3': 'TIG5bp', 'openff_unconstrained-1.3.0': 't43'}} | 0.070763 | 9.416926 | 0.761913 | | |
| 626 | {'tid': '21540566', 'assigned_params': {'fit0': 'TIG4', 'fit3': 'TIG4p', 'openff_unconstrained-1.3.0': 't43'}} | 0.075898 | 0.410239 | 0.109622 | | |


Here are the last 5 molecules in the sorted list, i.e. those with the largest average absolute difference between the fit0 MM energies and QM:

| | Torsion ID | Avg. abs(QM - MM_fit0) kcal/mol | Avg. abs(QM - MM_fit3) kcal/mol | Avg. abs(QM - MM_1.3.0) kcal/mol | Chemical Structure | QM-MM relative energies |
|---|---|---|---|---|---|---|
| 573 | {'tid': '2703638', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't48'}} | 5.264862 | 4.419863 | 4.908392 | | |
| 121 | {'tid': '2703078', 'assigned_params': {'fit0': 'TIG2', 'fit3': 'TIG2r', 'openff_unconstrained-1.3.0': 't77'}} | 5.694254 | 6.126509 | 6.142370 | | |
| 832 | {'tid': '4269709', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't43'}} | 6.086503 | 8.061331 | 6.024023 | | |
| 812 | {'tid': '21272420', 'assigned_params': {'fit0': 'TIG4', 'fit3': 'TIG4p', 'openff_unconstrained-1.3.0': 't47'}} | 6.263980 | 6.102699 | 6.772570 | | |
| 532 | {'tid': '19953581', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't43'}} | 6.369817 | 7.591619 | 5.529416 | | |
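
The Torsion ID entries listed above are Python dict literals, so they can be parsed directly when post-processing the comparison. A minimal sketch using the five worst-agreeing torsions; the variable names are hypothetical and the snippet is not taken from the notebook.

```python
import ast
from collections import Counter

# (Torsion ID cell, avg. abs(QM - MM_fit0) in kcal/mol) for the last five torsions.
rows = [
    ("{'tid': '2703638', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't48'}}", 5.264862),
    ("{'tid': '2703078', 'assigned_params': {'fit0': 'TIG2', 'fit3': 'TIG2r', 'openff_unconstrained-1.3.0': 't77'}}", 5.694254),
    ("{'tid': '4269709', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't43'}}", 6.086503),
    ("{'tid': '21272420', 'assigned_params': {'fit0': 'TIG4', 'fit3': 'TIG4p', 'openff_unconstrained-1.3.0': 't47'}}", 6.263980),
    ("{'tid': '19953581', 'assigned_params': {'fit0': 'TIG3', 'fit3': 'TIG3p', 'openff_unconstrained-1.3.0': 't43'}}", 6.369817),
]

# literal_eval safely turns each cell into a nested dict without running code.
parsed = [(ast.literal_eval(cell), err) for cell, err in rows]

# Count how often each fit0 parameter appears among these torsions.
counts = Counter(entry["assigned_params"]["fit0"] for entry, _ in parsed)
print(counts.most_common())
```

Grouping by assigned parameter in this way is one possible starting point for spotting which TIG terms are associated with the poorest agreement.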




