After using the bespoke workflow to optimize the torsions of some of the TYK2 ligands we want to establish how much the parameters have improved the overall description of the molecule. What's more, we also want to see how well the parameters fit molecule fragments transfer back to the parent molecule. In these tests, we hope to establish how the optimization has affected the force field before running FEP calculations.

Bespoke-fit

Each of the TYK2 molecules was fragmented using fragmenter. The WBO threshold was set to 0.03 and all non-rotor ring substituents were removed. A new smirks pattern was then made for each unique combination of 4 atoms running through the central bond fragmented around. 1D Torsion drives were then ran for each rotatable central bond on the public QCArchive using our default specification.

The smirks pattern included atoms up to 1 bond away from the 4 atoms in the dihedral to ensure that the patterns were transferable between fragments but were unique enough to not be miss applied. This is important as it ensures that smirks patterns are optimized in the presence of the parameters they will be used within production and allows self-consistent optimisation of interdependent torsions.

The Forcebalance target TorsionProfile was used with the L1 penalty. The upper energy limit was changed to 10 from 5 and the k value prior was set to 6. This is important as it allows the parameters to move sufficiently away from the starting values in cases where the original guess is poor. What's more due to the introduction of many more torsion terms (parsley uses just 4 k values compared to the 68 made in the bespoke workflow) this also stops the regularization term dominating the error and allows for a much better fit.

Molecule 7

This molecule is a particularly challenging case for the bespoke torsion workflow due to the use of fragmentation resulting in two very similar molecules.

Molecule 7

The difficulty here is that the bespoke smirks made for the fragments can easily be misapplied between the torsions. In this case, the bespoke torsion for the O=C-N-H torsion even with an extra layer of atoms is the same for both fragments. For example, the pattern is [#8AH0X1x0!r+0:1]=!@[#6AH0X3x0!r+0:2](-!@[#6aH0X3x2r6+0])-!@[#7AH1X3x0!r+0:3](-!@[#6aH0X3x2r6+0])-!@[#1AH0X1x0!r+0:4] and the the atoms matched by this in the fragments are shown below.

However, due to each unique dihedral passing through a central bond getting its own smirks the other dihedral terms are unique to the fragments and offer the extra degrees of freedom required to get a good bespoke fit for each fragment. In this case one of the other smirks is [#6aH0X3x2r6+0:1](:@[#6aH1X3x2r6+0])(:@[#6aH1X3x2r6+0])-!@[#6AH0X3x0!r+0:2](=!@[#8AH0X1x0!r+0])-!@[#7AH1X3x0!r+0:3](-!@[#1AH0X1x0!r+0])-!@[#6aH0X3x2r6+0:4](:@[#6aH1X3x2r6+0]):@[#7aH0X2x2r6+0] which hits the following atoms.

Validation

In order to validate any performance changes due to the refitting, a simple conformer energy ranking test was used. After running the bespoke workflow high-temperature MD was run on molecule 7 and each of the fitting fragments (using openff_unconstrained-1.0.0). 1000 confirmations were then extracted and the single point energies were calculated for all 3 molecules on a local QCArchive instance using multiple specifications (default(B3LYP-D3BJ/DZVP), openff-1.0.0/1.2.1/1.3.0 gaff-2.11 and ani2x, all mm specs were unconstrained.) The single point energies were then also recalculated using the bespoke forcefield output at the end of the bespoke workflow.

The Improvement of Openff

Here we see that OpenFF has consistently improved with each release in this test however Gaff-2.11 has the best performance across all of the molecules.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.546
RMSE 8.575 kcal/mol
Intercept 7.772
Std Error 0.014 kcal/mol
Slope 0.594
RMSE 8.398 kcal/mol
Intercept 6.090
Std Error 0.014 kcal/mol
Slope 0.655
RMSE 8.222 kcal/mol
Intercept 4.212
Std Error 0.014 kcal/mol
Slope 0.799
RMSE 4.964 kcal/mol
Intercept 4.384
Std Error 0.013 kcal/mol
Slope 0.550
RMSE 11.111 kcal/mol
Intercept 0.974
Std Error 0.013 kcal/mol
Slope 0.606
RMSE 10.086 kcal/mol
Intercept 0.573
Std Error 0.012 kcal/mol
Slope 0.674
RMSE 7.862 kcal/mol
Intercept 1.249
Std Error 0.012 kcal/mol
Slope 0.852
RMSE 7.381 kcal/mol
Intercept -2.805
Std Error 0.013 kcal/mol
Slope 0.623
RMSE 5.855 kcal/mol
Intercept 2.884
Std Error 0.014 kcal/mol
Slope 0.660
RMSE 4.595 kcal/mol
Intercept 3.942
Std Error 0.013 kcal/mol
Slope 0.718
RMSE 4.129 kcal/mol
Intercept 3.575
Std Error 0.013 kcal/mol
Slope 0.858
RMSE 3.045 kcal/mol
Intercept 3.135
Std Error 0.012 kcal/mol

Bespoke-fit set 1

Here we look at the performance of the bespoke-fit forcefield starting from each of the Openff generations.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule 7
Slope 0.593
RMSE 7.717 kcal/mol
Intercept 7.073
Std Error 0.013 kcal/mol
Slope 0.633
RMSE 7.568 kcal/mol
Intercept 5.683
Std Error 0.013
Slope 0.644
RMSE 7.736 kcal/mol
Intercept 5.132
Std Error 0.013
Slope 0.592
RMSE 10.139 kcal/mol
Intercept 0.833
Std Error 0.012 kcal/mol
Slope 0.635
RMSE 9.611 kcal/mol
Intercept 0.185
Std Error 0.011
Slope 0.652
RMSE 8.880 kcal/mol
Intercept 0.611
Std Error 0.011
Slope 0.651
RMSE 5.190 kcal/mol
Intercept 3.193
Std Error 0.013 kcal/mol
Slope 0.678
RMSE 4.257 kcal/mol
Intercept 4.140
Std Error 0.013
Slope 0.710
RMSE 4.246 kcal/mol
Intercept 3.416
Std Error 0.013

Validation set 2

Here we performed the same validation as above but this time starting from a trajectory again extracted from high temp MD but using the Gaff-2.11 forcefield. Due to the gaff barrier highlights being more accurate than that of Parsley we see that the range of relative energies is much smaller as we are trapped in a local minimum so this may need to be repeated from several starting locations.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.709
RMSE 3.804 kcal/mol
Intercept 3.828
Std Error 0.020
Slope 0.728
RMSE 3.518 kcal/mol
Intercept 4.506
Std Error 0.019
Slope 0.794
RMSE 3.775 kcal/mol
Intercept 4.776
Std Error 0.019
Slope 0.699
RMSE 3.231 kcal/mol
Intercept 6.139
Std Error 0.014
Slope 0.770
RMSE 5.295 kcal/mol
Intercept -1.999
Std Error 0.019
Slope 0.779
RMSE 5.166 kcal/mol
Intercept -2.052
Std Error 0.018
Slope 0.808
RMSE 4.714 kcal/mol
Intercept -1.856
Std Error 0.018
Slope 0.736
RMSE 3.489 kcal/mol
Intercept 0.158
Std Error 0.014
Slope 0.793
RMSE 3.883 kcal/mol
Intercept -0.776
Std Error 0.020
Slope 0.805
RMSE 3.937 kcal/mol
Intercept -1.095
Std Error 0.019
Slope 0.871
RMSE 3.237 kcal/mol
Intercept -0.818
Std Error 0.019
lope 0.783
RMSE 2.888 kcal/mol
Intercept -0.017
Std Error 0.013

Bespoke-fit set 2

Here we look at the performance of the bespoke-fit forcefield starting from each of the Openff generations and the second test set extracted from high-temperature MD using Gaff-2.11.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule 7
Slope 0.734
RMSE 3.664 kcal/mol
Intercept 3.900
Std Error 0.020
Slope 0.737
RMSE 3.425 kcal/mol
Intercept 4.251
Std Error 0.018
Slope 0.769
RMSE 3.469 kcal/mol
Intercept 4.523
Std Error 0.018
Slope 0.781
RMSE 4.906 kcal/mol
Intercept -1.697
Std Error 0.019
Slope 0.777
RMSE 4.481 kcal/mol
Intercept -1.254
Std Error 0.018
Slope 0.779
RMSE 4.058 kcal/mol
Intercept -0.767
Std Error 0.017
Slope 0.831
RMSE 3.069 kcal/mol
Intercept -0.014
Std Error 0.020
Slope 0.805
RMSE 3.937 kcal/mol
Intercept -1.095
Std Error 0.019
Slope 0.881
RMSE 2.745 kcal/mol
Intercept -0.225
Std Error 0.019

Validation set 3

Here we performed the same validation as above but this time starting from a trajectory extracted from an MD simulation with the k values for all rotatable torsions set to 0. This should allow the simulation to cross high energies barriers easily without any basis to the starting conformation.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.512
RMSE 16.901 kcal/mol
Intercept -3.202
Std Error 0.011
Slope 0.625
RMSE 15.132 kcal/mol
Intercept -4.510
Std Error 0.011
Slope 0.705
RMSE 13.141 kcal/mol
Intercept -4.526
Std Error 0.013

Slope 0.718
RMSE 12.359 kcal/mol
Intercept -4.176
Std Error 0.011
Slope 0.441
RMSE 16.249 kcal/mol
Intercept -1.064
Std Error 0.011
Slope 0.558
RMSE 14.058 kcal/mol
Intercept -1.969
Std Error 0.012

Slope 0.650
RMSE 9.674 kcal/mol
Intercept 0.253
Std Error 0.012

Slope 0.702
RMSE 9.835 kcal/mol
Intercept -1.509
Std Error 0.011

Slope 0.476
RMSE 13.882 kcal/mol
Intercept -2.333
Std Error 0.009
Slope 0.575
RMSE 11.564 kcal/mol
Intercept -2.139
Std Error 0.009
Slope 0.687
RMSE 9.988 kcal/mol
Intercept -2.924
Std Error 0.010
Slope 0.798
RMSE 5.571 kcal/mol
Intercept -0.715
Std Error 0.009

Using conformer specific charges set 2

I have also run the test on the gaff-2.11 generated conformers but used charges calculated specifically for the conformer.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Average Charges

Charge std

Molecule 7

Using The TorsionDrive Trajectory

In the above fits, it was noted that future improvements in the performance could probably be hand by optimizing the bond and angle terms which were untouched during the first bespoke fit. One idea is to use the optimization trajectories extracted from the torsiondrive and fit to the relative energies of these conformers using the Abinitio forcebalance target. Following this, I have extracted the ~1700 single point calculations that make up the final torsion drive profile originally used for fitting and combined both fitting targets together. Here we allow the bond and angle force constants to optimize along with the bespoke torsion terms which are introduced as part of the bespoke workflow. Note that no bespoke bond and angle terms were introduced the parsley terms were just allowed to optimize.

Molecule

Openff-1.0.0

Molecule 7
Slope 0.576
RMSE 9.216 kcal/mol
Intercept 5.566
Std Error 0.014
Slope 0.559
RMSE 9.416 kcal/mol
Intercept 2.596
Std Error 0.012
Slope 0.632
RMSE 9.367 kcal/mol
Intercept -1.765
Std Error 0.013

Ani2x

Recently ani2x was used to correct free energy calculations and it was believed it was due to ani correcting the torsion potentials. To investigate this I have also performed the energy ranking test using ani2x.

Molecule

Ani2x

Molecule 7