/
TYK2 Refit Validation

TYK2 Refit Validation

After using the bespoke workflow to optimize the torsions of some of the TYK2 ligands we want to establish how much the parameters have improved the overall description of the molecule. What's more, we also want to see how well the parameters fit molecule fragments transfer back to the parent molecule. In these tests, we hope to establish how the optimization has affected the force field before running FEP calculations.

Bespoke-fit

Each of the TYK2 molecules was fragmented using fragmenter. The WBO threshold was set to 0.03 and all non-rotor ring substituents were removed. A new smirks pattern was then made for each unique combination of 4 atoms running through the central bond fragmented around. 1D Torsion drives were then ran for each rotatable central bond on the public QCArchive using our default specification.

The smirks pattern included atoms up to 1 bond away from the 4 atoms in the dihedral to ensure that the patterns were transferable between fragments but were unique enough to not be miss applied. This is important as it ensures that smirks patterns are optimized in the presence of the parameters they will be used within production and allows self-consistent optimisation of interdependent torsions.

The Forcebalance target TorsionProfile was used with the L1 penalty. The upper energy limit was changed to 10 from 5 and the k value prior was set to 6. This is important as it allows the parameters to move sufficiently away from the starting values in cases where the original guess is poor. What's more due to the introduction of many more torsion terms (parsley uses just 4 k values compared to the 68 made in the bespoke workflow) this also stops the regularization term dominating the error and allows for a much better fit.

Molecule 7

This molecule is a particularly challenging case for the bespoke torsion workflow due to the use of fragmentation resulting in two very similar molecules.

Molecule 7

The difficulty here is that the bespoke smirks made for the fragments can easily be misapplied between the torsions. In this case, the bespoke torsion for the O=C-N-H torsion even with an extra layer of atoms is the same for both fragments. For example, the pattern is [#8AH0X1x0!r+0:1]=!@[#6AH0X3x0!r+0:2](-!@[#6aH0X3x2r6+0])-!@[#7AH1X3x0!r+0:3](-!@[#6aH0X3x2r6+0])-!@[#1AH0X1x0!r+0:4] and the the atoms matched by this in the fragments are shown below.

However, due to each unique dihedral passing through a central bond getting its own smirks the other dihedral terms are unique to the fragments and offer the extra degrees of freedom required to get a good bespoke fit for each fragment. In this case one of the other smirks is [#6aH0X3x2r6+0:1](:@[#6aH1X3x2r6+0])(:@[#6aH1X3x2r6+0])-!@[#6AH0X3x0!r+0:2](=!@[#8AH0X1x0!r+0])-!@[#7AH1X3x0!r+0:3](-!@[#1AH0X1x0!r+0])-!@[#6aH0X3x2r6+0:4](:@[#6aH1X3x2r6+0]):@[#7aH0X2x2r6+0] which hits the following atoms.

Validation

In order to validate any performance changes due to the refitting, a simple conformer energy ranking test was used. After running the bespoke workflow high-temperature MD was run on molecule 7 and each of the fitting fragments (using openff_unconstrained-1.0.0). 1000 confirmations were then extracted and the single point energies were calculated for all 3 molecules on a local QCArchive instance using multiple specifications (default(B3LYP-D3BJ/DZVP), openff-1.0.0/1.2.1/1.3.0 gaff-2.11 and ani2x, all mm specs were unconstrained.) The single point energies were then also recalculated using the bespoke forcefield output at the end of the bespoke workflow.

The Improvement of Openff

Here we see that OpenFF has consistently improved with each release in this test however Gaff-2.11 has the best performance across all of the molecules.

  • All force fields agree with QM for the parent molecule on the lowest energy structure.

  • All force fields disagree with QM on fragment 1 on the lowest energy structure

  • OpenFF-1.0.0 is the only force field to disagree with QM on the lowest energy structure this is however corrected in 1.2.1 and 1.30.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.546 RMSE 8.575 kcal/mol Intercept 7.772 Std Error 0.014 kcal/mol
Slope 0.594 RMSE 8.398 kcal/mol Intercept 6.090 Std Error 0.014 kcal/mol
Slope 0.655 RMSE 8.222 kcal/mol Intercept 4.212 Std Error 0.014 kcal/mol
Slope 0.799 RMSE 4.964 kcal/mol Intercept 4.384 Std Error 0.013 kcal/mol
Slope 0.550 RMSE 11.111 kcal/mol Intercept 0.974 Std Error 0.013 kcal/mol
Slope 0.606 RMSE 10.086 kcal/mol Intercept 0.573 Std Error 0.012 kcal/mol
Slope 0.674 RMSE 7.862 kcal/mol Intercept 1.249 Std Error 0.012 kcal/mol
Slope 0.852 RMSE 7.381 kcal/mol Intercept -2.805 Std Error 0.013 kcal/mol
Slope 0.623 RMSE 5.855 kcal/mol Intercept 2.884 Std Error 0.014 kcal/mol
Slope 0.660 RMSE 4.595 kcal/mol Intercept 3.942 Std Error 0.013 kcal/mol
Slope 0.718 RMSE 4.129 kcal/mol Intercept 3.575 Std Error 0.013 kcal/mol
Slope 0.858 RMSE 3.045 kcal/mol Intercept 3.135 Std Error 0.012 kcal/mol

Bespoke-fit set 1

Here we look at the performance of the bespoke-fit forcefield starting from each of the Openff generations.

  • We see that the optimizations of torsions alone in the bespoke workflow are able to consistently improve the performance of the base-line force field.

  • Optimization of torsions alone seems to give around the same performance improvement as a forcefield release.

  • Improvements in the fragment performance seem to transfer well into the parent molecule.

  • Other factors such as bond/angle and improper parameters are also affecting the performance.

  • can we get further performance improvements by also refitting improper torsions directly linked to the targeted rotatable bond?

  • openff-1.3.0 seems to already be very good for these molecules with respect to the torsion terms as we see a negligible improvement with the torsion optimization.

  • we could also see some improvement by also fitting to the optimized geometries and vibrational frequencies of the fragments.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule 7
Slope 0.593 RMSE 7.717 kcal/mol Intercept 7.073 Std Error 0.013 kcal/mol
Slope 0.633 RMSE 7.568 kcal/mol Intercept 5.683 Std Error 0.013
Slope 0.644 RMSE 7.736 kcal/mol Intercept 5.132 Std Error 0.013
Slope 0.592 RMSE 10.139 kcal/mol Intercept 0.833 Std Error 0.012 kcal/mol
Slope 0.635 RMSE 9.611 kcal/mol Intercept 0.185 Std Error 0.011
Slope 0.652 RMSE 8.880 kcal/mol Intercept 0.611 Std Error 0.011
Slope 0.651 RMSE 5.190 kcal/mol Intercept 3.193 Std Error 0.013 kcal/mol
Slope 0.678 RMSE 4.257 kcal/mol Intercept 4.140 Std Error 0.013
Slope 0.710 RMSE 4.246 kcal/mol Intercept 3.416 Std Error 0.013

 

Validation set 2

Here we performed the same validation as above but this time starting from a trajectory again extracted from high temp MD but using the Gaff-2.11 forcefield. Due to the gaff barrier highlights being more accurate than that of Parsley we see that the range of relative energies is much smaller as we are trapped in a local minimum so this may need to be repeated from several starting locations.

  • We see all forcefields performance decrease with these lower energy conformers

  • All forcefields now do not agree with QM on the lowest energy conformer for the fragment molecules

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.709 RMSE 3.804 kcal/mol Intercept 3.828 Std Error 0.020
Slope 0.728 RMSE 3.518 kcal/mol Intercept 4.506 Std Error 0.019
Slope 0.794 RMSE 3.775 kcal/mol Intercept 4.776 Std Error 0.019
Slope 0.699 RMSE 3.231 kcal/mol Intercept 6.139 Std Error 0.014
Slope 0.770 RMSE 5.295 kcal/mol Intercept -1.999 Std Error 0.019
Slope 0.779 RMSE 5.166 kcal/mol Intercept -2.052 Std Error 0.018
Slope 0.808 RMSE 4.714 kcal/mol Intercept -1.856 Std Error 0.018
Slope 0.736 RMSE 3.489 kcal/mol Intercept 0.158 Std Error 0.014
Slope 0.793 RMSE 3.883 kcal/mol Intercept -0.776 Std Error 0.020
Slope 0.805 RMSE 3.937 kcal/mol Intercept -1.095 Std Error 0.019
Slope 0.871 RMSE 3.237 kcal/mol Intercept -0.818 Std Error 0.019
lope 0.783 RMSE 2.888 kcal/mol Intercept -0.017 Std Error 0.013

Bespoke-fit set 2

Here we look at the performance of the bespoke-fit forcefield starting from each of the Openff generations and the second test set extracted from high-temperature MD using Gaff-2.11.

 

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Molecule 7
Slope 0.734 RMSE 3.664 kcal/mol Intercept 3.900 Std Error 0.020
Slope 0.737 RMSE 3.425 kcal/mol Intercept 4.251 Std Error 0.018
Slope 0.769 RMSE 3.469 kcal/mol Intercept 4.523 Std Error 0.018
Slope 0.781 RMSE 4.906 kcal/mol Intercept -1.697 Std Error 0.019
Slope 0.777 RMSE 4.481 kcal/mol Intercept -1.254 Std Error 0.018
Slope 0.779 RMSE 4.058 kcal/mol Intercept -0.767 Std Error 0.017
Slope 0.831 RMSE 3.069 kcal/mol Intercept -0.014 Std Error 0.020
Slope 0.805 RMSE 3.937 kcal/mol Intercept -1.095 Std Error 0.019
Slope 0.881 RMSE 2.745 kcal/mol Intercept -0.225 Std Error 0.019

Validation set 3

Here we performed the same validation as above but this time starting from a trajectory extracted from an MD simulation with the k values for all rotatable torsions set to 0. This should allow the simulation to cross high energies barriers easily without any basis to the starting conformation.

  • Here we again see a large energy range for all of the molecules similar to set 1

  • We also see a general increase in accuracy with each version of the openff forcefields

  • All of the forcefields disagree with QM on the lowest energy structure in the validation sets

  • the performance of gaff-2.11 is comparable to openff-1.3.0 for the parent and fragment 1, but gaff is better for fragment 2

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Gaff-2.11

Molecule 7
Slope 0.512 RMSE 16.901 kcal/mol Intercept -3.202 Std Error 0.011
Slope 0.625 RMSE 15.132 kcal/mol Intercept -4.510 Std Error 0.011
Slope 0.705 RMSE 13.141 kcal/mol Intercept -4.526 Std Error 0.013

 

Slope 0.718 RMSE 12.359 kcal/mol Intercept -4.176 Std Error 0.011
Slope 0.441 RMSE 16.249 kcal/mol Intercept -1.064 Std Error 0.011
Slope 0.558 RMSE 14.058 kcal/mol Intercept -1.969 Std Error 0.012

 

Slope 0.650 RMSE 9.674 kcal/mol Intercept 0.253 Std Error 0.012

 

Slope 0.702 RMSE 9.835 kcal/mol Intercept -1.509 Std Error 0.011

 

Slope 0.476 RMSE 13.882 kcal/mol Intercept -2.333 Std Error 0.009
Slope 0.575 RMSE 11.564 kcal/mol Intercept -2.139 Std Error 0.009
Slope 0.687 RMSE 9.988 kcal/mol Intercept -2.924 Std Error 0.010
Slope 0.798 RMSE 5.571 kcal/mol Intercept -0.715 Std Error 0.009

Using conformer specific charges set 2

I have also run the test on the gaff-2.11 generated conformers but used charges calculated specifically for the conformer.

  • The performance is generally worse as the forcefield is not designed to be used this way and was not fit with this in mind

  • Overall this seems to cause the forcefield to overestimated the relative energy of a conformer

  • This does help with fragment 2 where we see the forcefield disagree with QM on the lowest energy conformer.

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Average Charges

Charge std

Molecule

Openff-1.0.0

Openff-1.2.1

Openff-1.3.0

Average Charges

Charge std

Molecule 7

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Using The TorsionDrive Trajectory

In the above fits, it was noted that future improvements in the performance could probably be hand by optimizing the bond and angle terms which were untouched during the first bespoke fit. One idea is to use the optimization trajectories extracted from the torsiondrive and fit to the relative energies of these conformers using the Abinitio forcebalance target. Following this, I have extracted the ~1700 single point calculations that make up the final torsion drive profile originally used for fitting and combined both fitting targets together. Here we allow the bond and angle force constants to optimize along with the bespoke torsion terms which are introduced as part of the bespoke workflow. Note that no bespoke bond and angle terms were introduced the parsley terms were just allowed to optimize.

  • We see that the performance of the model is increased but is not much better than just fitting the torsions in terms of

    Error rendering macro: No valid license found for LaTeX Math addon

  • We see that the MM relative energies are more consistently underestimated compared to QM after including the relative energy ranking.

  • Despite a very good fit in forcebalance this does not seem to transfer to these conformations from high-temperature MD. This may be due to the fact that these high energy conformations are not sampled during the TD optimization trajectory.

Molecule

Openff-1.0.0

Molecule

Openff-1.0.0

Molecule 7
Slope 0.576 RMSE 9.216 kcal/mol Intercept 5.566 Std Error 0.014
Slope 0.559 RMSE 9.416 kcal/mol Intercept 2.596 Std Error 0.012
Slope 0.632 RMSE 9.367 kcal/mol Intercept -1.765 Std Error 0.013

Ani2x

Recently ani2x was used to correct free energy calculations and it was believed it was due to ani correcting the torsion potentials. To investigate this I have also performed the energy ranking test using ani2x.

  • Ani2x has the best correlation with our default spec, and would be a suitable replacement to optimize our MM torsion potentials to

  • Would we need to achieve a similar correlation in this test to see the same improvement in FEP as using Ani2x?

Molecule

Ani2x

Molecule

Ani2x

Molecule 7

 

 

 

 

Related content