WBO Interpolated Parameters

This page contains an overview of the projects/studies aimed at improving the FF optimization process in general. Specific features and decisions made for each FF optimization cycle should be recorded in .

Objectives

Team

Main objective: Create general WBO interpolated parameters in the force field

Project driver: @David Mobley @Jessica Maat (Deactivated) @Pavan Behara

Objectives for 2020:

  •  

Team members: @David Dotson @Jeffrey Wagner @Chaya Stern (Deactivated) @Trevor Gokey

Current Projects

Driver

Project / Study

@Pavan Behara @Trevor Gokey @David Mobley @Jessica Maat (Deactivated)

  • ForceBalance optimization of Wiberg bond order (WBO) interpolated parameters

  • Analysis of QCA datasets for WBO versus torsion barrier height

 

Scientific Questions

  •  

Infrastructure Requirements

Desired features:

Software:

 

Action items

Experiment 3: Amide experiment @Jessica Maat (Deactivated)
Review placement and redundancy of parameters in iteration 5 (openff-1.3.0-tig-3.0), especially the closeness of the TIG* parameters to t70b, t70c, and t70d - @Jessica Maat (Deactivated) (high priority)
Review placement: check whether more general parameters are placed below a more specific parameter @Jessica Maat (Deactivated)
Remove t70d, check if there are overlaps with t70* with any TIG*, add sulfur to TIG1c
Changed TIG1c from [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4] to [*:1]~[#6X3:2](=[#8X1,#16X1])~[#7X3,#7X2-1:3]~[*:4], verify this change is correct with @Pavan Behara
Create iteration 6
Do an initial fit with the iteration 5 (or 6) FF with Trevor's help @Pavan Behara
Check parameter coverage by incrementing datasets from Gen 2 sets (Gen 2 are used for fitting Parsley), suggest datasets for fitting - @Pavan Behara
Pre-process datasets to tag molecules that overlap with Lim-Mobley_Benchmarking_set - @Pavan Behara
Check params with negative slope (4, 5b) or possibility to split (6,7):
Outcome: The WBO values covered by these parameters in the plots span a range of only 0.2, so it is difficult to ascertain any specific behavior for now. There doesn’t seem to be any chemical series that could separate them into further child parameters.
TIG4, TIG5b @Pavan Behara
TIG6, TIG7 @Jessica Maat (Deactivated)
Pass FFs and targets to Trevor
Provide another FF to TG by splitting the interpolated params to single, aromatic, double bonded general torsion terms @Pavan Behara
Analyze the outcome of the initial fits after getting back new FF from Trevor @Pavan Behara
Plot the residuals (QM - MM_intrinsic_torsion), instead of current analysis solely done with QM data. - @Jessica Maat (Deactivated)
Fit a series of new interpolated torsions to a set of molecules with as much data in the plots as possible @Pavan Behara @Trevor Gokey
Per-molecule fit: look at the ForceBalance repo @Jessica Maat (Deactivated)
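The residual analysis mentioned in the action items (QM - MM_intrinsic_torsion) can be sketched as below. This is only a minimal illustration: the function names are hypothetical, the energies are made-up numbers, and how the MM profile is generated (for example, with the torsion term of interest switched off) is an assumption that the page does not spell out.

```python
def relative(energies):
    """Shift a torsion profile so that its lowest point is zero."""
    lowest = min(energies)
    return [e - lowest for e in energies]


def torsion_residuals(qm_energies, mm_energies):
    """Point-by-point QM minus MM along the same torsion scan (kcal/mol).

    Both profiles are shifted to their own minimum first, so the residual
    reflects the shape difference rather than an arbitrary energy offset.
    """
    if len(qm_energies) != len(mm_energies):
        raise ValueError("profiles must be sampled on the same grid")
    qm_rel = relative(qm_energies)
    mm_rel = relative(mm_energies)
    return [q - m for q, m in zip(qm_rel, mm_rel)]


# Made-up four-point scan, for illustration only.
qm = [0.0, 2.0, 5.0, 2.0]
mm = [1.0, 2.0, 3.0, 2.0]
residuals = torsion_residuals(qm, mm)
residual_barrier = max(residuals) - min(residuals)
print(residuals, residual_barrier)
```

The residual barrier, rather than the raw QM barrier, would then be the quantity plotted against WBO.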

 

 

 

Two experiments:

 

 

 

ForceBalance runs on WBO Interpolated Parameters

Parameter usage for the Rowley and substituted phenyl datasets using OpenFF v1.3.0:

Rowley:

 

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed interpolated parameters:

Rowley:

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed TIG interpolated parameters:

Rowley:

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed it1-7 interpolated parameters:

Rowley:

Phenyl:

 

Proposed WBO interpolated parameters from Chris Bayly:

General parameters:

    TIG0      [*:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG1      [*:1]~[#6X3:2]~[#7X3:3]~[*:4]

    TIG2      [*:1]~[#6X3:2]~[#7X2:3]~[*:4]

    TIG3      [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG4      [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] 

    TIG5      [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG6      [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]

    TIG7      [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] 

    TIG8      [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]

<Proper smirks="[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG0" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG1" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG2" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG3" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG4" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG5" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG6" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG7" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG8" idivf1="1.0"></Proper>
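In the entries above, `k1_bondorder1` and `k1_bondorder2` give the torsion force constant at bond orders 1 and 2, and the value applied to a given torsion is interpolated from the WBO of its central bond. The following is a minimal sketch of that idea, assuming simple linear interpolation (and linear extrapolation outside the 1 to 2 range). The function name is hypothetical, and the defaults are just the placeholder values 1 and 10 kcal/mol used above.

```python
def interpolate_k(wbo, k_bondorder1=1.0, k_bondorder2=10.0):
    """Linearly interpolate a torsion force constant (kcal/mol) in WBO.

    k_bondorder1 is the value at WBO = 1 and k_bondorder2 the value at
    WBO = 2; other WBO values fall on the straight line through both.
    """
    return k_bondorder1 + (k_bondorder2 - k_bondorder1) * (wbo - 1.0)


for wbo in (1.0, 1.5, 2.0):
    print(f"WBO {wbo:.2f} -> k1 = {interpolate_k(wbo):.2f} kcal/mol")
```

With these placeholder values a central bond with WBO 1.5 gets a force constant halfway between the two endpoints; fitting would replace both endpoints with optimized values.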

 

Parameters for substituted phenyl dataset:

    TIG9     [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]

    TIG10      [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]

    TIG11      [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O

    TIG12      [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O

    TIG13     [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]

    TIG14      [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]

    TIG15      [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] 

<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG11" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG12" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG13" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG14" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG15" idivf1="1.0"></Proper>
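When reviewing a force field file for which torsions are interpolated, it can help to list them programmatically. The sketch below uses only the Python standard library; the `<ProperTorsions>` wrapper is added here just to make the two-entry fragment well-formed, and `list_interpolated` is a hypothetical helper name, not part of any OpenFF tool.

```python
import xml.etree.ElementTree as ET

# Two of the entries above, wrapped in a root element so they parse as XML.
OFFXML_FRAGMENT = """
<ProperTorsions>
  <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
  <Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
</ProperTorsions>
"""


def list_interpolated(xml_text):
    """Return (id, smirks) for every <Proper> that carries bond-order terms."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for proper in root.iter("Proper"):
        # Interpolated torsions use k1_bondorder* attributes instead of k1.
        if "k1_bondorder1" in proper.attrib:
            rows.append((proper.get("id"), proper.get("smirks")))
    return rows


for pid, smirks in list_interpolated(OFFXML_FRAGMENT):
    print(pid, smirks)
```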

Experiment 1: Adding TIG0-8, the general WBO interpolated parameters, to OpenFF v1.3.0:

Workflow:

  1. Start with TIG0 parameter

  2. Add the parameter to the FF hierarchy, guided by the results from the FFs containing only interpolated parameters and by inspection of the current FF parameters

  3. Generate the parameter usage plots for phenyl and Rowley dataset

  4. Compare plot to the parameter usage results using only interpolated parameters to make sure usage is reasonable

  5. Move to the next parameter and repeat from step 2 until all parameters are added to the FF
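Placement matters in this workflow because SMIRNOFF assigns the last matching parameter in the list, so a general parameter placed early can be "drained" by more specific ones that follow it. The toy sketch below illustrates that rule and a usage count. It does not do real SMIRKS matching: the hand-written torsion descriptions, the predicates, and the three-entry hierarchy are all simplified stand-ins chosen for illustration.

```python
from collections import Counter

# Ordered toy hierarchy: later entries override earlier ones (last match wins).
# Each predicate stands in for a SMIRKS pattern on a simplified description.
HIERARCHY = [
    ("TIG0", lambda t: t["central"] == ("C", "C")),
    ("TIG3", lambda t: t["central"] == ("C", "C") and t["end1"] == "C"),
    ("TIG4", lambda t: t["central"] == ("C", "C") and t["end1"] == "N"),
]


def assign(torsion, hierarchy=HIERARCHY):
    """Return the id of the last parameter that matches, or None."""
    assigned = None
    for param_id, matches in hierarchy:
        if matches(torsion):
            assigned = param_id
    return assigned


def usage(torsions, hierarchy=HIERARCHY):
    """Count how many torsions each parameter ends up being applied to."""
    return Counter(assign(t, hierarchy) for t in torsions)


torsions = [
    {"central": ("C", "C"), "end1": "C"},
    {"central": ("C", "C"), "end1": "N"},
    {"central": ("C", "C"), "end1": "O"},
]
print(usage(torsions))
```

Here TIG0 matches all three torsions but keeps only the one that neither TIG3 nor TIG4 claims, which is the same effect the usage plots are meant to reveal.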

Experiment Notes:

General WBO Interpolated Parameters

SMIRKS

Insertion in v1.3.0 OpenFF

TIG0

[*:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5: No change

  • Replace t43-t45

After addition to TIG0:

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

Plot of TIG0 over all QCA data. Only one point is left, because the more specific parameters that come after TIG0 capture the rest.

 

TIG1

Iteration 1: [*:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 4,5:

TIG1a [*:1]~[#6X3:2]@[#7X3,#7X3+1,#7X2,#7X2-1:3]~[*:4]

TIG1b [*:1]~[#6X3:2]@[#6X3:3]~[*:4]

TIG1c [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4]

TIG1d [*:1]~[#6X3:2](~[#7X2])~[#7X3,#7X2-1,#7X2,#7X3+1:3]~[*:4]

Iteration 1:

  • Below t69 and t69a

Question:

-Should we replace t69 parameters with TIG1?

The parameters t69 and t69a:

<Proper smirks="[*:1]~[#7X3,#7X2-1:2]-!@[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.624344674592e+00 * mole**-1 * kilocalorie" id="t69" idivf1="1.0"/>
<Proper smirks="[*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]" periodicity1="2" periodicity2="1" phase1="180.0 * degree" phase2="0.0 * degree" k1="1.802103402795e+00 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t69a" idivf1="1.0" idivf2="1.0"/>

Rowley DS:

Substituted Phenyl DS:

 

Iteration 2 based on suggestions from @David Mobley :

  • Change TIG1 to [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Replace t69 with TIG1

Rowley DS:

Substituted Phenyl DS:

 

Iteration 5:

TIG1a and TIG1b correspond to ring torsions that correlate with high barriers; they are placed at the bottom, after t83.

TIG1c and TIG1d are splits of TIG1 into two chemical series:

(1) carbonyls with a nitrogen, and

(2) formamidine-type molecules

 

After splitting TIG1 into TIG1c and TIG1d, and placing TIG1d after TIG2, the plots are:

TIG2

Iteration 1: [*:1]~[#6X3:2]~[#7X2:3]~[*:4]

Iteration 2: [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4]

Iteration 5: No change

Iteration 1:

  • After t80, which is the most specific parameter with “[#6X3:2] [#7X2:3]”

*No change in the plots from the TIG1 experiment. This was expected, because the Rowley and phenyl datasets do not use TIG2 in the experiment with only interpolated parameters.

 

Iteration 2 based on feedback from @David Mobley:

  • Generalize parameter to [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4]

  • Replace t76-78 with tig2

Substituted phenyl DS:

Rowley dataset:

*TIG2 now captures a chemical series that t72 used to capture; this looks much better than the previous parameter placement.

Iteration 5:

Plot of TIG2 over all QCA data

 

TIG3

[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5: No change

 

 

  • Replace t48 with TIG3

Parameter t48 is a more specific child parameter of TIG3:

<Proper smirks="[#6X3:1]=[#6X3:2]-[#6X3:3]=[#8X1:4]" periodicity1="2" periodicity2="3" phase1="180.0 * degree" phase2="0.0 * degree" k1="3.980018770084e-01 * mole**-1 * kilocalorie" k2="-3.938620201049e-01 * mole**-1 * kilocalorie" id="t48" idivf1="1.0" idivf2="1.0"/>

Rowley DS:

Substituted Phenyl DS:

Experiment proposal from @David Mobley :

  • Retain t48 below TIG3 and (maybe) add interpolation.

  • Run this experiment in fitting, or see how it would separate things once there is more data on these plots.

Iteration 5:

Plot of TIG3 over all QCA data

 

TIG4

[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] 

Iteration 5: [#7X2:1]@[#6X3:2]~[#6X3:3]@[*:4]

  • After TIG3 parameter, which is the closest parameter to TIG4.

There are no obvious parameters in the FF that TIG4 could replace. t40 is the closest, but it sits too far up in the hierarchy: replacing it would place TIG4 before TIG0, and so on.

t40 parameter:

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

When this parameter is plotted over all QCA data, some molecules that do not group with the others make the slope negative. Making the parameter more specific to biaryl-type molecules alleviates this problem. After adding the data from ‘OpenFF Group 1 Torsions’, we still see a downward trend for this parameter.
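The sign of the trend discussed here can be checked with an ordinary least-squares fit of barrier height against WBO. The sketch below is a generic illustration using only the standard library; `fit_line` is a hypothetical helper, and every data point is made up to show how mixing in an unrelated group of molecules can flip the slope. None of the numbers come from the QCA datasets.

```python
def fit_line(wbos, barriers):
    """Ordinary least-squares line through (WBO, barrier) points.

    Returns (slope, intercept), with slope in kcal/mol per unit WBO.
    """
    n = len(wbos)
    mean_x = sum(wbos) / n
    mean_y = sum(barriers) / n
    sxx = sum((x - mean_x) ** 2 for x in wbos)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wbos, barriers))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up chemical series with a clean upward trend.
series_wbo = [1.00, 1.05, 1.10, 1.20]
series_barrier = [2.0, 2.5, 3.0, 4.0]
slope_series, intercept_series = fit_line(series_wbo, series_barrier)

# The same series plus two made-up molecules that do not belong to it.
mixed_wbo = series_wbo + [1.25, 1.30]
mixed_barrier = series_barrier + [0.5, 0.3]
slope_mixed, _ = fit_line(mixed_wbo, mixed_barrier)

print(f"series slope: {slope_series:.2f}, mixed slope: {slope_mixed:.2f}")
```

Fitting each candidate chemical series separately in this way is one simple check on whether a split into child parameters would remove a negative slope.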

 

TIG5

[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5:

TIG5a [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3](=[#8X1])~[*:4]

TIG5b [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#6X3:3]@[*:4]

  • TIG5 was placed after TIG4, these parameters cover chemistries of sp2 carbon-carbon bonds.

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

Similar to the TIG1 split, TIG5 is also divided into two parameters that correspond to:

(1) carbonyls with a side carbon

(2) a biaryl, or an aryl on one side and a dangling chain on the other side of the central bond

Post split the plots are:

5a on left, 5b on the right

 

TIG6

Iteration 1: [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 5: No change

 

Iteration 1:

  • After t78, there are a lot of parameters that cover #7X3 - #6 chemistry.

It’s unclear if any parameters should be replaced by these.

Rowley DS:

Substituted Phenyl DS:

*If this plot perfectly reproduced the results from the FF with only the TIG parameters, we would see only TIG3, TIG5, and TIG6. Instead, t96 and t97 also appear.

This might be an indication that the parameter placement is not correct, or t96 or t97 need an interpolated parameter.

Iteration 2 proposed by @David Mobley :

  • Change parameter to [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Place tig6 after tig1

Rerunning the parameter assignment plots with TIG6 iteration 2 in the modified openff-1.3.0-tig.offxml:

Rowley DS:

Phenyl DS:

Iteration 5:

Plot of TIG6 over all QCA data

 

TIG7

Iteration 1: [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] 

Iteration 2: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] 

Iteration 1:

  • After TIG6.

Rowley DS:

Substituted Phenyl DS:

Iteration 2:

  • Placed after tig6 still, but tig6 is moved in the FF as outlined above

  • Parameter changed to: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] 

Rerunning the parameter assignment plots with TIG7 iteration 2 in the modified openff-1.3.0-tig.offxml:

Rowley:

Phenyl:

Iteration 5:

Plot of TIG7 over all QCA data

 

TIG8

Iteration 1: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 5: [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 1:

  • TIG8 placed after TIG7.

  • Parameters TIG6-8 are in a similar hierarchy.

Rowley DS:

*exactly reproduces FF results from using only TIG parameters

Substituted Phenyl DS:

*Same situation as before: t96 and t97 also appear

Iteration 2:

  • Parameter changed to [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Placed after new placement of tig7

Rerunning parameter assignment:

Rowley:

Phenyl:

Iteration 5:

TIG8 was made more specific to capture biaryl or bridged-aryl type molecules. The plot below covers all QCA TorsionDrive datasets.

 

1. This is the resulting FF from the prior experiment:

2. 2nd Iteration resulting FF from the prior experiment:

3. 3rd iteration, addition of TIG1a covering ring chemistry, openff-1.3.0-tig-2.0

4. 4th iteration, TIG1a after t83 and remove t69a, openff-1.3.0-tig-2.2

5. 5th iteration, parameter splits or making parameters more specific based on chemical series, openff-1.3.0-tig-3.0

 

 

  • Iteration 4 FF plots here
    https://docs.google.com/presentation/d/1dOUi9zz5UvWOUVLmu-KqcfH3KGay_NDTq7bIaMZ7LGk/edit?usp=sharing

  • Iteration 5 plots and chemical series analysis here
    https://docs.google.com/presentation/d/1tqWBj_5TmtHuR77VkdZ1wQ0Y6eEbRx64hU5XcWWzqp0/edit?usp=sharing

 

Experiment 2: Adding TIG9-15, the WBO interpolated parameters for the substituted phenyl dataset, to OpenFF v1.3.0:

 

 

 

 

 

 

    TIG9    

[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]

After t78

*Rowley & phenyl DS same as openff 1.3.0

Phenyl DS:

 

    TIG10     

[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]

 

    TIG11     

[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O

 

    TIG12     

[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O

 

    TIG13    

[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]

 

    TIG14     

[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]

 

    TIG15     

[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] 

 

 

Summary of number of overlaps for different datasets with Lim Mobley Benchmark set:

  • Fragment Stability Benchmark: 0

  • OpenFF Fragmenter Validation 1.0: 3

  • OpenFF Full TorsionDrive Benchmark 1: 56

  • OpenFF Gen 2 Torsion Set 1 Roche 2: 3

  • OpenFF Gen 2 Torsion Set 2 Coverage 2: 0

  • OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2: 39

  • OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2: 209

  • OpenFF Gen 2 Torsion Set 5 Bayer 2: 0

  • OpenFF Gen 2 Torsion Set 6 Supplemental 2: 0

  • OpenFF Group1 Torsions 2: 0

  • OpenFF Group1 Torsions 3: 0

  • OpenFF Primary Benchmark 1 Torsion Set: 238

  • OpenFF Primary Benchmark 2 Torsion Set: 400

  • OpenFF Primary TorsionDrive Benchmark 1: 10

  • OpenFF Rowley Biaryl v1.0: 0

  • OpenFF Substituted Phenyl Set 1: 1

  • OpenFF-benchmark-ligand-fragments-v1.0: 1

  • Pfizer Discrepancy Torsion Dataset 1: 56

  • SMIRNOFF Coverage Torsion Set 1: 0

  • TorsionDrive Paper: 0
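Overlap counts like the ones above can be produced with a set intersection once every molecule is reduced to a canonical identifier. The sketch below assumes the identifiers (for example canonical SMILES or InChIKeys) have already been generated by whatever cheminformatics toolkit is in use; the helper names and the tiny example data are hypothetical and are not taken from the real datasets.

```python
def count_overlaps(datasets, benchmark_ids):
    """Number of molecules each dataset shares with the benchmark set.

    datasets maps a dataset name to an iterable of canonical identifiers.
    """
    benchmark = set(benchmark_ids)
    return {name: len(set(ids) & benchmark) for name, ids in datasets.items()}


def tag_overlaps(ids, benchmark_ids):
    """Map each identifier to True if it also occurs in the benchmark set."""
    benchmark = set(benchmark_ids)
    return {mol_id: mol_id in benchmark for mol_id in ids}


# Made-up identifiers, for illustration only.
benchmark = ["CCO", "c1ccccc1", "CC(=O)N"]
datasets = {
    "Dataset A": ["CCO", "CCN", "CC(=O)N"],
    "Dataset B": ["CCCC"],
}
print(count_overlaps(datasets, benchmark))
print(tag_overlaps(datasets["Dataset A"], benchmark))
```

The tags from `tag_overlaps` could then be used to exclude benchmark molecules from the fitting targets.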