WBO Interpolated Parameters
This page contains an overview of the projects/studies aimed at improving the FF optimization process in general. Specific features and decisions made for each FF optimization cycle should be recorded in Force Field Releases.
Objectives | Team |
---|---|
Main objective: Create general WBO interpolated parameters in the force field | Project driver: @David Mobley @Jessica Maat (Deactivated) @Pavan Behara |
Objectives for 2020: | Team members: @David Dotson @Jeffrey Wagner @Chaya Stern (Deactivated) @Trevor Gokey |
Current Projects | |
---|---|
Driver | Project / Study |
@Pavan Behara @Trevor Gokey @David Mobley @Jessica Maat (Deactivated) |
|
Scientific Questions | |
---|---|
Infrastructure Requirements | |
Desired features: | Software: |
Action items
Outcome: The WBO values for these parameters span a range of only 0.2 in the plots, so it is difficult to ascertain any specific behavior for now. There doesn’t seem to be any chemical series that could separate them into more child parameters.
Two experiments:
ForceBalance runs on WBO Interpolated Parameters
Parameter usage with OpenFF v1.3.0 for the Rowley and Phenyl datasets:
Rowley:
Phenyl:
Parameter usage with an FF containing only the proposed interpolated parameters, for the Rowley and Phenyl datasets:
Rowley:
Phenyl:
Parameter usage with an FF containing only the proposed TIG interpolated parameters, for the Rowley and Phenyl datasets:
Rowley:
Phenyl:
Parameter usage with an FF containing only the proposed it1-7 interpolated parameters, for the Rowley and Phenyl datasets:
Rowley:
Phenyl:
Proposed WBO interpolated parameters from Chris Bayly:
General parameters:
TIG0 [*:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG1 [*:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG2 [*:1]~[#6X3:2]~[#7X2:3]~[*:4]
TIG3 [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG4 [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG5 [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG6 [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG7 [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG8 [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]
<Proper smirks="[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG0" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG1" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG2" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG3" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG4" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG5" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG6" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG7" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG8" idivf1="1.0"></Proper>
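For orientation, the `k1_bondorder1` and `k1_bondorder2` attributes in the entries above give the force constant at bond orders 1 and 2. Assuming the SMIRNOFF convention of linear interpolation (and extrapolation) in the WBO of the central bond, the effective force constant can be sketched as below. The function name is hypothetical and not part of any OpenFF API, and the defaults are simply the placeholder values 1 and 10 kcal/mol used in these entries.

```python
def interpolated_k(wbo, k_bo1=1.0, k_bo2=10.0):
    """Linearly interpolate a torsion force constant (kcal/mol) from a WBO.

    k_bo1 and k_bo2 are the force constants at bond orders 1 and 2
    (placeholder values taken from the TIG entries; assumption: linear
    interpolation between the two, extrapolation outside that range).
    """
    return k_bo1 + (k_bo2 - k_bo1) * (wbo - 1.0)


for wbo in (1.0, 1.2, 1.5, 2.0):
    print(f"WBO {wbo:.1f} -> k1 = {interpolated_k(wbo):.2f} kcal/mol")
```

With these placeholder values a WBO range of 0.2 corresponds to a change of 1.8 kcal/mol in `k1`, which is why the fitted values of the two endpoints matter more than the starting guesses.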
Parameters for substituted phenyl dataset:
TIG9 [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]
TIG10 [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]
TIG11 [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O
TIG12 [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O
TIG13 [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]
TIG14 [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]
TIG15 [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG11" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG12" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG13" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG14" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG15" idivf1="1.0"></Proper>
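Because the SMIRKS are listed twice (once as plain text and once inside the `<Proper>` entries), it can help to extract the id, SMIRKS and bond-order terms from the XML and compare them against the plain list. The following is a minimal sketch using only the Python standard library; the helper name `bondorder_terms` is hypothetical, and the sample entry is the TIG9 line from above.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the TIG9 line above.
entry = (
    '<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" '
    'phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" '
    'k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>'
)


def bondorder_terms(xml_text):
    """Return (id, smirks, {attribute: (value, unit)}) for one <Proper> entry.

    Only attributes of the form k<N>_bondorder<M> are collected, so a
    non-interpolated entry yields an empty dictionary.
    """
    elem = ET.fromstring(xml_text)
    terms = {}
    for name, value in elem.attrib.items():
        if name.startswith("k") and "_bondorder" in name:
            magnitude, unit = value.split("*", 1)
            terms[name] = (float(magnitude), unit.strip())
    return elem.get("id"), elem.get("smirks"), terms


param_id, smirks, terms = bondorder_terms(entry)
print(param_id, smirks)
print(terms)
```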
Experiment 1: Adding TIG0-8, general WBO interpolated parameters to OpenFF v1.3.0:
Workflow:
1. Start with the TIG0 parameter
2. Add the parameter to the FF hierarchy, using the results from the FFs that contain only interpolated parameters and an inspection of the current FF parameters
3. Generate the parameter usage plots for the Phenyl and Rowley datasets
4. Compare the plots to the parameter usage results obtained with only interpolated parameters to make sure the usage is reasonable
5. Move to the next parameter and repeat from step 2 until all parameters are added to the FF
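The placement step in this workflow matters because SMIRNOFF assigns each torsion the last parameter in the hierarchy that matches it, so a general parameter can be "drained" by more specific parameters placed after it. The sketch below illustrates that effect with toy data only: the torsion descriptions, the `t_generic` and `t_specific` ids and the matcher functions are all hypothetical stand-ins for real molecules and SMIRKS matching.

```python
from collections import Counter


def assign(torsion, hierarchy):
    """Return the id of the last parameter whose matcher accepts the torsion."""
    assigned = None
    for param_id, matches in hierarchy:
        if matches(torsion):
            assigned = param_id
    return assigned


def usage(torsions, hierarchy):
    """Count how many torsions each parameter ends up assigned to."""
    return Counter(assign(t, hierarchy) for t in torsions)


# Toy torsions: only the central atom types and a ring flag (hypothetical data).
torsions = [
    {"central": ("C3", "C3"), "ring": False},
    {"central": ("C3", "C3"), "ring": False},
    {"central": ("C3", "C3"), "ring": True},
    {"central": ("C3", "N3"), "ring": False},
]

# Hypothetical hierarchy: a generic parameter followed by a specific one.
hierarchy = [
    ("t_generic", lambda t: True),
    ("t_specific", lambda t: t["central"] == ("C3", "C3") and t["ring"]),
]
before = usage(torsions, hierarchy)

# Insert a stand-in for TIG0 (both central atoms trivalent carbon) between them.
hierarchy.insert(1, ("TIG0", lambda t: t["central"] == ("C3", "C3")))
after = usage(torsions, hierarchy)

print("before:", dict(before))
print("after: ", dict(after))
```

In this toy case the inserted parameter takes two torsions from the generic parameter, while the ring torsion stays with the more specific parameter that follows it.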
Experiment Notes:
General WBO Interpolated Parameters | SMIRKS | Insertion in v1.3.0 OpenFF |
---|---|---|
TIG0 | [*:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: No change |
After addition of TIG0: Rowley DS: Substituted Phenyl DS: Iteration 5: Plot of TIG0 over all QCA data. Only one point is left, because the more specific parameters that come after TIG0 drain the rest.
|
TIG1 | Iteration 1: [*:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iteration 4,5: TIG1a [*:1]~[#6X3:2]@[#7X3,#7X3+1,#7X2,#7X2-1:3]~[*:4] TIG1b [*:1]~[#6X3:2]@[#6X3:3]~[*:4] TIG1c [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4] TIG1d [*:1]~[#6X3:2](~[#7X2])~[#7X3,#7X2-1,#7X2,#7X3+1:3]~[*:4] | Iteration 1:
Question: -Should we replace t69 parameters with TIG1? The parameters t69 and t69a: <Proper smirks="[*:1]~[#7X3,#7X2-1:2]-!@[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.624344674592e+00 * mole**-1 * kilocalorie" id="t69" idivf1="1.0"/>
<Proper smirks="[*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]" periodicity1="2" periodicity2="1" phase1="180.0 * degree" phase2="0.0 * degree" k1="1.802103402795e+00 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t69a" idivf1="1.0" idivf2="1.0"/> Rowley DS: Substituted Phenyl DS:
Iteration 2 based on suggestions from @David Mobley :
Rowley DS: Substituted Phenyl DS:
Iteration 5: TIG1a and TIG1b correspond to ring torsions that correlate with high barriers, and they are placed at the bottom of the hierarchy after t83. TIG1c and TIG1d split TIG1 into two chemical series: (1) carbonyls with a nitrogen and (2) formamidine-type molecules.
After splitting into TIG1c and TIG1d and placing TIG1d after TIG2, here are the plots: |
TIG2 | Iteration 1: [*:1]~[#6X3:2]~[#7X2:3]~[*:4] Iteration 2: [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4] Iteration 5: No change | Iteration 1:
*No change in plots from the TIG1 experiment. This was expected because the Rowley and Phenyl datasets do not use TIG2 in the experiment with only interpolated parameters.
Iteration 2 based on feedback from @David Mobley:
Substituted phenyl DS: Rowley dataset: *TIG2 is capturing a chemical series that t72 used to capture; this looks much better than the previous parameter placement. Iteration 5: Plot of TIG2 over all QCA data
|
TIG3 | [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: No change
|
Parameter t48 is a more specific child parameter of TIG3:
Rowley DS: Substituted Phenyl DS: Experiment proposal from @David Mobley :
Iteration 5: Plot of TIG3 over all QCA data
|
TIG4 | [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: [#7X2:1]@[#6X3:2]~[#6X3:3]@[*:4] |
There are no obvious parameters in the FF that TIG4 could replace. t40 is the closest, but it is too far up in the hierarchy: using its position would make TIG4 come before TIG0 and so on. t40 parameter: Rowley DS: Substituted Phenyl DS: Iteration 5: When this parameter is plotted over all QCA data, some molecules that are not grouped with the others make the slope negative. Making the parameter more specific to biaryl-type molecules alleviates this problem. After adding the data from ‘OpenFF Group 1 Torsions’, we still see a downward trend for this parameter.
|
TIG5 | [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: TIG5a [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3](=[#8X1])~[*:4] TIG5b [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#6X3:3]@[*:4] |
Rowley DS: Substituted Phenyl DS: Iteration 5: Similar to the TIG1 split, this parameter is divided into two parameters that correspond to (1) carbonyls with a side carbon and (2) a biaryl, or an aryl on one side and a dangling chain on the other side of the central bond. Plots after the split: 5a on the left, 5b on the right.
|
TIG6 | Iteration 1: [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iteration 5: No change
| Iteration 1:
It’s unclear if any parameters should be replaced by these. Rowley DS: Substituted Phenyl DS: *If this plot were to perfectly reproduce the FF with only the TIG parameters, we would see only TIG3, 5, and 6 in this plot. However, t96 and t97 also appear. This might indicate that the parameter placement is not correct, or that t96 or t97 need an interpolated parameter. Iteration 2 proposed by @David Mobley :
Rerunning the parameter assignment plots with tig6 iteration 2 in openff-1.3.0-tig.offxml modified: Rowley DS: Phenyl DS: Iteration 5: Plot of TIG6 over all QCA data
|
TIG7 | Iteration 1: [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] | Iteration 1:
Rowley DS: Substituted Phenyl DS: Iteration 2:
Rerunning the parameter assignment plots with tig7 iteration 2 in openff-1.3.0-tig.offxml modified: Rowley: Phenyl: Iteration 5: Plot of TIG7 over all QCA data
|
TIG8 | Iteration 1: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iteration 5: [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] | Iteration 1:
Rowley DS: *exactly reproduces FF results from using only TIG parameters Substituted Phenyl DS: *Same case as prior with t96 and t97 Iteration 2:
Rerunning parameter assignment: Rowley: Phenyl: Iteration 5: Making TIG8 more specific to capture biaryl or bridged-aryl type of molecules. Plot below for the entire QCA torsiondrive datasets.
|
1. This is the resulting FF from the prior experiment:
2. 2nd Iteration resulting FF from the prior experiment:
3. 3rd iteration, addition of TIG1a covering ring chemistry, openff-1.3.0-tig-2.0
4. 4th iteration, TIG1a after t83 and remove t69a, openff-1.3.0-tig-2.2
5. 5th iteration, parameter splits or making parameters more specific based on chemical series, openff-1.3.0-tig-3.0
Iteration 4 FF plots here
https://docs.google.com/presentation/d/1dOUi9zz5UvWOUVLmu-KqcfH3KGay_NDTq7bIaMZ7LGk/edit?usp=sharing
Iteration 5 plots and chemical series analysis here
https://docs.google.com/presentation/d/1tqWBj_5TmtHuR77VkdZ1wQ0Y6eEbRx64hU5XcWWzqp0/edit?usp=sharing
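Several of the experiment notes above judge a parameter by the trend of torsion barrier against WBO (for example a negative slope for TIG4, or a WBO range too narrow to show any behavior). A minimal sketch of such a check is an ordinary least-squares line together with the WBO range. The numbers below are made-up illustrative values, not data from the QCA datasets, and the function name is hypothetical.

```python
def fit_line(wbos, barriers):
    """Least-squares slope and intercept of barrier height versus WBO."""
    n = len(wbos)
    mean_x = sum(wbos) / n
    mean_y = sum(barriers) / n
    sxx = sum((x - mean_x) ** 2 for x in wbos)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wbos, barriers))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative values only (hypothetical, not measured data).
wbos = [1.0, 1.1, 1.2, 1.3]
barriers = [2.0, 4.0, 6.0, 8.0]  # kcal/mol

slope, intercept = fit_line(wbos, barriers)
wbo_range = max(wbos) - min(wbos)

print(f"slope = {slope:.2f} kcal/mol per unit WBO")
print(f"intercept = {intercept:.2f} kcal/mol")
print(f"WBO range = {wbo_range:.2f}")
```

A positive slope over a reasonably wide WBO range is what an interpolated parameter is expected to show; a narrow range or a negative slope suggests the parameter groups unrelated chemistry or needs more data.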
Experiment 2: Adding TIG9-15, WBO interpolated parameters for the substituted phenyl dataset, to OpenFF v1.3.0:
Substituted Phenyl WBO Interpolated Parameters | SMIRKS | Insertion in v1.3.0 OpenFF |
---|---|---|
TIG9 | [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4] | After t78 *Rowley & Phenyl DS same as OpenFF v1.3.0 Phenyl DS:
|
TIG10 | [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4] |
|
TIG11 | [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O |
|
TIG12 | [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O |
|
TIG13 | [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4] |
|
TIG14 | [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4] |
|
TIG15 | [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] |
|
Number of overlaps between each dataset and the Lim Mobley Benchmark set:
Fragment Stability Benchmark: 0
OpenFF Fragmenter Validation 1.0: 3
OpenFF Full TorsionDrive Benchmark 1: 56
OpenFF Gen 2 Torsion Set 1 Roche 2: 3
OpenFF Gen 2 Torsion Set 2 Coverage 2: 0
OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2: 39
OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2: 209
OpenFF Gen 2 Torsion Set 5 Bayer 2: 0
OpenFF Gen 2 Torsion Set 6 Supplemental 2: 0
OpenFF Group1 Torsions 2: 0
OpenFF Group1 Torsions 3: 0
OpenFF Primary Benchmark 1 Torsion Set: 238
OpenFF Primary Benchmark 2 Torsion Set: 400
OpenFF Primary TorsionDrive Benchmark 1: 10
OpenFF Rowley Biaryl v1.0: 0
OpenFF Substituted Phenyl Set 1: 1
OpenFF-benchmark-ligand-fragments-v1.0: 1
Pfizer Discrepancy Torsion Dataset 1: 56
SMIRNOFF Coverage Torsion Set 1: 0
TorsionDrive Paper: 0
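The page does not record how these overlaps were computed. One plausible approach, assuming an overlap means an entry whose canonical identifier (for example a canonical SMILES) also appears in the benchmark set, is a set intersection per dataset. The sketch below uses hypothetical placeholder identifiers and dataset names.

```python
def count_overlaps(benchmark_ids, datasets):
    """Count, per dataset, how many identifiers also occur in the benchmark set.

    Duplicates within a dataset are counted once (assumption: overlap is
    defined on unique identifiers).
    """
    benchmark = set(benchmark_ids)
    return {name: len(benchmark & set(ids)) for name, ids in datasets.items()}


# Placeholder identifiers; real use would pass canonical SMILES or similar.
benchmark_ids = ["mol-a", "mol-b", "mol-c"]
datasets = {
    "dataset 1": ["mol-a", "mol-c", "mol-x"],
    "dataset 2": ["mol-y"],
}

overlaps = count_overlaps(benchmark_ids, datasets)
for name in sorted(overlaps):
    print(f"{name}: {overlaps[name]}")
```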