This page contains an overview of the projects/studies aimed at improving the FF optimization process in general. Specific features and decisions made for each FF optimization cycle should be recorded in FF releases.
Objectives | Team |
---|---|
Main objective: Create general WBO interpolated parameters in the force field | Project drivers: David Mobley, Jessica Maat, Pavan Behara |
Objectives for 2020: | Team members: David Dotson, Jeffrey Wagner, Chaya Stern, Trevor Gokey |
Current Projects | |
---|---|
Driver | Project / Study |
Pavan Behara, Trevor Gokey, David Mobley, Jessica Maat | |
Scientific Questions | |
---|---|
Infrastructure Requirements | |
Desired features: | Software: |
Action items
- Experiment 3: Amide experiment - Jessica Maat
- Review the placement and redundancy of parameters in iteration 5 (openff-1.3.0-tig-3.0), especially how close the TIG* parameters are to t70b, t70c, and t70d - Jessica Maat (high priority)
- Review placement: check whether more general parameters are placed below more specific ones (in SMIRNOFF, the last matching parameter in the file is the one assigned) - Jessica Maat
- Remove t70d, check whether any t70* parameter overlaps with any TIG* parameter, and add sulfur to TIG1c
  - Changed TIG1c from [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4] to [*:1]~[#6X3:2](=[#8X1,#16X1])~[#7X3,#7X2-1:3]~[*:4]; verify with Pavan Behara that this change is correct
- Create iteration 6
- Do an initial fit with the iteration 5 (or 6) FF with Trevor's help - Pavan Behara
- Check parameter coverage by incrementally adding datasets to the Gen 2 sets (the Gen 2 sets were used to fit Parsley), and suggest datasets for fitting - Pavan Behara
- Pre-process datasets to tag molecules that overlap with Lim-Mobley_Benchmarking_set - Pavan Behara
- Check parameters with a negative slope (TIG4, TIG5b) or a possible split (TIG6, TIG7):
  - TIG4, TIG5b - Pavan Behara. Outcome: the WBO values these parameters cover in the plots span a range of only 0.2, so it is difficult to ascertain any specific behavior yet, and no chemical series appears to separate them into more child parameters.
  - TIG6, TIG7 - Jessica Maat
- Pass FFs and targets to Trevor
- Provide another FF to Trevor Gokey, splitting the interpolated parameters into single-bond, aromatic, and double-bond general torsion terms - Pavan Behara
- Analyze the outcome of the initial fits after getting the new FF back from Trevor - Pavan Behara
  - Plot QM vs. MM
  - Redo the plots with MM_torsions vs. WBO
  - Parameter-specific gradients from Trevor's work?
- Plot the residuals (QM - MM_intrinsic_torsion) instead of the current analysis, which uses QM data only - Jessica Maat
- David Dotson, Pavan Behara: help with the MM torsion barriers
- David Dotson, Pavan Behara: check in at 11 am
- Jeffrey Wagner, Joshua Horton, Pavan Behara: check in after the ff-release call on MM intrinsic torsion
- Fit a series of new interpolated torsions to a set of molecules, with as much data in the plots as possible - Pavan Behara, Trevor Gokey
- Refit only the TIG parameters and check whether fitting only these improves the FF
- Per-molecule fit (see the ForceBalance repo) - Jessica Maat
- Use ForceBalance (FB) to fit TIG0 for a single molecule/torsion profile with Parsley + TIG0
  - Pick a molecule
  - Prepare the FFXML file for FB
  - Prepare the targets for FB
  - Run FB
  - Evaluate results: check the objective function before and after
  - Evaluate results: check the torsion profile before and after, and compare to QM
- Iterate over many molecules and fit a new TIG0 for each, storing k values by molecule. Sub-tasks:
  - Pick molecules to iterate over
  - Iterate and fit using FB
  - Evaluate results for each
- Generate a new plot of k vs. WBO, now plotting the k values from the per-molecule TIG0 fits
  - Only include molecules without strong steric (LJ) clashes in this analysis
  - Break out molecules by the TIGn parameter that WOULD be applied (even though all k values come from TIG0)
- Repeat the experiment for [*:1]~[#6X3:2]~[#7:3]~[*:4]
- Split parameters based on k versus WBO
- Generate a unique SMIRKS pattern for the torsion
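The k-versus-WBO analysis in the action items (k from per-molecule TIG0 fits, grouped by the TIGn that would apply, then checked for negative slopes or a narrow WBO range) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the analysis code used on this page: the `fits` records and the function names are hypothetical, and the slope is an ordinary least-squares fit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slope_and_wbo_range(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares slope of k vs. WBO for one parameter, plus the WBO spread.

    points: (wbo, k) pairs. Returns (slope, wbo_range); slope is NaN when
    all WBO values are identical and no slope can be estimated.
    """
    wbos = [w for w, _ in points]
    ks = [k for _, k in points]
    mean_w = sum(wbos) / len(wbos)
    mean_k = sum(ks) / len(ks)
    s_ww = sum((w - mean_w) ** 2 for w in wbos)
    s_wk = sum((w - mean_w) * (k - mean_k) for w, k in points)
    slope = s_wk / s_ww if s_ww > 0 else float("nan")
    return slope, max(wbos) - min(wbos)


def summarize_by_parameter(
    fits: Iterable[Tuple[str, float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Group hypothetical (tig_id, wbo, k) records by the TIGn that would apply."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for tig_id, wbo, k in fits:
        groups[tig_id].append((wbo, k))
    return {tig_id: slope_and_wbo_range(pts) for tig_id, pts in groups.items()}


# Hypothetical per-molecule results: k fitted with TIG0, labelled by the TIGn that would apply.
fits = [("TIG0", 1.0, 1.0), ("TIG0", 2.0, 10.0), ("TIG4", 1.0, 5.0), ("TIG4", 1.2, 4.0)]
for tig_id, (slope, wbo_range) in summarize_by_parameter(fits).items():
    flag = "(negative slope)" if slope < 0 else ""
    print(f"{tig_id}: slope={slope:.2f}, WBO range={wbo_range:.2f} {flag}")
```

A narrow WBO range makes the fitted slope poorly determined, which is why the range is reported next to the slope.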
Two experiments:
- Fitting a series of new interpolated torsions to a set of molecules, with as much data in the plots as possible - Pavan Behara, Trevor Gokey
- Per-molecule fit (see the ForceBalance repo) - Jessica Maat
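For the per-molecule fit, ForceBalance optimizes parameters against the QM torsion profile. As a much simpler stand-in (not ForceBalance, and not this project's procedure), a single-term torsion with fixed periodicity 2 and phase 180°, as in the TIG parameters, is linear in k, so k for one molecule can be estimated by least squares. The sketch assumes the target profile is the QM energy minus the MM energy computed without this torsion term; the helper names and the synthetic profile are hypothetical.

```python
import math
from typing import Sequence, Tuple


def torsion_shape(phi_deg: float, periodicity: int = 2, phase_deg: float = 180.0,
                  idivf: float = 1.0) -> float:
    """SMIRNOFF proper-torsion energy per unit k: (1 + cos(n*phi - phase)) / idivf."""
    return (1.0 + math.cos(math.radians(periodicity * phi_deg - phase_deg))) / idivf


def fit_single_k(phis_deg: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of target(phi) ~ c + k * torsion_shape(phi).

    The constant c absorbs the arbitrary zero of relative energies. Returns (k, c).
    """
    xs = [torsion_shape(p) for p in phis_deg]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(target) / len(target)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("dihedral grid does not vary the torsion term; k is undetermined")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, target))
    k = s_xy / s_xx
    return k, mean_y - k * mean_x


# Synthetic profile (kcal/mol) on a 15-degree grid, generated with k = 3.0 and offset 0.5.
phis = list(range(-180, 180, 15))
target = [0.5 + 3.0 * torsion_shape(p) for p in phis]
k, c = fit_single_k(phis, target)
print(f"fitted k = {k:.3f} kcal/mol, offset = {c:.3f} kcal/mol")
```

ForceBalance additionally handles regularization, multiple parameters, and geometry relaxation at each dihedral, none of which this sketch attempts.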
ForceBalance runs on WBO Interpolated Parameters
Parameter usage for the Rowley and substituted phenyl datasets with OpenFF v1.3.0 (plots: Rowley, Phenyl).
Parameter usage for the Rowley and substituted phenyl datasets with an FF containing only the proposed interpolated parameters (plots: Rowley, Phenyl).
Parameter usage for the Rowley and substituted phenyl datasets with an FF containing only the proposed TIG interpolated parameters (plots: Rowley, Phenyl).
Parameter usage for the Rowley and substituted phenyl datasets with an FF containing only the proposed it1-7 interpolated parameters (plots: Rowley, Phenyl).
Proposed WBO interpolated parameters from Chris Bayly:
General parameters:
TIG0 [*:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG1 [*:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG2 [*:1]~[#6X3:2]~[#7X2:3]~[*:4]
TIG3 [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG4 [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG5 [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]
TIG6 [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG7 [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]
TIG8 [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]
<Proper smirks="[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG0" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG1" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]~[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG2" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG3" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG4" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG5" idivf1="1.0"></Proper>
<Proper smirks="[#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG6" idivf1="1.0"></Proper>
<Proper smirks="[#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG7" idivf1="1.0"></Proper>
<Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG8" idivf1="1.0"></Proper>
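Each TIG entry gives k1 at bond order 1 (k1_bondorder1 = 1 kcal/mol) and at bond order 2 (k1_bondorder2 = 10 kcal/mol); the value applied to a given torsion is taken from the Wiberg bond order (WBO) of its central bond. The sketch below assumes the linear interpolation the SMIRNOFF specification describes for fractional bond orders and, as an additional assumption, uses the same straight line outside [1, 2]. The function names are illustrative only, not OpenFF toolkit API.

```python
import math


def interpolated_k(wbo: float, k_bondorder1: float, k_bondorder2: float) -> float:
    """Linear interpolation of k between bond orders 1 and 2.

    This sketch also applies the same line outside [1, 2] (extrapolation).
    """
    return k_bondorder1 + (k_bondorder2 - k_bondorder1) * (wbo - 1.0)


def tig_torsion_energy(phi_deg: float, wbo: float, k_bo1: float = 1.0, k_bo2: float = 10.0,
                       periodicity: int = 2, phase_deg: float = 180.0,
                       idivf: float = 1.0) -> float:
    """Energy (kcal/mol) of one TIG-style term: k(wbo)/idivf * (1 + cos(n*phi - phase))."""
    k = interpolated_k(wbo, k_bo1, k_bo2)
    return k / idivf * (1.0 + math.cos(math.radians(periodicity * phi_deg - phase_deg)))


for wbo in (1.0, 1.2, 1.5, 2.0):
    k = interpolated_k(wbo, 1.0, 10.0)
    print(f"WBO {wbo:.1f}: k1 = {k:.2f} kcal/mol, E(90 deg) = {tig_torsion_energy(90.0, wbo):.2f} kcal/mol")
```

With periodicity 2 and phase 180°, the term is zero at planar geometries (0° and 180°) and peaks at 2k at 90°, so more double-bond character in the central bond (higher WBO) yields a higher rotational barrier.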
Parameters for substituted phenyl dataset:
TIG9 [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]
TIG10 [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]
TIG11 [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O
TIG12 [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O
TIG13 [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]
TIG14 [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]
TIG15 [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG11" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG12" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG13" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG14" idivf1="1.0"></Proper>
<Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG15" idivf1="1.0"></Proper>
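All TIG entries share the same periodicity, phase, idivf, and k1_bondorder1/k1_bondorder2 values and differ only in smirks and id, so they could be generated from a list of (id, SMIRKS) pairs instead of being edited by hand. A minimal sketch using Python's standard `xml.etree.ElementTree`; it emits self-closing `<Proper ... />` elements, which are equivalent XML to the `<Proper ...></Proper>` form above. Writing the elements into an .offxml file is not shown, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def proper_element(param_id: str, smirks: str) -> ET.Element:
    """Build one <Proper> element with the attribute values shared by all TIG entries."""
    attrs = {
        "smirks": smirks,
        "periodicity1": "2",
        "phase1": "180.0 * degree",
        "k1_bondorder1": "1*kilocalories_per_mole",
        "k1_bondorder2": "10*kilocalories_per_mole",
        "id": param_id,
        "idivf1": "1.0",
    }
    return ET.Element("Proper", attrs)


def proper_block(params: Iterable[Tuple[str, str]]) -> str:
    """Serialize (id, smirks) pairs, one <Proper/> element per line."""
    return "\n".join(
        ET.tostring(proper_element(pid, smirks), encoding="unicode") for pid, smirks in params
    )


TIG9_15 = [
    ("TIG9", "[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]"),
    ("TIG10", "[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]"),
    ("TIG11", "[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O"),
    ("TIG12", "[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O"),
    ("TIG13", "[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]"),
    ("TIG14", "[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]"),
    ("TIG15", "[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]"),
]
print(proper_block(TIG9_15))
```

Keeping the SMIRKS list as the single source of truth makes typos like a dropped comma in `[#6X4,#1:4]` easier to spot and fix in one place.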
Experiment 1: Adding TIG1-8, general WBO interpolated parameters to OpenFF v1.3.0:
Workflow:
1. Start with a parameter (beginning with TIG0).
2. Add the parameter to the FF hierarchy, using the results from the FFs with only interpolated parameters and an inspection of the current FF parameters.
3. Generate the parameter usage plots for the phenyl and Rowley datasets.
4. Compare the plots to the parameter usage obtained with only interpolated parameters, to make sure the usage is reasonable.
5. Move to the next parameter and repeat from step 1 until all parameters have been added to the FF.
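Step 2 depends on how SMIRNOFF resolves overlapping parameters: when several torsion parameters match, the one that appears last in the file is assigned. That is why a general parameter placed after its more specific children drains their matches, and why TIG1a and TIG1b were placed at the bottom, after t83. The toy sketch below illustrates only this rule; the predicates are hypothetical stand-ins for real SMIRKS matching, and `t_ring` is a made-up child parameter.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Torsion = Dict[str, object]  # hypothetical, highly simplified torsion description
Param = Tuple[str, Callable[[Torsion], bool]]


def assign(torsion: Torsion, hierarchy: List[Param]) -> Optional[str]:
    """Return the id of the LAST parameter whose predicate matches (None if none match)."""
    winner = None
    for param_id, matches in hierarchy:
        if matches(torsion):
            winner = param_id
    return winner


def usage(torsions: List[Torsion], hierarchy: List[Param]) -> Counter:
    """Count how many torsions each parameter ends up covering."""
    return Counter(assign(t, hierarchy) for t in torsions)


def insert_after(hierarchy: List[Param], anchor_id: str, new_param: Param) -> List[Param]:
    """Return a new hierarchy with new_param placed right after anchor_id."""
    ids = [pid for pid, _ in hierarchy]
    i = ids.index(anchor_id)
    return hierarchy[: i + 1] + [new_param] + hierarchy[i + 1 :]


def is_cc(t: Torsion) -> bool:
    return t["central"] == ("C", "C")


tig0: Param = ("TIG0", is_cc)
t_ring: Param = ("t_ring", lambda t: is_cc(t) and bool(t["ring"]))

torsions: List[Torsion] = [
    {"central": ("C", "C"), "ring": False},
    {"central": ("C", "C"), "ring": True},
    {"central": ("C", "N"), "ring": False},
]

good = insert_after([tig0], "TIG0", t_ring)  # general first, specific child after it
bad = [t_ring, tig0]                         # general last: it overrides the child
print(usage(torsions, good))
print(usage(torsions, bad))
```

In the `bad` ordering the child parameter is never assigned, which is the same pattern as the "drained" usage seen in the parameter usage plots.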
Experiment Notes:
General WBO Interpolated Parameters | SMIRKS | Insertion in OpenFF v1.3.0 |
---|---|---|
TIG0 | [*:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: No change |
After adding TIG0: plots for the Rowley DS and Substituted Phenyl DS. Iteration 5: plot of TIG0 over all QCA data; only one point is left after being drained by the more specific parameters that come after it. |
TIG1 | Iteration 1: [*:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iterations 4 and 5: TIG1a [*:1]~[#6X3:2]@[#7X3,#7X3+1,#7X2,#7X2-1:3]~[*:4] TIG1b [*:1]~[#6X3:2]@[#6X3:3]~[*:4] TIG1c [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4] TIG1d [*:1]~[#6X3:2](~[#7X2])~[#7X3,#7X2-1,#7X2,#7X3+1:3]~[*:4] | Iteration 1:
Question: should we replace the t69 parameters with TIG1? The parameters t69 and t69a:
<Proper smirks="[*:1]~[#7X3,#7X2-1:2]-!@[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.624344674592e+00 * mole**-1 * kilocalorie" id="t69" idivf1="1.0"/>
<Proper smirks="[*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]" periodicity1="2" periodicity2="1" phase1="180.0 * degree" phase2="0.0 * degree" k1="1.802103402795e+00 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t69a" idivf1="1.0" idivf2="1.0"/>
Plots: Rowley DS, Substituted Phenyl DS. Iteration 2, based on suggestions from David Mobley:
Plots: Rowley DS, Substituted Phenyl DS. Iteration 5: TIG1a and TIG1b correspond to ring torsions that correlate with high barriers, and are placed at the bottom, after t83. TIG1c and TIG1d split TIG1 into two chemical series: (1) a carbonyl carbon bonded to a nitrogen and (2) formamidine-type molecules. Plots after splitting into TIG1c and TIG1d and placing TIG1d after TIG2: |
TIG2 | Iteration 1: [*:1]~[#6X3:2]~[#7X2:3]~[*:4] Iteration 2: [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4] Iteration 5: No change | Iteration 1:
*No change in the plots relative to the TIG1 experiment; this was expected because the Rowley and phenyl datasets do not use TIG2 in the experiment with only interpolated parameters. Iteration 2, based on feedback from David Mobley:
Plots: Substituted Phenyl DS, Rowley DS. *TIG2 now captures a chemical series that t72 used to capture; this looks much better than the previous parameter placement. Iteration 5: plot of TIG2 over all QCA data. |
TIG3 | [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: No change |
Parameter t48 is a more specific child parameter of TIG3:
Plots: Rowley DS, Substituted Phenyl DS. Experiment proposal from David Mobley:
Iteration 5: Plot of TIG3 over all QCA data |
TIG4 | [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: [#7X2:1]@[#6X3:2]~[#6X3:3]@[*:4] |
There is no obvious FF parameter that TIG4 could replace; t40 is the closest, but it sits too far up in the hierarchy, so replacing it would put TIG4 before TIG0 and the other TIG parameters. The t40 parameter:
<Proper smirks="[#6X4;r3:1]-;@[#6X4;r3:2]-[#6X3:3]=[#7X2:4]" periodicity1="3" periodicity2="2" periodicity3="1" phase1="0.0 * degree" phase2="180.0 * degree" phase3="180.0 * degree" k1="-3.973437766056e-01 * mole**-1 * kilocalorie" k2="1.595780918938e+00 * mole**-1 * kilocalorie" k3="1.436787426846e-01 * mole**-1 * kilocalorie" id="t40" idivf1="1.0" idivf2="1.0" idivf3="1.0"/>
Plots: Rowley DS, Substituted Phenyl DS. Iteration 5: when this parameter is plotted over all QCA data, a few molecules that do not group with the others make the slope negative. |
TIG5 | [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4] Iteration 5: TIG5a [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3](=[#8X1])~[*:4] TIG5b [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#6X3:3]@[*:4] |
Plots: Rowley DS, Substituted Phenyl DS. Iteration 5: as with the TIG1 split, TIG5 is divided into two parameters corresponding to (1) carbonyls with a side carbon and (2) biaryls, or an aryl on one side of the central bond and a dangling chain on the other. Plots after the split: TIG5a on the left, TIG5b on the right. |
TIG6 | Iteration 1: [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iteration 5: No change | Iteration 1:
It is unclear whether any existing parameters should be replaced by TIG6. Plots: Rowley DS, Substituted Phenyl DS. *If this plot perfectly reproduced the FF with only the TIG parameters, we would see only TIG3, TIG5, and TIG6; t96 and t97 also appear. This might indicate that the parameter placement is incorrect, or that t96 or t97 needs an interpolated parameter. Iteration 2, proposed by David Mobley:
Parameter assignment plots rerun with TIG6 iteration 2 in a modified openff-1.3.0-tig.offxml: Rowley DS, Phenyl DS. Iteration 5: plot of TIG6 over all QCA data. |
TIG7 | Iteration 1: [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] | Iteration 1:
Plots: Rowley DS, Substituted Phenyl DS. Iteration 2:
Parameter assignment plots rerun with TIG7 iteration 2 in a modified openff-1.3.0-tig.offxml: Rowley, Phenyl. Iteration 5: plot of TIG7 over all QCA data. |
TIG8 | Iteration 1: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4] Iteration 2: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] Iteration 5: [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] | Iteration 1:
Rowley DS: *exactly reproduces the FF results obtained with only the TIG parameters. Substituted Phenyl DS: *same issue as before with t96 and t97. Iteration 2:
Parameter assignment rerun: Rowley, Phenyl. Iteration 5: TIG8 made more specific to capture biaryl or bridged-aryl molecules; plot over the entire set of QCA TorsionDrive datasets. |
1. Iteration 1: the resulting FF from the experiment above
2. Iteration 2: the resulting FF from the second round of the experiment
3. Iteration 3: addition of TIG1a, covering ring chemistry (openff-1.3.0-tig-2.0)
4. Iteration 4: TIG1a placed after t83 and t69a removed (openff-1.3.0-tig-2.2)
5. Iteration 5: parameters split or made more specific based on chemical series (openff-1.3.0-tig-3.0)
Iteration 4 FF plots: https://docs.google.com/presentation/d/1dOUi9zz5UvWOUVLmu-KqcfH3KGay_NDTq7bIaMZ7LGk/edit?usp=sharing
Iteration 5 plots and chemical series analysis: https://docs.google.com/presentation/d/1tqWBj_5TmtHuR77VkdZ1wQ0Y6eEbRx64hU5XcWWzqp0/edit?usp=sharing
Experiment 2: Adding TIG9-15, general WBO interpolated parameters to OpenFF v1.3.0:
The OFFXML definitions for TIG9-TIG15 are the same as those listed above under "Parameters for substituted phenyl dataset".
Parameter | SMIRKS | Insertion in OpenFF v1.3.0 |
---|---|---|
TIG9 | [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4] | After t78. *Rowley and phenyl DS parameter usage is the same as with OpenFF v1.3.0 (plot: Phenyl DS) |
TIG10 | [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4] | |
TIG11 | [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O | |
TIG12 | [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O | |
TIG13 | [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4] | |
TIG14 | [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4] | |
TIG15 | [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] | |
Number of overlaps with the Lim-Mobley benchmarking set, by dataset:
Dataset | Overlaps |
---|---|
Fragment Stability Benchmark | 0 |
OpenFF Fragmenter Validation 1.0 | 3 |
OpenFF Full TorsionDrive Benchmark 1 | 56 |
OpenFF Gen 2 Torsion Set 1 Roche 2 | 3 |
OpenFF Gen 2 Torsion Set 2 Coverage 2 | 0 |
OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2 | 39 |
OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2 | 209 |
OpenFF Gen 2 Torsion Set 5 Bayer 2 | 0 |
OpenFF Gen 2 Torsion Set 6 Supplemental 2 | 0 |
OpenFF Group1 Torsions 2 | 0 |
OpenFF Group1 Torsions 3 | 0 |
OpenFF Primary Benchmark 1 Torsion Set | 238 |
OpenFF Primary Benchmark 2 Torsion Set | 400 |
OpenFF Primary TorsionDrive Benchmark 1 | 10 |
OpenFF Rowley Biaryl v1.0 | 0 |
OpenFF Substituted Phenyl Set 1 | 1 |
OpenFF-benchmark-ligand-fragments-v1.0 | 1 |
Pfizer Discrepancy Torsion Dataset 1 | 56 |
SMIRNOFF Coverage Torsion Set 1 | 0 |
TorsionDrive Paper | 0 |
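Tagging and counting overlaps of this kind reduces to set membership once every molecule has a comparable identifier. The sketch assumes such identifiers (for example, standard InChIKeys) have already been computed; that step is not shown. It counts unique molecules, and the numbers above may have been counted differently (for example, per torsion record). The dataset names and keys below are placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def tag_overlaps(molecule_ids: Iterable[str], benchmark_ids: Iterable[str]) -> List[Tuple[str, bool]]:
    """Pair each molecule identifier with True if it also appears in the benchmark set."""
    benchmark = set(benchmark_ids)
    return [(mid, mid in benchmark) for mid in molecule_ids]


def count_overlaps(datasets: Dict[str, Iterable[str]], benchmark_ids: Iterable[str]) -> Dict[str, int]:
    """Number of unique identifiers each dataset shares with the benchmark set."""
    benchmark = set(benchmark_ids)
    return {name: len(set(ids) & benchmark) for name, ids in datasets.items()}


# Placeholder identifiers; real use would compare precomputed keys for each molecule.
benchmark_keys = ["KEY-A", "KEY-B", "KEY-C"]
datasets = {
    "dataset 1": ["KEY-A", "KEY-X", "KEY-A"],
    "dataset 2": ["KEY-Y", "KEY-Z"],
}
print(count_overlaps(datasets, benchmark_keys))
print(tag_overlaps(datasets["dataset 1"], benchmark_keys))
```

The per-molecule tags from `tag_overlaps` are what a pre-processing step would use to exclude benchmark molecules from a fitting set, while `count_overlaps` gives a per-dataset summary.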