This page contains an overview of the projects/studies aimed at improving the FF optimization process in general. Specific features and decisions made for each FF optimization cycle should be recorded in FF releases.

Objectives

Team

Main objective: Create general WBO interpolated parameters in the force field

Project driver: David Mobley Jessica Maat (Deactivated) Pavan Behara

Objectives for 2020:

Team members: David Dotson Jeffrey Wagner Chaya Stern (Deactivated) Trevor Gokey

Current Projects

Driver

Project / Study

Pavan Behara Trevor Gokey David Mobley Jessica Maat (Deactivated)

  • ForceBalance optimization of Wiberg bond order (WBO) interpolated parameters

  • Analysis of QCA datasets for WBO versus torsion barrier height

Scientific Questions

Infrastructure Requirements

Desired features:

Software:

Action items

Two experiments:

ForceBalance runs on WBO Interpolated Parameters

Parameter usage for the Rowley and substituted phenyl datasets using OpenFF v1.3.0:

Rowley:

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed interpolated parameters:

Rowley:

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed TIG interpolated parameters:

Rowley:

Phenyl:

Parameter usage for the Rowley and substituted phenyl datasets using an FF with only the proposed it1-7 interpolated parameters:

Rowley:

Phenyl:

Proposed WBO interpolated parameters from Chris Bayly:

General parameters:

    TIG0      [*:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG1      [*:1]~[#6X3:2]~[#7X3:3]~[*:4]

    TIG2      [*:1]~[#6X3:2]~[#7X2:3]~[*:4]

    TIG3      [#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG4      [#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] 

    TIG5      [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]

    TIG6      [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]

    TIG7      [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] 

    TIG8      [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]

    <Proper smirks="[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG0" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG1" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]~[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG2" idivf1="1.0"></Proper>
    <Proper smirks="[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG3" idivf1="1.0"></Proper>
    <Proper smirks="[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG4" idivf1="1.0"></Proper>
    <Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG5" idivf1="1.0"></Proper>
    <Proper smirks="[#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG6" idivf1="1.0"></Proper>
    <Proper smirks="[#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG7" idivf1="1.0"></Proper>
    <Proper smirks="[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG8" idivf1="1.0"></Proper>
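Each entry above carries two force constants, `k1_bondorder1` and `k1_bondorder2`, instead of a single `k1`. The sketch below shows one way to read those attributes and compute a force constant for a given WBO. It assumes plain linear interpolation (and extrapolation) between bond orders 1 and 2; the function name `interpolated_k` and the unit parsing are illustrative, not toolkit code.

```python
import xml.etree.ElementTree as ET

# TIG0 entry copied from the list above.
TIG0_XML = (
    '<Proper smirks="[*:1]~[#6X3:2]~[#6X3:3]~[*:4]" periodicity1="2" '
    'phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" '
    'k1_bondorder2="10*kilocalories_per_mole" id="TIG0" idivf1="1.0"></Proper>'
)


def interpolated_k(proper_xml, wbo):
    """Return k1 (kcal/mol) at the given WBO.

    Assumption: k varies linearly between the values given at
    bond order 1 and bond order 2.
    """
    elem = ET.fromstring(proper_xml)
    # Attribute values look like "10*kilocalories_per_mole"; keep the number.
    k_bo1 = float(elem.get("k1_bondorder1").split("*")[0])
    k_bo2 = float(elem.get("k1_bondorder2").split("*")[0])
    # Slope is (k_bo2 - k_bo1) per unit of bond order, anchored at WBO = 1.
    return k_bo1 + (k_bo2 - k_bo1) * (wbo - 1.0)


k_mid = interpolated_k(TIG0_XML, 1.5)
```

With the starting values of 1 and 10 kcal/mol, a central bond with a WBO of 1.5 would get a force constant of 5.5 kcal/mol under this assumption.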

Parameters for substituted phenyl dataset:

    TIG9     [*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]

    TIG10      [*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]

    TIG11      [*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O

    TIG12      [*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O

    TIG13     [*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]

    TIG14      [*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]

    TIG15      [*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] 

    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG11" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG12" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG13" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG14" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG15" idivf1="1.0"></Proper>

Experiment 1: Adding TIG0-8, the general WBO interpolated parameters, to OpenFF v1.3.0:

Workflow:

  1. Start with the TIG0 parameter

  2. Add the parameter to the FF hierarchy, using the results from the FFs that contain only interpolated parameters and inspection of the current FF parameters

  3. Generate the parameter usage plots for the phenyl and Rowley datasets

  4. Compare the plots to the parameter usage results obtained with only interpolated parameters to make sure the usage is reasonable

  5. Move to the next parameter and repeat from step 2 until all parameters are added to the FF
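Step 2 of the workflow amounts to placing a new `<Proper>` element at a chosen position in the torsion section of the offxml file. A minimal sketch with the standard library follows. The three-entry section is a stub with placeholder entries, not the real OpenFF v1.3.0 file, and the helper name `insert_after` is hypothetical.

```python
import xml.etree.ElementTree as ET


def insert_after(section, new_param, anchor_id):
    """Insert new_param directly after the child whose id equals anchor_id."""
    for index, child in enumerate(list(section)):
        if child.get("id") == anchor_id:
            section.insert(index + 1, new_param)
            return
    raise ValueError(f"no parameter with id {anchor_id!r}")


# Stub torsion section: ids only, SMIRKS and force constants omitted.
section = ET.fromstring(
    '<ProperTorsions><Proper id="t77"/><Proper id="t78"/>'
    '<Proper id="t79"/></ProperTorsions>'
)

# TIG6 (iteration 1 SMIRKS) with the starting interpolation values.
tig6 = ET.Element("Proper", {
    "smirks": "[#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]",
    "periodicity1": "2",
    "phase1": "180.0 * degree",
    "k1_bondorder1": "1*kilocalories_per_mole",
    "k1_bondorder2": "10*kilocalories_per_mole",
    "id": "TIG6",
    "idivf1": "1.0",
})

insert_after(section, tig6, "t78")
ids = [child.get("id") for child in section]
```

After the call, the order of ids in the stub section is t77, t78, TIG6, t79. Position matters because parameters that come later in the section take precedence over earlier ones.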

Experiment Notes:

General WBO Interpolated Parameters

SMIRKS

Insertion in v1.3.0 OpenFF

TIG0

[*:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5: No change

  • Replace t43-t45

After addition of TIG0:

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

Plot of TIG0 over all QCA data. Only one point is left, because the more specific parameters that come after TIG0 in the hierarchy drain the rest.
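The draining effect follows from the rule that the last matching parameter in the hierarchy is the one assigned. The toy sketch below illustrates this with Python predicates standing in for SMIRKS matching. The torsion descriptors and the ids `specific_a` and `specific_b` are hypothetical and carry no chemistry from the real force field.

```python
from collections import Counter


def assign(parameters, torsion):
    """Return the id of the last parameter in the list that matches."""
    matched = None
    for param_id, matches in parameters:
        if matches(torsion):
            matched = param_id  # later matches override earlier ones
    return matched


# Ordered hierarchy: the general parameter first, specific ones after it.
parameters = [
    ("TIG0", lambda t: True),                    # matches every torsion here
    ("specific_a", lambda t: t["in_ring"]),      # hypothetical child
    ("specific_b", lambda t: t["has_carbonyl"]), # hypothetical child
]

# Hypothetical torsions described by two boolean features.
torsions = [
    {"in_ring": True, "has_carbonyl": False},
    {"in_ring": False, "has_carbonyl": True},
    {"in_ring": True, "has_carbonyl": True},
    {"in_ring": False, "has_carbonyl": False},
]

usage = Counter(assign(parameters, t) for t in torsions)
```

Although the general parameter matches all four toy torsions, it ends up assigned to only one of them, because the later entries claim the other three.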

TIG1

Iteration 1: [*:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 4,5:

TIG1a [*:1]~[#6X3:2]@[#7X3,#7X3+1,#7X2,#7X2-1:3]~[*:4]

TIG1b [*:1]~[#6X3:2]@[#6X3:3]~[*:4]

TIG1c [*:1]~[#6X3:2](=[#8X1])~[#7X3,#7X2-1:3]~[*:4]

TIG1d [*:1]~[#6X3:2](~[#7X2])~[#7X3,#7X2-1,#7X2,#7X3+1:3]~[*:4]

Iteration 1:

  • Below t69 and t69a

Question:

  • Should we replace the t69 parameters with TIG1?

The parameters t69 and t69a:

    <Proper smirks="[*:1]~[#7X3,#7X2-1:2]-!@[#6X3:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.624344674592e+00 * mole**-1 * kilocalorie" id="t69" idivf1="1.0"/>
    <Proper smirks="[*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]" periodicity1="2" periodicity2="1" phase1="180.0 * degree" phase2="0.0 * degree" k1="1.802103402795e+00 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t69a" idivf1="1.0" idivf2="1.0"/>

Rowley DS:

Substituted Phenyl DS:

Iteration 2 based on suggestions from David Mobley:

  • Change TIG1 to [*:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Replace t69 with TIG1

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

TIG1a and TIG1b correspond to ring torsions that correlate with high barriers. They are placed at the bottom of the hierarchy, after t83.

TIG1c and TIG1d split TIG1 into two chemical series:

(1) Carbonyls with a nitrogen

(2) Formamidine-type molecules

After splitting TIG1 into TIG1c and TIG1d and placing TIG1d after TIG2, the plots are:

TIG2

Iteration 1: [*:1]~[#6X3:2]~[#7X2:3]~[*:4]

Iteration 2: [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4]

Iteration 5: No change

Iteration 1:

  • After t80, which is the most specific parameter with “[#6X3:2] [#7X2:3]”

*No change in the plots from the TIG1 experiment. This was expected, because the Rowley and phenyl datasets do not use TIG2 in the experiment with only interpolated parameters.

Iteration 2 based on feedback from David Mobley:

  • Generalize parameter to [*:1]~[#6X3:2]~[#7X2,#7X3+1:3]~[*:4]

  • Replace t76-78 with TIG2

Substituted phenyl DS:

Rowley dataset:

*TIG2 now captures a chemical series that t72 used to capture. This looks much better than the previous parameter placement.

Iteration 5:

Plot of TIG2 over all QCA data

TIG3

[#6X3:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5: No change

  • Replace t48 with TIG3

Parameter t48 is a more specific child parameter of TIG3:

<Proper smirks="[#6X3:1]=[#6X3:2]-[#6X3:3]=[#8X1:4]" periodicity1="2" periodicity2="3" phase1="180.0 * degree" phase2="0.0 * degree" k1="3.980018770084e-01 * mole**-1 * kilocalorie" k2="-3.938620201049e-01 * mole**-1 * kilocalorie" id="t48" idivf1="1.0" idivf2="1.0"/>

Rowley DS:

Substituted Phenyl DS:

Experiment proposal from David Mobley:

  • Retain t48 below TIG3 and (maybe) add interpolation.

  • Run this experiment in fitting, or see how it would separate things once more data is available for these plots.

Iteration 5:

Plot of TIG3 over all QCA data

TIG4

[#7X2:1]~[#6X3:2]~[#6X3:3]~[*:4] 

Iteration 5: [#7X2:1]@[#6X3:2]~[#6X3:3]@[*:4]

  • After TIG3 parameter, which is the closest parameter to TIG4.

There are no obvious parameters in the FF that TIG4 could replace. t40 is the closest, but it sits too far up in the hierarchy; placing TIG4 there would make it come before TIG0, and so on.

t40 parameter:

    <Proper smirks="[#6X4;r3:1]-;@[#6X4;r3:2]-[#6X3:3]=[#7X2:4]" periodicity1="3" periodicity2="2" periodicity3="1" phase1="0.0 * degree" phase2="180.0 * degree" phase3="180.0 * degree" k1="-3.973437766056e-01 * mole**-1 * kilocalorie" k2="1.595780918938e+00 * mole**-1 * kilocalorie" k3="1.436787426846e-01 * mole**-1 * kilocalorie" id="t40" idivf1="1.0" idivf2="1.0" idivf3="1.0"/>

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

When this parameter is plotted over all QCA data, some molecules that do not group with the others make the slope negative. Making the parameter more specific to biaryl-type molecules alleviates this problem. After adding the data from ‘OpenFF Group 1 Torsions’, we still see a downtrend for this parameter.
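The sign flip described above can be reproduced with a least-squares fit of barrier height against WBO. The sketch below uses synthetic numbers chosen only to show the effect; they are not QCA data, and the helper name `slope` is hypothetical.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, illustrative values only (WBO, barrier height in kcal/mol).
series_wbo = [1.0, 1.1, 1.2, 1.3]
series_barrier = [2.0, 4.0, 6.0, 8.0]

# A small group with high WBO but low barriers that does not follow the series.
stray_wbo = [1.5, 1.6]
stray_barrier = [1.0, 0.5]

slope_series = slope(series_wbo, series_barrier)
slope_all = slope(series_wbo + stray_wbo, series_barrier + stray_barrier)
```

In this toy case the series alone has a positive slope, while adding the two stray points makes the overall slope negative. Narrowing the SMIRKS so that the stray molecules are assigned elsewhere is the equivalent of fitting the series alone.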

TIG5

[#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3]~[*:4]

Iteration 5:

TIG5a [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#6X3:3](=[#8X1])~[*:4]

TIG5b [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#6X3:3]@[*:4]

  • TIG5 was placed after TIG4; both parameters cover chemistries of sp2 carbon-carbon bonds.

Rowley DS:

Substituted Phenyl DS:

Iteration 5:

Similar to the TIG1 split, TIG5 is divided into two parameters that correspond to:

(1) Carbonyls with a side carbon

(2) Biaryls, or an aryl on one side and a dangling chain on the other side of the central bond

After the split, the plots are:

TIG5a on the left, TIG5b on the right

TIG6

Iteration 1: [#6X3:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 5: No change

Iteration 1:

  • After t78, there are a lot of parameters that cover #7X3 - #6 chemistry.

It’s unclear if any parameters should be replaced by these.

Rowley DS:

Substituted Phenyl DS:

*If this plot were to perfectly reproduce the FF with only the TIG parameters, we should see only TIG3, TIG5, and TIG6 in this plot. However, t96 and t97 also appear.

This might indicate that the parameter placement is not correct, or that t96 or t97 needs an interpolated parameter.
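The check described in this note is a set comparison between the parameter ids expected from the TIG-only force field and the ids seen in the usage plot. A minimal sketch follows, with both sets typed in by hand from the note rather than computed from a force field.

```python
# Ids expected if the plot reproduced the TIG-only force field.
expected = {"TIG3", "TIG5", "TIG6"}

# Ids reported in the usage plot for the substituted phenyl dataset.
observed = {"TIG3", "TIG5", "TIG6", "t96", "t97"}

# Parameters that still capture torsions the TIG parameters were meant to cover.
unexpected = sorted(observed - expected)

# Expected parameters that never show up in the plot.
missing = sorted(expected - observed)
```

Here `unexpected` holds t96 and t97 and `missing` is empty, which matches the observation that the TIG parameters are all used but two older parameters still take some of the torsions.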

Iteration 2 proposed by David Mobley:

  • Change parameter to [#6X3:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Place TIG6 after TIG1

Rerunning the parameter assignment plots with TIG6 iteration 2 in the modified openff-1.3.0-tig.offxml:

Rowley DS:

Phenyl DS:

Iteration 5:

Plot of TIG6 over all QCA data

TIG7

Iteration 1: [#7X2:1]~[#6X3:2]~[#7X3:3]~[*:4] 

Iteration 2: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4] 

Iteration 1:

  • After TIG6.

Rowley DS:

Substituted Phenyl DS:

Iteration 2:

  • Still placed after TIG6, but TIG6 has been moved in the FF as outlined above

  • Parameter changed to: [#7X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Rerunning the parameter assignment plots with TIG7 iteration 2 in the modified openff-1.3.0-tig.offxml:

Rowley:

Phenyl:

Iteration 5:

Plot of TIG7 over all QCA data

TIG8

Iteration 1: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3:3]~[*:4]

Iteration 2: [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 5: [#7X3,#8X2,#16X2:1]@[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

Iteration 1:

  • TIG8 placed after TIG7.

  • Parameters TIG6-8 are in a similar hierarchy.

Rowley DS:

*Exactly reproduces the results from the FF using only TIG parameters

Substituted Phenyl DS:

*Same case as before with t96 and t97

Iteration 2:

  • Parameter changed to [#7X3,#8X2,#16X2:1]~[#6X3:2]~[#7X3,#7X2-1:3]~[*:4]

  • Placed after the new placement of TIG7

Rerunning parameter assignment:

Rowley:

Phenyl:

Iteration 5:

TIG8 was made more specific to capture biaryl or bridged-aryl type molecules. The plot below covers the entire set of QCA torsiondrive datasets.

1. This is the resulting FF from the prior experiment:

2. 2nd iteration, resulting FF from the prior experiment:

3. 3rd iteration, addition of TIG1a covering ring chemistry, openff-1.3.0-tig-2.0

4. 4th iteration, TIG1a after t83 and remove t69a, openff-1.3.0-tig-2.2

5. 5th iteration, parameter splits or making parameters more specific based on chemical series, openff-1.3.0-tig-3.0

Experiment 2: Adding TIG9-15, the WBO interpolated parameters for the substituted phenyl dataset, to OpenFF v1.3.0:

    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG9" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG10" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG11" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG12" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG13" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG14" idivf1="1.0"></Proper>
    <Proper smirks="[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4]" periodicity1="2" phase1="180.0 * degree" k1_bondorder1="1*kilocalories_per_mole" k1_bondorder2="10*kilocalories_per_mole" id="TIG15" idivf1="1.0"></Proper>

    TIG9    

[*:1]~[#6X3:2]-[#7X3:3]-[#6X4,#1:4]

After t78

*Rowley and phenyl DS parameter usage is the same as with OpenFF v1.3.0

Phenyl DS:

    TIG10     

[*:1]~[#6X3:2]-[#8X2:3]-[#6X4,#1:4]

    TIG11     

[*:1]~[#6X3:2]-[#7X3:3]-[#6X3:4]=O

    TIG12     

[*:1]~[#6X3:2]-[#8X2:3]-[#6X3:4]=O

    TIG13    

[*:1]~[#6X3:2]-[#7X3:3](~[#8X1])[#8X1:4]

    TIG14     

[*:1]~[#6X3:2]-[#6X3:3](-[#8H1])=[#8X1:4]

    TIG15     

[*:1]~[#6X3:2]-[#6X3:3](~[#8X1])[#8X1:4] 

Summary of the number of overlaps between different datasets and the Lim Mobley benchmark set: