2021-09-28 Chemical Perception meeting notes

Date

Sep 28, 2021

Participants

  • @Trevor Gokey

Goals

  •  

Discussion topics

Time

Item

Presenter

Notes

 

The effect of dataset quality on force field design: application to alkanes

 

@Trevor Gokey

  • [started taking notes after slide 5/6]

  • Combining SMARTS patterns (here, for bonds), we see that some SMARTS patterns are shared by several molecules, while others occur in only one molecule

  • The proposed solution is to take a random selection of N molecules for each unique SMARTS pattern

  • Repeat this for angle, torsions, etc…

  • [lengthy discussion between CB and TG, missed this, sorry]

  • Results for Tier 1 score:

    • Normally we group the score by molecule

    • Now we group the score (i.e. the objective function) by SMARTS pattern [I think]

  • Start fitting procedure with initial parameters from QM averages (see slide 11)

  • slide 13 results:

    • for ring dataset

    • first splits out CCH angle, then ring 4 and 5 bonds

  • slide 15 results:

    • greedy approach

    • first splits out ring 3 angle

    • then combine HCH angle

    • CB: Did it ever split off the CCH angle?

    • TG: No

    • CB: In steps 1-4, your code tries to figure out ring size, then it thinks “we are done”. Once we distinguish ring angles, we don’t need to distinguish anything else, is that right?

    • TG: Right

    • CB: I find that interesting and plausible. Given your dataset, this makes sense. As we know, with ring 3, there are 2 weird angles (one in the ring and the exo-cyclic angle). I am not seeing any distinction between these here.

    • TG: That is correct.

    • CB: It looks like for the endo-cyclic rings, that is not about angle-valence parameters, it’s about the geometry of the three-membered ring. It seems your program finds this. Your method now finds how special these angles are. That is a very good result.

    • CB: Both approaches seem to identify different chemistries. Both seem viable though. Both approaches seem to exclude distracting information and identify important patterns [again referring to 3-membered rings].

    • CB: Idea of using self-consistent iteration between different approaches

    • TG (slide 16): Hydrogen information improves training performance. Is this only because we fit more molecules? If we randomly add molecules, we show that the variance actually increases.

    • CB: Want to include an alternative interpretation. It is not only the size of the dataset, it is also the composition.
      So, your work says what the best set of parameters is to describe a large pool of compounds, but it introduces a bias.
      Also, given a set of parameters, what is the best set of molecules to train on?

    • TG: We can figure out what’s not in our datasets.

    • CB: With WBO, what are the best molecules to sample for a particular parameter?

  • Slide 20/21

    • CB: So, your method does better on a specialized set of alkanes with your special set of parameters.

    • DM: You can train Sage/Parsley on this subset of alkanes with and without introducing the extra parameters.

  • CB: Some insight from the WBO side of things: for torsions you’re looking at the four atoms in the dihedral, but in the WBO work the ortho substituents (the 5th or 6th atoms) affect the results. So, the nonbonded effects may completely confound your designed dataset.
    So, one thing you can test is to pick molecules with strong electrostatics, like t-butyl with 3 substituted, etc.

  • DM: Also, we may still gain a lot by not including those effects.
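
[Editor's sketch, not from the slides] The two ideas noted above, drawing N random molecules per unique SMARTS pattern and grouping the score by SMARTS pattern rather than by molecule, could look roughly like the Python below. The function names, the input layout (pairs of pattern and molecule name), and the use of a mean squared error as the per-pattern score are all assumptions made for illustration; TG's actual code and objective function may differ.

```python
import random
from collections import defaultdict


def sample_per_pattern(matches, n, seed=0):
    """Pick up to n random molecules for each unique SMARTS pattern.

    matches: iterable of (smarts, molecule_name) pairs (hypothetical layout).
    Returns a dict mapping each SMARTS to a sorted list of chosen molecules.
    """
    by_pattern = defaultdict(set)
    for smarts, mol in matches:
        by_pattern[smarts].add(mol)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = {}
    for smarts in sorted(by_pattern):
        mols = sorted(by_pattern[smarts])
        # A pattern seen in fewer than n molecules keeps all of them.
        k = min(n, len(mols))
        picked[smarts] = sorted(rng.sample(mols, k))
    return picked


def score_by_pattern(residuals):
    """Average squared error per SMARTS pattern instead of per molecule.

    residuals: iterable of (smarts, molecule_name, squared_error) triples.
    The mean squared error is an assumed stand-in for the real objective.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for smarts, _mol, err in residuals:
        totals[smarts] += err
        counts[smarts] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical bond patterns and molecules, for illustration only.
example_matches = [
    ("[#6X4:1]-[#6X4:2]", "ethane"),
    ("[#6X4:1]-[#6X4:2]", "propane"),
    ("[#6X4:1]-[#6X4:2]", "butane"),
    ("[#6X4;r3:1]-[#6X4;r3:2]", "cyclopropane"),
]
example_pick = sample_per_pattern(example_matches, n=2)
```

With this layout, a pattern that occurs in only one molecule (the ring-3 bond here) always keeps that molecule, while a widely shared pattern is capped at N, so common chemistry does not dominate the training set. The same loop would be repeated for angles, torsions, etc.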









Action items

Decisions