Date

28 Sep 2021

Participants

Trevor Gokey

Goals

Discussion topics

Time

Item

Presenter

Notes

The effect of dataset quality on force field design: application to alkanes

File: Chemper Call Meeting 2021-09-29.pptx

Trevor Gokey

[Notes start after slide 5/6]
Combining SMARTS patterns (here, for bonds), we see that some molecules share SMARTS patterns, but some patterns occur in only one molecule
Proposed solution would be to take a random selection of N molecules for each unique SMARTS pattern
Repeat this for angles, torsions, etc.
[Lengthy discussion between CB and TG; not captured in these notes]
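The sampling proposal above can be sketched in a few lines. This is only an illustration, not the code shown in the slides: the function name `sample_per_pattern`, the `(smarts, molecule_id)` input format, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_pattern(matches, n, seed=0):
    """Pick up to n random molecules for each unique SMARTS pattern.

    matches: iterable of (smarts, molecule_id) pairs (hypothetical format).
    Returns {smarts: sorted list of at most n molecule ids}.
    """
    # Collect the distinct molecules that match each pattern.
    by_pattern = defaultdict(set)
    for smarts, molecule_id in matches:
        by_pattern[smarts].add(molecule_id)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = {}
    for smarts, molecules in by_pattern.items():
        pool = sorted(molecules)
        # Patterns that occur in fewer than n molecules keep all of them.
        picked[smarts] = sorted(rng.sample(pool, min(n, len(pool))))
    return picked


bond_matches = [
    ("[#6:1]-[#6:2]", "ethane"),
    ("[#6:1]-[#6:2]", "propane"),
    ("[#6:1]-[#6:2]", "butane"),
    ("[#6:1]-[#1:2]", "methane"),
    ("[#6:1]-[#1:2]", "ethane"),
]
selection = sample_per_pattern(bond_matches, n=2)
```

The same function would be called again with the angle and torsion matches.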
Results for Tier 1 score:
- Normally we group the score by molecule
- Now we group the score (i.e. the objective function) by SMARTS pattern [I think]
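As a rough illustration of the two groupings, the sketch below sums squared residuals either per molecule or per SMARTS pattern. The record layout `(molecule, smarts, residual)`, the sum-of-squares form of the score, and the name `grouped_score` are assumptions for the example; the notes do not state how the Tier 1 score is actually computed.

```python
from collections import defaultdict


def grouped_score(records, key_index):
    """Sum squared residuals per group.

    records: (molecule, smarts, residual) tuples (hypothetical layout).
    key_index: 0 to group by molecule, 1 to group by SMARTS pattern.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2] ** 2
    return dict(totals)


records = [
    ("ethane", "[#6:1]-[#6:2]", 0.5),
    ("ethane", "[#6:1]-[#1:2]", 1.0),
    ("propane", "[#6:1]-[#6:2]", 2.0),
]
by_molecule = grouped_score(records, key_index=0)
by_smarts = grouped_score(records, key_index=1)
```

The total over all groups is the same either way; only the per-group contributions differ.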
Start the fitting procedure with initial parameters from QM averages (see slide 11)
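One possible reading of "initial parameters from QM averages" is that each parameter's starting equilibrium value is the mean of the QM-optimized geometry values it covers. The sketch below assumes that reading; the function name, the input format, and the numbers are made up for illustration and do not come from slide 11.

```python
from statistics import fmean


def initial_equilibrium_values(measurements):
    """Average QM geometry measurements per SMARTS pattern.

    measurements: (smarts, value) pairs, e.g. QM bond lengths in Angstrom
    (hypothetical format and units).
    """
    grouped = {}
    for smarts, value in measurements:
        grouped.setdefault(smarts, []).append(value)
    return {smarts: fmean(values) for smarts, values in grouped.items()}


# Illustrative numbers only, not results from the meeting.
qm_bond_lengths = [
    ("[#6:1]-[#6:2]", 1.52),
    ("[#6:1]-[#6:2]", 1.54),
    ("[#6:1]-[#1:2]", 1.09),
]
initial = initial_equilibrium_values(qm_bond_lengths)
```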
Slide 13 results:
- for the ring dataset
- first splits out the CCH angle, then the ring 4 and 5 bonds
Slide 15 results:
- greedy approach
- first splits out the ring 3 angle
- then combines the HCH angle
- CB: Did it ever split off the CCH angle?
- TG: No
- CB: In steps 1-4, your code tries to figure out ring size, then it thinks “we are done”. Once we distinguish ring angles, we don’t need to distinguish anything else, is that right?
- TG: Right
- CB: I find that interesting and plausible. Given your dataset, this makes sense. As we know, with ring 3 there are 2 weird angles (one in the ring and the exocyclic angle). I am not seeing any distinction between these here.
- TG: That is correct.
- CB: It looks like for the endocyclic ring angles, this is not about angle-valence parameters, it’s about the geometry of the three-membered ring. It seems your program finds this. Your method now finds how special these angles are. That is a very good result.
- CB: The two approaches seem to identify different chemistries, but both seem viable. Both seem to exclude distracting information and identify important patterns [again referring to 3-membered rings].
- CB: Idea of using self-consistent iteration between different approaches
- TG (slide 16): Hydrogen information improves training performance. Is this only because we fit more molecules? If we add molecules randomly, we show that we actually increase variance.
- CB: I want to offer an alternative interpretation. It is not only the size of the dataset, it is also the composition.
  So, your work says what the best set of parameters is to describe a large pool of compounds, but it introduces a bias.
  Also, given a set of parameters, what is the best set of molecules to train on?
- TG: We can figure out what’s not in our datasets.
- CB: With WBO, what are the best molecules to sample for a particular parameter?
Slide 20/21
- CB: So, your method does better on a specialized set of alkanes with your special set of parameters.
- DM: You can train Sage/Parsley on this subset of alkanes with and without introducing the extra parameters.
- CB: Some insight from the WBO side of things: for torsions you’re looking at the four atoms in the dihedral, but in the WBO work the ortho substituents (the 5th or 6th atoms) affect the results. So, the nonbonded effects may completely confound your designed dataset.
  So, one thing you can test is to pick molecules with strong electrostatics, like t-butyl with 3 substituted, etc.
- DM: Also, we may still gain a lot by not including those effects.
