2020-10-30 Meeting notes

Date

13 Nov 2020

Item

Presenter

Notes

FB gradients and chemical space description

Trevor Gokey

Trevor Gokey will add link to slides here

AlkEthOH dataset is now on QCArchive (QCA)
wants to use data-driven chemical perception based on FB gradients
enumerating types in an automated manner is important
encoding SMARTS using bits
physical properties (geometries, normal modes, …) used to fit FFs with FB
decomposition of FB gradients: contribution of each individual coordinate in a given molecule to the objective function gradients
example with toy FF: C-H bond parameters for CH4
CBy: (TH: sorry did not fully understand…)
example with OHCH2OH: fitting angles and bonds inform each other
based on Parsley 1.2.0: OCO angle can be improved independently
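A minimal sketch of the decomposition idea above, not the FB implementation: it assumes a toy least-squares objective in which each C-H bond of CH4 contributes one squared residual, so the total gradient with respect to the bond-length parameter is the sum of per-coordinate contributions. The function name and all numbers are hypothetical.

```python
# Toy objective: L(r0) = sum_i (r0 - ref_i)^2, one term per C-H bond.
# This is an illustrative assumption, not the objective used in the talk.

def per_coordinate_gradients(r0, reference_lengths):
    """Return dL_i/dr0 = 2 * (r0 - ref_i) for each bond separately."""
    return [2.0 * (r0 - ref) for ref in reference_lengths]

# Hypothetical reference C-H bond lengths (angstrom) for the four bonds of CH4
reference_lengths = [1.090, 1.092, 1.088, 1.094]
# Hypothetical current value of the equilibrium bond length parameter
r0 = 1.080

# One gradient contribution per internal coordinate
contributions = per_coordinate_gradients(r0, reference_lengths)
# The usual objective gradient is the sum of those contributions
total_gradient = sum(contributions)
```

Keeping `contributions` rather than only `total_gradient` is what makes it possible to see which coordinates pull the parameter in which direction.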

Slide 5

CBy – On right side, there’s a proper k with only red dots. That means that this molecule “wants” to change the parameter in a single direction?
- TG – Yes
CBn – Correspondingly, a column with very red spots and very blue spots indicates a need for a parameter split.
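The reading of slide 5 discussed above can be sketched as a sign check per parameter column. This assumes rows are coordinates (or molecules) and columns are parameters, and that red and blue correspond to opposite gradient signs; the layout, the tolerance, and the numbers are assumptions for illustration.

```python
def classify_column(values, tol=1e-6):
    """Label a parameter column by the signs of its gradient entries."""
    has_positive = any(v > tol for v in values)
    has_negative = any(v < -tol for v in values)
    if has_positive and has_negative:
        return "mixed"      # entries disagree: candidate for a parameter split
    if has_positive:
        return "positive"   # entries agree: change the value in one direction
    if has_negative:
        return "negative"
    return "flat"           # no meaningful gradient

# Hypothetical gradient matrix: 3 coordinates (rows) x 2 parameters (columns)
gradient_rows = [
    [0.8, 0.5],
    [0.6, -0.7],
    [0.9, 0.4],
]

columns = list(zip(*gradient_rows))
labels = [classify_column(column) for column in columns]
```

Here the first column is single-signed, while the second mixes signs and would be flagged for a possible split.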

Slide 7

How to split terms: use cosine similarity of gradient vectors (based on rows shown on slide 6)
Find parameters that want to change the FF in the same way
Find clusters to decide how to split parameters
CBy – When I was doing bond charge corrections, I’d take this kind of matrix and sum each column, and treat a column whose sum had a large absolute value as significant.
- Here, with the cosine similarity, you’re prioritizing WHERE the gradient is, but losing HOW BIG it is.
TG – Doing a column sum will focus this on a physical parameter optimization. But if we look at internal coordinates, then it focuses on the typing scheme. Later in the talk, I’ll focus on how treating these as bit vectors will make this better.
TG – Possible insight is that a “perfectly typed” FF should have ONE major eigenvector/value, and everything else is close to 0. That would indicate that some parameter VALUES need to change, but no SMIRKS.
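A small sketch contrasting the two measures discussed on this slide: cosine similarity between gradient rows, and CBy's column sums. The row labels and numbers are hypothetical; the point is only that cosine similarity keeps direction and discards magnitude, while column sums keep magnitude.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def column_sums(rows):
    """Sum each parameter column over all rows."""
    return [sum(column) for column in zip(*rows)]

# Hypothetical gradient rows; each entry is the gradient w.r.t. one parameter
rows = {
    "row_a": [1.0, 0.0, 0.5],
    "row_b": [10.0, 0.0, 5.0],   # same direction as row_a, 10x larger
    "row_c": [-1.0, 0.2, 0.0],   # pulls the first parameter the other way
}

sim_ab = cosine_similarity(rows["row_a"], rows["row_b"])
sim_ac = cosine_similarity(rows["row_a"], rows["row_c"])
sums = column_sums(list(rows.values()))
```

`sim_ab` is essentially 1 even though `row_b` is ten times larger, which is the loss of "how big" that CBy points out; `sims` below zero, as for `sim_ac`, mark rows that want to move the parameters in opposing directions.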

Slide 8

CBn – This is a good scheme, and the atom bit vectors are clever, but how would it be extended to include neighbors of the atoms involved in the bonds?
- TG – Unsure, we’ll cross that bridge when we get there
- CBn + CBy – This is going to be very hard and I suspect it will fail, and that it’s necessary to make a more extensible framework before this moves forward.
CBy – Would really prefer to see a tangible example of this approach being applied to something a bit more complex, like angles
TG – What’s the furthest out that we go from the indexed atom in our SMIRKS patterns?
- CBn – Currently only 2, but in theory Chemper was able to make 10.
(General) – It’d be good to have a solution with the ability to search many bonds out from the atoms involved in the parameter
CBy – What if the tagged atoms were represented in a networkx graph, and then the bit vectors still applied to the nodes of the graph?
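One way to picture CBy's suggestion, as a sketch only: atoms are graph nodes, each node carries a bit vector, and a breadth-first search collects every atom within a chosen number of bonds of the tagged atoms. Plain dictionaries stand in for a networkx graph here to keep the example dependency-free; the bit layout, the molecule fragment, and the function name are all hypothetical.

```python
from collections import deque

# Hypothetical bit layout: one bit per element (illustration only)
CARBON, OXYGEN, HYDROGEN = 0b001, 0b010, 0b100

# Fragment C0-O1-H2: node -> bit vector, node -> bonded neighbors
node_bits = {0: CARBON, 1: OXYGEN, 2: HYDROGEN}
adjacency = {0: [1], 1: [0, 2], 2: [1]}

def atoms_within(adjacency, tagged, max_depth):
    """Map each atom within max_depth bonds of the tagged atoms to its bond distance."""
    depth = {atom: 0 for atom in tagged}
    queue = deque(tagged)
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_depth:
            continue  # do not expand past the requested depth
        for neighbor in adjacency[atom]:
            if neighbor not in depth:
                depth[neighbor] = depth[atom] + 1
                queue.append(neighbor)
    return depth

# Environment of tagged atom 0, searching one bond out
env = atoms_within(adjacency, tagged=[0], max_depth=1)
# Bit vectors of the atoms in that environment
env_bits = {atom: node_bits[atom] for atom in env}
```

Raising `max_depth` widens the environment without changing the per-node bit encoding, which is the extensibility asked for in the discussion above.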

Conclusion page

CBn – Descriptors usually seem to be interrelated; e.g., NX4 will always have a +1 charge, and one ring membership will preclude different ring memberships
- TG – I avoid those impossible cases by a) not making parameters that don’t ever occur in a large dataset and b) stopping things like HX3 (?TG didn’t mention how?)
CBy – Really happy to see this work, and glad to see it provoking discussion.
- Would you consider using a decision tree? If we can get a representation in one bit vector, then we can use decision trees/random forests.
- Impropers take all 6 paths through a SMARTS string. There may be some useful symmetries.
- Another way to deal with symmetry of bonds could be canonicalization. This may also let them be asymmetric in useful ways.
  - JW – Will this lead to discontinuities in the typing landscape?
- One good thing about Chemper is that it lets us define a maximal chemical space. So if we could do the same here that could reduce the dimensionality.
CBn – Seeing histograms of parameter usage/gradient direction may guide parameter definition/splitting.