2020-10-30 Chemical Perception meeting notes

Date

Nov 13, 2020

Participants

  • @Tobias Huefner

  • @Trevor Gokey

  • @Jeffrey Wagner

  • @David Mobley

  • @Jessica Maat (Deactivated)

  • Caitlin Bannan (CBn)

  • @Pavan Behara

  • @Christopher Bayly (CBy)

Goals


Discussion topics

Item

Presenter

Notes

FB gradients and chemical space description

Trevor Gokey

 

AlkEthOH dataset is now on QCArchive (QCA)

  • wants to use data-driven chemical perception based on ForceBalance (FB) gradients

  • enumerating types in an automated manner is important

  • encoding SMARTS using bits

  • phys properties (geom, NM, …) used to fit FFs with FB

  • decomposition of FB gradients: contribution of each individual coordinate in a given molecule to obj func gradients

  • example with toy FF: C-H bond parameters for CH4

  • CBy: (TH: sorry did not fully understand…)

  • example with OHCH2OH: fitting angles and bonds inform each other

  • based on Parsley 1.2.0: OCO angle can be improved independently

Slide 5

  • CBy – On right side, there’s a proper k with only red dots. That means that this molecule “wants” to change the parameter in a single direction?

    • TG – Yes

  • CBn – Correspondingly, a column with very red spots and very blue spots indicates a need for a parameter split.
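
A minimal sketch of the slide 5 reading, not TG's implementation. It assumes a hypothetical table that maps each parameter to its per-molecule gradient contributions, with red taken to mean positive and blue negative (the sign convention is an assumption). The parameter names and numbers are invented for illustration only.

```python
def classify_column(contributions, tol=1e-8):
    """Classify one parameter's column of per-molecule gradient contributions."""
    has_pos = any(g > tol for g in contributions)
    has_neg = any(g < -tol for g in contributions)
    if has_pos and has_neg:
        # Molecules pull the parameter in opposite directions.
        return "split candidate"
    if has_pos or has_neg:
        # Every molecule pulls the parameter the same way.
        return "consistent"
    return "flat"


# Hypothetical toy values, invented for illustration only.
toy_gradients = {
    "proper_k": [0.8, 0.5, 0.9],
    "angle_k": [0.7, -0.6, 0.4],
    "bond_length": [0.0, 0.0, 0.0],
}

labels = {name: classify_column(col) for name, col in toy_gradients.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A column of one sign corresponds to CBy's "only red dots" case. A mixed-sign column corresponds to CBn's case for a parameter split.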

Slide 7

  • How to split terms: use cosine similarity of gradient vectors (based on rows shown on slide 6)

  • Find parameters that want to change the FF in the same way

  • Find clusters that indicate how to split parameters

  • CBy – When I was doing bond charge corrections, I’d take this kind of matrix, sum each column, and treat a column whose sum had a large absolute value as significant.

    • Here, with the cosine similarity, you’re prioritizing WHERE the gradient is, but losing HOW BIG it is.

  • TG – Doing a column sum will focus this on a physical parameter optimization. But if we look at internal coordinates, then it focuses on the typing scheme. Later in the talk, I’ll focus on how treating these as bit vectors will make this better.

  • TG – Possible insight is that a “perfectly typed” FF should have ONE major eigenvector/value, and everything else is close to 0. That would indicate that some parameter VALUES need to change, but no SMIRKS.
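
The contrast between cosine similarity and column sums can be illustrated with a small sketch. It assumes each row is a gradient vector with one entry per FF parameter. The rows below are toy values invented for illustration, not data from the slides, and the helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def column_sums(rows):
    """Sum each column of a row-major matrix, as in CBy's column-sum approach."""
    return [sum(col) for col in zip(*rows)]


# Toy rows, invented for illustration only.
row_a = [1.0, 0.0, 2.0]
row_b = [10.0, 0.0, 20.0]   # same direction as row_a, ten times larger
row_c = [-1.0, 0.0, -2.0]   # opposite direction to row_a

sim_ab = cosine_similarity(row_a, row_b)   # close to +1: magnitude is ignored
sim_ac = cosine_similarity(row_a, row_c)   # close to -1: opposing directions
sums = column_sums([row_a, row_b, row_c])  # magnitude is kept, opposing rows cancel

print(sim_ab, sim_ac, sums)
```

Here `row_a` and `row_b` are indistinguishable by cosine similarity despite the tenfold difference in size, which is CBy's point about losing HOW BIG the gradient is. In the column sums, `row_a` and `row_c` cancel, so the disagreement between them is no longer visible.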

Slide 8

  • CBn – This is a good scheme, and the atom bit vectors are clever, but how would it be extended to include neighbors of the atoms involved in the bonds?

    • TG – Unsure, we’ll cross that bridge when we get there

    • CBn + CBy – This is going to be very hard and I suspect it will fail, and that it’s necessary to make a more extensible framework before this moves forward.

  • CBy – Would really prefer to see a tangible example of this approach being applied to something a bit more complex like Angles

  • TG – What’s the furthest out that we go from the indexed atom in our SMIRKS patterns?

    • CBn – Currently only 2, but in theory Chemper was able to make 10.

  • (General) – It’d be good to have a solution with the ability to search many bonds out from the atoms involved in the parameter

  • CBy – What if the tagged atoms were represented in a networkx graph, and then the bit vectors still applied to the nodes of the graph?
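
CBy's graph suggestion might look roughly like the sketch below. It uses plain dictionaries as a stand-in for a networkx graph so that it runs with the standard library alone. The one-hot element bits are a hypothetical encoding (the actual bit layout was not specified in the meeting), and the molecule is a simplified H-O-C-O-H skeleton of OHCH2OH with the C-H hydrogens omitted.

```python
from collections import deque

# Hypothetical one-hot element bits; the real encoding was not specified.
ELEMENT_BITS = {"H": 0b001, "C": 0b010, "O": 0b100}


def atoms_within(adjacency, tagged, max_depth):
    """Breadth-first search from the tagged atoms.

    Returns a dict mapping each atom within max_depth bonds of any
    tagged atom to its bond distance from the nearest tagged atom.
    """
    depth = {atom: 0 for atom in tagged}
    queue = deque(tagged)
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_depth:
            continue  # do not expand beyond the requested depth
        for nbr in adjacency[atom]:
            if nbr not in depth:
                depth[nbr] = depth[atom] + 1
                queue.append(nbr)
    return depth


# Simplified skeleton H(0)-O(1)-C(2)-O(3)-H(4).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
elements = {0: "H", 1: "O", 2: "C", 3: "O", 4: "H"}
node_bits = {atom: ELEMENT_BITS[el] for atom, el in elements.items()}

# Tag the O-C bond (atoms 1 and 2) and look one bond further out.
env = atoms_within(adjacency, tagged=[1, 2], max_depth=1)
env_bits = {atom: node_bits[atom] for atom in env}
print(env, env_bits)
```

Raising `max_depth` extends the environment further from the tagged atoms without changing the per-node bit vectors, which is the kind of extensibility raised in the general comment above.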

Conclusion page

  • CBn – Descriptors usually seem to be interrelated: for example, NX4 will always have a +1 charge, and one ring membership will preclude different ring memberships

    • TG – I avoid those impossible cases by a) not making parameters that don’t ever occur in a large dataset and b) stopping things like HX3 (?TG didn’t mention how?)

  • CBy – Really happy to see this work, and glad to see it provoking discussion.

    • Would you consider using a decision tree? If we can get a representation in one bit vector, then we can use decision trees/random forests.

    • Impropers take all 6 paths through a SMARTS string. There may be some useful symmetries.

    • Another way to deal with symmetry of bonds could be canonicalization. This may also let them be asymmetric in useful ways.

      • JW – Will this lead to discontinuities in the typing landscape?

    • One good thing about Chemper is that it lets us define a maximal chemical space. So if we could do the same here that could reduce the dimensionality.

  • CBn – Seeing histograms of parameter usage/gradient direction may guide parameter definition/splitting.
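
The symmetry points CBy raised can be sketched as follows. This is an illustration only: it assumes atoms are represented by integer bit vectors, and both function names are hypothetical. The improper enumeration simply permutes the three outer atoms around a fixed central atom, which gives six orderings. How a given force field specification actually treats impropers is not covered here.

```python
from itertools import permutations


def canonical_bond(bits_a, bits_b):
    """Return the two atom bit vectors in a fixed order so A-B and B-A compare equal."""
    return tuple(sorted((bits_a, bits_b)))


def improper_orderings(center, outer):
    """Enumerate the orderings of three outer atoms around a fixed central atom."""
    return [(p[0], center, p[1], p[2]) for p in permutations(outer)]


# Hypothetical bit vectors for the two atoms of a bond.
forward = canonical_bond(0b0101, 0b0011)
reverse = canonical_bond(0b0011, 0b0101)

# Placeholder atom labels; "c" is the central atom.
orderings = improper_orderings("c", ("x", "y", "z"))

print(forward == reverse, len(orderings))
```

Canonicalizing by sorted order is only one possible choice. Any fixed, deterministic rule would make the forward and reversed bond compare equal.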

Action items

Decisions