2020-08-07 Chemical Perception meeting notes

Date

Aug 7, 2020

Participants

  • @Hyesu Jang

  • @Jeffrey Wagner

  • @Christopher Bayly (CIB)

  • Caitlin Colleen Bannan (CCB)

  • @Trevor Gokey

  • @Jessica Maat

  • @Lee-Ping Wang

  • @David Mobley

  • Tobias Huefner

Discussion topics

Item

Notes

amide issue

  • DM – Issue has been discussed in #striketeam channel. Basically amides look too flexible. Arjun originally asked about 1.2, but it looks like all of our FFs have the issue.

  • TG –

     

  • TG – Only one torsiondrive in the benchmark set matched the blue pattern. Arjun had done some torsiondrives with harmonic constraints.

  • LPW – In geometric, wavefront propagation won’t be happening, but constrained minimization should be done identically

  • DM – Important thing is the shape of the area around the minimum (like, below 5 kcal/mol)

  • CIB – Looks like force constant for 1.2 is lower than 1.0, causing it to separate from QM close to the minimum.

  • CIB – My intuition is that the blue dataset includes carbamates and ureas, and so they should have a lower force constant (and this is what is seen)

  • CIB – We should identify what in the training set caused the lowering of the k value

  • (Hyesu’s slides)

  • HJ – A related issue is dialkyl amide. This relates to our fitting of t69 (dialkyl amide) and t70 (amide)

  • CIB – I’d be in favor of making several t70 types. One would be t70a, which would be a general parameter. Then I’d make a variant “t70b” that changes atom 4 to specifically an O, S, or N. “t70c” would specify another substituent that is specifically an sp3 carbon (or maybe also a hydrogen).

  • CIB – We’d want to treat the amide group and differentiate it from other chemistries, which don’t have such high torsion barriers. We want to differentiate having a hydrogen/sp3 carbon substituent.

  • CCB – What I noticed with Trevor’s distributions is that the SMARTS used to select those weren’t the same as those used in the FF. I’d be interested to see whether the distributions of these parameters vs child parameters would indicate whether we NEED new parameters, and to get an estimate of the benefit of adding them. May be able to just do angle distributions and nitrogen plane angles.

    • CCB – This process would also be a stepping stone toward automated parameter type optimization

    • CIB – Agree. (Some thoughts about how to gradually turn this into automated parameter type optimization, will be on recording)

    • LPW – The gradient idea CIB mentioned has been on our docket for a while. I’ve been working with HJ to start on that recently.

    • LPW – Regarding pulling out QM torsion profiles based on SMARTS grouping – The QM energy has a lot of other factors that effectively add noise on top of the torsion energy. One development we could work towards is a torsion energy decomposition.

  • DM – I think the correctly-identifying-flat-nitrogens issue may be separable

    • CIB (later) – I’m concerned that these may not be entirely decoupled from the current amide issue.

  • (General) – we want to take this decision process and try to automate it. But we want a quick fix in the short term.

  • DM – Short term plan?

    • CIB – Differentiate known different chemistries. First place to do that is the substituent on the other side of the carbonyl. We need to distinguish between having an sp3 carbon/H/(maybe also sp2 carbon), versus any other heteroatom.

    • Next distinguishing factor could be ureas

    • Starting:

      • t70a [*:1]-[#7X3:2]-!@[#6X3$(*=[#8,#16,#7]):3]~[*:4]


      • t70 [#1:1]-[#7X3:2]-[#6X3:3]=[#8,#16,#7:4]

    • Proposed:

      • Keep t70 and t70a and add others below (maybe rename t70a to t69a?)

      • t70b. [*:1]-[#7X3:2]-[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

      • t70c. [#1:1]-[#7X3:2]-[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

      • t70d. [*:1]-[#7X3:2]-[#6X3:3](=[#8,#16,#7:4])-[#7X3]

    • HJ – If we add new torsion terms, we’ll want to add other training data as well

    • DM – We can push a quick fix in 1.2.1

  • HJ – What is our starting point for optimization? If we start from 1.2.0, then the fitting will be faster, but may be less accurate.

    • CIB – 1.2.0 has shown itself to be much better with respect to finding minima, so I’d be tempted to start with that. I’m a little concerned that we’d wind up adding complexity to our FF lines by optimizing one based on something other than s99F.

    • CCB – As a user, I’d be surprised if the 1.2.1 refit started from anything OTHER than 1.2.0.

  • TG – Which training+benchmarking datasets should I be looking at?

    • HJ – Will contact you offline

  • Action items:

    • HJ – Can check out coverage of new parameters, to see if they’re capturing the right chemistry from the training set. Then can do FB fit, and see if final values vary much.

    • TG will get refit FF from Hyesu and run same benchmarking infrastructure on it. Will also work on chopping the data up more finely to do open-ended study of parameter distinction.
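The point about the shape of the profile near the minimum (below about 5 kcal/mol) can be made concrete with a small comparison metric. The following is only a sketch: the function name `near_minimum_rmse`, the helper `twofold`, and the force constants 9 and 6 are hypothetical and chosen for illustration, and the profiles are synthetic cosine curves, not data from any torsiondrive discussed in the meeting.

```python
import math


def near_minimum_rmse(angles_deg, qm, mm, window=5.0):
    """RMSE between two torsion profiles, restricted to the grid points
    whose reference (QM) energy lies within `window` kcal/mol of the
    reference minimum. Each profile is shifted so its own minimum is zero.
    """
    qm_min, mm_min = min(qm), min(mm)
    qm_rel = [e - qm_min for e in qm]
    mm_rel = [e - mm_min for e in mm]
    # Keep only the low-energy region of the reference profile.
    selected = [i for i, e in enumerate(qm_rel) if e <= window]
    sq_err = sum((mm_rel[i] - qm_rel[i]) ** 2 for i in selected)
    rmse = math.sqrt(sq_err / len(selected))
    return rmse, [angles_deg[i] for i in selected]


def twofold(angle_deg, k):
    """Synthetic two-fold torsion term k * (1 + cos(2*phi - 180 deg)),
    which has minima at 0 and 180 degrees and a barrier of 2*k."""
    return k * (1.0 + math.cos(math.radians(2.0 * angle_deg - 180.0)))


# Synthetic illustration only: a stiffer "reference" profile and a softer
# "force field" profile with a lower force constant (values are made up).
angles = list(range(-180, 180, 15))
reference = [twofold(a, 9.0) for a in angles]
softer = [twofold(a, 6.0) for a in angles]

rmse, used_angles = near_minimum_rmse(angles, reference, softer)
print(f"points used: {len(used_angles)}, near-minimum RMSE: {rmse:.3f} kcal/mol")
```

Restricting the comparison to the low-energy window keeps the barrier region, which is rarely sampled at room temperature, from dominating the error.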

 
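How the proposed t70 variants would divide up the chemistry can be sketched with a toy assignment routine. In SMIRNOFF force fields, when several patterns match the same torsion, the one listed last takes precedence. Everything else here is an assumption made for illustration: the `AmideTorsion` feature record is a hand-written stand-in for real SMARTS matching, atom 4 is taken to be the double-bonded O/S/N throughout, the X3 requirement on the urea nitrogen is ignored, and the ordering of the rules is a guess, since the notes do not fix one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmideTorsion:
    """Hypothetical, simplified description of a torsion about an amide
    N-C bond, with atom 4 assumed to be the double-bonded heteroatom."""
    atom1: int             # atomic number of atom 1 (substituent on N)
    carbonyl_sub: int      # atomic number of the other substituent on C3
    in_ring: bool = False  # True if the N-C bond (atoms 2-3) is in a ring


# Ordered rules; the ordering is an assumption. Each predicate is a rough,
# hand-written reading of the corresponding SMARTS from the notes.
RULES = [
    ("t70a", lambda t: not t.in_ring),                  # general, non-ring bond
    ("t70", lambda t: t.atom1 == 1),                    # H on nitrogen
    ("t70b", lambda t: t.carbonyl_sub in (1, 6)),       # C or H on carbonyl
    ("t70c", lambda t: t.atom1 == 1 and t.carbonyl_sub in (1, 6)),
    ("t70d", lambda t: t.carbonyl_sub == 7),            # urea-like
]


def assign(torsion):
    """Return the label of the last matching rule, or None if none match."""
    label = None
    for name, matches in RULES:
        if matches(torsion):
            label = name  # later matches override earlier ones
    return label


examples = {
    "secondary amide, H-N-C=O": AmideTorsion(atom1=1, carbonyl_sub=6),
    "amide, C-N-C=O": AmideTorsion(atom1=6, carbonyl_sub=6),
    "urea, C-N-C=O": AmideTorsion(atom1=6, carbonyl_sub=7),
    "carbamate, H-N-C=O": AmideTorsion(atom1=1, carbonyl_sub=8),
    "carbamate, C-N-C=O": AmideTorsion(atom1=6, carbonyl_sub=8),
}
for description, torsion in examples.items():
    print(f"{description}: {assign(torsion)}")
```

Under these assumptions, amides with a carbon or hydrogen on the carbonyl land in the new specific types, while carbamates fall through to the existing t70 and t70a. Checking real coverage against the training set would still need actual SMARTS matching.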

 

Action items

Decisions