2023-02-06 Chemical Perception meeting notes

Participants

  • @David Mobley

  • @Pavan Behara

  • @Trevor Gokey

  • @Jeffrey Wagner

  • @Chapin Cavender

  • @Tobias Huefner

  • @Lily Wang

Goals

Discussion topics

Item

Presenter

Notes

Perceiving SMARTS hierarchies

@Trevor Gokey

  • Meeting will be recorded and slides will be uploaded

  • Inverts how ChemPer perceives and splits SMARTS

  • Reframes the question: instead of finding SMARTS that yield a fixed partition, the partition itself is also allowed to change (slide 4)

  • PB: how many atoms can the SMARTS patterns take here?

    • TG: Any – as big as the molecule is. However, if you keep making the SMARTS larger and larger but you still can’t distinguish between groups, you need to add new primitives (e.g. TH’s problem with stereochemistry)

  • TH: How do you come up with initial SMARTS patterns?

    • TG: I take the union of clusters for general patterns and intersection for separating patterns, but you don’t always need to start off with initial/general patterns

  • TG: SMIRNOFF perception is like playing “Guess Who?”. The primitives are like hair colour

  • TH (slide 8): The parameters on this slide solve my current stereochemistry issue, but I would be worried about applying these parameters to other molecules

    • TG: you would need to enlarge your training dataset, and the patterns on this slide will change depending on that, e.g. more data is needed to separate out the H0 primitive

  • TH (slide 16): If I understand correctly, you’re first looking at and combining chemistries, and then looking at parameter values. So you could be combining chemistries that are very similar, but the parameter values themselves could be quite different. How possible/likely is that?

    • TG: definitely could happen – there was an example of a bent vs linear nitrogen bond. But if the SMARTS patterns are really similar and the parameters are very different, then your SMARTS isn’t very good. Here, I used the QM data after combining SMARTS, so I didn’t even average the MM data. It is difficult with torsions, but I found that with two molecules the performance was not degraded.

  • DM: what’s the status of this? How much of this is done and being tested, vs. planned or in the process of implementing?

    • TG: I have an implementation but I need to scale up with real data. The mapping and operation parts have been tricky. I hope to make this public soon. I think it works, using TH’s data as a test case

  • DM: one warning to people who might want to use general tools like this to reproduce, e.g., GAFF: it could be of limited value. GAFF/AMBER have problems that make them overly complicated.

    • TG: that’s where adding the fuzziness parameter likely helped. Converting GAFF to a SMIRNOFF hierarchy allows me to plug it into my code and improve it.

  • TH: do you think you could use your approach to converge bespoke force fields into a hierarchy that can be used?

    • TG: that’s the aim – I’m just clustering based on SMARTS.

  • TG: what would be a good test case for GAFF types? I would need type assignments for a good set of molecules that communicate the intention of GAFF as much as possible

    • DM: You could more or less run antechamber on any set of molecules

    • TG: what would be a dataset that covers every GAFF type?

    • PB: does SMIRNOFF coverage have all the GAFF types?

    • DM: we cover more chemistry than GAFF. A good example would be our training data – it should cover almost all GAFF types.

  • TH: is there an upper limit for the number of molecules you can look at? How about GDB-13? It probably contains all chemistries

    • TG: no limit – didn’t necessarily want millions of molecules

    • TG: could be much slower if I need to go more than a layer out – the example shown today takes ~30 seconds but the GAFF types may take much longer. I’ll try on our coverage set first.
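
Illustrative sketch (not TG's implementation): one way to read the "union for general patterns, intersection for separating patterns" idea from the discussion above. Each atom is described by sets of SMARTS primitives. The union over a cluster gives a pattern that matches every member, and an empty intersection between two clusters on some field marks a primitive that can separate them. The field names (`element`, `connectivity`, `hydrogens`), the helper functions, and the example clusters are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# An atom (or pattern) maps a primitive field to the set of values it allows.
# Field names here are assumptions made for this sketch.
Atom = Dict[str, Set[str]]


def union_pattern(cluster: List[Atom]) -> Atom:
    """Union the primitives of all atoms: the result matches every member."""
    pattern: Atom = {}
    for atom in cluster:
        for field, values in atom.items():
            pattern.setdefault(field, set()).update(values)
    return pattern


def separating_fields(a: Atom, b: Atom) -> List[str]:
    """Fields whose value sets have an empty intersection distinguish a from b."""
    return sorted(f for f in a.keys() & b.keys() if not (a[f] & b[f]))


def to_smarts(pattern: Atom,
              order=("element", "connectivity", "hydrogens")) -> str:
    """Render a pattern as a single-atom SMARTS: OR within a field, AND across."""
    parts = [",".join(sorted(pattern[f])) for f in order if f in pattern]
    return "[" + ";".join(parts) + ":1]"


# Hypothetical clusters: trivalent nitrogens vs. a divalent nitrogen.
cluster_a = [
    {"element": {"#7"}, "connectivity": {"X3"}, "hydrogens": {"H0"}},
    {"element": {"#7"}, "connectivity": {"X3"}, "hydrogens": {"H1"}},
]
cluster_b = [
    {"element": {"#7"}, "connectivity": {"X2"}, "hydrogens": {"H0"}},
]

pattern_a = union_pattern(cluster_a)
pattern_b = union_pattern(cluster_b)

print(to_smarts(pattern_a))                     # general pattern for cluster A
print(to_smarts(pattern_b))                     # general pattern for cluster B
print(separating_fields(pattern_a, pattern_b))  # primitives that split A from B
```

In the "Guess Who?" picture, `separating_fields` lists the questions that still tell the two groups apart. If it comes back empty, the pattern would need to grow to neighbouring atoms or gain a new primitive, as TG noted for the stereochemistry case.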

Action items

Decisions