Perceiving SMARTS hierarchies | @Trevor Gokey | Meeting will be recorded and slides will be uploaded.
Inverts how Chemper perceives and splits SMARTS. Reframes the question from finding SMARTS that yield a given partition to allowing the partition itself to change as well (slide 4).
PB: How many atoms can the SMARTS patterns include here?
TG: Any number, up to the size of the whole molecule. However, if you keep making the SMARTS larger and larger and still can’t distinguish between groups, you need to add new primitives (e.g. TH’s problem with stereochemistry).
TH: How do you come up with the initial SMARTS patterns?
TG: SMIRNOFF perception is like playing "Guess Who?": the primitives are like the characters’ features, such as hair colour.
TH (slide 8): The parameters on this slide solve my current stereochemistry issue, but I would be worried about applying them to other molecules.
TH (slide 16): If I understand correctly, you first look at and combine chemistries, and only then look at parameter values. So you could combine chemistries that are very similar even though their parameter values are quite different. How likely is that?
TG: It definitely could happen; there was an example of a bent vs. linear nitrogen bond. But if the SMARTS patterns are very similar and the parameters are very different, then your SMARTS isn’t very good. Here, I used the QM data after combining SMARTS, so I didn’t even average the MM data. Torsions are difficult, but with two molecules I found that performance was not degraded.
DM: What’s the status of this? How much of it is done and being tested, vs. planned or still being implemented?
DM: One warning to people who might want to use general tools like this to reproduce, e.g., GAFF: it could be of limited value. GAFF/AMBER have problems that make them overly complicated.
TH: Do you think you could use your approach to converge bespoke force fields into a usable hierarchy?
TG: What would be a good test case for GAFF types? I would need a good set of molecules with assignments that communicates the intention of GAFF as much as possible.
DM: You could more or less run antechamber on any set of molecules.
TG: What would be a dataset that covers every GAFF type?
PB: Does SMIRNOFF coverage include all the GAFF types?
DM: We cover more chemistry than GAFF. A good example would be our training data; it should cover almost all GAFF types.
TH: Is there an upper limit on the number of molecules you can look at? How about GDB-13? It probably contains all chemistries.
TG: No limit, though I didn’t necessarily want millions of molecules.
TG: It could be much slower if I need to go more than one layer out: the example shown today takes ~30 seconds, but the GAFF types may take much longer. I’ll try on our coverage set first.
|
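The "Guess Who?" framing and TG's answer about SMARTS size can be illustrated with a toy sketch. This is a hypothetical illustration, not Chemper's or TG's implementation. Each atom is described by a few primitive values (here `element` and a toy `chiral` tag, both assumptions standing in for SMARTS primitives), and a simplified layer-by-layer environment signature stands in for a SMARTS pattern grown one bond layer at a time. Molecules whose central atoms share a signature land in the same group. Growing the radius splits groups that differ in their neighbours, but a pair of stereoisomers stays merged at any radius until a primitive that can see stereochemistry is added. The helpers `star`, `environment_signature` and `partition` are made up for this example.

```python
from collections import defaultdict

# Toy sketch: each molecule is (atoms, bonds, center_index). Atoms carry
# primitive values; "element" and "chiral" are illustrative stand-ins for
# SMARTS primitives such as #6 and @/@@ (this is not a real perception engine).


def star(center, neighbor_elements):
    """Build a star-shaped toy molecule: one central atom bonded to each neighbour."""
    atoms = [center] + [{"element": e, "chiral": ""} for e in neighbor_elements]
    bonds = {0: list(range(1, len(atoms)))}
    for i in range(1, len(atoms)):
        bonds[i] = [0]
    return atoms, bonds, 0


def environment_signature(atoms, bonds, center, radius, primitives):
    """Layer-by-layer signature of the atoms within `radius` bonds of `center`.

    Only the listed primitives are inspected, like a SMARTS pattern that can
    only ask about those properties. Each layer is a sorted tuple, so which atom
    bonds to which between layers is deliberately ignored (a simplification).
    """
    def describe(i):
        return tuple(atoms[i][p] for p in primitives)

    layers = [(describe(center),)]
    seen = {center}
    frontier = [center]
    for _ in range(radius):
        next_frontier = []
        for i in frontier:
            for j in bonds[i]:
                if j not in seen:
                    seen.add(j)
                    next_frontier.append(j)
        layers.append(tuple(sorted(describe(j) for j in next_frontier)))
        frontier = next_frontier
    return tuple(layers)


def partition(molecules, radius, primitives):
    """Group molecule names whose central atoms have identical signatures."""
    groups = defaultdict(list)
    for name, (atoms, bonds, center) in molecules.items():
        key = environment_signature(atoms, bonds, center, radius, primitives)
        groups[key].append(name)
    return sorted(sorted(names) for names in groups.values())


molecules = {
    "methyl-O": star({"element": "C", "chiral": ""}, ["H", "H", "H", "O"]),
    "methyl-N": star({"element": "C", "chiral": ""}, ["H", "H", "H", "N"]),
    # Toy enantiomer pair: same graph, different stereo tag. In real SMILES/SMARTS,
    # @/@@ are defined relative to neighbour order; here that order is identical.
    "enantiomer-1": star({"element": "C", "chiral": "@"}, ["H", "F", "Cl", "Br"]),
    "enantiomer-2": star({"element": "C", "chiral": "@@"}, ["H", "F", "Cl", "Br"]),
}

print(partition(molecules, 0, ["element"]))            # every centre is just "a carbon"
print(partition(molecules, 1, ["element"]))            # neighbours split methyl-O from methyl-N
print(partition(molecules, 5, ["element"]))            # larger radius: enantiomers still merged
print(partition(molecules, 1, ["element", "chiral"]))  # a new primitive splits the enantiomers
```

Going from radius 1 to radius 5 changes nothing for the enantiomer pair, which mirrors the point that a bigger pattern cannot help when the available primitives cannot tell the groups apart. A real SMARTS pattern also keeps the bond connectivity between layers, which this toy signature throws away.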