| |
---|
Slide 1 | MG – In other discussions, there was a statement that “ELF10” would be in Rosemary. What’s the plan? SB – The plan is for as much as possible to move forward independently. So the biopolymer force field should move forward with ELF10 AM1BCC. If and when the GCN is done and validated, if Rosemary isn’t out yet, then it will
CB – SB, you said that, with Rosemary, we could have these two charge models dropped in. So in case A, the two charge models (explicit AM1BCC/library charges and GCN charges) converge and we can swap them out. But in the other, we need to fit all the valence terms so we need to pick the charge model before we start training. JC – So, the hope is that the two converge, and if they don’t, then we won’t incorporate the GCN in Rosemary.
MS – So, this is specifically a GCN fit to AM1BCC ELF10? SB – There are some choices, I’ll go over them. The MVP is a GCN fit to AM1, and we apply SMARTS based BCCs on top of it. The goal is to target AM1, but it’s possible that we could do RESP charges since they’re all kinda based on the same thing. CB – Agree. It’s my hope that the charge models will be convergent, and that future work can bring in RESP charges and better stuff as it comes.
|
Slide 3 | OM – Is there a decision on whether the network would be trained to AM1 with BCCs applied on top, or trained to AM1BCC directly? JC – Bronze medal is pretty ambitious. If the GCN can reproduce AM1 charges within inter-toolkit differences then I think that’s a win. CB – I’d like to introduce the idea of a “platinum medal”, where we fit the GCN to reproduce the ESP, not just RESP point charges. SB – I’ll answer that on the next slide. CB – My idea here is to fit to a more fundamental “ground truth” from QM. The RESP point charges are already squeezed through an information bottleneck. SB – I’d say that we would have already seen this… Since they’re all trying to reproduce ESPs it should all be the same thing. CB – In practice we saw numerical instabilities in the early 1990s when fitting to ESPs, and that’s why we had to add restraints. During that work, I found that there wasn’t a clear unique solution to the assignment of point charges, and charges would become large in magnitude. And I think vsites would fix a lot of this. DC – Seconding the numerical instabilities. DC – When you talk about RESP charges, are you thinking RESP2? SB – RESP2 would be super expensive. When I was computing ESP data for the molecules from the industry benchmark set, I was finding that 8 cores would go through about 5 molecules per day. DC – So if it takes a long time to even do them in vacuum, then we’ll need BCC corrections, since we won’t have the even MORE expensive implicit solvent calcs. SB – If we fit to polarizability then we should be in good shape.
|
Slide 4 | |
Slide 5 | |
Slide 7 | SB will add links to the mentioned datasets. SB – I’m not sure where to put these datasets. Not sure that they’d be appropriate in qca-dataset-submission. CB – As an industry guy, one thing that’s emerged in the cheminformatics world is that the Enamine set isn’t actually that chemically diverse. The Riniker and Bleiziffer sets will probably capture the widest range of chemistry. JC – The NCI250k is also a good idea. DM – Yeah, I think the diversity of these should probably be Riniker > OpenFF industry > Enamine.
|
Slide 9 | CB – So, we need the GCN to know about the effects of distant functional groups. Are we concerned about the possibility that fragmentation could cleave those distant groups? Maybe a better fragmentation scheme is needed? SB – It’s kinda tricky. A GCN will only look a certain number of “hops” away from each atom, which is 5 or 6 bonds right now. So we will want to be careful about how to fragment. MG – Could we add in some large molecules and have them help constrain this training/help us benchmark whether we capture important long-distance effects? CB – That’s a good idea. Also, Chaya’s WBO-based fragmentation methods could help here. SB – Chaya’s fragmentation method is conservative, and often gives us large fragments. CB – We need to make sure that we have some long-distance delocalization in the training set. SB – I want to emphasize that fragmenter will include rings, and amide bonds, and rings connected to other rings by amide bonds, and we end up getting unreasonably large fragments from this approach.
DM – 12 heavy atoms seems like it will limit the chemistry in certain ways which may sometimes be concerning. JC – Even dipeptides are bigger than that! DM – Yeah, though I was thinking more about things that cannot be fragmented, e.g. amitriptyline or morphine.
SB – I’d like to try feeding in larger molecules, since this approach went so well. So runtime is the major constraint.
|
Slide 14 | |
Slide 16 | JC – One tough thing is that we can get very different performances based on different random seeds/initial weights. Also the choice of number of layers is kinda arbitrary. Also vanishing gradient problem. SB – I’ll chat with Yuanqing Wang about some of these details
|
| |
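
The MVP described under Slide 1 (a GCN that predicts AM1 charges, with SMARTS-based BCCs applied on top) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation: `apply_bccs`, the atom indices, and all numeric values are hypothetical, and the SMARTS matching that would assign a correction to each bond is assumed to have been done already. Because each correction moves charge across a single bond, the total molecular charge is unchanged.

```python
def apply_bccs(base_charges, bond_corrections):
    """Add bond charge corrections (BCCs) on top of per-atom base charges.

    base_charges: list of floats, e.g. AM1-like charges predicted by a model.
    bond_corrections: list of (i, j, value) tuples. `value` is added to
        atom i and subtracted from atom j. These are hypothetical,
        pre-matched corrections; no SMARTS matching happens here.
    """
    charges = list(base_charges)  # copy so the input is not modified
    for i, j, value in bond_corrections:
        charges[i] += value
        charges[j] -= value
    return charges


# Illustrative numbers only, not real AM1 or BCC values.
base = [-0.30, 0.10, 0.10, 0.10]
bccs = [(0, 1, -0.05), (0, 2, -0.05), (0, 3, -0.05)]
corrected = apply_bccs(base, bccs)
```

Keeping the corrections as a separate step is what would let the base model target AM1 alone, as discussed in the notes.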
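
JC's criterion under Slide 3 (the GCN reproducing AM1 charges "within inter-toolkit differences") could be checked along the following lines. This is only one possible reading of that criterion: it assumes RMSE is the comparison metric, and the function names and charge values are hypothetical.

```python
import math


def charge_rmse(charges_a, charges_b):
    """Root-mean-square difference between two per-atom charge lists."""
    if len(charges_a) != len(charges_b) or not charges_a:
        raise ValueError("charge lists must be non-empty and equal in length")
    squared = [(a - b) ** 2 for a, b in zip(charges_a, charges_b)]
    return math.sqrt(sum(squared) / len(squared))


def within_toolkit_spread(predicted, toolkit_a, toolkit_b):
    """True if the model's error against toolkit A is no larger than the
    disagreement between toolkit A and toolkit B (assumed criterion)."""
    model_error = charge_rmse(predicted, toolkit_a)
    toolkit_spread = charge_rmse(toolkit_a, toolkit_b)
    return model_error <= toolkit_spread


# Illustrative charges only, not real toolkit output.
toolkit_a = [-0.40, 0.20, 0.20]
toolkit_b = [-0.42, 0.21, 0.21]
predicted = [-0.41, 0.205, 0.205]
passes = within_toolkit_spread(predicted, toolkit_a, toolkit_b)
```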
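
SB's point under Slide 9, that a GCN only sees a fixed number of "hops" from each atom, can be illustrated with a breadth-first search over the bond graph. This sketch assumes the receptive field simply equals the number of message-passing steps and ignores any global pooling; `atoms_within_hops` and the toy chain are hypothetical.

```python
from collections import deque


def atoms_within_hops(adjacency, start, max_hops):
    """Return {atom: hop distance} for atoms at most `max_hops` bonds
    away from `start`, found by breadth-first search."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if seen[atom] == max_hops:
            continue  # do not expand beyond the receptive field
        for neighbor in adjacency[atom]:
            if neighbor not in seen:
                seen[neighbor] = seen[atom] + 1
                queue.append(neighbor)
    return seen


# Toy molecule: a linear chain of 10 atoms, 0-1-2-...-9.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
visible = atoms_within_hops(chain, start=0, max_hops=5)
```

With 5 hops, atom 0 in this chain is unaffected by anything at atoms 6 through 9, which is why cleaving or omitting distant groups matters when choosing fragments and training molecules.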