2022-12-15 Chemical Perception meeting notes

Participants

  • @David Mobley

  • @Michael Gilson

  • @Tobias Huefner

  • @Trevor Gokey

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Lily Wang

  • @Chapin Cavender

Discussion topics

Item

Notes

Update Tobias

  • https://docs.google.com/presentation/d/1zh7n5XHUbOruDAoYBMyIcW0PGU37jxh806ROvkxYBOA/edit?usp=sharing

  • Slide 3

    • MG – How is the new fit performing so poorly on the training set? Is it a limitation of the functional form?

    • TH – Possibly.

    • MG – Maybe a typing problem?

    • CC – It does look like molecules 1 and 4 have fundamentally different barrier heights.

    • DM – Also, to clarify, we’re looking at the residual energy after subtracting the other FF terms out. And we intentionally decided not to fit phase in previous mainline FFs to avoid overfitting.

      • TH – I also had trouble on a technical level trying to fit phase.

  • TH – Sometimes I can’t get a SMARTS from ChemPer

    • JW – More details? What’s the limitation in ChemPer?

    • TH – Things like double bond stereochemistry

  • DM – Explain how we know what “correct” means again?

    • MG – Basically, TH would use Parsley 1.3.0 to directly assign parameters to GDB6/7/8. Then, for a molecule in GDB9, it would find the most similar motif in GDB6/7/8 and just directly steal the parameter from that. We’d then evaluate how often the stolen parameter is the same as what Parsley 1.3.0 would have assigned.

    • JW – Could be cool to see how this does on QM benchmarks, like “is it better to blend the 3 most similar parameters than to just pick the top one?”

    • MG – Could be interesting to further customize the fingerprint method as well. Like, if there’s an N in one position, and the closest matches have an H or an O in that place, it’d be interesting to see how the optimal fingerprinting method chooses between them, and whether that would follow any chemical principles.

  • DM – It would be good to work on the definition of “correct” here. The “correct” parameter assigned by the FF isn’t necessarily perfect, so it could be better to compare to QM energies/geometries. For example, for a long time we tried to reproduce GAFF typing, but eventually we had to give up.

    • TH – Interesting, yeah, we could do something like fitting to a FF based on bespokefit.

    • DM – Could be good to talk to DCole about that.

    • TH – One possible result if we used bespoke learning is that there is a finite set of underlying types in the bespoke parameter set.

    • DM – Our philosophy is that we don’t need a huge set of parameters to do a pretty good job describing the physics of a large swath of chemistry.

    • MG – I think what you’re saying is that it would be better to test against quantum data, and then the thing to compare to is QUBEKit… but maybe I’m misunderstanding. Couldn’t you just compare directly to quantum data?

    • DM – Yes, but the error in the “typing by analogy” slide plots is in terms of torsion k.

    • MG – Yeah, but QUBEKit is constrained, whereas bespokefit isn’t as much.

    • DM – Yeah, but bespokefit only fits torsions.

  • TG – With ChemPer, I’ve also had trouble where I can’t work with the partitions it makes. So I wondered if you could drop one or two partitions and make it work.

    • TH – So you’re saying that you can drop some atoms that would be in the partition to make it work?

    • TG – Yeah, I haven’t tested yet, but I’ve found cases where I think I could have solved it by swapping out some atoms. I’ll let you know if I have any success on that front. These would be atom index swaps, so maybe in cases where ChemPer fails, there’s a small number of swaps that we can do to fix it…

    • TH – I tried that as well - Either I’m hitting a problem where there is no permutation that works, or the number of permutations is so large that it doesn’t work any more.
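
The “typing by analogy” check that MG describes above (copy the parameter from the most similar reference motif, then count how often it matches what the force field would have assigned) can be sketched roughly as follows. This is only an illustration under stated assumptions: the fingerprints are plain sets of made-up feature strings, Tanimoto similarity is swapped in as a stand-in for whatever similarity measure TH actually uses, and the labels t1, t2, t3 are hypothetical rather than real Parsley 1.3.0 parameter IDs.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is modelled here as a set of feature strings (an assumption).
Fingerprint = FrozenSet[str]
# Each labelled entry pairs a motif fingerprint with a parameter label.
Labelled = Tuple[Fingerprint, str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared features divided by total distinct features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_by_analogy(query: Fingerprint, reference: List[Labelled]) -> str:
    """Copy the parameter label of the most similar reference motif."""
    best = max(reference, key=lambda entry: tanimoto(query, entry[0]))
    return best[1]


def agreement_rate(queries: List[Labelled], reference: List[Labelled]) -> float:
    """Fraction of queries whose copied label equals the expected label."""
    hits = sum(
        assign_by_analogy(fp, reference) == expected for fp, expected in queries
    )
    return hits / len(queries)


# Toy reference set standing in for motifs labelled on GDB6/7/8.
reference = [
    (frozenset({"C", "C-C", "C-H"}), "t1"),
    (frozenset({"C", "N", "C-N", "N-H"}), "t2"),
    (frozenset({"C", "O", "C-O", "O-H"}), "t3"),
]

# Toy queries standing in for GDB9 motifs, each paired with the label the
# force field itself would have assigned (the "correct" answer in this test).
queries = [
    (frozenset({"C", "C-C", "C-H", "C-F"}), "t1"),
    (frozenset({"C", "N", "C-N"}), "t2"),
    (frozenset({"C", "O", "C-O", "C-H"}), "t1"),
]

rate = agreement_rate(queries, reference)
print(f"agreement with direct assignment: {rate:.2f}")
```

In this toy data the third query is closest to the t3 motif even though its expected label is t1, so it counts as a miss. JW’s suggestion of blending the three most similar parameters would replace the single best match with a combination over the top-ranked entries; that variant is not shown here.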


Action items

Decisions