2022-09-14 Chemical Perception meeting notes

Participants

  • @Tobias Huefner

  • @Trevor Gokey

  • @Pavan Behara

  • @David Mobley

  • @Lily Wang

  • @Jeffrey Wagner

  • @Chapin Cavender

Discussion topics

Item

Presenter

Notes

Update

 

@Tobias Huefner

  • TG – This formulation assumes you can uniquely identify every tuple of atoms?

    • TH – Yes. This uses atom ranks in RDKit to identify symmetric atoms.

    • TG – Does this mean that rank depends on the dataset you’re looking at?

    • JW – I think that bonds would always be considered different if they’re in different molecules, but they could be said to be the same if they are in the same molecule and are between atoms with the same ranks.

    • TH – Right…. (spotty connection)

  • JW – Sometimes I see torsion profiles on the slides, but they aren’t compared to anything - How should I read those?

    • TH – They’re overlapping. Because of the small dataset and molecules the fits were quite good.

  • TG – Question: How did you find M-groups?

    • TH – If everything is unique then this is the same as doing M different optimizations. They don’t talk to each other except at the end, when they’re considering merging.

    • TG – So one molecule doesn’t influence another?

    • TH – Right

  • TG – Comment: Are you planning to train on something other than Parsley?

    • TH – Yes. But for now it’s easy to do that.

    • TG – I’d love to see this trained on a larger dataset - Could see how it scales.

    • TG – Agree. Would love to have a better dataset, where bonds aren’t harmonic.

    • TH – Suggestions for dataset?

    • TG – AlkEthOH is on QCA

    • JW – Training on AlkEthOH - Use QM energies or MM energies?

    • TH – Would be cool to train on QC data.

    • TG – No torsiondrives or Hessians in that set, just optimized geometries (optgeos).

  • TG – How do you encode this scheme? Are all outputs SMARTS from chemper? At a glance it looks like a matrix/encoding specific to RDKit.

    • TH – This is a major roadblock currently. There’s no way to dump the state of the z vector other than through chemper, and that output can’t then be loaded back in. What I’m looking into now is “typing by analogy”: you take a bunch of molecules, type them, and group them by fingerprint. Then you take a query set and match each query to its closest fingerprint. This is agnostic to the typing strategy being used.

  • TH – Next time I’ll show the results of further sampling.

    • TG – I’ll make sure you’re on the agenda for that!
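
The bond-equivalence rule JW describes above (different molecules are always distinct; within a molecule, bonds between atoms of the same ranks count as the same) could be sketched roughly as below. This is an illustration, not TH's implementation. It assumes per-atom symmetry ranks are already available, for example from RDKit's `Chem.CanonicalRankAtoms(mol, breakTies=False)`. The function name, molecule names, rank values, and bond lists are all made up for the example.

```python
from collections import defaultdict


def group_equivalent_bonds(molecules):
    """Group bonds that are equivalent by symmetry.

    `molecules` maps a molecule name to (ranks, bonds), where ranks[i] is the
    symmetry rank of atom i and bonds is a list of (i, j) atom-index pairs.
    """
    groups = defaultdict(list)
    for name, (ranks, bonds) in molecules.items():
        for i, j in bonds:
            # The molecule name is part of the key, so bonds in different
            # molecules never land in the same group.
            key = (name, tuple(sorted((ranks[i], ranks[j]))))
            groups[key].append((i, j))
    return dict(groups)


# Hand-assigned (hypothetical) ranks: symmetric atoms share a rank.
molecules = {
    # methanol: C0, O1, H2-H4 on carbon, H5 on oxygen
    "methanol": ([4, 5, 0, 0, 0, 3], [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]),
    # water: O0, H1, H2
    "water": ([1, 0, 0], [(0, 1), (0, 2)]),
}

groups = group_equivalent_bonds(molecules)
for key, members in sorted(groups.items()):
    print(key, members)
```

Here the three C-H bonds of methanol fall into one group and the two O-H bonds of water into another, while the methanol O-H bond stays separate from the water O-H bonds because the molecules differ.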

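The “typing by analogy” idea TH mentions could look something like the sketch below. It is only a guess at the general shape, not the actual method: fingerprints are modeled as plain sets of feature strings, closeness is measured with Tanimoto similarity, and the function names, features, and type labels are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_reference(typed_examples):
    """Group already-typed examples by fingerprint.

    `typed_examples` is an iterable of (fingerprint, type_label) pairs.
    If a fingerprint occurs more than once, the first label is kept.
    """
    reference = {}
    for fingerprint, label in typed_examples:
        reference.setdefault(frozenset(fingerprint), label)
    return reference


def type_by_analogy(query, reference):
    """Return the label of the reference fingerprint closest to `query`."""
    query = frozenset(query)
    best = max(reference, key=lambda fp: tanimoto(query, fp))
    return reference[best]


# Hypothetical typed training examples: (fingerprint, assigned type).
typed = [
    ({"C", "O", "single"}, "b_CO"),
    ({"C", "H", "single"}, "b_CH"),
    ({"C", "O", "double"}, "b_C=O"),
]
reference = build_reference(typed)

# A query that no training example matches exactly.
print(type_by_analogy({"C", "O", "double", "ring"}, reference))
```

Because the lookup only needs fingerprints and labels, it does not care how the labels were produced, which is the sense in which the approach is agnostic to the typing strategy.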

Action items

Decisions