Update | @Tobias Huefner | TG – This formulation assumes you can uniquely identify every tuple of atoms? TH – Yes. This uses atom ranks in RDKit to identify symmetric atoms. TG – Does this mean that rank depends on the dataset you’re looking at? JW – I think that bonds would always be considered different if they’re in different molecules, but they could be said to be the same if they are in the same molecule and are between atoms with the same ranks. TH – Right… (spotty connection)
JW – Sometimes I see torsion profiles on the slides, but they aren’t compared to anything - How should I read those? TG – Question: How did you find M-groups? TH – If everything is unique then this is the same as doing M different optimizations. They don’t talk to each other except at the end, when they’re considering merging. TG – So one molecule doesn’t influence another? TH – Right
TG – Comment: Are you planning to train on something other than Parsley? TH – Yes. But for now it’s easy to do that. TG – I’d love to see this trained on a larger dataset, so we could see how it scales. TG – Agree. Would love to have a better dataset, where bonds aren’t harmonic. TH – Suggestions for dataset? TG – AlkEthOH is on QCA. JW – Training on AlkEthOH - Use QM energies or MM energies? TH – Would be cool to train on QC data. TG – No torsiondrives or hessians in that set - Just optgeos.
TG – How do you encode this scheme? Are all outputs SMARTS from chemper? At a glance it looks like a matrix/encoding specific to RDKit. TH – This is a major roadblock currently. There’s no way to dump the state of the z vector, other than chemper, where we then can’t load it back. What I’m looking into now is “typing by analogy”. So you take a bunch of molecules, type them, and then group them by fingerprint. Then you take a query set and try to match it to its closest fingerprint. This is agnostic to the typing strategy being used.
TH – Next time I’ll show the results of further sampling.
|
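The "typing by analogy" idea TH describes (type a reference set, group it by fingerprint, then match each query to its closest fingerprint) can be sketched roughly as below. This is a minimal illustration and not TH's implementation: fingerprints are assumed to be plain sets of integer feature bits, closeness is assumed to be Tanimoto similarity, and the type labels (`b1`, `b2`, `b3`) and function names are hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_fingerprint(typed_items):
    """Map each distinct fingerprint to the set of type labels seen for it."""
    groups = defaultdict(set)
    for fingerprint, label in typed_items:
        groups[frozenset(fingerprint)].add(label)
    return dict(groups)


def type_by_analogy(query_fp, groups):
    """Return the labels and similarity of the reference fingerprint closest to the query."""
    query = frozenset(query_fp)
    best = max(groups, key=lambda fp: tanimoto(query, fp))
    return groups[best], tanimoto(query, best)


# Hypothetical reference set: (fingerprint bits, type label) pairs.
# In practice the labels could come from any typing strategy.
reference = [
    ({1, 2, 3}, "b1"),
    ({1, 2, 3}, "b1"),
    ({4, 5, 6}, "b2"),
    ({4, 5, 7}, "b3"),
]

groups = group_by_fingerprint(reference)
labels, score = type_by_analogy({4, 5, 6, 8}, groups)
```

Because the lookup only needs fingerprints and labels, it does not depend on how the labels were produced, which is the sense in which the approach is agnostic to the typing strategy.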