name | Chemper Call 2022-02-28.pptx |
---|
|
| Trevor Gokey | TG - easy to compare SMARTS for bonds, hard to compare for molecules; encode differences in binary, then describe the bond using a graph union (equivalent to bitwise OR)
TG - when graphs have different cardinality, provide a SMARTS that will match chemistries with both numbers of nodes
TG - the union of all matched SMARTS gives the most general parameter, then removal of bits gives specific parameters
CB - this seems to depend on what chemical environments are present in the training set. Will this still work in practice when we can encounter new chemical environments?
TG - wildcards are dangerous, we should think carefully about what they mean when we specify parameters
TG - objective function is composed of a physical-space term (SSR for observables) and a chemical-space term (exponential function of the number of unique chemical environments and the number of bits needed to represent those environments).
TG - for alkanes, the data-driven parameter set (this work) outperforms Sage by adding only one extra parameter.
CC - how do you read these graphs?
TG - start with the smallest set of parameters at the left of the graph. In each iteration, identify candidate splits/merges with the scoring function, fully optimize the 10 best candidates, then plot the new physical/chemical objective
TG - the ground-truth best choice ([!r4:1]~[*:2]) was 11th best with my scoring function, but targeting only the single best candidate gets pretty close
CB - it seems like your objective function favors small gains on parameters prevalent in the training dataset over large gains on sparse parameters
CB - I’m still worried that SMARTS like [!r4] will be too general for the wider chemical world outside of your training dataset. Can you generalize your binary-encoded SMARTS approach to identify how to specify these SMARTS in a bigger space, e.g. Sage?
DM - you could give your encoder exactly the Sage training set and ask it to differentiate this parameter
TG - this is hard because you have to identify at what stage in the hierarchy you want to search
CB - this is equivalent to the wizardry of human chemical intuition. Wizards (chemists) always make some allegedly generalizable choice that is not actually generalizable. This is the problem we want to solve by automation at some point.
TG - has this problem been solved, at least locally, in bespokefit?
DM - no, because bespokefit is not meant to be transferable
|
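Illustration of the bitwise idea from TG's first points, as a minimal sketch only. It assumes a toy encoding with one bit per allowed element and one bit per allowed hydrogen count on a single atom. The field layout and the names `encode_atom`, `union`, `matches` and `to_smarts` are hypothetical and are not taken from chemper or from the slides. It shows the union of matched environments as a field-wise bitwise OR, the removal of a bit to get a more specific pattern, and CB's concern that an environment absent from the training set is not covered.

```python
from functools import reduce

# Hypothetical toy encoding: one bit per allowed value of each atom primitive.
ELEMENTS = ["C", "N", "O"]          # bit i set => ELEMENTS[i] is allowed
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8}
H_COUNTS = [0, 1, 2, 3, 4]          # bit i set => H_COUNTS[i] hydrogens allowed


def encode_atom(element, h_count):
    """Return (element_bits, h_bits) with exactly one bit set in each field."""
    return (1 << ELEMENTS.index(element), 1 << H_COUNTS.index(h_count))


def union(a, b):
    """Field-wise bitwise OR: the result matches everything a or b matches."""
    return tuple(x | y for x, y in zip(a, b))


def matches(pattern, atom):
    """An atom matches if every bit it sets is also set in the pattern."""
    return all((p & a) == a for p, a in zip(pattern, atom))


def to_smarts(pattern):
    """Render the two bit fields as an atom SMARTS such as [#6,#7;H2,H3]."""
    element_bits, h_bits = pattern
    elements = [f"#{ATOMIC_NUMBERS[e]}"
                for i, e in enumerate(ELEMENTS) if (element_bits >> i) & 1]
    h_counts = [f"H{h}" for i, h in enumerate(H_COUNTS) if (h_bits >> i) & 1]
    return "[" + ",".join(elements) + ";" + ",".join(h_counts) + "]"


# Three environments assumed to be matched by one parameter in the training set.
methyl_c = encode_atom("C", 3)
methylene_c = encode_atom("C", 2)
amine_n = encode_atom("N", 2)

# The union of all matched environments gives the most general pattern.
general = reduce(union, [methyl_c, methylene_c, amine_n])

# Removing a bit (here: the nitrogen bit) gives a more specific pattern.
nitrogen_bit = 1 << ELEMENTS.index("N")
specific = (general[0] & ~nitrogen_bit, general[1])

# An environment that never appeared in the training set is not covered.
hydroxyl_o = encode_atom("O", 1)

print(to_smarts(general))
print(to_smarts(specific))
print(matches(general, hydroxyl_o))
```

In this toy space a field with every bit set would behave like a wildcard, but only over the values the encoding knows about. That is one way to read the caution about wildcards: what they mean depends on which environments were enumerated.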