DLM: looks like the 7-membered ring might be due to a proton rearrangement [?]; we saw similar problems before. Best solution could be to filter it out of the benchmark dataset
PB: we thought filtering H-bonds might get rid of the problematic conformers
LMI: I looked into that, but benchmarks seemed to indicate worse performance
LW: could it be because the benchmark set includes the problematic conformers?
LMI: maybe, need to look into it
PB: re a32: we saw similar problems with this hypervalent S before, but we decided it was a rare chemistry and didn’t address it
DM: do we need to go digging around for more data for these problems?
LMI: haven’t scoped out the problem yet
DM: I would spend maybe 1-2 hours to see if eMolecules or ChEMBL has enough molecules to expand dataset
LMI/BW: SMARTS pattern might be hard part
TG: Besmarts could handle this, but chemper might be better suited: you could take the SMARTS pattern from each of your molecules and query the database for those. Besmarts would give you the union instead: it would take all your chemical environments and find a single pattern that matches all of them. If you use that pattern to query, it will select anything that matches
LMI: hard to distinguish between the different angles being either 180 or 90
DM: I think in MM world you would want a multiple MM solution, since angles are indistinguishable
TG: not quite true, all the CH3s are 90, but the CH2 is different
LMI: but if CH2 was CH3 it would probably still be same geometry. The substituent is specific to this mol
TG: this problem may need specific SMIRKS to handle these cases
DM: ideas for dealing with this problem systematically?
LW: are QM energies noticeably higher for outlier conformers?
LMI: yes, ~0.2 hartree higher
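A minimal sketch of how the energy check LW asks about could be automated, assuming the QM energies of one molecule's conformers are available as a plain list in hartree. The function name and the 0.1 hartree cutoff are hypothetical, chosen only so that an outlier of roughly 0.2 hartree is flagged; neither was discussed in the meeting.

```python
def flag_high_energy(energies_hartree, cutoff_hartree=0.1):
    """Return indices of conformers whose QM energy lies more than
    `cutoff_hartree` above the lowest-energy conformer of the same molecule.

    The default cutoff is an illustrative assumption, not an agreed value.
    """
    e_min = min(energies_hartree)
    return [
        i for i, e in enumerate(energies_hartree)
        if e - e_min > cutoff_hartree
    ]


# Hypothetical energies: the third conformer sits ~0.2 hartree above the rest.
example_energies = [-500.000, -499.998, -499.800]
flagged = flag_high_energy(example_energies)
```

Conformers flagged this way would still need manual inspection (e.g. for a proton rearrangement) before being dropped from the benchmark set.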
PB: we used to do more granular benchmarks where we looked at everything more than X% away from the mean, e.g. a bond length more than X angstrom from the mean
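The "more than X% away from the mean" check PB describes could look roughly like the sketch below. It assumes one geometric quantity (e.g. a bond length in angstrom) has been collected per conformer; the function name and the 10% value in the example are hypothetical, since X was left unspecified.

```python
from statistics import mean


def flag_far_from_mean(values, percent):
    """Return indices of values deviating from the mean by more than
    `percent` percent of the mean's magnitude.

    Note: the mean here includes the outliers themselves, so a large
    outlier shifts the reference value slightly.
    """
    mu = mean(values)
    limit = abs(mu) * percent / 100.0
    return [i for i, v in enumerate(values) if abs(v - mu) > limit]


# Hypothetical bond lengths (angstrom); the last one is clearly stretched.
bond_lengths = [1.52, 1.53, 1.54, 1.53, 1.90]
outliers = flag_far_from_mean(bond_lengths, percent=10)
```

An absolute threshold (X angstrom) would work the same way with `limit` set directly instead of derived from the mean.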
LW: for ddEs we could apply Bill Swope’s modification to only consider conformers within 0.4 Å and look at outliers for geometry targets
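A rough sketch of the restricted ddE calculation LW mentions, under two assumptions that the notes do not confirm: that the 0.4 Å cutoff applies to the RMSD between each MM-optimized conformer and its QM geometry, and that the reference conformer is the lowest-QM-energy conformer among those kept. The record layout and function name are hypothetical.

```python
def dde_within_rmsd(records, rmsd_cutoff=0.4):
    """Compute ddE only for conformers whose RMSD is within `rmsd_cutoff`.

    `records` is a list of dicts with keys:
      'qm'   - QM energy (any consistent unit, e.g. kcal/mol)
      'mm'   - MM energy in the same unit
      'rmsd' - MM-vs-QM geometry RMSD in angstrom (assumed meaning of the cutoff)

    ddE = (E_MM - E_MM_ref) - (E_QM - E_QM_ref), where the reference is the
    lowest-QM-energy conformer among those kept (an assumption).
    Returns {conformer_index: ddE}; empty if no conformer passes the cutoff.
    """
    kept = [(i, r) for i, r in enumerate(records) if r["rmsd"] <= rmsd_cutoff]
    if not kept:
        return {}
    _, ref = min(kept, key=lambda pair: pair[1]["qm"])
    return {
        i: (r["mm"] - ref["mm"]) - (r["qm"] - ref["qm"])
        for i, r in kept
    }


# Hypothetical conformers: the third has drifted 0.9 angstrom and is excluded.
example_records = [
    {"qm": 0.0, "mm": 0.0, "rmsd": 0.1},
    {"qm": 1.0, "mm": 1.5, "rmsd": 0.2},
    {"qm": 2.0, "mm": 5.0, "rmsd": 0.9},
]
ddes = dde_within_rmsd(example_records)
```

The conformers excluded by the cutoff are the ones that would then be examined as outliers for the geometry targets.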
DM: I think we want to be more aggressive in pruning benchmark set than fitting set
Science team will investigate systematic ways to fix benchmark set