2024-02-07 FF Fitting Meeting

Participants

  • @Alexandra McIsaac

  • @David Mobley

  • Bill Swope

  • @Trevor Gokey

  • @Pavan Behara

  • @Michael Gilson

  • @Lily Wang

  • @Brent Westbrook (Unlicensed)

Goals

  •  

Discussion topics

Item

Presenter

Notes

Outliers in the benchmarking dataset

LMI

  • Recording

  • DLM: looks like the 7-membered ring issue might be due to a proton rearrangement [?]; we saw similar problems before. The best solution could be to filter it out of the benchmark dataset

    • PB: we thought filtering H-bonds might get rid of the problematic conformers

    • LMI: I looked into that, but benchmarks seemed to indicate worse performance

    • LW: could it be because the benchmark set includes the problematic conformers?

    • LMI: maybe, need to look into it

  • PB: re a32: we saw similar problems with this hypervalent S before, but we decided it was rare chemistry and didn’t address it

  • DM: do we need to go digging around for more data for these problems?

    • LMI: haven’t scoped out the problem yet

    • DM: I would spend maybe 1-2 hours to see if eMolecules or ChEMBL has enough molecules to expand the dataset

    • LMI/BW: the SMARTS pattern might be the hard part

    • TG: Besmarts could handle this, but chemper might be better suited: you could take the SMARTS pattern from each of your molecules and query the database for those individually. Besmarts would give you the union instead: it takes all your chemical environments and finds one pattern that matches all of them. If you use that pattern to query, it will select anything that matches

  • LMI: hard to distinguish between the different angles being either 180 or 90

    • DM: I think in MM world you would want a multiple MM solution, since angles are indistinguishable

    • TG: not quite true, all the CH3s are 90, but the CH2 is different

    • LMI: but if the CH2 were a CH3 it would probably still be the same geometry. The substituent is specific to this molecule

    • TG: this problem may need specific SMIRKS to handle these cases

  • DM: ideas for dealing with this problem systematically?

    • LW: are QM energies noticeably higher for outlier conformers?

    • LMI: yes, ~0.2 hartree higher

    • PB: we used to do more granular benchmarks where we looked at everything more than X% away from the mean, e.g. a bond length of more than X angstrom away

    • LW: for ddEs we could apply Bill Swope’s modification to only consider conformers within 0.4 Å and look at outliers for geometry targets

    • DM: I think we want to be more aggressive in pruning the benchmark set than the fitting set

    • Science team will investigate systematic ways to fix benchmark set
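A minimal sketch of two of the pruning ideas discussed above, for illustration only and not the team's actual workflow. It assumes per-conformer QM energies in hartree, relative energies in kcal/mol, and that the 0.4 Å criterion is an RMSD between each conformer's QM and MM geometries. The function names and the 50 kcal/mol cutoff are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # standard unit conversion


def flag_high_energy_conformers(energies_hartree, cutoff_kcal_per_mol=50.0):
    """Return indices of conformers far above the lowest-energy conformer.

    The cutoff is a hypothetical value chosen for illustration.
    """
    lowest = min(energies_hartree)
    return [
        i
        for i, energy in enumerate(energies_hartree)
        if (energy - lowest) * HARTREE_TO_KCAL_PER_MOL > cutoff_kcal_per_mol
    ]


def dde_within_rmsd(qm_rel, mm_rel, rmsd, rmsd_cutoff=0.4):
    """Return {conformer index: ddE} for conformers within the RMSD cutoff.

    ddE is taken here as MM relative energy minus QM relative energy.
    Assumes rmsd[i] is the QM-vs-MM geometry RMSD in angstrom.
    """
    return {
        i: mm - qm
        for i, (qm, mm, r) in enumerate(zip(qm_rel, mm_rel, rmsd))
        if r <= rmsd_cutoff
    }


if __name__ == "__main__":
    # Made-up numbers: the third conformer sits 0.2 hartree above the minimum.
    energies = [-100.0, -99.999, -99.8]
    print(flag_high_energy_conformers(energies))
    print(dde_within_rmsd([0.0, 1.0, 2.0], [0.0, 1.5, 5.0], [0.1, 0.3, 0.9]))
```

A conformer ~0.2 hartree above the minimum is roughly 125 kcal/mol higher, so it would be flagged by any plausible energy cutoff. Following DM's point, a stricter cutoff could be used for the benchmark set than for the fitting set.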

Action items

Decisions