2020-08-26 BespokeFit meeting notes

Date

Aug 26, 2020

Participants

  • @Jeffrey Wagner

  • @Joshua Horton

  • @Jamshed Anwar

  • @David Mobley

Discussion topics

Notes

  • JH – Prototype is nearly complete. There’s a bit of jank, where a process has to babysit the QCA snowflake and error cycle.

  • JH – Did some binning to try to separate good torsion profiles from bad ones, but the logic had trouble telling them apart.

    • DM – Probably better to move quickly rather than worrying about miscategorizing a few cases. It seems inevitable that a really good binning system will require a lot of human review.

    • JH – I’ll send out a summary of what I’m finding now, and which molecules/parameters are good/bad

  • JH – Not sure how FB internals work. Is there preference toward matching minima? Is there an energy cutoff?

    • DM – Yes, but not sure about the specific details.

    • DM – Basically, matching minima is most important, followed by getting low barrier heights right, and making sure impassably large barriers remain impassable.

    • JA – So barrier heights near kT are important, anything much higher is less important.

    • DM – IIRC, it’s something like 6kT

  • https://openforcefield.atlassian.net/wiki/spaces/~102144901/pages/600473647

    • JA – Looking at these plots, the RMSD may be covering up some details, so it may be better to check out the maximum deviation for each molecule.

    • DM – When I look by eye, I largely want to see if the “shape is the same”. So if there’s a different number of peaks or minima, then I suspect that sterics are causing trouble. In those cases, we would want to avoid including those high-steric-energy structures in the torsion fitting.

    • JA – Some measure of curvature may be the best way to go here, and deconvoluting the energy contributions.

    • JH – Yes, I think that signed curvature is a useful metric here.

    • DM – What logic is applied to the curvature?

    • JH – Yes, we do an allclose comparison between the two arrays

    • DM – It may be good to downweight points that sit at extremely nonsmooth regions of the MM energy landscape

    • JH – Concerned about waste and miscategorization close to the cutoff – Lots of good stuff could be thrown out near the barrier

    • JA – It’d be good to see the steric energies on the plots.

    • JW – We could list all possible directions and try to sort to find the ones with the highest returns first

    • DM – Point filter could be a generic “black box” function that we implement minimally now, and future refinements could just work on improving this box.

    • (General) – In the bespoke workflow, could use a similar black box function to look at the QM torsion scan and see if some points should be removed.

  • Bespoke workflow architecture

    • 4 processes:

      • bespoke.py

      • ForceBalance

      • Archive manager

      • Archive worker manager

        • Archive worker(s)

    • ANI jobs take 30 seconds each

    • Conda environment should be consistent – Could put package up on omnia/label/rc

  • Chemper utility/future?

    • JW – Would we imagine using bespoke fits to all torsion drives followed by Chemper collapsing to make a FF?

    • DM – Two shortcomings of that would be 1) we don’t have complete coverage of chemical space and 2) we’d need to develop a metric of when two parameters are “close enough” to be merged.

  • JH – How should I evaluate whether bespoke fits are improving the models?

    • JH – Previously we did conformer energy ranking, coupled with high temperature MD.

    • DM – Could also optimize geometries that didn’t appear in scan and see if new minima are correctly ranked

    • DM – Binding free energies are the ultimate goal, but it’ll be hard to use that as a benchmark, and may be misleading. Could talk to JC or HBM to try putting bespoke fits in place of where they’ve used ANI or other FFs in their benchmarking.
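
A minimal sketch of the conformer energy ranking check mentioned above, assuming a Spearman rank correlation between QM and MM conformer energies is an acceptable score. The function names and all energies below are hypothetical and made up for illustration; ties are not handled.

```python
import numpy as np


def rank(values):
    """Return 0-based ranks of the values (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def spearman(qm_energies, mm_energies):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(qm_energies), rank(mm_energies))[0, 1])


# Made-up relative conformer energies (kcal/mol), one entry per conformer.
qm = [0.0, 1.2, 2.5, 0.7]
mm_before = [0.0, 0.5, 2.0, 0.9]  # swaps the order of the 2nd and 4th conformers
mm_after = [0.0, 1.0, 2.7, 0.6]   # reproduces the QM ordering

score_before = spearman(qm, mm_before)
score_after = spearman(qm, mm_after)
print(score_before, score_after)
```

A score closer to 1 after the bespoke fit than before would suggest the fit improved the conformer ordering. The same function could be applied to newly optimized geometries that did not appear in the scan.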

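The plot discussion above (RMSD versus maximum deviation, and an allclose comparison of curvature) can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: the curvature is approximated by a periodic second finite difference on an evenly spaced 360 degree scan, and the function name, tolerance, and profiles are all hypothetical.

```python
import numpy as np


def second_difference(energies):
    """Periodic second finite difference, used as a signed curvature proxy."""
    return np.roll(energies, -1) - 2.0 * energies + np.roll(energies, 1)


def profile_metrics(qm, mm, atol=1.0):
    """Return (rmsd, max_deviation, same_shape) for two torsion profiles."""
    qm = np.asarray(qm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    qm = qm - qm.min()  # compare relative energies
    mm = mm - mm.min()
    diff = mm - qm
    rmsd = float(np.sqrt(np.mean(diff ** 2)))
    max_dev = float(np.max(np.abs(diff)))
    same_shape = bool(
        np.allclose(second_difference(qm), second_difference(mm), atol=atol)
    )
    return rmsd, max_dev, same_shape


# Made-up threefold profile on a 15 degree grid (kcal/mol).
angles = np.radians(np.arange(-180, 180, 15))
qm_profile = 2.0 * (1.0 + np.cos(3.0 * angles))
mm_good = qm_profile + 0.1 * np.cos(angles)  # small smooth error
mm_bad = qm_profile.copy()
mm_bad[5] += 5.0                             # one spike, e.g. a steric clash

good = profile_metrics(qm_profile, mm_good)
bad = profile_metrics(qm_profile, mm_bad)
print(good)
print(bad)
```

In the spiked case the RMSD stays near 1 kcal/mol while the maximum deviation is 5 kcal/mol, which is the kind of detail the RMSD alone can hide.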

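One possible shape for the "black box" point filter discussed above, written as a weight function rather than a hard cutoff so that points just above the threshold are downweighted instead of discarded. Everything here is an assumption for illustration: the function name, the Lorentzian decay, the width, and the use of 6 kT as the default threshold (taken from the "something like 6kT" remark). It is not a description of how ForceBalance weights points.

```python
import numpy as np

# Approximate kT at 298 K in kcal/mol.
KT_KCAL_MOL = 0.593


def point_weights(rel_energies, cutoff=6.0 * KT_KCAL_MOL, width=1.0):
    """Hypothetical point filter: full weight below the cutoff, smooth decay above.

    rel_energies: energies relative to the profile minimum (kcal/mol).
    cutoff:       energy below which every point keeps weight 1.
    width:        energy scale (kcal/mol) over which the weight decays.
    """
    energies = np.asarray(rel_energies, dtype=float)
    excess = np.clip(energies - cutoff, 0.0, None)  # zero below the cutoff
    return 1.0 / (1.0 + (excess / width) ** 2)


# Made-up relative energies: two low points, one just over the cutoff, one very high.
weights = point_weights([0.0, 3.0, 6.0 * KT_KCAL_MOL + 1.0, 20.0])
print(weights)
```

Because the interface is just energies in and weights out, later refinements (for example using steric energies or curvature) could replace the body without changing the callers.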

Action items

Decisions