2021-09-01 Benchmarking followup workshop discussion notes

Discussion topics

Item

Notes


  • AGobbi – When running OPLS3 with FEP+, did you use ffbuilder?

    • DH – Yes. Not all the FEP+ calcs were run by me. I took them from the published results, including from C Schindler and Janssen people.

  • XHou – Re: benchmarking metrics slide (27) – Have you considered comparing the shape of the curve? At Pfizer we compare the shape of the entire torsion scan.

    • LD – We are looking to include torsion scans in the openff-benchmark command tree, but it’s not yet fully implemented.

    • SB – We’ve started to do some crude analysis of torsion energies, both in terms of shape and magnitude. Right now we’re looking at normalizing torsion barrier heights and then comparing RMSE.

    • DH – Work so far has been single points on the torsion energy surface

  • AG – Also slide 27 – Do you have plots for these alternative comparisons?

    • LD – The histograms shown earlier are from the “compare forcefields” path. I haven’t had time to make histograms for the Swope and Lucas analyses. I do have the plots for the Janssen proprietary datasets.

  • AG – Re slide 29, did these reports of failures in 1.3 lead to direct changes in the way that openff-2.0 was trained?

    • LD + DM – No. And there are still more reports from Thomas Fox that we haven’t fixed.

  • AG – Can we run the torsion deviation code on our own structures?

    • LD – Yes, I’ll find T Fox’s code and share it. Also, today’s interactive session will show some of his

  • DH – Additional analyses?

    • TFox – Torsion profiles. Conformational ensembles (basically whether the FF is good enough that I don’t need to do expensive QM things).

      • AG – Don’t we have the conformational ensemble data from the current datasets?

      • TF – I’m not sure that the current dataset has all the energy info that would be needed to do the comparisons I want.

      • DM – More details?

      • TF – I’d like to see a ranking metric for relative conformer energetics.

      • JW – Something like a rho or a tau would be a great idea

      • DH – Agree. Also, the use of the word “ensemble” makes me think about “MD”, and maybe that could be a good analysis, to see if the clusters from ligand MD capture the energy landscape well.

      • TF – That would be good, though it does seem challenging.
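
The ranking metric discussed above (a rho or a tau) could be sketched as follows. This is a minimal, dependency-free illustration, assuming that "rho" means Spearman's rank correlation and "tau" means Kendall's tau (here the simple tau-a variant). The energies are made-up numbers, not results from any benchmark dataset, and the function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


# Made-up relative conformer energies (kcal/mol) for one molecule, illustration only.
qm_energies = [0.0, 0.8, 1.5, 2.9, 4.1]
ff_energies = [0.0, 1.1, 0.9, 3.5, 3.9]

rho = spearman_rho(qm_energies, ff_energies)
tau = kendall_tau(qm_energies, ff_energies)
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```

In this toy case the force field swaps the order of two conformers, so both coefficients fall below 1. For real use, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` provide tested implementations with tie handling and p-values.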

Interactive session

  • TF – Is the interpretation that t17 has 260 violations, and that in 35% of the cases where the parameter is assigned there will be a deviation of more than 30 degrees?

    • LD – We can’t exactly interpret this number directly. But it’s a weighted metric.

    • DM – Could we assume that “1” is a baseline for regular frequency of violation?

    • JW – Two things – Note that this is a biased dataset. Also, a single central bond can have many uses of a parameter, so we can’t make a special meaning of a value of 1.

  • TFox – t157 is a *-*-*-* torsion. What’s going on there?

    • (General) – We’ll look further into this

  • AGobbi – It would be great to have a commandline tool that runs over a whole dataset.

    • LD – Agree

    • JW – We’ll aim to put this in season 2
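
As a rough illustration of the kind of per-parameter tally discussed above, the sketch below counts how often a torsion's MM-optimized dihedral deviates from the QM value by more than 30 degrees. It is not T Fox's code and it is not the weighted metric shown in the session: it computes a plain, unweighted fraction. The record layout `(parameter_id, qm_dihedral, mm_dihedral)` and all numbers are assumptions made up for this example.

```python
from collections import defaultdict

THRESHOLD_DEG = 30.0  # deviation cutoff mentioned in the discussion


def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two dihedral angles, in degrees."""
    diff = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def violation_fractions(records, threshold=THRESHOLD_DEG):
    """Map parameter id -> (violations, assignments, violations / assignments).

    Each record is a hypothetical (parameter_id, qm_dihedral, mm_dihedral) tuple.
    """
    counts = defaultdict(lambda: [0, 0])
    for parameter_id, qm_deg, mm_deg in records:
        counts[parameter_id][1] += 1  # one more assignment of this parameter
        if angle_difference(qm_deg, mm_deg) > threshold:
            counts[parameter_id][0] += 1  # deviation exceeds the cutoff
    return {pid: (v, n, v / n) for pid, (v, n) in counts.items()}


# Made-up dihedral pairs in degrees, illustration only.
records = [
    ("t17", 178.0, -175.0),  # 7 degrees apart across the +/-180 boundary
    ("t17", 60.0, 95.0),     # 35 degrees apart: counted as a violation
    ("t17", -60.0, -65.0),   # 5 degrees apart
    ("t157", 0.0, 45.0),     # 45 degrees apart: counted as a violation
]

summary = violation_fractions(records)
for pid, (violations, assignments, fraction) in sorted(summary.items()):
    print(f"{pid}: {violations}/{assignments} = {fraction:.0%}")
```

As JW noted, a single central bond can use the same parameter many times and the dataset is biased, so a raw fraction like this should be read as a screening aid and not as an absolute failure rate.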

Followups

  • Jeff will upload video

  • Jeff will upload slides to zenodo

  • Notes for other workshops?

    • Good to have a non-presenter for interactive session to manage pacing, since it’s hard to do a live demo and keep track of time.

Josh session prep

  • What do we want people to know by the time this is over?

    • Say bespokefit exists - it’s for bespoke parameter derivation, here’s some details/math about how it works

      • Currently focused on torsions, planning to expand to other terms in the future. But you’ll get the most reliable/highest value use from bespoke torsions.

      • Possible backends (psi4, ANI, XTB, etc)

    • Does it produce accurate results?

      • JH has previous slides about this

    • CAN I use this in production?

      • No, it still errors a lot, largely due to use of QCFractal.

      • This session should focus on a one-off use of it for a molecule/under conditions that we know work

    • SHOULD I use this in production?

    • HOW do I use this in production?

  • How do we want to allocate time/effort?

  • Post notice on website

  • Interactive demo

    • Make a repo like the benchmarking repo

    • Run locally or on binder?

      • If local install is likely to fail, let’s do binder

      • Return at 26 past

    • Use QCA data? Or XTB?

      • Could do two parts of demo:

        • one using QCA data

          • Maybe predownload jsons into repo

        • one using XTB

          • Could be failure-tolerant / have the other part of the notebook work even without it.

    • Show output OFFXML file, and run gas phase MD

    • Could look at torsion RMSE. Make plots that show improvement in outcome compared to Parsley/Sage parameters.

      • Could reuse JACS ligands in QCArchive, show improved torsion profiles compared to Parsley/Sage

  • Jeff will set up a blank repo for Josh to start populating
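
The torsion RMSE comparison proposed for the demo could look roughly like the sketch below. It compares a force field torsion scan to a QM scan in two ways: by magnitude (profiles shifted to a zero minimum) and by shape (profiles additionally scaled so the barrier height is 1, one possible reading of "normalizing torsion barrier heights"). The function names and all energies are hypothetical; none of this is BespokeFit or openff-benchmark API.

```python
import math


def relative_profile(energies):
    """Shift a torsion scan so that its minimum energy is zero."""
    lowest = min(energies)
    return [e - lowest for e in energies]


def normalized_profile(energies):
    """Shift a scan to a zero minimum, then scale its highest barrier to 1."""
    shifted = relative_profile(energies)
    height = max(shifted)
    if height == 0:
        raise ValueError("flat profile cannot be normalized")
    return [e / height for e in shifted]


def rmse(reference, candidate):
    """Root-mean-square error between two profiles on the same angle grid."""
    if len(reference) != len(candidate):
        raise ValueError("profiles must be on the same grid")
    squared = [(r - c) ** 2 for r, c in zip(reference, candidate)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up scan energies (kcal/mol) on a shared dihedral grid, illustration only.
qm_scan = [0.0, 1.0, 3.0, 1.0]
default_ff_scan = [0.0, 2.0, 5.0, 2.0]   # stands in for a Parsley/Sage profile
bespoke_ff_scan = [0.0, 1.0, 3.5, 1.5]   # stands in for a bespoke-fit profile

default_rmse = rmse(relative_profile(qm_scan), relative_profile(default_ff_scan))
bespoke_rmse = rmse(relative_profile(qm_scan), relative_profile(bespoke_ff_scan))
shape_rmse = rmse(normalized_profile(qm_scan), normalized_profile(default_ff_scan))

print(f"magnitude RMSE, default FF: {default_rmse:.3f}")
print(f"magnitude RMSE, bespoke FF: {bespoke_rmse:.3f}")
print(f"shape RMSE, default FF:     {shape_rmse:.3f}")
```

Reporting both numbers separates a force field that gets the barrier height wrong but the shape right (low shape RMSE, high magnitude RMSE) from one that misplaces the minima entirely.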

Action items

Decisions