2021-05-19 Benchmarking for Industry Partners - Development Meeting notes

Participants

  • @Lorenzo D'Amore

  • @David Hahn

  • @David Dotson

  • @Jeffrey Wagner

  • @Simon Boothroyd

  • @Joshua Horton

Goals

  • Updates from team

  • Refactor kick-off?

  • Establish next partner call with Gary?

Discussion topics

Updates from Team

  • LD: At Janssen, generating reports for submission

    • running ffbuilder for OPLS4, taking a few days

    • DD: should we proceed with a merge on #63?

      • DH: it is known to work on OPLS3, but there are known adaptations needed for OPLS4; running OPLS4 testing now

    • DD: should we engage Kaushik for testing?

      • DH: he appeared interested, would be useful

      • DD: I’ll email him and use it as an opportunity to nudge for data submission

    • LD – Torsion benchmarking – I added two other QM reference methods: a meta-hybrid Minnesota functional, and another that is hybrid and recommended by the Psi4 community (Grimme’s B95 with dispersion). What I’ve seen is that the results don’t change, so I think B3LYP is good enough, based on the fact that the torsional barriers were the same among all three functionals. (A sketch of this kind of comparison follows this discussion.)

      • LD – Also, this touches on whether we introduce bias by using the same QM method for both our fitting and benchmarking. Since these all seem to give the same result, I don’t think we’re introducing undue bias.

      • LD – The meta-GGA methods seem faster, without loss of accuracy. There were more settings I could have varied, but I didn’t want to change too many variables in Psi4.

      • LD – I’ll publish these results on Confluence

      • DD – So, the goal of this study is to identify a method/basis set that’s different from our current one, which we could use for benchmarking so that we don’t benchmark and train against the same QM

      • LD – This started because I saw that openff-1.3.0 overpredicted many barrier heights. So I wanted to do an independent run of QM to see whether this overestimation of barrier heights was “real”

      • SB – It’ll be good to sync up with Pavan Behara on this, since he’s also exploring different QM methods.

      • SB – I understand the argument that we don’t want to test and train on the same level of theory, but I think the way we picked it may make this a non-issue. Hopefully, in the future, we can use something like ANI to avoid this issue. We’ve been provided with an alpha version of a twice-differentiable version of ANI that could be useful, but it is missing a lot of provenance.

      • DD – Do we have next steps for this?

      • SB – I’m working with JH on this. We need to see whether the results indicate the differentiability/smoothness that we need. Once we do these tests, I can provide feedback to the ANI team. There’s no hard timeline for this yet.
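
      • (Sketch) A minimal example of the kind of functional comparison LD describes, using Psi4’s Python API; the molecule, basis set, and functionals (B3LYP-D3(BJ), with M06-2X standing in for a Minnesota meta-hybrid) are placeholder choices, not the study’s actual inputs:

        import psi4

        # Single-point energies for one torsion-scan geometry at two levels
        # of theory; H2O2 stands in for the real torsion fragments.
        psi4.set_memory("2 GB")

        mol = psi4.geometry("""
        0 1
        O  0.0000  0.7375 -0.0528
        O  0.0000 -0.7375 -0.0528
        H  0.8190  0.8170  0.4220
        H -0.8190 -0.8170  0.4220
        """)

        for method in ("b3lyp-d3bj/def2-svp", "m06-2x/def2-svp"):
            energy = psi4.energy(method, molecule=mol)
            print(f"{method}: {energy:.6f} Eh")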

    • LD – Had a debugging session with DH regarding the analysis proposed by BS and XL. I had an early run that looked good, but I hit an error when I moved to a larger analysis; DH helped me work around this.

      • DD – Great. Next step would probably be to make a PR for this in openff-benchmark.

      • LD – I’m moving this to the codebase; will create PR soon

      • LD – To recap: BS wanted to identify the QM minimum and then, for each conformer, calculate the ΔE to that minimum (see the sketch below).

      • SB – This makes sense.
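
      • (Sketch) A minimal sketch of that analysis, assuming energies arrive as a per-molecule mapping of conformer IDs to energies in hartree; the names here are illustrative, not openff-benchmark API:

        HARTREE_TO_KCALMOL = 627.509

        def relative_energies(qm_energies):
            # For each molecule, report every conformer's energy offset (dE)
            # from the lowest-energy conformer, i.e. the QM minimum.
            results = {}
            for mol_id, conformers in qm_energies.items():
                e_min = min(conformers.values())  # QM minimum for this molecule
                results[mol_id] = {
                    conf_id: (energy - e_min) * HARTREE_TO_KCALMOL
                    for conf_id, energy in conformers.items()
                }
            return results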

    • LD – DD and I had a live session to sort out how to reproduce the Lim results. We’ll have another working session on this following this call.

  • DH

    • Not much to update re: benchmarking. Worked with LD on the point above. Kept working on our results for the benchmark project.

    • Finished the PL benchmarking paper and polished up the repo.

      • DD – I really appreciate this. It could help with my scoping of work for F@H.

      • DH – My recent changes may make it incompatible with the PMX workflow, so the PMX repo needs some updates to fix this. I’ll do this when I can, but I have a lot of other work on my plate.

      • DD – LD and I will be trying this out soon. As currently written, will PLBenchmarks work with DH’s PMX fork?

      • DH – Not currently.

      • DD – Are there places where I can help here?

      • DH – Yes, it should be pretty straightforward. Mostly changing variable/method names.

      • (LD and DH are already planning to work together on this)

      • DD – Thanks for trailblazing this, DH!

    • Wanted to thank Simon for the openff-evaluator template repo; I borrowed snippets from it for the docs.

  • JW:

    • No immediate updates; found small problems with nonbonded energies in how openff-toolkit currently works

      • means that if we switch openff-benchmark to next toolkit release, we will likely get different energy values out

      • switching how we do RDKit conformer generation to make it deterministic by forcing an atom order. This makes the RMSD cutoff for conformer generation effectively more aggressive, with fewer conformers resulting (see the sketch at the end of this update)

    • DD – I’m thinking about how these changes will fit into the refactor. In terms of keeping refactors/changes isolated from industry partners, this should be safe because they’re just pulling from the prod envs, where everything’s pinned.

    • JW – Refactor is on my calendar for June-Sep

    • JW – Thomas Fox had pointed out spiro compounds containing a 4-membered ring with a nitrogen

      • It’s unlikely that JW can fix this in the toolkit in a way that’s useful for the current benchmarking season, so DD has been very helpful in working with partners to root out these cases and separate them from their datasets.
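
    • (Sketch referenced in JW’s update above) Deterministic RDKit conformer generation, with illustrative settings rather than the exact openff-benchmark values; fixing the embedding seed and forcing a canonical atom order makes repeated runs, and the RMSD pruning, reproducible:

      from rdkit import Chem
      from rdkit.Chem import AllChem

      mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))  # placeholder molecule

      # Force a canonical atom order so the RMSD pruning does not depend on
      # the input atom ordering.
      ranks = list(Chem.CanonicalRankAtoms(mol))
      new_order = [idx for _, idx in sorted(zip(ranks, range(mol.GetNumAtoms())))]
      mol = Chem.RenumberAtoms(mol, new_order)

      conf_ids = AllChem.EmbedMultipleConfs(
          mol,
          numConfs=100,
          randomSeed=42,        # deterministic embedding
          pruneRmsThresh=1.0,   # drop conformers within 1.0 A RMSD of a kept one
      )
      print(f"{len(conf_ids)} conformers kept after RMSD pruning")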

  • JH

    • Looked at the public dataset on QCA, searching for implicit hydrogen problems. Found ~800 cases of the implicit H problem; a few are from Genentech, but the majority are from Merck.

    • 800 unique molecules have this problem, spanning 4000 conformers.

    • To catch it, I used the “load it with OE and see if it raises an error” trick. I’ll try to automate this in QCSubmit, but the feature will be OE-dependent.

    • JW: of the 800, do you know how many were from Merck?

      • JH – 790+. Probably the whole Merck set.

      • DH: is it possible to check this with RDKit?

        • take the molecule, add hydrogens, remove hydrogens, then compare the number of atoms (see the sketch below)

        • JW: there may be a “how many implicit hydrogens do you have” method

        • SB: the RDKit implementation is idiosyncratic in how it does this; you have to be careful with the approach and turn off hand-holding features to make sure you’re getting what you intend
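
        • (Sketch) One way to implement this check with RDKit, per the cautions above; the function name is hypothetical, and the sanitization behavior should be verified against the actual inputs:

          from rdkit import Chem

          def has_implicit_hydrogens(molblock: str) -> bool:
              # Keep any explicit hydrogens as-is when parsing.
              mol = Chem.MolFromMolBlock(molblock, removeHs=False)
              if mol is None:
                  raise ValueError("RDKit could not parse the mol block")
              # If adding hydrogens grows the atom count, some were implicit.
              if Chem.AddHs(mol).GetNumAtoms() != mol.GetNumAtoms():
                  return True
              # JW's suggestion: RDKit also tracks implicit H counts per atom.
              return any(a.GetNumImplicitHs() > 0 for a in mol.GetAtoms())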

    • JW – What should we do about this?

      • Definitely notify Merck

      • SB – The QM basically had the molecules fall apart; the connectivity-change checker would then reject the final geometry.

      • SB – The RMSD, even for cases that aren’t thrown out, is probably not useful.

      • DD: we can reach out to Christina, detail what we observe, and ask for clarification on the preparation process. From her response we may be able to determine the issue and inform a reroll of the dataset by her with explicit hydrogens; we can then reroll the season 1 dataset as a version bump with these explicit-hydrogen molecules.

    • JW (looked at result metrics from the Merck internal dataset) – These seem to be reasonable metrics, so CS’s internal dataset was probably prepared somewhat differently.

      • DD – CS’s contribution to public dataset was sent to us in December, so CS may not have known about the explicit H requirement.

  • DD

    • Running QM optimizations from public set. About 52% complete

    • Running MM calcs locally. The goal was to see whether we can reproduce the “missing TFD” cases from last week; this should give us a locally reproducible case (see the sketch below).
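
      • (Sketch) One way a TFD value can come out missing, assuming RDKit’s torsion fingerprint implementation and placeholder file paths; TFD is undefined for molecules without suitable torsions, so the comparison can fail outright:

        from rdkit import Chem
        from rdkit.Chem import TorsionFingerprints

        qm_mol = Chem.MolFromMolFile("molecule_qm.mol", removeHs=False)
        mm_mol = Chem.MolFromMolFile("molecule_mm.mol", removeHs=False)

        try:
            tfd = TorsionFingerprints.GetTFDBetweenMolecules(qm_mol, mm_mol)
            print(f"TFD = {tfd:.4f}")
        except (ValueError, IndexError) as err:
            # No torsions to compare; an analysis would record TFD as missing.
            print(f"TFD undefined for this molecule: {err}")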

    • Worked with LD on reproducing the Lim results. Hit some snags re: changes in the QC dataset models. LD and I are working on openff-benchmark #77 to fix this; that PR will be a place to aggregate the changes needed to reproduce the Lim benchmark.

    • I’ll ping partners again to submit results.

      • JW – It may be good to send it on a day other than Friday; otherwise the email may get lost in the wash.

      • DD – I’m planning on getting this one out today.

    • DD – Plan next partner call with Gary? It will probably be ~3 weeks out if I ask now.

      • JW – Do we have areas where we need feedback? Or information to share?

      • DD – We could share progress and discuss results. The results are public, so we could point people at them. We could share a summary, but that would require some analysis from us.

      • JW – We have the annual workshop in ~3 weeks. We could present a results slide.

      • SB – I’d be interested in presenting these results, though I’d like to know more about them.

        • DD – All we know about the molecules is in the spreadsheets.

        • JW – And coverage reports, but it’s hard to know what that means in such a large aggregate.

        • SB – Yeah, it’ll be hard to know what to do with the coverage reports, though they may be good evidence that we’re covering a wide range of parameters/chemistries.

Action items

@David Dotson will engage Kaushik for Schrodinger pathway testing
@Lorenzo D'Amore will publish results of his torsion study on Confluence
@Lorenzo D'Amore will engage with @Pavan Behara on torsions and QM methods to coordinate activities, share insights
@Lorenzo D'Amore will create PR against openff-benchmark for new analysis subcommands
@Lorenzo D'Amore and @David Hahn will propagate changes from protein-ligand-benchmark into pmx Binding Free Energy Workflow
@David Dotson will engage Christina on public dataset preparation, what we observe, dataset resubmission with explicit hydrogens
@David Dotson will analyze local MM compute of public dataset; check for missing TFD as Thomas observed
@David Dotson will ping all partners for results again
@Simon Boothroyd will work with @David Dotson to assemble results from partner data so far for annual workshop presentation

Decisions