2021-02-17 Benchmarking for Industry Partners - Development Meeting notes

Date

Feb 17, 2021

Participants

  • @David Dotson

  • @Joshua Horton

  • @Jeffrey Wagner

  • @David Hahn

Goals

  • Updates from project team members

  • Where should we deposit result data?

  • Should we pursue an expanded Season 2 shortly after Season 1?

Discussion topics

Item

Presenter

Notes

Updates

 

  • JH: not much has changed; looked into Bill’s issue and have a proposed solution

    • we’re getting an export case that has undefined stereochemistry in his environment. Could allow it with allow_undefined_stereochemistry=True, but the question is whether we should: do we want these molecules flowing downstream to analysis or to the MM step?

    • JW: looks like QM may have decided the molecule isn’t planar, so now it’s triggering unknown stereochemistry

      • fix could include allowing undefined stereochemistry on export, but would need to also do so on read

      • Two solutions I can think of:

        • allow undefined stereo here, make appropriate allowances downstream

        • we eliminate the molecule at this point

    • JH: don’t think letting it go downstream would affect results negatively; FF doesn’t currently care about N stereochemistry

    • JW: Perhaps we should try reproducing the error and experimenting with the approach we take with these?

      • we would need to do some experimentation to decide on an appropriate path for these, but that will take longer

      • we’ve identified a point of failure, but trying to apply a quick fix may not serve us well here; we will need to deliberate on the policy we take on undefined stereo outside of the immediate issue

    • DD: [proposal] drop cases like this when they occur and issue a warning; make changes to the codebase to allow Bill’s server export to proceed, and make sure this is equally smooth for users of the execute approach

  • JW: Mirror-image conformers; not deadly but worth fixing.

    • if we have two conformers that are identical apart from being mirror images, both pass through; we would like them deduplicated after QM export

    • DD: can we encode the logic in analysis for now?

  • DH: preparing production run, discussed which molecules to include

    • OPLS3 executor:

      • openff-benchmark schrodinger ffbuilder for parameterizing from QM step

      • can take a long time to parameterize many molecules

      • openff-benchmark schrodinger optimize to run the optimizations

    • DD: is there any danger in depending on the set of parameters a partner may already have in ~/.opls?

      • perhaps a good thing to use it, even if there is some variability or path-dependence, since this is how partners would use OPLS in practice

    • DH: would like to make it part of Season 1 if possible
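A minimal sketch of the skip-and-warn policy DD proposed above for cases with undefined stereochemistry. Everything named here is hypothetical and for illustration only: `UndefinedStereoError` stands in for whatever error the export actually raises, `export_molecule` stands in for the real exporter, and the records are plain dictionaries rather than QM results.

```python
import warnings


class UndefinedStereoError(Exception):
    """Hypothetical stand-in for the error raised on undefined stereochemistry."""


def export_molecule(record):
    # Hypothetical exporter: fails when the record is flagged as having
    # undefined stereochemistry, otherwise returns the record id.
    if record.get("undefined_stereo", False):
        raise UndefinedStereoError(record["id"])
    return record["id"]


def export_all(records):
    """Export every record; skip, with a warning, those that fail on stereo."""
    exported, skipped = [], []
    for record in records:
        try:
            exported.append(export_molecule(record))
        except UndefinedStereoError:
            warnings.warn(f"Skipping {record['id']}: undefined stereochemistry")
            skipped.append(record["id"])
    return exported, skipped
```

Returning the skipped ids alongside the exported ones keeps a record of what was dropped, which may help when deciding on a longer-term policy for these cases.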

Should we do a Season 2 shortly after Season 1?

David D.

  • DH: would like to have some changes in, so maybe not immediately after

    • Improved conformer generation, new FFs, and torsion scans

  • JH: could include RDKit force fields, like MMFF94; know some partners are interested in ANI, so perhaps ANI2x; also xtb

  • DH: ANI in Schrodinger also doable

    • so this speaks to the interest from partners

    • also want to get OPLS3 approach in here

  • DD: Season 2 can also just be a scale up, say 5x or 10x dataset sizes; we have time to think on this, can decide over next few weeks

Connectivity rearrangement

Jeff

  • JW: probably need to filter QM outputs on really-stretched bonds

    • Existing code for this is in openff-qcsubmit.

    • Toolkit dev is tied up a bit right now; would need to make two PRs of the same change, one against 0.8.4-rc and one against master

    • JW: JH, can we add this to openff-toolkit, and then change openff-qcsubmit to use that?

  • JW: will require a change in openff-benchmark to use this detection, gracefully toss out molecules
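For illustration, a stretched-bond filter of the kind discussed above could look roughly like the sketch below. This is not the openff-qcsubmit implementation; the function name, the input layout (a list of xyz tuples plus a list of bonded index pairs), and the 2.0 length cutoff are all assumptions chosen for the example.

```python
import math


def stretched_bonds(coords, bonds, max_length=2.0):
    """Return (i, j, length) for every bond longer than max_length.

    coords: list of (x, y, z) tuples, in the same length unit as max_length.
    bonds: list of (i, j) atom-index pairs taken from the input connectivity.
    max_length: placeholder cutoff; a real heuristic would likely depend
        on the elements involved.
    """
    flagged = []
    for i, j in bonds:
        length = math.dist(coords[i], coords[j])
        if length > max_length:
            flagged.append((i, j, length))
    return flagged


def connectivity_preserved(coords, bonds, max_length=2.0):
    """True when no bond in the input connectivity looks broken."""
    return not stretched_bonds(coords, bonds, max_length)
```

A molecule for which `connectivity_preserved` returns False would be the one to toss out gracefully.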

openff-benchmark release

David D.

  • New release as early as middle of next week

    • fix exporters (DD)

    • heuristic for connectivity changes

    • mirror-image deduplication

      • DD: can we make this a postprocessing step? For users using server approach, doing this between QM and MM would require surgery; would be better to avoid if possible

      • JW: if multiple conformers optimize to the same geometry in QM, we want to remove all but one

        • for MM, if multiple conformers optimize to the same geometry, we do want to keep these

        • [proposal] make step 6 deduplication of QM, analysis becomes step 7

        • JW: happy to open the PR into the analysis components and get DH review

        • DH: for match-minima, nice to only deduplicate the QM conformers, then proceed with matching

  • DH: for a future season, do we want to do QM and MM (as opposed to QM then MM like we are doing here)?

    • JW: would be interested in this; starting from RDKit molecules for all could get weird for some MM cases

    • JH: Greg Landrum did a post on conformer generator; says random matrix method is more robust

      • JW: tried random matrix out; didn’t solve anything that was unsolved, created more planar cis-peptides

      • JH: haven’t tested it yet myself; omega seems quite reliable

      • JW: might be a good summer grad student project
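One possible way to sketch the QM deduplication discussed above: interatomic distances do not change under rotation, translation, or reflection, so two conformers with matching distance lists are either the same geometry or mirror images of each other. The sketch assumes both conformers list their atoms in the same order, ignores symmetry-equivalent atom permutations, and uses a placeholder tolerance; the function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All interatomic distances, in a fixed atom-pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def same_up_to_mirror(conf_a, conf_b, tol=1e-3):
    """True when the conformers match as-is or after a reflection."""
    dists_a = pairwise_distances(conf_a)
    dists_b = pairwise_distances(conf_b)
    if len(dists_a) != len(dists_b):
        return False
    return all(abs(x - y) <= tol for x, y in zip(dists_a, dists_b))


def deduplicate(conformers, tol=1e-3):
    """Keep the first conformer of every identical-or-mirror-image group."""
    kept = []
    for conf in conformers:
        if not any(same_up_to_mirror(conf, other, tol) for other in kept):
            kept.append(conf)
    return kept
```

Following the discussion, a step like this would apply to the QM conformers only; MM conformers that optimize to the same geometry would be kept.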

Demo of openff-benchmark schrodinger

David H.

  • JW: should we do schrodinger steps without the bespoke torsion fits?

    • DH: can choose whether or not to use custom parameters with openff-benchmark schrodinger optimize; covering both requires running the command twice with a different flag

    • JW: I like that; running twice with and without --opls-dir is a clear indication that two different runs are being performed

  • JW: no Schrodinger license in the Mobley Lab; will look to find a way to test from another consortium lab

    • Might be best to find someone at Janssen who can try it out on their machine

  • DH: will consider how to nicely set tag for FF from optimize step; need an indicator whether pure opls3e is used, or whether opls3e-bespoke (or similar, indicating custom params) is used

  • JW: this is looking great! Planning ahead, should get a second person that can run this code; will help to iterate on and spot issues quickly. Don’t want all the risk of mistakes to fall on DH

    • DH: will make a draft PR; DD will review and ask Bill Swope to help with testing
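As a rough sketch of the force field tag DH mentioned, the tag could simply follow from whether a custom parameter directory was supplied. The function name and its argument are hypothetical; the two tag strings are the ones floated in the discussion and may change.

```python
def force_field_tag(opls_dir=None):
    """Return the tag recorded with results of the optimize step.

    opls_dir: directory holding custom parameters, or None when the
        stock force field is used (hypothetical argument).
    """
    return "opls3e-bespoke" if opls_dir else "opls3e"
```

Deriving the tag from the same input that selects the parameters would keep the two from drifting apart.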

Drop zone for results?

David D.

  • JW: probably a publicly-accessible GDrive; can prepare it with a folder for each openff-benchmark release; subfolders for each company code; example directory giving e.g. burn-in equivalent files

    • dangerously-public GDrive link is a feature; partners need to know data is entering public domain

  • Perhaps hold off on asking them to upload results for a while; if we make more method changes, we would prefer not to ask them to upload multiple times

    • DH: would be good for us to have a proposal for the procedure/artifacts by the next call

    • DD: I can draft the announcement we’d make; can then discuss its content next call

Action items

  • @David Dotson will address the issue Bill encountered by issuing a warning and skipping cases with undefined stereochemistry from QM and MM (server export); will also add the same solution for the execute approach

  • @David Hahn will continue refining the openff-benchmark schrodinger command tree, aiming for inclusion with Season 1 if possible

  • @Jeffrey Wagner will consider how we can encode mirror-image detection in the analysis code to address openff-benchmark#57; will draft a PR and get review from @David Hahn

  • @Jeffrey Wagner will add detect_connectivity_changes_heuristic or equivalent to openff-toolkit, with PRs against master and 0.8.4-rc

  • @David Hahn will make a draft PR for openff-benchmark schrodinger; @David Dotson will act as reviewer and loop in Bill Swope for testing if willing

  • @David Dotson will prepare a proposed announcement for the result drop and present it at the next meeting

Decisions