2021-02-17 Benchmarking for Industry Partners - Development Meeting notes

Date

Feb 17, 2021

Participants

@David Dotson
@Joshua Horton
@Jeffrey Wagner
@David Hahn

Goals

Updates from project team members
Where should we deposit result data?
Should we pursue an expanded Season 2 shortly after Season 1?

Discussion topics

Item	Presenter	Notes

Updates		JH: not much has changed; looked into Bill’s issue and have a proposed solution
    we’re getting an export case that has undefined stereochemistry in his environment
    Could allow with `allow_undefined_stereochemistry=True`, but the question is: should we? As in, do we want these flowing downstream to analysis or to the MM step?
JW: looks like QM may have decided the molecule isn’t planar, so now it’s triggering unknown stereochemistry
    fix could include allowing undefined stereochemistry on export, but we would also need to do so on read
    Two solutions I can think of:
        allow undefined stereo here, make appropriate allowances downstream
        we eliminate the molecule at this point
JH: don’t think letting it go downstream would affect results negatively; FF doesn’t currently care about N stereochemistry
JW: Perhaps we should try reproducing the error and experimenting with the approach we take with these?
    we would need to do some experimentation to decide on an appropriate path for these, but that will take longer
    we’ve identified a point of failure, but trying to apply a quick fix may not serve us well here; we will need to deliberate on the policy we take on undefined stereo outside of the immediate issue
DD: [propose] drop cases like this when they occur and issue a warning; make changes to the codebase to allow Bill’s server export to proceed; make sure this is smooth for users of the execute approach
JW: Mirror-image conformers; not deadly but worth fixing.
    if we have two identical, but mirror-image, conformers, they pass through; we would like them deduplicated after QM export
DD: can we encode the logic in analysis for now?
DH: preparing production run, discussed which molecules to include
    OPLS3 executor:
        `openff-benchmark schrodinger ffbuilder` for parameterizing from the QM step; can take a long time to parameterize many molecules
        `openff-benchmark schrodinger optimize` to run the optimizations
DD: is there any danger in being dependent on the set of parameters a partner may already have in `~/.opls`?
    perhaps a good thing to use it, even if there is some variability or path-dependence, since this is how partners would use OPLS in practice
DH: would like to make it part of Season 1 if possible
Should we do a Season 2 shortly after Season 1?	David D.	DH: would like to have some changes in, so maybe not immediately after
    Improved conformer generation, new FFs, and torsion scans
JH: could include RDKit forcefields, like MM99; know some partners are interested in ANI, so perhaps ANI2x; also xtb
DH: ANI in Schrodinger also doable
    so this speaks to the interest from partners
    also want to get the OPLS3 approach in here
DD: Season 2 can also just be a scale up, say 5x or 10x dataset sizes; we have time to think on this, can decide over the next few weeks
Connectivity rearrangement	Jeff	JW: probably need to filter QM outputs on really-stretched bonds
    Existing code in `openff-qcsubmit` here.
    Toolkit dev is tied up a bit right now; would need to make two PRs of the same change to 0.8.4-rc, master
JW: JH, can we add this to `openff-toolkit`, and then change the usage in `openff-qcsubmit` to use that?
JW: will require a change in `openff-benchmark` to use this detection, gracefully toss out molecules
`openff-benchmark` release	David D.	New release as early as middle of next week
    fix exporters (DD)
    heuristic for connectivity changes
    mirror-image deduplication
DD: can we make this a postprocessing step? For users using the server approach, doing this between QM and MM would require surgery; would be better to avoid if possible
JW:
    if multiple conformers optimize to the same geometry in QM, we want to remove all but one
    for MM, if multiple conformers optimize to the same geometry, we do want to keep these
[proposal] make step 6 deduplication of QM, analysis becomes step 7
JW: happy to open the PR into the analysis components and get DH review
DH: for `match-minima`, nice to only deduplicate the QM conformers, then proceed with matching
DH: for a future season, do we want to do QM and MM (as opposed to QM then MM like we are doing here)?
JW: would be interested in this; starting from RDKit molecules for all could get weird for some MM cases
JH: Greg Landrum did a post on conformer generators; says the random matrix method is more robust
JW: tried random matrix out; didn’t solve anything that was unsolved, created more planar cis-peptides
JH: haven’t tested it yet myself; omega seems quite reliable
JW: might be a good summer grad student project
Demo of `openff-benchmark schrodinger`	David H.	JW: should we do schrodinger steps without the bespoke torsion fits?
DH: can choose either using custom parameters or not with `openff-benchmark schrodinger optimize`; requires running the command twice with a different flag
JW: I like that; running twice with and without `--opls-dir` is a clear indication that two different runs are being performed
JW: no Schrodinger license in the Mobley Lab; will look to find a way to test from another consortium lab
    Might be best to find someone in Janssen that can try it out on their machine
DH: will consider how to nicely set the tag for FF from the `optimize` step; need an indicator whether pure `opls3e` is used, or whether `opls3e-bespoke` (or similar, indicating custom params) is used
JW: this is looking great! Planning ahead, should get a second person that can run this code; will help to iterate on and spot issues quickly. Don’t want all the risk of mistakes to fall on DH
DH: will make a draft PR, DD will review and use Bill Swope for testing
Drop zone for results?	David D.	JW: probably a publicly-accessible GDrive; can prepare it with:
    a folder for each `openff-benchmark` release
    subfolders for each company code
    example directory giving e.g. burn-in equivalent files
    dangerously-public GDrive link is a feature; partners need to know data is entering public domain
    Perhaps not ask them to upload results for a while; if we make more method changes, would prefer not asking them to do so multiple times
DH: would be good for us to have a proposal for procedure/artifacts in next call
DD: I can draft the announcement we’d make, can then discuss content next call
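
Sketch of the mirror-image deduplication discussed above. This is an illustration only, not the code in `openff-benchmark`; the function names and the 0.05 Å tolerance are hypothetical. It assumes every conformer lists atoms in the same order, and it relies on the fact that interatomic distances are unchanged by rotation, translation, and reflection, so a conformer and its mirror image produce the same distance list. It does not account for symmetry-equivalent atom permutations.

```python
from itertools import combinations
from math import dist


def distance_signature(coords):
    """Return every interatomic distance, in a fixed atom-pair order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def same_up_to_mirror(coords_a, coords_b, tol=0.05):
    """True if two conformers share all pairwise distances within tol.

    Assumes identical atom ordering. Mirror images pass this check because
    reflection preserves distances.
    """
    if len(coords_a) != len(coords_b):
        return False
    sig_a = distance_signature(coords_a)
    sig_b = distance_signature(coords_b)
    return all(abs(x - y) <= tol for x, y in zip(sig_a, sig_b))


def deduplicate(conformers, tol=0.05):
    """Keep the first conformer of each group that matches up to mirroring."""
    kept = []
    for conf in conformers:
        if not any(same_up_to_mirror(conf, k, tol) for k in kept):
            kept.append(conf)
    return kept


# Toy coordinates (angstroms, hypothetical): a conformer, its mirror image
# through the xy-plane, and a genuinely different geometry.
conf = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.5)]
mirror = [(x, y, -z) for x, y, z in conf]
different = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.5)]

unique = deduplicate([conf, mirror, different])
print(len(unique))  # 2: the mirror image is dropped, the different geometry is kept
```

Per the discussion, a filter like this would apply to the QM-optimized conformers only; for MM, conformers that optimize to the same geometry are kept.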

Action items

@David Dotson will address issue Bill encountered by issuing warning and skipping cases with undefined stereochemistry from QM, MM (server export); also add in same solution for execute approach

@David Hahn will continue refining the `openff-benchmark schrodinger` command tree; aiming for inclusion with Season 1 if possible

@Jeffrey Wagner will consider how we can encode mirror-image detection in analysis code to address openff-benchmark#57; will draft PR, get review from @David Hahn

@Jeffrey Wagner will add `detect_connectivity_changes_heuristic` or equivalent to `openff-toolkit`, with PRs against master, 0.8.4-rc.

@David Hahn will make a draft PR for `openff-benchmark schrodinger`; @David Dotson will operate as reviewer, loop in Bill Swope for testing if willing

@David Dotson will prepare proposed announcement for result drop; present next meeting
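
Sketch of the stretched-bond filter raised under "Connectivity rearrangement". This is a standalone illustration, not the existing `openff-qcsubmit` code or the planned `openff-toolkit` function; `find_stretched_bonds`, the 1.3 scaling factor, and the toy geometry are assumptions. It flags any bond from the original connectivity whose length in the QM output exceeds a multiple of the summed covalent radii.

```python
from math import dist

# Approximate single-bond covalent radii in angstroms; only a few elements
# are listed for illustration.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def find_stretched_bonds(symbols, coords, bonds, factor=1.3):
    """Return (i, j, length) for each bond longer than factor * (r_i + r_j).

    symbols: element symbol per atom
    coords:  (x, y, z) per atom, in angstroms
    bonds:   (i, j) atom-index pairs from the input connectivity
    factor:  hypothetical tolerance; would need tuning on real data
    """
    stretched = []
    for i, j in bonds:
        limit = factor * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        length = dist(coords[i], coords[j])
        if length > limit:
            stretched.append((i, j, length))
    return stretched


# Toy three-atom fragment (not a complete molecule): C bonded to O and H.
symbols = ["C", "O", "H"]
bonds = [(0, 1), (0, 2)]
intact = [(0.0, 0.0, 0.0), (1.43, 0.0, 0.0), (-1.09, 0.0, 0.0)]
broken = [(0.0, 0.0, 0.0), (1.43, 0.0, 0.0), (-2.40, 0.0, 0.0)]

print(find_stretched_bonds(symbols, intact, bonds))  # []
print([(i, j) for i, j, _ in find_stretched_bonds(symbols, broken, bonds)])  # [(0, 2)]
```

A molecule with any flagged bond would be tossed out with a warning rather than passed on to the MM step.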
