2021-01-27 Benchmarking for Industry Partners - Development Meeting notes

Date

27 Jan 2021

Participants

Goals

Updates from project team members
Identify and address project risks

Discussion topics

Item	Presenter	Notes
Updates		DH: test calculations (non-burn-in) hitting a lot of “Unknown error” cases; may try option (c) and see if this produces different results
		  these do appear to reduce in number after each error cycling, so not easy to reproduce in a single case
		  DD: if there are any persistent error cases after several rounds of error cycling, please share (if you can) the input molecule and we can take it to the `psi4` devs to get a better solution in place (at least better error messaging)
		JW: worked on deployment last week; making conda package (automation), implementing single-file installer (automation)
		  tricky bug in RDKit; seen references by Greg Landrum
		    in validation, when a mol passes other checks, later on the mol will get written and read in SDF
		    there are cases where there is not full fidelity in write/read; we do this as part of our validation for this reason; write to a StringIO object to avoid hammering filesystem
		    there are some cases where roundtripping through StringIO fails, but through a file does not; this is consistently reproducible for a given molecule that fails; not clear what about these molecules is causing this
		    so, switched to using filesystem writes instead of StringIO to avoid the issue for now; will pursue (later) creating an RDKit issue with an example
		  also found a mangling of the atom indices from QM output
		    QM export is the first place we see it
		    have ideas on where the problem is; will investigate and report back findings
		  working to make parameter coverage output (step 3) be the input for optimizations (step 4)
		  have two places where we use arbitrary RMS cutoffs
		    minimum allowable RMS between generated conformers (step 2): 1.0 Å
		      propose changing to 1.5 Å
		    deduplication for input conformers (step 1): 0.1 Å
		      propose changing to 1.0 Å
		    the proposed changes would reduce the number of conformers in a dataset, speeding up the QM stage of the benchmark globally
		    the existing choices give more conformers, and differences between them are perhaps not meaningful at the level of our forcefield
		    [decision] change input conformer deduplication cutoff from 0.1 Å to 1.0 Å heavy-atom RMS; change conformer generation cutoff from 1.0 Å to 1.5 Å heavy-atom RMS
		      Jeff will make this change as a PR; David H. will give it a go to see how many molecules come through from his testing dataset
		      need to make sure we don’t end up with e.g. over 50% of molecules with only one conformer
		JH: had a go at the burn-in set; will use to investigate issues with Jeff
		  played around with `psi4` versions; resolved
		  we should follow up with Ben and ask about `psi4` release schedule plans
		DD: need to create group codes for each partner
		    BASF : BSF
		    VRTX : VTX
		    XTALPI : XTP
		    BMS : BMS
		    MERCK : MRK
		    BOEHRINGER-INGELHEIM : BRI
		    ROCHE : RCH
		    GENENTECH : GNT
		    BAYER : BYR
		    JANSSEN : JNS
		  working to add `set-tag` functionality for e.g. marking mistake datasets in a server as `defunct`
		  need to get test coverage up, docstring coverage of the `optimize` command tree
		  need to also add export of detailed qcvars (as we get from `openff-benchmark optimize execute`) when using a server-based approach; there is a desire for dipole moment data, which is in there, but not currently exported by our `export` command
Burn-in	David D.	DD: how do we poke partners to get feedback?
		  JW: we’ll run out of bandwidth if we resort to DMs; best to keep it on the channels
		    resort to DM threads with all of us on it if needed, but only if asked
		  DD: will draft a “poke” message for #benchmarks-partners, send to JW, JH, DH as draft
		  DD: will prepare a spreadsheet of all partners, with status indicators like “burn-in complete”, “started production run”, “choice of optimization approach”, “variation notes”
Project risks		None identified.
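
Illustration of the RMS cutoff discussion under Updates: a minimal sketch of how a larger cutoff thins out a conformer set. This is not the `openff-benchmark` implementation. The function names and toy coordinates are hypothetical, the coordinates are assumed to already contain only heavy atoms, and no alignment or symmetry handling is done (a real workflow would most likely rely on a cheminformatics toolkit for that).

```python
import math


def rms(coords_a, coords_b):
    """Plain RMS distance between two conformers given as lists of (x, y, z).

    Assumption: both conformers list the same heavy atoms in the same order
    and are already superimposed; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformers must be non-empty and the same size")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def deduplicate(conformers, cutoff):
    """Greedy filter: keep a conformer only if it is at least `cutoff`
    away (in RMS) from every conformer kept so far."""
    kept = []
    for conf in conformers:
        if all(rms(conf, other) >= cutoff for other in kept):
            kept.append(conf)
    return kept


# Hypothetical toy data: three "conformers" of a two-heavy-atom fragment.
conformers = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.5, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0)],
]

# Number of conformers surviving at the old and proposed cutoffs (in Å).
survivors = {cutoff: len(deduplicate(conformers, cutoff)) for cutoff in (0.1, 1.0, 1.5)}
print(survivors)
```

With these toy coordinates, raising the cutoff removes near-duplicates first and eventually leaves a single conformer, which is the trade-off the decision above asks David H. to check on his testing dataset.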

Action items

David Hahn will share persistent unknown error cases (if possible) for circulation with psi4 devs for troubleshooting, better error reporting
Jeffrey Wagner will investigate connectivity mangling in optimization results; solve if possible
Jeffrey Wagner will create RDKit issue with molecule that serializes well to a file, but not to a string via StringIO (time permitting; not urgent)
Joshua Horton will raise question of psi4 release cycle (plan for 1.4 release) at Friday call with Ben Pritchard
David Dotson will improve test coverage of openff-benchmark optimize command tree
David Dotson will add a set-tag subcommand to openff-benchmark optimize that allows for e.g. defunct task routing
David Dotson will improve server export to include full qcvars data output
David Dotson will create a spreadsheet for tracking partner status
David Dotson will draft a “poke” message for #benchmarks-partners, send to team; ship it
Joshua Horton will update coverage reporter to support appending, output all conformers so it can be used within workflow stream
David Hahn will iterate with Jeffrey Wagner on new cutoff choices, evaluate if cutoffs yield favorable results (reduced number of practically-identical conformers, but not too many molecules with only 1 conformer)
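
A possible way to check the "not too many molecules with only 1 conformer" criterion in the last action item, sketched under assumptions: the molecule identifiers and counts below are made up, the function name is hypothetical, and the 50% threshold is the example figure from the discussion rather than an agreed limit.

```python
def single_conformer_fraction(conformer_counts):
    """Fraction of molecules left with exactly one conformer.

    `conformer_counts` maps a molecule identifier to the number of
    conformers that survived the cutoffs (hypothetical input format).
    """
    if not conformer_counts:
        raise ValueError("no molecules to evaluate")
    singles = sum(1 for n in conformer_counts.values() if n == 1)
    return singles / len(conformer_counts)


# Hypothetical counts after applying the new cutoffs to a test dataset.
counts = {"mol-A": 1, "mol-B": 4, "mol-C": 1, "mol-D": 2}

fraction = single_conformer_fraction(counts)
too_many_singles = fraction > 0.5  # "over 50%" example threshold from the notes
print(f"{fraction:.0%} of molecules have a single conformer; flag={too_many_singles}")
```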
