2021-01-06 Benchmarking for Industry Partners - Development Meeting notes

Date

Jan 6, 2021

Participants

@David Dotson
@Joshua Horton
@Jeffrey Wagner
Xavier Lucas
@David Hahn

Goals

Updates from project team members
Identify and address development issues encountered
Identify and address project risks

Discussion topics

Item	Presenter	Notes

TorsionDrive performance	Xavier Lucas
  XL: TorsionDrives of simple molecules take a long time.
  DD: This seems to be related to the refresh rate. If we crank this up, it should go faster.
Updates from team members	Josh+Jeff+David+David
  JH: Was working on the coverage reporter; Jeff is finishing that up. Also created a PR for conformer generation performance.
  JW: Plugging back into these PRs; eager to merge.
  JW+JH: How to handle molecules that will fail AM1BCC?
    Option 1: Run AM1BCC in the coverage step and also in the MM energy evaluation step.
      Pro: We identify molecules that will fail AM1BCC sooner.
      Con: We run AM1BCC twice on each good molecule (once during validation and again during energy evaluation).
    Option 2: Don’t check during the initial steps; just let it fail in energy evaluation.
      Pro: We only run AM1BCC once on good molecules.
      Con: We’ll submit wasteful QM jobs on molecules for which we can’t get MM energies.
    [decision] Option 2 is preferable at this time; molecules missing from the MM results are themselves informative, and downstream analysis will have to work around them.
  DH:
    Got the first three compute options working; couldn’t get the fourth working.
    Over Christmas wanted to run a set, but had issues with jobs running for ages without returning a result.
      DD: We’ll do a working session right after this call.
    Continued on the analysis step; implemented the conformer-matching step.
      Takes the reference from the QM-minimized structure and uses the best RMS.
      With this method, there will be MM-minimized conformers that do not match any QM conformer.
  JW: Nothing major to report. The 0.8.2 release of the toolkit is out; it takes care of the majority of good molecules that were being marked invalid.
    There are still 2 other areas of stereochemistry issues.
    80% of the bad marking was due to double-bond stereochemistry.
    Went from a 5% error rate to a 1-2% error rate.
  DD: Worked with Bill Swope to develop out compute approaches. Need to put together public submissions this week, next week.
Issues to address
  Discuss series field addition: COM-SER-XXXXX-YY
    Could this work? Are there fundamental issues with this?
    Basically, this is meant to accommodate adding molecules after an initial submission, since this appears likely for at least one user.
    DD: The series approach tries to address a case analogous to conda envs, which are more deterministic to create new than to update/evolve.
    JW: Could treat the series identifier like a group name, but allow validate to have a running mode where MORE data is added.
    JW: Could see two strategies for lookahead needed a bit of common code that does
    JH: `validate` should have an `add` option, which checks new inputs against all existing output graphs and flags duplicates as failures.
      conf gen, coverage checking, and optimization will look ahead to all output prefixes and not overwrite anything (COM-MMMMM) that’s already been run.
      We should add tests to all steps for this.
    [decision] This is the approach we will take. There won’t be a series identifier.
  Determine where indices are losing leading zeros in the last field.
    JH: Will make it so single conformers submitted with `ds.add_molecule` don’t experience any ID mangling.
  Can't get the smirnoff99Frosst-1.1.0 spec to work.
    JH: Thinks it’s a validation issue in QCSubmit; will follow up.
Project risks
  DD: Updated schedule; aiming for:
    1/15: protocol feature-complete
    1/22: present protocol to partners
    2/1: start partners up with production approach, get them set up for support
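
The `add` behavior agreed on for `validate` (flag duplicates as failures, look ahead to existing output prefixes, never overwrite) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the function name `validate_add`, the use of plain strings as molecule keys (standing in for the output graphs), the five-digit index width, and indices starting at 0 are all hypothetical.

```python
import re

# Hypothetical ID layout: three-letter group prefix plus five-digit molecule
# index, e.g. "COM-00012". The width is an assumption based on "COM-MMMMM".
ID_PATTERN = re.compile(r"^([A-Z]{3})-(\d{5})$")


def validate_add(new_inputs, existing, prefix="COM"):
    """Assign IDs to new molecules without touching existing ones.

    new_inputs: list of molecule keys (plain strings here, standing in for
        whatever graph comparison the real tool would use).
    existing: dict mapping already-assigned ID -> molecule key.
    Returns (accepted, failures): accepted maps new ID -> key; failures
    lists keys that duplicate an existing or earlier-in-batch molecule.
    """
    seen = set(existing.values())

    # Look ahead: find the highest index already used under this prefix.
    used = [
        int(match.group(2))
        for match in map(ID_PATTERN.match, existing)
        if match and match.group(1) == prefix
    ]
    next_index = max(used, default=-1) + 1

    accepted, failures = {}, []
    for key in new_inputs:
        if key in seen:
            # Duplicates are flagged as failures rather than re-run.
            failures.append(key)
            continue
        accepted[f"{prefix}-{next_index:05d}"] = key
        seen.add(key)
        next_index += 1
    return accepted, failures


existing = {"COM-00000": "CCO", "COM-00001": "c1ccccc1"}
accepted, failures = validate_add(["CCO", "CCN", "CCN", "CO"], existing)
print(accepted)
print(failures)
```

Because new indices always start after the highest existing one, nothing that has already been run is renumbered or overwritten, which is the property the tests proposed for each step would need to check.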

Action items

@David Dotson will investigate slow behavior of torsiondrive script for simple molecules; follow up with Xavier Lucas

@Jeffrey Wagner will finish coverage reporter PR prototyped by @Joshua Horton

@Jeffrey Wagner will review and merge conformer generation PR prepared by @Joshua Horton

@David Dotson will work with David Hahn to address compute issues, find working pathway at Janssen

@David Hahn will fill in analysis instructions in protocol

@Joshua Horton will make it possible for OptimizationDataset.add_molecule to leave IDs alone under single-conformer conditions

@Joshua Horton will address smirnoff99Frosst-1.1.0 spec failure in QCSubmit
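
For reference on the leading-zero issue, here is a small sketch of how zero-padded indices can be lost when an ID field passes through an integer. It is an illustration only: the notes do not say where the zeros are actually being dropped, and the ID layout (group, five-digit molecule index, two-digit conformer index), the field widths, and all function names are assumptions.

```python
# Assumed layout: GROUP-MMMMM-CC. Both widths are hypothetical.
MOLECULE_WIDTH = 5
CONFORMER_WIDTH = 2


def format_id(group, molecule_index, conformer_index):
    """Build an ID with zero-padded molecule and conformer indices."""
    return (
        f"{group}-{molecule_index:0{MOLECULE_WIDTH}d}"
        f"-{conformer_index:0{CONFORMER_WIDTH}d}"
    )


def split_id(identifier):
    """Split an ID into its fields, keeping each field as a string."""
    group, molecule, conformer = identifier.split("-")
    return group, molecule, conformer


def naive_roundtrip(identifier):
    """Rebuild an ID after casting the last field to int (drops the padding)."""
    group, molecule, conformer = identifier.split("-")
    return f"{group}-{molecule}-{int(conformer)}"


original = format_id("COM", 12, 3)
print(original)                   # COM-00012-03
print(split_id(original))         # ('COM', '00012', '03')
print(naive_roundtrip(original))  # COM-00012-3
```

Keeping the fields as strings, or re-applying the padding whenever an ID is rebuilt, avoids the mangling shown in the last line.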
