2021-01-06 Benchmarking for Industry Partners - Development Meeting notes

Date

Jan 6, 2021

Participants

  • @David Dotson

  • @Joshua Horton

  • @Jeffrey Wagner

  • Xavier Lucas

  • @David Hahn

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Identify and address project risks

Discussion topics

Item

Presenter

Notes

TorsionDrive performance

Xavier Lucas

  • XL – TorsionDrives of simple molecules take a long time

    • DD – This seems like something to do with the refresh rate. If we crank this up it should go faster.

Updates from team members

Josh+Jeff+David+David

  • JH: was working on coverage reporter; Jeff is finishing that up

    • Also created PR for conformer generation performance

    • JW: plugging back into these PRs; eager to merge

    • JH: on

    • JW+JH – How to handle molecules that will fail AM1BCC?

      • Option 1: Run AM1BCC in coverage step and also in MM energy evaluation step

        • Pro – We identify molecules that will fail AM1BCC sooner

        • Con – We run AM1BCC twice on each good molecule (once during validation and again during energy evaluation)

      • Option 2: Don’t check during initial steps, just let it fail in energy evaluations

        • Pro – Only run AM1BCC on good molecules once

        • Con – We’ll submit wasteful QM jobs on molecules for which we can’t get MM energies

      • [decision] Option 2 is preferable at this time; molecules missing from the MM results are informative in themselves, and downstream analysis will have to work around them

  • DH: Got first three compute options working; couldn’t get fourth working.

    • Over Christmas wanted to run a set, had issues with jobs running for ages without getting a result

    • DD: we’ll do a working session right after this call

    • Continued on analysis step; implemented conformer-matching step

      • takes reference from QM minimized structure, uses best RMS

      • there will be MM-minimized conformers that will not match any QM with this method

  • JW: Nothing major to report; the 0.8.2 toolkit release is out and takes care of the majority of good molecules that were being marked invalid

    • still 2 other areas of stereochemistry issues

    • 80% of the incorrect markings were due to double-bond stereochemistry

      • went from 5% error rate to 1-2% error rate

  • DD: worked with Bill Swope to develop compute approaches

    • need to put together public submissions this week and next
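The conformer-matching step DH describes above (take each QM-minimized structure as the reference and pick the MM-minimized conformer with the best RMS) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, coordinates are plain tuples, and it assumes the conformers share atom ordering and have already been superposed, where a real implementation would align them first. It also shows why some MM conformers end up matching no QM conformer.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(ref: Coords, other: Coords) -> float:
    """RMSD between two conformers with identical atom ordering.

    Assumption: both conformers are already superposed in a common frame.
    """
    if len(ref) != len(other):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum((a - b) ** 2 for p, q in zip(ref, other) for a, b in zip(p, q))
    return math.sqrt(squared / len(ref))


def match_conformers(
    qm_confs: Sequence[Coords], mm_confs: Sequence[Coords]
) -> Dict[int, Tuple[int, float]]:
    """Map each QM conformer index to (best MM conformer index, RMSD)."""
    matches = {}
    for i, qm in enumerate(qm_confs):
        rmsds = [rmsd(qm, mm) for mm in mm_confs]
        best = min(range(len(rmsds)), key=rmsds.__getitem__)
        matches[i] = (best, rmsds[best])
    return matches


def unmatched_mm(matches: Dict[int, Tuple[int, float]], n_mm: int) -> List[int]:
    """MM conformer indices that were not the best match for any QM conformer."""
    matched = {j for j, _ in matches.values()}
    return [j for j in range(n_mm) if j not in matched]
```

Because the QM structures drive the matching, any MM conformer that is never the closest one to a QM reference is simply left out, which is the behavior noted in the bullet above.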

Issues to address

 

  1. Discuss series field addition.

    1. COM-SER-XXXXX-YY

    2. Could this work? Are there fundamental issues with this?

    3. Basically, this is meant to accommodate adding molecules after an initial submission, since this appears likely from at least one user

    4. DD: the series approach addresses a case analogous to conda envs, which are more deterministic to create fresh than to update or evolve

    5. JW – Could treat series identifier like group name, but allow validate to have a running mode where MORE data is added

    6. JW: Could see two strategies for the lookahead needed

      1. a bit of common code that does

    7. JH – validate should have an add option, which checks new inputs against all existing output graphs and flags duplicates as failures

      1. conf gen, coverage checking, and optimization will look ahead to all output prefixes and not overwrite anything (COM-MMMMM) that’s already been run – We should add tests to all steps for this

      2. [decision] – This is the approach we will take. There won’t be a series identifier.

  2. Determine where indices are losing leading zeros in last field.

    1. JH: will make it so single-conformer molecules submitted with ds.add_molecule don’t experience any ID mangling

  3. Can't get smirnoff99Frosst-1.1.0 spec to work.

    1. JH: think it’s a validation issue in QCSubmit; will follow up
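The lookahead agreed on in item 1 (steps inspect all existing output prefixes and never overwrite a COM-MMMMM that has already been run) could look something like the sketch below. Everything here is illustrative: the function names are hypothetical, the example ID layout with a trailing conformer field is an assumption, and IDs are compared by prefix only, whereas the validate step discussed above would compare molecular graphs. IDs are kept as strings throughout so that zero-padded fields are never reinterpreted as numbers.

```python
from typing import Iterable, List, Set, Tuple


def molecule_prefix(record_id: str) -> str:
    """Return the COM-MMMMM part of an ID such as 'ABC-00012-03'.

    The trailing conformer field is an assumed layout for illustration.
    Raises ValueError if the ID has fewer than two dash-separated fields.
    """
    company, molecule, *_ = record_id.split("-")
    return f"{company}-{molecule}"


def split_new_inputs(
    new_ids: Iterable[str], existing_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition new_ids into (accepted, duplicates) by molecule prefix.

    An ID is a duplicate if its prefix already appears among existing outputs;
    duplicates are reported so the caller can flag them as failures.
    """
    seen: Set[str] = {molecule_prefix(i) for i in existing_ids}
    accepted: List[str] = []
    duplicates: List[str] = []
    for new_id in new_ids:
        if molecule_prefix(new_id) in seen:
            duplicates.append(new_id)
        else:
            accepted.append(new_id)
    return accepted, duplicates
```

A step would call `split_new_inputs` with the IDs found in its existing output directory, run only the accepted IDs, and report the duplicates rather than overwriting their results.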

Project risks

 

  • DD: updated schedule; aiming for:

    • 1/15 protocol feature-complete

    • 1/22 present protocol to partners

    • 2/1 start partners up with production approach, get them set up for support

 

Action items

@David Dotson will investigate slow behavior of torsiondrive script for simple molecules; follow up with Xavier Lucas
@Jeffrey Wagner will finish coverage reporter PR prototyped by @Joshua Horton
@Jeffrey Wagner will review and merge conformer generation PR prepared by @Joshua Horton
@David Dotson will work with David Hahn to address compute issues, find working pathway at Janssen
@David Hahn will fill in analysis instructions in protocol
@Joshua Horton will make it possible for OptimizationDataset.add_molecule to leave IDs alone under single-conformer conditions
@Joshua Horton will address smirnoff99Frosst-1.1.0 spec failure in QCSubmit

Decisions