2020-11-25 Benchmarking for Industry Partners - Development Meeting notes

Date

Nov 25, 2020

Participants

  • @David Dotson

  • @David Hahn

  • @Joshua Horton

  • @Jeffrey Wagner

  • @Gary Tresadern

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Identify and address project risks

Discussion topics

Item

Notes

Progress Updates

  • JW: update on ingestion components

    • this is the first thing user input hits, so we want to make sure we’re careful about what we accept

    • details on this PR:

    • this PR is ready for review; after this, JW will work on conformer generation/filling in early next week, since that is needed to line up compute input testing

      • Also committed to the coverage report, but that’s a side-spur of the workflow

    • DD: I will review, and push to merge!

    • JW: perhaps have a testing party next week

  • DD: Update on compute component

    • Separating workflow into “Seasons”, where we may want to change the methods that we’re benchmarking

    • Using CLI

    • OpenEye loading warning is annoying

      • JH – logging.disable(logging.WARNING)

    • Showed demo of submitting molecule

    • Working on pulling the data out

    • JW: Which OpenFF versions will we be using? How should we handle different coverage reports?

      • DD – JW will work on this, so use your intuition

  • JH: keeping up with what everyone is doing, synthesizing this into the deployment procedure doc

    • DD: thank you for this; we need this high level perspective to ensure our overall approach is ultimately usable for our users

  • DH: Kicked off discussion on Slack about whether we want QM-then-MM vs. QM-and-MM; no clear answer yet on which is preferred

    • Would need intermediates to be stored if we take the QM-then-MM approach

    • DD: the QM final molecules exported could be understood by downstream tools to be the starting points for the MM optimizations

    • JW: Gary’s preference?

    • GT: Would prefer to follow precedent, i.e. consistency with what is already published.

      • that said, if we want to show that MM can reproduce the same result, that has value

      • probably start with what has been done, though

    • DD: can design for the QM-then-MM protocol; the design can still support the QM-and-MM approach at a later time

    • GT: think it’s important to at least do apples-to-apples with the paper; the other protocol is still interesting, but better for now to approach with the same protocol

    • DH: the QM-and-MM protocol would also present problems for the analysis approach, since the analysis depends on the QM-then-MM comparison

    • JW: how much care is put in to ensure we don’t get duplicates?

      • what cutoff should I use for conformers?

      • DH: DM recommended that you start with a high threshold and then drop it iteratively.

        • It won’t break downstream analysis to have fewer than 10 conformers

    • DH: a bit stuck at the moment; need inputs to work with

    • JW: DD and I can sprint to get the components needed for downstream today

  • Coverage report: focus on needed data being present first; pretty or human-readable can come after; could be a JSON blob
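
A minimal sketch of the "JSON blob first" idea for the coverage report. The notes do not specify what the report contains, so the function name `build_coverage_report` and the fields `n_entries` and `counts` are placeholders chosen for illustration, not an agreed schema.

```python
import json


def build_coverage_report(counts):
    """Serialize a coverage summary to a JSON string.

    `counts` is a hypothetical mapping from an identifier to the number of
    times it was seen; the real report's fields were not decided here.
    """
    report = {
        "n_entries": len(counts),
        "counts": counts,
    }
    # sort_keys gives stable output, which keeps reports easy to diff
    return json.dumps(report, indent=2, sort_keys=True)
```

A machine-readable blob like this can be checked for completeness right away, and a human-readable rendering can be layered on top later.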

Common infrastructure

  • JW: utils.py? cli.py?

  • JW: Common dataset loader?

  • JW: Common OpenEye unloader?

  • JW: Automated testing?

Action items

@David Dotson will review Jeff’s initial implementation of ID assignment and molecule evaluation
@Jeffrey Wagner will sprint to get components for compute consumption ready for merge to unblock @David Hahn
@David Dotson will sprint to get components for compute ready for merge to unblock @David Hahn

Decisions