2020-11-25 Benchmarking for Industry Partners - Development Meeting notes

Date

25 Nov 2020

Participants

Goals

Updates from project team members
Identify and address development issues encountered
Identify and address project risks

Discussion topics

Item

Notes

Progress Updates

JW: update on ingestion components
- this is the first thing user input hits, so we want to make sure we’re careful about what we accept
- details on this PR:
- this PR is ready for review; after that, will be working on conformer generation/filling in early next week, since that’s needed to line up compute input testing
  - Also committed to the coverage report, but that’s a side-spur of the workflow
- DD: I will review, and push to merge!
- JW: perhaps have a testing party next week
DD: Update on compute component
- Separating workflow into “Seasons”, where we may want to change the methods that we’re benchmarking
- Using CLI
- OpenEye loading warning is annoying
  - JH – logging.disable(logging.WARNING)
- Showed demo of submitting molecule
- Working on pulling the data out
- JW: Which OpenFF versions will we be using? How should we handle different coverage reports?
  - DD – JW will work on this, so use your intuition
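JH’s suggestion above can be sketched as a small context manager, so the warning is silenced only around the noisy step and logging is restored afterwards. This is a minimal sketch, assuming the warning is emitted through Python’s standard logging module; the name suppress_warnings is hypothetical and not part of any project code.

```python
import contextlib
import logging


@contextlib.contextmanager
def suppress_warnings(level=logging.WARNING):
    """Temporarily drop all log records at `level` and below, process-wide."""
    logging.disable(level)
    try:
        yield
    finally:
        # Undo the global override so later warnings are visible again.
        logging.disable(logging.NOTSET)


log = logging.getLogger("demo")  # hypothetical logger name, for illustration only
with suppress_warnings():
    # The import or call that triggers the noisy warning would go here.
    log.warning("this record is dropped")
log.warning("this record is emitted as usual")
```

Note that logging.disable acts on every logger in the process, so keeping the suppressed region as small as possible avoids hiding unrelated warnings.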
JH: keeping up with what everyone is doing, synthesizing this into the deployment procedure doc
- DD: thank you for this; we need this high level perspective to ensure our overall approach is ultimately usable for our users
DH: Kicked off discussion on Slack on whether we want QM-then-MM or QM-and-MM; no clear answer yet on which is preferred
- Would need intermediates to be stored if we take the “then” approach
- DD: the QM final molecules exported could be understood by downstream tools to be the starting points for the MM optimizations
- JW: Gary’s preference?
- GT: Would prefer to follow precedent, for consistency with what is already published.
  - that said, if we want to show that MM can reproduce the same result, that has value
  - probably start with what has been done, though
- DD: can design to the QM-then-MM protocol; the design can still support the “and” approach at a later time
- GT: think it’s important to at least do apples-to-apples with the paper; the other protocol is still interesting, but better for now to approach with the same protocol
- DH: the “and” protocol would also present problems for the analysis approach, since it depends on the “then” comparison
- JW: how much care is put in to ensure we don’t get duplicates?
  - what cutoff should I use for conformers?
  - DH: DM recommended that you start with a high threshold and then drop it iteratively.
    - It won’t break downstream analysis to have fewer than 10 conformers
- DH: a bit stuck at the moment; need inputs to work with
- JW: DD and I can sprint to get the components needed for downstream today
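One reading of the conformer cutoff advice above (start with a high threshold, then drop it iteratively) is sketched below. It assumes a precomputed pairwise RMSD matrix and a greedy pruning rule; the function names, the cutoff schedule, and the toy RMSD values are all hypothetical, and the conformer generator and RMSD calculation actually used are not specified in these notes.

```python
def prune(rmsd, cutoff):
    """Greedy pruning: keep a conformer only if it is farther than
    `cutoff` from every conformer already kept."""
    kept = []
    for i in range(len(rmsd)):
        if all(rmsd[i][j] > cutoff for j in kept):
            kept.append(i)
    return kept


def select_conformers(rmsd, target=10, cutoffs=(2.0, 1.5, 1.0, 0.5)):
    """Try cutoffs from high to low and stop at the first one that yields
    at least `target` distinct conformers. Returning fewer than `target`
    is allowed when even the lowest cutoff does not yield enough."""
    kept, used = [], None
    for cutoff in cutoffs:
        kept, used = prune(rmsd, cutoff), cutoff
        if len(kept) >= target:
            break
    return kept[:target], used


# Toy symmetric RMSD matrix for four conformers (made-up values).
rmsd = [
    [0.0, 0.3, 1.2, 3.0],
    [0.3, 0.0, 0.9, 2.7],
    [1.2, 0.9, 0.0, 1.8],
    [3.0, 2.7, 1.8, 0.0],
]
indices, cutoff_used = select_conformers(rmsd, target=3, cutoffs=(2.0, 1.0, 0.5))
print(indices, cutoff_used)
```

With this toy matrix, conformer 1 is always treated as a duplicate of conformer 0 because their RMSD (0.3) is below every cutoff tried.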
Coverage report: focus on needed data being present first; pretty or human-readable can come after; could be a JSON blob
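A data-first coverage report as a JSON blob might look like the sketch below. It assumes “coverage” means how many molecules exercise each force field parameter, which these notes do not spell out; the function name, field names, molecule IDs, and parameter IDs are all hypothetical.

```python
import json
from collections import Counter


def build_coverage_report(parameter_ids_per_molecule):
    """Count how many molecules exercise each parameter ID."""
    counts = Counter()
    for ids in parameter_ids_per_molecule.values():
        counts.update(set(ids))  # count each parameter once per molecule
    return {
        "n_molecules": len(parameter_ids_per_molecule),
        "parameter_counts": dict(sorted(counts.items())),
    }


# Made-up input: molecule ID -> parameter IDs assigned to that molecule.
example = {"mol-1": ["b1", "b1", "a2"], "mol-2": ["b1"]}
report = build_coverage_report(example)
print(json.dumps(report, indent=2))
```

Keeping the report as a plain dictionary means a human-readable rendering can be layered on later without changing the stored data.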

Common infrastructure

JW: utils.py? cli.py?
JW: Common dataset loader?
JW: Common OpenEye unloader?
JW: Automated testing?

Action items

David Dotson will review Jeff’s initial implementation of ID assignment, mol evaluation
Jeffrey Wagner will sprint to get components for compute consumption ready for merge to unblock David Hahn
David Dotson will sprint to get components for compute ready for merge to unblock David Hahn
