2020-11-25 Benchmarking for Industry Partners - Development Meeting notes

Date

Nov 25, 2020

Participants

@David Dotson
@David Hahn
@Joshua Horton
@Jeffrey Wagner
@Gary Tresadern

Goals

Updates from project team members
Identify and address development issues encountered
Identify and address project risks

Discussion topics

Item	Notes

Progress Updates

JW: update on ingestion components
- this is the first thing user input hits, so we want to make sure we’re careful about what we accept
- details on this PR:
- this PR is ready for review; after it, I'll be working on conformer generation/filling in early next week, since that's needed to line up the compute input testing
  - Also committed to the coverage report, but that’s a side-spur of the workflow
- DD: I will review, and push to merge!
- JW: perhaps have a testing party next week
DD: Update on compute component
- Separating the workflow into "Seasons", so that the methods being benchmarked can change from one season to the next
- Using a CLI
- The OpenEye loading warning is annoying
  - JH: silence it with logging.disable(logging.WARNING)
- Showed a demo of submitting a molecule
- Working on pulling the data out
- JW: Which OpenFF versions will we be using? How should we handle different coverage reports?
  - DD: JW will work on this, so use your intuition
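JH's suggestion relies on the standard library's `logging.disable(level)`, which drops every record at `level` or below for all loggers until `logging.disable(logging.NOTSET)` is called. The sketch below wraps that call in a context manager so the suppression is scoped to the noisy import rather than the whole process. It assumes the OpenEye warning is emitted through Python's `logging` module (not confirmed in these notes); `noisy_import` and the logger name `example` are hypothetical stand-ins for the real toolkit load.

```python
import contextlib
import logging

logger = logging.getLogger("example")  # hypothetical logger name


@contextlib.contextmanager
def suppress_warnings():
    """Temporarily drop all log records at WARNING level and below."""
    logging.disable(logging.WARNING)
    try:
        yield
    finally:
        # NOTSET re-enables every level for every logger.
        logging.disable(logging.NOTSET)


def noisy_import():
    # Hypothetical stand-in for loading a toolkit that logs a warning.
    logger.warning("OpenEye toolkit could not be loaded")
    return "loaded"


# Usage: the warning is swallowed only inside the `with` block.
with suppress_warnings():
    noisy_import()
```

Calling `logging.disable(logging.WARNING)` once at startup, as in the notes, is simpler but also hides any later warnings from the workflow itself; the scoped version keeps those visible.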
JH: keeping up with what everyone is doing, synthesizing this into the deployment procedure doc
- DD: thank you for this; we need this high level perspective to ensure our overall approach is ultimately usable for our users
DH: Kicked off a discussion on Slack about whether we want QM-then-MM or QM-and-MM; no clear answer yet on which is preferred
- The QM-then-MM approach would require storing the intermediates
- DD: the QM final molecules exported could be understood by downstream tools to be the starting points for the MM optimizations
- JW: Gary’s preference?
- GT: Would prefer following precedent, for consistency with what is already published.
  - that said, if we want to show that MM can reproduce the same result, that has value
  - probably start with what has been done, though
- DD: we can design to the QM-then-MM protocol; the design can still support the QM-and-MM approach at a later time
- GT: think it's important to at least do an apples-to-apples comparison with the paper; the other protocol is still interesting, but for now it's better to use the same protocol
- DH: the QM-and-MM protocol would also present problems for the analysis approach, since that depends on the QM-then-MM comparison
- JW: how much care is put into ensuring we don't get duplicates?
  - what cutoff should I use for conformers?
  - DH: DM recommended that you start with a high threshold and then drop it iteratively.
    - It won't break downstream analysis to have fewer than 10 conformers
- DH: a bit stuck at the moment; need inputs to work with
- JW: DD and I can sprint to get the components needed for downstream today
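DM's recommendation can be sketched as a greedy RMSD filter whose threshold starts high and is lowered step by step until enough conformers survive. This is a minimal sketch under stated assumptions: each conformer is a plain list of (x, y, z) tuples already in a common frame (no alignment is performed), RMSD is the simple coordinate RMSD, and the function names and the `start`, `step` and `floor` values are hypothetical, not values agreed in the meeting. Returning fewer than the target is allowed, matching DH's note that fewer than 10 conformers won't break downstream analysis.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Coordinate RMSD between two conformers with the same atom order (no alignment)."""
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def greedy_filter(confs: List[Coords], threshold: float) -> List[int]:
    """Keep a conformer only if it is at least `threshold` away from every kept one."""
    kept: List[int] = []
    for i, conf in enumerate(confs):
        if all(rmsd(conf, confs[j]) >= threshold for j in kept):
            kept.append(i)
    return kept


def select_conformers(confs: List[Coords], target: int = 10, start: float = 2.0,
                      step: float = 0.25, floor: float = 0.25) -> Tuple[List[int], float]:
    """Start with a high RMSD threshold and lower it until `target` conformers survive.

    Hypothetical defaults. May return fewer than `target` if even `floor` cannot reach it.
    """
    threshold = start
    kept = greedy_filter(confs, threshold)
    while len(kept) < target and threshold - step >= floor:
        threshold -= step
        kept = greedy_filter(confs, threshold)
    return kept[:target], threshold
```

Starting high favors a small, diverse set; each lower threshold admits conformers that are closer to ones already kept, so the loop stops as soon as the target is met rather than flooding the set with near-duplicates.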
Coverage report: focus first on having the needed data present; pretty or human-readable formatting can come later; it could simply be a JSON blob
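A minimal sketch of the "data first, pretty later" coverage report as a JSON blob. It assumes, hypothetically, that coverage means how many molecules exercise each force field parameter, and it records the force field version in the blob so that reports for different OpenFF versions (JW's question above) stay distinguishable. The field names, version label and parameter IDs are placeholders, not an agreed schema, and how the per-molecule parameter IDs are obtained is out of scope here.

```python
import json
from collections import Counter
from typing import Iterable, List


def coverage_report(assignments: Iterable[List[str]], ff_version: str) -> str:
    """Serialize parameter-usage counts as a JSON blob.

    `assignments` holds, per molecule, the IDs of the parameters it uses
    (hypothetical input format).
    """
    counts = Counter()
    n_molecules = 0
    for param_ids in assignments:
        n_molecules += 1
        counts.update(set(param_ids))  # count each parameter once per molecule
    report = {
        "forcefield": ff_version,
        "n_molecules": n_molecules,
        "parameter_counts": dict(counts),
    }
    # sort_keys gives stable output, which makes reports easy to diff.
    return json.dumps(report, sort_keys=True)
```

A human-readable rendering can be layered on later by reading this blob back with `json.loads`.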

Common infrastructure

JW: utils.py? cli.py?
JW: Common dataset loader?
JW: Common OpenEye unloader?
JW: Automated testing?

Action items

@David Dotson will review Jeff's initial implementation of ID assignment and molecule evaluation

@Jeffrey Wagner will sprint to get components for compute consumption ready for merge to unblock @David Hahn

@David Dotson will sprint to get components for compute ready for merge to unblock @David Hahn
