2020-09-04 QCA Submission Meeting notes

Date

Sep 4, 2020

Participants

@David Dotson
@Trevor Gokey
Ben Pritchard

Goals

New advancements
- lifecycle now error-cycles only the compute specs present in a submission’s dataset(s)/compute(s)
New submissions
- Disaccharides #124
- We would like to increase the rate of submissions to our infrastructure.
  - We can compute relatively quickly, and are working to reduce failure rates, but the queue empties quickly
  - What can we do to increase the rate of submission?
    - QCSubmit/bespoke workflow training? Additional docs?
    - Active solicitation?
    - Low-priority, but massive submissions?
Upcoming infrastructure improvements
- Psi4Harness error reporting fix QCEngine#266
- STANDARDS-based versioning #137
Upcoming science support
- PCM-based implicit solvent pathway
- ESPs and wavefunction storage

Discussion topics

Item	Presenter	Notes
Torchani failures	David
- Seeing `struct.error: unpack requires a buffer of 81920 bytes`
- Going to raise an issue on `torchani` and see what feedback we get
- Trevor: appears to be a random error due to cluster network/filesystem behavior
- David: suggests that we might need to compensate for this at the QCEngine layer; not clear yet
QCF Server	Ben
- Want to improve the logging and submission speed
- Added a PR for KV-store compression
- Considering hiding tasks from the user: instead of exposing tasks, just expose results and feed everything through those
- Considering adding a state to results: DEFUNCT, for results that will never work
- Need a standard terminology for a lot of things
- Trevor: a lot of signatures are a JSON blob, but it is not clear what the blob needs in order to be processed
- Terminology: would like to find a replacement for `add_compute`
- David: we’ll be pleased with any changes you choose to make at your discretion
- David: happy to devote time to reviewing PRs; can definitely justify those that align with OpenFF needs
- Trevor: how hard would it be to add an MMMolecule schema?
- Ben: MolSSI has tried many times to come up with a schema that works for both QM and MM, and so far it has proved impossible
- Trevor: I’ll keep thinking about this; definitely an area I’m interested in (esp. when thinking about downstream benchmarking)
Increase rate of submission?	David
- Want to increase rate of submission; what are our bottlenecks?
- Trevor: Enamines dataset, 11 billion SMILES strings
- Ben: no technical reason why this shouldn’t work
- David: even 1 million molecules would take a while to submit
- Ben: not sure where that bottleneck is yet; would want to solve this; perhaps aim for beginning of October for submission of a large dataset like this?
- David: as long as we can do priority “low” for this, should be okay; don’t want to saturate our queue so fitting tasks can’t get through
- Ben: should be good for this; works by sorting the queue table
- David: I will create a torsiondrive version of the protein dataset; will be a good exercise for me and for QCSubmit components
Can we submit multiple compute tags?	Trevor
- We can only have one tag, but modifying a tag can solve the issue of guiding compute resources, e.g. “openff-enamine” or “openff-ani”
- Being worked on; target is end of month
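The points above about priority “low” submissions, sorting the queue table, and one tag per task can be pictured with a small sketch. This is only an illustration under stated assumptions, not QCFractal's implementation: `Task`, `claim`, the `PRIORITY` mapping, and the task names are all hypothetical, and a plain counter stands in for a creation timestamp.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels; a higher number is served first.
PRIORITY = {"low": 0, "normal": 1, "high": 2}
_order = count()  # stand-in for a creation timestamp


@dataclass
class Task:
    name: str
    tag: str  # exactly one tag per task, as discussed above
    priority: str = "normal"
    created: int = field(default_factory=lambda: next(_order))


def claim(queue, manager_tag, limit):
    """Remove and return up to `limit` tasks whose tag matches `manager_tag`.

    Matching tasks are sorted by priority (highest first), then by age
    (oldest first), so low-priority bulk work cannot starve other tasks.
    """
    matching = [t for t in queue if t.tag == manager_tag]
    matching.sort(key=lambda t: (-PRIORITY[t.priority], t.created))
    claimed = matching[:limit]
    for t in claimed:
        queue.remove(t)
    return claimed


queue = [
    Task("enamine-0", tag="openff", priority="low"),
    Task("enamine-1", tag="openff", priority="low"),
    Task("fitting-0", tag="openff", priority="high"),
    Task("ani-0", tag="openff-ani", priority="normal"),
]

# A manager serving the "openff" tag picks up the fitting task before
# any of the low-priority bulk tasks, even though it was queued later.
first = claim(queue, "openff", limit=2)
print([t.name for t in first])
```

In this sketch a task tagged “openff-ani” is never handed to a manager serving “openff”, which is the sense in which changing a task's single tag steers it to particular compute resources.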

Action items

@Trevor Gokey will prepare an initial submission of <1M molecules from the Enamines dataset; aiming for October submission; this will inform the drafting of STANDARDSv3 and broaden familiarity with QCSubmit

@David Dotson will submit an issue to torchani pointing out the error tracebacks we’re seeing on #136; not necessarily a torchani issue (may be in QCEngine), but will be informative to get developer input

@David Dotson will prepare a torsiondrive version of the initial protein dataset; this will also inform the drafting of STANDARDSv3, broaden familiarity with

@David Dotson will engage with Ben Pritchard on QCFractal PRs as necessary to advance OpenFF feature/efficiency aims
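For context on the `struct.error` from the Torchani discussion: Python's `struct.unpack` raises this error whenever the buffer it receives is not exactly the size implied by the format string, which is consistent with a read that returned fewer bytes than expected. The sketch below reproduces that class of error and shows one hypothetical way to compensate with a retry. The format (20480 four-byte floats, which happens to total 81920 bytes) and the helper `read_with_retry` are assumptions for illustration; they are not taken from torchani or QCEngine.

```python
import struct

# Assumption for illustration: 20480 four-byte floats = 81920 bytes.
N_FLOATS = 20480
FMT = f"{N_FLOATS}f"


def unpack_floats(buf):
    """Unpack a buffer that must contain exactly N_FLOATS floats."""
    return struct.unpack(FMT, buf)


def read_with_retry(read_bytes, attempts=3):
    """Hypothetical helper: re-read until the buffer unpacks cleanly."""
    last_error = None
    for _ in range(attempts):
        try:
            return unpack_floats(read_bytes())
        except struct.error as exc:
            last_error = exc  # short read; try again
    raise last_error


full = struct.pack(FMT, *([0.5] * N_FLOATS))
truncated = full[:1000]  # simulate a short read from a flaky filesystem

# A truncated buffer triggers struct.error; the message reports the
# number of bytes the format expected.
try:
    unpack_floats(truncated)
except struct.error as exc:
    message = str(exc)
    print(message)

# First read is truncated, second is complete: the retry succeeds.
reads = iter([truncated, full])
values = read_with_retry(lambda: next(reads))
print(len(values))
```

Whether a retry like this belongs in QCEngine, in torchani, or nowhere at all is exactly the open question the torchani issue is meant to inform.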
