2020-09-04 QCA Submission Meeting notes

Date

Sep 4, 2020

Participants

  • @David Dotson

  • @Trevor Gokey

  • Ben Pritchard

Goals

  • New advancements

    • lifecycle now error-cycles only the compute specs present in a submission’s dataset(s)/compute(s)

  • New submissions

    • Disaccharides #124

    • We would like to increase the rate of submissions to our infrastructure.

      • We can compute relatively quickly, and are working to reduce failure rates, but the queue empties quickly

      • What can we do to increase the rate of submission?

        • QCSubmit/bespoke workflow training? Additional docs?

        • Active solicitation?

        • Low-priority, but massive submissions?

  • Upcoming infrastructure improvements

  • Upcoming science support

    • PCM-based implicit solvent pathway

    • ESPs and wavefunction storage

Discussion topics

Item

Presenter

Notes

Torchani failures

David

  • Seeing struct.error: unpack requires a buffer of 81920 bytes

    • going to raise an issue on torchani and see what feedback we get

    • Trevor: appears to be a random error due to cluster network/filesystem behavior

    • David: suggests that we might need to compensate for this at the QCEngine layer; not clear yet
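
A minimal sketch of the kind of compensation David mentions, assuming the failure is a transient short read of a binary file. Python's struct.unpack raises struct.error when the buffer is smaller than the format requires; 81920 bytes corresponds, for example, to 20480 four-byte floats. The helper name unpack_with_retry and the read_bytes callable are hypothetical and are not part of torchani or QCEngine; whether a retry belongs at the QCEngine layer is still open.

```python
import struct
import time


def unpack_with_retry(read_bytes, fmt, retries=3, delay=0.0):
    """Unpack a fixed-size binary record, re-reading if the buffer is short.

    read_bytes is a hypothetical zero-argument callable that returns the raw
    bytes (for example, by re-opening a file on a shared filesystem).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        data = read_bytes()
        try:
            return struct.unpack(fmt, data)
        except struct.error as exc:
            # A truncated read lands here; wait and read again.
            last_error = exc
            time.sleep(delay)
    raise last_error


# Illustrative format only: 20480 little-endian float32 values = 81920 bytes.
FMT = "<20480f"
assert struct.calcsize(FMT) == 81920
```

If every attempt returns a short buffer, the last struct.error is re-raised, so persistent corruption still surfaces as an error.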

QCF Server

Ben

  • Want to improve logging and submission speed

    • added PR for KV-store compression

  • Considering hiding tasks from the user

    • instead of exposing tasks, expose only results and route everything through them

  • Considering adding a state to results - DEFUNCT

    • for results that will never work

  • Need a standard terminology for a lot of things

  • Trevor: many signatures take a JSON blob, but it is not clear what the blob must contain to be processed

  • Terminology: would like to find a replacement name for add_compute

  • David: we’ll be pleased with any changes you choose to make at your discretion

  • David: happy to devote time to reviewing PRs; can definitely justify those that align with OpenFF needs

  • Trevor: how hard would it be to add an MMMolecule schema?

    • Ben: MolSSI has tried many times to come up with a schema that works for both QM and MM; so far it has proved impossible

    • Trevor: I’ll keep thinking about this; definitely an area I’m interested in (esp. when thinking about downstream benchmarking)

Increase rate of submission?

David

  • Want to increase rate of submission; what are our bottlenecks?

  • Trevor: Enamines dataset - 11 billion SMILES strings

    • Ben: no technical reason why this shouldn’t work

    • David: even 1 million molecules would take a while to submit

    • Ben: not sure where that bottleneck is yet; would want to solve this

      • perhaps aim for beginning of October for submission of a large dataset like this?

    • David: as long as we can do priority “low” for this, should be okay; don’t want to saturate our queue so fitting tasks can’t get through

      • Ben: should be good for this; works by sorting the queue table

  • David: I will create a torsiondrive version of the protein dataset; will be a good exercise for me and for QCSubmit components
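
The low-priority queueing discussed above (Ben: "works by sorting the queue table") can be illustrated with a toy queue. This is only a sketch under stated assumptions: the table name task_queue, its columns, and the numeric priority mapping are invented for illustration and are not QCFractal's actual schema or API.

```python
import sqlite3

# Assumed mapping from priority label to sort key; higher is claimed first.
PRIORITY = {"low": 0, "normal": 1, "high": 2}


def make_queue():
    """Create an in-memory toy queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, "
        "priority INTEGER NOT NULL)"
    )
    return conn


def submit(conn, name, priority="normal"):
    """Add one task with the given priority label."""
    conn.execute(
        "INSERT INTO task_queue (name, priority) VALUES (?, ?)",
        (name, PRIORITY[priority]),
    )


def claim(conn, limit):
    """Remove and return up to `limit` task names, highest priority first,
    oldest first within the same priority."""
    rows = conn.execute(
        "SELECT id, name FROM task_queue "
        "ORDER BY priority DESC, id ASC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.executemany(
        "DELETE FROM task_queue WHERE id = ?", [(row[0],) for row in rows]
    )
    return [row[1] for row in rows]
```

With this ordering, a fitting task submitted at "normal" priority after a large batch of "low" tasks is still claimed before any of them, which is the behavior David asks for.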

Can we submit multiple compute tags?

Trevor

  • We can only have one tag, but modifying a tag can solve the issue of guiding compute resources, e.g. “openff-enamine” or “openff-ani”

    • Being worked on; target is end of month
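
A toy sketch of the single-tag routing described above, assuming each task carries exactly one tag and a compute manager only claims tasks whose tag matches its own. The Task class and the retag and claimable helpers are hypothetical names for illustration, not QCFractal calls.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Task:
    name: str
    tag: Optional[str] = None  # a task has at most one tag


def retag(tasks: List[Task], names: Iterable[str], new_tag: str) -> int:
    """Replace the tag on each named task; return how many tasks changed."""
    wanted = set(names)
    changed = 0
    for task in tasks:
        if task.name in wanted and task.tag != new_tag:
            task.tag = new_tag
            changed += 1
    return changed


def claimable(tasks: List[Task], manager_tag: str) -> List[str]:
    """Names of tasks a manager listening on `manager_tag` would pick up."""
    return [task.name for task in tasks if task.tag == manager_tag]
```

Retagging a subset to, say, "openff-ani" moves those tasks to managers listening on that tag without needing multiple tags per task.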

Action items

@Trevor Gokey will prepare an initial submission of <1M molecules from the Enamines dataset, aiming for an October submission; this will inform the drafting of STANDARDSv3 and broaden familiarity with QCSubmit
@David Dotson will submit an issue to torchani pointing out the error tracebacks we’re seeing on #136; not necessarily a torchani issue (may be in QCEngine), but will be informative to get developer input
@David Dotson will prepare a torsiondrive version of the initial protein dataset; this will also inform the drafting of STANDARDSv3, broaden familiarity with
@David Dotson will engage with Ben Pritchard on QCFractal PRs as necessary to advance OpenFF feature/efficiency aims

Decisions