2020-07-24 QCA Submission Meeting notes

Date

Jul 24, 2020

Participants

  • @David Dotson

  • @Trevor Gokey

  • @Joshua Horton

  • Ben Pritchard

Goals

  • Clear old tasks from Confluence

  • Establish requirements in QC* stack for enabling ESP work

    • storing wavefunctions?

  • New datasets:

    • Genentech Optimization Dataset

    • Look at early-fail torsion drives in the OpenFF Protein Fragments v1.0 set

    • Single point energy dataset on the final geometries of the protein fragments, with wavefunction storage. Side issue: in future we may want to run this again with a different theory/basis; how do we add this compute through the CI?

  • Old datasets to begin tracking:

    • OpenFF Gen 2 Torsion Set series

    • OpenFF Gen 2 Torsion Set 2 series

    • DANCE 1 eMolecules t142

    • OpenFF Ehrman Informative Optimization v0.2

    • How do we want to track old series? Bundle them up, or make separate PRs?

  • Where do CMILES go?

    • Are they attached to molecule records, or somewhat less coupled?

    • Are molecules in a new dataset submission accepted, and are the CMILES retained, if the molecules already exist in QCA?

  • Roadmap refresh: Quantum Chemistry Infrastructure / QC*

  • Trawl automation issues for additional functionality

Discussion topics

Item

Presenter

Notes

ESP calculations

David

  • Ben: Wavefunctions + basis set should be sufficient

  • Josh: Hyesu is going to experiment with this to verify

  • Ben: Wavefunction will allow us to calculate other things too

    • Some of this could perhaps go into QCPortal

    • Visualizer of orbitals for molecules

  • Ben: Should only need to specify the protocol for a submission; no development required on QC* stack

  • Josh: Need to add a field for QCSubmit to handle this

  • Ben: Need to specify that we only want wavefunction for final structure
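The "wavefunction only for the final structure" requirement could be sketched roughly as below. This is a minimal illustration using only the standard library; `SinglePointSpec`, `store_wavefunction`, `build_specs`, and the protocol value strings are hypothetical names chosen for this sketch, not the actual QCSubmit or QC* schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SinglePointSpec:
    """Hypothetical per-calculation settings for a dataset submission."""
    method: str
    basis: str
    # Hypothetical protocol flag: which wavefunction data to keep.
    store_wavefunction: str = "none"


def build_specs(geometries: List[str], method: str, basis: str) -> Dict[str, SinglePointSpec]:
    """Request wavefunction storage only for the last geometry in the list."""
    specs = {}
    for i, name in enumerate(geometries):
        is_final = i == len(geometries) - 1
        specs[name] = SinglePointSpec(
            method=method,
            basis=basis,
            store_wavefunction="orbitals_and_eigenvalues" if is_final else "none",
        )
    return specs


# Placeholder method/basis and geometry labels, for illustration only.
specs = build_specs(["step_0", "step_1", "final"], "hf", "sto-3g")
```

Here only the entry for `final` asks for wavefunction data; the intermediate geometries keep the default of storing nothing.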

Genentech dataset

David

  • Josh will iterate as desired on this dataset, no rush

    • can break up as desired

INCOMPLETEs

Trevor

  • INCOMPLETEs from Initial Protein Datasets appear to be clearing

ERRORs

Trevor

  • Trevor investigating older dataset errors, restarting where possible

    • Restarted all failed tasks, from all OpenFF datasets (~1100 optimizations total)

  • If we could add OptimizationProcedureId to the error cycling reports, that would be helpful for deeper dives

  • Add an explicit note to reports that errored optimizations/torsiondrives have been restarted
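One possible shape for a report carrying both requested additions (procedure IDs and an explicit restart note) is sketched below. The `ErroredProcedure` record, the `format_report` helper, and the report layout are assumptions made for illustration; the real error cycling reports may be structured differently.

```python
from typing import List, NamedTuple


class ErroredProcedure(NamedTuple):
    """Hypothetical record for one errored procedure."""
    procedure_id: str
    kind: str        # e.g. "optimization" or "torsiondrive"
    restarted: bool


def format_report(dataset: str, errors: List[ErroredProcedure]) -> str:
    """Render a plain-text report listing IDs and restart status."""
    lines = [f"Error cycling report: {dataset}"]
    for e in errors:
        status = "restarted" if e.restarted else "not restarted"
        lines.append(f"  {e.kind} {e.procedure_id}: {status}")
    restarted = sum(e.restarted for e in errors)
    lines.append(f"Restarted {restarted} of {len(errors)} errored procedures.")
    return "\n".join(lines)


# Made-up IDs and dataset name, for illustration only.
report = format_report(
    "example dataset",
    [
        ErroredProcedure("101", "optimization", True),
        ErroredProcedure("102", "torsiondrive", False),
    ],
)
print(report)
```

Listing the ID next to each status line keeps the deeper-dive lookup and the restart note in one place.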

Adding additional qc specs to existing datasets

Josh

  • Will put together a GHA workflow that implements additions of spec(s) to an existing dataset

  • Will utilize QCSubmit for a clear pathway

Old datasets

David

  • David: how do we want to handle old datasets that are a series?

    • Tie them together under a single PR? Make them separate?

      • Josh: separate is probably better, even if it’s more work

Next week

David

  • Next week we’ll pick up the agenda items we ran out of time for

Action items

@Joshua Horton will prototype a GitHub Actions workflow for adding additional QC specs to an existing dataset using QCSubmit
@Trevor Gokey will watch for any hanging INCOMPLETEs on our currently-running datasets; so far they appear to be clearing without issue
@David Dotson will set up dataset tracking (error cycling) on existing incomplete datasets
@Joshua Horton will add necessary fields for QCSubmit to support ESP calculations (store the wavefunction for final conformation)
@Joshua Horton will iterate as desired on the Genentech dataset using QCSubmit; there is no pressing demand, but any submissions made fairly soon could take advantage of the relatively light queue
@David Dotson will add OptimizationProcedureIds to error cycling reporting, along with an explicit note that errored procedures have been restarted

Decisions