2020-07-24 QCA Submission Meeting notes

Date

Jul 24, 2020

Participants

@David Dotson
@Trevor Gokey
@Joshua Horton
Ben Pritchard

Goals

Clear old tasks from Confluence
Establish requirements in QC* stack for enabling ESP work
- storing wavefunctions?
New datasets:
- Genentech Optimization Dataset
- Look at early-fail torsion drives in the OpenFF Protein Fragments v1.0 set
- Single point energy dataset on the final geometries of the protein fragments, with wavefunction storage. Side issue: in future we may want to run this again with a different theory/basis; how do we add this compute through the CI?
Old datasets to begin tracking:
- OpenFF Gen 2 Torsion Set series
- OpenFF Gen 2 Torsion Set 2 series
- DANCE 1 eMolecules t142
- OpenFF Ehrman Informative Optimization v0.2
- How do we want to track old series? Bundle them up, or make separate PRs?
Where do CMILES go?
- Are they attached to molecule records, or somewhat less coupled?
- Are molecules in a new dataset submission accepted, and are the CMILES retained, if the molecules already exist in QCA?
Roadmap refresh: Quantum Chemistry Infrastructure / QC*
Trawl automation issues for additional functionality:

Discussion topics

Item	Presenter	Notes
ESP calculations	David	Ben: Wavefunctions + basis set should be sufficient. Josh: Hyesu is going to experiment with this to verify. Ben: The wavefunction will allow us to calculate other things too. Some of this could perhaps go into QCPortal (e.g. a visualizer of orbitals for molecules). Ben: We should only need to specify the protocol for a submission; no development is required on the QC* stack. Josh: We need to add a field to QCSubmit to handle this. Ben: We need to specify that we only want the wavefunction for the final structure.
Genentech dataset	David	Josh will iterate on this dataset as desired. There is no rush, and it can be broken up as needed.
INCOMPLETEs	Trevor	INCOMPLETEs from Initial Protein Datasets appear to be clearing
ERRORs	Trevor	Trevor is investigating older dataset errors and restarting where possible. Restarted all failed tasks from all OpenFF datasets (~1100 optimizations total). If we could add `OptimizationProcedureId` to the error cycling reports, that would be helpful for deeper dives. Add an explicit note to the reports that errored optimizations/torsiondrives have been restarted.
Adding additional QC specs to existing datasets	Josh	Will put together a GitHub Actions (GHA) workflow that adds one or more specs to an existing dataset. Will use QCSubmit to provide a clear pathway.
Old datasets	David	David: how do we want to handle old datasets that are a series? Tie them together under a single PR? Make them separate? Josh: separate is probably better, even if it’s more work
Next week	David	Next week we’ll pick up the agenda items we ran out of time for.
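For context on Ben's point that the wavefunction plus the basis set should be sufficient for ESP work, consider the standard textbook expression for the molecular electrostatic potential at a point r. It is written here in atomic units, assuming real basis functions φ_μ and nuclear charges Z_A at positions R_A. The expression depends only on the geometry, the basis set, and the one-particle density matrix P, and P can be rebuilt from stored MO coefficients (for closed-shell SCF, P = 2 C_occ C_occᵀ). This is a general sketch. It does not describe what QCSubmit or QCFractal actually compute.

```latex
V(\mathbf{r}) = \sum_{A} \frac{Z_A}{\lvert \mathbf{r} - \mathbf{R}_A \rvert}
  - \sum_{\mu\nu} P_{\mu\nu} \int \frac{\phi_\mu(\mathbf{r}')\,\phi_\nu(\mathbf{r}')}{\lvert \mathbf{r} - \mathbf{r}' \rvert}\, d\mathbf{r}'
```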

Action items

@Joshua Horton will prototype a GitHub Actions workflow for adding additional QC specs to an existing dataset using QCSubmit

@Trevor Gokey will watch for any hanging INCOMPLETEs on our currently-running datasets; so far they may be clearing without issue

@David Dotson will set up dataset tracking (error cycling) on existing incomplete datasets

@Joshua Horton will add necessary fields for QCSubmit to support ESP calculations (store the wavefunction for final conformation)

@Joshua Horton will iterate as desired on the Genentech dataset using QCSubmit; there is no pressing demand, but submitting fairly soon would take advantage of the relatively light queue

@David Dotson will add `OptimizationProcedureId`s to error cycling reporting, along with an explicit note that errored procedures have been restarted
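Below is a minimal sketch of an error cycling report that lists `OptimizationProcedureId`s and includes an explicit restart note. The `ErroredProcedure` record and the `format_error_report` function are hypothetical illustrations. They are not part of QCPortal, QCSubmit, or the existing reporting code. A real implementation would fill the records from the dataset's errored procedures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErroredProcedure:
    # Hypothetical record shape for illustration; not a QCPortal class.
    optimization_procedure_id: str
    molecule_name: str
    error_message: str
    restarted: bool


def format_error_report(dataset_name: str, errored: List[ErroredProcedure]) -> str:
    """Render a plain-text error cycling report with procedure IDs and a restart note."""
    lines = [f"Error cycling report: {dataset_name}", ""]
    lines.append("OptimizationProcedureId\tMolecule\tError")
    for rec in errored:
        # Keep only the first line of the error message so the table stays compact.
        summary = rec.error_message.splitlines()[0] if rec.error_message else ""
        lines.append(f"{rec.optimization_procedure_id}\t{rec.molecule_name}\t{summary}")
    # Explicitly state how many errored procedures were restarted.
    n_restarted = sum(1 for rec in errored if rec.restarted)
    lines.append("")
    lines.append(f"Note: {n_restarted} of {len(errored)} errored procedures have been restarted.")
    return "\n".join(lines)
```

Including the ID in every row lets someone doing a deeper dive look up the exact procedure without first cross-referencing the molecule name.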