2020-08-28 QCA Submission Meeting notes

Date

Aug 28, 2020

Participants

@David Dotson
Ben Pritchard
@Joshua Horton
@Trevor Gokey
@Jeffrey Wagner

Goals

New advancements
- QCSubmit can now perform submissions with ignore_errors
- Docker deployments now up
- New clusters: HPC@UCI (old cluster) and Greenplanet
New submissions
- Disaccharides #124
- ANI2x additional compute for benchmark ligands #136
Upcoming infrastructure improvements
- Psi4Harness error reporting fix QCEngine#266
- STANDARDS-based versioning #137
Upcoming science support
- PCM-based implicit solvent pathway
- ESPs and wavefunction storage

Discussion topics

Item	Presenter	Notes
Folding@Home?	Ben	Ben: if we can do compression of output well, we might be able to support this. David: I’ll start the PR for INCOMPLETE restarts. David: I’m interested in Folding@Home for QCA if (1) the business case is there (we get something we don’t have already, but really want, and have the resources to spend developing that compute pathway) and (2) it fits into the stated mission of Folding@Home.
Local testing for server development	Ben	Ben: has a 2 GB dump he can share that includes a partial DB dump. David: interested in using this to stand up a local testing env.
QCSubmit `ignore_errors`	Josh	Josh: can now submit with explicit ignoring of errors, to support submissions that we know won’t all complete successfully; valuable for adding supplemental compute post-initial-submission. David: will review #136 immediately after call; ready to merge then!
Docker deployments now up	David	Now have automated Docker image builds pushed to DockerHub on prod env changes. Jeff: Galileo needs a way to pin to a particular version. David: our prod images get pushed with a tag corresponding to the git commit hash they were generated from; the `latest` tag always points to the most recent image (so it moves).
Galileo	Josh, Jeff	Josh: need separate Docker images for the server and the worker. Jeff: mainly want to pursue this pathway for folks that want to run the bespoke workflow entirely on a stack they control. David, Jeff: let’s put the Dockerfiles for the server and specialized workers in the bespoke-workflow repo; images pushed to the `openforcefield` DockerHub org.
New compute	Trevor	New workers spun up on two new resources. ANI and MM: 1000 each. David: perhaps rename the `prp` Slack channel to `fractal-compute` for current status? Jeff: should try to be two-deep on each machine if possible; need a contact besides John for Lilac.
Precomputed datasets	Jeff, Ben, Josh	Jeff: are there paths for precomputed data to get into QCA? Ben: yes, we’re doing that with the ANI folks, for example.
Infrastructure Roadmap	Jeff	Jeff: please make changes to our working infrastructure roadmap as you see fit: Infrastructure Roadmap 2020. We want to make sure folks are able to work on something that aligns with their interests, and that they get recognized for it.
PCM	Josh	PCM should be doable with a small change in the Psi4 Python layer; spoke with Lori about additions: psi4#1986
ESP	Josh	ESPs: returning the density is a small change in Psi4. Getting back a working wavefunction from the stored coefficients is not currently implemented. On Lori’s roadmap: psi4#1987
Disaccharides	David, Josh, Jeff	Jeff: the disaccharides submission itself is not high priority/urgent. However, with the protein submission (which we only got PDBs for), there are suspected issues with this approach that are fundamental: can’t reliably generate CMILES. We are anticipating issues with the protein dataset that we’d like to avoid with the pathway we are pursuing with the disaccharide dataset.
Dataset standards	Trevor, Jeff, David	Trevor: adding more detailed field requirements for submissions. David: this is the policy doc we use to express the world we want for our datasets; it is the starting point for discussions around implementation or procedure. Trevor: major version tied to STANDARDS version; minor version is changes to the dataset, e.g. validation of errors depending on version. Control new names: we should be able to do this. Need to communicate molecule info in the name via version or some other aspect. Jeff: use a date for the name? David: this is an important effort, and Trevor is empowered to draft this as he sees fit. Could be worthwhile to draft 3 or so questions, responses to which would be informative, that can be sent to individuals in the org. Responses to these questions can then inform the approaches taken. Different folks may have different needs from e.g. the versioning/naming scheme.
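
The versioning idea raised under "Dataset standards" (major version tied to the STANDARDS version, minor version for changes to the dataset) could be sketched roughly as below. This is only an illustration of one possible reading, not a decided scheme: the `DatasetVersion` class, the `v<major>.<minor>` string format, and the `versioned_name` helper are hypothetical and do not come from QCSubmit or the STANDARDS document.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class DatasetVersion:
    """Hypothetical two-part dataset version (assumption, not an agreed format)."""

    major: int  # assumed to track the STANDARDS version the dataset conforms to
    minor: int  # assumed to count changes to the dataset within that STANDARDS version

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        # Accept "v2.1" or "2.1"; the leading "v" is an assumed convention.
        body = text[1:] if text.startswith("v") else text
        major, minor = body.split(".")
        return cls(int(major), int(minor))

    def bump_minor(self) -> "DatasetVersion":
        # A change to the dataset itself, e.g. supplemental compute.
        return DatasetVersion(self.major, self.minor + 1)

    def bump_standards(self, standards_version: int) -> "DatasetVersion":
        # Moving to a newer STANDARDS version resets the minor counter.
        if standards_version <= self.major:
            raise ValueError("new STANDARDS version must be greater than the current one")
        return DatasetVersion(standards_version, 0)

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}"


def versioned_name(base_name: str, version: DatasetVersion) -> str:
    # Hypothetical naming rule: append the version to the dataset name.
    return f"{base_name} {version}"
```

For example, `versioned_name("Example Dataset", DatasetVersion.parse("v2.0").bump_minor())` gives a name ending in `v2.1`. A date-based name, as Jeff suggested, would be an alternative to this kind of suffix.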

Action items

@David Dotson will review #136, merge when ready.

@David Dotson will create a PR for INCOMPLETE restarts; there are potentially many solutions, so we want to start the conversation now

@Trevor Gokey will solicit opinions on versioning/naming for datasets at his discretion, and continue advancing the STANDARDS PR until it is ready for review.