2020-09-18 QCA Submission Meeting notes

Date

18 Sep 2020

Participants

Goals

New advancements
New submissions
- Protein Fragments TorsionDrives
  - can geomeTRIC do combinatorics of torsions, e.g. phi and psi for an amino acid in a grid search?
- Enamine REAL subset Optimization
  - Trevor working on putting this together
- Jessica Maat’s dataset
Upcoming infrastructure improvements
- Psi4Harness error reporting fix QCEngine#266
- STANDARDS-based versioning #137
- TorchANI changes - QCEngine release
Upcoming science support
- PCM-based implicit solvent pathway
- ESPs and wavefunction storage
- Uploading datasets calculated on private server
Larger advances
- Automated FF coverage gap identification, torsion prioritization, submission generation
- Benchmarking (dashboard, etc.)

Discussion topics

Item	Presenter	Notes
geomeTRIC combinations	David	Can we do combinatorics for e.g. phi and psi? JH: Yes, limited to two; combinatorics probably not advised, FB doesn’t accept a 2D target. TG: Can select all rotatable bonds, just toss them in. JH: Could do different priorities for different torsions as separate compute calls. TG: All these are small amino acids; just grab all rotatable bonds, maybe filter out the carboxyl at the end of e.g. aspartate. TG + JH: Make sure you (DD) submit the dihedrals separately, each with its own index, in `dataset.add_molecules`.
Upcoming FF fit datasets	Josh	Have we heard anything about incoming datasets for the next fit? DD: I keep expecting to see PRs come through from e.g. Hyesu, nothing yet. TG: Next fit (1.3) looks like it will mostly be an adjustment of priors.
Implicit solvent usage	Josh	All keywords now supported on psi4 nightly. TG: We’ll need to update our psi4 version pinning around when we need to use this functionality. DD: Josh, can you test the most recent linux build (https://anaconda.org/psi4/psi4/1.4a2.dev1013+fb7f385/download/linux-64/psi4-1.4a2.dev1013+fb7f385-py38hd3583e2_0.tar.bz2) and see if it meets our needs here? Simon may want to use it within a month or so, and we’ll want to verify we have a version we can pin to for production.
ESPs	Josh	Need this PR (QCElemental#230) to proceed; cannot rebuild the psi4 wavefunction without it. There is a hack we can do, but not sure if it’s sufficient. BP: Will talk to Lori about this PR, see if this is something she is willing to hand off or if it’s basically finished. DD: We’ll loop back on this next week for a path forward.
Uploading datasets from private instance	Trevor	Would like to be able to submit a dataset computed on a private QCA instance to the prod instance. BP: The export part of that is actually implemented; the import part needs more work. TG: Would be nice to have for the bespoke workflow. JH: Will definitely be helpful and is of interest in the bespoke workflow.
Benchmarking	Trevor + David	TG: Is there benchmarking tooling that can help with batch processing on the QCPortal client? BP: Ideally the client already does this, but if it doesn’t, we’ll want it to. DD: Should be able to support benchmarking needs for batch processing; Ben and I are embarking on this in the next few months. TG: Are arbitrary lambda operations on data possible? DD: Would have to happen in the Python layer on the server; perhaps a submitted function could be dispatched to a worker process on the server, with operations data-local, and the results then shipped by the server to the client. TG: Richer filtering would be a higher priority than the above.
QCSubmit for automated FF coverage improvement	Josh	JH: Would be welcome to have this functionality included in QCSubmit. DD: Would like everyone to chew on what would be needed for the feedback loop of automated FF coverage gap identification, torsion prioritization, submission generation over the next week. JH: Is the feedback loop approach similar to something the ANI folks used (including automated dataset selection)? Isn’t this called “active learning”? DD: I can reach out to Adrian Roitberg and friends for advice on their approach. Even if we only partially achieve an autonomous (with human approval) feedback loop, we will have also improved much of our data access and benchmarking infrastructure to support the decision-making access patterns required. We already have large candidate datasets to draw samples from. We already have good submission automation tooling via QCSubmit.
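
To make the two options from the geomeTRIC combinations item concrete, here is a minimal sketch in plain Python. It contrasts a coupled 2D phi/psi grid with separate 1D entries that each carry their own index. The 15 degree spacing, the function names, the entry dictionary layout, and the atom indices are all assumptions for illustration; none of this is the geomeTRIC, TorsionDrive, or QCSubmit API.

```python
from itertools import product


def grid_points(spacing_deg=15):
    """Angles covering [-180, 180) at the given spacing (spacing is an assumption)."""
    return list(range(-180, 180, spacing_deg))


def two_d_grid(spacing_deg=15):
    """All (phi, psi) combinations for a coupled 2D scan."""
    angles = grid_points(spacing_deg)
    return list(product(angles, angles))


def separate_entries(molecule_name, dihedrals):
    """One 1D entry per dihedral, each with its own index string.

    `dihedrals` maps a label to a tuple of four atom indices.
    The dictionary layout here is hypothetical, not a real dataset schema.
    """
    return [
        {"index": f"{molecule_name}-{label}", "dihedral": atoms}
        for label, atoms in sorted(dihedrals.items())
    ]


# Hypothetical atom indices for illustration only; not read from a real molecule.
dihedrals = {"phi": (4, 6, 8, 14), "psi": (6, 8, 14, 16)}

grid = two_d_grid(15)
entries = separate_entries("ala", dihedrals)

print(len(grid_points(15)), "grid points per torsion")
print(len(grid), "constrained optimizations for the coupled 2D grid")
print([e["index"] for e in entries])
```

At this spacing the coupled grid needs the square of the per-torsion point count, while the separate entries need only twice that count, which is one way to read the preference for submitting the dihedrals separately.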

Action items

David Dotson will continue preparations of TorsionDrive dataset for protein fragments
Joshua Horton will test the most recent linux build of psi4 to see if it meets our needs for PCM-based implicit solvent usage
Ben Pritchard will discuss QCElemental#230 with Lori Burns to determine path forward, since it appears to be a blocker for ESP usage
David Dotson will reach out to Adrian Roitberg and others in his group for advice on their approach for active learning to train ANI2x. Perhaps we can benefit from a similar approach?
Everyone will consider what would be needed for a feedback loop of automated FF coverage gap identification, torsion prioritization, submission generation over the next week
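
As a starting point for that discussion, the three stages of the feedback loop (coverage gap identification, torsion prioritization, submission generation) could be sketched roughly as below. This is a toy illustration under stated assumptions: the parameter ids, usage counts, candidate molecules, threshold, and function names are all hypothetical, and the output is a proposal for human approval rather than an actual submission.

```python
def find_gaps(all_params, usage_counts, min_count):
    """Parameters exercised by fewer than `min_count` training molecules."""
    return [p for p in all_params if usage_counts.get(p, 0) < min_count]


def prioritize(gaps, usage_counts):
    """Least-covered parameters first; ties broken by parameter id."""
    return sorted(gaps, key=lambda p: (usage_counts.get(p, 0), p))


def propose_submission(ranked_gaps, candidates, per_param=2):
    """Pick up to `per_param` candidate molecules for each gap, without duplicates.

    `candidates` maps a molecule id to the set of parameters it exercises.
    """
    selected = []
    for param in ranked_gaps:
        hits = sorted(m for m, params in candidates.items() if param in params)
        for mol_id in hits[:per_param]:
            if mol_id not in selected:
                selected.append(mol_id)
    return selected


# Hypothetical torsion parameter ids and coverage counts.
all_params = ["t1", "t2", "t3", "t4"]
usage_counts = {"t1": 40, "t2": 3, "t4": 1}

# Hypothetical candidate pool: molecule id -> parameters it exercises.
candidates = {
    "mol-a": {"t2", "t3"},
    "mol-b": {"t3"},
    "mol-c": {"t4", "t1"},
    "mol-d": {"t1"},
}

gaps = find_gaps(all_params, usage_counts, min_count=5)
ranked = prioritize(gaps, usage_counts)
proposal = propose_submission(ranked, candidates)

# The proposal would go to a human for approval before anything is submitted.
print("gaps:", ranked)
print("proposed molecules:", proposal)
```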
