2021-12-17 QCA Submission meeting notes

Participants

@Pavan Behara
@Chapin Cavender
@Joshua Horton
Ben Pritchard
@David Dotson

Goals

Updates from MolSSI
- how fast is storage filling from wavefunction-storing single-point sets?
Compute
- QM workers on Lilac
- XTB workers on Newcastle
- QM workers on TSCC
- QM workers on UCI
- QM, ANI, XTB workers on PRP
New submissions
- submission issues with OpenMM datasets - Gateway Timeouts
  - update on behavior and workaround
- dipeptide dataset
  - v1.1
User questions/issues
Science support needs
Infrastructure needs / advances
- psi4 on conda-forge

Discussion topics

Item	Presenter	Notes

Updates from MolSSI	Ben
- BP – if you submit an optimization, the `protocols` part of a QC spec is ignored
  - there is utility for it, e.g. a STDOUT flag
  - planning to put that into QCElemental/QCSchema
- BP – do we want torsiondrive protocols?
  - DD – presumably yes; can think of e.g. wanting the wavefunction for the final gradient of each optimization
  - if we can structure `protocols` to accommodate future user requests to modify the behavior of torsiondrives and optimizations, that would be ideal; not sure how easy that is though
  - JH – if we’re calculating the wavefunction for every gradient and only saving the last one, that would allow for using the wavefunction of the last gradient as the SCF guess for the next gradient
  - BP – also can accommodate torsiondrives as a procedure now that it’s in QCEngine
- BP – working to make the database more constrained, more normal-form
  - have several torsiondrives that are missing optimizations (about 2 dozen); don’t have a good solution and can’t just drop nulls in
  - have a table that links each torsiondrive to its optimizations
  - also have a table that stores the index of the minimum
  - DD – I think it’s fine if we drop the tables that store the minimum, and instead determine the minimum client-side, or server-side through a REST endpoint; not much argument for storing this
- BP – also have new statuses
  - waiting
  - running
  - errored
  - cancelled
  - complete
  - deleted
  - do we want e.g. an `invalid` status? Something to indicate a known broken calculation?
  - a complete record could be marked as invalid if it’s found to be straight up wrong, for example
  - also gives us a way out for the torsiondrives with missing optimizations
- BP – could also create a one-to-many table that can take multiple comments on a record
- PB – can we make collections `invalid`, or just records?
  - BP – that can be done! Haven’t gotten into collections yet.
  - will be hitting that in January: versioning, snapshotting, etc.
  - DD – I think supporting versioning and status as separate fields would be a boon; will work with you on this in January.
- BP – thinking of holding a virtual workshop to show off the way the new QCArchive works, tentatively in February
- DD – how quickly is storage filling up due to wavefunction storage? Have the pubchem set going and want to ensure we won’t overwhelm capacity over the break.
  - BP – about 5MB per calculation
  - only have about 1.5TB of storage on the SSD
  - DD – I’ll notify John that we will trim off orbitals and eigenvalues from these submissions, then begin submitting them
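
As a rough illustration of DD's suggestion to determine the torsiondrive minimum client-side rather than storing it, the sketch below picks the lowest-energy grid point from a mapping of dihedral angle to final energy. The function name `find_minimum`, the data layout, and the energies are hypothetical and are not taken from QCFractal; grid points with a missing optimization are represented as `None` and skipped.

```python
from typing import Dict, Optional, Tuple


def find_minimum(final_energies: Dict[int, Optional[float]]) -> Tuple[int, float]:
    """Return (grid angle, energy) of the lowest-energy grid point.

    Grid points whose optimization is missing (value None) are skipped.
    """
    available = {angle: e for angle, e in final_energies.items() if e is not None}
    if not available:
        raise ValueError("no completed optimizations to compare")
    angle = min(available, key=available.get)
    return angle, available[angle]


# Hypothetical final energies (hartree) keyed by dihedral angle (degrees).
# The 90 degree grid point stands in for a missing optimization.
energies = {-180: -154.801, -90: -154.797, 0: -154.805, 90: None}
best_angle, best_energy = find_minimum(energies)
print(best_angle, best_energy)
```

Computing this on demand means nothing has to be stored for torsiondrives that are incomplete.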
Compute
- PB – if folks want the orbitals, we may get a request later for orbitals. Would be wise to include Peter in the discussion.
- DD – will put together a message to Peter and John to begin using the pubchem set 1 dataset and determine if they need wavefunction data for their use case.
  - will inform whether we proceed with wavefunctions on the remaining 5 datasets
  - will note storage projections as to why this is important to answer
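
For the storage projection, a back-of-the-envelope estimate using the figures BP quoted (about 5MB per calculation, about 1.5TB on the SSD) can be sketched as follows. It assumes decimal units and that the full 1.5TB is available for wavefunction data; the notes do not say how much of the SSD is already in use, so the result is an upper bound at best.

```python
MB = 10**6   # decimal megabyte, in bytes (assumption: decimal units)
TB = 10**12  # decimal terabyte, in bytes


def calculations_until_full(capacity_bytes: int, bytes_per_calculation: int) -> int:
    """Number of whole calculations that fit in the given capacity."""
    return capacity_bytes // bytes_per_calculation


capacity = 1500 * 10**9       # about 1.5TB, per the notes
per_calculation = 5 * MB      # about 5MB per wavefunction-storing calculation

print(calculations_until_full(capacity, per_calculation))  # 300000
```

Under these assumptions the SSD holds roughly 300,000 wavefunction-storing calculations.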
Task submission slowness
- DD – will make a PR to QCFractal for submission optimizations, with a corresponding PR to QCSubmit to take advantage of them
  - can grin and bear it with current submissions for now
- BP – restarting the database with postgres logging enabled
  - will see if we’re missing a key index on a table; if so, fixing task creation slowness may be an easy fix
  - DD – running a submission with tasks now
- BP – in the new version we will query for the spec only once per set of tasks submitted
  - right now a query on the spec hits 5 indexes for each and every task
  - right now the time-consuming part is `_create_task`; considering moving this to when a manager requests a task, not on client submission
  - DD – that would actually be a pretty great optimization, with no downside as far as I can tell; a manager can afford to wait on its first call, and in steady-state operation it calls for X tasks at a time, so it wouldn’t really see a slowdown; it would save the client an immense amount of time submitting.
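
The difference between looking up the spec for every task and looking it up once per submitted set can be sketched with a toy in-memory table that counts lookups. Everything here (`SpecTable`, `submit_per_task`, `submit_batched`, the tuple-based spec) is a hypothetical stand-in for illustration and does not reflect QCFractal's actual code or schema.

```python
from typing import Dict, List, Tuple

Spec = Tuple[str, str]  # hypothetical (method, basis) pair


class SpecTable:
    """Toy stand-in for the specification table; counts lookups."""

    def __init__(self) -> None:
        self._ids: Dict[Spec, int] = {}
        self.lookups = 0

    def get_or_create(self, spec: Spec) -> int:
        self.lookups += 1
        return self._ids.setdefault(spec, len(self._ids) + 1)


def submit_per_task(table: SpecTable, spec: Spec, molecules: List[str]) -> List[Tuple[str, int]]:
    # One spec lookup for each and every task.
    return [(mol, table.get_or_create(spec)) for mol in molecules]


def submit_batched(table: SpecTable, spec: Spec, molecules: List[str]) -> List[Tuple[str, int]]:
    # One spec lookup for the whole set of tasks.
    spec_id = table.get_or_create(spec)
    return [(mol, spec_id) for mol in molecules]


molecules = [f"mol-{i}" for i in range(1000)]
spec = ("b3lyp-d3bj", "dzvp")

slow, fast = SpecTable(), SpecTable()
submit_per_task(slow, spec, molecules)
submit_batched(fast, spec, molecules)
print(slow.lookups, fast.lookups)  # 1000 1
```

Both paths attach the same spec id to every task; only the number of lookups changes.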
Walkthrough of current work on QCFractal	Ben
- Organized by entity, not by system component
  - e.g. molecules have everything in the same directory, including models, REST route, postgres storage schema, etc.
- molecules have a mutable identifier field that can be queried
  - useful for adding e.g. CMILES directly to a molecule
- client has a separation between `get_` and `query_` methods
  - getters allow you to get things by id only, and return them in order
  - queries allow searching by field, but order is of course not guaranteed; results have a maximum count and can be paginated
- tasks are hidden from users, attached to records due to their 1:1 relationship
- records keep their series of errors, allowing for in-server error cycling
- user management can be done from the client, using RBAC (role-based access control) as the basis
  - admin, read, monitor, compute, and submit roles
- switched from tornado to flask, using JSON Web Tokens (JWT) instead of sending credentials with every request
- JH – can you query by molecule identifiers?
  - BP – currently can do by id
  - if I can query by e.g. InChI, then get say all optimizations that use that molecule
  - DD – perhaps we can stack some methods on `Molecule` itself that let you do this; might require either subclassing or monkey-patching QCElemental’s molecule object.
- BP – deduplication is tricky, in particular for the trajectory of an optimization
  - for optimizations, we now create entirely new gradients
  - avoids the issue of not being able to resubmit calculations that have the same set of hashed attributes but e.g. a different psi4 version that fixes a bug
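
The `get_` / `query_` split described above can be illustrated with a toy in-memory store. This is only a sketch of the semantics as noted in the meeting (getters take ids and preserve order; queries filter by field and return pages with no guaranteed order); `ToyMoleculeStore` and its method signatures are hypothetical and are not the QCFractal client API.

```python
from typing import Dict, Iterator, List, Optional

Record = Dict[str, str]


class ToyMoleculeStore:
    """Hypothetical in-memory store illustrating get_ vs query_ semantics."""

    def __init__(self, records: Dict[int, Record]) -> None:
        self._records = records

    def get_molecules(self, ids: List[int]) -> List[Optional[Record]]:
        # Getter: by id only; results come back in the order requested.
        return [self._records.get(i) for i in ids]

    def query_molecules(self, key: str, value: str, limit: int = 2) -> Iterator[List[int]]:
        # Query: by identifier field; yields pages of at most `limit` ids.
        # Callers should not rely on the order of the results.
        matches = [i for i, rec in self._records.items() if rec.get(key) == value]
        for start in range(0, len(matches), limit):
            yield matches[start:start + limit]


store = ToyMoleculeStore({
    1: {"inchi": "InChI=1S/H2O/h1H2"},
    2: {"inchi": "InChI=1S/CH4/h1H4"},
    3: {"inchi": "InChI=1S/H2O/h1H2"},
    4: {"inchi": "InChI=1S/H2O/h1H2"},
})

ordered = store.get_molecules([3, 1, 2])
pages = list(store.query_molecules("inchi", "InChI=1S/H2O/h1H2", limit=2))
print(len(pages), sorted(i for page in pages for i in page))
```

A query by InChI like this one is the kind of lookup JH asked about; finding all optimizations that use the matching molecules would be a further step on top of the returned ids.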

Action items

@David Dotson will engage John and Peter regarding need for wavefunctions for pubchem sets; ask them to begin using pubchem set 1 for downstream work if it helps answer this question

@David Dotson will make PRs against QCFractal and QCSubmit implementing submission optimizations, aiming for merge and release in early January

@David Dotson will finish manager signal handling PR for QCFractal for January release

@David Dotson will work with Ben on collection status, version field in January
