2021-09-10 QCA Submission meeting notes

Participants

Goals

Updates from MolSSI
- duplicate tasks deleted?
User questions/issues, new submissions
Science support needs
Infrastructure needs / advances

Discussion topics

Item	Presenter	Notes
Updates from MolSSI	Ben
- Fixed issues with inconsistent state: tens of thousands of jobs in the task queue were associated with complete records
  - if you resubmit a record that already exists, even if COMPLETE, a task will get created
  - know where this hole is; can fix and produce a new release, then deploy to public QCArchive
- BP: removed all duplicate tasks
- DD: will turn managers back on to compute industry datasets, put them on their own compute tag, and observe if backward-forward behavior persists
User issues, new submissions
- PB: #220 - need new qcsubmit
  - JH: just have one PR on qcsubmit blocking release; working on this
  - PB: have one compute spec on this submission, `DF-CCSD(T)/CBS`, that will take up to 150 GiB of memory for 16 heavy atoms
  - PB: typically also use 48 cores
- JH: working on #223, blocked by validation issues as well
  - going to add some ANI specs, will need some ANI workers
  - will still use ANI2x we had issues with, but relaxing convergence criteria
  - DD: until we have a release from Adrian, unfortunately can’t use the latest fixes to this in prod
- JH: ML stuff, adding HDF5 support for QCSubmit
  - instead of a ton of SDFs, can use one file
  - JW: what are the contents?
  - JH: conformers and mapped SMILES
  - JW: is this file going to contain the same content as the other files, or is there something fundamentally different here? One thing that makes SDF safer is that readers and writers are not something we’re defining
  - JH: there’s a lot of repeated info in the SDF; also want to pave the way for multimolecule support, dimers, etc.
  - JW: good point
  - JH: understand concerns on future variability; would like to get a spec down as much as possible
  - JH: any feedback anyone has on this issue ( ) appreciated
  - DD: concerned about collection size; will run into same issue as before
  - SB+JH: not clear if it’s a single collection with a million conformers, or spread across several collections, or multiple million conformer collections
  - BP: the metadata object for a collection gets very big as more and more objects are involved (molecules, specs), so this becomes an issue in the way collections are currently implemented; is getting fixed in the `next` branch
  - SB: can see this taking another month for John and Peter to resolve; what is the timeline for `next` branch deployment?
  - BP: end of the year earliest? Can’t make a guarantee there, though
  - JW: DD, would you be willing to jump onto next OpenMM call to lay out constraints?
  - DD: yes
  - JH: is a test submission still in play?
  - SB: yeah, can push for this, also as a way to prove the core idea of the dataset works before we push through a massive set
- SB: Chapin’s dataset; what’s the status?
  - DD: worked with him to set up manager on UCSD resources; can switch on and off at will; waiting for word on new submission status
  - SB: think there may still be some ambiguity on what data, how it will be different from the Cerutti sets; will coordinate with Chapin and see where we’re at
Science support needs
Infrastructure needs
- BP: with this fix for the submission deduplication, can also include the fix for the slow queries encountered recently
  - this is adding indices to a single table; remove combined index, add a bunch of single indices to columns
  - is a DB migration
  - in practice will require more memory on the server; shouldn’t be an issue
- PB: question for Josh
  - using one spec in new submission where method is a “ + ” joined method, no basis (https://psicode.org/psi4manual/master/cbs.html)
  - JW: short form for this method looks like it might present a discoverability issue; perhaps use a long form instead?
  - PB: should I leave it like this, or use long form?
  - JH: I think long form is supported through keywords; qcsubmit won’t like `None` for `basis` in `psi4`
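
Illustrative sketch of the index change BP describes (drop one combined index, add single-column indices). This is not the actual QCFractal migration: the notes do not name the table or columns, so `demo_table`, `status`, `tag`, `created_on` and the index names are all hypothetical, and SQLite from the Python standard library is used only so the example runs anywhere.

```python
import sqlite3


def migrate_indices(conn, table, combined_index, columns):
    """Drop one combined index and create one single-column index per column.

    Identifiers are interpolated directly, so they must come from trusted
    code, never from user input.
    """
    conn.execute(f'DROP INDEX IF EXISTS "{combined_index}"')
    for col in columns:
        conn.execute(
            f'CREATE INDEX IF NOT EXISTS "ix_{table}_{col}" ON "{table}" ("{col}")'
        )
    conn.commit()


def index_names(conn, table):
    """Return the sorted names of all indices defined on a table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchall()
    return sorted(name for (name,) in rows)


# Hypothetical table and columns, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_table ("
    "id INTEGER PRIMARY KEY, status TEXT, tag TEXT, created_on TEXT)"
)
conn.execute(
    "CREATE INDEX ix_demo_table_combined ON demo_table (status, tag, created_on)"
)

migrate_indices(
    conn, "demo_table", "ix_demo_table_combined", ["status", "tag", "created_on"]
)
print(index_names(conn, "demo_table"))
```

Several single-column indices generally occupy more space than one combined index, which is consistent with BP's note that the migration will need more memory on the server.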

Action items

David Dotson will turn managers back on for industry datasets, put them on their own compute tag to monitor behavior and progress
Joshua Horton will cut a new release of openff-qcsubmit to unblock new user submissions
David Dotson will spin up workers specifically for Pavan’s high-memory spec and Josh’s ANI submission once submitted
David Dotson will chime in on openmm/qmdataset on limits of large collection submissions for current QCArchive; get a sense for the numbers of entities involved and assess if this presents problems for collection metadata
Simon Boothroyd will push for a test submission from John/Peter for openmm/qmdataset to assess scientific value before pursuing larger sets
Ben Pritchard will prioritize Collection and next branch development on QCFractal; aiming tentatively for end-of-year deployment (cannot guarantee)
Simon Boothroyd will follow up with Chapin Cavender on status of dipeptide dataset, identify ambiguities and resolve if possible
Ben Pritchard will include a fix for submission task duplication and slow queries in upcoming Fractal release and deployment/migration
Joshua Horton will follow up with Pavan Behara on long form / keyword support in openff-qcsubmit for psi4 specs that include basis=None
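
Illustrative sketch of the task-duplication hole discussed under Updates from MolSSI (resubmitting an existing record, even a COMPLETE one, creates a task). This is a toy model and an assumption about what a guard could look like, not QCFractal code; `ToyQueue` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy stand-in for a record table plus task queue (hypothetical)."""

    records: dict = field(default_factory=dict)  # record key -> status
    tasks: dict = field(default_factory=dict)    # record key -> task id
    next_task_id: int = 0

    def submit(self, key):
        """Create a task only if one is actually needed; return True if created."""
        if self.records.get(key) == "COMPLETE":
            return False  # finished record: resubmission must not create a task
        if key in self.tasks:
            return False  # a task is already queued for this record
        self.records.setdefault(key, "INCOMPLETE")
        self.tasks[key] = self.next_task_id
        self.next_task_id += 1
        return True

    def complete(self, key):
        """Mark a record COMPLETE and remove its task from the queue."""
        self.records[key] = "COMPLETE"
        self.tasks.pop(key, None)


queue = ToyQueue()
first = queue.submit("mol-1")        # new record, task created
duplicate = queue.submit("mol-1")    # already queued, no second task
queue.complete("mol-1")
resubmitted = queue.submit("mol-1")  # COMPLETE record, still no task
print(first, duplicate, resubmitted, len(queue.tasks))
```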
