2021-09-10 QCA Submission meeting notes

Participants

  • @Simon Boothroyd

  • @Jeffrey Wagner

  • @Pavan Behara

  • Ben Pritchard

  • @David Dotson

Goals

  • Updates from MolSSI

    • duplicate tasks deleted?

  • User questions/issues, new submissions

  • Science support needs

  • Infrastructure needs / advances

Discussion topics

Item

Presenter

Notes

Updates from MolSSI

Ben

  • Fixed issues with inconsistent state

    • tens of thousands of tasks in the task queue were associated with already-complete records

    • if you resubmit a record that already exists, even one that is COMPLETE, a new task gets created

      • know where this hole is; can fix it, produce a new release, and deploy to the public QCArchive

  • BP: removed all duplicate tasks

    • DD: will turn managers back on to compute industry datasets, put them on their own compute tag and observe if backward-forward behavior persists
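The duplicate-task hole described above comes down to a missing check at submission time. The sketch below illustrates the kind of guard that avoids it, in plain Python. ToyServer, RecordStatus, and submit are hypothetical stand-ins, not QCFractal's actual API, and the real fix in QCFractal may be structured differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RecordStatus(Enum):
    # Hypothetical record states, loosely mirroring the statuses discussed above
    INCOMPLETE = "INCOMPLETE"
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"


@dataclass
class ToyServer:
    # Hypothetical in-memory stand-in for the record store and the task queue
    records: Dict[str, RecordStatus] = field(default_factory=dict)
    task_queue: List[str] = field(default_factory=list)

    def submit(self, record_hash: str) -> bool:
        """Queue a task for this record only if one is actually needed.

        Returns True if a new task was queued, False otherwise.
        """
        status = self.records.get(record_hash)
        if status is None:
            # New record: create it and fall through to queue a task
            self.records[record_hash] = RecordStatus.INCOMPLETE
        elif status is RecordStatus.COMPLETE:
            # Already computed: resubmission must not create a task
            return False
        elif record_hash in self.task_queue:
            # Incomplete/errored but already queued: do not duplicate the task
            return False
        self.task_queue.append(record_hash)
        return True


# Usage: resubmitting a queued or complete record creates no new task
server = ToyServer()
server.submit("abc")  # queues a task
server.submit("abc")  # no-op: task already queued
```

The key design point is that the check happens before a task is created, so a resubmission of a COMPLETE record is a no-op rather than something that has to be cleaned out of the queue afterwards.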

User issues, new submissions
  • PB: #220 needs a new openff-qcsubmit release

    • JH: just have one PR on qcsubmit blocking release; working on this

    • PB: one compute spec on this submission, DF-CCSD(T)/CBS, will take up to 150 GiB of memory for molecules with 16 heavy atoms

    • PB: typically also use 48 cores

  • JH: working on #223, blocked by validation issues as well

    • going to add some ANI specs, will need some ANI workers

      • will still use ANI-2x, which we had issues with, but with relaxed convergence criteria

      • DD: until we have a release from Adrian, we unfortunately can’t use the latest fixes for this in production

  • JH: ML stuff, adding HDF5 support for QCSubmit

    • instead of many SDF files, can use a single file

    • JW: what are the contents?

    • JH: conformers and mapped SMILES

    • JW: is this file going to contain the same content as the other files, or is there something fundamentally different here?

      • one thing that makes SDF safer is that readers and writers are not something we’re defining

      • JH: there’s a lot of repeated info in the SDF

        • also want to pave the way for multimolecule support, dimers, etc.

      • JW: good point

      • JH: understand concerns on future variability; would like to get a spec down as much as possible

    • JH: any feedback anyone has on this issue ( ) appreciated

  • DD: concerned about collection size; will run into same issue as before

    • SB+JH: not clear whether it will be a single collection with a million conformers, conformers spread across several collections, or multiple million-conformer collections

    • BP: the metadata object for a collection gets very big as more and more objects are involved (molecules, specs), so this becomes an issue in the way collections are currently implemented

      • this is being fixed in the next branch

    • SB: can see this taking another month for John and Peter to resolve; what is the timeline for next branch deployment?

      • BP: end of the year earliest? Can’t make a guarantee there, though

    • JW: DD, would you be willing to jump onto next OpenMM call to lay out constraints?

      • DD: yes

  • JH: is a test submission still in play?

    • SB: yeah, can push for this, also as a way to prove the core idea of the dataset works before we push through a massive set

  • SB: Chapin’s dataset; what’s the status?

    • DD: worked with him to set up manager on UCSD resources; can switch on and off at will; waiting for word on new submission status

    • SB: think there may still be some ambiguity on what data, how it will be different from the Cerutti sets; will coordinate with Chapin and see where we’re at

Science support needs

 

 

Infrastructure needs

 

Action items

@David Dotson will turn managers back on for industry datasets, put them on their own compute tag to monitor behavior and progress
@Joshua Horton will cut a new release of openff-qcsubmit to unblock new user submissions
@David Dotson will spin up workers specifically for Pavan’s high memory spec, Josh’s ANI submission once submitted
@David Dotson will chime in on openmm/qmdataset on limits of large collection submissions for current QCArchive; get a sense for the numbers of entities involved and assess if this presents problems for collection metadata
@Simon Boothroyd will push for a test submission from John/Peter for openmm/qmdataset to assess scientific value before pursuing larger sets
Ben Pritchard will prioritize Collection and next branch development on QCFractal; aiming tentatively for end-of-year deployment (cannot guarantee)
@Simon Boothroyd will follow up with @Chapin Cavender on status of dipeptide dataset, identify ambiguities and resolve if possible
Ben Pritchard will include a fix for submission task duplication and slow queries in upcoming Fractal release and deployment/migration
@Joshua Horton will follow up with @Pavan Behara on long form / keyword support in openff-qcsubmit for psi4 specs that include basis=None

Decisions
