Participants

Goals

  • Updates from MolSSI

    • duplicate tasks deleted?

  • User questions/issues, new submissions

  • Science support needs

  • Infrastructure needs / advances

Discussion topics

Item

Presenter

Notes

Updates from MolSSI

Ben

  • Fixed issues with inconsistent state

    • tens of thousands of jobs in the task queue were associated with already-complete records

    • if you resubmit a record that already exists, even if COMPLETE, a task still gets created

      • BP knows where this hole is; can fix, produce a new release, and deploy to public QCArchive

  • BP: removed all duplicate tasks

    • DD: will turn managers back on to compute industry datasets, put them on their own compute tag and observe if backward-forward behavior persists
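Illustrative sketch of the duplicate-task hole described above: resubmitting an existing record should not create a task when the record is already COMPLETE or already queued. This is not QCFractal code; `Record`, `Server`, and `submit` are hypothetical names invented for the example, and the real fix lives in the server's submission path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for a computation record."""
    key: str
    status: str = "INCOMPLETE"


@dataclass
class Server:
    """Hypothetical in-memory server: records keyed by id, plus a task queue."""
    records: Dict[str, Record] = field(default_factory=dict)
    tasks: List[str] = field(default_factory=list)

    def submit(self, key: str) -> Record:
        # Reuse the existing record if this computation was submitted before.
        record = self.records.get(key)
        if record is None:
            record = Record(key)
            self.records[key] = record

        # The guard: only queue a task when the record still needs computing
        # and no task for it is already waiting.
        if record.status != "COMPLETE" and key not in self.tasks:
            self.tasks.append(key)
        return record


server = Server()
server.submit("mol-1")
server.submit("mol-1")  # resubmission while queued: no second task
queued_after_resubmit = list(server.tasks)

server.records["mol-1"].status = "COMPLETE"
server.tasks.clear()    # pretend a manager finished the task
server.submit("mol-1")  # resubmission of a COMPLETE record: no task at all
queued_after_complete = list(server.tasks)
```

Without the status check in `submit`, the last call would put a task back in the queue for a record that is already done, which is the inconsistent state reported here.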

User issues, new submissions


  • PB: #220 - need new qcsubmit

    • JH: just have one PR on qcsubmit blocking release; working on this

    • PB: have one compute spec on this submission, DF-CCSD(T)/CBS, that will take up to 150 GiB of memory for 16 heavy atoms

    • PB: typically also use 48 cores

  • JH: working on #223, blocked by validation issues as well

    • going to add some ANI specs, will need some ANI workers

      • will still use ANI2x, which we had issues with, but with relaxed convergence criteria

      • DD: until we have a release from Adrian, unfortunately can’t use the latest fixes to this in prod

  • JH: ML stuff, adding HDF5 support for QCSubmit

    • instead of a ton of SDFs, can use one file

    • JW: what are the contents?

    • JH: conformers and mapped SMILES

    • JW: is this file going to contain the same content as the other files, or is there something fundamentally different here?

      • one thing that makes SDF safer is that readers and writers are not something we’re defining

      • JH: there’s a lot of repeated info in the SDF

        • also want to pave the way for multimolecule support, dimers, etc.

      • JW: good point

      • JH: understand concerns on future variability; would like to get a spec down as much as possible

    • JH: any feedback anyone has on this issue ( ) appreciated

  • DD: concerned about collection size; will run into same issue as before

    • SB+JH: not clear if it’s a single collection with a million conformers, or spread across several collections, or multiple million conformer collections

    • BP: the metadata object for a collection gets very big as more and more objects are involved (molecules, specs), so this becomes an issue in the way collections are currently implemented

      • is getting fixed in the next branch

    • SB: can see this taking another month for John and Peter to resolve; what is the timeline for next branch deployment?

      • BP: end of the year earliest? Can’t make a guarantee there, though

    • JW: DD, would you be willing to jump onto next OpenMM call to lay out constraints?

      • DD: yes

  • JH: is a test submission still in play?

    • SB: yeah, can push for this, also as a way to prove the core idea of the dataset works before we push through a massive set

  • SB: Chapin’s dataset; what’s the status?

    • DD: worked with him to set up manager on UCSD resources; can switch on and off at will; waiting for word on new submission status

    • SB: think there may still be some ambiguity on what data, how it will be different from the Cerutti sets; will coordinate with Chapin and see where we’re at

Science support needs

Infrastructure needs

  • BP: with this fix for the submission deduplication, can also include the fix for the slow queries encountered recently

    • this is adding indices to a single table; remove combined index, add a bunch of single indices to columns

    • is a DB migration in practice

    • will require more memory on the server; shouldn’t be an issue

  • PB: question for Josh

    • using one spec in new submission where method is a “ + ”-joined method, no basis (https://psicode.org/psi4manual/master/cbs.html)

    • JW: short form for this method looks like it might present a discoverability issue; perhaps use a long form instead?

    • PB: should I leave it like this, or use long form?

      • JH: I think long form is supported through keywords; qcsubmit won’t like None for basis in psi4
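Illustrative sketch of the index change BP describes above: drop one combined index and add single-column indices in its place. The table name `task_queue`, the columns, and the index names are hypothetical, since the notes do not say which table is affected. SQLite is used only so the example runs without a database server; the actual change would be a migration against the production database.

```python
import sqlite3

# Hypothetical table with one combined (multi-column) index, as a starting point.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_queue ("
    "id INTEGER PRIMARY KEY, status TEXT, tag TEXT, manager TEXT)"
)
conn.execute("CREATE INDEX ix_combined ON task_queue (status, tag, manager)")


def migrate(connection: sqlite3.Connection) -> None:
    """Replace the combined index with one index per column."""
    connection.execute("DROP INDEX IF EXISTS ix_combined")
    for column in ("status", "tag", "manager"):
        # Column names come from the fixed tuple above, not from user input.
        connection.execute(
            f"CREATE INDEX IF NOT EXISTS ix_task_queue_{column} "
            f"ON task_queue ({column})"
        )
    connection.commit()


migrate(conn)

# List the indices now present on the table (row[1] is the index name).
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list(task_queue)")
)
```

A combined index mainly helps queries that filter on its leading column, while separate indices let queries on any one of the columns use an index, at the cost of more storage and memory, consistent with the note above about the server needing more memory.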

Action items

  • David Dotson will turn managers back on for industry datasets, put them on their own compute tag to monitor behavior and progress
  • Joshua Horton will cut a new release of openff-qcsubmit to unblock new user submissions
  • David Dotson will spin up workers specifically for Pavan’s high memory spec and Josh’s ANI submission, once submitted
  • David Dotson will chime in on openmm/qmdataset on limits of large collection submissions for current QCArchive; get a sense for the numbers of entities involved and assess if this presents problems for collection metadata
  • Simon Boothroyd will push for a test submission from John/Peter for openmm/qmdataset to assess scientific value before pursuing larger sets
  • Ben Pritchard will prioritize Collection and next branch development on QCFractal; aiming tentatively for end-of-year deployment (cannot guarantee)
  • Simon Boothroyd will follow up with Chapin Cavender on status of dipeptide dataset, identify ambiguities and resolve if possible
  • Ben Pritchard will include a fix for submission task duplication and slow queries in upcoming Fractal release and deployment/migration
  • Joshua Horton will follow up with Pavan Behara on long form / keyword support in openff-qcsubmit for psi4 specs that include basis=None

Decisions
