2021-06-18 QCA Submission meeting notes

Participants

  • Ben Pritchard

  • @Trevor Gokey

  • @Pavan Behara

  • @David Dotson

Goals

  • User questions/issues, new submissions

  • Science support needs

  • Infrastructure needs / advances

Discussion topics

Item

Presenter

Notes

Server side updates

Ben

  • BP: Updated the server, updated QCPortal package on conda-forge; can still use existing QCFractal managers

    • new QCFractal release, fixes bugs in QCPortal

    • small but time-consuming database migration complete

      • updates outdated QCElemental models for molecules in DB

  • BP: preparing teaching lessons over next two months, teaching late Aug - Oct

    • will be balancing this against QCArchive management, but won’t be as active

  • BP: once conda-forge package is up for QCFractal 0.15.6, can update managers gradually

User issues / questions



  • PB: compute expansion #210; one validation issue

  • JH: industry benchmark compute expansion #208 looks stuck

    • DD: request to server too big, may be addressed by QCSubmit#108; I will re-review the implementation there to verify that the submission chunking can handle cases like this

    • JH: can I use the staging server to try and submit the 76,000 mols?

      • BP: that server is running the next branch, so behavior is different and many things on collections are broken; not sure how useful this will be

      • JH: will stand up a local Snowflake instance to try out submission

  • DD: large datasets: looks like we can handle them gradually, but may hit file size and repo size limits at some point on qca-dataset-submission

    • JH: could we move toward a conda-forge style model that supports many repos, perhaps even a single repo per dataset? This would give source data provenance while avoiding the issue of a single repo growing without bound

    • TG: another way I can see this happening: we talked about replicating servers, this sounds similar to that?

    • BP: so everyone runs their own server?

    • TG: I would really like to see federation; I'd like to push datasets to my own local server; using a GitHub approach would work for me, especially if I can just push to a repo and the dataset gets handled

    • BP: federation and import/export functionality all come back to hashing, which I’m working to sort out and remove where possible

    • JH: could support Trevor’s approach through GitHub Actions as well; folks dump their pre-computed data schema into GitHub, and it then gets absorbed into the main QCArchive

      • can also handle what we currently do, which is taking submissions and computing them

    • DD: this approach has some nice properties; could give us nice answers for versioning, DOIs via Zenodo, etc.

    • TG: is there support for querying server via DOI, other identifiers?

      • BP: not currently

    • DD: will write up a proposal for how we could do this, what benefits it would bring, challenges

    • TG: open question - is there any experience with running optimization datasets with more than 100k?

      • problems to anticipate?

      • BP: no real fundamental limit or things you have to do differently at that order of magnitude
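
As a rough illustration of the submission chunking idea raised for #208 above (request to server too big), the sketch below splits a large list of entries into fixed-size batches and submits each batch separately. This is not the QCSubmit#108 implementation: the names `chunked`, `submit_in_chunks`, `submit_fn`, and the chunk size of 1000 are all hypothetical, and the "submission" here is a stand-in callback rather than a real QCFractal or QCPortal call.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def submit_in_chunks(
    entries: Sequence[T],
    submit_fn: Callable[[List[T]], None],
    chunk_size: int = 1000,  # hypothetical value; tune to the server's request limit
) -> int:
    """Call submit_fn once per batch and return the number of entries sent."""
    submitted = 0
    for batch in chunked(entries, chunk_size):
        submit_fn(batch)  # placeholder for a real client call (assumption)
        submitted += len(batch)
    return submitted


# Stand-in for a ~76,000-molecule dataset; the callback only records batch sizes.
entries = [f"mol-{i}" for i in range(76_000)]
batch_sizes: List[int] = []
total = submit_in_chunks(entries, lambda batch: batch_sizes.append(len(batch)))
```

Keeping each request bounded this way trades one oversized request for many small ones, so a failure partway through only requires retrying the affected batch.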

Action items

Ben Pritchard will notify when QCFractal 0.15.6 conda-forge package is up; we will update managers gradually afterward
@David Dotson will review qca-dataset-submission#210; submit
@Joshua Horton will try openff-qcsubmit#108 against a test server of his own, attempt to submit a very large (~76,000 mol) optimization dataset with many compute specs
@David Dotson will review openff-qcsubmit#108 to verify that the parallelization/chunking approach is expected to work well for a submission like qca-dataset-submission#208
@David Dotson will assemble a proposal for next-generation dataset submission and provenance with QCFractal, perhaps adopting a model a la conda-forge

Decisions