2021-06-18 QCA Submission meeting notes

Participants

Ben Pritchard
@Trevor Gokey
@Pavan Behara
@David Dotson

Goals

User questions/issues, new submissions
Science support needs
Infrastructure needs / advances

Discussion topics

Item	Presenter	Notes

Server side updates	Ben
BP: Updated the server, updated QCPortal package on conda-forge; can still use existing QCFractal managers
    new fractal release, fixes bugs in QCPortal
    small but time-consuming database migration complete
    updates outdated QCElemental models for molecules in DB
BP: preparing teaching lessons over next two months, teaching late Aug - Oct
    will be balancing this against QCArchive management, but won’t be as active
BP: once conda-forge package is up for QCFractal 0.15.6, can update managers gradually

User issues / questions
PB: compute expansion #210; one validation issue
JH: industry benchmark compute expansion #208 looks stuck
DD: request to server too big, may be addressed by QCSubmit#108; I will re-review implementation there to verify that the submission chunking should be able to handle cases like this
JH: can I use the staging server to try and submit the 76,000 mols?
BP: that server is running the `next` branch, so behavior is different and many things on collections are broken; not sure how useful this will be
JH: will stand up a local Snowflake instance to try out submission
DD: large datasets: looks like we can handle them gradually, but may hit file size and repo size limits at some point on `qca-dataset-submission`
JH: could we move toward a `conda-forge` style model that supports many repos, perhaps even single repo for a dataset, and this gives that source data provenance in a way that avoids the issue of a single repo growing without bound
TG: another way I can see this happening: we talked about replicating servers, this sounds similar to that?
BP: each folks run their own server?
TG: I would really like to see federation; like to push datasets to my own local server; using a Github approach would work for me, especially if just push to a repo and the dataset gets handled
BP: federation and import/export functionality all comes back to hashing, which I’m working to sort out and remove where possible
JH: could have Trevor’s way through Github Actions as well; folks dump their pre-computed data schema into Github, then absorbed into the main QCArchive
    can also handle what we currently do, which is taking submissions and computing them
DD: this approach has some nice properties; could give us nice answers for versioning, DOIs via Zenodo, etc.
TG: is there support for querying server via DOI, other identifiers?
BP: not currently
DD: will write up a proposal for how we could do this, what benefits it would bring, challenges
TG: open question - is there any experience with running optimization datasets with more than 100k? problems to anticipate?
BP: no real fundamental limit or things you have to do differently at that order of magnitude

Action items

Ben Pritchard will notify when QCFractal 0.15.6 conda-forge package is up; we will update managers gradually afterward

@David Dotson will review qca-dataset-submission#210; submit

@Joshua Horton will try openff-qcsubmit#108 against a test server of his own, attempt to submit a very large (~76,000 mol) optimization dataset with many compute specs

@David Dotson will review openff-qcsubmit#108 to verify parallelization/chunking approach expected to work well for a submission like qca-dataset-submission#208

@David Dotson will assemble a proposal for next-generation dataset submission and provenance with QCFractal, perhaps adopting a model a la conda-forge
