Dataset Long-term Storage Approach 1: Initial plan
Initial approach, as suggested by stakeholder consensus.
Overview
Summary | Generate a standard operating procedure for dataset long-term storage (LTS-SOP) that copies key OpenFF datasets to a secondary storage platform (i.e., Zenodo) in a future-proof file format. |
---|---|
GitHub link | |
Status | NOT STARTED / IN PROGRESS / COMPLETED / WON'T PROGRESS |
Milestones and metrics
Stage | Milestone/Benchmark | Contributors | Deadline | Status |
---|---|---|---|---|
Generate Standard Operating Procedure | Determine qcportal capability to download datasets locally | @Jennifer A Clark | | In progress |
| Determine final future proof file format for datasets | @Jennifer A Clark | | COMPLETED |
| Determine file conversion strategy from output of qcportal to a future proof file format | @Jennifer A Clark | | Not started |
| Determine location to aggregate or reference Zenodo DOIs | @Jennifer A Clark | | In progress |
| Combine pipeline elements into LTS-SOP | @Jennifer A Clark | | Not started |
Create dataset collection on qcportal | Debug QCA-Dataset-Submission issues, or establish record keeping mechanism for direct QCPortal use. | @Jennifer A Clark | | In progress |
| Sage 2.0.0 | @Jennifer A Clark | | In progress |
| Sage 2.1.0 | @Jennifer A Clark | | In progress |
| Sage 2.2.0 | @Jennifer A Clark | | Not started |
| Industry Benchmarking | @Jennifer A Clark | | Not started |
Publish OpenFF datasets | Apply LTS-SOP to Sage 2.0.0 | @Jennifer A Clark | | Not started |
| Apply LTS-SOP to Sage 2.1.0 | @Jennifer A Clark | | Not started |
| Apply LTS-SOP to Sage 2.2.0 | @Jennifer A Clark | | Not started |
| Apply LTS-SOP to Industry Benchmarking dataset | @Jennifer A Clark | | Not started |
| Determine other benchmarking datasets of interest | @Jennifer A Clark | | Not started |
Progress and findings
Generate Standard Operating Procedure
Downloading datasets locally is not currently supported. Within the timeframe of this project, MolSSI is expected to restore the qcportal “dataset views” capability, which will allow the files to be downloaded in some format. Exporting from there to QCSchema molecules in HDF5 format should be straightforward. [QCA Users 2025-01-07]
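The export step could look roughly like the sketch below. It assumes the molecules are already available as QCSchema-style dictionaries (`symbols`, flat `geometry` in Bohr, `molecular_charge`, `molecular_multiplicity`) and writes them with `h5py`. The group layout (`mol_0`, `mol_1`, ...), the function names, and the water coordinates are illustrative assumptions, not an agreed LTS-SOP format or real dataset content.

```python
import os
import tempfile

import h5py
import numpy as np


def write_molecules_hdf5(path, molecules):
    """Write QCSchema-style molecule dicts to one HDF5 file, one group per molecule."""
    with h5py.File(path, "w") as f:
        for index, mol in enumerate(molecules):
            # Hypothetical layout: one group per molecule, named by position.
            grp = f.create_group(f"mol_{index}")
            # Fixed-length byte strings are a portable way to store text in HDF5.
            grp.create_dataset("symbols", data=np.array(mol["symbols"], dtype="S"))
            # QCSchema stores geometry as a flat list; reshape to (n_atoms, 3).
            geometry = np.asarray(mol["geometry"], dtype=float).reshape(-1, 3)
            grp.create_dataset("geometry", data=geometry)
            grp.attrs["schema_name"] = mol["schema_name"]
            grp.attrs["schema_version"] = mol["schema_version"]
            grp.attrs["molecular_charge"] = mol["molecular_charge"]
            grp.attrs["molecular_multiplicity"] = mol["molecular_multiplicity"]


def read_molecules_hdf5(path):
    """Read the file back into QCSchema-style dicts, in the order written."""
    molecules = []
    with h5py.File(path, "r") as f:
        for index in range(len(f.keys())):
            grp = f[f"mol_{index}"]
            molecules.append({
                "schema_name": str(grp.attrs["schema_name"]),
                "schema_version": int(grp.attrs["schema_version"]),
                "symbols": [s.decode("ascii") for s in grp["symbols"][()]],
                "geometry": grp["geometry"][()].reshape(-1).tolist(),
                "molecular_charge": float(grp.attrs["molecular_charge"]),
                "molecular_multiplicity": int(grp.attrs["molecular_multiplicity"]),
            })
    return molecules


# Placeholder molecule with illustrative coordinates (Bohr), not dataset content.
water = {
    "schema_name": "qcschema_molecule",
    "schema_version": 2,
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, 0.0, 0.0, 1.5, 1.0, 0.0, -1.5, 1.0],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
}

example_path = os.path.join(tempfile.mkdtemp(), "example_molecules.h5")
write_molecules_hdf5(example_path, [water])
round_tripped = read_molecules_hdf5(example_path)
```

A round trip like this is a cheap check that nothing is lost in conversion; a real archive would also need record metadata (for example, dataset name and specification) whose layout is still to be decided.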
Create Dataset Collection on QCPortal
The plan was to build each collection by combining the record lists of several published datasets. However, after pushing to QCPortal, additional specification keywords are added and new records are spawned. This may be related to the recent QCPortal upgrade. We are strongly considering a fallback: creating the collection directly with QCPortal, bypassing qc-submit… [JCl/LW check-in 2025-01-09]
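The record-keeping side of this can be sketched independently of QCPortal. The example below merges record ID lists from several datasets and flags any IDs in the resulting collection that were not in the merged list, which is one way to detect spawned records. The function names and integer IDs are placeholders for illustration; no QCPortal or qc-submit calls are shown, and how the ID lists are obtained is left open.

```python
def combine_record_ids(record_lists):
    """Merge several record ID lists, dropping duplicates and keeping first-seen order."""
    seen = set()
    combined = []
    for records in record_lists:
        for record_id in records:
            if record_id not in seen:
                seen.add(record_id)
                combined.append(record_id)
    return combined


def find_spawned_records(expected_ids, observed_ids):
    """Return IDs present in the collection that were not in the expected list."""
    expected = set(expected_ids)
    return [record_id for record_id in observed_ids if record_id not in expected]


# Placeholder IDs standing in for record lists from two published datasets.
dataset_a = [101, 102, 103]
dataset_b = [103, 104]
expected = combine_record_ids([dataset_a, dataset_b])

# Placeholder for the IDs reported by the collection after submission.
observed = [101, 102, 103, 104, 900, 901]
spawned = find_spawned_records(expected, observed)
```

An empty `spawned` list would indicate the collection reuses the published records exactly; any entries point to records created during submission.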
Publish OpenFF Datasets