Excerpt

Initial approach as suggested by stakeholder consensus.

👀 Overview

Summary

Generate a standard operating procedure for dataset continuity and long-term storage (LTS-SOP) to copy key OpenFF datasets to a secondary storage platform (i.e., Zenodo) in a future-proof file format.

GitHub link

Status

NOT STARTED / IN PROGRESS / COMPLETED / WON'T PROGRESS

🚩 Milestones and metrics

| Stage | Milestone/Benchmark | Contributors | Deadline | Status |
| --- | --- | --- | --- | --- |
| Generate Standard Operating Procedure | Determine qcportal capability to download datasets locally | Jennifer A Clark | | In progress |
| | Determine final future-proof file format for datasets | Jennifer A Clark | | Completed |
| | Determine file conversion strategy from output of qcportal to a future-proof file format | Jennifer A Clark | | Not started |
| | Determine location to aggregate or reference Zenodo DOIs | Jennifer A Clark | | In progress |
| | Combine pipeline elements into LTS-SOP | Jennifer A Clark | | Not started |
| Create dataset collection on qcportal | Debug QCA-Dataset-Submission issues, or establish record-keeping mechanism for direct QCPortal use | Jennifer A Clark | | In progress |
| | Sage 2.0.0 | Jennifer A Clark | | In progress |
| | Sage 2.1.0 | Jennifer A Clark | | In progress |
| | Sage 2.2.0 | Jennifer A Clark | | Not started |
| | Industry Benchmarking | Jennifer A Clark | | Not started |
| Publish OpenFF datasets | Apply LTS-SOP to Sage 2.0.0 | Jennifer A Clark | | Not started |
| | Apply LTS-SOP to Sage 2.1.0 | Jennifer A Clark | | Not started |
| | Apply LTS-SOP to Sage 2.2.0 | Jennifer A Clark | | Not started |
| | Apply LTS-SOP to Industry Benchmarking dataset | Jennifer A Clark | | Not started |
| | Determine other benchmarking datasets of interest | Jennifer A Clark | | Not started |

📊 Progress and findings

Generate Standard Operating Procedure

Although not currently supported, MolSSI is expected to restore the qcportal “dataset views” capability within the timeframe of this project, which would allow datasets to be downloaded as files in some format. Exporting from there to QCSchema molecules in HDF5 format should be straightforward. [QCA Users 2025-01-07]

Create Dataset Collection on QCPortal

The plan was to build each collection by combining the record lists of several published datasets. However, after pushing to QCPortal, additional specification keywords are added and new records are spawned. This may be related to the recent QCPortal upgrade. We are strongly considering falling back to creating a collection directly with QCPortal, bypassing qc-submit… [2025-01-09 JCl/LW check-in]
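The record-list bookkeeping described above can be sketched in plain Python. This is only an illustration and does not call QCPortal or qc-submit: the function names `combine_record_ids` and `find_unexpected_records`, the dataset names, and the record IDs are all hypothetical. It assumes each published dataset can be reduced to a list of integer record IDs, and that the IDs attached to the pushed collection can be listed afterwards for comparison.

```python
from typing import Dict, Iterable, List


def combine_record_ids(datasets: Dict[str, Iterable[int]]) -> List[int]:
    """Merge per-dataset record ID lists into one de-duplicated list.

    IDs are kept in order of first appearance so the result is reproducible.
    """
    seen = set()
    combined: List[int] = []
    for record_ids in datasets.values():
        for record_id in record_ids:
            if record_id not in seen:
                seen.add(record_id)
                combined.append(record_id)
    return combined


def find_unexpected_records(expected: Iterable[int], observed: Iterable[int]) -> List[int]:
    """Return IDs found in the pushed collection that were not in the expected list."""
    expected_set = set(expected)
    return [record_id for record_id in observed if record_id not in expected_set]


if __name__ == "__main__":
    # Hypothetical record IDs standing in for several published datasets.
    published = {
        "dataset-a": [101, 102, 103],
        "dataset-b": [103, 104],
    }
    expected = combine_record_ids(published)
    print(expected)  # [101, 102, 103, 104]

    # Hypothetical IDs listed from the collection after pushing; 900 was spawned.
    observed = [101, 102, 103, 104, 900]
    print(find_unexpected_records(expected, observed))  # [900]
```

Keeping the expected list alongside the collection would give a simple record-keeping check for either route (qc-submit or direct QCPortal use): any ID reported by the second function is a record that was spawned rather than reused.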

Publish OpenFF Datasets