Dataset Long-term Storage Phase 1


@Jennifer A Clark


@Lily Wang



Other stakeholders

OpenFF forcefield and benchmark dataset users


Establish a pipeline for copying datasets from MolSSI resources to Zenodo, and apply it to the datasets used for forcefield training and benchmarking.

Time frame

12/01/2024 - 12/01/2025

Key outcomes

Publish the datasets used in forcefield fitting and benchmarking on Zenodo in a future-proof format.

Key metrics

  • Produce Zenodo publications for 3 Sage forcefields and at least 1 benchmarking dataset.


In progress

GitHub repo

Slack channel


Designated meeting

QCSubmit discussion

Released datasets



Problem Statement and Objective



Must have:

  • Produce a long-term storage standard operating procedure (LTS-SOP) for taking a dataset from its QCA-Dataset-Submission reference to MolSSI-hosted files through to publication on Zenodo in a future-proof format.

  • Publish datasets for previous OpenFF forcefields using the LTS-SOP.

  • Publish benchmarking datasets using the LTS-SOP.

Nice to have:

  • Automated pipeline for LTS-SOP
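
As a rough illustration of what one automated step might look like, the sketch below prepares the metadata for a Zenodo deposition and computes a local file checksum for comparison after upload. It is only a starting point under stated assumptions, not the project's pipeline: the function names `build_deposition_payload` and `file_md5` are hypothetical, the choice of metadata fields is an assumption, and the code makes no network calls. The endpoint constant refers to Zenodo's deposition API, to which such a payload would be posted as JSON.

```python
import hashlib
from pathlib import Path

# Zenodo deposition endpoint; shown for context only, no request is sent here.
ZENODO_DEPOSIT_URL = "https://zenodo.org/api/deposit/depositions"


def build_deposition_payload(title, description, creators, version, keywords=()):
    """Assemble a JSON-serializable body for creating a Zenodo deposition.

    Hypothetical helper: the selection of fields is an assumption about what
    an LTS-SOP record would need, not a confirmed project convention.
    """
    if not title:
        raise ValueError("a dataset title is required")
    if not creators:
        raise ValueError("at least one creator is required")
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": name} for name in creators],
            "version": version,
            "keywords": list(keywords),
        }
    }


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use.

    Assumption: the digest is compared against the checksum Zenodo reports
    for the uploaded file to confirm the transfer was not corrupted.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Keeping metadata assembly and checksum calculation as pure local functions would let the procedure be tested without a Zenodo account; the upload and publish calls could then be layered on top once the manual LTS-SOP is settled.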

Not in scope:

  • Datasets sourced and maintained by teams other than OpenFF

Project Approaches

