Dataset Long-term Storage Phase 1

Driver

@Jennifer A Clark

Approver

@Lily Wang

Contributors

 

Other stakeholders

OpenFF forcefield and benchmark dataset users

Objective

Establish a pipeline for copying datasets from MolSSI resources to Zenodo, and apply it to the datasets used to train forcefields and to the benchmarking datasets.

Time frame

12/01/2024 - 12/01/2025

Key outcomes

Publish the datasets used in forcefield fitting and benchmarking on Zenodo in a future-proof format.

Key metrics

  • Produce Zenodo publications for three Sage forcefields and at least one benchmarking dataset.

Status

In progress

GitHub repo

Slack channel

https://openforcefieldgroup.slack.com/archives/CJQ4DCWN8

Designated meeting

QCSubmit discussion

Released datasets

Publication

 

 Problem Statement and Objective

 

 Scope

Must have:

  • Produce a long-term storage standard operating procedure (LTS-SOP) that takes a dataset from its QCA-Dataset-Submission reference, through the corresponding files on MolSSI resources, to publication on Zenodo in a future-proof format.

  • Publish datasets for previous OpenFF forcefields using LTS-SOP

  • Publish benchmarking datasets using LTS-SOP

Nice to have:

  • Automated pipeline for LTS-SOP
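
As a starting point for discussing automation, one step that an automated LTS-SOP might include is recording a checksum manifest for the exported files before they are uploaded. The sketch below is illustrative only: the function name `build_manifest`, the manifest layout, and the choice of SHA-256 are assumptions, not part of any agreed procedure, and it uses only the Python standard library.

```python
import hashlib
from pathlib import Path


def build_manifest(dataset_dir):
    """Return a JSON-serialisable manifest of every file under dataset_dir.

    Hypothetical helper: the layout below is an assumption for illustration.
    """
    root = Path(dataset_dir)
    entries = []
    # Sort paths so the manifest is reproducible across runs and platforms.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append(
            {
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                # Reads the whole file into memory; fine for a sketch,
                # but large files would need chunked hashing.
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
        )
    return {"dataset": root.name, "files": entries}
```

A manifest like this could be stored alongside the archived files so that a later download can be checked against what was originally exported.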

Not in scope:

  • Datasets sourced and maintained by teams other than OpenFF

Project Approaches

References