2025-05-06 QCA dataset submission meeting


Participants

  • @Jennifer Clark

  • @Jeffrey Wagner

Discussion topics

 


Update Dataset Tracking

Project Board; Slides

  • Stalled PR 434: TM PDB CCD

    • Need new docker to resume with assessing errors

  • Running PR 440: Chodera tmQM

    • Still plowing

  • Stalled PR 449

    • Need new docker for updated basis sets.

    • Need memory assessment for alternative model chemistries

  • Stalled: Sage 2.0.0 Opt / Torsion

    • 115 records of 713 were duplicated. Reached out to Ben for advice.

New Psi4 docker image

Jeff regrets being happy to help

This is done, but I need a new release of QCPortal before a new image gives me what I need.

QDS handling of non-QCSubmit dataset.

New module in a PR for QCFractal: qcportal.external.scaffold.py, which creates a JSON of the dataset and recapitulates a JSON back into a QCFractal dataset. See notes
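The round trip described above (dataset to JSON, then JSON back to a dataset) can be sketched roughly as follows. This is only an illustration of the idea using the standard library: DatasetScaffold, its fields, to_json, and from_json are hypothetical names and do not reflect the actual scaffold module or any QCPortal API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetScaffold:
    """Hypothetical stand-in for the information a scaffold JSON might hold."""
    name: str
    dataset_type: str
    entries: list = field(default_factory=list)          # list of plain dicts
    specifications: dict = field(default_factory=dict)   # spec name -> settings


def to_json(scaffold: DatasetScaffold) -> str:
    """Serialize the scaffold to a JSON string."""
    return json.dumps(asdict(scaffold), indent=2, sort_keys=True)


def from_json(text: str) -> DatasetScaffold:
    """Rebuild a scaffold from a JSON string produced by to_json."""
    return DatasetScaffold(**json.loads(text))


# Made-up example data, for illustration only.
original = DatasetScaffold(
    name="example-dataset",
    dataset_type="optimization",
    entries=[{"name": "mol-0", "smiles": "CCO"}],
    specifications={"default": {"program": "psi4", "method": "b3lyp-d3bj"}},
)
restored = from_json(to_json(original))
```

A validation step could compare restored against original to confirm nothing is lost in the round trip.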

QDS Lifecycle:

  • Validation: Will write validation for QCPortal use of scaffold

  • Queued Submission: Will adapt CI to process scaffold.json files and submit

Will tackle in May 18th iteration

MolSSI Info / Align Priorities on MolSSI Asks

2025-04-29 QCA Users Meeting

New from last QCAUM meeting:

  • Great new front end GUI in the works and “projects” capability is getting there.

  • Kudos to Ben, who is getting us a plot of our utilization this year

  • Ben got a lot of feedback from us, but he already gave me statistics of compute_tag/status groupings and the number of associated records.

    • Getting these stats took 3 days for the tmQM dataset

    • Now locally it takes 0.3 s, and for the real dataset it takes ???
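For reference, the compute_tag/status grouping mentioned above amounts to counting records per (compute_tag, status) pair. A minimal local sketch, assuming the records are already available as plain dicts; the record layout and the function name count_by_tag_and_status are hypothetical, and this is not how the server computes the statistics.

```python
from collections import Counter


def count_by_tag_and_status(records):
    """Count records for each (compute_tag, status) pair."""
    return Counter((r["compute_tag"], r["status"]) for r in records)


# Made-up records, for illustration only.
records = [
    {"id": 1, "compute_tag": "tmqm", "status": "complete"},
    {"id": 2, "compute_tag": "tmqm", "status": "complete"},
    {"id": 3, "compute_tag": "tmqm", "status": "error"},
    {"id": 4, "compute_tag": "openff", "status": "waiting"},
]

counts = count_by_tag_and_status(records)
for (tag, status), n in sorted(counts.items()):
    print(f"{tag:10s} {status:10s} {n}")
```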

Requests:

  • We’re looking forward to the QCFractal 0.61 release so we can get error messages

Update on clean force field releases

QDS Submissions

Docker Image

  • We should be able to use the same one generated for QDS (6 GB).

    • Previously we said that if it was small enough we would combine the docker image and dataset view into one Zenodo entry; otherwise we might consider separating them

  • JC: Even though I think this is a little large, I think we should put them together and include a Python script in the image to easily process the data.

    • JW – 1000% agree. Zenodo storage is cheap so we might just plan on always including image in artifacts, even if somewhat redundant

Old Issue of the Week

Last time we decided that for QDS Issue 122: Add life cycle collection tags, we will leave it open but not act on it. Can I add a tag to keep track of which issues we’ve discussed in this segment, or should I not disturb it?

  • JW: Maybe we should always leave a comment and a “reviewed-2025” tag. This was done for Issue 122.

Potential Dataset: SureChEMBL (7/2019)

Because the ChEMBL dataset isn’t fully covered in the industry benchmark, we will keep this open. This issue was labeled “reviewed-2025”.

 

 

Action items

Decisions