2025-06-17 QCA Dataset Submission Meeting

Participants

  • @Jennifer Clark

  • @Jeffrey Wagner

  • @Lily Wang

Discussion topics

 

Item

 

Update Dataset Tracking

Project Board; Slides

  • End of Life PR 434: TM PDB CCD

    • Looked at the errors: the XTB output is prepended to the expected JSON string, causing a JSON read error. Submitted a QCFractal issue for posterity.

      • JW – Does this mean all xtb calcs will fail?

        • JC – Only about 26% of them in this dataset, and the errors persist across restarts.

      • LW – This is concerning if we want to use xtb for future geo opts.

      • JC – xtb doesn’t capture pi bonding, which requires DFT, so I’m using Richard’s favorite method for that. Also, support for both xtb and its qcengine wrapper is being discontinued, so we’ll need to move to tblite or another alternative anyway.

      • (General) – We’ll end-of-life this dataset

  • Running PR 440: Chodera tmQM

    • Still moving

  • PR 453: Hessians

    • Still moving

      • LW – I’m fine with skipping everything over 600 MW. Don’t cancel them, just stop them from running.

  • Stalled PR 449

    • Need to determine proper orbitals for CASSCF method
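A minimal sketch of the failure mode noted under PR 434, where program output is prepended to the expected JSON string. It uses only the Python standard library and is not the QCFractal or QCEngine code path. The function name `extract_trailing_json` and the sample banner text are hypothetical, chosen for illustration.

```python
import json


def extract_trailing_json(raw: str):
    """Return the JSON object that ends `raw`, skipping any text printed before it.

    Hypothetical helper: tries each "{" in turn and accepts the first one
    that parses as JSON and runs to the end of the string.
    """
    decoder = json.JSONDecoder()
    pos = raw.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            obj, end = None, -1
        # Accept only a parse that consumes everything up to trailing whitespace.
        if end != -1 and raw[end:].strip() == "":
            return obj
        pos = raw.find("{", pos + 1)
    raise ValueError("no JSON object found at the end of the output")


# Made-up stand-in for program output printed ahead of the JSON payload.
payload = json.dumps({"success": True, "return_result": -1.5})
noisy = "banner text {not json}\nnormal termination\n" + payload

try:
    json.loads(noisy)  # a plain read fails, as in the errored records
except json.JSONDecodeError:
    recovered = extract_trailing_json(noisy)
```

This only shows why a plain JSON read fails on such output and how the payload could be recovered; whether a workaround like this belongs upstream is a question for the QCFractal issue.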

QDS handling of non-QCSubmit dataset.

Both Scaffold Validation and Scaffold Submission PRs are completed and ready for review

Update on clean force field releases

QDS Submissions

Decision on Zenodo Authors?

  • LW: Just the data curation authors, you and I

Need QCADS SOP PR reviewed

Need Zenodo Sage 2.0.0 - 2.2.0 entries reviewed

Clean benchmark releases

Need script to filter Industry benchmark, I’ll do the rest with the release

  • LW: For reference there are records with wrong linear angles in rings

Danny Cole gave a few datasets that I’m not sure what to do with. Slides

  • JW – I think these should each be separate zenodo entries

  • LW – Largely agree, with the exception that making a 1.1 version of MLPepper with everything combined would be good.

I can go to the Newcastle check in tomorrow to discuss.

  • JW – Would be good to bring up the possible scf properties thing

psi4_utilities

  • Another module is meant to estimate the amount of memory needed for a psi4 calculation for Multi-reference methods

  • Could eventually be used to automate NRP submissions

MolSSI Info / Align Priorities on MolSSI Asks

2025-05-27 QCArchive User Meeting; No notes from June 10th meeting

New from last QCAUM meeting:

  • (5/27) Dataset lock was discussed. Ben thinks it would be easy to implement but there was discussion on whether it was worth the time needed.

  • (6/10) Discussed the future of our data hosted by MolSSI and action items to ensure continuity. Worst case scenario options include:

    • We get a full copy of the PostgreSQL database (5TB) and host it on our own server

      • JW – This seems like plenty for us. If we didn’t know/trust a MolSSI shutdown date, we could preemptively make a backup of this file.

    • QCFractal updates to allow for copying from a database on one server to a database on another server (with or without dataset-view intermediates). This is complicated with reindexing records.

      • JC – This may be useful for us. Would have been helpful when I needed to do submission tests on my own with QCF snowflake.

      • JC – If this were done it would supersede the first option.

Requests:

  • Ben has acknowledged receiving my benchmarking notebook but hasn’t gotten to looking into how to speed up record access.

    • I ran a notebook to benchmark the disparity between iterating over entries and records to make a case to Ben that the latter is prohibitive.
      It takes an order of magnitude longer to iterate through records than entries, even though I’ve fetched both to cache ahead of time. With this in mind, it takes ~90 min to fetch the entries for my large dataset, which would correspond to about 15 hours for records.

  • Add ability to copy records from a database on one server to a database on another server. It sounds like we don’t feel that this is a priority.
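A rough sketch of the kind of timing comparison described in the benchmarking item above, together with the extrapolation arithmetic. This is not the notebook sent to Ben. The helper names are hypothetical, and in practice the iterables would presumably be the dataset's entry and record iterators (for example QCPortal's `iterate_entries()` and `iterate_records()`, which is an assumption about what was benchmarked).

```python
import time


def time_iteration(iterable):
    """Return (item_count, elapsed_seconds) for one full pass over `iterable`."""
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
    return count, time.perf_counter() - start


def extrapolate_hours(measured_minutes, slowdown_factor):
    """Scale a measured time by an observed slowdown and convert to hours."""
    return measured_minutes * slowdown_factor / 60


# Stand-in iterable; a real run would pass the dataset iterators instead.
n_items, seconds = time_iteration(range(1000))

# Figures from the notes: ~90 min for entries, about 10x slower for records.
record_hours = extrapolate_hours(90, 10)
```

With the numbers from the notes, 90 min times a factor of 10 is 900 min, which is the 15 hours quoted for records.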

Old Issue of the Week

Fix Error Jobs in "OpenFF Group1 Torsions" TorsionDriveDataset

  • Yudong Qiu requested an update to the dataset keywords to resolve an error. When the issue was written, 798 of 820 were complete.

  • Today there are 810 complete, 4 errored, and 6 invalid.

  • I propose we close with a comment containing the current status showing that the issue was resolved.

  • Yes, closed

Bonus: Migrate Roche fragment set torsion drive input scripts here

  • Resolved by PR 86?

  • Yes, closed

 

 

Action items

Decisions