2025-05-20 QCA Dataset Submission Meeting

Participants

  • @Jennifer Clark

  • @Matt Thompson

  • @Jeffrey Wagner

  • @Lily Wang

Discussion topics

 

Item

 

  • URGENT QCADS: "batch response: This repository exceeded its LFS budget. The account responsible for the budget should increase it to restore access."

GitHub Action

  • JW: I’ll take care of that right now, and I’ll greatly increase the limit.

  • LW: How much is it?

    • JW: $20 per month… oh wait… no, we are good.

  • LW: Do we have a budgeted ceiling on this? Fun fact: the Mac runners are the most expensive.

    • JW: I need to increase it every so often, but it’s usually $20.

  • LW: I’m not seeing any LFS costs.

  • JW: Fixed it! Added a $30/month limit for LFS here.

New Psi4 docker image

Jeff regrets being happy to help

Updated the PR to require QCFractal 0.61 (the most recent release) or later. Can Jeff approve? See GitHub Actions.

  • JW: I’m assigning that to myself and I’ll review

QDS handling of non-QCSubmit datasets.

New module in a PR for QCFractal, qcportal.external.scaffold.py, to export a dataset to JSON and to rebuild a QCFractal dataset from that JSON. See notes.
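The JSON round trip the scaffold module is meant to perform can be sketched as follows. This is not the QCFractal module's API: the `DatasetScaffold` class, its fields, and the JSON layout are hypothetical stand-ins, chosen only to show a dataset description being serialized and then reconstructed without loss.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetScaffold:
    """Hypothetical, minimal dataset description (not the real scaffold schema)."""

    name: str
    dataset_type: str
    specifications: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize every field; sorted keys keep diffs of scaffold.json files stable.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DatasetScaffold":
        # Rebuild from JSON; unexpected keys raise a TypeError.
        return cls(**json.loads(text))


scaffold = DatasetScaffold(
    name="example-dataset",
    dataset_type="singlepoint",
    specifications={"default": {"method": "b3lyp", "basis": "def2-svp"}},
    entries=[{"name": "water", "smiles": "O"}],
)
restored = DatasetScaffold.from_json(scaffold.to_json())
```

In a real implementation the rebuild step would hand the parsed data to QCPortal to create the dataset on a server; that part is deliberately left out here.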

QDS Lifecycle:

  • Validation: will write validation for the QCPortal use of scaffold.

  • Queued submission: will adapt CI to process scaffold.json files and submit them.
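A CI validation step for scaffold.json files could look roughly like the sketch below. The required keys are hypothetical placeholders (the real scaffold schema may differ), and the helper names `validate_scaffold` and `validate_tree` are illustrative only.

```python
import json
from pathlib import Path

# Hypothetical required top-level keys; the real scaffold.json schema may differ.
REQUIRED_KEYS = ("name", "dataset_type", "specifications", "entries")


def validate_scaffold(path: Path) -> list[str]:
    """Return the problems found in one scaffold.json file (empty list if OK)."""
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc.msg})"]
    if not isinstance(data, dict):
        return [f"{path}: top level must be a JSON object"]
    return [f"{path}: missing key '{key}'" for key in REQUIRED_KEYS if key not in data]


def validate_tree(root: Path) -> list[str]:
    """Validate every scaffold.json found below root, as a CI job might."""
    problems: list[str] = []
    for path in sorted(root.rglob("scaffold.json")):
        problems.extend(validate_scaffold(path))
    return problems
```

A CI job would call `validate_tree` on the submissions directory, print any problems, and fail the run when the list is non-empty, before anything reaches the queued-submission stage.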

Will tackle in the June 2nd iteration (moved from the May 18th iteration), since the Sage dataset releases were pushed back by the QCFractal release.

  • LW – Sounds good.

Update on clean force field releases

QDS Submissions

Sage 2.0.0 is done; Sage 2.1.0 and Sage 2.2.0 will be submitted at the end of the week.

Docker Image

  • We should be able to use the same one generated for QDS (6 GB).

    • Previously we said that if it was small enough, we would combine the Docker image and dataset view into one Zenodo entry; otherwise we might consider separating them.

  • From last meeting: JC: Even though I think this is a little large, I think we should put them together and include a Python script in the image to easily process the data.

    • JW – 1000% agree. Zenodo storage is cheap, so we might just plan on always including the image in artifacts, even if it is somewhat redundant.

Action item: JC needs to write this code/notebook to package in the Docker image. Permission to make a ticket?

  • LW: I agree that 6 GB is not small, but that’s fine. Yes, go ahead and make a ticket.

  • JC – I’ll try to do this this week, while JW is still around.

  • JW – I’d keep the notebook really general so the same format can be used for other entries with no/minimal modifications.
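One way to keep the packaged script general is to have it inspect whatever data file it is given rather than hard-coding one dataset. The sketch below assumes the dataset view shipped in the image is a SQLite file (our understanding of QCPortal's view format, not confirmed in these notes); `summarize_sqlite` is a hypothetical helper name and only lists tables and row counts.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def summarize_sqlite(path: Path) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only through a URI so the packaged artifact is never modified.
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            quoted = table.replace('"', '""')  # escape quotes inside the identifier
            counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
        return counts
```

Because nothing in it depends on a particular dataset, the same cell or script could be reused across Zenodo entries with no modification, which is the property JW asked for.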

Update Dataset Tracking

Project Board; Slides

  • Sage 2.0.0 Opt / Torsion

    • Done! Thanks to the new QCFractal release.

  • Stalled PR 434: TM PDB CCD

    • Need the new Docker image to resume assessing errors.

  • Running PR 440: Chodera tmQM

    • Still moving

  • Stalled PR 449

    • Need the new Docker image for the updated basis sets.

    • Need memory assessment for alternative model chemistries

  • PR 453

    • LW – These are taking forever (on track to take literally years); I’m thinking of deprioritizing these and resubmitting a subset that I’m focused on.

      • Hessians are taking 8 hrs per job

    • JC: I’ll assess the MW and adjust the deployments accordingly

      • JW: I’ll make a ticket
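A back-of-envelope check of how 8 hr Hessian jobs add up to years of wall time is sketched below. Only the 8 hr/job figure comes from the notes; the job and worker counts in any call are placeholders, and the estimate assumes identical jobs packed perfectly onto a fixed number of concurrent workers.

```python
def estimate_wall_time_days(n_jobs: int, hours_per_job: float, concurrent_workers: int) -> float:
    """Rough wall-clock estimate in days for identical jobs on a fixed worker pool."""
    if concurrent_workers < 1:
        raise ValueError("need at least one worker")
    # Jobs run in waves of `concurrent_workers`; a partial final wave still costs a full job.
    waves = -(-n_jobs // concurrent_workers)  # ceiling division
    return waves * hours_per_job / 24


# Placeholder numbers: 3000 Hessians at 8 h each with 2 concurrent workers.
days = estimate_wall_time_days(3000, 8, 2)
```

With those placeholder inputs the estimate is 500 days, which shows why either more workers (the MW/deployment adjustment) or a smaller subset is needed.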

psi4_utilities

  • When contributing basis sets, Lori requests that a basis set with diffuse functions be separated into the base basis set and an auxiliary set of basis functions. I made a module to do this and used it successfully to add support for additional atoms to aug-cc-pvtz (slated for the next release; the last Psi4 release was Feb 2024).

    • LW: Is the psi4 PR done?

      • JC: Yes

  • Another module is meant to estimate the amount of memory needed for a Psi4 calculation. I have a draft done and am running it on my TM Benchmarking dataset. There are two ways to estimate the number of basis, auxiliary, and virtual functions: an educated estimate, or explicitly from the wavefunction. The latter is not recommended because of how long it takes, but it is required for the 3c methods, which don’t use traditional basis sets like other methods. My script is still running; Lily and I can discuss more at our check-in.
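The "educated estimate" route boils down to arithmetic on basis-function counts. The sketch below is not JC's module: it uses the standard spherical cc-pVDZ counts (H: 2s1p = 5, C/N/O: 3s2p1d = 14) and two common storage terms, dense nbf x nbf matrices and a density-fitting three-index tensor. Which terms dominate depends on the method, and the auxiliary-function count in the example is a placeholder.

```python
# Spherical basis-function counts per element for cc-pVDZ (H: 2s1p, C/N/O: 3s2p1d).
CC_PVDZ_NBF = {"H": 5, "C": 14, "N": 14, "O": 14}

BYTES_PER_DOUBLE = 8


def count_basis_functions(symbols, per_element):
    """Sum per-element basis-function counts over the atoms of a molecule."""
    return sum(per_element[s] for s in symbols)


def dense_matrices_bytes(nbf, n_matrices=1):
    """Memory for n_matrices dense nbf x nbf double-precision matrices."""
    return n_matrices * nbf * nbf * BYTES_PER_DOUBLE


def df_three_index_bytes(nbf, naux):
    """Memory for a density-fitting (Q|mn) tensor, storing only m <= n pairs."""
    return naux * nbf * (nbf + 1) // 2 * BYTES_PER_DOUBLE


nbf = count_basis_functions(["O", "H", "H"], CC_PVDZ_NBF)  # water: 14 + 5 + 5
dense = dense_matrices_bytes(nbf, n_matrices=10)
df_tensor = df_three_index_bytes(nbf, naux=84)  # naux=84 is a placeholder value
```

For the explicit route, building a Psi4 wavefunction object (e.g. via `psi4.core.Wavefunction.build`) and reading its basis-function count replaces the lookup table, at the time cost noted above.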

MolSSI Info / Align Priorities on MolSSI Asks

2025-05-13 QCArchive User Meeting

New from last QCAUM meeting:

  • New release unblocks a few things for me this week.

  • Statistics of compute_tag/status groupings and the number of associated records.

    • Previously it took 3 days to get these statistics; now it takes 1.3 s.
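The compute_tag/status statistics are, in shape, a group-by count over (compute_tag, status) pairs. A client-side sketch over hypothetical record dictionaries is below; it only illustrates the structure of the result and is not how QCFractal computes it (real QCPortal record objects are not plain dicts).

```python
from collections import Counter


def status_counts(records):
    """Count records per (compute_tag, status) pair."""
    return Counter((rec["compute_tag"], rec["status"]) for rec in records)


# Hypothetical records for illustration only.
records = [
    {"compute_tag": "openff", "status": "complete"},
    {"compute_tag": "openff", "status": "error"},
    {"compute_tag": "openff", "status": "complete"},
    {"compute_tag": "openff-big", "status": "waiting"},
]
counts = status_counts(records)
```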

Requests:

  • I ran a notebook to benchmark the disparity between iterating over entries and records, to make a case to Ben that the latter is prohibitively slow. (Just sent results today.)
    It takes an order of magnitude longer to iterate through records than through entries, even though I fetched both into the cache ahead of time. With this in mind, it takes ~90 min to fetch the entries for my large dataset, which would correspond to ~15 hours for records.

  • JW: On one hand, we shouldn’t request a change in output, since we want to consistently encourage BP to keep a stable API; on the other hand, this is a young function and we’re the only people who use it.

    • JC – It would also suffice if BP could improve the speed at which records are iterated over. So maybe we ask for that first and keep this request as a fallback.

    • JW – Sounds good.

  • JW: We had an issue with the new QCFractal release breaking our CI

    • LW: What if we change our CI to pull from the main branch of QCPortal to foresee these issues before a release?

    • MT: That’s an amazing idea, I’m making a Zenhub ticket for it now.
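The entries-versus-records comparison and the 15-hour extrapolation can be reproduced with a small timing harness like the one below. The helper names are hypothetical; in practice the iterables passed in would be the dataset's entry and record iterators (for example QCPortal's `iterate_entries()` / `iterate_records()`, assuming those are the calls the notebook used).

```python
import time
from typing import Iterable


def time_iteration(items: Iterable) -> tuple[int, float]:
    """Consume an iterable and return (item_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in items)
    return count, time.perf_counter() - start


def extrapolate_hours(measured_minutes: float, slowdown_factor: float) -> float:
    """Scale a measured duration by a slowdown factor and convert minutes to hours."""
    return measured_minutes * slowdown_factor / 60


# ~90 min for entries, records roughly 10x slower (figures from the notes above).
records_hours = extrapolate_hours(90, 10)
```

`extrapolate_hours(90, 10)` gives 15.0, matching the ~15 hours quoted for iterating over records.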

Can we have the spin density calculated and output without long-term saving of the wavefunction?

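import psi4
psi4.set_options({"reference": "uhf", "cubeprop_tasks": ["density"]})  # UHF reference; the "density" task includes the spin density
energy, wfn = psi4.energy("hf/6-31G(d)", return_wfn=True)  # assumes an open-shell molecule was defined via psi4.geometry()
psi4.cubeprop(wfn)  # writes cube files from the in-memory wfn; no saved wavefunction file is needed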

  • SDA_Cube_0_0.cube is generated

Old Issue of the Week

Decided to always leave a comment and a tag “reviewed-2025”. Skip this segment for this week.

 

 

Action items

Decisions