...

👥 Participants

...

🗣 Discussion topics

...

Time | Item | Presenter

...

Item | Notes

Benchmarking on GHA

  • BW will post slides here

  • Output – Is an sqlite database needed?

    • LM – I trawl through the sqlite database, looking at MM-minimized conformers and checking specific things.

    • BW – Could we export the structures somehow?

    • LM – I also use absolute energies

  • LW – Can we get around size restrictions by uploading workflow artifacts? Those have an expiration date, though, and a 10 GB size limit. E.g. https://github.com/lilyminium/qca-datasets-report/actions/runs/10097937681

    • JW – Longer term storage?

  • MT – Is each run basically using a Python script to generate them?

    • BW – Yes

    • MT – Using a YAML file seems like a nicely extensible way to specify tasks. It will expand well to new capabilities. Though I think that handing off the sqlite database for further analysis is a good idea.

    • BW – Yeah, and one of the big goals is to systematize how we do data generation.

    • LW – One thing that’s really nice about this idea is to have a strong view on data provenance.

    • BW –

    • JW – I think it’s still a big win to do the physics in a systematic way. (i.e. OE vs. RDKit, AM1-BCC charge provider, OMM versions)

      • MT – If different inputs could create different results, then that’s a big problem.

      • BW – This leads to a question of “how do we handle versions of dependencies changing?”

      • JW – We should have a policy ahead of time about what we do if dependency updates change benchmarking results. This WILL happen and it’s important we have a plan.

      • BW – I’d started writing up a section about how we should never delete or modify previous submissions

      • JW – Agree

      • LW – Good argument for Zenodo

    • BW – So maybe I should figure out how to compress the sqlite database.

    • MT – I liked the idea of using GH artifacts

    • BW – The time limit was too low

    • LW – Could use Zenodo as long-term storage. And could look for duplicated info (like coordinates) between QC records and molecule records.

  • BW will look into compressing sqlite output and decompressing it for future analysis

  • BW will look into automatically uploading results to Zenodo (using the test Zenodo server initially)
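
As a rough sketch of the compression action item (helper and file names here are hypothetical placeholders, not part of the benchmarking workflow), the sqlite output could be vacuumed and gzipped before archival, then restored later for analysis:

```python
import gzip
import shutil
import sqlite3


def compress_sqlite(db_path: str, gz_path: str) -> None:
    """Compact a sqlite database, then gzip it for storage/upload.

    Hypothetical helper; names are illustrative only.
    """
    # VACUUM rewrites the database file, reclaiming free pages
    # so the subsequent compression has less to chew on.
    conn = sqlite3.connect(db_path)
    conn.execute("VACUUM")
    conn.close()
    with open(db_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def decompress_sqlite(gz_path: str, db_path: str) -> None:
    """Inverse step: restore the database for later analysis."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Since sqlite files contain a lot of repeated page structure, even plain gzip typically shrinks them substantially, which should help with artifact and Zenodo size limits.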

Trello review

https://trello.com/b/dzvFZnv4/infrastructure

(skipped)

✅ Action items

  •  

⤴ Decisions

...