Benchmarking on GHA
BW will post slides here

Output
- Is the SQLite database needed?
- LM – I trawl through the SQLite database by looking at MM-minimized conformers and checking specific things.
- BW – Could we export structures somehow?
- LM – I also use the absolute energies.
- LW – Can we get around size restrictions by uploading workflow artifacts? Those have an expiration date though, and a 10 GB size limit. E.g. https://github.com/lilyminium/qca-datasets-report/actions/runs/10097937681
- MT – Is each run basically using a Python script to generate them?
- BW – Yes
- MT – Using a YAML seems like a nicely extensible way to specify tasks. It will expand well to new capabilities. Though I think that handing off the SQLite database for further analysis is a good idea.
- BW – Yeah, and one of the big goals is to systematize how we do data generation.
- LW – One thing that's really nice about this idea is having a strong view on data provenance.
- JW – I think it's still a big win to do the physics in a systematic way (i.e. OE vs. RDKit, AM1-BCC charge provider, OpenMM versions).
- MT – If different inputs could create different results, then that's a big problem.
- BW – This leads to the question of "how do we handle versions of dependencies changing?"
- JW – We should have a policy ahead of time about what we do if dependency updates change benchmarking results. This WILL happen and it's important we have a plan.
- BW – I'd started writing up a section about how we should never delete or modify previous submissions.
- JW – Agree
- LW – Good argument for Zenodo
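The workflow-artifact route LW mentions could look like the step below in a GHA workflow file. This is an illustrative sketch, not the actual qca-datasets-report configuration; the artifact name and output path are placeholders:

```yaml
# Hypothetical GitHub Actions step: upload benchmark output as a
# workflow artifact. Artifacts expire after `retention-days`
# (90 days is the maximum on github.com) and uploads are size-limited,
# which is the restriction discussed above.
- name: Upload benchmark database
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-results   # placeholder artifact name
    path: results.sqlite      # placeholder output path
    retention-days: 90
```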
- BW – So maybe I should look into how to compress the SQLite database.
- MT – I liked the idea of using GH artifacts.
- BW – The time limit was too low.
- LW – Could use Zenodo as long-term storage. And could look for duplicated info (like coordinates) between QC records and molecule records.
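The compress-then-restore idea BW raises can be sketched with the Python standard library alone; the file names here are placeholders, and the round trip (compress for upload, decompress for later analysis) is the point:

```python
import gzip
import shutil
import sqlite3
import tempfile
from pathlib import Path


def compress_db(db_path: Path, gz_path: Path) -> None:
    """Gzip-compress a SQLite database file for storage or upload."""
    with open(db_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def decompress_db(gz_path: Path, db_path: Path) -> None:
    """Restore a gzip-compressed SQLite database for analysis."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

SQLite files compress well with plain gzip because coordinate and energy columns repeat a lot of byte patterns; a stronger codec (zstd, xz) would shrink them further at the cost of an extra dependency.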
Action items
- BW will look into compressing the SQLite output and uncompressing it for future analysis.
- BW will look into automatically uploading results to Zenodo (using the Zenodo sandbox server initially).
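A minimal sketch of the Zenodo upload, against the sandbox server as discussed, using Zenodo's REST deposition API (create a deposition, then PUT the file into its bucket). The token and file name are placeholders and error handling is omitted; this is not a tested uploader:

```python
import json
import urllib.request
from pathlib import Path

# Sandbox endpoint for testing; swap for https://zenodo.org/api in production.
ZENODO_SANDBOX_API = "https://sandbox.zenodo.org/api"


def create_deposition(token: str) -> dict:
    """Create an empty deposition; the response includes the bucket URL
    for file uploads under links['bucket']."""
    req = urllib.request.Request(
        f"{ZENODO_SANDBOX_API}/deposit/depositions?access_token={token}",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload_file(bucket_url: str, token: str, path: Path) -> dict:
    """PUT a file (e.g. the compressed SQLite database) into the
    deposition's bucket."""
    req = urllib.request.Request(
        f"{bucket_url}/{path.name}?access_token={token}",
        data=path.read_bytes(),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A scheduled GHA job could call these after each benchmark run, which would also satisfy the never-delete-previous-submissions policy, since Zenodo depositions are versioned and immutable once published.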