2024-08-14 Meeting notes

 Date

Aug 14, 2024

 Participants

  • @Matt Thompson

  • @Brent Westbrook

  • @Alexandra McIsaac

  • @Lily Wang

  • @Jeffrey Wagner

 Discussion topics

Item

Notes

Benchmarking on GHA

  • BW will post slides here

  • Output: Is the sqlite database needed?

    • LM – I trawl through the sqlite database by looking at MM-minimized conformers and checking specific things.

    • BW – Could export structures somehow?

    • LM – I also use absolute energies

  • LW – Can we get around size restrictions by uploading workflow artifacts? Those have an expiration date though. 10GB size limit. E.g. https://github.com/lilyminium/qca-datasets-report/actions/runs/10097937681

    • JW – Longer term storage?

  • MT – Is each run basically using a python script to generate them?

    • BW – Yes

    • MT – Using a yaml seems like a nicely extensible way to specify tasks. Will expand well to new capabilities. Though I think that handing off the sqlite database for further analysis is a good idea.

    • BW – Yeah, and one of the big goals is to systematize how we do data generation.

    • LW – One thing that’s really nice about this idea is that it takes a strong view on data provenance.

    • BW –

    • JW – I think it’s still a big win to do the physics in a systematic way (i.e. OE vs. RDKit, AM1BCC charge provider, OMM versions).

      • MT – If different inputs could create different results, then that’s a big problem.

      • BW – This leads to a question of “how do we handle versions of dependencies changing?”

      • JW – We should have a policy ahead of time about what we do if dependency updates change benchmarking results. This WILL happen and it’s important we have a plan.

      • BW – I’d started writing up a section about how we should never delete or modify previous submissions.

      • JW – Agree

      • LW – Good argument for Zenodo

    • BW – So maybe I should look to figure out how to compress the sqlite database.

    • MT – I liked the idea of using GH artifacts

    • BW – The time limit was too low

    • LW – Could use Zenodo as long-term storage. And could look for duplicated info (like coords) between QC records and molecule records.

  • BW will look into compressing the sqlite output and uncompressing it for future analysis

  • BW will look into automatically uploading results to Zenodo (using test zenodo server initially)

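A minimal sketch of the compress/uncompress round trip for the sqlite output, assuming gzip from the Python standard library is an acceptable format (the notes do not settle on one). The function names and the `.gz` file naming are hypothetical, and the checksum helper is only there to confirm that a restored database matches what was originally written.

```python
import gzip
import hashlib
import shutil
import sqlite3
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compress_db(db_path: Path) -> Path:
    """Write a gzip copy next to the database (hypothetical naming: <name>.gz)."""
    gz_path = db_path.with_name(db_path.name + ".gz")
    with open(db_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def decompress_db(gz_path: Path, out_path: Path) -> Path:
    """Restore a gzip-compressed database to out_path for later analysis."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

After restoring, the file can be opened with `sqlite3.connect(out_path)` as usual. Comparing `sha256sum` of the original and restored files is one way to check that the archive stored long term is byte-identical to what the benchmark run produced.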

Trello review

https://trello.com/b/dzvFZnv4/infrastructure

(skipped)

 Action items

 Decisions