Participants
Discussion topics
General updates

- JW / BW – Slides about benchmarking plans; BW will upload the slides here.
- BW – On HPC3 I usually request 48 CPUs and about 50 GB RAM; a YAMMBS run with the standard datasets takes about 4 hours.
- JW – It may be simpler to just run the jobs on GHA. 64-CPU machines look to be about $0.25/min, so a 4-hour run would cost about $60. That makes a lot of sense, since human time is expensive. So let's plan to start on GHA.
- BW – Where should we store files? In the Git tree? As release assets?
- JW – What about versions/provenance? The tool could take a conda environment YAML.
- BW – That could work, though if we're interested in standardization it may make more sense to pin the environment for all benchmarks.
- JW – We should definitely dump an environment summary with the results. Thinking ahead to how folks might eventually test against different versions of things, use custom functional forms, or compare AmberTools vs. OpenEye vs. NAGL...
- BW – Yeah, so allowing an input environment YAML would be good here.
- JW – Agree. And instead of "allowing", how about "requiring"?
- JW will double-check with the lead team that this spend is OK.
- BW – I think the tool should be able to take pre-fetched-and-filtered datasets. Currently charge filtering takes 34 hours.
- (Decision) – Initially the repo will just host some pre-made cached datasets. This will encourage standardization, and folks can filter data points out of the results if some weren't appropriate, or add more datasets if that becomes of interest.
- JW – It could be good to have all the result files committed on a branch, NOT on main, so that folks can iterate without gumming up the Git history. The final merge could then squash everything into main.
- BW – I'll start work on a proof of concept; I think I've got what I need for this. I'll put it on the internal benchmarking agenda if it's ready in time.

Trello

- https://trello.com/b/dzvFZnv4/infrastructure
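The ~$60 GHA estimate above can be sanity-checked with a quick back-of-envelope calculation. Note the $0.25/min rate for a 64-CPU runner is the figure from the discussion, not a quoted GitHub price:

```python
# Back-of-envelope cost check for one YAMMBS benchmark run on GitHub Actions.
# Assumptions from the discussion: ~$0.25/min for a 64-CPU runner, ~4-hour run.
price_per_minute = 0.25  # USD, assumed runner rate (not an official price)
run_hours = 4

cost_per_run = price_per_minute * run_hours * 60  # minutes in the run
print(f"Estimated cost per run: ${cost_per_run:.2f}")  # -> $60.00
```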
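As a minimal sketch of "dumping an environment summary with results", one option is to record every installed package and version next to each run's output. The `dump_env_summary` helper name, output filename, and JSON format here are assumptions for illustration, not an agreed spec:

```python
# Hedged sketch: write a pinned package list alongside benchmark results so
# provenance travels with each run. Uses only the standard library.
import json
from importlib.metadata import distributions


def dump_env_summary(path="env_summary.json"):
    """Write a sorted 'name==version' list of installed packages to `path`."""
    summary = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
    with open(path, "w") as f:
        json.dump(summary, f, indent=2)
    return summary
```

A fuller version might instead shell out to `conda env export` when running in a conda environment, since that was the format discussed.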
Action items
Decisions