2024-08-12 Westbrook/Wagner Check-in meeting notes

Participants

  • @Brent Westbrook

  • @Jeffrey Wagner

Discussion topics

Item

Notes

General updates

  • JW

    • I’ll be offline next Monday

    • QCSubmit RMSD filter bugfix may be quite significant; it could be worth re-filtering datasets when the new release comes out. Want to see how we make software releases?

  • BW

    • Slides about benchmarking plans

    • BW will upload slides here:

    • BW – On HPC3 I usually request 48 CPUs and about 50 GB of RAM; a YAMMBS run with the standard datasets takes roughly 4 hours.

    • JW – It may be simpler to just run the jobs on GHA. 64-CPU machines look to be about $0.25/min, so a 4-hour run would cost about $60. That makes a lot of sense since human time is expensive, so let's plan to start on GHA.

    • BW – Where to store files? Git tree? Release assets?

      • BW – It looks like the output CSVs from a standard YAMMBS run compress down to about 6 MB.

      • JW – Let’s just put those in the git tree, then.

    • JW – Versions/provenance? The tool could take a conda environment YAML as input.

      • BW – That could work, though if we want this to be standardized, it may make more sense to pin the env for all benchmarks.

      • JW – We should definitely DUMP an env summary with the results. I'm thinking about how folks can eventually run tests against different versions of things, or with custom functional forms, or ambertools vs. openeye vs. nagl…

      • BW – Yeah, so maybe allowing an input env yaml would be good here.

      • JW – Agree. And instead of “allowing”, how about “requiring”?

      • JW will double-check with the lead team that this spend is OK

    • BW – I think the tool should be able to take pre-fetched-and-filtered datasets. Currently, charge filtering takes 34 hours.

      • (decision) – Initially the repo will just host some pre-made cached datasets. This will encourage standardization, and folks can further filter things out of the results if some data points weren’t appropriate, or add more datasets if that becomes of interest.

    • JW – It could be good to commit all the result files on a branch, NOT on main, so that folks can iterate without gumming up the git history. The final merge could then squash everything into main.

      • BW – Good idea. I’ll see how I can stage it

    • BW – I’ll start work on a proof of concept. I think I’ve got what I need for this. I’ll put this on the internal benchmarking agenda if it’s ready in time.

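The GHA cost estimate in JW's note is simple per-minute arithmetic. A minimal sketch, using only the figures quoted in the discussion (about $0.25/min for a 64-CPU runner, about 4 hours per YAMMBS run); the function name `gha_cost_usd` is hypothetical, and actual GitHub billing rules (rounding, rate changes) may differ from this flat model.

```python
def gha_cost_usd(rate_per_min: float, hours: float) -> float:
    """Flat per-minute cost of one runner job.

    Hypothetical helper: assumes a constant per-minute rate with no
    rounding or minimum charge, which may not match real GHA billing.
    """
    minutes = hours * 60
    return rate_per_min * minutes


# Figures quoted in the meeting: ~$0.25/min for 64 CPUs, ~4 h per run.
cost = gha_cost_usd(0.25, 4)
print(f"Estimated cost per run: ${cost:.2f}")  # prints "Estimated cost per run: $60.00"
```

Keeping the rate as a parameter makes it easy to redo the estimate if the runner size or the per-run time changes.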
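For the "put compressed CSVs in the git tree" plan, here is one way to pack a run's output CSVs into a single archive before committing. This sketch assumes the results are plain `.csv` files under one directory and uses the standard-library `zipfile` module with deflate compression; `compress_results` and the example paths are hypothetical, not part of YAMMBS, and the actual compression ratio will depend on the data.

```python
import zipfile
from pathlib import Path


def compress_results(results_dir: Path, archive: Path) -> Path:
    """Pack every CSV under results_dir into one deflate-compressed zip.

    Hypothetical helper: the directory layout is an assumption, not the
    real YAMMBS output structure.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sort so the archive member order is deterministic between runs
        for csv_path in sorted(results_dir.rglob("*.csv")):
            # Store paths relative to results_dir, with forward slashes,
            # so the archive does not embed machine-specific absolute paths
            zf.write(csv_path, arcname=csv_path.relative_to(results_dir).as_posix())
    return archive


# Example (hypothetical paths):
# compress_results(Path("yammbs_output"), Path("results.zip"))
```

Sorting the inputs keeps the member order stable, which makes repeated commits of the same results easier to compare.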
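JW's point about dumping an env summary with the results could start as small as recording the interpreter and every installed Python distribution's version next to the output files. A minimal sketch using the standard-library `importlib.metadata`; `env_summary`, `dump_env_summary` and the output filename are hypothetical. This only sees packages visible to the Python interpreter, so non-Python conda packages would need something like `conda env export` or `conda list --explicit` instead.

```python
import json
import platform
import sys
from importlib import metadata


def env_summary() -> dict:
    """Interpreter info plus name -> version for installed distributions.

    Hypothetical helper: covers Python distributions only, not
    non-Python packages installed by conda.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing/broken metadata
            packages[name] = dist.version
    return {
        "python": sys.version,
        "platform": platform.platform(),
        # Case-insensitive sort so the dump diffs cleanly between runs
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def dump_env_summary(path: str) -> None:
    """Write the summary as JSON, e.g. alongside a run's result CSVs."""
    with open(path, "w") as fh:
        json.dump(env_summary(), fh, indent=2)


# Example (hypothetical path):
# dump_env_summary("env_summary.json")
```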

Trello

https://trello.com/b/dzvFZnv4/infrastructure

Action items

Decisions