2024-12-16 Westbrook/Wagner/Thompson Check-in meeting notes

Participants

  • @Brent Westbrook (Unlicensed)

  • @Jeffrey Wagner

  • @Matt Thompson

Discussion topics

Item

Notes

General updates

JW

  • It’s been great working with you! Best of luck at Astral

    • MT – Do you know what you’ll work on?

    • BW – Thanks, I’ll be on the Ruff team. Unsure exactly what I’ll be working on.

  • Account offboarding on Friday - When’s a good time?

    • (scheduled for 10 AM Pacific / 1 PM Eastern)

  • I see your major roles on the infrastructure side as being:

    • YDS (Matt will take this over)

    • YAMMBS testing

    • QC compute management (I’ll take this over, will finalize Fri)

    • Anything else come to mind?

  • I have no new infrastructure work for you. Please focus on getting Jen Clark up to speed on your understanding of the organometallics project.

YDS handoff

  • What do you assess as the best way to do a knowledge transfer to MT? Thoughts include:

    • Do nothing

    • Have MT try doing a YDS run and ask questions

    • Anything else?

    • (after discussion)

    • JW – I don’t think we’ll have big feature requests or anything coming, so “do nothing” isn’t a terrible plan.

      • MT – The two major science-facing uses are 1. I refit a FF and want to see how it compares in these benchmarks, which I think is in good shape, and 2. I want to use a custom dataset for benchmarking, and I’m not sure how a new dataset would be added. So some documentation on adding a new dataset would be helpful.

        • BW – I have a README in the datasets folder that explains this. Is that sufficient?

        • MT – Yes, that’s sufficient.

  • BW – From my perspective, MT has seen most of it during the recent torsion work. MT forked it and duplicated a lot of functionality in this process and so has seen the stack top-to-bottom.

  • MT – Agree that I’ve seen it, but I wouldn’t be able to build it from scratch. From what I can tell it reuses a lot of Josh’s earlier work. My major handover questions are:

    • “are there big things we’re planning to add?”

      • MT – my guess is that this is kinda an ongoing discussion with LW with no firm goals yet

      • JW – The only one I can think of is the eventual inclusion of evaluator

      • BW – There’s also the GH page for result visualization https://openforcefield.github.io/yammbs-dataset-submission/ - I think this should keep working without modification - see _layouts/, _config.yaml, and index.html. There’s an action that rebuilds the page on every push to master. See index.html and what it includes.

        • (discussion)

        • (decision) – BW will open a PR to remove these components, so there’s a history in the git tree and we can revert the changes in the future. MT will be tagged to review.

    • “what upstream updates may break this and on what timescale?”

      • MT – YAMMBS is the major upstream, but there are lots of other packages that I don’t have much insight into. Maybe big changes in the POSE runners, or general GH configuration changes.

      • BW – Agree with this assessment. It uses a GH Actions script, which is versioned, so hopefully we’ll be able to manually select old versions if something breaks. There’s also a dependence on the Zenodo API, but I found their docs to be high quality and I’m not aware of any plans for it to change.

    • “how are we expecting usage to scale and in what dimensions?”

  • MT – BW, I vaguely remember you had another website project going. Will that be continued?

    • BW – There was a dashboard that LM and I used to look at correlations. We agreed it wasn’t very helpful, and I don’t think it’s worth keeping around. The pages were only ever served locally, so there’s no live web server to maintain.

      • JW – It sounds like there’s not much to maintain there, and I’ll explicitly say let’s not maintain it.

      • (BW will try to find a link/reference to this for bookkeeping purposes, but won’t spend too long if it’s hard to find)

      • https://github.com/ntBre/lipoma

      • JW – I think this is LW’s responsibility; I’ll bring it up with her.

    • BW – There’s also https://github.com/ntBre/curato - I was using this to generate datasets from ChEMBL and LIPID MAPS - but I’m the only user.

      • JW – I think this is LW’s responsibility - I’ll make sure she’s aware of that.

  • BW – Just so I don’t forget - I wanted to parallelize the metric retrieval methods in YAMMBS - I think that’s one of the last single-threaded parts that’s limiting YDS runtime/moving to a larger image.

    • MT – I know what you’re thinking of, but I haven’t gotten to this.

    • BW – I started a PR where I naively applied multiprocessing, but it got tangled up with the switch to geomeTRIC.

    • MT – Roughly agree, there’s a good enough paper trail for this so I’m not too worried about knowledge loss. But if there’s a place where you have this (like a branch) please post it somewhere for discoverability later.

Trello

https://trello.com/b/dzvFZnv4/infrastructure?filter=member:brentwestbrook3

Action items

Decisions