YDS handoff

What do you assess as the best way to do a knowledge transfer to MT? Thoughts include:
- Do nothing
- Have MT try doing a YDS run and ask questions
- Anything else?

(after discussion)
JW – I don't think we'll have big feature requests or anything coming, so "do nothing" isn't a terrible plan.
MT – The two major science-facing uses are:
1. I refit a FF and want to see how it compares in these benchmarks, which I think is in good shape.
2. I want to use a custom dataset for benchmarking, and I'm not sure how a new dataset would be added. So some documentation about how to add a new dataset would be helpful.
BW – I have a README in the datasets folder that explains this; is that sufficient?
MT – Yes, that's sufficient.
BW – From my perspective, MT has seen most of it during the recent torsion work. MT forked it and duplicated a lot of functionality in that process, so he has seen the stack top to bottom.
MT – Agreed that I've seen it, but I wouldn't be able to make it from scratch. From what I can tell it reuses a lot of Josh's earlier work. My major handover questions are:
"Are there big things we're planning to add?"
MT – My guess is that this is an ongoing discussion with LW, with no firm goals yet.
JW – The only one I can think of is the eventual inclusion of evaluator.
BW – There's also the GH page for result visualization: https://openforcefield.github.io/yammbs-dataset-submission/. I think this should keep working without modification; see _layouts/, _config.yaml, and index.html. There's an action that runs on every push to master to rebuild the page. See index.html and the files it includes.
"What upstream updates may break this, and on what timescale?"
MT – YAMMBS is the major upstream, but there are lots of other packages that I don't have much insight into. Maybe big changes in the POSE runners, or general GH configuration changes.
BW – Agree with this assessment. It uses a GH Actions script, which is versioned, so hopefully we'll be able to manually select old versions if there is a break. There's also a dependence on the Zenodo API, but I found their docs to be high quality and I'm not aware of any plans for it to change.
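For handoff purposes, a minimal sketch of the Zenodo API dependency mentioned above. This assumes Zenodo's documented public REST endpoint for record metadata (GET /api/records/{id}); the record ID used here is a placeholder, not a real YDS record, and this is not YDS's actual retrieval code.

```python
# Hedged sketch of talking to the Zenodo REST API with only the
# standard library. The record ID is a placeholder for illustration.
import json
from urllib.request import urlopen


def zenodo_record_url(record_id: int) -> str:
    # Zenodo's documented endpoint for a single record's metadata
    return f"https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id: int) -> dict:
    # Performs a network call and returns the record's JSON metadata
    with urlopen(zenodo_record_url(record_id)) as resp:
        return json.load(resp)
```

If the API does change, this URL construction is the single point that would need updating.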
"How are we expecting usage to scale, and in what dimensions?"
MT – BW, I vaguely remember you had another website project going; will that be continued?
BW – There was a dashboard where LM and I were looking at correlations. We agreed they weren't very helpful, and I don't think they're worth keeping around. The pages were only served locally, so there's no live webserver to maintain.
JW – It sounds like there's not much to maintain there, and I'll explicitly say let's not maintain it. (BW will try to find a link/reference to this for bookkeeping purposes, but won't spend too long if it's hard to find.) https://github.com/ntBre/lipoma
JW – I think this is LW's responsibility; I will bring it up to her.
BW – There's also https://github.com/ntBre/curato. I was using this to generate datasets from ChEMBL and LIPID MAPS, but I'm the only user.
BW – Just so I don't forget: I wanted to parallelize the metric retrieval methods in YAMMBS. I think that's one of the last single-threaded parts limiting YDS runtime and the move to a larger image.
MT – I know what you're thinking of, but I haven't gotten to this.
BW – I started a PR where I naively applied multiprocessing, but it got jumbled up with the switch to geomeTRIC.
MT – Roughly agree; there's a good enough paper trail for this, so I'm not too worried about knowledge loss. But if you have this somewhere (like a branch), please post it for discoverability later.
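Since BW's branch isn't linked here, a generic sketch of the pattern described (naively applying multiprocessing to a per-record metric computation) may be useful for whoever picks this up. The function and data names are illustrative placeholders, not YAMMBS's actual API.

```python
# Hedged sketch: fan a per-record metric computation out over a
# process pool. compute_metric stands in for whatever single-record
# metric YAMMBS computes; it must be a module-level (picklable)
# function for multiprocessing to work.
from multiprocessing import Pool


def compute_metric(record):
    # Placeholder for a real per-record metric calculation
    return record * record


def compute_metrics_parallel(records, processes=4):
    # Pool.map preserves input order, so results line up with records
    with Pool(processes=processes) as pool:
        return pool.map(compute_metric, records)


if __name__ == "__main__":
    print(compute_metrics_parallel([1, 2, 3, 4]))
```

The main caveat with this naive approach is serialization overhead: if each record (or the metric's inputs) is large, pickling costs can eat the speedup, which may be part of why the original PR got complicated.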