YDS handoff

What do you assess as the best way to do a knowledge transfer to MT? Thoughts include:
- Do nothing
- Have MT try doing a YDS run and ask questions
- Anything else?

(after discussion)
JW – I don't think we'll have big feature requests or anything coming, so "do nothing" isn't a terrible plan.
MT – The two major science-facing uses are:
1. I refit a FF and want to see how it compares in these benchmarks, which I think is in good shape.
2. I want to use a custom dataset for benchmarking, and I'm not sure how a new dataset would be added. So some documentation about how to add a new dataset would be helpful.
BW – I have a README in the datasets folder that explains this; is that sufficient?
MT – Yes, that's sufficient.
BW – From my perspective, MT has seen most of it during the recent torsion work. MT forked it and duplicated a lot of functionality in that process, so he has seen the stack top to bottom.
MT – Agreed that I've seen it, but I wouldn't be able to make it from scratch. From what I can tell it reuses a lot of Josh's earlier work. My major handover questions are:
"Are there big things we're planning to add?"
MT – My guess is that this is an ongoing discussion with LW, with no firm goals yet.
JW – The only one I can think of is the eventual inclusion of evaluator.
BW – There's also the GH page for result visualization: https://openforcefield.github.io/yammbs-dataset-submission/. I think this should keep working without modification; see _layouts/, _config.yaml, and index.html. There's an action that runs on every push to master to rebuild the page. See index.html and the files it includes.
"What upstream updates may break this, and on what timescale?"
MT – YAMMBS is the major upstream, but there are lots of other packages that I don't have much insight into. Maybe big changes in the POSE runners, or general GH configuration changes.
BW – Agree with this assessment. It uses a GH Actions script, which is versioned, so hopefully we'll be able to manually select old versions if there is a break. There's also a dependence on the Zenodo API, but I found their docs to be high quality and I'm not aware of any plans for it to change.
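For handoff purposes, a minimal sketch of the Zenodo API dependency mentioned above. This assumes Zenodo's documented public REST endpoint for record metadata (GET /api/records/{id}); the record ID used here is a placeholder, not a real YDS record, and this is not YDS's actual retrieval code.

```python
# Hedged sketch of talking to the Zenodo REST API with only the
# standard library. The record ID is a placeholder for illustration.
import json
from urllib.request import urlopen


def zenodo_record_url(record_id: int) -> str:
    # Zenodo's documented endpoint for a single record's metadata
    return f"https://zenodo.org/api/records/{record_id}"


def fetch_record(record_id: int) -> dict:
    # Performs a network call and returns the record's JSON metadata
    with urlopen(zenodo_record_url(record_id)) as resp:
        return json.load(resp)
```

If the API does change, this URL construction is the single point that would need updating.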
"How are we expecting usage to scale, and in what dimensions?"
MT – BW, I vaguely remember you had another website project going; will that be continued?
BW – There was a dashboard where LM and I were looking at correlations. We agreed they weren't very helpful, and I don't think they're worth keeping around. The pages were only served locally, so there's no live webserver to maintain.
JW – It sounds like there's not much to maintain there, and I'll explicitly say let's not maintain it. (BW will try to find a link/reference to this for bookkeeping purposes, but won't spend too long if it's hard to find.) https://github.com/ntBre/lipoma
JW – I think this is LW's responsibility; I will bring it up to her.
BW – There's also https://github.com/ntBre/curato. I was using this to generate datasets from ChEMBL and LIPID MAPS, but I'm the only user.
BW – Just so I don't forget: I wanted to parallelize the metric retrieval methods in YAMMBS. I think that's one of the last single-threaded parts limiting YDS runtime and the move to a larger image.
MT – I know what you're thinking of, but I haven't gotten to this.
BW – I started a PR where I naively applied multiprocessing, but it got jumbled up with the switch to geomeTRIC.
MT – Roughly agree; there's a good enough paper trail for this, so I'm not too worried about knowledge loss. But if you have this somewhere (like a branch), please post it for discoverability later.
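Since BW's branch isn't linked here, a generic sketch of the pattern described (naively applying multiprocessing to a per-record metric computation) may be useful for whoever picks this up. The function and data names are illustrative placeholders, not YAMMBS's actual API.

```python
# Hedged sketch: fan a per-record metric computation out over a
# process pool. compute_metric stands in for whatever single-record
# metric YAMMBS computes; it must be a module-level (picklable)
# function for multiprocessing to work.
from multiprocessing import Pool


def compute_metric(record):
    # Placeholder for a real per-record metric calculation
    return record * record


def compute_metrics_parallel(records, processes=4):
    # Pool.map preserves input order, so results line up with records
    with Pool(processes=processes) as pool:
        return pool.map(compute_metric, records)


if __name__ == "__main__":
    print(compute_metrics_parallel([1, 2, 3, 4]))
```

The main caveat with this naive approach is serialization overhead: if each record (or the metric's inputs) is large, pickling costs can eat the speedup, which may be part of why the original PR got complicated.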