2022-07-19 Protein-ligand benchmarks meeting notes

Participants

Goals

DD : fah-alchemy - current board status
- fah-alchemy : Phase 1 - MVP
- 6 weeks out from 9/1 deadline for biopolymer benchmarking
JC : advantages of using data across transformation runs for DG calculation vs. treating them as separate samples
DD : protein-ligand-benchmark - 0.3.0 update
- we’re currently stalled on #52; recommendations for next steps?
  - IA: metadata requirements + fixing caps?
MH : ProtocolSettings taxonomy update
DS : orchestration update
DD : Request rework of Milestones and Deadlines with DN to reflect current state, forward milestones

Discussion topics

Item	Presenter	Notes
`fah-alchemy` - current board status	David Dotson	fah-alchemy : Phase 1 - MVP DD – I’m reviewing #30, will take a bit longer DD – Working on #36, will work with MH and IP to advance this. Related to #12 DD – I’ll keep working on the S3 storage subclass (#11) DD – Also added several items to the “available” column for how to progress other components on this board based on what I learned at SciPy. Happy to 6 weeks out from 9/1 deadline for biopolymer benchmarking JW – This is a bit flexible if needed - So if there are major stresses or places where this is incurring a lot of tech debt for the sake of implementation velocity, I’d be happy to discuss. JC – I could use the Moonshot implementation as soon as it’s ready. DD – JC, I’d like to leave MSKCC1 alone moving forward - Could I set up a small box for testing this?
advantages of using data across transformation runs for DG calculation vs. treating them as separate samples	John Chodera	DD – Previously in discussions with OpenFE, JC – For example, in noneq switching, we have this hierarchy concept of “projects/runs/clones/gens”. The original concept was that we’d heavily leverage independent calcs and then analyze them later by averaging and propagating error. But with noneq switching, our algorithm would be to run ~100 runs at a time, but this would give an incredibly poor estimate. So we should pull these together into a single estimate using BAR. So we should think about a framework where we can run these independently, but combine them, possibly using something with a weighting scheme based on the uncertainty of each. Big picture, I want to make sure we’re not going to be stuck doing trivial averaging, and instead will have flexibility in how this is done. We can’t rely solely on small numbers of replicates, we need to allow for bulk analysis of multiple things that are happening at the same time and their serial extensions IA – Last we discussed this, my 2c was that we should have both - The idea that we should have completely independent experiments that are independent replicates… JC – What’s an “independent experiment” - Is it duplication of nodes or edges? or something else? DS – This is kinda baked into the storage stuff - It’s why the labels are really important. The label system we’d discussed before has the idea of multiple keys… The analysis class needs to understand how to interpret labels, but if we do it right then those should be descriptive enough for the analysis component to understand the relation between items. DD – Within an edge, we can do multiple runs, and we can do extensions/expansion of sampling. I think the idea was, on the duplicate/replicate axis, we’d expose a way to get a dG for each one as if they’re independent experiments. 
But maybe we should think about how to expose a top-level dG for everything (maybe by pooling it all together and running it through MBAR) JC – We can still permit replicates, we just need to allow it in parallel. The concern is that we need to have a way to have multiple clones/replicates to be launched, suspended, and resumed. For noneq cyc, you need to be able to launch 100 clones and repeat each cycle until you have enough data. So you might have 100*3 generations. And then get a single dG from all of them. DD – I suspect that we can support both of these, but we haven’t spent a lot of time figuring out how protocolresults work. So it will depend on developments in some other places like Perses. If we can agree on common api points for resultprotocol objects… RG – This isn’t so much an API question, it’s a science question - Each DAG will produce a single result. So it’s a scientific question, not a technical one. DD – I think the idea is that you could dig down to the replicates themselves RG – The results object takes all the calcs that have been done for an edge and mushes them into a single dG value. DD – I think we’d want to get dG for the entire results, but also for the individual replicates. RG – That could work in some cases, but it may not always make sense, like if you’re doing mean-of-means then you wouldn’t want to look at the individual replicates DD – Will think about this, agree that it may make sense to vary by protocol. DS – Agree that some things make sense in certain protocols but not in others. RG – JC, if I have many replicates in a repex protocol, how much does it make sense to pool those stats to get an answer? JC – It’s always better to put in more trajectories/data. The more you have, the better the estimates. You can do statistical uncertainty multiples to do this. Same thing for TRAM, which is more accurate with multiple replicates. … RG – Does combining multiple time series already exist in MBAR? 
JC – ComputeStatisticalEfficiencyMultiple. JC – Also FYI there’s a pyMBAR4 release coming DD – Alchemlyb is planning to support this update. LN – This release is expected tomorrow. Just need to get the last few things merged in and cleaning up docs. MH – Will JAX stuff make it in? LN – Yes. This is looking good. LN -- Adorable noises. DD – So, overall, the plan is to expose both approaches, but we’ll make sure to expose a way to combine results across replicas.
`protein-ligand-benchmark` - 0.3.0 update	David Dotson	we’re currently stalled on #52; recommendations for next steps? What are requirements for structures and docked ligands? IA: metadata requirements + fixing caps? IA – Two things holding us up: Capping issues: In discussion with bobym, it seems like some chains shouldn’t be capped. Not sure if there will be resolution on this/whether bobym will make additional changes. A lot of things in #52 need small changes, e.g. renaming hydrogens. Could fix this using the MDAnalysis parser. This currently loses metadata; however, the question is whether or not we care, or if we just care about contents of atom map, SEQRES, etc. If so, I would prefer this programmatic approach. JC – My thinking is that it’d be best to automate this if at all possible. We tried using Spruce but it was tricky. DD – So you’re OK with losing metadata? JC – We need CONECT records to be correct and AA names, and the SSBOND and SEQRES should be right. I don’t think we need box vectors. JW – Agree with JC IA – Sounds good. I’ll just wait on bobym to get back to me on the remaining issues. DD – JC, we could use as much info on manual steps as possible. JC – bobym should have used pretty much automated tools IA – I think I saw an image of this, but I’d like more detail, down to the level of “which Maestro version” JC – I’ll set up a meeting this week with IA, bobym, and myself. IP – On #52, we talked about some of the edge info not being in sync with the current state of the dataset. I can help with that. IA – This is in regard to the `01` folders having more ligands and edges. There’s an incoming PR with new edges from LOMAP, and that will delete the current edges file and should standardize this DD – I’ve made sure that only the ligands that could be docked are present in the ligands.yaml file. But IA is right, a future PR will repopulate the edges given this change. 
Like I mentioned before, the original edges can’t be reused with the current results because some ligands couldn’t be redocked. So the idea is that new users can use the newly-provided `03-edges`, or they can provide their own. IA – Does perses have its own network generator? BRies put in an interface from OpenFE to perses, and we were going to bring our own edges, but if you have your own we can use it. JC – We don’t have one IA – Query to JC, IP, and MH, any issues with us also creating a set of edges using the Perses mapper/scorer? JC – Sounds good
`ProtocolSettings` taxonomy update	Mike Henry	MH – I’m just about to open a PR to GUFE. I’ve got FF parsing ready. This won’t have atom typing ready for units. There are a lot of people in our ecosystem using pydantic, and it’d be great if we could standardize on using a single units package. This would keep us from needing to write validators for everything to figure out what’s coming to our API point, instead we could just use a single unit package. MH – My first pass used OpenMM units, but the stack was in the process of migrating to OpenFF units. So that caused some confusion. MH – I’d like to make a typing package that, for example would provide types beyond units. So this would provide dimensionality checking at the typing level (?) MH – Concretely, this could ensure that all of our quantities serialize to JSON in the same way JW – If the typing package will be tied to the openff-units package, is there a reason not to put the typing package IN openff-units? MH – This could let us iterate on this rapidly without breaking production. JW – Let’s discuss this separately from this meeting.
Orchestration update	David W.H. Swenson	DS – (Will post material shown here) Used QCF setup docs as inspiration Architecture diagram JC – Does staging area need to be a shared file system? DS – Not necessarily. It could be a variety of things. But the worker will be given a job ID and then connect to the staging area to get the full job data. JC – Is this definitely better than sending all of the data required for the calculation as a DASK task? DS – It’s a question of how to get data BACK from the worker. If we JC – Doesn’t DASK send a results object back at the end of the job? DS – No, DASK doesn’t keep the connection open. JC – I guess I wonder if it’s so hard to implement a DASK manager that we need to engineer an alternative to the work already done in DASK. DS – We could do this using “approach A”. But not all compute nodes will be able to dial in/out to the internet. BR – Is it possible to unlink a DASK server from its workers? DS – I don’t think so. There’s a ping every 60 seconds between the server and the workers. JC – But this… DS – There are many HPC systems where workers/head nodes can’t dial out DD – Seconded - I’ve seen this a lot at pharma partners. The F@H effort doesn’t have this constraint but openFE/GUFE does. JC – I don’t doubt that there are challenges to working with industry clusters, but it seems like a huge lift to re-engineer so much of DASK’s functionality. Also, we’ve had more failures with dask-jobqueue than we’ve had successes. LN – Separately, I’ve worked a lot on the job scheduler for QCF. So I have a good understanding of what it’s doing under the hood. We still have the problem of “sysadmins don’t like you running long jobs on the head node”. But our experience running these in practice is that the process on the head node can be extremely lightweight. And we didn’t see the limitation that it needs to use whole nodes… Happy to discuss this further or advise on design. 
DS … DD – In working with BSwope on deploying at Genentech, we found that the process on the head node did occasionally have spikes in memory usage. So things got a bit messy, we found a compromise where we ran the server as a job, but this had trouble with job lifetime and termination. JC – Agree that there are complications here, but I’d really like to stick with an “off-the-shelf” solution. DS – What I see the staging thing as is two functions: move data from here, move data to there. JC – But it creates additional complexity+assumptions.
Request rework of Milestones and Deadlines with DN to reflect current state, forward milestones	David Dotson
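
Sketch for the DG discussion (JC's "weighting scheme based on the uncertainty of each"): a minimal, hypothetical illustration of how independent per-replicate dG estimates could be combined with inverse-variance weights instead of a plain average. This is not the pooled BAR/MBAR estimate that was also discussed, and the function name and the numbers are assumptions made up for illustration.

```python
import math


def combine_inverse_variance(estimates, uncertainties):
    """Combine independent dG estimates, weighting each by 1 / sigma**2.

    Hypothetical helper; assumes the estimates are statistically independent
    and that each uncertainty is a standard error in the same units as dG.
    """
    if not estimates or len(estimates) != len(uncertainties):
        raise ValueError("need one uncertainty per estimate")
    weights = [1.0 / (sigma * sigma) for sigma in uncertainties]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    # Standard error of the weighted mean for independent estimates.
    return mean, math.sqrt(1.0 / total)


# Made-up replicate results (kcal/mol): one tight estimate, one noisy one.
replicate_dg = [-8.0, -10.0]
replicate_err = [0.5, 1.5]

plain_mean = sum(replicate_dg) / len(replicate_dg)
weighted_mean, weighted_err = combine_inverse_variance(replicate_dg, replicate_err)
print(plain_mean, weighted_mean, weighted_err)
```

The weighted result is pulled toward the better-converged replicate, which is the behaviour a plain average cannot give.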
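
Sketch for the `ProtocolSettings` discussion (MH's point that all quantities should serialize to JSON in the same way, with dimensionality checked up front): a rough standard-library illustration only. It uses neither pydantic nor openff-units, and `Quantity`, `require_dimension`, and the JSON layout are hypothetical names and choices, not an existing API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical value-with-unit type; not the openff-units class."""

    value: float
    unit: str
    dimension: str  # e.g. "temperature", "length"

    def to_json(self) -> str:
        # One canonical layout (sorted keys) so every quantity looks the same.
        payload = {"value": self.value, "unit": self.unit, "dimension": self.dimension}
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Quantity":
        data = json.loads(text)
        return cls(data["value"], data["unit"], data["dimension"])


def require_dimension(quantity: Quantity, dimension: str) -> Quantity:
    """Reject a quantity whose dimension does not match what a setting expects."""
    if quantity.dimension != dimension:
        raise TypeError(f"expected {dimension}, got {quantity.dimension}")
    return quantity


temperature = require_dimension(Quantity(298.15, "kelvin", "temperature"), "temperature")
encoded = temperature.to_json()
print(encoded)
```

A shared type like this is what would remove the need for a separate validator at every API point.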
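
Sketch for the orchestration discussion (DS's description of the staging area as two functions, "move data from here, move data to there", with the worker given only a job ID): a minimal sketch assuming a directory-backed staging area. `DirectoryStaging`, `run_worker`, the key naming, and the toy job payload are all hypothetical; per the discussion, the real staging area would not necessarily be a shared file system.

```python
import json
import tempfile
from pathlib import Path


class DirectoryStaging:
    """Hypothetical staging area backed by a directory of JSON files."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, payload):
        # "Move data to there": store a JSON-serializable payload under a key.
        (self.root / f"{key}.json").write_text(json.dumps(payload))

    def get(self, key):
        # "Move data from here": load the payload stored under a key.
        return json.loads((self.root / f"{key}.json").read_text())


def run_worker(job_id, staging):
    """The worker receives only a job ID, pulls the job data, pushes a result."""
    job = staging.get(f"job-{job_id}")
    result = {"job_id": job_id, "total": sum(job["values"])}  # stand-in for real work
    staging.put(f"result-{job_id}", result)


with tempfile.TemporaryDirectory() as tmp:
    staging = DirectoryStaging(tmp)
    staging.put("job-42", {"values": [1, 2, 3]})  # submitter stages the job
    run_worker("42", staging)                      # worker never talks to the submitter
    result = staging.get("result-42")

print(result)
```

Because the worker only needs `put` and `get`, the backing store could be swapped without changing worker code, which is the flexibility DS described; JC's concern about the added complexity and assumptions still applies.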

Action items

Decisions
