Item: fah-alchemy - current board status
Presenter: David Dotson
Notes:
- fah-alchemy: Phase 1 - MVP
- DD – I’m reviewing #30; it will take a bit longer.
- DD – Working on #36; will work with MH and IP to advance this. Related to #12.
- DD – I’ll keep working on the S3 storage subclass (#11).
- DD – Also added several items to the “available” column for how to progress other components on this board, based on what I learned at SciPy. Happy to …
- We are 6 weeks out from the 9/1 deadline for biopolymer benchmarking.
- JW – This is a bit flexible if needed, so if there are major stresses, or places where this is incurring a lot of tech debt for the sake of implementation velocity, I’d be happy to discuss.
- JC – I could use the Moonshot implementation as soon as it’s ready.
- DD – JC, I’d like to leave MSKCC1 alone moving forward. Could I set up a small box for testing this?
Item: advantages of using data across transformation runs for dG calculation vs. treating them as separate samples
Presenter: John Chodera
Notes:
- DD – Previously discussed in conversations with OpenFE.
- JC – For example, in noneq switching we have this hierarchy concept of “projects/runs/clones/gens”. The original concept was that we’d heavily leverage independent calcs and then analyze them later by averaging and propagating error. But with noneq switching, our algorithm would run ~100 runs at a time, and each on its own would give an incredibly poor estimate. So we should pull these together into a single estimate using BAR. We should think about a framework where we can run these independently but combine them, possibly with a weighting scheme based on the uncertainty of each. Big picture, I want to make sure we’re not going to be stuck doing trivial averaging, and instead will have flexibility in how this is done. We can’t rely solely on small numbers of replicates; we need to allow for bulk analysis of multiple things that are happening at the same time, and their serial extensions.
- IA – Last we discussed this, my 2c was that we should have both. The idea that we should have completely independent experiments that are independent replicates…
- JC – What’s an “independent experiment”? Is it duplication of nodes or edges, or something else?
- DS – This is kind of baked into the storage stuff; it’s why the labels are really important. The label system we’d discussed before has the idea of multiple keys… The analysis class needs to understand how to interpret labels, but if we do it right, those should be descriptive enough for the analysis component to understand the relation between items.
- DD – Within an edge, we can do multiple runs, and we can do extensions/expansion of sampling. I think the idea was, on the duplicate/replicate axis, we’d expose a way to get a dG for each one as if they’re independent experiments. But maybe we should also think about how to expose a top-level dG for everything (maybe by pooling it all together and running it through MBAR).
- JC – We can still permit replicates; we just need to allow them in parallel. The concern is that we need a way for multiple clones/replicates to be launched, suspended, and resumed. For noneq cycling, you need to be able to launch 100 clones and repeat each cycle until you have enough data. So you might have 100×3 generations, and then get a single dG from all of them.
- DD – I suspect we can support both of these, but we haven’t spent a lot of time figuring out how ProtocolResults work, so it will depend on developments in other places like Perses. If we can agree on common API points for ProtocolResult objects…
- RG – This isn’t so much an API question as a science question: each DAG will produce a single result. So it’s a scientific question, not a technical one.
- DD – I think the idea is that you could dig down to the replicates themselves.
- RG – The results object takes all the calcs that have been done for an edge and mushes them into a single dG value.
- DD – I think we’d want a dG for the entire result, but also for the individual replicates.
- RG – That could work in some cases, but it may not always make sense; if you’re doing mean-of-means, you wouldn’t want to look at the individual replicates.
- DD – Will think about this; agree that it may make sense to vary by protocol.
- DS – Agree that some things make sense in certain protocols but not in others.
- RG – JC, if I have many replicates in a repex protocol, how much does it make sense to pool those stats to get an answer?
- JC – It’s always better to put in more trajectories/data; the more you have, the better the estimates. You can use statistical-inefficiency estimation across multiple timeseries to do this. Same for TRAM, which is more accurate with multiple replicates.
- RG – Does combining multiple time series already exist in MBAR?
- JC – Yes; pymbar’s `timeseries` module has `statisticalInefficiencyMultiple` for this.
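The idea JC describes, estimating the statistical inefficiency of an observable jointly from several replicate timeseries, can be sketched in plain numpy. This is a simplified illustration, not pymbar's implementation: the function name and the stop-at-first-negative-lag rule below are our own choices.

```python
import numpy as np

def statistical_inefficiency_multiple(timeseries_list):
    """Estimate the statistical inefficiency g = 1 + 2*tau of an observable
    from several independent replicate timeseries, by averaging the
    normalized autocorrelation function over replicates."""
    all_data = np.concatenate(timeseries_list)
    mu, var = all_data.mean(), all_data.var()
    if var == 0.0:
        return 1.0
    n = min(len(ts) for ts in timeseries_list)
    g = 1.0
    for t in range(1, n):
        # Lag-t autocorrelation, averaged over replicates
        c = np.mean(
            [np.mean((ts[:-t] - mu) * (ts[t:] - mu)) for ts in timeseries_list]
        ) / var
        if c <= 0.0:  # stop once correlation has decayed into noise
            break
        g += 2.0 * c * (1.0 - t / n)
    return g

rng = np.random.default_rng(0)

def ar1(n, phi):
    """Generate an AR(1) series with a known correlation time."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x

g_white = statistical_inefficiency_multiple(
    [rng.standard_normal(1000) for _ in range(3)]
)
g_corr = statistical_inefficiency_multiple([ar1(2000, 0.9) for _ in range(3)])
# Uncorrelated data gives g near 1; strongly correlated data gives g >> 1.
```

Pooling replicates this way gives a better-conditioned autocorrelation estimate than any single short series, which is the argument for combining data rather than analyzing clones in isolation.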
- JC – Also, FYI, there’s a pymbar 4 release coming.
- DD – Alchemlyb is planning to support this update.
- LN – This release is expected tomorrow; just need to get the last few things merged in and clean up the docs.
- MH – Will the JAX stuff make it in?
- LN – Yes. This is looking good.
- LN – Adorable noises.
- DD – So, overall, the plan is to expose both approaches, but we’ll make sure to expose a way to combine results across replicates.
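DD's closing point (a dG per replicate plus a combined top-level estimate) can be illustrated with the uncertainty-based weighting JC mentioned earlier. This is a generic numpy sketch, not fah-alchemy or gufe API; the function name is hypothetical, and for noneq data, pooling the raw work values through BAR would generally be preferable to combining per-replicate estimates.

```python
import numpy as np

def combine_replicate_estimates(dgs, ddgs):
    """Combine independent replicate free-energy estimates into a single
    estimate via inverse-variance weighting.

    dgs  -- per-replicate dG estimates (e.g. one BAR estimate per clone)
    ddgs -- per-replicate uncertainties (standard errors)
    Returns (combined dG, combined uncertainty).
    """
    dgs = np.asarray(dgs, dtype=float)
    weights = 1.0 / np.asarray(ddgs, dtype=float) ** 2
    dg = np.sum(weights * dgs) / np.sum(weights)
    ddg = np.sqrt(1.0 / np.sum(weights))
    return dg, ddg

# Three replicates of one edge: each stays inspectable on its own,
# while the combined value serves as the top-level answer for the edge.
dg, ddg = combine_replicate_estimates([-4.9, -5.2, -5.0], [0.3, 0.4, 0.3])
```

Noisier replicates are down-weighted rather than discarded, and the combined uncertainty is smaller than the best individual one, matching JC's point that adding data should never hurt the estimate.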
Item: protein-ligand-benchmark - 0.3.0 update
Presenter: David Dotson
Notes:
- DD – We’re currently stalled on #52; recommendations for next steps? What are the requirements for structures and docked ligands?
- IP – On #52, we talked about some of the edge info not being in sync with the current state of the dataset. I can help with that.
- IA – This is in regard to the 01 folders having more ligands and edges. There’s an incoming PR with new edges from LOMAP, and that will delete the current edges file and should standardize this.
- DD – I’ve made sure that only the ligands that could be docked are present in the ligands.yaml file. But IA is right, a future PR will repopulate the edges given this change. Like I mentioned before, the original edges can’t be reused with the current results because some ligands couldn’t be redocked. So the idea is that new users can use the newly provided 03-edges, or they can provide their own.
- IA – Does Perses have its own network generator? BRies put in an interface from OpenFE to Perses, and we were going to bring our own edges, but if you have your own we can use it.
- JC – We don’t have one.
- IA – Query to JC, IP, and MH: any issues with us also creating a set of edges using the Perses mapper/scorer?
- JC – Sounds good.
Item: ProtocolSettings taxonomy update
Presenter: Mike Henry
Notes:
- MH – I’m just about to open a PR to GUFE. I’ve got FF parsing ready. This won’t have atom typing ready for units.
- MH – There are a lot of people in our ecosystem using pydantic, and it’d be great if we could standardize on a single units package. This would keep us from needing to write validators for everything to figure out what’s coming into our API point; instead we could just use a single units package.
- MH – My first pass used OpenMM units, but the stack was in the process of migrating to OpenFF units, so that caused some confusion.
- MH – I’d like to make a typing package that, for example, would provide types beyond units. This would provide dimensionality checking at the typing level (?).
- MH – Concretely, this could ensure that all of our quantities serialize to JSON in the same way.
- JW – If the typing package will be tied to the openff-units package, is there a reason not to put the typing package IN openff-units?
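MH's idea of dimensionality checking "at the typing level" can be sketched without committing to a particular stack. Everything below is hypothetical: the toy `Quantity` class and `DimensionedField` descriptor stand in for what a shared typing package built on a real units library (e.g. openff-units, or pydantic custom types) would provide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """Toy stand-in for a real units package's quantity type."""
    value: float
    unit: str
    dimension: str  # e.g. "[temperature]", "[length]"

class DimensionedField:
    """Descriptor that enforces a declared dimensionality on assignment,
    so settings classes need no hand-written per-field validators."""
    def __init__(self, dimension: str):
        self.dimension = dimension

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.name)

    def __set__(self, obj, quantity: Quantity):
        if quantity.dimension != self.dimension:
            raise TypeError(
                f"{self.name[1:]}: expected {self.dimension}, "
                f"got {quantity.dimension}"
            )
        setattr(obj, self.name, quantity)

class ProtocolSettings:
    # Each field declares the dimensionality it accepts.
    temperature = DimensionedField("[temperature]")
    cutoff = DimensionedField("[length]")

settings = ProtocolSettings()
settings.temperature = Quantity(300.0, "kelvin", "[temperature]")
settings.cutoff = Quantity(1.2, "nanometer", "[length]")
# Assigning a Quantity with the wrong dimensionality raises TypeError.
```

In a pydantic-based version the same check would live in a custom type's validator, which also gives a single place to define JSON serialization for all quantities, MH's other goal.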
Item: Orchestration update
Presenter: David W.H. Swenson

Item: Request rework of Milestones and Deadlines with DN to reflect current state, forward milestones
Presenter: David Dotson