2022-05-31 Protein-ligand benchmarks meeting notes

Participants

  • @Irfan Alibay

  • @Richard Gowers

  • @Mike Henry

  • @John Chodera

  • David Swenson

  • @David Dotson

  • @Diego Nolasco (Deactivated)

  • @Jeffrey Wagner

  • @Iván Pulido

  • @Lily Wang

  • Antonia Mey

Goals

  • DD : protein-ligand-benchmark - 0.3.0 milestone review, issue assistance, rebalance

    • current main ready for 0.2.1 release?

    • deadline EOW for 0.3.0 completion; working session for #52 needed?

  • DD : fah-alchemy project backlog prioritization

  • DS : ResultStore working group update

  • MH : ProtocolSettings taxonomy working group update

Discussion topics

Item

Notes

protein-ligand-benchmark - 0.3.0 milestone review

  • DD : issue assistance, rebalance

    • current main ready for 0.2.1 release?

      • Decision: No objection, ready for release

    • deadline EOW for 0.3.0 completion; working session for #52 needed?

      • DD – Happy to have a working session with JC and bobym on cleanups. JC is having tech issues now.

        • JC (later) – I’d be happy to hop on a call for this. Different sets (JACS vs schindler/Merck vs others) had different issues. I have no time to meet this week but you could meet with bobym to move things forward.

        • DD – What issues should I look at?

        • JC – There are issues with

          • having no docked poses and other clash-related things

          • Protonation issues

          • Some ligands unable to be docked with core constraints

        • It’d also be fine to check these post-merge. These criteria can be relaxed for Janssen sets since those targets are challenging. But JACS/schrodinger sets are already docked and we should be strict about them.

        • DD has left a comment on the PR summarizing next steps.

      • IA – I made a request for info on #52 that isn’t fulfilled - At this time I can’t verify that the contents have provenance. Can anyone clarify that?

        • DD – It’s in the OpenFF best practices repo. I’ll link it in the discussion of #52.

        • RG – Thanks. bobym mentioned that parts of the PR contents were sourced from the manuscript, but I didn’t know which manuscript. So this should resolve that blocker.

      • IA – We had wanted to do CI-level checks using MDAnalysis and BioSimSpace. But both of those are GPL, so we’d need the action to happen outside the repo.

        • (General) – Are we forbidden from using GPL code in our CI? Is there a difference between distributing code and using it in CI?

        • IA – I talked with NumFOCUS lawyers and they said that we shouldn’t test using GPL.

        • JC + JW + RG – This seems ambiguous, but we’d be comfortable testing with GPL code

        • IA – I’ll go ahead and set up CI in the repo then.

fah-alchemy project backlog prioritization

  • fah-alchemy : Phase 1 - MVP

  • identify appropriate place for Executor, ProtocolDAGResult versioned schema

    • DD – I was hoping that we could generalize the F@H executor with OpenFE, but it seems like OpenFE’s goals are to have a smaller in-process executor, whereas the F@H executor may need larger-scale things like a RESTful API.

    • RG – We’re not sure which executor architecture we’ll use at this time.

    • JC – It seems like this is a clear target for standardization. So the F@H executor could be a subclass of the executor API. Are there reasons this couldn’t happen?

      • RG – I think the nature of the implementation of the executor is separate from the chemistry of the problem. So this would go outside GUFE.

      • JC – This does seem like a critical part of the “grand unified” part. Like, having a serial implementation would be in scope for living in GUFE.

      • RG – We’re pretty focused on something DASK or DASK-like.

      • DD – For F@H alchemy, we need to have a concept of access permissions, users, roles, etc. That’s not something that OpenFE will need.

      • RG – Right, that’s not chemistry.

      • JC – The user roles+auth stuff can live above the executor, though

      • DD –

      • JC – Executor takes in…

      • DD – Multiple compute hosts could subscribe to the same executor. For fah-alchemy, I’m interested in moving forward with something that meets our known needs.

      • JC – … Is there an example of how the code would be calling each other? What will the API and component cross-talk look like?

      • DD + RG – Let’s meet later this week to better understand this, and whether it can fit in OpenFE.

      • JC – I do think that the separation of chemistry and technical details is reasonable…

      • RG – ….

      • RG and DD will meet later this week to get a better sense of these details.

    • DD – Results storage+format. What can we plan to do now, how can we load old results?

      • JC - I wouldn’t worry too much about this. Final results are what really matter; the next most useful data are the snapshots; and the final tier (which may not be possible to maintain) is the raw data from the intermediate steps.

      • DD – Are you advocating for storage of intermediate results?

      • JC – No, I think results should be ephemeral, and after some time it’s reasonable to require regeneration.

      • DD – This is a similar conclusion to a discussion I had with DSwenson earlier. There’s a lot of trust placed on protocol authors to handle the complexity between the input and output.

      • JC – Sounds great. Just make sure there’s a way for important error information/detailed run data to be recoverable.

      • DD – Sounds good.

  • DD – Prioritization of items on this board? Should any of these come first?

    • JC – I’d switch “hamiltonian replica exchange” to “nonequilibrium switching”. The latter is way more tractable on F@H. Also, independent lambdas is way easier than times square.

    • DD – Agree
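
The executor discussion above (chemistry kept separate from execution, a serial implementation as the simplest case, users and roles living above the executor) could be pictured roughly as follows. This is only a sketch under stated assumptions: `Executor`, `SerialExecutor`, and the task/dependency dictionaries are hypothetical names for illustration, not GUFE, OpenFE, or fah-alchemy APIs.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Mapping, Set

# Hypothetical type: a task receives the results of its dependencies by name.
Task = Callable[[Dict[str, object]], object]


class Executor(ABC):
    """Hypothetical interface: runs a DAG of tasks, knows nothing about chemistry or users."""

    @abstractmethod
    def execute(
        self, tasks: Mapping[str, Task], deps: Mapping[str, Set[str]]
    ) -> Dict[str, object]:
        """Run every task after its dependencies and return results keyed by task name."""


class SerialExecutor(Executor):
    """Simplest possible implementation: one task at a time, in process."""

    def execute(self, tasks, deps):
        # Map each task to the set of tasks it depends on.
        graph = {name: set(deps.get(name, ())) for name in tasks}
        results: Dict[str, object] = {}
        # static_order() yields each task only after all of its dependencies.
        for name in TopologicalSorter(graph).static_order():
            inputs = {dep: results[dep] for dep in graph[name]}
            results[name] = tasks[name](inputs)
        return results


# Toy DAG with placeholder arithmetic standing in for real work.
tasks = {
    "setup": lambda inputs: 2,
    "run": lambda inputs: inputs["setup"] * 10,
    "analyze": lambda inputs: inputs["run"] + 1,
}
deps = {"run": {"setup"}, "analyze": {"run"}}
results = SerialExecutor().execute(tasks, deps)
```

Under this assumption, a Dask-like or F@H-backed executor would be another subclass with the same `execute` signature, and authentication or role checks would wrap the call rather than live inside it.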

ResultStore working group update

  • DS – I’ll send out a note on slack to find a time for the ResultStore working group meeting. Right now, OpenFE has a PR up that addresses the raw files that come out of simulations. So this is something that needs to be agreed upon between different projects. We need a way to address stored information, probably something like a file path, that could map to a local file, S3 store, hdf5 file content, etc. This is something that should be agreed upon beforehand but can change over time.

    • DD – I’m making an S3 external storage subclass in GUFE.

    • https://docs.google.com/document/d/1wcTy4TqBIGbxdt9p2reVTY4J5iKAzF673DQsZeZ8vg8/edit#heading=h.oluheussxfp2
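
As a rough illustration of the path-like addressing DS describes, the sketch below puts two backends behind one interface so the same location string resolves either to memory or to a local file. All class and method names here are hypothetical; this is not the interface from the OpenFE PR or the GUFE S3 subclass mentioned above, and an S3 or hdf5 backend would simply be a further subclass.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ResultStorage(ABC):
    """Hypothetical interface: bytes addressed by a path-like location string."""

    @abstractmethod
    def store(self, location: str, data: bytes) -> None:
        """Save data under the given location."""

    @abstractmethod
    def load(self, location: str) -> bytes:
        """Return the data previously stored under the given location."""


class MemoryStorage(ResultStorage):
    """Keeps everything in a dict; handy for tests."""

    def __init__(self):
        self._data = {}

    def store(self, location, data):
        self._data[location] = data

    def load(self, location):
        return self._data[location]


class LocalFileStorage(ResultStorage):
    """Maps each location onto a file below a root directory."""

    def __init__(self, root):
        self._root = Path(root)

    def store(self, location, data):
        path = self._root / location
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, location):
        return (self._root / location).read_bytes()


# The location and payload are made-up placeholders.
location = "transformation_1/repeat_0/final.bin"
payload = b"example bytes"
loaded = []
with tempfile.TemporaryDirectory() as tmp:
    for backend in (MemoryStorage(), LocalFileStorage(tmp)):
        backend.store(location, payload)
        loaded.append(backend.load(location))
```

Calling code only ever sees the location string, which is what would let the agreed-upon addressing scheme change backends over time.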

MH : ProtocolSettings taxonomy working group update

  • MH – I’m working on setting up a meeting with MThompson. After that meeting I’ll loop in a larger group.

 

  • RG – What limits a protocol from being used on F@H? If that were clearer I’d be able to understand how much of the project board can be done outside of F@H.

  • DD – One concept right now is that we have to solve how we store a registry of protocols. This will be important to ensure that we get the same thing when serializing and deserializing. This is an important component of storing the DAG.

    • RG – This would basically be monkeypatching the namespace of the protocol.

    • JC – This is ultimately related to the science of how we decompose the science into pieces. I’d love to see a world where we decouple the protocol from the engine being used to compute.

    • DD – …

    • RG – I’m looking at these protocols from the perspective of: how can I present this to our board in a way that makes the overlap clear.

    • DD – The big question is where to have the separation between protocols and schedulers.
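
To make the protocol-registry idea concrete, here is a minimal sketch of one way a registry could guarantee that a serialized protocol comes back as the same class. Everything in it is an assumption for illustration: the `register` decorator, the `__protocol__` key, `ExampleProtocol`, and the `n_lambdas` setting are hypothetical and not taken from GUFE or fah-alchemy.

```python
import json

# Hypothetical registry: fully qualified class name -> class.
PROTOCOL_REGISTRY = {}


def qualified_name(cls):
    """Stable string key used both when serializing and when looking up the class."""
    return f"{cls.__module__}.{cls.__qualname__}"


def register(cls):
    """Class decorator that records a protocol class in the registry."""
    PROTOCOL_REGISTRY[qualified_name(cls)] = cls
    return cls


class Protocol:
    """Hypothetical base class holding only a settings dict."""

    def __init__(self, settings):
        self.settings = dict(settings)

    def to_dict(self):
        # Store the registry key next to the settings so the class can be recovered.
        return {"__protocol__": qualified_name(type(self)), "settings": self.settings}

    @staticmethod
    def from_dict(data):
        # Raises KeyError if the protocol was never registered on this host.
        cls = PROTOCOL_REGISTRY[data["__protocol__"]]
        return cls(data["settings"])


@register
class ExampleProtocol(Protocol):
    """Placeholder protocol with no behavior of its own."""


original = ExampleProtocol({"n_lambdas": 11})
# Round-trip through JSON text, as would happen when storing a DAG.
restored = Protocol.from_dict(json.loads(json.dumps(original.to_dict())))
```

A lookup failure in `from_dict` is the case the discussion is worried about: a host that deserializes a DAG needs the same registry contents as the host that serialized it.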

 

  • JW – Are any user stories declared as in- or out-of-scope?

    • DD – I’ve grouped 6 of the 10 user stories into milestones for OpenFF’s needs and ASAP’s needs.

    • JW – Thanks for doing this. I’ll review the other stories and let you know next week if I’d propose any changes.

Action items

Decisions

Related pages