2023-02-21 Protein-ligand benchmarks meeting notes

Participants

  • @Richard Gowers

  • Jenke Scheen

  • @Jeffrey Wagner

  • @John Chodera

  • @David W.H. Swenson

  • @Mike Henry

  • @David Dotson

  • Levi Naden

Recording: https://drive.google.com/file/d/1nt1srVmyvb1Y6Vrc_iAsCjyWZf2n2X8a/view?usp=share_link

Goals

  • RG : non-beta version of gufe

  • DD : sprint retrospective

    • Review Done cards

      • what went well?

      • what didn’t?

      • what do we need to improve our approach?

  • DD : next sprint begins tomorrow, spans 2/22 - 3/6

    • architecture overview : https://drive.google.com/file/d/1ZA-zuqrhKSlYBEiAIqxwNaHXvgJdlOkT/view?usp=share_link

    • alchemiscale 0.1.0 milestone

    • coordination board : alchemiscale : Phase 1 - MVP

    • updates on In Review, In Progress, and Available cards

    • create/nominate new cards for inclusion in this sprint

    • will post on #free-energy-benchmarking when next sprint is finalized

Discussion topics

Notes


  • RG : non-beta version of gufe. We will publicly declare that the API is stable, expected around next month. This is a warning for people interfacing with it: if you need breaking changes, let us know within the next month. Then we’ll do a 2.0 in about a year, where we can break it again.

    • JC – Do you think we’ll have kicked the tires enough by then?

    • RG – No, we expect that there will be problems, but it’s important to get something published without a leading 0.X

    • DS – And recall that there is expected to be a break in 8-12 months. We know that there will be structural problems and those will be annoying, but we need to move forward.

    • JW + DD – This sounds like a good plan

  • DD : sprint retrospective

    • Review Done cards

      • DD – Alchemiscale 89 – HMO knocked this out quickly. Mostly cosmetic.

      • DD – Alchemiscale 80 – Big move from taskqueues to taskhubs. This simplifies our model for how tasks are actioned: instead of linked lists whose elements can be members of multiple linked lists, tasks are now simply members of taskhubs. This makes it easy to make and break relationships and should reduce the number of ways we can get into an inconsistent state. Note that this removed determinism: with linked lists there was a deterministic order in which tasks would be run, but with taskhubs that order is no longer deterministic.

      • DD – Alchemiscale 39 – Created an example notebook, and migrated to OpenFreeEnergy/ExampleNotebooks (OFE/EN #36)

        • RG – I may switch over to a repex protocol to simplify things and remove OE requirement.

        • DD – Re: Perses OE requirement, is this a hard requirement?

        • JC – The OE backend is needed for fragment mapping, MCSS tools, and other functionality. I was hoping to move enough functionality into the OpenFF toolkit to replace this in the long run.

        • JW – Seems out of scope

        • JC – would open up opportunities for many uses beyond FF application

        • MH – In a perfect world we’d use OFFTK as much as possible, and then hop into RDKit+OE when we need something else. So I see it initially being RDK/OE code, and then

        • JC – can go from RDKit small molecules to OE; this gets hard when you’re dealing with fragments of molecules, including proteins; OE makes it fairly straightforward

          • warrants a conversation on whether this fits into OpenFF purview

        • JW – OpenFF does have a vision, and it’s pretty clearly scoped around FF application

          • we currently have both RDKit and OE backends, and we have to build variants of features for each; do our best to smooth out differences, but of course these are different packages so they do give different behavior

          • JC – what if that’s relaxed? As in, is it okay for different behavior in use of the two backends within the toolkit?

          • JW – I would love to have a fully-usable RDKit-based OpenFF Toolkit; separate from OpenEye-based

          • JC – May be worth checking in with our OE contact; it’s not really clear they care if you are using OE under the hood; they care more about quality FFs

        • JC – We really want an open topology format that can interface with PDB format and other formats, and can go back and forth between identities and mappings. So I’m hoping that in the great refactoring after the next release, we can avoid a lot of these issues.

        • RG – My long-term vision for GUFE is to convert into RDKit to do the MCS stuff. I think trying to provide a wrapper for MCS will be super complex; instead I’d rather have it

        • JW – So would you ever offer OE-only functionality? Or would everything have an RDK equivalent first?

        • DS – GUFE mols are just a wrapper around RDKit mols.

        • RG – Right, it looks like RDKit can do everything we need, even for proteins

        • JC – I’d love to chat about how this could be done with RDKit, including how to do partial mappings involving proteins.

        • MH will put together a meeting for how to do atom mappings and MCS using RDKit, inviting RG, JC, IP, MH, and JW.

        • JW – Just want to make sure that the basic functionality of alchemiscale won’t have a hard OE dependence.

        • JC – This should be the case for OpenFF and OpenFE, we’ll discuss in this meeting how to get Perses/other deps away from hard OE deps.

      • DD – GUFE 128 – Looking good.

        • RG – We should do an audit to make sure everything is serializable.

    • what went well?

    • what didn’t?

      • DD – I’m the bottleneck on a lot of reviews, and I’d love help reviewing. HMO is probably getting into a position where I can tag him in for reviews. Would anyone else be available?

      • MH – I could get involved, feel free to tag me in.

      • DD – Great, I think we’re on the home stretch for milestone 0.1.

    • what do we need to improve our approach?

  • DD : next sprint begins tomorrow, spans 2/22 - 3/6

    • architecture overview : https://drive.google.com/file/d/1ZA-zuqrhKSlYBEiAIqxwNaHXvgJdlOkT/view?usp=share_link

    • alchemiscale 0.1.0 milestone

    • coordination board : alchemiscale : Phase 1 - MVP

    • updates on In Review, In Progress, and Available cards

      • JC – Perses 1128 – I’m working on that

      • DD – Alchemiscale 83 – I think this is done, waiting on HMO to click merge.

      • DD – Alchemiscale 94 – I had some feedback for HMO. The point is that if a user adds hundreds of tasks, they are executed correctly: a task that extends another task will only be run once the task it extends is complete.

      • MH – Alchemiscale 93 – There are a few ways to organize the logging. I think log streams are the way to do it; I showed an example of how to do this in the discussion, including the container ID in the log streams.

        • DD – This looks like a good approach.

        • JC – 2 Qs:

          • Is it useful to have a single UUID associated with the startup instance? Or can they be started and stopped independently?

            • DD – No, you can scale individual jobs

          • Also, the log output shows the completion time by default; would it also help to have an initiation time?

            • MH – I don’t think so, normally I’m always looking at the most recent log event.

          • JW – I’d prefer not to have slashes in the filename (though I understand if there’s some significance to that on this hosting service)

            • MH – Ok, this is a common thing I see other people doing, I’ll dig a little into this to make sure that this is a best practice/doesn’t mangle usage.

        • DD – Alchemiscale 85 – Task lifecycle. Any comments?

        • – Perses 1066 – IP not in attendance, no update

        • DD – Alchemiscale 34 – I have yet to raise a specific PR to finish out the behavior of the synchronous compute service; I haven’t begun on this yet.

          • JW – Is there another ComputeService that’s already implemented?

          • DD – We’ll make AsyncComputeService later, but it’ll be more complex. SyncCompServ will be the reference implementation and we’ll build it first.

          • JC – There may be an issue where multiple GPUs are available: would there be separate processes with different CUDA masks, or several GPUs in a single process pool…? It would be good if they could see multiple GPUs if they’re available.

            • DD – If you’re using an HPC system, the GPUs are exposed when the process lands.

            • JC – So each process will see one GPU? This may be different on cloud.

            • DD – It’d be nice to be able to do one service per box. This is on my plate to resolve.

        • JC – Perses 1128 – If we can rip out OE entirely then that will change how we can pursue this.

        • DD – OFE/EN 36 – RG, can I assign you to this? Not super urgent but would be good to get this out.

          • RG – Yes, I’ll self-assign and will do it.

          • DD – This notebook doesn’t use any atom mappings. I’d tried to use a perses protocol and it needed mappings. I wasn’t sure about the best way to make a bunch of mappings for the system here (TYK2); maybe you can pull them from the PLD or OFE/benchmarks repo.

        • DD – I’m on the hook for determining whether we can refer only to edge properties in perses protocol… Still working on this

        • DD – Alchemiscale 88 – Simple fix to regex matching.

    • create/nominate new cards for inclusion in this sprint

    • will post on #free-energy-benchmarking when next sprint is finalized
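The taskhub model (Alchemiscale 80) and the ordering guarantee for extended tasks (Alchemiscale 94) discussed above can be illustrated with a minimal Python sketch. This is not alchemiscale’s implementation: the `Task` and `TaskHub` classes, their fields, and the `claimable`/`claim`/`remove` methods are hypothetical names chosen for illustration. It assumes that a task is only eligible once the task it extends is present in the same hub and complete, and that claiming picks any eligible task at random, mirroring the loss of a deterministic run order noted above.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    key: str
    extends: Optional[str] = None  # hypothetical: key of the task this one extends
    status: str = "waiting"        # hypothetical lifecycle: waiting -> running -> complete


@dataclass
class TaskHub:
    """Hypothetical hub: tasks are plain members, so no run order is implied."""
    tasks: dict = field(default_factory=dict)

    def add(self, task: Task) -> None:
        self.tasks[task.key] = task

    def remove(self, key: str) -> None:
        # Breaking a relationship is a single deletion, unlike unlinking a node
        # that may sit in several linked lists at once.
        self.tasks.pop(key, None)

    def claimable(self) -> list:
        """Waiting tasks whose extended task (if any) is in this hub and complete."""
        eligible = []
        for task in self.tasks.values():
            if task.status != "waiting":
                continue
            if task.extends is not None:
                parent = self.tasks.get(task.extends)
                if parent is None or parent.status != "complete":
                    continue
            eligible.append(task)
        return eligible

    def claim(self, rng: random.Random) -> Optional[Task]:
        # Pick any eligible task at random; set-like membership has no inherent order.
        candidates = self.claimable()
        if not candidates:
            return None
        task = rng.choice(candidates)
        task.status = "running"
        return task
```

With tasks `a` and `b`, where `b` extends `a`, only `a` is claimable at first; `b` becomes claimable once `a` is marked complete, however many other tasks are in the hub.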




Action items

Decisions