2023-08-29 alchemiscale Working Group meeting notes

Participants

  • @David Dotson

  • @Richard Gowers

  • @Irfan Alibay

  • @David W.H. Swenson

  • Ian Kenney

  • @Iván Pulido

  • Jenke Scheen

  • @Mike Henry

  • @Jeffrey Wagner

  • Meghan Osato

  • @John Chodera

Recording: https://drive.google.com/file/d/1lYJIG-FMHmCruWNtigafc0eUIXJAhdbf/view?usp=drive_link

Goals

  • JS – JHorton is off on parental leave for the next month or so. If folks need anything from him during that time, contact DCole or me instead.

  • alchemiscale.org user group

    • user questions / issues / feature requests

    • results to share?

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

      • perses: protocol-neqcyc

  • IP : Protein-ligand benchmarks working group update

  • alchemiscale development : current sprint runs 8/23 - 9/4

    • current aim is to complete release 0.2.0 by the end of this sprint, including the first round of documentation

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

    • updates on In Review, In Progress, and Available cards

  • new discussion items from ASAP roadmap: ROADMAP: Computational Chemistry Core alchemiscale-related roadmap | Notion

Discussion topics

Notes

  • alchemiscale.org user group

    • user questions / issues / feature requests

    • results to share?

      • IA – One observation from HB and me is that we saw a relatively large number of errors. We haven’t dug into this yet, but what do you see, DD?

      • DD – We did see a bunch of jobs being submitted from openfe 0.12 envs, which are incompatible with the deployed version and fail during deserialization.

      • IA – I saw a lot of timeouts on my end.

      • DD – I think the server was a bit overwhelmed for a while because it needed to flush through the failures (run them enough times to hit the retry limit).

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

        • DD – Should we update this?

        • RG + IA – No, we can use the beta deployment for newer versions.

        • DD – Glad that’s getting used. Let’s make sure we don’t forget to build the migration machinery to bring everything up to date.

      • perses: protocol-neqcyc

  • IP : Protein-ligand benchmarks working group update – I uploaded the manually curated maps last week, and HBaumann reviewed them. It turns out I need to change how I create them, since there was a bug in the atom indices. I think it’ll be best to use the atom mapper inside OpenFE instead of the perses one. Separately, I ran the radial maps that HB shared; I haven’t analyzed them yet, but we’ll look into it tomorrow.

    • IA – Fair warning for tomorrow’s meeting: it looks like the previous perses maps may have used the scores the wrong way around. I think HB may have fixed this and may be running calcs now.

    • DD – I could believe that, we’re running on about 500 GPUs right now. MHenry has added about 100 workers on Open Science Grid at the moment. 500 is a new peak for us.

    • MH – I’m hovering around 60 jobs on open science grid, may be able to tune resource requests to get more simultaneous jobs. OSG is a pre-emptible queue running nationwide on unused cluster time.

    • IA – MH, you mentioned that there may be other types of architectures available and that we may be able to run non-GPU jobs. I think some calcs, especially vacuum legs, could be quite performant on CPUs. MH, could you check this out?

      • MH – Yeah, that’d make sense. I could set up scopes to route these CPU-friendly jobs to CPU workers.

      • DD – cc alchemiscale #79 - Issue to track compute tags in alchemiscale.

  • alchemiscale development : current sprint runs 8/23 - 9/4

    • current aim is to complete release 0.2.0 by the end of this sprint, including the first round of documentation

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

      • DD – Pruned a little bit

      • DD – Most of these are small, but “result path conversion” is large, so we may end up punting on that.

    • updates on In Review, In Progress, and Available cards

      • DD – Alchemiscale 174 – online example cookbook/notebook - IA, were you going to give me feedback on this?

        • IA – Yes, will do. One of the graphs needs changing.

        • MH – Informally, this worked really well for me. I was able to use this tutorial and submit jobs easily.

      • IP – PLB 93 – Should be on track; I need to get feedback from HB. We’re still seeing some other failures, but I think it’s OK to merge with those outstanding.

        • IA – For the MSTs, the failures may be because the networks were built with the score interpreted the wrong way around (flipping the meaning of good and bad). I do see that the automatic restart is often enough to get a failed task to succeed. So there’s more to look into here.

        • IP – Agree, we can keep iterating on this after the release.

      • IP – Perses 1066 – Noneq cycling – We need to finish the bugfix release first. We also discussed centralizing protocols in a single repo. For now, what matters to me is that we get it into main so we can install from main.

        • DD – Right, dev/pip installs will work for us if we can get everything updated in main.

      • DD – Gufe 215 – Waiting on review from DS.

        • DS – Apologies, I’ll get to this.

        • DD – No huge rush, we don’t see this issue arising often.

    • In progress

      • DD – Alchemiscale 31 – Developer docs – Still a WIP, no reviews needed. Added a core developers section that walks through the architecture. I plan to add other sections, including how to contribute.

        • JW – Is there a playbook/operator docs?

        • DD – Yes, there are already basic sections on operations. MHenry is using these to good effect.

      • DD – Doing user story review for ASAP Discovery and OpenFF.

    • Available

      • DD – Alchemiscale 43 – Adding scope read/write/domain permissions – Intended to empower users to grant permissions they already have, without requiring too much from admins. Aiming for the 0.2.0 release, but it may not make the cut.

      • DD – Alchemiscale 125 - conda-forge packaging - MH what are your thoughts?

        • MH – Could make sense to have client and maybe compute on there. But server may not make sense because of the details of how we deploy.

        • DD – With the server, we already have a yaml for testing in the repo.

        • MH – That makes sense, though the pip installs from that yaml could be messy.

        • DD – The async_lru package may not be available on c-f.

          • (Looked again: it is on c-f, just under a dash-to-underscore name conversion.)

        • DD – May overlap with the concept of splitting a perses protocol…

        • MH – Does server need perses?

        • DD – Yes, the server currently needs to deserialize whatever comes in. Eventually we could make this unnecessary, but right now the server tries to deserialize everything it receives.

        • JW – Would it be better to wait on this until the perses release?

        • MH – There wouldn’t be a lot of additional work to release this now and correct for the perses release later.

        • DD – Kick to 0.3?

          • JW + MH – Yes

      • DD – Alchemiscale 134 – API enforced upper bounds on state changes to add some safety guardrails for users. I think this is a good idea.

      • DD – Alchemiscale 154 – Batch pulling of all results for a network. JHorton was particularly interested in this.

      • DD – Alchemiscale 104 – Result path conversion - Our first iteration on getting files produced by a protocol uploaded to the server and made available to users. DS, this is related to the gufe PR for path conversion. Anything relevant to share?

        • DS – Nothing comes to mind… This will affect how protocols work, but it will make protocols easier to use: result paths will look like a pathlib.Path. See the docs in that PR for details. Nothing is using this yet, but it will change how protocols work on the inside. We’ll be putting this into the execute_DAG function. So in addition to shared/ and scratch/, there will also be permanent/.

        • DD – Great, I’ll plan to give this a try. I’m not sure yet how to test this… I’ll reach out to you once I get into testing.

        • DS – Sounds good. The big thing is that some protocols will have to change, since some stuff will need to be marked as permanent.

        • DD – Got it. I’ll still try to get this into alchemiscale 0.2.0, but it looks a bit hairy, so it may end up in 0.2.1.

  • new discussion items from ASAP roadmap: ROADMAP: Computational Chemistry Core alchemiscale-related roadmap | Notion

    • JC – We’ve been playing with this via ASAP-cli; JS knows more.

    • JS – That’s right, we’re seeing some small performance issues but need to dig into those more.

    • IP – There were some retrieval issues, but a recent release may have fixed those.

    • DD – asapdiscovery 424 – JH is out, so I’ll look at this.

Action items

Decisions