2023-09-19 alchemiscale Working Group meeting notes

Participants

  • David Dotson

  • John Chodera

  • David W.H. Swenson

  • James Eastwood

  • Jenke Scheen

  • Meghan Osato

  • Jeffrey Wagner

  • Levi Naden

Recording: https://drive.google.com/file/d/1LgkvkHJ7XSz7YqlRgwo5clM2Kfw976xj/view?usp=sharing

Goals

  • alchemiscale.org user group

    • user questions / issues / feature requests

    • results to share?

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

      • perses: protocol-neqcyc

  • DD : gufe#184 - openmmforcefields next release?

  • IP : Protein-ligand benchmarks working group update

  • alchemiscale development : current sprint ended 9/18; next sprint spans 9/20 - 10/2

    • worked with Swenson on Wednesday and identified a mismatch between how we currently handle result storage in alchemiscale and how the gufe storage system is designed; it makes more sense to refactor alchemiscale to align with this approach than to hack it in backwards

    • architecture overview : https://drive.google.com/file/d/1ZA-zuqrhKSlYBEiAIqxwNaHXvgJdlOkT/view?usp=share_link

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

    • review Done cards

    • updates on In Review, In Progress, and Available cards

  • new discussion items from ASAP roadmap: Computational Chemistry Core alchemiscale-related roadmap (Notion)

Discussion topics

Notes


  • alchemiscale.org user group

    • user questions / issues / feature requests

      • MO – Seeing high error bars with hif2a on alchemiscale. We’re rerunning with a local openfe stack (no alchemiscale) to see where the problem is.

      • JS – Not much to report; still working on pulling down errors.

    • results to share?

    • compute resources status

      • DD – NRP and Lilac workers are active; there’s nothing more in the queue.

      • JC – Are preemptible Lilac workers running now?

        • DD – Yup, seems to be working as expected.

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

        • DD – I know we’re a bit behind on this; OpenFE is up to 0.14.

        • DS – Right, we’re pushing some early things out to conda-forge, but we haven’t done the file retrieval work yet, which is what you were primarily interested in.

        • DS – gufe#227: we also have some tentative plans to provide data migration scripts. We will be versioning every GufeTokenizable serialization so we know what to migrate.

        • DD – The focus of the alchemiscale 0.3 release will be a number of targeted refactors, so to handle various stack versions we’ll need migration functionality. It’s great to see that you’re working on this.

        • JC – Re: conda packaging, we’re putting a new contract with Quansight in place, so if you have tricky packaging questions we may be able to help.

      • perses: protocol-neqcyc

  • DD : gufe#184 - openmmforcefields next release?

  • IP : Protein-ligand benchmarks working group update

  • (IP not in attendance)

  • alchemiscale development : current sprint spans 9/6 - 9/18

    • worked with Swenson on Wednesday and identified a mismatch between how we currently handle result storage in alchemiscale and how the gufe storage system is designed; it makes more sense to refactor alchemiscale to align with this approach than to hack it in backwards

    • architecture overview : https://drive.google.com/file/d/1ZA-zuqrhKSlYBEiAIqxwNaHXvgJdlOkT/view?usp=share_link

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

      • DD – alchemiscale#102: got feedback from IA on the docs; looks good.

      • DD – alchemiscale#104 (result path conversion and upload): worked with DS and spotted a slight structural mismatch between alchemiscale and gufe. alchemiscale only passes references to the actual objects in the object store between server and workers, whereas gufe passes the actual objects around. So I think it makes sense to refactor alchemiscale to follow the gufe conventions. Opened alchemiscale#180 with a plan for this.

        • JC – All of this complexity could be resolved by making the protocol units short.

        • DS – DD isn’t quite right: we don’t upload as we go by default. Instead, we save all the files and only send a summary JSON around. We do this so that we can later do things in parallel.

        • JC – Two things:

          • Want to run things in parallel

          • Short vs long runs

        • DS – We decided that a single DAG in gufe should be capable of providing statistics, so a single DAG can have 3 copies.

        • JC – So a DAG can have 3 parts; why would those need to run sequentially? Why can’t you run them in parallel?

        • DS – This was discussed internally a long time ago and this has already been decided.

        • JW – I’d appreciate a description of this from the beginning.

        • DD – A Protocol defines a ProtocolDAG by way of its settings. Executing a single ProtocolDAG should be able to get you a dG (DS: and uncertainty). For example, the PersesProtocol is enough to give you a dG, but it doesn’t guarantee that it’s converged, so there you run many ProtocolDAGs and stack them together to get a more accurate dG and uncertainty. I know that’s different from what OpenFE does: the alchemiscale model is closer to the perses model, and the OpenFE model is different.

        • JC – It sounds like, in OpenFE, a single DAG needs to run in a single task… Is checkpointing related to running things in parallel?

        • DD –

        • … (recording, ~20 minutes)

        • JC – Some ways to do this: keep a massive checkpoint file of everything, or keep the states of everything at the end of each simulation to initiate the next one. I think the latter is better, and it requires just a little refactoring of how we do storage.

        • DD – I don’t recall that being a priority on the OpenFE end.

          • DS – Right, it’s something that’s possible for the roadmap next year.

          • JC – This would be useful for perses; we’re having trouble converging some edges, and being able to do this would really help.

        • DD – For actionable conclusions, I think alchemiscale#180 would be an improvement. The big points are that keeping these files will enable better error analysis, inspection of trajectories (e.g., for science), and eventually extension for some protocols.

        • DD – I’d hoped to put this into 0.2…

        • JW – I approve adding this refactor to the 0.3 milestone, but I think the F@H interface is more important.

        • JC – Agreed; I approve accordingly.

        • DS – No veto from me. I’d just recommend not going too fast on this, since I need more time to work on the pieces that will interface with it, and I want to avoid duplicate work.

    • review Done cards

    • updates on In Review, In Progress, and Available cards
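
The model DD described in the 0.2.0 milestone discussion, where each ProtocolDAG execution yields its own dG and many executions are stacked for a more accurate dG and uncertainty, can be illustrated with a small sketch. This is a hypothetical illustration only: `combine_dag_estimates` is not part of the gufe, openfe, perses, or alchemiscale APIs, and it assumes the repeats are independent and combined by a plain mean with the standard error of the mean. The actual Protocol result aggregation may weight or combine estimates differently (for example, by using per-DAG uncertainties).

```python
import math
from statistics import mean, stdev


def combine_dag_estimates(dgs: list[float]) -> tuple[float, float]:
    """Hypothetical helper: stack dG estimates from independent ProtocolDAG runs.

    Assumes each value is an independent dG estimate in the same units.
    Returns (mean dG, standard error of the mean), where the standard error
    is the sample standard deviation divided by sqrt(n).
    """
    if len(dgs) < 2:
        # A single execution gives no spread to estimate uncertainty from.
        raise ValueError("need at least two independent estimates for an uncertainty")
    n = len(dgs)
    return mean(dgs), stdev(dgs) / math.sqrt(n)


# Three hypothetical repeat estimates for one transformation.
dg, err = combine_dag_estimates([-1.2, -1.5, -0.9])
print(f"dG = {dg:.2f} +/- {err:.2f}")  # mean ≈ -1.2, standard error ≈ 0.17
```

Each additional independent ProtocolDAG execution shrinks the standard error roughly as 1/sqrt(n), which is why stacking repeats helps when a single execution is not guaranteed to be converged.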

Action items

Decisions