2024-08-27 alchemiscale : user group meeting notes

Participants

  • @Iván Pulido

  • @Mike Henry

  • @Jeffrey Wagner

  • Jenke Scheen

  • @Irfan Alibay

  • Ian Kenney

  • @David Dotson

  • @James Eastwood

 

Meeting recording: 2024-08-27-alchemiscale.mp4

Goals

  • alchemiscale.org

    • user questions, issues, requests

    • compute resources status

    • current stack versions:

      • python 3.12

      • alchemiscale: 0.5.0

      • neo4j: 5.22

      • gufe: 1.0.0

      • openfe: 1.0.1

      • feflow: 0.1.0

      • openmmforcefields: 0.14.1

      • openmm: 8.1.2

  • DD : alchemiscale roadmap 2025: possible components

    • additional FAH protocols in alchemiscale-fah

    • parallel execution of ProtocolDAGs on conventional compute, GPU saturation e.g. for feflow.NonEquilibriumCyclingProtocol

    • merging and copying AlchemicalNetworks in alchemiscale server

    • additional Strategy implementations beyond NetBFE, e.g. CBayesMBAR

    • compute autoscaling for HPC, Kubernetes clusters

    • support for result file retention and retrieval?

    • user-facing dashboard, visualizer?

    • admin-facing dashboard, management interface (e.g. adding/removing users, scopes, permissions)

    • automated, online FE estimates for transformations, whole-network estimators?

    • compute cost estimate for planned networks based on previously-executed networks?

    • others?

  • DD : proposal: reorganize alchemiscale project coordination

    • migrate alchemiscale repo under OpenFreeEnergy Github org

      • question: migrate alchemiscale-fah as well?

    • create alchemiscale channel in OpenFE Slack for developer communication

    • create alchemiscale.org channel for users of that production instance, announcements related to instance issues, deployments

    • use Github Discussions as a hub for user questions that fall outside of issues, are more to do with usage questions

    • host working group under OpenFE org, use infrastructure for tracking meeting notes

      • will share read-only meeting notes link publicly via Github Discussions

  • DD : alchemiscale-fah live test performed with FAH volunteers

    • executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction, corresponding to 1900 FAH work units

    • results on FAH comparable to equivalent compute performed on NRP for each edge:

image-20240827-213926.png

 

image-20240827-213938.png

 

Discussion topics

Notes

  • alchemiscale.org

    • user questions, issues, requests

      • JW – One of my goals this year is to learn to submit jobs to alchemiscale

        • DD – Happy to work with you on this. There’s a tutorial on the alchemiscale docs.

        • IA – The tutorial is probably not updated for the new API.

        • JW – I may be able to update this as I work through it. No guarantees though.

        • IA – We haven’t actually run all the benchmarks to validate the new alchemiscale.

        • JW – I don’t want to become involved in a formal scientific project at this phase - I won’t have time to responsively debug/iterate.

        • IA –

        • JS – Can use TYK2 but it’s super

        • IP – JAK2 dataset from relay - It’s open data so I can work on adding this to PLB dataset. But I’m not sure if it needs like med chemist review.

        • IA – I’d advise not getting into dataset generation

        • IP –

        • IA – My suggestion for a dataset is the MCL1 fragment collab set. It’s tiny and converges well.

        • IP – Maybe CDK8 is good.

        • IA – CDK8 has charge transformations, which we can’t really do

        • DD – Yeah, JW should use TYK2.

        • JW – Can I modify protein FF?

          • IA – Small mol FFs can be modified. Protein FF is restricted to openmmforcefields, so basically just AMBER. Using OMMFFs is hairy and probably will fail for proteins. When will you need protein stuff?

          • JW – I’ll discuss with OFF lead team.

      • JS – I have a real-world example of running OpenFE at a large scale. Ran 400 compounds for a high priority project. Used MST to get it done quick. However, somewhere in the network was an edge that wasn’t doable. I tried rerunning a few times, but it kept failing, so I had two disconnected networks. So on a tight deadline, MST isn’t the best idea.

        • IA – The feedback we’re getting from industry benchmarking is that NOT using MSTs was a good idea. We’re using LOMAP networks, so when we DO have failed edges we can work around them. A potential issue we’re encountering with kartograf and LOMAP networks is that they’ll generate some really bad edges (and score them as very bad), and the sims for those edges will fail, so we have to rely on redundant edges.

        • JS – LOMAP is kinda structurally aware of potential network disconnection(?) and builds in redundant edges(?).

        • DD – Did bad edges score poorly?

        • JS – Unsure in this case. But in any large network you’ll get edges that exhibit weird pathologies.

        • IP – We saw some networks generated for the PLB network by LOMAP had some disconnections (before simulating). But we didn’t have a similar problem with kartograf

        • JS – We’re talking about something different - When the network planner has everything connected but failed sims cause there to be disconnection.

        • DD – Proposed action to take here?

        • JS – Nothing really in scope for alchemiscale. We’d want to implement some sort of network healing program that watches for this case to emerge and either automatically fixes it or encourages manual intervention.

        • IA – On the GUFE level, there was a discussion about adding edges/modifying networks.

        • DD – Is this for ligand networks? Alchemical networks are immutable.
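The failure mode JS describes above (a planned network that is fully connected, but becomes disconnected once failed simulations are dropped) can be checked with a simple graph traversal. The following is a minimal sketch only, using plain Python rather than any gufe, openfe, or alchemiscale API; the ligand names, edge lists, and function names are hypothetical and chosen for illustration. It compares a spanning-tree-style network against one with redundant edges when the same edge fails.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search from a node not yet visited.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def surviving_components(nodes, edges, failed_edges):
    """Drop failed edges (direction-agnostic) and return the remaining components."""
    failed = {frozenset(edge) for edge in failed_edges}
    surviving = [edge for edge in edges if frozenset(edge) not in failed]
    return connected_components(nodes, surviving)


# Hypothetical four-ligand example (names are placeholders, not real data).
ligands = ["A", "B", "C", "D"]
mst_edges = [("A", "B"), ("B", "C"), ("C", "D")]        # tree: no redundancy
redundant_edges = mst_edges + [("A", "C"), ("B", "D")]  # extra cycle-closing edges
failed = [("B", "C")]                                   # one edge that never completes

mst_parts = surviving_components(ligands, mst_edges, failed)
redundant_parts = surviving_components(ligands, redundant_edges, failed)

print(len(mst_parts))        # 2: the tree splits into {A, B} and {C, D}
print(len(redundant_parts))  # 1: redundant edges keep the network connected
```

A watcher along the lines of the "network healing" idea could run a check like this whenever an edge is marked as failed and flag the network for intervention when more than one component remains. How failed edges would actually be identified from alchemiscale results is left open here.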

    • compute resources status

    • current stack versions:

      • python 3.12

      • alchemiscale: 0.5.0

      • neo4j: 5.22

      • gufe: 1.0.0

      • openfe: 1.0.1

      • feflow: 0.1.0

        • IP – I should write a quick notebook on how to run feflow on alchemiscale

        • DD – That’d be great. Could link to alchemiscale docs and point interested users in that direction.

      • openmmforcefields: 0.14.1

      • openmm: 8.1.2

  • DD : alchemiscale roadmap 2025: possible components

    • additional FAH protocols in alchemiscale-fah

    • parallel execution of ProtocolDAGs on conventional compute, GPU saturation e.g. for feflow.NonEquilibriumCyclingProtocol

    • merging and copying AlchemicalNetworks in alchemiscale server

    • additional Strategy implementations beyond NetBFE, e.g. CBayesMBAR

    • compute autoscaling for HPC, Kubernetes clusters

      • JW – Selfishly, this would be really nice for me, since this is exactly what I do manually

      • JS – Agree. Helps because of the difficulty of orchestrating compute over time zone difference, so this can help a lot with turnaround.

      • DD – This would be handy for me too. If we always have a backlog this won’t be that important, but if we don’t it’ll help a lot.

    • support for result file retention and retrieval?

    • user-facing dashboard, visualizer?

      • IP – Progress on this?

      • DD – (Some specific ideas here, could serve users really well) (recording ~45 mins)

    • admin-facing dashboard, management interface (e.g. adding/removing users, scopes, permissions)

    • automated, online FE estimates for transformations, whole-network estimators?

    • compute cost estimate for planned networks based on previously-executed networks?

    • others?

    • JS – Pulling down final snapshots from simulation output

      • DD – This would be somewhat protocol-dependent. So alchemiscale-adjacent, but would need to be implemented in the protocol.

      • IP – There’s also a question of how to handle paths in protocols.

      • DD – Currently, if a protocol returns paths as part of its return values, we don’t do anything with them. So there would need to be some new functionality to handle actual files. Would be a really big refactor though.

      • IA – Is this related to DS’s storage management stuff?

        • DD – It is, we have an open PR on alchemiscale (#104) that stalled out because of coordination costs and other priorities. So we could implement something about storage in the GUFE layer (GUFE #186), that alchemiscale later uses. There’s also a possibly-simpler alternate proposal in alchemiscale #180.

        • IA – Last I heard from DS, this was “working in protocols” but I don’t know exactly what this means.

        • DD – There were a few levels of storage being implemented - this added a “permanent” storage layer that long-lived files could be added to, but nothing interfaced with this.

    • IP – Is extending protocols in scope?

      • DD – Alchemiscale already supports extends, but the protocol also needs to implement it.

      • IA – From the OpenFE standpoint, this MAY happen. Not this year but possibly future years.

      • DD – There’s kinda a difference in philosophy between noneqcyc and repex, where, if you run a bunch of NECs, you can combine them to get an uncertainty that narrows. But with repex, you don’t get convergence, you get finer resolution on the distribution of the outputs. Not sure if this is planning to change, and if extends support would play into this.

      • IA – Short version - Industry partners want to be able to restart sims. I’m thinking that restarting and extending may be neighbors.

      • DD – Is this in the case of a sim in the edge failing or something else?

      • IA – A sim in the edge failing midway through.(?)

      • DD – (see recording ~42 mins)

  • DD : proposal: reorganize alchemiscale project coordination

    • migrate alchemiscale repo under OpenFreeEnergy Github org

      • question: migrate alchemiscale-fah as well?

        • IA – Is there a separate set of logistical challenges to migrating this?

        • DD – Not really. It’s currently in OpenFF org.

        • JW – I’m fine with migrating to OpenFE. I’d just like to keep write access.

        • IA + JE – You should already have it at an org level, and if you don’t have access to these we’ll give it to you.

    • create alchemiscale channel in OpenFE Slack for developer communication

    • create alchemiscale.org channel for users of that production instance, announcements related to instance issues, deployments

    • use Github Discussions as a hub for user questions that fall outside of issues, are more to do with usage questions

    • host working group under OpenFE org, use infrastructure for tracking meeting notes

      • will share read-only meeting notes link publicly via Github Discussions

    • JW – This all sounds A-OK by me. Maybe also share the recording with the notes.

    • JE – I think this is a great idea and am in support. There might be some logistical hurdles we have to clear in terms of taking advantage of OpenFE’s resources (e.g. the public/private split on OpenFE’s Confluence), so the migration may take some coordination.

      • DD – That’s fine, we can do this in stages.

      • JE – We will also need to have some internal OpenFE discussion about other ramifications.

    • DD – We’d also migrate the project board to openfe - Will make the tagging/work assignment more straightforward.

      • JW – That’s fine by me.

  • DD : alchemiscale-fah live test performed with FAH volunteers

    • executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction, corresponding to 1900 FAH work units

    • results on FAH comparable to equivalent compute performed on NRP for each edge

    • (General) – There’s something really weird about this range. DD will follow up with IA. But overall the consistency between NRP and FAH is very promising.



Action items

Decisions