2024-07-02 alchemiscale : user group meeting notes

Participants

  • @Irfan Alibay

  • @Jeffrey Wagner

  • Ian Kenney

  • @David Dotson

  • Meghan Osato

  • @Mike Henry

  • @James Eastwood

 

Meeting recording: https://drive.google.com/file/d/1e4wTY4YYEESWYsJSMGT7U5QhDDh6mZDV/view?usp=sharing

Goals

  • alchemiscale.org

    • user questions, issues, requests

    • compute resources status

    • current stack versions:

      • alchemiscale: 0.4.0

      • neo4j: 5.18

      • gufe: 0.9.5

      • openfe: 0.14.0

      • perses: protocol-neqcyc

      • openmmforcefields: 0.12.0

  • DD: working with IK on finishing out alchemiscale release 0.5.0

    • includes:

      • openfe + gufe 1.0 compatibility

      • Folding@Home compute support

      • feflow inclusion, drop of perses

    • will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users

      • current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned

  • DD : working on testing MIG splitting on Lilac A100

  • IA : feflow and settings changes that are backwards-incompatible

Discussion topics

Notes


  • alchemiscale.org

    • compute resources status

      • DD – Nothing running currently. Plenty of bandwidth for submission. We’ve completed 109k tasks since we started this instance.

    • current stack versions:

      • alchemiscale: 0.4.0

      • neo4j: 5.18

      • gufe: 0.9.5

      • openfe: 0.14.0

      • perses: protocol-neqcyc

      • openmmforcefields: 0.12.0

        • JW – I ran into PE and asked for a release with the fix for GAFF/AmberTools that should allow us to have the newest openmmforcefields with the latest AmberTools

        • DD – Is this the only reason we’re constrained to an old AT?

        • JW – It’s certainly one reason. Not sure if there are others. Will this require a new release of OMMFFs?

        • MH – We’ll need a new release to re-enable gafftemplategenerator support.

          • PR for openmm 8.1.2

          • Issue for openmm 8.1.2

          • openmmff issues:

        • ….


      • IA – For benchmarking, I’m wondering which versions of things our industry partners should use

        • MH – Right now we have a conda lock file that pins some deps in a platform-independent way. This uses the conda lock tool or micromamba directly. Here we can create a set of workers using these lock files to ensure things run with the right versions of deps.

        • DD – That’s a good point. So if the major versions are set then the data models should be consistent. Could deploy specific workers in two ways… but depends on which compute backend we use.

        • MH – Probably running on iris.

        • DD – Ok, that should be straightforward, don’t need to worry about docker images, can just make conda envs on HPC systems. I’ll talk with you offline about this.

        • IA – I haven’t thought about it much, but I wonder whether this should be wider than ASAP, i.e. a benchmark environment that everyone uses.

        • MH – I can create the lock file for this run including the alchemiscale versions and share instructions.

          • https://industrybenchmarks2024.readthedocs.io/en/latest/installation.html#conda-lock-file

          • lock files attached to the release


        • DD – I think this should be fine. Can coordinate separately on getting workers spun up.

        • MH – We’ll need to coordinate to ensure that I can create a coherent env with our lock file.


  • DD: working with IK on finishing out alchemiscale release 0.5.0

    • includes:

      • openfe + gufe 1.0 compatibility

      • Folding@Home compute support

      • feflow inclusion, drop of perses

    • will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users

      • current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned

  • DD : working on testing MIG splitting on Lilac A100 to act like 7 smaller GPUs - No updates today, but this is still on my to-do list.

  • IA : feflow and settings changes that are backwards-incompatible

    • IA – What are the timelines? It’d be good if feflow were stable by the time it is included for production. If we migrate now, it’ll be a big pain for anyone with simulations in flight, since the changes will be backwards-incompatible. So I’d like to wait until feflow 0.1 is out, but I don’t know whether it’ll be out in the next month-ish.

    • DD – I don’t consider feflow 0.1 to be a blocker. But I also don’t expect it to be stable - feflow is a young package and will undergo more changes. But we’ve communicated that results aren’t stable right now.

    • IA – If we look at the feflow 0.1 release, it’s not clear to me that this will be achievable by the end of the month.

    • DD – Agree, lots of people are out over the summer. And I’m not in a position to demand progress here - I can’t command people involved in this.

    • IA – So, I’m wondering whether feflow 0.1 is a blocker for alchemiscale 0.5. Because if so then we need to consider if we need to put more resources into this.

    • DD – I don’t think it’s a blocker, and I don’t think openfe has effort they could divert to this. So, given that we can’t get feflow 0.1 out in a short period of time, we shouldn’t consider it a blocker. So the big downside is that this we

    • IA –

    • … (see recording ~26 minutes)

    • JW – Since we don’t use protocols from feflow, I don’t have any objections to using feflow installed from a git branch/specific hash instead of a release.

    • IA – We could also ask IP for a release of whatever’s there at the moment.

    • DD – Especially since the version number begins with “0.”, I think it’d be fine to release whatever’s there now as 0.1. Is IP out of office?

      • MH – I think he’s working. Nothing on chodera lab calendar.

      • DD – I’ll reach out to IP

    • IA – Not urgent, but I’m wondering - Based on this, what timeframes do we see for alchemiscale 0.5?

    • DD – Trying to get it out as soon as possible. Now that I’m not travelling, I’ve got more time. I’ve spun up the dev host and built new images, and currently I’m running dev testing. So this is imminent as in “next week or so”.

      • JE – Cool, this shouldn’t be a problem.




Action items

@David Dotson will ask Ivan whether we can proceed with a 0.1.0 release of feflow from current main, and whether there are any fundamental blockers from his perspective on doing this

Decisions