2024-03-19 alchemiscale : dev group meeting notes

Participants

  • @David Dotson

  • Ian Kenney

  • @Iván Pulido

  • @Irfan Alibay

  • @John Chodera

  • Meghan Osato

  • @Mike Henry

  • @Jeffrey Wagner

  • @Matt Thompson

Meeting recording: https://drive.google.com/file/d/1x7iga_p-VVyNJHnwnGofQv8g7yxLdsWs/view?usp=sharing

Goals

  • DD : alchemiscale roadmap

    • Q1 : complete “living networks” performance improvements

      • nearly complete; will appear in release 0.4.0

    • Q1 : Folding@Home compute services deployed in production

      • finish MVP, with integration test suite, by 2024.04 (revised from 2024.03)

        • this is delayed; need an additional 2 weeks to finish this out

      • perform FAH tests with volunteers during 2024.04

        • public work server up by 2024.04.15 (revised from 2024.03.15)

        • confidential work server up by 2024.05.01 (revised from 2024.04.01)

    • Q2 : develop Strategy structure, initial implementations

    • Q3 : enable automated Strategy execution by end of Q3, 2024 (2024.10.01)

  • JW – Thanks a billion DD and JC for jumping on the NRP slides on Friday. I super duper appreciate it!

  • DD : alchemiscale-fah : NonEquilibriumCycling execution demo

  • MH : status of openfe + gufe 1.0 testing with alchemiscale main

  • DD : necessary to deploy current release of openfe + gufe?

  • alchemiscale development : new sprint spanning 3/20 - 4/1

  • DD : aiming for release 0.4.0 by end of this sprint

    • includes a ton of performance improvements from Ian Kenney, support for needs in alchemiscale-fah from David Dotson

    • will substitute feflow in place of perses

  • IP: feflow release plans?

Discussion topics

Notes

  • DD : alchemiscale roadmap

    • Q1 : complete “living networks” performance improvements

      • nearly complete; will appear in release 0.4.0

      • JC – JS is at ACS right now, but he and JH will be the initial users, so it’ll be good to keep them in the loop to make sure they’re building tools to prepare and submit systems. E.g., “let’s do FE calcs between a bunch of xtal structures and generate the edges between them.” So the work is probably mostly at the mapping and network-planning level.

      • IA – BR is working on network planning with Konnektor: https://github.com/OpenFreeEnergy/konnektor

      • DD – And JH and JS are aware of this. Though doing multiple xtal structures will be new territory. Will follow up with them on this.

    • Q1 : Folding@Home compute services deployed in production

      • finish MVP, with integration test suite, by 2024.04 (revised from 2024.03)

        • this is delayed; need an additional 2 weeks to finish this out

      • perform FAH tests with volunteers during 2024.04

        • public work server up by 2024.04.15 (revised from 2024.03.15)

        • confidential work server up by 2024.05.01 (revised from 2024.04.01)

    • Q2 : develop Strategy structure, initial implementations

    • Q3 : enable automated Strategy execution by end of Q3, 2024 (2024.10.01)

  • JW – Thanks a billion DD and JC for jumping on the NRP slides on Friday. I super duper appreciate it!

    • JC - Happy to help, looks like we won some goodwill.

    • DD – Agreed. It’s super important to have these resources available.

  • DD : alchemiscale-fah : NonEquilibriumCycling execution demo

    • (General) – The compute service needs to run setup for each ProtocolDAG. That setup could be slow because it includes minimization + equilibration. So one idea is to pack min+eq into the integrator we send to F@H, so that it happens there instead.

    • JW – What is the latency on F@H?

      • JC – Depends on the weight you’ve been assigned. There’s a ~1 minute lag between assignment servers and work servers, so things move at a pretty good clip. When a job goes out you can also set a deadline that determines how long it takes to get rescheduled. So if people stop volunteering their compute mid-run, it may take a little while to figure out that the job needs rescheduling, and there are parameters for how urgently to retry. So it may take a while to get a network totally complete.

      • DD – And we’re submitting multiple tasks to ensure that we get enough sampling/overlap even if some volunteer hosts drop off.

    • IP – We are changing some things in feflow to support GUFE and openfe 1.0. Do you foresee major changes would be needed here?

      • DD – Probably. I’d be happy to review PRs that you put through to update alchemiscale.

  • MH : status of openfe + gufe 1.0 testing with alchemiscale main

    • MH – We just got iris unblocked 2 hours ago, so we have a ton of jobs running in the queue (109 right now). As for QA, TYK2 and some HFE stuff look fine, with no software errors/crashes due to our code. So I think it’d be nice to deploy the rc to alchemiscale. This would also let us use a newer OpenFF toolkit. The big concern is that this might change things for users like alchemiscale, so deploying it would let us test even more, with more user traffic.

    • DD – I don’t think we know whether asap-alchemy will work immediately with openfe and gufe 1.0. So could you try hooking up asap with the new version and having JS fire some submissions at it?

    • MH – Ah, I see, that might be hard to test. Like, it’s not as simple as just running CI or something.

    • DD – I’ll take this on.

    • MH – The changes are mostly small, like name changes.

    • DD – And did you migrate an existing Neo4j database to version 5?

    • JW – I don’t want to

    • IA – One thing to consider is that there have been a lot of changes to openfe since 0.15.

    • JC – doesn’t it make more sense to run all the FFs you want to compare with the current software stack + benchmark systems?

      • JW – yeah, coming to that conclusion as well

      • MO – Before this came up, DM wanted me to run 2.2 and back-compare to IA and HB’s results from when they benchmarked 2.1 last March. If the infrastructure turns out to be considerably different, then we can run both 2.1 and 2.2 on the same stack.

      • IA – For the systems that we’ve chosen, the changes that we’ve seen are within error. But there are two benchmarks that can be done: the legacy dataset (run by Melissa), which we should rerun to see if there are significant changes, and the new version (upcoming 0.3 release)…

      • DD – Maybe let’s just run both, we have lots of bandwidth.

        • JC + JW – Agree

      • IA – Issue is timing - If we

        • JW – Benchmarking openff-2.1 vs 2.2

        • IA – OpenFE’s plan for benchmarking is to run new versions of PLB. Changing software stacks may affect OpenFF validation. At the same time, the original plan was for HB to run 2.1 and MO to run 2.2 and then cross-compare… But if we run multiple versions

        • MO – In talking with DM on Friday, he didn’t think we needed to rerun 2.1 benchmarks to do 2.2.

      • JW – Do these necessarily compete for resources?

        • MH – Kinda. Lilac and OSG and iris are shared.

        • (see recording, 55 minutes onwards)

        • Decision – MO will benchmark with FF modification hack, using OpenFE 0.14 and OFFTK 0.13.2. We won’t update the legacy alchemiscale client docker image. MO will benchmark with the 5 targets discussed with DM.

  • DD : necessary to deploy current release of openfe + gufe?

  • alchemiscale development : new sprint spanning 3/20 - 4/1

  • DD : aiming for release 0.4.0 by end of this sprint

    • includes a ton of performance improvements from Ian Kenney, support for needs in alchemiscale-fah from David Dotson

    • will substitute feflow in place of perses

  • IP: feflow release plans?

    • IP – We finished the hybridtopologyfactory migration and testing. We’re now working on PR #38, supporting openfe and GUFE 1.0.

    • IA – I will be looking at working out an API for units, so we can migrate the HREX(?) stuff. Hopefully that doesn’t affect…



Action items

@David Dotson will follow up with Jenke Scheen and @Joshua Horton on making tools for building living networks available in time for the alchemiscale 0.4.0 deployment
@David Dotson will create PR for asap-alchemy testing against openfe 1.0 + gufe 1.0

Decisions