2024-09-03 alchemiscale : dev group meeting notes

Participants

  1. @David Dotson

  2. Ian Kenney

  3. @Irfan Alibay

  4. @John Chodera

  5. @Joshua Horton

  6. @Jeffrey Wagner

 

Meeting recording: https://drive.google.com/file/d/175OiPVu33qgrre6HIG3yDj0WbjHp75kL/view?usp=sharing

Goals

  • alchemiscale roadmap 2024

    • Q1 : complete “living networks” performance improvements

    • Q1 : Folding@Home compute services deployed in production

      • finish MVP, with integration test suite by 2024.06 (previously 2024.03)

      • perform FAH tests with volunteers during 2024.08 (previously 2024.04, 2024.06, 2024.07)

        • public work server up by 2024.09.15 (previously 2024.03.15, 2024.06.11, 2024.07.19, 2024.07.31)

        • confidential work server up by 2024.09.30 (previously 2024.04.01, 2024.07.01, 2024.07.31)

    • Q3 (previously Q2) : develop Strategy structure, initial implementations

      • currently in design phase

    • Q3 : enable automated Strategy execution by mid Q4, 2024 (2024.11.15); previously end of Q3, 2024 (2024.10.01)

      • performing design in parallel with Strategy structure above

  • alchemiscale-fah 0.1.0 release imminent

    • finishing out remaining changes noted from live test here:

  • DD : reorganizing alchemiscale project coordination

    • will migrate alchemiscale repo under OpenFreeEnergy GitHub org

      • will migrate alchemiscale-fah as well

    • created alchemiscale channel in OpenFE Slack for developer communication

    • created alchemiscale-org-instance channel for users of that production instance, announcements related to instance issues, deployments

    • will use GitHub Discussions as a hub for user questions that fall outside of issues and are more about usage

      • can always convert to issues, and vice versa

    • host working group under OpenFE org, use infrastructure for tracking meeting notes

      • will share read-only meeting notes link publicly via GitHub Discussions

      • will migrate once OpenFE has a solution in place for this

  • DD : will move deployment envs outside of alchemiscale repo, host in new alchemiscale.org-deployments repo (in OpenFE org)

    • decouples development of alchemiscale from specific stack choices for alchemiscale.org

    • would still publish Docker images, but would be built from alchemiscale.org-deployments repo

  • DD : proposal: alchemiscale.org website

  • DD : large ProtocolDAGResults produced by RelativeHybridTopologyProtocol

  • alchemiscale development : new sprint spanning 9/4 - 9/17

    • now focusing effort on milestones for alchemiscale v0.5.1 and v0.6.0 releases

    • aim is to complete 0.5.1 fairly quickly, and produce a 0.6.0 release with major new features by the end of the month

    • coordination board : alchemiscale : Phase 3 - Folding@Home, new features, optimizations, targeted refactors

Discussion topics

Notes

  • alchemiscale roadmap 2024

    • Q1 : complete “living networks” performance improvements

    • Q1 : Folding@Home compute services deployed in production

      • finish MVP, with integration test suite by 2024.06 (previously 2024.03)

      • perform FAH tests with volunteers during 2024.08 (previously 2024.04, 2024.06, 2024.07)

        • public work server up by 2024.09.15 (previously 2024.03.15, 2024.06.11, 2024.07.19, 2024.07.31)

        • confidential work server up by 2024.09.30 (previously 2024.04.01, 2024.07.01, 2024.07.31)

      • DD – HMO is working on deploying changes to the OMM core that allow minimization-on-user-machine.

    • Q3 (previously Q2) : develop Strategy structure, initial implementations

      • currently in design phase

      • IK – Feeling unwell, will present on this next week

    • Q3 : enable automated Strategy execution by mid Q4, 2024 (2024.11.15); previously end of Q3, 2024 (2024.10.01)

      • performing design in parallel with Strategy structure above

  • alchemiscale-fah 0.1.0 release imminent

    • finishing out remaining changes noted from live test here:

    • DD – Once we get this done we’ll move ahead with release and larger tests

  • DD : reorganizing alchemiscale project coordination

    • will migrate alchemiscale repo under OpenFreeEnergy GitHub org

      • will migrate alchemiscale-fah as well

    • created alchemiscale channel in OpenFE Slack for developer communication

      • IA – OpenFE Slack is on the open source tier, can support hundreds of users

    • created alchemiscale.org channel for users of that production instance, announcements related to instance issues, deployments

    • will use GitHub Discussions as a hub for user questions that fall outside of issues and are more about usage

      • can always convert to issues, and vice versa

    • host working group under OpenFE org, use infrastructure for tracking meeting notes

      • will share read-only meeting notes link publicly via GitHub Discussions

      • will migrate once OpenFE has a solution in place for this

      • DD – JW, is it OK to continue using OpenFF Confluence until OpenFE has their knowledge management solution hammered out?

        • JW – Yes

      • JC – Will there be a public web presence other than the GH page?

        • DD – Agenda item on this later.

      • IA – On the existing free-energy-benchmarking slack channel, I have plans to keep OMSF-wide benchmarking coordination going there.

      • JC – Ariana Brenner-clerken is a postdoc that I’ll be bringing on for a year, and her job will be automating curation of protein-ligand binding datasets. Will interface with OpenFE and OpenFF. Start date is December.

        • DD – I recall that you had a vision that we’d focus not just on the artifacts themselves but more on the PROCESS of making artifacts. I assume she’ll want to interface with this group?

        • JC – That would be one way to do it, or we could create a separate meeting (though I know we don’t want more meetings)

        • DD – I’d recommend keeping it in free-energy-benchmarking channel.

        • IA – A lot of what we do in these meetings is benchmarking coordination. Right now these meetings are a mix of benchmarking, alchemiscale, and OpenFE. I’m open to suggestions for how to re-organize.

        • DD – Good point, we have many groups like ASAP who use these meetings as a touchpoint. So one thought would be to keep things roughly the same, and if the agenda starts regularly filling up, we could spin some topics off into another meeting.

        • IA – Yeah, that would work. The alchemiscale meetings currently sometimes turn into OpenFE support time. …

        • DD – I kinda support an organic approach.

        • IA – Kinda depends on how JC’s postdoc ends up working.

        • JC – I recommend starting on the current set of meetings/working groups, and watch where the time gets eaten up and plan meeting reorganizations then.

  • DD : will move deployment envs outside of alchemiscale repo, host in new alchemiscale.org-deployments repo (in OpenFE org)

    • decouples development of alchemiscale from specific stack choices for alchemiscale.org

    • would still publish Docker images, but would be built from alchemiscale.org-deployments repo

  • DD : proposal: alchemiscale.org website

    • JC – Love the idea, I think this is essential to sustainability. Showing who’s using it and what it’s for will be excellent.

    • DD – Ok if we make this repo as part of the move to OpenFE?

      • IA – Yes, make as many repos as you like. We may eventually need to add some repo badges to identify which repos are being worked on by OpenFE project staff vs. other folks.

  • DD : large ProtocolDAGResults produced by RelativeHybridTopologyProtocol

    • IA – I haven’t had much time to look at this. I suspect the numpy array solution is the easiest/best. But I’m not sure how well it’ll compress. Still concerned this will lead to big outputs.

      • DD – I tried a few things and don’t think it compresses very well. The size of this thing when serialized is 32MB (uncompressed). If we do KeyedChain representation, it takes a long time and only goes down to 27MB. If we do zstandard compression we get down to 12MB. Could test messagepack but would need to do lots of work. If we convert the numpy arrays down to a bytestring and bzip, it’s 19MB.

      • JC – What is this data?

        • IA – Things like pairwise RMSDs. Things from openfe-analysis, that gather additional data beyond what’s standard. It’s here because there was an idea that putting it into the results object makes it accessible from alchemiscale.

        • JC – What are the objects being stored?

        • DD – It’s an RMSD for each component for each replica, e.g. for 11 replicas, there are 11 x (n_iterations) x (n_components, generally 2) values, then it’s stored to JSON. The JSON storage is probably where a lot of bloat is added.

        • JC – So something like messagepack, which is built for storing binary data, would be expected to be quite good here.

        • IA – I guess it depends on how much of an overhaul of GUFE we want to do.

        • DD – Right, every attribute of a GUFE tokenizable must be json serializable. So that means we could ALSO support messagepack. So we’d need to write a messagepack serializer in addition to the standard json serializer. I’ll open an issue to explore this.

        • IA – Makes sense. In the shorter timeframe, if this is a blocker for using alchemiscale, then we should remove/disable it. Could instead move results objects to .npz files.

        • DD – If users are running locally, they should have access to these files.

        • … (see recording ~33 mins)

        • IA – Unlikely anyone relies on this since it just came out a few weeks ago.

        • DD – Wonder how to best announce removing this.

        • … (recording through ~40 mins)

        • IA – How big of a blocker is this right now?

        • DD – We have some workarounds, but they’re not pretty. Currently takes a while to retrieve results.

        • IA – From my end, I can triage this on Thursday and estimate how much work is necessary, then I’ll reach out to DD.

        • DD – Mostly curious whether this will break existing ProtocolDAGResults.

        • IA – Good point. I guess we might need to do a migration.

        • DD – Depends on how we do it, I’d need to see the implementation to know.

  • alchemiscale development : new sprint spanning 9/4 - 9/17
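
As a rough illustration of the serialization-size point in the ProtocolDAGResults discussion above, here is a minimal sketch comparing JSON text against packed binary for an array shaped like the one described (replicas x iterations x components). It uses only the Python standard library and synthetic random values; the function names are hypothetical and this is not gufe or alchemiscale code. Random noise compresses poorly, so the sizes it prints will not match the figures quoted in the meeting and say nothing definite about real RMSD data.

```python
import bz2
import json
import random
import zlib
from array import array


def synthetic_rmsd(n_replicas=11, n_iterations=500, n_components=2, seed=0):
    """Nested lists of floats shaped (n_replicas, n_iterations, n_components).

    Stand-in data only: uniform random values, not real RMSDs.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(0.0, 5.0) for _ in range(n_components)]
         for _ in range(n_iterations)]
        for _ in range(n_replicas)
    ]


def flatten(nested):
    """Flatten the three-level nested list into one list of floats."""
    return [x for replica in nested for frame in replica for x in frame]


def encoded_sizes(nested):
    """Byte counts for a few encodings of the same numbers."""
    as_json = json.dumps(nested).encode("utf-8")
    packed = array("d", flatten(nested)).tobytes()  # 8 bytes per float64
    return {
        "json": len(as_json),
        "json+zlib": len(zlib.compress(as_json)),
        "float64 bytes": len(packed),
        "float64 bytes+bz2": len(bz2.compress(packed)),
    }


def roundtrip(nested):
    """Pack to bytes, unpack, and rebuild the original nested shape."""
    n_rep, n_it, n_comp = len(nested), len(nested[0]), len(nested[0][0])
    packed = array("d", flatten(nested)).tobytes()
    flat = array("d")
    flat.frombytes(packed)
    values = iter(flat)
    return [
        [[next(values) for _ in range(n_comp)] for _ in range(n_it)]
        for _ in range(n_rep)
    ]


if __name__ == "__main__":
    data = synthetic_rmsd()
    for name, size in encoded_sizes(data).items():
        print(f"{name:>18}: {size} bytes")
```

The packed form costs a fixed 8 bytes per value, while JSON spends roughly one byte per printed digit plus separators and brackets, which is the bloat DD refers to. A binary-capable format such as messagepack could carry the packed bytes directly, whereas embedding them in JSON would need an extra text encoding step.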



Action items

@David Dotson will create a proposal for msgpack support in gufe; use this as a basis for experimenting to get smaller serialized artifacts for numerics

Decisions