2023-06-20 alchemiscale Working Group meeting notes

Participants

  • @Mike Henry

  • Levi Naden

  • @David Dotson

  • @David W.H. Swenson

  • @Irfan Alibay

  • Jenke Scheen

  • Meghan Osato

  • @Jeffrey Wagner

Recording: https://drive.google.com/file/d/1_6jH9wnFG_EUm76PT3JMdWJUpDFvyCq2/view?usp=sharing

Goals

  • alchemiscale.org user group

    • user questions / issues / feature requests

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.2

      • gufe: 0.7.3

      • openfe: 0.7.4

      • perses: protocol-neqcyc

  • JC : follow-up on extends support in RelativeHybridTopologyProtocol and NonEquilibriumCyclingProtocol

  • JC + JS : follow-up on tools needed for AlchemicalNetwork generation?

  • IP : Protein-ligand benchmarks working group update

  • alchemiscale development : sprint inflight, spans 6/14 - 6/26.

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.1.3 milestone:

    • alchemiscale 0.2.0 milestone:

    • updates on In Review, In Progress, and Available cards

  • new discussion items from ASAP roadmap: ROADMAP: Computational Chemistry Core alchemiscale-related roadmap | Notion

Discussion topics

Notes

  • alchemiscale.org user group

    • user questions / issues / feature requests

      • JS – How are my jobs looking?

        • DD – I think they’re now complete

        • JS – Hm, I can’t pull results. Probably a technical thing.

        • We’ll try live-debugging this at the end of the meeting.

      • DD – The “TYK2 tutorial” jobs have been running for several days - IA, could you take a look?

        • IA – Yes

      • MO – I’ve submitted a few jobs. There are some errors I need to debug, just getting through finals so I should have time soon. Seems like the LOMAP thing that IA gave me probably worked, I just need to dive into the others.

      • DD – We have lots of resources, so please don’t hesitate to submit more jobs!

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.2

      • gufe: 0.7.3

      • openfe: 0.7.4

      • perses: protocol-neqcyc

  • JC : follow-up on extends support in RelativeHybridTopologyProtocol and NonEquilibriumCyclingProtocol

    • JC, IA, and DS are meeting tomorrow.

    • IA – We met with JC but it feels like a long time ago. We need to independently work on implementation. JC may get back to DD about what is needed to extend from the alchemiscale starting point.

    • IA will link notes here: https://docs.google.com/document/d/1Rik5mKFYWI14Nf90hGeOp-KOztnyKShqY3SIGD_JEjo/edit?usp=sharing

      • IA – This isn’t super high priority stuff, and it will require some structural changes. So I don’t think these changes will be made until early 2024. I think IP and JC will work separately on extending noneq cycling since those are distinct components

      • DD – There is a distinction between restarting and extending. Does extending really require changes in OpenMMTools?

      • IA – Right now we interact with netcdf/file system using openmmtools, but that may change with the new file/distributed storage structure… (see recording ~13 mins)

      • DD – Gotcha, so we want to have something better planned/architected than a quick hack to get it working.

      • DS – There was some thought that the data being transferred here could go through the protocoldagresult, which could fatten some objects.

      • DD – protocoldagresults are in the object store, not neo4j, so that shouldn’t affect db performance. But users who pull those results for other purposes will suffer from the additional bloat.

      • IA – OpenFE needs to discuss internally to determine how our resources should be allocated on this front

      • DD – Noneqcycling protocol handles result gathering in a different way than relativehybridtopology protocol, so that may change things a bit.

      • … (see recording ~15 minutes)

  • JC + JS : follow-up on tools needed for AlchemicalNetwork generation?

    • JS – I met with IA and RG, had a helpful discussion, aligned with what we’re doing with ASAP on FE calcs. Main goal was to see if we were duplicating work. I think there’s good separation already (largely from coordinating in this weekly meeting). Main outcome was that, for us, it’d be useful if there was a ligand-network-generating algorithm. So it seemed like low-hanging fruit for us to implement a version of LOMAP, possibly with additional optimization. RG has a PR to get LOMAP in…

    • IA – LOMAP should be available soon. I can’t recall if RG started work on fastLOMAP.

    • JS – We’ll be looking to see whether fastLOMAP produces the same results as LOMAP.

    • DD – So, LOMAP does both atom maps and network generation?

    • JS – Mostly network generation, I don’t think anyone uses the atom maps that come out. Perses has another way to make atom maps but neither OFE nor perses devs use it.

  • IP : Protein-ligand benchmarks working group update

    • JS – We primarily want to identify the main goal of the PLB repo. Right now we’ll primarily meet with the academic folks about PLB and in a later stage we’ll pull in industry.

    • IA – Re: when? I think we said we wouldn’t meet until we get 0.3 finished. That’s blocked by testing our network generation, which will require newer GUFE and OpenFE being deployed on alchemiscale.

      • bottleneck is alchemiscale 0.1.2 with old gufe/openfe; need new deployment to unblock 0.3.0

  • alchemiscale development : sprint inflight, spans 6/14 - 6/26.

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.1.3 milestone:

    • alchemiscale 0.2.0 milestone:

    • updates on In Review, In Progress, and Available cards

      • Milestone 0.1.3 - Point release in response to initial user issues

      • DD – PR for addressing #126 is ready for review, improves API points. HMO gave me a review so I’ll try to merge later today.

      • DD – Also trying out some approaches for async retrieval (alchemiscale 126).

    • In progress

      • MH – Alchemiscale 130 – I’m working on a backup solution and the docs for it. I think I can set it up as a periodic process, and we’ll need to plan some downtime. We may also be able to hook it up to a Slack channel and email notifications.

        • DD – Estimate for time it takes to bring down DB, make backup, and bring DB back up?

        • MH – I’d have to check. Will need to plump up a DB with real-looking data to get realistic timing.

        • DD – Happy to test this in prod. It should be fine to bring the system down for 30-60 seconds.

        • MH – Yeah, with worker retry logic this should be stable.

        • DD – Agree

      • DD – #140 – Taking this work up now. This is about retrieving results more quickly by making hundreds of calls to get protocoldag objects at the same time. Prototypes in #148. Will be applying this to result retrieval. With that and a faster status retrieval, we should resolve 126 and 140.

      • DD – IA, Has 129 been a problem for you?

        • IA – Submitting the tutorial took a few minutes. I always submitted from my workstation in the UK. But if JS doesn’t have a problem then it might just be my network connection.

        • JS – takes about 2 mins for me.

        • DD – If it’s measured in minutes that’s a bit slow.

        • JS – Would be nice to have some sort of feedback so I know things are happening.

        • DD – Unfortunately it’s an all-or-nothing operation so we couldn’t show a meaningful progress bar. But it would be good to show some sort of UX to indicate things are happening.

        • JS – Possible to estimate upload time?

        • DD – Not really, internet transfer estimation is hard. I’ll see what I can do here before the release but I won’t let it slow the release.

      • DD – 0.2.0 release will come later. Not focusing on that now.

    • JW – Rough estimate for need for F@H (or are current resources sufficient?) and if we do need it, roughly what quarter do we expect implementation to start/end?

      • DD – JS/ASAP will need F@H regardless of OpenFF needs. Right now, having two compute sources is kinda nice, but it’ll be nice to have more.

        • JS – JC is applying to ACCESS cluster.

      • DD – Anticipating beginning work on F@H in late summer/early fall. For now we want to focus on releasing alchemiscale on the current schedule with identified issues. I think 0.3.0 milestone could happen in parallel to F@H work.

      • JW – Current runtime for running PLB on a new FF?

        • IA – The 2.1.0 release tests took about 3 days, though that was just on a subset (8 targets) of PLB

      • IA – Vsites will be hard, OFE needs implementation time and convincing if you want to go for this.

        • JW – Agree, I’ll be in charge of watching vsite FF progress and will start this conversation when the time is right.

      • DD – Will IA continue being the submitter for this? I think OFF could use their own submission manager.

        • IA – I can do small jobs

        • JW – MOsato will be taking over on the OpenFF side.

      • DD – And we could have something similar to qca-dataset-submission?

        • JW – Yup! that’d be great.
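
Illustrative sketch for the #140 item above (making hundreds of retrieval calls at the same time): the general pattern can be shown with Python's asyncio. This is not alchemiscale's implementation or API. `fetch_result`, `fetch_all`, `retrieve`, the `dag-N` keys, and the concurrency limit of 50 are all hypothetical names and values chosen for the example, and the network call is simulated with a short sleep.

```python
import asyncio


async def fetch_result(key: str) -> dict:
    """Hypothetical stand-in for one network call returning a single result object."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"key": key, "payload": f"result-for-{key}"}


async def fetch_all(keys, max_concurrency: int = 50):
    """Fetch every key concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(key):
        # The semaphore caps how many requests run at once
        async with semaphore:
            return await fetch_result(key)

    # gather returns results in the same order as the input keys
    return await asyncio.gather(*(bounded_fetch(k) for k in keys))


def retrieve(keys, max_concurrency: int = 50):
    """Synchronous entry point, so callers do not need to manage an event loop."""
    return asyncio.run(fetch_all(keys, max_concurrency))


if __name__ == "__main__":
    keys = [f"dag-{i}" for i in range(200)]
    results = retrieve(keys)
    print(len(results), results[0]["key"])
```

Bounding concurrency with a semaphore keeps a large batch from opening hundreds of connections at once, which may matter for the server as much as for the client. The right limit would need to be found by testing against a real deployment.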

Action items

Decisions