2023-08-22 alchemiscale Working Group meeting notes

Participants

  • @David Dotson

  • Levi Naden

  • @David W.H. Swenson

  • Hannah Baumann

  • @Irfan Alibay

  • @Iván Pulido

  • @James Eastwood

  • Jenke Scheen

  • @John Chodera

  • Meghan Osato

  • @Mike Henry

  • Richard Gowers

  • @Jeffrey Wagner

Recording: https://drive.google.com/file/d/1XwRke_nSxpAZPbOpWzwuY0PXBW7Gi1b0/view?usp=sharing

Goals

  • alchemiscale.org user group

    • user questions / issues / feature requests

    • results to share?

    • compute resources status

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

      • perses: protocol-neqcyc

  • JW : future of openmmforcefields?

  • DD : update on multi-gpu execution for single ProtocolDAG

    • DS, RG, and DD will work on establishing labels on Protocols and ProtocolUnits communicating to executor its desired capabilities

    • DD and DS can work in parallel on alchemiscale and exorcist to make use of these labels to optimally allocate compute resources, and to communicate the allocation back down to the ProtocolDAG on execution

  • DD : user guide now up on alchemiscale latest docs on RTD

  • IP : Protein-ligand benchmarks working group update

  • DD: which GitHub org should FAHComputeService be developed under?

  • alchemiscale development : current sprint complete; next sprint runs 8/23 - 9/4

    • current aim is to complete release 0.2.0 by end of next sprint, including first round of documentation

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

    • review of Complete cards

    • updates on In Review, In Progress, and Available cards

  • new discussion items from ASAP roadmap: ROADMAP: Computational Chemistry Core alchemiscale-related roadmap | Notion

Discussion topics

Notes

  • alchemiscale.org user group

    • user questions / issues / feature requests

      • IP – Message from Sweta - what’s going on with lilac queues?

        • DD – We can’t run on pre-emptible queues for now. The GPU usage was somehow interacting with how LSF does job pre-emption, and the GPUs weren’t getting pre-empted. So FYI to ASAP folks: you’re not running on lilac preemptible right now.

        • JW – Right, OpenFF had to apply for PRP and we used the word “open” in our justification. So if ASAP has a different policy I’d feel weird about putting ASAP jobs in our namespace. But ASAP folks would likely get an allocation if they applied!

        • DD – I’ll ping JW about application process on behalf of ASAP.

    • results to share?

      • IA – I have some, but they’re not pulled up right now.

        • (IA will schedule meeting with IP to discuss recent results)

      • DD – This would be useful for me to collect feedback for future improvements.

    • compute resources status

      • DD – (Impressive stats, see recording ~15 minutes)

      • JC – Do we have metrics/statistics about how long jobs are taking?

        • DD – We do have GUFE recording start times and end times for ProtocolUnits, so the ProtocolDAGResult can be inspected to get runtimes. I can also produce throughput metrics for different resources (tasks over time).

        • JC – We should keep our eyes open for opportunities to add performance measurement tooling to guide future rounds of optimization.

        • JC – Would it be possible to have a GUFE/alchemiscale object to harvest something like a dict of {GPU: performance_metrics}?

        • DS – I think standardizing across multiple protocols will be difficult and not worth the time. You’re free to add that into your protocol.

        • JC – …

        • IP – How are we with getting mid-run info? Openmmtools has a yaml that you can inspect mid-run to see progress

        • DS – GUFE 186 should have some of what we want there.

        • JC – May be a simple stopgap solution - we can add arbitrary dicts into result objects. So we could take the openmmtools yaml and put that into the result dict.

        • DD – Right, that would be the easiest path

        • IP – And this would actually get pulled into the OpenFE work, since they use openmmtools.

        • JC – We can take this offline, but it’d be helpful to clarify that there’s a “statistics hole” where we dump performance info that we can access later.

        • DD – IP or JC, could you open an issue in openfe/openfe to collect thoughts on this?

          • JC – Yes, will do

        • DS – If you’re thinking of collecting info on different steps in the process, then the current steps should work, but the current steps could be made more modular to get more fine-grained info.
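A minimal sketch of the runtime inspection and the “statistics hole” idea discussed above. It assumes a hypothetical stand-in record, `UnitRecord`, with start and end timestamps and a free-form outputs dict; the class, its field names, and the `performance_stats` key are illustrative and are not the GUFE API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List


@dataclass
class UnitRecord:
    """Hypothetical stand-in for a per-unit result (not a gufe class)."""
    name: str
    start_time: datetime
    end_time: datetime
    outputs: Dict[str, Any] = field(default_factory=dict)


def unit_runtimes(records: List[UnitRecord]) -> Dict[str, timedelta]:
    """Runtime of each unit, taken as end time minus start time."""
    return {r.name: r.end_time - r.start_time for r in records}


def wall_time(records: List[UnitRecord]) -> timedelta:
    """Elapsed time from the earliest start to the latest end."""
    return max(r.end_time for r in records) - min(r.start_time for r in records)


def attach_stats(record: UnitRecord, stats: Dict[str, Any]) -> UnitRecord:
    """Store an arbitrary stats dict under one agreed key (assumed name)."""
    record.outputs["performance_stats"] = dict(stats)
    return record
```

Keeping the stats under a single agreed key would let each protocol decide what to report, in line with DS's point that standardizing metrics across protocols may not be worth the effort.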

    • call for new users

    • current stack versions:

      • alchemiscale: 0.1.4

      • gufe: 0.9.1

      • openfe: 0.11.0

      • perses: protocol-neqcyc

      • RG – We did an OpenFE release on Friday

      • DS – But no changes essential to alchemiscale, so no need for upgrade

      • DD – Ok, I’ll hold off on upgrades. Are data model changes still coming?

      • IA – We held off on those because we needed a quick release, but plan for the data model changes in the next release.

      • DD – Ok, I’ve opened alchemiscale #168 to discuss migration machinery.

  • JW : future of openmmforcefields?

    • ecosystem dependent on openmmforcefields

    • bit of an unwieldy beast; looking for alternate paths for functionality

    • future of interchange looks different with openmmforcefields existence/nonexistence

      • what is the long-term intention for ommffs?

      • JC – Was initially scoped to support additional ffxmls on a “best effort” basis, intended to be a shim to extend the life of those ffxmls. So it would be hard to replace its GAFF functionality… The big thing we need is for Interchange to ingest and produce OpenMM system objects.

      • getting rid of openmmforcefields means we are forsaking things like CHARMM, GAFF, etc.

        • was always painful, and hard to maintain

        • all of these FFs have different assumptions on atom names, connectivity, etc.

        • Amber community not even sure what GAFF means at this point, too; so questionable how much more resource to expend on this support

      • JW – how important to OpenFE is GAFF support?

        • JC – do your funders know which GAFF they want?

        • IA – need to just be able to say what we used previously; almost don’t care which version it is prospectively

      • JC – if all you need is “standard” import and export for OpenMM, is that sufficient?

        • anything else you need it to do?

        • JW – Interchange will be able to do “standard” OpenMM import for our definition of “standard”. We can revisit/discuss limitations if we encounter them!

      • JW – don’t generally need new features, but do need new versions of Python supported, CI fixes, etc.

        • JC – OpenMMForceFields will remain maintained (CI green, python support) but won’t plan on major new features/FF additions. Peter Eastman should be responsible for this level of support. Can reiterate this to him.



  • DD : update on multi-gpu execution for single ProtocolDAG

    • DS, RG, and DD will work on establishing labels on Protocols and ProtocolUnits communicating to executor its desired capabilities

      • DS – Agree. Additionally, this is written only talking about protocolunit and protocolrequest. There should also be communication in the other direction, where a compute manager tells the server how many resources were actually granted. Also, let me know where to put user stories about what this should support.

      • IP – Just to reiterate, protein mutations are costly, so in IZ’s work we ran lambda windows in parallel on multiple GPUs. So that’s what we’d want to do here.

      • DD – We should open an issue for this.

      • DS – This would make sense in GUFE

      • DD – I’ll open a GUFE issue for this, and will post a broader call for user stories

    • DD and DS can work in parallel on alchemiscale and exorcist for making use of these labels to optimally allocate resources on compute
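A minimal sketch of the two-way communication described above: a unit declares the GPU resources it would like, the compute side reports what was actually granted, and the unit spreads its lambda windows across the granted GPUs. All names here (`ResourceRequest`, `ResourceGrant`, `allocate`, `assign_windows`) are hypothetical illustrations, not part of gufe, alchemiscale, or exorcist.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResourceRequest:
    """Label a unit could carry to state its desired capabilities (assumed)."""
    min_gpus: int = 1
    max_gpus: int = 1


@dataclass(frozen=True)
class ResourceGrant:
    """What the compute manager actually granted, reported back down."""
    gpus: int


def allocate(request: ResourceRequest, available_gpus: int) -> Optional[ResourceGrant]:
    """Grant as many GPUs as are available, up to max_gpus.

    Returns None when fewer than min_gpus are available.
    """
    if available_gpus < request.min_gpus:
        return None
    return ResourceGrant(gpus=min(request.max_gpus, available_gpus))


def assign_windows(n_windows: int, grant: ResourceGrant) -> Dict[int, int]:
    """Map each lambda window index to a GPU index, round-robin."""
    return {window: window % grant.gpus for window in range(n_windows)}
```

Under these assumptions, a unit requesting between 2 and 4 GPUs on a node with 3 free would receive a grant of 3 and could distribute its windows accordingly.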

  • IP : Protein-ligand benchmarks working group update. Two weeks ago, we mentioned that we’re adding star/radial maps that HBaumann preferred, and we’ll be running FE calcs with them. Also, I’d be uploading KTakaba’s manually-curated networks. KT’s networks are already there, so I just need a review. In the review I’d like to check that we get the same result using the OpenFE protocol.

    • IA – That’s on HBaumann’s to-do list

    • IP – Gotcha. I think that’ll be interesting but it’s not a requirement for merge.

    • IA – Agree, this should be ready to approve.

    • DD – Is this contributing to a 0.3.0 release?

    • IP – Yes, this would get us over the line for our minimal release requirements.

    • DD – It’d be good to remove the other items from the 0.3.0 milestone/project board.

  • DD: which GitHub org should FAHComputeService be developed under?

    • JW – I’ll throw OpenFF’s hat in the ring - We already have alchemiscale and the F@H interface is mostly for our needs.

    • JS – Not sure that ASAP would want this, could be good to see where it goes.

    • IP – OpenFF would make the most sense. Alchemiscale is already there.

    • DD – I’ll proceed with putting this in OpenFF.

  • alchemiscale development : current sprint complete; next sprint runs 8/23 - 9/4

    • current aim is to complete release 0.2.0 by end of next sprint, including first round of documentation

    • architecture overview : PL Benchmarks on FAH - Architecture v6.drawio

    • coordination board : alchemiscale : Phase 2 - User Feedback and Documentation

    • alchemiscale 0.2.0 milestone:

      • DD – Most of this is docs improvements, aiming to get this out in early September.

    • review of Complete cards

      • DD – Alchemiscale 28 - User guide - Done, but let me know if folks have more input and we can keep iterating.

    • updates on In Review, In Progress, and Available cards

    • In review

      • IP – PLB 93 – Waiting on comparison studies by HB, mentioned in meeting notes above.

      • IA – Agree, once we have benchmarks this should be good.

      • IP – Perses 1066 – Noneq cycling – We had a meeting earlier, the idea is we’ll be releasing the 0.10.3 release, and then after that this will get merged into main, and it will be closed that way.

      • DD – Alchemiscale 30 – Docs stuff – Should be ready for review, MH could you take a look?

      • DD – Alchemiscale 132 – Upstreamed proposed solution to GUFE 215 - DS, no rush on this, just wanted to make sure you were aware.

        • DS – Thanks, yeah, I’m keeping track of this. It’s just missing tests.

    • In progress

      • DD – User story reviews - I’m running through ASAP ones right now. I know some of the ongoing work in ASAP discovery is working on this - in particular JHorton’s work.

      • DD – alchemiscale 29 – I’ll be adding a tutorial to alchemiscale docs

        • IP – Does it make sense if I share a notebook with noneq cycling?

        • DD – That’d be great - Please send it over and I’ll sculpt it to fit.

      • IP – ticket: Test noneq protocol against repex protocol – Goal is to compare to targets other than TYK2. Still in progress

      • DD – OpenFF user story review. Will get to this, just doing ASAP first since they’re harder.

      • DD – I’m working on another docs one

      • DD – And API docs are weird, might need to reach out

        • MH – Happy to help if you have issues

Action items

@David Dotson will pursue PRP access for ASAP Discovery computation
@David Dotson will create anchor issue on gufe for multi-GPU execution proposal, user stories

Decisions