2024-04-09 alchemiscale : user group meeting notes

Participants

Recording: https://us02web.zoom.us/rec/share/IYJmsCjIIB-G5zTwz1LHwxiC7JcIYqPyeYvByCs-chpgxExELVJMnIoAu9UtbhUx.s16itbJ-84E5Mh08?startTime=1712678565000

Goals

alchemiscale.org
- DD : alchemiscale 0.4.0 released and deployed Friday 2024.04.05: https://github.com/openforcefield/alchemiscale/releases/tag/v0.4.0
- user questions / issues / feature requests
- compute resources status
- current stack versions:
  - alchemiscale: 0.4.0
  - neo4j: 5.18
  - gufe: 0.9.5
  - openfe: 0.14.0
  - perses: protocol-neqcyc
  - openmmforcefields: 0.12.0
JS : networks set to a higher priority don’t seem to be actioned more often than networks with lower priority
JC : ways to split a single GPU into multiple virtual GPUs on Lilac + Iris?
- 7 MIG GPUs possible with A100s
  - yields more, but less powerful, GPUs

Discussion topics

Notes

alchemiscale.org
- DD : alchemiscale 0.4.0 released and deployed Friday 2024.04.05: https://github.com/openforcefield/alchemiscale/releases/tag/v0.4.0
  - DD – Many improvements for ASAP - Things should be much faster now.
  - DD – Also adds network state - If set to inactive, the network/tasks won’t be returned when some API calls are made.
  - DD – Various other improvements that mean central server load is now way down. This is thanks to IK
  - DD – Upgraded to neo4j v5 and the new python-neo4j driver. Many many improvements from that upgrade as well.
  - JS – Nice work, I noticed the difference almost immediately when the update was rolled out.
  - JS – JH asked re: living networks whether we’ll be able to add edges to an existing network.
    - DD – Fundamental to GUFE is that we can’t add edges to existing networks. Instead you can make a NEW network with a superset of the edges of the original one. If submitted to the same scope, the already-existing edges will be deduplicated.
    - JS – For the superset functionality, is that something that you’ll provide or that we should plan to build?
    - DD – I acknowledge that the retrieval of a big network is slow (~20 mins) and that submission also takes a while. But yeah, you’d do this client-side - Retrieve an existing network, make a superset of it locally, and then submit it.
    - (JS, BRies, and DD will have a session to whiteboard out how network planning with multiple target structures would look)
- user questions / issues / feature requests
- compute resources status
  - DD – Using as much of NRP as we can for ASAP-public compute (about 150 GPUs). A smaller number of private jobs is running on Iris and Lilac.
  - (DD writes API query to see where jobs are running, see recording ~16 mins)
  - (34 minutes in) JC – Has anyone experimented with fragmenting A100s (having them run multiple jobs)? Since each process is only using 8GB there should be room for several sims. This could be a way to get more mileage from Lilac.
    - IA – I asked MH to look into this but haven’t heard back.
    - DD – Last year I tried submitting multiple jobs to a single GPU. Would this be different?
    - JC – A100s have additional support for partitioning into logical sub-blocks. But this requires an admin to set this up. This might be beneficial given the current lack of GPUs.
    - DD – Do you think they’d be interested in doing this?
    - JC – Since lilac is being slowly dismantled, it would be good for them to know that people can do useful things with larger numbers of smaller GPUs. Otherwise we kinda have a risk of underutilizing powerful GPUs.
    - DD – Last year we saw that OpenFE’s repex protocol saturated GPUs quite well. Though I can’t recall if that was with A100s. So it’d be good to start a conversation with MSK HPC about how our utilization looks and whether they’d be interested in trying this.
    - JC – Sure. I’ll start this conversation.
    - DD – Cool, and to the scheduler this would just look like several small GPUs and we wouldn’t need to do anything special?
    - JC – Yes, I think that’s how it works.
    - IA – You might get 100% GPU utilization on paper but still get better throughput by using a smaller slice of the GPU. We saw this a lot with gmx in the past.
- current stack versions:
  - alchemiscale: 0.4.0
  - neo4j: 5.18
  - gufe: 0.9.5
  - openfe: 0.14.0
  - perses: protocol-neqcyc
  - openmmforcefields: 0.12.0
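
A minimal sketch of the client-side "superset" workflow DD describes in the living-networks discussion above: the existing network is left untouched, a new network is built from its edges plus the additions, and resubmitting to the same scope only stores edges that are not already there. The `Edge`, `Network`, `superset_network`, and `submit` names are hypothetical stand-ins written in plain Python so the example runs on its own; they are not the gufe or alchemiscale API, and the real retrieval and submission steps would go through the alchemiscale client instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for a transformation between two ligands."""
    ligand_a: str
    ligand_b: str


@dataclass(frozen=True)
class Network:
    """Hypothetical stand-in for an immutable alchemical network."""
    name: str
    edges: frozenset


def superset_network(original: Network, extra_edges, name: str) -> Network:
    """Return a NEW network holding the original edges plus extra_edges.

    The original object is never modified, mirroring the rule that edges
    cannot be added to an existing network.
    """
    return Network(name=name, edges=original.edges | frozenset(extra_edges))


def submit(scope_store: set, network: Network) -> int:
    """Toy model of deduplication within one scope.

    Only edges not already present in scope_store are added; the return
    value is the number of edges that were actually new.
    """
    new_edges = network.edges - scope_store
    scope_store |= new_edges
    return len(new_edges)


# An existing network, and a superset that adds one new edge and repeats an old one.
v1 = Network("series-v1", frozenset({Edge("lig1", "lig2"), Edge("lig2", "lig3")}))
v2 = superset_network(v1, [Edge("lig3", "lig4"), Edge("lig1", "lig2")], "series-v2")

scope = set()
added_v1 = submit(scope, v1)  # both edges are new to the scope
added_v2 = submit(scope, v2)  # only lig3 -> lig4 is new; the rest deduplicate
print(added_v1, added_v2, len(v1.edges), len(v2.edges))
```

In this toy model the second submission adds only the one edge the scope has not seen, which is the behaviour DD describes for already-existing edges.
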
JS – Register for alchemiscale
- JC – @Jenke Scheen : “Register” = go to payment link?
- JS – https://cbd.eventsair.com/alchemical-free-energy-workshop-2024/registration/Site/Register
- JC – @Jenke Scheen : Just did this but…how do we get a payment receipt?
- JC – It didn’t show a receipt or ask for an email
- JS – did you create a user account with your email? that email should receive the confirmation
JS : networks set to a higher priority don’t seem to be actioned more often than networks with lower priority
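
On the A100 partitioning discussion above, a rough back-of-the-envelope sketch of how many MIG slices could each hold one simulation, taking JC's figure of about 8GB per process. The profile table lists nominal MIG profile sizes for the 40GB and 80GB A100 variants as published by NVIDIA; which variant the Lilac and Iris nodes carry is not stated in these notes and is an assumption to check with MSK HPC. The `best_profile` helper is hypothetical and only considers memory fit. As IA points out, memory fit and utilization figures do not by themselves predict throughput, so this is only a starting point.

```python
# Nominal MIG profiles per A100 variant:
# (profile name, memory per instance in GB, maximum instances per GPU).
# Which variant a given cluster node has is an assumption, not something from the notes.
A100_MIG_PROFILES = {
    "A100-40GB": [
        ("1g.5gb", 5, 7),
        ("2g.10gb", 10, 3),
        ("3g.20gb", 20, 2),
        ("4g.20gb", 20, 1),
        ("7g.40gb", 40, 1),
    ],
    "A100-80GB": [
        ("1g.10gb", 10, 7),
        ("2g.20gb", 20, 3),
        ("3g.40gb", 40, 2),
        ("4g.40gb", 40, 1),
        ("7g.80gb", 80, 1),
    ],
}


def best_profile(gpu: str, job_mem_gb: float):
    """Pick the profile that yields the most instances while each instance
    still has at least job_mem_gb of nominal memory for one job."""
    fitting = [p for p in A100_MIG_PROFILES[gpu] if p[1] >= job_mem_gb]
    return max(fitting, key=lambda p: p[2])


JOB_MEM_GB = 8  # per-process memory figure quoted in the discussion

for gpu in A100_MIG_PROFILES:
    name, mem_gb, count = best_profile(gpu, JOB_MEM_GB)
    print(f"{gpu}: {count} x {name} ({mem_gb} GB each)")
```

Under these assumptions the 7-way split mentioned in the goals only fits an 8GB job on the 80GB variant; on a 40GB card the smallest slice is too small and the best fit is a 3-way split.
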

Action items

David Dotson will schedule a whiteboarding session with Jenke, Josh, Irfan, and Ben for network planning with multiple protein target structures
