
Participants

Goals

  • alchemiscale.org

  • JS : networks set with higher priority don’t seem to be actioned more often than networks with lower priority

  • JC : ways to split a single GPU into multiple virtual GPUs on Lilac + Iris?

    • up to 7 MIG instances possible per A100

      • i.e., a larger number of less powerful GPUs

Discussion topics

Notes

  • alchemiscale.org

    • DD : alchemiscale 0.4.0 released and deployed Friday 2024.04.05: https://github.com/openforcefield/alchemiscale/releases/tag/v0.4.0

      • DD – Many improvements for ASAP - Things should be much faster now.

      • DD – Also adds network state - if a network is set to inactive, it and its tasks won’t be returned by some API calls.

      • DD – Various other improvements mean central server load is now way down. This is thanks to IK.

      • DD – Upgraded to neo4j v5 and the new python-neo4j driver. Many many improvements from that upgrade as well.

      • JS – Nice work, I noticed the difference almost immediately when the update was rolled out.

      • JS – JH asked re: living networks whether we’ll be able to add edges to an existing network.

        • DD – Fundamental to GUFE is that we can’t add edges to existing networks. Instead you can make a NEW network with a superset of the edges of the original one. If submitted to the same scope, the already-existing edges will be deduplicated.

        • JS – For the superset functionality, is that something that you’ll provide or that we should plan to build?

        • DD – I acknowledge that the retrieval of a big network is slow (~20 mins) and that submission also takes a while. But yeah, you’d do this client-side - Retrieve an existing network, make a superset of it locally, and then submit it.

        • (JS, BRies, and DD will have a session to whiteboard out how network planning with multiple target structures would look)
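
The client-side pattern DD describes (retrieve an existing network, build a superset of its edges locally, resubmit to the same scope) reduces to a set union over edges. A minimal stdlib sketch of the idea - plain tuples stand in for GUFE Transformation objects, and the variable names are illustrative, not the alchemiscale API:

```python
# Networks are immutable, so "adding edges" means building a NEW network
# whose edge set is a superset of the original. Submitting to the same
# scope deduplicates the edges that already exist server-side.

existing_edges = {("ligA", "ligB"), ("ligB", "ligC")}   # edges of the retrieved network
new_edges = {("ligB", "ligC"), ("ligC", "ligD")}        # edges we want to add

# The superset network to submit: union of old and new edges.
superset_edges = existing_edges | new_edges

# Deduplication means only the genuinely new edges create new work.
truly_new = superset_edges - existing_edges
print(sorted(superset_edges))
print(sorted(truly_new))
```

In the real workflow the expensive steps are the retrieval (~20 min for a big network, per DD) and the submission; the local union itself is cheap.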

    • user questions / issues / feature requests

    • compute resources status

      • DD – Using as much of NRP as we can for ASAP-public compute (about 150 GPUs). A smaller number of private jobs are running on Iris and Lilac.

      • (DD writes API query to see where jobs are running, see recording ~16 mins)

      • (34 minutes in) JC – Has anyone experimented with fragmenting A100s (having them run multiple jobs)? Since each process only uses 8 GB, there should be room for several simulations. This could be a way to get more mileage from Lilac.

        • IA – I asked MH to look into this but haven’t heard back.

        • DD – Last year I tried submitting multiple jobs to a single GPU. Would this be different?

        • JC – A100s have additional support for partitioning into logical sub-blocks. But this requires an admin to set this up. This might be beneficial given the current lack of GPUs.

        • DD – Do you think they’d be interested in doing this?

        • JC – Since Lilac is being slowly dismantled, it would be good for them to know that people can do useful things with larger numbers of smaller GPUs. Otherwise we risk underutilizing powerful GPUs.

        • DD – Last year we saw that OpenFE’s repex protocol saturated GPUs quite well. Though I can’t recall if that was with A100s. So it’d be good to start a conversation with MSK HPC about how our utilization looks and whether they’d be interested in trying this.

        • JC – Sure. I’ll start this conversation.

        • DD – Cool, and to the scheduler this would just look like several small GPUs, and we wouldn’t need to do anything special?

        • JC – Yes, I think that’s how it works.

        • IA – You might see 100% GPU utilization on paper but still get better throughput by using a smaller slice of the GPU - we saw this a lot with GROMACS in the past.
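
Since MIG setup requires admin access, the partitioning JC describes would look roughly like the following on an A100 node (a sketch of the standard `nvidia-smi mig` workflow; the GPU index and profile choices are illustrative):

```shell
# Enable MIG mode on GPU 0 (requires root; may need a GPU reset to take effect)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the card supports
nvidia-smi mig -lgip

# Create seven 1g.5gb GPU instances (the A100 maximum) and a compute
# instance on each (-C); each then appears as a separate device
sudo nvidia-smi mig -i 0 -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C

# Verify the instances
nvidia-smi mig -lgi
```

Each MIG instance gets its own UUID that can be targeted via CUDA_VISIBLE_DEVICES, which is how a scheduler would see them as several small GPUs without special handling.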

    • current stack versions:

      • alchemiscale: 0.4.0

      • neo4j: 5.18

      • gufe: 0.9.5

      • openfe: 0.14.0

      • perses: protocol-neqcyc

      • openmmforcefields: 0.12.0

  • JS – Register for alchemiscale

  • JS : networks set with higher priority don’t seem to be actioned more often than networks with lower priority
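
The behavior JS expects could be illustrated as priority-weighted claiming - a conceptual sketch using stdlib weighted sampling, not alchemiscale's actual Task claim logic (network names and weights are made up):

```python
import random
from collections import Counter

# Expected behavior: compute services should claim tasks from
# higher-priority networks proportionally more often.
networks = {"high_priority_net": 3.0, "low_priority_net": 1.0}

random.seed(0)  # deterministic for illustration
claims = Counter(
    random.choices(list(networks), weights=networks.values(), k=10_000)
)
# With a 3:1 weight ratio, claims should split roughly 3:1.
print(claims)
```

The reported issue is that observed claim counts look uniform across networks regardless of priority, i.e. the weights appear to have no effect.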


Action items

  • David Dotson will schedule a whiteboarding session with Jenke, Josh, Irfan, and Ben for network planning with multiple protein target structures

Decisions
