DD – Many improvements for ASAP - Things should be much faster now.
DD – Also adds network state - if a network is set to inactive, it and its tasks won’t be returned by some API calls (see the sketch below this list).
DD – Various other improvements mean that central server load is now way down. This is thanks to IK.
DD – Upgraded to neo4j v5 and the new python-neo4j driver. Many improvements came from that upgrade as well.
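(Illustrative sketch, not from the call: roughly how the new network state might be used from the Python client. The method names `set_network_state` / `query_networks`, the `state` keyword, and the URL are assumptions based on the discussion above, not confirmed API.)

```python
# Hedged sketch: assumes an alchemiscale Python client exposing network-state
# methods; names, keywords, and the URL below are illustrative assumptions.
from alchemiscale import AlchemiscaleClient, Scope

asc = AlchemiscaleClient(
    "https://api.example.org",  # placeholder deployment URL
    "my-identity",
    "my-key",
)

scope = Scope("my-org", "my-campaign", "my-project")

# Mark a finished network as inactive so it no longer shows up in default
# queries (assumed method name and state value).
network_sk = asc.query_networks(scope=scope)[0]
asc.set_network_state(network_sk, state="inactive")

# Later queries would then skip inactive networks/tasks by default
# (assumed behavior, based on the discussion above).
active_networks = asc.query_networks(scope=scope, state="active")
```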
JS – Nice work, I noticed the difference almost immediately when the update was rolled out.
JS – JH asked re: living networks whether we’ll be able to add edges to an existing network.
DD – Fundamental to GUFE is that we can’t add edges to existing networks. Instead you can make a NEW network with a superset of the edges of the original one. If submitted to the same scope, the already-existing edges will be deduplicated.
JS – For the superset functionality, is that something that you’ll provide or that we should plan to build?
DD – I acknowledge that retrieval of a big network is slow (~20 mins) and that submission also takes a while. But yeah, you’d do this client-side - retrieve the existing network, build a superset of it locally, and then submit it (see the sketch below).
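(Illustrative sketch of the client-side workflow DD describes - retrieve, build a superset, resubmit. Assumes a gufe `AlchemicalNetwork` and an `AlchemiscaleClient` roughly as documented; `asc`, `old_sk`, and `new_edges` are placeholders, and `old_sk.scope` is assumed to give back the original Scope.)

```python
# Hedged sketch of the retrieve -> superset -> resubmit workflow described above.
# Assumes an existing AlchemiscaleClient `asc`, a ScopedKey `old_sk` for the
# original network, and a list `new_edges` of additional gufe Transformations.
from gufe import AlchemicalNetwork

# 1. Retrieve the existing network (can take ~20 min for big networks).
old_network = asc.get_network(old_sk)

# 2. Build a NEW network whose edge set is a superset of the original.
superset_network = AlchemicalNetwork(
    edges=set(old_network.edges) | set(new_edges),
    name=f"{old_network.name}-extended",
)

# 3. Submit to the same scope; edges that already exist there are deduplicated,
#    so only the new transformations add work.
new_sk = asc.create_network(superset_network, old_sk.scope)
```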
(JS, BRies, and DD will have a session to whiteboard out how network planning with multiple target structures would look)
user questions / issues / feature requests
compute resources status
DD – Using as much of NRP as we can for ASAP-public compute (about 150 GPUs). Smaller number of private jobs running on Iris and Lilab.
(DD writes API query to see where jobs are running, see recording ~16 mins)
(34 minutes in) JC – Has anyone experimented with fragmenting A100s (having them run multiple jobs)? Since each process is only using 8 GB, there should be room for several sims. This could be a way to get more mileage from lilac.
IA – I asked MH to look into this but haven’t heard back.
DD – Last year I tried submitting multiple jobs to a single GPU. Would this be different?
JC – A100s have additional support for partitioning into logical sub-blocks (NVIDIA Multi-Instance GPU, MIG), but this requires an admin to set it up. This might be beneficial given the current lack of GPUs.
DD – Do you think they’d be interested in doing this?
JC – Since lilac is being slowly dismantled, it would be good for them to know that people can do useful things with larger numbers of smaller GPUs. Otherwise we risk underutilizing powerful GPUs.
DD – Last year we saw that OpenFE’s repex protocol saturated GPUs quite well. Though I can’t recall if that was with A100s. So it’d be good to start a conversation with MSK HPC about how our utilization looks and whether they’d be interested in trying this.
JC – Sure. I’ll start this conversation.
DD – Cool, and to the scheduler this would just look like several small GPUs and we wouldn’t need to do anything special?
JC – Yes, I think that’s how it works.
IA – You might see 100% GPU utilization on paper but still get better throughput by using a smaller slice of the GPU - we saw this a lot with gmx in the past.
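(Illustrative sketch, not from the call: one way to check whether a card’s memory and MIG mode would fit several ~8 GB simulations, assuming the nvidia-ml-py / pynvml bindings and an NVIDIA driver are available. Actual MIG partitioning still needs an admin, as JC notes.)

```python
# Hedged sketch: inspect GPUs to see whether several ~8 GB jobs could share one.
import pynvml

JOB_MEMORY_GB = 8  # approximate per-process footprint mentioned above

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml returns bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        total_gb = mem.total / 1024**3

        # MIG mode queries are only supported on A100-class GPUs; older cards raise.
        try:
            current_mig, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current_mig == pynvml.NVML_DEVICE_MIG_ENABLE else "disabled"
        except pynvml.NVMLError:
            mig = "unsupported"

        print(
            f"GPU {i} ({name}): {total_gb:.0f} GB total, "
            f"~{int(total_gb // JOB_MEMORY_GB)} x {JOB_MEMORY_GB} GB jobs, MIG {mig}"
        )
finally:
    pynvml.nvmlShutdown()
```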
current stack versions:
alchemiscale: 0.4.0
neo4j: 5.18
gufe: 0.9.5
openfe: 0.14.0
perses: protocol-neqcyc
openmmforcefields: 0.12.0
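(Illustrative sketch: a quick standard-library check of a local environment against the versions above. neo4j 5.18 refers to the server rather than a Python package, and "protocol-neqcyc" for perses appears to refer to a branch rather than a released version, so those entries won’t match a reported package version.)

```python
# Compare locally installed packages against the stack versions listed above.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("alchemiscale", "gufe", "openfe", "perses", "openmmforcefields"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```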
JS – Register for alchemiscale
JC – @Jenke Scheen: “Register” = go to payment link?