changes needed for Folding@Home support via alchemiscale-fah
feflow inclusion, drop of perses
will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users
current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned
new stack versions:
python 3.12
alchemiscale: 0.5.0
neo4j: 5.22
gufe: 1.0.0
openfe: 1.0.1
feflow: 0.1.0
openmmforcefields: 0.14.1
openmm: 8.1.2
DD : propose changeover to new host on 2024.08.15 00:00:00 UTC
DD : alchemiscale-fah live test performed with FAH volunteers
executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction
working on performance bottleneck due to minimization in feflow and openmm-core
could use guidance here:
JW – Thoughts for switching off from anaconda.org for env yaml hosting?
Discussion topics
Notes
alchemiscale.org
user questions, issues, requests
JS – On the server side, are we logging as much as we can? OpenMM and OpenFE spit out a ton of useful logging info, I'm wondering if I can fetch that.
DD – We don't capture standard out or logging from protocols; we neither capture nor preserve it. If we wanted to record this, we could, depending on how the protocol does logging, …
IA – Recently fielded a similar question. DSwenson put something in place that can be accessed via the CLI, but I'd need to check what it is. Might be centralized in GUFE.
MH – We've got it set up to use the Python standard library for logging; you can pass in a config file and specify where the log file goes. So we could dump it to file and then vacuum it up as part of the results.
DD – That's encouraging, gives us something to hook into. So that would be outside the ProtocolDAGResult but would get pulled into the object store. So for a given ProtocolDAGResult, you'd have a log that you could pull. Something like alchemiscaleclient.get() targeted in such a way that pulls the log.
JS – That'd be great, would help us debug systematic issues.
IK – It seems like the logs would be outside alchemiscale - some protocols wouldn't have captured outputs. So this would need to be outside a PDR.
DD – PDR contents are up to protocol authors...
IP – We had logging in NEQCyc, but at some point it stopped working and now we're getting empty logs. Not sure if this is connected, but it could be related to use/non-use of a common logging module.
DD – Hm, sounds like there are multiple things to look into then.
…
IA – I do recall DSwenson working on this, might need to talk to him.
MH – IIRC, DSwenson's implementation is just a field in a config file that passes a path to the default logger in Python.
IA – There may have been some other details, I think this touches GUFETokenizable.
DD – Yeah, I think there were a few more details.
IA – It may also be worth looking into WHAT gets logged. There are multiple things that could be wired up to get captured, not sure what all gets captured.
MH – …
…
IP – OpenMMTools also outputs some information that may be useful to capture.
IK – Any reason we can't capture stdout and stderr and slurp that into the data with the PDR?
JS – I think some things don't get sent to stdout and stderr.
IK – But that's up to the protocol author.
DD – Yeah, if there's some other log stream we can implement a hook into that.
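As a rough illustration of the two ideas raised in this thread (a file-backed standard-library logger set up from a config, plus capture of stdout and stderr), here is a minimal sketch. It is not an alchemiscale, gufe, or openfe API: `run_with_captured_logs`, `fake_protocol_unit`, and the returned dictionary layout are hypothetical names chosen for the example, and how the captured text would reach the object store is left open.

```python
import contextlib
import io
import logging
import logging.config
import tempfile
from pathlib import Path


def run_with_captured_logs(func, log_path):
    """Run func() and return its result plus captured log, stdout, stderr.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Config-driven setup: send all INFO+ records on the root logger to a file.
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            "file": {
                "class": "logging.FileHandler",
                "filename": str(log_path),
                "mode": "w",
                "formatter": "plain",
            },
        },
        "root": {"level": "INFO", "handlers": ["file"]},
    })
    out, err = io.StringIO(), io.StringIO()
    root = logging.getLogger()
    try:
        # Only Python-level writes to sys.stdout/sys.stderr are captured here.
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            result = func()
    finally:
        # Flush and close the file handler so the log can be read back.
        for handler in list(root.handlers):
            handler.close()
            root.removeHandler(handler)
    return {
        "result": result,
        "log": Path(log_path).read_text(),
        "stdout": out.getvalue(),
        "stderr": err.getvalue(),
    }


def fake_protocol_unit():
    """Stand-in for a protocol unit that logs and prints (assumption)."""
    logging.getLogger("demo.protocol").info("minimizing")
    print("step 1 done")
    return 42


with tempfile.TemporaryDirectory() as tmp:
    captured = run_with_captured_logs(fake_protocol_unit, Path(tmp) / "unit.log")
```

This is consistent with JS's caveat: `contextlib.redirect_stdout` only sees text written through Python's `sys.stdout`, so output written directly by native code would need a different hook.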
IA – How does current workload interact with host changeover?
DD – Once we do the changeover, please don't submit to the old host, but I can keep workers finishing those jobs. I don't have a mechanism to shut off user submissions, but I'd like folks to not submit any more.
changes needed for Folding@Home support via alchemiscale-fah
DD – Finished a round of testing with volunteers.
feflow inclusion, drop of perses
will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users
current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned
new stack versions:
python 3.12
alchemiscale: 0.5.0
neo4j: 5.22
gufe: 1.0.0
openfe: 1.0.1
feflow: 0.1.0
openmmforcefields: 0.14.1
openmm: 8.1.2
JW – I think OE isn't available for py3.12?
IA – Do folks need OE?
MH+IP – OE does now have py312 packages available.
JW – My mistake.
(General) – and people often do charge generation beforehand and submit user charges
DD : propose changeover to new host on 2024.08.15 00:00:00 UTC
(General) – Sounds great.
DD : alchemiscale-fah live test performed with FAH volunteers
executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction.
DD – Big thanks to MH, IP, HB for the help with this! Next step is to compare to other compute sources to ensure results are sane.
IA – JS, could you do the same settings as JHorton is running?
JS – Oh, slightly different conversation. It may be too late to switch that. For context:
IA – If the FF is at least 2.2, we can do back to back hrex.
DD – This test series is to test the compute services. We've discovered some performance bottlenecks and got feedback about the length of work units. So we'll be doing more cycles and bigger testing.
working on performance bottleneck due to minimization in feflow and openmm-core
could use guidance here:
IP – We don't generally see improvements with longer runs in most cases. But maybe with large numbers of atoms changing or charge changes it can be better.
IA – Paper from Coveney lab says that for some transformations longer sims (up to 250 ps) improve things. I'd think that solvent will converge quickly.
DD – I think the volunteers reporting fast work units may have been reporting on the solvent ones. I'll ask them which WUs were longest.
IP – Is it possible for alchemiscale-fah to decide platform? Maybe we make the solvent phases run on CPU.
DD – I don't think we expose a FAH core setting that allows you to specify CUDA/OpenCL/CPU.
JW – Is this population representative? The volunteers might be power users with very high-end video cards.
DD – Unknown, but that's a reasonable guess. Though F@H contributors are generally using more high-end machines.
IA – If this is actually an infrastructure problem, it probably doesn't HURT to make it longer. The results won't get WORSE.
IP – Though do recall that we may need to change that for the solvent independently of the complex. And currently those share the same value.
DD – In the GUFE model, those are already distinct.
IA – Longer term, maybe the answer is to do both units in the same DAG? E.g., for AHFEs, the ProtocolDAG runs the solvent+vacuum leg together. Our aim for RFE is to… So if we changed it to run both cycles in the same work unit, would that help?
DD – Not quite. The reason we have each cycle unit map to a work unit is that each one has an OpenMM system and integrator. This lets the core just pass systems and integrators and the workers just blindly run them. We COULD consider doing this, but it would require changes to the F@H core. But HMO is probably the only one with the necessary skillset for this.
DD – IP, I know we have num cycles as part of the DAG. Would it be possible to set the number of "cycles within a cycle"? Each one right now does eq-noneq-eq-noneq. Would there be a way to do more?
IP – The change in the code would just be running it in a loop. It creates some problems with getting the results, you'd need to granularize the results for each. So if this is desperately needed it could be done, …
…
IA – Yeah, if we WANT to do this, it should be a GUFE-level thing.
…
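To make the "cycles within a cycle" idea concrete, the sketch below loops over several inner cycles inside one unit and keeps a separate record per cycle, which is the "granularize the results" concern IP raised. It is not feflow code: `CycleResult`, `run_cycles`, `fake_cycle`, and the forward/reverse work fields are assumptions made for illustration, and a real unit would run the eq-noneq-eq-noneq sequence in OpenMM in place of the stand-in callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CycleResult:
    """Per-cycle record (hypothetical layout, not feflow's)."""
    cycle_index: int
    forward_work: float
    reverse_work: float


def run_cycles(
    run_one_cycle: Callable[[int], Tuple[float, float]],
    n_inner_cycles: int,
) -> List[CycleResult]:
    """Run several eq-noneq-eq-noneq cycles back to back in one unit.

    run_one_cycle is a stand-in for whatever performs a single cycle and
    returns (forward_work, reverse_work); each cycle keeps its own record.
    """
    results = []
    for index in range(n_inner_cycles):
        forward, reverse = run_one_cycle(index)
        results.append(CycleResult(index, forward, reverse))
    return results


def fake_cycle(index: int) -> Tuple[float, float]:
    """Dummy cycle returning made-up work values, for demonstration only."""
    return float(index), -float(index)


cycle_results = run_cycles(fake_cycle, n_inner_cycles=3)
```

Keeping one record per inner cycle means downstream analysis can still treat each cycle as an independent sample, regardless of how many are packed into one work unit.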
JW – Thoughts for switching off from http://anaconda.org for env yaml hosting?
DD – I've already switched off this, hosting them on GitHub now.
Action items
@David Dotson will articulate an issue on alchemiscale for how we will capture logs emitted by Protocols, expose these to users
tag Jenke Scheen
@Iván Pulido will create an issue for capturing logs in the feflow NonEquilibriumCyclingProtocol using the gufe logging system
@David Dotson will propose in feflow the ability to have multiple cycles within a single CycleUnit