2024-08-13 alchemiscale : user group meeting notes

Participants

@David Dotson
Ian Kenney
@Irfan Alibay
Jenke Scheen
@Iván Pulido
@James Eastwood
Meghan Osato
@Mike Henry
@Jeffrey Wagner

Meeting recording: https://drive.google.com/file/d/19yDo_XstA2dt2YMrGFxR3cyryJq2KyZb/view?usp=sharing

Goals

alchemiscale.org
- user questions, issues, requests
- compute resources status
- current stack versions:
  - python 3.10
  - alchemiscale: 0.4.0
  - neo4j: 5.18
  - gufe: 0.9.5
  - openfe: 0.14.0
  - perses: protocol-neqcyc
  - openmmforcefields: 0.12.0
DD: alchemiscale v0.5.0 released!
- https://github.com/openforcefield/alchemiscale/releases/tag/v0.5.0
- includes:
  - openfe + gufe 1.0 compatibility
  - changes needed for Folding@Home support via alchemiscale-fah
  - feflow inclusion, drop of perses
- will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users
  - current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned
  - new stack versions:
    - python 3.12
    - alchemiscale: 0.5.0
    - neo4j: 5.22
    - gufe: 1.0.0
    - openfe: 1.0.1
    - feflow: 0.1.0
    - openmmforcefields: 0.14.1
    - openmm: 8.1.2
DD : propose changeover to new host on 2024.08.15 00:00:00 UTC
DD : alchemiscale-fah live test performed with FAH volunteers
- executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction
- working on performance bottleneck due to minimization in feflow and openmm-core
- could use guidance here:
JW – Thoughts for switching off from anaconda.org for env yaml hosting?

Discussion topics

Notes

alchemiscale.org
- user questions, issues, requests
  - JS – On the server side, are we logging as much as we can? OpenMM and OpenFE spit out a ton of useful logging info; I’m wondering if I can fetch that.
  - DD – We don’t capture stdout or logging output from protocols; we neither capture nor preserve it. If we wanted to record this, we could, depending on how the protocol does logging, …
  - IA – Recently fielded a similar question. DSwenson put something in place that can be accessed via the CLI, but I’d need to check what it is. Might be centralized in GUFE.
  - MH – We’ve got it set up to use the Python standard library for logging; you can pass in a config file and specify where the log file goes. So we could dump it to a file and then vacuum it up as part of the results.
  - DD – That’s encouraging; it gives us something to hook into. The log would live outside the ProtocolDAGResult but would get pulled into the object store, so for a given ProtocolDAGResult you’d have a log that you could pull, with something like alchemiscaleclient.get() targeted so that it pulls the log.
  - JS – That’d be great, would help us debug systematic issues
  - IK – It seems like the logs would be outside alchemiscale; some protocols wouldn’t have captured outputs. So this would need to be outside a PDR.
  - DD – PDR contents are up to protocol authors...
  - IP – We had logging in NEQCyc, but at some point it stopped working and now we’re getting empty logs. Not sure if this is related, but it could come down to whether or not the common logging module is used.
  - DD – Hm, sounds like there are multiple things to look into then.
  - …
  - IA – I do recall DSwenson working on this, might need to talk to him
  - MH – IIRC, DSwenson’s implementation is just a field in a config file that passes a path to the default logger in Python.
  - IA – There may have been some other details; I think this touches GufeTokenizable.
  - MH – https://github.com/OpenFreeEnergy/gufe/blob/d5fe97955a86c2098ccf50b4954387ee0ef05e23/gufe/tokenization.py#L98
  - DD – Yeah, I think there were a few more details.
  - IA – It may also be worth looking into WHAT gets logged. There are multiple things that could be wired up to get captured, not sure what all gets captured.
  - MH – …
  - …
  - IP – OpenMMTools also outputs some information that may be useful to capture.
  - IK – Any reason we can’t capture stdout and stderr and slurp that into the data with the PDR?
  - JS – I think some things don’t get sent to stdout and stderr.
  - IK – But that’s up to the protocol author.
  - DD – Yeah, if there’s some other log stream we can implement a hook into that.
  - JS – Sounds good, thanks.
  - (MH started logging issue for discussion https://github.com/OpenFreeEnergy/openfe/issues/919)
- compute resources status
  - DD – Lots of compute available.
  - IA – How does current workload interact with host changeover?
  - DD – Once we do the changeover, please don’t submit to old host, but I can keep workers finishing those jobs. I don’t have a mechanism to shut off user submissions but I’d like folks to not submit any more.
- current stack versions:
  - python 3.10
  - alchemiscale: 0.4.0
  - neo4j: 5.18
  - gufe: 0.9.5
  - openfe: 0.14.0
  - perses: protocol-neqcyc
  - openmmforcefields: 0.12.0
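A minimal sketch of the log-capture idea from the user-questions discussion above: route Python standard-library logging to a file (as MH described) and collect stdout/stderr (as IK suggested), so the text could later be stored alongside a ProtocolDAGResult. `capture_protocol_output` and `run_unit` are hypothetical names for illustration only, not an existing alchemiscale or gufe API. Note that `contextlib.redirect_stdout` only redirects Python-level writes to `sys.stdout`; output written directly to the file descriptor by compiled code would bypass it.

```python
import contextlib
import io
import logging
import tempfile
from pathlib import Path


@contextlib.contextmanager
def capture_protocol_output(log_path, logger_name=""):
    """Hypothetical helper: send a logger's records to a file and capture stdout/stderr.

    Yields a dict that is populated with the captured text when the block exits.
    """
    logger = logging.getLogger(logger_name)  # "" selects the root logger
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)  # capture everything; a real config might pick a level
    out, err = io.StringIO(), io.StringIO()
    captured = {}
    try:
        # Only Python-level writes to sys.stdout/sys.stderr are redirected here.
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            yield captured
    finally:
        # Detach and close the handler (closing flushes it), then restore the level.
        logger.removeHandler(handler)
        handler.close()
        logger.setLevel(previous_level)
        captured["log"] = Path(log_path).read_text()
        captured["stdout"] = out.getvalue()
        captured["stderr"] = err.getvalue()


def run_unit():
    """Hypothetical stand-in for the work a Protocol unit does."""
    logging.getLogger("demo.protocol").info("minimizing system")
    print("step 0 complete")


with tempfile.TemporaryDirectory() as tmp:
    with capture_protocol_output(Path(tmp) / "unit.log") as captured:
        run_unit()

print(captured["log"], end="")     # prints: INFO:demo.protocol:minimizing system
print(captured["stdout"], end="")  # prints: step 0 complete
```

Whether the captured text would live inside or outside the PDR is the open design question from the discussion; the sketch only covers the capture step.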
DD: alchemiscale v0.5.0 released!
- Release v0.5.0 · OpenFreeEnergy/alchemiscale
- includes:
  - openfe + gufe 1.0 compatibility
  - changes needed for Folding@Home support via alchemiscale-fah
    - DD – Finished a round of testing with volunteers.
  - feflow inclusion, drop of perses
- will be deployed on a new host, new database as api.alchemiscale.org with advance notice to users
  - current api.alchemiscale.org instance will be moved to api.legacy.alchemiscale.org, kept around for some time, but with no new compute provisioned
  - new stack versions:
    - python 3.12
    - alchemiscale: 0.5.0
    - neo4j: 5.22
    - gufe: 1.0.0
    - openfe: 1.0.1
    - feflow: 0.1.0
    - openmmforcefields: 0.14.1
    - openmm: 8.1.2
  - JW – I think OE isn’t available for py3.12?
    - IA – Do folks need OE?
    - MH+IP – OE does now have py312 packages available.
    - JW – My mistake.
    - (General) – and people often do charge generation beforehand and submit user charges
DD : propose changeover to new host on 2024.08.15 00:00:00 UTC
- (General) – Sounds great.
DD : alchemiscale-fah live test performed with FAH volunteers
- executed 190 ProtocolDAGs on FAH using FahNonEquilibriumCyclingProtocol on ['tyk2', 'mcl1', 'hif2a', 'shp2'] without additional human interaction.
  - DD – Big thanks to MH, IP, HB for the help with this! Next step is to compare to other compute sources to ensure results are sane.
  - IA – JS, could you do the same settings as JHorton is running?
  - JS – Oh, slightly different conversation. It may be too late to switch that. For context:
  - IA – If the FF is at least 2.2, we can do back to back hrex.
  - DD – This test series is to test the compute services. We’ve discovered some performance bottlenecks and got feedback about the length of work units. So we’ll be doing more cycles and bigger testing.
- working on performance bottleneck due to minimization in feflow and openmm-core
- could use guidance here:
- IP – We don’t generally see improvements with longer runs in most cases. But maybe with large numbers of atoms changing or charge changes it can be better.
- IA – A paper from the Coveney lab says that for some transformations, longer simulations (up to 250 ps) improve things. I’d think that solvent will converge quickly.
- DD – I think the volunteers reporting fast work units may have been reporting on the solvent ones. I’ll ask them which WUs were longest.
- IP – Is it possible for alchemiscale-fah to decide platform? Maybe we make the solvent phases run on CPU.
- DD – I don’t think we expose a FAH core setting that allows you to specify CUDA/OpenCL/CPU.
- JW – Is this population representative? The volunteers might be power users with very high-end video cards.
- DD – Unknown, but that’s a reasonable guess, though F@H contributors in general tend to use higher-end machines.
- IA – If this is actually an infrastructure problem, it probably doesn’t HURT to make it longer. The results won’t get WORSE.
- IP – Though recall that we may need to change that for the solvent independently of the complex, and currently those share the same value.
- DD – In the GUFE model, those are already distinct.
- IA – Longer term, maybe the answer is to do both units in the same DAG? E.g., for AHFEs, the ProtocolDAG runs the solvent+vacuum legs together. Our aim for RFE is to… So if we changed it to run both cycles in the same work unit, would that help?
- DD – Not quite. The reason each cycle unit maps to a work unit is that each one has an OpenMM system and integrator. This lets the core just pass systems and integrators, and the workers just blindly run them. We COULD consider doing this, but it would require changes to the F@H core, and HMO is probably the only one with the necessary skillset for this.
- DD – IP, I know we have num cycles as part of the DAG. Would it be possible to set the number of “cycles within a cycle”? Each one right now does eq-noneq-eq-noneq. Would there be a way to do more?
- IP – The change in the code would just be running it in a loop. It creates some problems with getting the results, you’d need to granularize the results for each. So if this is desperately needed it could be done, …
- …
- IA – Yeah, if we WANT to do this, it should be a GUFE-level thing
- …
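The "cycles within a cycle" idea DD and IP discussed amounts to looping the eq → noneq → eq → noneq sequence inside one unit while keeping each pass's works separate, which is the result-granularity issue IP raised. This is a hypothetical plain-Python illustration, not feflow's NonEquilibriumCyclingProtocol code; `run_inner_cycles`, `CycleResult`, and the stub phase functions are assumptions standing in for the actual OpenMM simulation steps, and the work values are made up.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CycleResult:
    """Works from one eq -> forward noneq -> eq -> reverse noneq pass (hypothetical)."""
    forward_work: float
    reverse_work: float


def run_inner_cycles(
    n_cycles: int,
    run_eq: Callable[[], None],
    run_noneq: Callable[[str], float],
) -> List[CycleResult]:
    """Repeat the cycle n_cycles times, keeping one result per pass."""
    results = []
    for _ in range(n_cycles):
        run_eq()                        # equilibrate at the initial state (lambda = 0)
        forward = run_noneq("forward")  # forward switch (lambda 0 -> 1); assumed to return the work
        run_eq()                        # equilibrate at the final state (lambda = 1)
        reverse = run_noneq("reverse")  # reverse switch (lambda 1 -> 0); assumed to return the work
        results.append(CycleResult(forward, reverse))
    return results


# Stubs standing in for the real simulation phases; the work values are made up.
phases = []
works = iter([1.0, -0.5, 1.2, -0.7])


def fake_eq():
    phases.append("eq")


def fake_noneq(direction):
    phases.append(direction)
    return next(works)


results = run_inner_cycles(2, fake_eq, fake_noneq)
print(results)
# [CycleResult(forward_work=1.0, reverse_work=-0.5), CycleResult(forward_work=1.2, reverse_work=-0.7)]
```

Keeping a list of per-pass results rather than aggregating them lets downstream analysis see every forward/reverse work pair, though consecutive passes within one unit may be correlated, which the analysis would need to account for.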
JW – Thoughts for switching off from anaconda.org for env yaml hosting?
- DD – I’ve already switched away from it; they’re hosted on GitHub now.

Action items

@David Dotson will articulate an issue on alchemiscale for how we will capture logs emitted by Protocols and expose these to users

tag Jenke Scheen

@Iván Pulido will create an issue for capturing logs in the feflow NonEquilibriumCyclingProtocol using the gufe logging system

@David Dotson will propose in feflow the ability to have multiple cycles within a single CycleUnit
