2022-05-24 Protein-ligand benchmarks meeting notes

Participants

  • @David Dotson

  • @Richard Gowers

  • @Diego Nolasco

  • @John Chodera

  • David Swenson

  • @Lorenzo D'Amore

  • @Irfan Alibay

  • @Mike Henry

  • @Iván Pulido

  • @Jeffrey Wagner

Goals

  • DD : protein-ligand-benchmark - 0.3.0 milestone review, issue assistance, rebalance

  • DN : timescale/deadlines for OpenFF user stories

  • JC : timescale/deadlines for ASAP user stories

  • DD : gufe / fah-alchemy status update

  • JC : alchemical protocols supported by openmm-core currently, limitations

Discussion topics

Item

Presenter

Notes

0.3.0 milestone review

@David Dotson

  • 0.3.0 milestone review, issue assistance, rebalance

    • JC – Reviewing PDB files with Melissa this afternoon

    • DD – Re: questions about PLB #35 - “additional PR” - Would this be the PR to add 2 more systems (ROS1 and PDE10)?

      • JC – Yes, I’d like to add those structures and then make a new release. What’s the status of the file addition? I’d like to get those merged, cut a release, and then have Melissa process those.

      • LD – Those are the only two targets I know about, we can make a release after those are added.

      • DD – I’ve merged that PR. Now we can cut 0.2.1 release.

      • DD – What’s left for PR 52?

        • JC – I’m not sure, touching base with Melissa this afternoon. I’ll need eyes on the protonation states and other structure things. We’re using Schrödinger protein prep. LD, could you let us know which PDB files these structures came from? We’ll also need to update the table to include these targets.

        • JC and Melissa will process these new structures as well in PR 52.

        • LD will check whether there are additional structures that DHahn intends to add.

        • LD will fill in target.yaml metadata (especially PDB field, and as many others as possible) for ROS1 and PDE10 and add new targets to table

          • LD – David H. just told me that for one of the two targets there's no public PDB available. How should we proceed then? Should we leave it out? It’s attached to a publication.

          • JC – If there’s no published PDB in the protein data bank, should we continue to consider this target?

          • Decision - We’ll drop the ROS1 target

    • IP – For the new targets, are we making TWO entries (one “raw” PDB, and another “prepared” PDB)?

      • JC – No, after the 0.2.1 release we’re going to cut each target down to one single prepared protein PDB.

      • JW – Agree

  • 0.2.1 release for protein-ligand-benchmark#45: https://github.com/openforcefield/protein-ligand-benchmark/pull/45

  • IA – Issue #40 - DHahn linked his existing data. What should we do with it?

    • DD – I think it’s a historical artifact, so we shouldn’t try to modify it.

    • JC – I’d like to keep it as a static artifact, but once we have a specific object model we can convert it to that a single time (rather than keeping an adapter running to read it from its old form).

    • DD will walk through DHahn’s repo and study structure/contents.

timescale/deadlines for OpenFF user stories

@Diego Nolasco

  • DN - At our first F@H meeting, we expressed a desire to have a working “green box” (from the architecture diagram) that could support a proof-of-concept in 3ish months. OpenFE needs a working toolkit by August. So I’d like to start planning backwards from a deadline (or several component deadlines) to understand when we need to have things done.

    • I spoke to OpenFE about how to avoid duplication of effort. We came up with a plan of staged deployments.

    • So I’d like to propose a deadline for the collection/inclusion/exclusion of user stories.

  • DD – I was hoping to understand OpenFF’s timescales here - When will Chapin need the automation running?

    • DN – CC is on track to be submitting jobs in October.

    • DD – Understood, this means we’ll want to be submitting jobs in August so we can start shaking the bugs out.

  • DN – OpenFE is a bit more urgent, needs to have something running by August.

  • DD – Does OpenFE require this system to be deployed in August? Or is there a subset of our goals that could be met?

    • DN – I think OpenFE really only needs a subset of these goals - The common components will be important but they may not be submitting to F@H.

    • DD – RG and IA, is there a timescale where this needs to be ready?

      • RG – No

    • JC – My guess for the OpenFE folks is that the PL benchmarks repo needs to be remediated. So they should be able to load all of those up, even if they don’t run on F@H.

    • JW – I think there are three areas…

    • RG – I’m not sure that the results storage will be the same for the F@H project and for non-F@H.

    • JW – Will this cause problems in the future? It seems like either duplication of effort or incompatible object design

    • DS – Final formats need to fit into alchemlyb and arsenic. We don’t need to use the same object model before that point, so long as we can convert to the correct input right before it goes into alchemlyb and arsenic.

    • JC – I think we shouldn’t take the current alchemlyb/arsenic input formats as being set in stone, but I do agree that the input formats would be a good standard.

    • DS – I think this is the right target for the component sharing. We could end up sharing earlier components if things shake out well, but that’s not a requirement for our design.

  • RG – FYI - We’re working on further development of Arsenic with Edinburgh crew (Toni and friends)

  • IA (chat) – re initial "benchmark of if the OpenFE components work" - we'll be doing an initial benchmark on a subset of the protein-ligand benchmark systems (probably 3-4, depends on resources / time). It will be ~ end of June / July time. At this point we don't need to have all the components finished, we just need to create all the MultiStateReporter NC files (with the hope that we can read them in the future).

    This should be ~ 75-100 edges in triplicate w/ Sage + FF14SB. The aim of this benchmark is just to check that the engine side of things is stable now (and that we didn't mess things up versus perses).

    If things aren't standardised, honestly it's not the end of the world in my opinion.

timescale/deadlines for ASAP user stories

@John Chodera

  • JC – “Thursday”, though we can limp along with the current infrastructure, and we won’t be at full capacity until the end of the year.

    • DD – Is there a particular time we can target, or some sort of a “red line” that can be articulated?

    • JC – Good question. Right now we’re just dealing with mPro, which can be handled using current infrastructure. But as we go to other targets we’ll need to write more one-off scripts, and I’d prefer to avoid this. “Lead optimization” is another project and that won’t ramp up until later this year.

    • DD – So would it be fair to target November? (understanding that sooner is better)

      • JC – Yes

  • DN – Are we done collecting user stories?

    • DD – My understanding is “yes”.

    • DN – Are they converted to a prioritized list of items on a backlog?

    • DD – Not yet. I’ll do this and prioritize. But many user stories don’t map to a single feature - They share a lot of features.

    • DD – Hard to build a backlog before we know the strategy.

gufe / fah-alchemy status update

@David Dotson

  • DD – OpenFE and I have been working on GUFE #13. That’s pretty close to merging. So overall, fah-alchemy is close to using OpenFE/GUFE components. This will make it easier to make the backlog.

  • JC – Will we be able to start using these components in two weeks? That’s when we’ll be doing a development push for perses.

    • DD – I think so. OpenFE is looking to make their first alchemical protocol/DAG around then.

    • RG – That sounds reasonable. For the protein component, we eventually need the OFFTK protein representation. Right now we’re using the OpenMM PDBFile representation but would prefer to use the OpenFF representation ultimately.

    • JW – will contact you after this call to get handoff on reader

alchemical protocols supported by openmm-core currently, limitations

@John Chodera

  • DD – JC, what are the current alchemical protocols supported by openmm core? Was thinking about deduplication in the scheduler. Right now I’m thinking of nonequilibrium switching. How is that handled in the core?

    • JC – Right now the OpenMM core only supports XML inputs for the integrator, system, and state, so it can run anything you can code that way (independent lambdas or nonequilibrium sampling). The output from one cycle can be the input to the next. We can do more complex things like replica exchange, but those will need to be implemented in C++. My top picks for next implementations are replica exchange and (something) sampling.

    • JC – Dev timescale to add new things is about two weeks - 1 week for implementation and 1 week for testing.

    • currently:

      • independent lambdas

      • nonequilibrium cycling

    • there are other protocols I’d like to support:

      • times square sampling

      • repex

      • self-adjusted mixture sampling

  • DD – So within OpenMM core, we’re running….

  • JC – Right, we clip out those cycles from existing trajectories. So we take out nonequilibrium parts from a larger trajectory.
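As a rough illustration of the nonequilibrium cycling and "clipping" described above, here is a minimal Python sketch. It is not the openmm-core implementation: the function names, the step counts, and the phase ordering (hold at lambda 0, switch 0 to 1, hold at lambda 1, switch 1 to 0) are assumptions made for illustration only.

```python
from typing import List, Tuple


def cycle_schedule(n_eq: int, n_neq: int) -> List[float]:
    """Lambda per step for one hypothetical cycle:
    hold at 0, switch 0 -> 1, hold at 1, switch 1 -> 0."""
    forward = [(i + 1) / n_neq for i in range(n_neq)]
    reverse = [1.0 - (i + 1) / n_neq for i in range(n_neq)]
    return [0.0] * n_eq + forward + [1.0] * n_eq + reverse


def switching_segments(lambdas: List[float]) -> List[Tuple[int, int]]:
    """Half-open (start, stop) index ranges where lambda changes from the
    previous step, i.e. the nonequilibrium parts to clip out."""
    segments = []
    start = None
    for i in range(1, len(lambdas)):
        changing = lambdas[i] != lambdas[i - 1]
        if changing and start is None:
            start = i
        elif not changing and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(lambdas)))
    return segments


# One cycle ends at the lambda value where the next one begins,
# so cycles can be chained back to back in a single long trajectory.
one_cycle = cycle_schedule(n_eq=2, n_neq=4)
two_cycles = one_cycle * 2
clips = switching_segments(two_cycles)
```

Under these assumptions each cycle contributes one forward and one reverse switching segment, and the frames in those index ranges are what would be extracted from the larger trajectory.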

Shared components

 

  • JW – question on shared components: is it the case that we want to put as much in gufe as possible?

    • RG – I see it going the other way - That fah-alchemy will reuse OpenFE/gufe stuff as much as possible.

    • DD – I'm thinking that fah-alchemy will have a dep on OpenFE.

  • JW – API stability for OpenFE and fah-alchemy components?

    • RG – August will be the 1.0, that will be stable, then will do an API break for 2.0 after about a year.

Action items

Decisions
