2022-05-03 Protein-ligand benchmarks meeting notes

Participants

@Irfan Alibay
@David Dotson
@Jeffrey Wagner
@Iván Pulido
@Lorenzo D'Amore
@Diego Nolasco (Deactivated)
@Mike Henry
Antonia Mey
@Richard Gowers (late)
@John Chodera (late)

Goals

DD : protein-ligand-benchmark - 0.3.0 milestone review, issue assistance, rebalance
JC : OpenFE Settings Taxonomy - seeking feedback
- https://docs.google.com/document/d/10avT6qoE5tXjoCBvYDpP9vpI93FvhmgcAI9-0f-TElc/edit?usp=sharing
DD : Protocol execution architecture - current state
DS : Data model for results storage
- https://docs.google.com/document/d/1NOv01BuTM_Sw_VIc5pipRNOT2g4Tre2dGtZ7-ysy3UI/edit?usp=sharing
DD : analysis package inventory - establish current state and where effort may be effectively directed
- openff-arsenic :
- alchemlyb :
- perses :

Discussion topics

Item	Presenter	Notes
0.3.0 protein ligand benchmark milestone	@David Dotson
DD – Is anyone stuck on a PR or issue and in need of assistance?
IP – I added a citation file, but I’m not sure who is supposed to be listed there. We have paper authors and repo contributors.
AM – In theory, the paper authors and the benchmark data set authors were meant to be the same, so now it depends on the specific contributions. I think anyone who contributed to the library should at least be acknowledged. Also, I’d be happy to review PRs if I’m tagged; just keep me in the loop if this is desired.
DD – AM, could you act as the reviewer for this PR?
AM – Yes, I’ll be the reviewer for the citation file PR.
DD – Probably best to start with folks who are on the LiveCoMS paper and expand from there.
AM – My understanding is that the LiveCoMS paper is about to be resubmitted with updates based on referee feedback. Then we’ll have a release corresponding to version 1 of the LiveCoMS paper. The goal that we laid out is that every system should be runnable.
DD – Would the 0.2.0 release map to the LiveCoMS paper?
AM – No, the next release (likely 0.3.0) would map to the paper. This would include resolution of the comments on the paper.
DD – For the 0.3.0 release we’d gathered input that would handle some low-hanging fruit for the F@H project. Are there further changes needed to address reviewer comments?
AM – I have a few more fixes to make to address reviewer comments.
JW – What major tasks remain for the paper?
AM – The one major thing is the overlap between this and another paper I’m coauthoring. But the contents are OK.
IA – Setting a 4-week deadline will depend on JChodera’s input, because he’s tasked with fixing the poses.
AM – I met with the PhD student who was tasked with doing this. The student reported that they’ll have this done in the next week.
(DD asked JC on the GH issue for a deadline on structure remediation.)
Because JC isn’t here to approve a deadline, DN will make a meeting agenda for next week, populated with the decision to put a deadline on the 0.3.0 release.
DD – PR 45 - What’s the checklist for adding targets? DHahn contributed these so I trust their judgement, but I’m not sure
AM – Ideally there would be a script that does a run-test with an energy eval/minimization. There could also be some visual checking, like a script that opens 3D sessions. One issue we’d had before is that there were so many clashes that we couldn’t run much, so a clash check would be the big one. We could say “this must minimize with three target engines”.
IA – Having things be able to pass through `openmm.app.PDBFile` is a big quality check. Also simple checks for long bonds or close contacts (I wrote something like this for MDAnalysis the other day).
DD – Also, the Spruce protein prep that JC’s grad student is doing will likely be needed.
AM – Spruce wasn’t actually running correctly on everything, so JC’s grad student may use Schrödinger for some structures.
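A minimal sketch of the kind of close-contact (clash) check raised above as a gate for adding targets. It is not the MDAnalysis script IA mentions and not an existing benchmark tool: the function names, the 0.8 Å cutoff, and the helper `pdb_line` are all assumptions for illustration. It uses only the standard library and reads coordinates from the fixed columns of PDB `ATOM`/`HETATM` records.

```python
import itertools
import math


def pdb_line(serial, name, x, y, z):
    """Build a fixed-column PDB ATOM record (hypothetical helper for the demo)."""
    return f"ATOM  {serial:5d} {name:<4s} ALA A   1    {x:8.3f}{y:8.3f}{z:8.3f}"


def parse_atoms(pdb_text):
    """Return (serial, name, (x, y, z)) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])
            name = line[12:16].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((serial, name, xyz))
    return atoms


def find_close_contacts(pdb_text, cutoff=0.8):
    """List (serial_1, serial_2, distance) for atom pairs closer than cutoff (Angstrom).

    The cutoff is an assumed placeholder, not an agreed benchmark criterion.
    This is an O(N^2) scan, fine for a sketch but slow for large systems.
    """
    contacts = []
    for (s1, _, p1), (s2, _, p2) in itertools.combinations(parse_atoms(pdb_text), 2):
        distance = math.dist(p1, p2)
        if distance < cutoff:
            contacts.append((s1, s2, round(distance, 3)))
    return contacts


example = "\n".join([
    pdb_line(1, "N", 0.0, 0.0, 0.0),
    pdb_line(2, "CA", 0.5, 0.0, 0.0),   # deliberately too close to atom 1
    pdb_line(3, "C", 3.0, 0.0, 0.0),
])
clashes = find_close_contacts(example)
```

A target could be rejected whenever `clashes` is non-empty; the long-bond check and the “must minimize with three target engines” test would need topology and engine access, so they are left out here.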
OpenFE settings taxonomy
https://docs.google.com/document/d/10avT6qoE5tXjoCBvYDpP9vpI93FvhmgcAI9-0f-TElc/edit?usp=sharing
DD – Any comments on current state?
AM – I had a quick chat with JC about making things interoperable, so you can drag in and drop out different engines. I’ve been involved in BioSimSpace development, and I can point you toward our solutions for the interoperability/plug-and-play issue. I think it had limited adoption because everything was based on SireMol. There is a Process object; you can subclass from it to interface with different engines. We could bring in Lester Hedges to talk about the details of these files.
DD – That’d be fantastic. I haven’t yet looked closely at the BioSimSpace implementation.
AM – I’ll make an email introduction between LHedges, RGowers, and DDotson.
JW – I did raise a comment on system representation; I wasn’t sure if it was the best place for it.
IA – How did you decide what went into XML files (like, things that would affect the energy of a system)?
MH – It’s important that we highlight which settings impact the energy.
AM – I’m not sure that we ever did a single-point energy comparison at BioSimSpace. We mostly compared at the point of the resulting free energies. There’s a paper that compared solvation free energies.
JW – For OpenFF, the SMIRNOFF format includes things that affect the single-point energies, whereas other FFs may not.
MH – want to
IA – Asking Matt Thompson to compare items in Level I to his understanding of what impacts single-point calculations.
IA – Action Item: will organize a working session with Matt Thompson to hammer on Level I in the taxonomy specifically; the objective is to identify missing items in Level I and decide where ambiguous settings (e.g. things that could go into Level I or Level III) should go (least bad option).
JW – I’m a little concerned that the subsections in this document are presented as “totally modular”, but the result of a step like “add in bulk solvent” in the “chemical composition” section will be intertwined with a choice like “water model” in the “force field” section.
AM – Water model will have an effect on single-point energies, protonation states too.
JW – I’m the naivest person here; if folks who have more experience don’t see this as a risk then I think we’re in good shape.
Protocol execution architecture	@David Dotson
DD – We need to figure out how exactly we execute. This was driven by gufe PR #13.
DD – (Shows draft architecture diagram)
AM – How does the network learn the important details of the protein-ligand complexes from the System A and System B inputs?
DD – This will be in gufe (GitHub: OpenFreeEnergy/gufe, core data models of the Open Free Energy ecosystem).
MH – Maybe a better way to run: the notebook repo (GitHub: OpenFreeEnergy/ExampleNotebooks, notebooks demonstrating how to do simple tasks related to free energy calculations), with the command to make the env: `conda env create openfe/openfe-notebooks`
IA – Worth a look for how we currently use the gufe components (will change according to this): try.openfree.energy
AM – So it’s all outside of the diagram we’re currently looking at?
DD – Yes.
AM – So, how does the client interact with/submit System A or System B?
DD – All of the components are created by a user. So even the stuff on the left comes from a user; the client box isn’t fully fleshed out. So I’ll make further iterations of this.
AM – We have a gateway class in BioSimSpace that may be helpful to understand here:
RG – I’d been meaning to clean up this sort of diagram, and to show how different user stories would go through the diagram.
RG – The way that “extension” would work is that the strategy box would see what needs running, see that something’s incomplete, and then dispatch a new simulation to be executed.
DD – But different kinds of calculations would need different sorts of extensions.
AM – So you’re trying to build a workflow engine as well?
JC – It’ll be good to decouple task prioritization from the execution engine as much as possible. The F@H work server prioritization is based on floating point weights, whereas AWS is totally different. So keeping this modular will be really good.
JC – Also, I'm wondering how different engines (like AMBER) will plug in here.
RG – I’m thinking that can go in a red box.
JC – There’s no reason not to do that now.
DD – Say we have a nonequilibrium switching protocol. Part of that setting could be the engine. So when a protocol makes its work units, the substance of interfacing with the engine would happen in that stage. So two similar protocols could make AmberWorkUnits or GromacsWorkUnits.
JC – So it’s more like an enginewrapper or enginemanager. It’d need to know about high-level settings like integrator, and those would need to be passed through. It would also need to return the output in some common format afterwards.
DD – We want to reduce the number of user touchpoints as much as possible. So if there’s a common…
JC – So if there’s an algorithm that knows how to split a simulation into pieces, then that could handle the splitting and the rejoining of work, and nothing above it would need to know the details.
DD – So, nothing above the work units would know which engine was used.
RG – I don’t think it’s such a bad thing if ProtocolResults are something that could be somewhat different in different strategies.
JC – Why not have the ProtocolResults be engine-independent, and then have additional optional metadata?
RG – I don’t think it’s possible to anticipate all possible data fields that may ever be generated. So we need to have some flexibility.
DD – I was thinking we could have something like a LambdaProtocol that could be subclassed…
RG – It depends on …
DD – Protocol and ProtocolResults operate on one level, and WorkUnit and WorkResult operate on a different level. So WorkResult and WorkUnit may need to be specific to each other, and Protocol and ProtocolResult could be coupled as well.
JC – One level of abstraction should say which ALGORITHM is being used. If there will be lambda windows, something needs to know how to split up and rejoin the work. On the other hand, there needs to be isolation of the specific engine from standard objects. Obviously each engine makes different objects, but we need to convert that to a common format.
RG – So, there will be a layer between the results layer and the protocol results that makes things engine-agnostic…
JC – …
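A rough sketch of the two-level layering discussed above, offered only to make the discussion concrete: a Protocol splits work into engine-specific WorkUnits, each WorkUnit returns a WorkResult in a common format with optional engine metadata, and the Protocol rejoins them into an engine-independent ProtocolResult. The class names follow the ones used in the meeting, but every field, method, and the placeholder "work" values are assumptions; this is not the gufe API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WorkResult:
    """Common-format outcome of one work unit; metadata holds engine extras."""
    window: int
    work: float
    metadata: dict = field(default_factory=dict)


class WorkUnit(ABC):
    """One piece of a calculation; only this layer knows about the engine."""
    engine = "abstract"

    def __init__(self, window: int):
        self.window = window

    @abstractmethod
    def run_engine(self) -> float:
        """Run the engine for this window and return a number (placeholder)."""

    def execute(self) -> WorkResult:
        return WorkResult(self.window, self.run_engine(), {"engine": self.engine})


class AmberWorkUnit(WorkUnit):
    engine = "amber"

    def run_engine(self) -> float:
        return float(self.window)  # stand-in for a real AMBER call


class GromacsWorkUnit(WorkUnit):
    engine = "gromacs"

    def run_engine(self) -> float:
        return float(self.window)  # stand-in for a real GROMACS call


@dataclass
class ProtocolResult:
    """Engine-independent summary; nothing here names the engine."""
    estimate: float
    n_work_units: int


class LambdaProtocol:
    """Knows the ALGORITHM: how to split into windows and rejoin results."""
    WORK_UNIT_TYPES = {"amber": AmberWorkUnit, "gromacs": GromacsWorkUnit}

    def __init__(self, engine: str, n_windows: int):
        self.work_unit_type = self.WORK_UNIT_TYPES[engine]
        self.n_windows = n_windows

    def create_work_units(self):
        return [self.work_unit_type(i) for i in range(self.n_windows)]

    def gather(self, work_results) -> ProtocolResult:
        total = sum(r.work for r in work_results)
        return ProtocolResult(estimate=total, n_work_units=len(work_results))


def run(engine: str, n_windows: int = 4) -> ProtocolResult:
    """Execute serially; a scheduler (F@H, AWS, HPC) would replace this loop."""
    protocol = LambdaProtocol(engine, n_windows)
    return protocol.gather([unit.execute() for unit in protocol.create_work_units()])
```

Because `run` is the only place that decides how work units are dispatched, task prioritization stays separate from the engine layer, which is the decoupling JC asks for. RG’s concern about unanticipated data fields is handled here only by the free-form `metadata` dict, which may well be too loose in practice.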
Data model for results storage
DS – Looking at how to have results in an object that will be interacted with on disk. This is quite preliminary, but I’d really like feedback on it. https://docs.google.com/document/d/1NOv01BuTM_Sw_VIc5pipRNOT2g4Tre2dGtZ7-ysy3UI/edit?usp=sharing
DD – It’d be great if people gave this a read-over similar to the settings taxonomy.
JC – DS, were you thinking of this being extensible, or more like fixed columns?
DS – I was thinking of a mix.
DD – One thing I thought of last night is that this particular project is centered around F@H, but that it might be different if it was centered around AWS. So can you predict the needs of your users/partners?
DS – They will want a file. Big trajectory files could be accessed through a URI. But the idea is that errors should be raised at the last moment. So if there’s a network with no trajectories and someone requests an analysis that requires a trajectory, they’ll get a failure when the analysis attempts to retrieve it. But if they just want to visualize the network (without any trajectories) then they should be able to do that for the same network.
AM – I’m thinking that pharma partners could use either in-house HPC or AWS S3.
JC – Agree with AM. The list already in this document is insufficient for most use cases (or even basic debugging). So for now we should think about the behavior/information content that we want without worrying too much about performance. Once we find the behavior/info content we want (after many iterations), then we can worry about performance.
DD – Agree.
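A small sketch of the “raise errors at the last moment” behavior DS describes, assuming a hypothetical per-edge record with an optional trajectory URI. The names `EdgeResult`, `TrajectoryUnavailableError`, `summarize`, and the `fetch` callable are invented for illustration and do not come from the linked data model document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TrajectoryUnavailableError(RuntimeError):
    """Raised only when an analysis actually asks for a missing trajectory."""


@dataclass
class EdgeResult:
    ligand_a: str
    ligand_b: str
    estimate: float                        # free energy estimate, arbitrary units
    trajectory_uri: Optional[str] = None   # e.g. a file path or object-store URI

    def load_trajectory(self, fetch: Callable[[str], object]):
        """Retrieve the trajectory lazily; fail here, not when the network loads."""
        if self.trajectory_uri is None:
            raise TrajectoryUnavailableError(
                f"no trajectory stored for {self.ligand_a} -> {self.ligand_b}"
            )
        return fetch(self.trajectory_uri)


def summarize(edges):
    """A trajectory-free view of the network, e.g. for visualization."""
    return [f"{e.ligand_a} -> {e.ligand_b}: {e.estimate:+.2f}" for e in edges]


network = [EdgeResult("ligandA", "ligandB", -0.5)]
labels = summarize(network)  # works even though no trajectory is attached
```

Passing `fetch` in as a callable keeps the storage backend (local disk, in-house HPC, S3) out of the data model, which is one way to accommodate the different partner setups AM mentions.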
Use cases for AVIDD / ASAP (Covid Moonshot v2)
- Many ligand transformations for single protein (SARS-CoV-2 Mpro) and solvent legs
- Many ligand transformations for multiple proteins (many coronavirus Mpros) and single solvent leg
  - ligandA:Mpro(SARS) → ligandB:Mpro(SARS)
  - ligandA:Mpro(MERS) → ligandB:Mpro(MERS)
  - ligandA:solvent → ligandB:solvent
- A more limited number of ligand transformations x many single protein mutations (SARS-CoV-2 Mpro) and single solvent leg (for both ligand transformations and protein transformations):
  - ligandA:Mpro(WT) → ligandB:Mpro(WT)
  - ligandA:solvent → ligandB:solvent
  - ligandA:Mpro(WT) → ligandA:Mpro(P49A)
  - Mpro(WT):solvent → Mpro(P49A):solvent
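To illustrate the second use case (many ligand transformations across multiple proteins with a single shared solvent leg), here is a minimal sketch that enumerates the legs for each ligand pair. The function `build_legs` and the plain-string state labels are assumptions for illustration; they only mirror the notation used in these notes.

```python
def build_legs(ligand_pairs, proteins):
    """Enumerate (start_state, end_state) legs for each ligand transformation.

    Each ligand pair gets one complex leg per protein plus one solvent leg,
    so the solvent leg is computed once and shared across all proteins.
    """
    legs = []
    for lig_a, lig_b in ligand_pairs:
        for protein in proteins:
            legs.append((f"{lig_a}:{protein}", f"{lig_b}:{protein}"))
        legs.append((f"{lig_a}:solvent", f"{lig_b}:solvent"))
    return legs


legs = build_legs([("ligandA", "ligandB")], ["Mpro(SARS)", "Mpro(MERS)"])
for start, end in legs:
    print(f"{start} -> {end}")
```

For N ligand pairs and P proteins this gives N x (P + 1) legs, compared with 2 x N x P if the solvent leg were repeated per protein.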
