| | |
---|
0.3.0 protein ligand benchmark milestone | @David Dotson | |
OpenFE settings taxonomy |
| OpenFE Settings Taxonomy
DD – Any comments on the current state?
AM – I had a quick chat with JC about making things interoperable, so you can drag in and drop out different engines. I’ve been involved in BioSimSpace development, and I can point you toward our solutions for the interoperability/plug-and-play issue. I think it had limited adoption because everything was based on SireMol.
DD – That’d be fantastic. I haven’t yet looked closely at the BioSimSpace implementation.
AM – I’ll make an email introduction between LHedges, RGowers, and DDotson.
JW – I did raise a comment on system representation; I wasn’t sure if it was the best place for it.
IA – How did you decide what went into XML files (like, things that would affect the energy of a system)?
MH – It’s important that we highlight which settings impact the energy.
JW – For OpenFF, the SMIRNOFF format includes things that affect the single-point energies, whereas other FFs may not.
IA – Asking Matt Thompson to compare items in Level I to his understanding of what impacts single-point calculations.
IA – Action Item: will organize a working session with Matt Thompson to hammer on Level I in the taxonomy specifically; the objective is to identify missing items in Level I and decide where ambiguous settings (e.g. things that could go into Level I or Level III) should go (least bad option).
JW – I’m maybe a little concerned that the subsections in this document are “totally modular”, but the result of a step like “add in bulk solvent” in the “chemical composition” section will be intertwined with a choice like “water model” in the “force field” section.
AM – Water model will have an effect on single-point energies; protonation states too.
JW – I’m the naivest person here; if folks who have more experience don’t see this as a risk, then I think we’re in good shape.
|
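
To make JW's concern about cross-section coupling concrete, here is a minimal sketch. It assumes that Level I is meant to hold the settings that affect single-point energies, as the discussion suggests. The `Setting` class, the setting names, and both helper functions are hypothetical illustrations, not part of any existing OpenFE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """One entry in a (hypothetical) settings taxonomy."""
    name: str
    value: object
    section: str          # e.g. "force field", "chemical composition"
    affects_energy: bool  # True for settings that change single-point energies


def energy_settings(settings):
    """Return only the settings flagged as affecting single-point energies."""
    return [s for s in settings if s.affects_energy]


def check_solvent_consistency(settings):
    """Raise if bulk solvent is requested but no water model was chosen.

    This is the kind of coupling between the "chemical composition" and
    "force field" sections raised in the discussion.
    """
    by_name = {s.name: s for s in settings}
    solvent = by_name.get("add_bulk_solvent")
    if solvent is not None and solvent.value and "water_model" not in by_name:
        raise ValueError("add_bulk_solvent is set but no water_model was chosen")


# Hypothetical example values, for illustration only
settings = [
    Setting("water_model", "tip3p", "force field", True),
    Setting("add_bulk_solvent", True, "chemical composition", True),
    Setting("output_frequency", 1000, "simulation", False),
]

check_solvent_consistency(settings)  # passes: both sections agree
print([s.name for s in energy_settings(settings)])
```

Keeping the `affects_energy` flag on each setting, rather than inferring it from the section, would let an ambiguous setting be placed in the least bad section without losing the information MH asked to highlight.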
Protocol execution architecture | @David Dotson | AM – How do you see the network learning the important details of the protein-ligand complexes from the System A and System B inputs?
AM – So, how does the client interact with/submit System A or System B?
DD – All of the components are created by a user. So even the stuff on the left comes from a user; the client box isn’t fully fleshed out. So I’ll make further iterations of this.
AM – We have a gateway class in BioSimSpace that may be helpful to understand here:
RG – I’d been meaning to clean up this sort of diagram, and to show how different user stories would go through the diagram.
RG – The way that “extension” would work is that the strategy box would see what needs running, see that something’s incomplete, and then dispatch a new simulation to be executed.
DD – But different kinds of calculations would need different sorts of extensions.
AM – So you’re trying to build a workflow engine as well?
JC – It’ll be good to decouple task prioritization from the execution engine as much as possible. The F@H work server prioritization is based on floating-point weights, whereas AWS is totally different. So keeping this modular will be really good.
JC – Also, I’m wondering how different engines (like AMBER) will plug in here.
RG – I’m thinking that can go in a red box.
JC – There’s no reason not to do that now.
DD – Say we have a nonequilibrium switching protocol. Part of its settings could be the engine. So when a protocol makes its work units, the substance of interfacing with the engine would happen in that stage. So it would look like two similar protocols could make AmberWorkUnits or GromacsWorkUnits.
JC – So it’s more like an enginewrapper or enginemanager. So it’d need to know about high-level settings like integrator, and that would need to be passed through. And that would also need to return the output to some common format afterwards.
DD – We want to reduce the number of user touchpoints as much as possible. So if there’s a common…
JC – So if there’s an algorithm that knows how to split a simulation into pieces, it could handle the splitting and rejoining of work, and nothing above it would need to know the details.
DD – So, nothing above the work units would know which engine was used.
RG – I don’t think it’s such a bad thing if ProtocolResults are something that could be somewhat different in different strategies.
JC – Why not have the ProtocolResults be engine-independent, and then have additional optional metadata?
RG – I don’t think it’s possible to anticipate all possible data fields that may ever be generated. So we need to have some flexibility.
DD – I was thinking we could have something like a LambdaProtocol that could be subclassed…
RG – It depends on …
DD – Protocol and ProtocolResults operate on one level, and WorkUnit and WorkResult operate on a different level.
So WorkResult and WorkUnit may need to be specific to each other, and Protocol and ProtocolResult could be coupled as well.
JC – One level of abstraction should say which ALGORITHM is being used. If there will be lambda windows, something needs to know how to split up and rejoin the work. On the other hand, there needs to be isolation of the specific engine from standard objects. Obviously each engine makes different objects, but we need to convert that to a common format.
RG – So, there will be a layer between the results layer and the protocol results that makes things engine-agnostic…
JC – …
|
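
The two levels described in this discussion (Protocol/ProtocolResult above, WorkUnit/WorkResult below) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names echo those mentioned in the notes (`LambdaProtocol`, `AmberWorkUnit`, `GromacsWorkUnit`), but every signature, field, and the settings dictionary are hypothetical. The engine calls are stubs that return placeholder values, and `gather` uses a plain sum where a real protocol would apply a proper estimator.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WorkResult:
    """Engine-agnostic result of one work unit (hypothetical common format)."""
    lambda_value: float
    value: float
    metadata: dict = field(default_factory=dict)  # optional engine-specific extras


class WorkUnit(ABC):
    """One independently executable piece of a protocol."""

    def __init__(self, lambda_value, settings):
        self.lambda_value = lambda_value
        self.settings = settings

    @abstractmethod
    def execute(self) -> WorkResult:
        ...


class AmberWorkUnit(WorkUnit):
    def execute(self):
        # Stub: a real unit would run AMBER here and convert its output
        return WorkResult(self.lambda_value, 0.0, {"engine": "amber"})


class GromacsWorkUnit(WorkUnit):
    def execute(self):
        # Stub: a real unit would run GROMACS here and convert its output
        return WorkResult(self.lambda_value, 0.0, {"engine": "gromacs"})


@dataclass
class ProtocolResult:
    """Engine-independent summary, with optional metadata as JC suggested."""
    combined_value: float
    n_units: int
    metadata: dict = field(default_factory=dict)


class LambdaProtocol:
    """Knows the ALGORITHM: how to split work by lambda window and rejoin it."""

    ENGINES = {"amber": AmberWorkUnit, "gromacs": GromacsWorkUnit}

    def __init__(self, settings):
        self.settings = settings

    def create_work_units(self):
        # The engine is just another setting; it only selects the unit class
        unit_cls = self.ENGINES[self.settings["engine"]]
        return [unit_cls(lam, self.settings) for lam in self.settings["lambdas"]]

    def gather(self, results):
        # Placeholder combination; nothing here depends on the engine used
        total = sum(r.value for r in results)
        engines = sorted({r.metadata.get("engine", "unknown") for r in results})
        return ProtocolResult(total, len(results), {"engines": engines})


protocol = LambdaProtocol({"engine": "gromacs", "lambdas": [0.0, 0.5, 1.0]})
units = protocol.create_work_units()
result = protocol.gather([unit.execute() for unit in units])
print(result.n_units, result.metadata)
```

In this arrangement only the work units touch an engine, so anything that schedules or prioritizes them (F@H, AWS, or an in-house cluster) can treat them uniformly, and the free-form `metadata` dictionaries leave room for fields that cannot be anticipated up front.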
| | DS – Looking at how to have results in an object that will be interacted with on disk. This is quite preliminary, but I’d really like feedback on it. OpenFE results storage
DD – It’d be great if people gave this a read-over similar to the settings taxonomy.
JC – DS, were you thinking of this being extensible, or more like fixed columns?
DS – I was thinking of a mix.
DD – One thing I thought of last night is that this particular project is centered around F@H, but that it might be different if it was centered around AWS. So can you predict the needs of your users/partners?
DS – They will want a file. Big trajectory files could be accessed through a URI. But the idea is that errors should be raised at the last moment. So if there’s a network with no trajectories and someone requests an analysis that requires a trajectory, they’ll get a failure when the analysis attempts to retrieve it. But if they just want to visualize the network (without any trajectories), then they should be able to do that for the same network.
AM – I’m thinking that pharma partners could use either in-house HPC or AWS S3.
JC – Agree with AM. The list already on this document is insufficient for most use cases (or even basic debugging). So for now we should think about the behavior/information content that we want without worrying too much about performance. Once we find the behavior/info content we want (after many iterations), then we can worry about performance.
DD – Agree.
|
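
DS's point that errors should be raised at the last moment can be illustrated with a small sketch. All names here (`EdgeResult`, `ResultNetwork`, `TrajectoryUnavailableError`, the `trajectory_uri` field) are hypothetical and are not taken from the results storage document. Retrieval is stubbed out by returning the URI itself.

```python
from dataclasses import dataclass, field
from typing import Optional


class TrajectoryUnavailableError(RuntimeError):
    """Raised only when an analysis actually tries to fetch a trajectory."""


@dataclass
class EdgeResult:
    ligand_a: str
    ligand_b: str
    trajectory_uri: Optional[str] = None  # large files live elsewhere

    def load_trajectory(self):
        if self.trajectory_uri is None:
            raise TrajectoryUnavailableError(
                f"no trajectory stored for {self.ligand_a} -> {self.ligand_b}"
            )
        # Stub: a real implementation would fetch the file behind the URI
        return self.trajectory_uri


@dataclass
class ResultNetwork:
    edges: list = field(default_factory=list)

    def edge_labels(self):
        """Enough for visualizing the network; never touches trajectories."""
        return [f"{e.ligand_a} -> {e.ligand_b}" for e in self.edges]

    def analyze_trajectories(self):
        """Fails here, at retrieval time, if any trajectory is missing."""
        return [e.load_trajectory() for e in self.edges]


network = ResultNetwork([EdgeResult("ligandA", "ligandB")])
print(network.edge_labels())  # works without any trajectories
try:
    network.analyze_trajectories()
except TrajectoryUnavailableError as err:
    print("analysis failed late:", err)
```

Because the network object loads and validates without its trajectories, the same file could serve both a partner who only wants the graph and one who has the trajectories available on in-house HPC or S3.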
Use cases for AVIDD / ASAP (Covid Moonshot v2) | | Many ligand transformations for a single protein (SARS-CoV-2 Mpro) and solvent legs
Many ligand transformations for multiple proteins (many coronavirus Mpros) and a single solvent leg:
ligandA:Mpro(SARS) → ligandB:Mpro(SARS)
ligandA:Mpro(MERS) → ligandB:Mpro(MERS)
ligandA:solvent → ligandB:solvent
A more limited number of ligand transformations × many single protein mutations (SARS-CoV-2 Mpro) and a single solvent leg (for both ligand transformations and protein transformations):
ligandA:Mpro(WT) → ligandB:Mpro(WT)
ligandA:solvent → ligandB:solvent
ligandA:Mpro(WT) → ligandA:Mpro(P49A)
Mpro(WT):solvent → Mpro(P49A):solvent
|
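
The use cases above amount to enumerating transformation edges over ligand pairs, protein variants, and a solvent leg. A minimal sketch of that enumeration follows, using plain strings in the same `ligand:environment` notation as the notes. The function names are hypothetical, and a real network would use proper chemical system objects rather than labels.

```python
from itertools import product


def ligand_edges(ligand_pairs, environments):
    """Label every ligand transformation in every environment.

    Each environment is a protein variant such as "Mpro(SARS)" or the
    shared "solvent" leg.
    """
    return [
        f"{a}:{env} → {b}:{env}"
        for (a, b), env in product(ligand_pairs, environments)
    ]


def mutation_edges(ligand, wild_type, mutants):
    """Label protein mutations with a bound ligand, plus their solvent legs."""
    edges = []
    for mutant in mutants:
        edges.append(f"{ligand}:{wild_type} → {ligand}:{mutant}")
        edges.append(f"{wild_type}:solvent → {mutant}:solvent")
    return edges


# Multiple proteins, single solvent leg
for edge in ligand_edges(
    [("ligandA", "ligandB")], ["Mpro(SARS)", "Mpro(MERS)", "solvent"]
):
    print(edge)

# Single protein mutation, with its own solvent leg
for edge in mutation_edges("ligandA", "Mpro(WT)", ["Mpro(P49A)"]):
    print(edge)
```

Passing the solvent leg once in `environments` reflects the point that it is shared across all proteins, so it is only computed once per ligand pair.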