Participants
David Swenson
Irfan Alibay
Goals
Discussion and feedback on overall impressions from DD review of user stories
Define initial deliverables for DD, others
data structure specs
architecture component API points
Discussion topics
Impressions from DD:
Think we are looking at a MultiDiGraph as fundamental to our structure.
I'm assuming directionality is important, but that depends a bit on the protocol; if not, can get away with a MultiGraph
if we can avoid directionality in the graph, this simplifies the data structure and algorithmic handling of the graph; I think we can avoid it, since even though some protocols may include a "direction", they don't fundamentally need to, and the resulting free energy differences between nodes have no directionality to them
JC – This wouldn’t be specific to just this project, it would be a good mentality for general data structures
DS – OpenFE is using multidigraphs for our internal objects.
directionality is important, in particular for the sign of the free energy difference
JC – Would it be possible for OpenFE and this F@H interface to use the same object?
DS – I think so. We just delivered this to our board.
JC – Is there code we could look at for this? What I’d seen before was RDKit-specific. So like how would we model a mutation in a kinase?
DS – Right now we have hard dep on RDKit, until we can use OpenFF mol fully. Right now our graph functionality is tailored to a ligand network. So systems (nodes) are identified by name, and edges would link two names.
JC – We know what kinds of transformations we want to support, so we could spec out the work with a larger scope and only implement a subset of it initially.
RG – We could always implement the small scope now, and subclass it to more general functionality later.
JW – If we already know we want to implement more generality, we should start with speccing it out, even if implementation trails it. Speccing takes time, but will take less time than re-implementation after we are further down
JC – And we do know the domain pretty well.
JC – is it possible to have multiple edges between nodes (like, different ways of transforming one ligand into another?)
(General) – Yes, this is the “multi” in a “multidigraph”
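A minimal sketch of the structure discussed above, using only the standard library: a directed multigraph in which parallel edges between the same pair of nodes are told apart by a key, and traversing an edge backwards flips the sign of the free energy difference. The names `TransformationNetwork`, `add_edge`, `edges_between`, and `dg`, as well as the numeric values, are hypothetical and are not OpenFE's actual classes.

```python
from collections import defaultdict


class TransformationNetwork:
    """Toy directed multigraph: nodes are names, edges are keyed per node pair."""

    def __init__(self):
        # (source, target) -> {key: attribute dict}
        self._edges = defaultdict(dict)
        self.nodes = set()

    def add_edge(self, source, target, key, **attrs):
        """Add one edge; several keys may share the same (source, target)."""
        self.nodes.update((source, target))
        self._edges[(source, target)][key] = dict(attrs)

    def edges_between(self, source, target):
        """Return all parallel edges stored in the source -> target direction."""
        return dict(self._edges.get((source, target), {}))

    def dg(self, source, target, key):
        """Free energy difference along an edge; the reverse direction flips the sign."""
        if key in self._edges.get((source, target), {}):
            return self._edges[(source, target)][key]["dg"]
        if key in self._edges.get((target, source), {}):
            return -self._edges[(target, source)][key]["dg"]
        raise KeyError((source, target, key))


net = TransformationNetwork()
# Two different ways of transforming the same pair of ligands (made-up values).
net.add_edge("ligand_a", "ligand_b", key="atom_map_1", dg=-1.2)
net.add_edge("ligand_a", "ligand_b", key="atom_map_2", dg=-1.0)
```

If the project settles on an undirected graph instead, the same storage could normalize each pair to a sorted tuple and drop the sign handling.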
RG – Could use comparison for OFFMol that does position-equality. Also need hashing that doesn’t rely on graph comparison.
DS – User expects our molecule object to behave in ways that are pythonic with e.g. dictionaries
JW – There’s no clear definition of “equal” for mols
JC – If you want a dict that understands molecular structure, it’s possible, but the dict needs to understand that it’s going to have special keys
DS – tend to assume we have users who are not Python experts
JC – We need things to be interoperable
RG – Perhaps the problem is that the OpenFF object isn’t interoperable enough? If it doesn’t play well with e.g. a dict, this is a problem.
JC – is the disagreement that there are different meanings of “molecule equality”?
DS – Yes
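One way to make the chosen meaning of "equality" explicit is to wrap the molecule in a node object whose `__eq__` and `__hash__` both derive from a declared identity key, so it behaves predictably as a dict key. The sketch below is an assumption for illustration only: `NetworkNode`, its fields, and the choice of (name, SMILES string, rounded coordinates) as the identity are hypothetical, not the OpenFE or OpenFF API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True, eq=False)
class NetworkNode:
    name: str
    smiles: str  # stored as a plain string; no graph comparison is performed
    positions: Tuple[Tuple[float, float, float], ...] = ()

    def identity(self):
        """The tuple that defines what 'equal' means for this node type."""
        # Rounding is a crude stand-in for a position tolerance.
        rounded = tuple(tuple(round(x, 3) for x in atom) for atom in self.positions)
        return (self.name, self.smiles, rounded)

    def __eq__(self, other):
        if not isinstance(other, NetworkNode):
            return NotImplemented
        return self.identity() == other.identity()

    def __hash__(self):
        # Hash and equality use the same key, so dict lookups stay consistent.
        return hash(self.identity())


a = NetworkNode("lig1", "c1ccccc1", ((0.0, 0.0, 0.0),))
b = NetworkNode("lig1", "c1ccccc1", ((0.0, 0.0, 0.00004),))
other_pose = NetworkNode("lig1", "c1ccccc1", ((1.5, 0.0, 0.0),))
results = {a: "pose 1"}
```

Here `b` retrieves the same dict entry as `a`, while `other_pose` is a distinct key; a different definition of equality would only need a different `identity()`.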
RG – the Molecule object in OpenFE is really intended to just be a network node; so perhaps renaming it avoids the confusion, and lets the object do only what it needs to do for the purposes of defining a transformation network
RG – another concern we have is that if we are upstreaming OpenFF, and OpenFF remains as beta for a long time, we can’t declare ourselves as stable. Tying ourselves to OpenFF’s timelines presents a project risk for us
JW – the API is still slated for breaks, so I can’t make guarantees on stability
JC – there are points that can be considered stable, however. Can we identify these?
some components may make sense outside of the toolkit?
JW – you want the Molecule class stabilized?
RG – I know that’s an unreasonable ask, and it wouldn’t be possible in the timescale we are committed to.
JW – could make guarantees around certain things though
JW – OpenFF does monitor a set of internal “core packages”, and guarantees that they will get updates + new releases that keep them running with 95%+ uptime. So OMSF could facilitate us extending this guarantee to OpenFE, so it’s our problem when we break our API.
RG – Would you take care of our downstreams as well?
JW – We can discuss extending this guarantee to other repos.
RG – We could put in a superclass that supports subclasses like LigandNode, ProteinNode, etc
DD – I could use more info about what a “node” is
For RBFE edges, each edge maps to at least two simulations (solvent transform, complex transform), but could include arbitrarily many.
More than one edge can connect two nodes, indicating a different protocol, atom map, etc.; other metadata could be used to indicate the differences as well
absolutely identical inputs for an edge that has already been submitted will be deduplicated within a project, however; allows idempotency of input with minimal churn
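The deduplication described above could be sketched as content-addressed submission: hash a canonical serialization of the edge inputs and use the digest as the edge's key within a project. `edge_key`, `Project`, and `submit` are hypothetical names, and the choice of JSON with sorted keys plus SHA-256 is an assumption, not a decided design.

```python
import hashlib
import json


def edge_key(node_a, node_b, protocol, settings):
    """Deterministic digest of the edge inputs (key order does not matter)."""
    payload = json.dumps(
        {"node_a": node_a, "node_b": node_b, "protocol": protocol, "settings": settings},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Project:
    def __init__(self):
        self.edges = {}  # digest -> submitted inputs

    def submit(self, node_a, node_b, protocol, settings):
        """Return (key, created); resubmitting identical inputs creates nothing new."""
        key = edge_key(node_a, node_b, protocol, settings)
        created = key not in self.edges
        if created:
            self.edges[key] = {
                "node_a": node_a,
                "node_b": node_b,
                "protocol": protocol,
                "settings": settings,
            }
        return key, created


project = Project()
first = project.submit("lig1", "lig2", "rbfe", {"n_lambdas": 11, "atom_map": "m1"})
repeat = project.submit("lig1", "lig2", "rbfe", {"atom_map": "m1", "n_lambdas": 11})
variant = project.submit("lig1", "lig2", "rbfe", {"n_lambdas": 11, "atom_map": "m2"})
```

The repeat submission returns the same key with `created` set to `False`, while the variant with a different atom map becomes a second edge between the same two nodes.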
Definitely need RBAC built in from the start, with users able to be members of multiple orgs but with roles granted per-campaign; probably need an org-scoped Admin role, as well as a system-scoped Admin role for setting up orgs, granting org Admins
will need to lay out initial set of roles, scopes, grants; can evolve these over time
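As a starting point for that layout, roles, scopes, and grants could be modeled roughly as below. This is a sketch under stated assumptions: the role names beyond the two Admin roles mentioned above (`CAMPAIGN_MEMBER`), the tuple encoding of scopes, and the `has_access` policy are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Role(Enum):
    SYSTEM_ADMIN = "system_admin"        # system-scoped: sets up orgs, grants org Admins
    ORG_ADMIN = "org_admin"              # org-scoped Admin
    CAMPAIGN_MEMBER = "campaign_member"  # hypothetical per-campaign role


@dataclass(frozen=True)
class Grant:
    user: str
    role: Role
    # () = whole system, (org,) = one org, (org, campaign) = one campaign
    scope: Tuple[str, ...] = ()


def has_access(grants, user, org, campaign):
    """True if any of the user's grants covers the given campaign (assumed policy)."""
    for grant in grants:
        if grant.user != user:
            continue
        if grant.role is Role.SYSTEM_ADMIN and grant.scope == ():
            return True
        if grant.role is Role.ORG_ADMIN and grant.scope == (org,):
            return True
        if grant.role is Role.CAMPAIGN_MEMBER and grant.scope == (org, campaign):
            return True
    return False


grants = [
    Grant("root", Role.SYSTEM_ADMIN),
    Grant("alice", Role.ORG_ADMIN, ("org_a",)),
    # One user can be a member of several orgs, with roles granted per campaign.
    Grant("bob", Role.CAMPAIGN_MEMBER, ("org_a", "campaign_1")),
    Grant("bob", Role.CAMPAIGN_MEMBER, ("org_b", "campaign_9")),
]
```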
Should create a glossary of terms as we build the data models, since e.g. "transformation" is already overloaded as a term
a "transformation" is an edge in our data model; a node is a “microstate”; microstates can be connected by zero or many transformations
JC – In thinking about the most general setting from transformation work from the last decades, we need an edge that reflects a transition between environments. So like a protein env to an aqueous env. Sometimes also need to do two proteins and an aqueous env for selectivity workflows. So there’d be like different stacks of networks that are related to each other, but that each might reflect different environments where different methods are used to model transformations. And of course chemical transformations between ligands and amino acid mutations/modifications.
DD – So, I’d usually think of this as an input to an MD simulation.
JC – Like, a chemical microstate, with either coordinates or instructions on how to generate the coordinates… Like, if you had two microstates of a ligand, how do those relate to each other?
DD – Should a node be a single pose?
RG – I think positions are important. Are there workflows where you’re looking at two confs of a larger molecule?
JC – There are ways to do that, it may be more complicated than what we need to handle initially. Eg, catechol binding lysozyme mutants in different binding modes… do we want to support that?
IA – I think that’d be useful. I just had a situation where I had to rerun HSP90 because of helix conf change. So I’d like a way for the network to recognize these separately.
JC – try to use diffnet approach as a guiding principle for deciding on the approach for a graph
JC – Huafeng Xu’s paper assumes gaussian error, not perfect but it worked pretty well. Could use that to relate things like different poses.
transformations/edges have no direction, even if some nonequilibrium protocols may have implicit directionality
Failure is a first-class citizen. The system should handle failures in a way that provides as much information as possible on what was done, why (if known) an execution failure occurred
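Treating failure as a first-class result might look like the sketch below, where a failed execution produces a record with the same shape as a successful one instead of disappearing into a log. `ExecutionResult`, `run_edge`, and the field names are hypothetical.

```python
import traceback
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionResult:
    edge_key: str
    ok: bool
    outputs: dict = field(default_factory=dict)
    error_type: Optional[str] = None      # what failed
    error_message: Optional[str] = None   # why, if known
    traceback_text: Optional[str] = None  # where it failed


def run_edge(edge_key, simulate):
    """Run a callable for one edge and always return a result record."""
    try:
        return ExecutionResult(edge_key, True, outputs=simulate())
    except Exception as exc:
        return ExecutionResult(
            edge_key,
            False,
            error_type=type(exc).__name__,
            error_message=str(exc),
            traceback_text=traceback.format_exc(),
        )


def good_run():
    return {"dg": -1.2}


def bad_run():
    raise ValueError("NaN in energies")


success = run_edge("edge_1", good_run)
failure = run_edge("edge_2", bad_run)
```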
A simulation of a molecular system with no alchemical transformation is a node with an edge that self-loops, with the self-loop edge carrying all parameters for the “protocol”, which in this case would be simulation parameters and no lambdas; multiple simulations for a single system would correspond to multiple self-loop edges on the node
many simulations would be a “network” of unconnected nodes with self-loops
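The self-loop convention above can be illustrated with a small standard-library sketch. `Edge`, `plain_simulation`, and the parameter names are hypothetical; the point is only that a non-alchemical simulation is an edge whose source and target are the same node, with the simulation parameters carried on the edge.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    protocol: Tuple[Tuple[str, object], ...]  # sorted (name, value) pairs, no lambdas

    @property
    def is_self_loop(self):
        return self.source == self.target


def plain_simulation(node, **params):
    """A simulation with no alchemical transformation: a self-loop on one node."""
    return Edge(node, node, tuple(sorted(params.items())))


# Multiple simulations of one system are multiple self-loops on the same node.
edges = [
    plain_simulation("protein_apo", temperature_K=300, length_ns=100),
    plain_simulation("protein_apo", temperature_K=310, length_ns=100),
    plain_simulation("ligand_in_water", temperature_K=300, length_ns=50),
]

# A "network" of plain simulations has no edges connecting distinct nodes.
connecting = [e for e in edges if not e.is_self_loop]
loops_on_apo = [e for e in edges if e.source == "protein_apo"]
```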
Nodes should carry structure (coordinate) information, since binding pose is a key detail in defining what the molecular “system” even is for protein-ligand, host-guest cases
DD – Do current OpenFE plans for “node” definition include what the ligand is bound to?
RG – Not currently
DD – If you’re transforming from “ligand unbound” to “ligand bound to protein”, how would that look?
IA – We’re currently looking at ligand RBFE nodes. In the coming months we’re planning on looking more closely into states and possibly extending node definition.
System must have a control surface for weighting / prioritization of orgs, campaigns, projects, down to individual transformations if possible in order to manage resources fairly and meet timelines
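One possible shape for such a control surface is a table of weights keyed by scope, with the effective priority of a transformation taken as the product of the weights along its path. This is only a sketch: `effective_weight`, the tuple-keyed `weights` table, the multiplicative rule, and the default of 1.0 are all assumptions.

```python
def effective_weight(weights, org, campaign, project, transformation=None):
    """Multiply the weights found along the path; unset levels default to 1.0."""
    path = [(org,), (org, campaign), (org, campaign, project)]
    if transformation is not None:
        path.append((org, campaign, project, transformation))
    result = 1.0
    for scope in path:
        result *= weights.get(scope, 1.0)
    return result


# Hypothetical weights set by admins at different levels.
weights = {
    ("org_a",): 2.0,
    ("org_a", "campaign_1"): 0.5,
    ("org_a", "campaign_1", "project_x", "edge_42"): 3.0,
}

boosted = effective_weight(weights, "org_a", "campaign_1", "project_x", "edge_42")
project_level = effective_weight(weights, "org_a", "campaign_1", "project_x")
unweighted = effective_weight(weights, "org_b", "campaign_9", "project_y")
```

A scheduler could then sample or sort pending work by this value; how fairness across orgs is enforced on top of it is left open here.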
Item | Notes |
---|---|
Next steps | |
Action items
- David Dotson will assemble candidate data model for transformation graph in fah-alchemy; use lessons learned from Covid Moonshot, attempt to build on OpenFE network classes
- Diego Nolasco and David Dotson will assemble agenda for next meeting, including updates to project timeline, new approver