2022-03-01 Protein-ligand benchmarks meeting notes

Participants

Goals

Discussion and feedback on overall impressions from DD review of user stories
Define initial deliverables for DD, others
- data structure specs
- architecture component API points

Discussion topics

Impressions from DD:

Think we are looking at a MultiDiGraph as fundamental to our structure.
- I'm assuming directionality is important, but that depends a bit on the protocol; if not, can get away with a MultiGraph
- if we can avoid directionality in the graph, this simplifies the data structure and algorithmic handling of the graph; I think we can avoid it, since even though some protocols may include a "direction", they don't fundamentally need to, and the resulting free energy differences between nodes have no directionality to them
  - JC – This wouldn’t be specific to just this project, it would be a good mentality for general data structures
  - DS – OpenFE is using multidigraphs for our internal objects.
    - directionality is important, in particular for the sign of the free energy difference
  - JC – Would it be possible for OpenFE and this F@H interface to use the same object?
  - DS – I think so. We just delivered this to our board.
  - JC – Is there code we could look at for this? What I’d seen before was RDKit-specific. So like how would we model a mutation in a kinase?
    - DS – Right now we have hard dep on RDKit, until we can use OpenFF mol fully. Right now our graph functionality is tailored to a ligand network. So systems (nodes) are identified by name, and edges would link two names.
    - JC – We know what kinds of transformations we want to support, so we could spec out the work with a larger scope and only implement a subset of it initially.
    - RG – We could always implement the small scope now, and subclass it to more general functionality later.
    - JW – If we already know we want to implement more generality, we should start with speccing it out, even if implementation trails it. Speccing takes time, but will take less time than re-implementation after we are further down the road.
    - JC – And we do know the domain pretty well.
  - JC – is it possible to have multiple edges between nodes (like, different ways of transforming one ligand into another?)
    - (General) – Yes, this is the “multi” in a “multidigraph”
  - RG – Could use comparison for OFFMol that does position-equality. Also need hashing that doesn’t require graph comparison.
    - DS – User expects our molecule object to behave in ways that are pythonic with e.g. dictionaries
    - JW – There’s no clear definition of “equal” for mols
    - JC – If you want a dict that understands molecular structure, it’s possible, but the dict needs to understand that it’s going to have special keys
    - DS – tend to assume we have users who are not Python experts
    - JC – We need things to be interoperable
    - RG – Perhaps the problem is that the OpenFF object isn’t interoperable enough? If it doesn’t play well with e.g. a dict, this is a problem.
    - JC – is the disagreement that there are different meanings of “molecule equality”?
    - DS – Yes
    - RG – the Molecule object in OpenFE is really intended to just be a network node; so perhaps renaming it avoids the confusion, and lets the object do only what it needs to do for the purposes of defining a transformation network
    - RG – another concern we have is that if we are upstreaming OpenFF, and OpenFF remains as beta for a long time, we can’t declare ourselves as stable. Tying ourselves to OpenFF’s timelines presents a project risk for us
      - JW – the API is still slated for breaks, so I can’t make guarantees on stability
      - JC – there are points that can be considered stable, however. Can we identify these?
        some components may make sense outside of the toolkit?
    - JW – you want the Molecule class stabilized?
    - RG – I know that’s an unreasonable ask, and it wouldn’t be possible in the timescale we are committed to.
    - JW – could make guarantees around certain things though
    - JW – OpenFF does monitor a set of internal “core packages”, and guarantees that they will get updates + new releases that keep them running with 95%+ uptime. So OMSF could facilitate us extending this guarantee to OpenFE, so it’s our problem when we break our API.
      - RG – Would you take care of our downstreams as well?
      - JW – We can discuss extending this guarantee to other repos.
  - RG – We could put in a superclass that supports subclasses like LigandNode, ProteinNode, etc
    - DD – I could use more info about what a “node” is
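A minimal sketch of the node superclass idea raised above, assuming nodes are identified by name as DS describes for the current ligand network. The class names follow RG's suggestion; the fields `smiles` and `pdb_id` are hypothetical and this is not OpenFE's actual API. Frozen dataclasses give value-based equality and hashing, so nodes work as dict keys without any graph comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical base class: a network node identified by name."""
    name: str


@dataclass(frozen=True)
class LigandNode(Node):
    """Hypothetical ligand node; `smiles` is an illustrative field."""
    smiles: str = ""


@dataclass(frozen=True)
class ProteinNode(Node):
    """Hypothetical protein node; `pdb_id` is an illustrative field."""
    pdb_id: str = ""


# Equality and hashing come from the field values, not from graph matching.
a = LigandNode(name="benzene", smiles="c1ccccc1")
b = LigandNode(name="benzene", smiles="c1ccccc1")
c = ProteinNode(name="benzene")

results = {a: "first"}
results[b] = "second"  # same key as `a`, so this overwrites the entry
```

This only pins down one meaning of "equal" (same class, same field values); position-based or chemistry-based equality would need a separate, explicitly named comparison.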
For RBFE edges, each edge maps to at least two simulations (solvent transform, complex transform), but could include arbitrarily many.
More than one edge can connect two nodes, indicating a different protocol, atom map, etc.; other metadata could be used to indicate the differences as well
- absolutely identical inputs for an edge that has already been submitted will be deduplicated within a project, however; allows idempotency of input with minimal churn
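One way the deduplication above could work, sketched under assumptions: each edge's full input (end states, protocol, atom map) is serialized deterministically and hashed, and the hash is the edge's key within a project. `edge_key`, `Project`, and the input fields are hypothetical names, not an agreed design.

```python
import hashlib
import json
from dataclasses import dataclass, field


def edge_key(start: str, end: str, protocol: dict, atom_map: dict) -> str:
    """Hash the full edge input deterministically (keys are sorted)."""
    payload = json.dumps(
        {"start": start, "end": end, "protocol": protocol, "atom_map": atom_map},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Project:
    """Hypothetical container of directed multi-edges keyed by content hash."""
    edges: dict = field(default_factory=dict)

    def submit(self, start, end, protocol, atom_map):
        key = edge_key(start, end, protocol, atom_map)
        # setdefault keeps the first submission; resubmitting is a no-op.
        self.edges.setdefault(key, (start, end, protocol, atom_map))
        return key


project = Project()
k1 = project.submit("ligA", "ligB", {"method": "protocol-1"}, {0: 0, 1: 1})
k2 = project.submit("ligA", "ligB", {"method": "protocol-1"}, {0: 0, 1: 1})
k3 = project.submit("ligA", "ligB", {"method": "protocol-2"}, {0: 0, 1: 1})
```

Here `k1` and `k2` are the same key, so the project holds two edges between `ligA` and `ligB`: one per distinct protocol.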
Definitely need RBAC built in from the start, with users able to be members of multiple orgs but with roles granted per-campaign; probably need an org-scoped Admin role, as well as a system-scoped Admin role for setting up orgs, granting org Admins
- will need to lay out initial set of roles, scopes, grants; can evolve these over time
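A rough sketch of how roles, scopes, and grants might fit together, assuming a grant ties a user to a role within an optional org and campaign. The role names and the `is_allowed` check are hypothetical placeholders for whatever initial set gets laid out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    SYSTEM_ADMIN = "system_admin"        # sets up orgs, grants org Admins
    ORG_ADMIN = "org_admin"              # admin within a single org
    CAMPAIGN_MEMBER = "campaign_member"  # hypothetical per-campaign role


@dataclass(frozen=True)
class Grant:
    user: str
    role: Role
    org: Optional[str] = None       # None means system scope
    campaign: Optional[str] = None  # None means the whole org


def is_allowed(grants, user, org, campaign):
    """Return True if any of the user's grants covers (org, campaign)."""
    for g in grants:
        if g.user != user:
            continue
        if g.role is Role.SYSTEM_ADMIN:
            return True
        if g.org == org and g.role is Role.ORG_ADMIN:
            return True
        if g.org == org and g.campaign == campaign:
            return True
    return False


grants = [
    Grant("alice", Role.CAMPAIGN_MEMBER, org="org1", campaign="c1"),
    Grant("alice", Role.ORG_ADMIN, org="org2"),
    Grant("root", Role.SYSTEM_ADMIN),
]
```

In this sketch `alice` belongs to two orgs with different roles: she can reach campaign `c1` in `org1` only, but every campaign in `org2`.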
Should create a glossary of terms as we build the data models, since e.g. "transformation" is already overloaded as a term
- a "transformation" is an edge in our data model; a node is a “microstate”; microstates can be connected by zero or many transformations
  - JC – In thinking about the most general setting from transformation work from the last decades, we need an edge that reflects a transition between environments. So like a protein env to an aqueous env. Sometimes also need to do two proteins and an aqueous env for selectivity workflows. So there’d be like different stacks of networks that are related to each other, but that each might reflect different environments where different methods are used to model transformations. And of course chemical transformations between ligands and amino acid mutations/modifications.
  - DD – So, I’d usually think of this as an input to an MD simulation.
  - JC – Like, a chemical microstate, with either coordinates or instructions on how to generate the coordinates… Like, if you had two microstates of a ligand, how do those relate to each other?
  - DD – Should a node be a single pose?
  - RG – I think positions are important. Are there workflows where you’re looking at two confs of a larger molecule?
  - JC – There are ways to do that, it may be more complicated than what we need to handle initially. Eg, catechol binding lysozyme mutants in different binding modes… do we want to support that?
  - IA – I think that’d be useful. I just had a situation where I had to rerun HSP90 because of helix conf change. So I’d like a way for the network to recognize these separately.
  - JC – try to use diffnet approach as a guiding principle for deciding on the approach for a graph
  - JC – Huafeng Xu’s paper assumes gaussian error, not perfect but it worked pretty well. Could use that to relate things like different poses.
- ~~transformations/edges have no direction, even if some nonequilibrium protocols may have implicit directionality~~
Failure is a first-class citizen. The system should handle failures in a way that provides as much information as possible on what was done and why (if known) an execution failure occurred
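To illustrate treating failure as a first-class result, here is a small sketch in which every execution returns a record, successful or not, with the inputs and any error details attached. `TaskResult`, `run_task`, and `simulate` are hypothetical names for illustration only.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TaskResult:
    ok: bool
    inputs: dict                          # what was done
    output: Any = None
    error_type: Optional[str] = None      # why it failed, if known
    error_message: Optional[str] = None
    traceback_text: Optional[str] = None


def run_task(func: Callable[..., Any], **inputs) -> TaskResult:
    """Run `func` and capture failure details instead of raising."""
    try:
        return TaskResult(ok=True, inputs=inputs, output=func(**inputs))
    except Exception as exc:
        return TaskResult(
            ok=False,
            inputs=inputs,
            error_type=type(exc).__name__,
            error_message=str(exc),
            traceback_text=traceback.format_exc(),
        )


def simulate(n_steps):
    """Stand-in for a real simulation; fails on bad input."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return {"n_steps": n_steps}


good = run_task(simulate, n_steps=10)
bad = run_task(simulate, n_steps=0)
```

The failed record keeps the same shape as the successful one, so downstream code can store and query both in the same way.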
A simulation of a molecular system with no alchemical transformation is a node with an edge that self-loops, with the self-loop edge carrying all parameters for the “protocol”, which in this case would be simulation parameters and no lambdas; multiple simulations for a single system would correspond to multiple self-loop edges on the node
- many simulations would be a “network” of unconnected nodes with self-loops
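The self-loop convention above can be sketched with a toy directed multigraph, written in plain Python so it stays self-contained. The `Network` class, node names, and protocol fields are hypothetical; a graph library's multidigraph type would presumably serve the same role.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical minimal directed multigraph of (start, end, protocol) edges."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, start, end, protocol):
        self.nodes.update((start, end))
        self.edges.append((start, end, protocol))

    def self_loops(self, node):
        """Protocols of all edges that start and end on `node`."""
        return [p for s, e, p in self.edges if s == e == node]


net = Network()
# Two plain simulations of the same system: two self-loops on one node.
net.add_edge("protein-ligA", "protein-ligA", {"temperature_K": 300, "n_steps": 1000})
net.add_edge("protein-ligA", "protein-ligA", {"temperature_K": 310, "n_steps": 1000})
# A simulation of a different system: an unconnected node with its own self-loop.
net.add_edge("protein-ligB", "protein-ligB", {"temperature_K": 300, "n_steps": 1000})
```

No protocol here carries lambdas, matching the idea that a non-alchemical simulation is an edge whose parameters are simulation settings only.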
Nodes should carry structure (coordinate) information, since binding pose is a key detail in defining what the molecular “system” even is for protein-ligand, host-guest cases
1. DD – Do current OpenFE plans for “node” definition include what the ligand is bound to?
2. RG – Not currently
3. DD – If you’re transforming from “ligand unbound” to “ligand bound to protein”, how would that look?
4. IA – We’re currently looking at ligand RBFE nodes. Over the next months we’re planning on looking more closely into states and possibly extending node definition.
System must have a control surface for weighting / prioritization of orgs, campaigns, projects, down to individual transformations if possible in order to manage resources fairly and meet timelines
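One possible shape for such a control surface, purely as an assumption: each level (org, campaign, project, transformation) carries a weight, and a transformation's share of resources is the product of the weights above it, normalized across all transformations. The multiplicative scheme and the function name are illustrative, not something decided in the meeting.

```python
def effective_weights(transformations, org_w, campaign_w, project_w):
    """Map each transformation to its normalized share of resources.

    `transformations` maps a name to (org, campaign, project, own_weight).
    """
    raw = {}
    for name, (org, campaign, project, weight) in transformations.items():
        raw[name] = org_w[org] * campaign_w[campaign] * project_w[project] * weight
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


shares = effective_weights(
    {
        "t1": ("orgA", "c1", "p1", 1.0),
        "t2": ("orgA", "c1", "p1", 3.0),
        "t3": ("orgB", "c2", "p2", 2.0),
    },
    org_w={"orgA": 1.0, "orgB": 2.0},
    campaign_w={"c1": 1.0, "c2": 1.0},
    project_w={"p1": 1.0, "p2": 1.0},
)
```

Raising an org's weight lifts every transformation under it, while the per-transformation weight allows fine-grained prioritization inside a project.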

Next steps

- DD – I’m thinking about speccing things out, would like to check out OpenFE stuff but that may be premature.
- JC – We should see if we can move forward with a plan in which OpenFE components can get shoehorned in later. Though we’re still uncertain.
- JW – Is there an alternative to another round of planning? All I can think of would be to try implementing something to learn if there are other development issues we can derisk.
- JC – We’ve already kinda done an early implementation in the COVID moonshot.
- DD – We could take what we’ve done in moonshot, pull it into fah-alchemy, and try combining it with current OpenFE software.
- DN – I’d prefer us to have assembled a discrete task list/backlog from this. Today was largely technical discussion.
- JW – I think we’re finding that we can’t define what it is we want.
- DN – So, we should explicitly agree that we will have more meetings to go over user stories.
- DD – Yes, I think that’s necessary.
- JW – Current project status and timeline is now unknown. Didn’t have all approvers by this point in the meeting so we couldn’t commit to changing plans. But we are also no longer on our agreed-upon timeline of approving a spec after the third meeting.
- RG – I talked with our governing board and we’d like to become approvers.
- JW – I think all of the current approvers would agree to that change. Please make sure your nominated approver attends the next such meeting.

Action items

David Dotson will assemble candidate data model for transformation graph in fah-alchemy; use lessons learned from Covid Moonshot, attempt to build on OpenFE network classes
Diego Nolasco and David Dotson will assemble agenda for next meeting, including updates to project timeline, new approver
