2022-03-18 JW | DD : FF on nodes vs. edges, pros and cons

Participants

Goals

For the following options, brainstorm pros and cons:
- forcefield information on nodes of an alchemical graph
- forcefield information on edges of an alchemical graph
- forcefield information on the alchemical graph itself

Discussion topics

Notes
DD – The third option above is a new idea that may help us identify the effects of connections. I want to give the working group the raw material for a decision. So our goal today isn’t to make the decision, but rather to list the pros and cons. Then we can make a decision at the meeting (even if we say “this is what we’ll assume for now”).
JW – I’m going to try to argue that any choice of putting info on edges is a SUBSET of what could be achieved by putting
- What about the case where we have the SAME (to a human) protein+ligand combination, but there are minor differences (atom types, particle numbers, etc.) that are required for it to be used with different FFs?
- Can one node represent multiple PDB files (or other structure definitions)? It probably does, and so in any design, a human must be able to record in the data “these two unequal input files are the same thing”
JW – Picking a good or bad design can add complexity on its own, but there’s some complexity that will be in here regardless of our design. What is that complexity, and how can we force it to live in one place so we can reason about it?
- Nonidentical systems that humans consider equal (“this atom typed protein is the same as that chemical protein”, or “this protein-ligand system with explicit water is the same as that protein-ligand system that says ‘add a solvent box to me before running’”)
- Rerunning the same transformation, even though it ran successfully the first time
- Force fields can be represented in different ways (a name, a string containing all contents, a pickled object, …)
- Choice of alchemical protocol, and the mapping of the protocol to 1 or more simulations
- Choice of free energy estimator in evaluating data on a single transformation
- Choice of graph free energy estimator in evaluating data for transformations in graph
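One way to make the “these two unequal input files are the same thing” requirement concrete is a small registry of human-declared equivalences. The sketch below is only an illustration under assumptions: the class name `EquivalenceRegistry`, its methods, and the file names are all hypothetical and are not part of any design discussed here. It uses a union-find structure so that declared equalities are transitive.

```python
class EquivalenceRegistry:
    """Union-find over input identifiers (e.g. file names) that a human declared equal.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        # Unknown items start out as their own representative.
        self._parent.setdefault(item, item)
        # Walk up to the representative, shortening the path as we go.
        while self._parent[item] != item:
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def declare_same(self, a, b):
        """Record the human statement 'a and b are the same thing'."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b were declared equal, directly or transitively."""
        return self._find(a) == self._find(b)


# Placeholder file names: an atom-typed input, a chemical input, and an
# input that still needs a solvent box added before running.
registry = EquivalenceRegistry()
registry.declare_same("complex_atom_typed.pdb", "complex_chemical.sdf")
registry.declare_same("complex_chemical.sdf", "complex_needs_solvent.pdb")
```

Keeping these declarations in one structure is one possible answer to “how can we force it to live in one place”; whether they end up attached to nodes, edges, or the graph is the open question in the options below.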
forcefield information on nodes of an alchemical graph
Pros
- JW would argue that every other representation can be distilled down as a VIEW of a graph with FF info on nodes
- Node definition more closely matches the concept of a “microstate”; includes atom identities, coordinates X and potential U(X)
- If FF info is NOT on nodes, then all molecular representations in the graph MUST be chemical representations (no atom typed representations!).
- Explicitly records human intent (“here’s a fake edge saying that these two input structures are the same” or “I’m giving these two nodes the same label because they represent the same chemistry, to me”)
Cons
- Node equality operator and view mechanism would need to learn how to compare FFs (which could be represented in a variety of formats)
- For a network intended to benchmark many forcefields on its own, requires many more nodes instead of edges
- Would we really ever model the transformation for the same chemical system transitioning from one FF to another? This use case is not high-priority.
  - Does it even/ever make sense to draw an edge between e.g. a node with gaff-2.11 and openff-1.3.0?
- If we did the same network of transformations in two FFs, there would be no edges between the nodes with one FF to the nodes with another. We’d need to make assumptions about how the networks relate to each other
  - DD – effectively becomes option (3) implicitly for the use case of FF comparison
  - JW – Counterpoint: I could imagine saying `network.view(identity=('protein','ligand'))`, and the nodes that use DIFFERENT ffs but the SAME protein+ligand (handwaving) would get merged in the final view I receive.
  - DD – I see. In form it looks similar to the third option. But this option allows the two graphs to be connected.
  - DD – Would it be helpful/necessary to have edges that represent “identity”?
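To illustrate JW’s counterpoint, here is a minimal sketch of what a `view(identity=...)` call could do when FF info lives on nodes. Everything in it is an assumption made for illustration: the `Node` fields, the free function `view`, and the protein and ligand labels are hypothetical, and a real node would also carry coordinates and more. The only idea taken from the notes is that nodes agreeing on the chosen identity fields get merged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields; the force field is part of the node (option 1).
    protein: str
    ligand: str
    forcefield: str


def view(nodes, edges, identity=("protein", "ligand")):
    """Merge nodes that agree on the `identity` fields.

    Returns (groups, merged_edges): `groups` maps each identity key to the
    original nodes it absorbed; `merged_edges` is the set of key pairs.
    """
    def key(node):
        return tuple(getattr(node, name) for name in identity)

    groups = defaultdict(list)
    for node in nodes:
        groups[key(node)].append(node)
    merged_edges = {(key(a), key(b)) for a, b in edges}
    return dict(groups), merged_edges


# The same transformation set up under two force fields (placeholder labels).
nodes = [
    Node("protein_x", ligand, forcefield)
    for forcefield in ("gaff-2.11", "openff-1.3.0")
    for ligand in ("lig_a", "lig_b")
]
edges = [(nodes[0], nodes[1]), (nodes[2], nodes[3])]

groups, merged_edges = view(nodes, edges)
```

With the default identity the four nodes collapse into two groups joined by one merged edge; adding `"forcefield"` to `identity` keeps the two networks separate. The cost noted under Cons still applies: the key comparison here relies on FFs being plain strings.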
forcefield information on edges of an alchemical graph
Pros
- JW – Human intent (e.g. “these two nonidentical input files (one with atom types and the other with elements+bond orders) are actually equal as far as I’m concerned”) is unambiguously recorded when the user adds two different structure files to the same node
  - Because we’ve chosen to put the FF on the edges, it affords us more flexibility in defining the system on the node
  - If there are multiple structures on a node, the edges need to know which structure they should use to start simulations
- Ties the choice of FF to the alchemical protocol, and therefore the dynamics, between two systems
- Because it puts the FF alongside the protocol, allows for protocols that morph between more than one FF
Cons

forcefield information on the alchemical graph itself
Pros
- None of the cons of either idea above, because FF info isn’t on either nodes or edges
Cons
- If we’re comparing FFs, then we can’t have a data structure that connects the two graphs
  - We have to make assumptions about how the nodes/edges of one graph map to the nodes/edges of another.
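The con listed for the graph-level option can be seen in a small sketch. The class `AlchemicalGraph`, the function `compare`, the labels, and the numbers are all hypothetical placeholders, not real results or an agreed design. The point is that comparing two single-FF graphs only works by assuming that equal labels mean equal systems, and nothing in the data records that assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AlchemicalGraph:
    # Option 3: one force field for the whole graph (hypothetical class).
    forcefield: str
    # Per-edge free energy estimates keyed by (start_label, end_label).
    results: dict = field(default_factory=dict)


def compare(graph_a, graph_b):
    """Pair up edges of two graphs, ASSUMING equal labels mean equal systems.

    Returns (differences, unmatched): per-edge differences (b minus a) for
    edges present in both graphs, and the edges present in only one.
    """
    shared = graph_a.results.keys() & graph_b.results.keys()
    differences = {
        edge: graph_b.results[edge] - graph_a.results[edge]
        for edge in sorted(shared)
    }
    unmatched = graph_a.results.keys() ^ graph_b.results.keys()
    return differences, unmatched


# Placeholder numbers for illustration only, not real results.
graph_gaff = AlchemicalGraph(
    "gaff-2.11",
    {("lig_a", "lig_b"): 1.0, ("lig_b", "lig_c"): 2.0},
)
graph_openff = AlchemicalGraph("openff-1.3.0", {("lig_a", "lig_b"): 1.5})

differences, unmatched = compare(graph_gaff, graph_openff)
```

Any edge whose labels do not line up across the two graphs falls into `unmatched`, which is where a mapping assumption would have to be supplied by hand.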
DD brainstorming notes
JW brainstorming notes
Will resume at 11 AM Pacific (1 hour)

Participants

Goals

Discussion topics

Action items

Decisions