2020-09-15 System/Interoperability notes

Date

Sep 15, 2020

Participants

(This was fairly impromptu; otherwise there would have been a few more people.)

@Christopher Bayly
@John Chodera
@David Mobley

Link to ~1 hr video here: https://openforcefieldgroup.slack.com/archives/C8NE3J96U/p1600196037022400

Matt’s notes from watching the video

Things in parentheses are mostly my annotations.

CB: FFs apply patterns to molecular graphs. What I want is an object that understands the relationship between parameters and where on the graph they are applied. I.e. an edge in a graph is a bond and there is a way to know its parameters.
JC: How does this extend to angles, torsions, etc.? Hypergraph?
CB: Sure, something like that. The graph going in stays the same, and the object can take the minimal graph (just knowing nodes & edges, minimal information otherwise) and process how the FF terms are applied to it.
JC: Summarizes SMIRNOFF-centric view of the world, with topology as an input, and an FF composed of handlers
Going purely from a graph to the "slots" is doable, but tricky because of things like ordering
CB: Concerned that only keeping atoms, not bonds, would be a problem
JC: Oh no, the topology graph will definitely have bonds, also mapping back to the FF parameter applied in a slot
CB: Safe to stipulate that an FF is just a collection of "force types?" (yes) cool, we're on the same page
JC: Where in the FE pipeline are we talking about? Tracking parameters as they're applied, swapping them out for different ones, how are relationships between chemical states stored?
CB: Looking at building up something like a base class. It is just a way to store molecules and chemical data, not directly for FE calculations, but FE calculations will work with some derived class.
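A minimal sketch of the "graph in, slots out" idea, assuming the topology is nothing more than an atom count plus an undirected bond list (no bond orders). Angle and proper-torsion slots are derived from the bonds, and each slot is stored in one canonical direction so that (i, j, k) and (k, j, i) count once, which is one possible way to handle the ordering issue JC raised. The function names and the parameter IDs ("b1", "a1", "t1") are hypothetical placeholders, not part of any OpenFF API.

```python
from itertools import combinations


def canonical(slot):
    """Store a slot and its reverse as one tuple: whichever direction sorts first."""
    return min(tuple(slot), tuple(reversed(slot)))


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def angle_slots(n_atoms, bonds):
    """Every pair of bonds sharing a central atom j defines an angle (i, j, k)."""
    adj = neighbors(n_atoms, bonds)
    slots = set()
    for j in range(n_atoms):
        for i, k in combinations(sorted(adj[j]), 2):
            slots.add(canonical((i, j, k)))
    return sorted(slots)


def torsion_slots(n_atoms, bonds):
    """Every bond (j, k) with distinct outer neighbors i and l defines a torsion (i, j, k, l)."""
    adj = neighbors(n_atoms, bonds)
    slots = set()
    for j, k in bonds:
        for i in adj[j] - {k}:
            for l in adj[k] - {j}:
                if i != l:  # skip 3-membered rings, where i and l coincide
                    slots.add(canonical((i, j, k, l)))
    return sorted(slots)


# A 4-atom chain (e.g. butane's carbons); parameter IDs are hypothetical placeholders.
n_atoms, bonds = 4, [(0, 1), (1, 2), (2, 3)]
assignment = {("bond", canonical(b)): "b1" for b in bonds}
assignment.update({("angle", a): "a1" for a in angle_slots(n_atoms, bonds)})
assignment.update({("torsion", t): "t1" for t in torsion_slots(n_atoms, bonds)})
print(assignment)
```

The graph itself stays minimal; everything the force field needs is a lookup keyed by (term type, canonical slot), which is one reading of "an edge in a graph is a bond and there is a way to know its parameters."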

Would also like: for each force in the system, a method for evaluating the function, its first derivative, and possibly its second derivative.
JC: This is part of the design, exporting to different engines, i.e. OpenMM, JAX, other NumPy-like libraries, for evaluation and its derivatives.
CB: So, you could break down an Amber FF into its terms, and re-build it up again with these force handler/container things?
JC: Yes, with transparent conversions <--> ParmEd, other topology representations
CB: So the topology is separate from the container that stores the force data? (yes) Worried that is fragile. We'd end up carrying around two objects: one with chemical information (top) and one with forces (ff/forces/etc.) and those better match up, always.
JC: Yes, so then the conception is that the System contains a topology representation (now: OFFTop, but can convert out if desired), which is immutable, and marries the forces and topology together.
CB: Until now, we haven't been talking about things that need coords. But to actually get forces out, you'd need them.
JC: The stored topology doesn't necessarily need coords, but will be able to store them.
CB: These two components are quite different. One example: forces don't need to know bond orders once they have a k value. Why would we store bond orders in the forces? Seems not useful, out of scope.
JC: The force will point to somewhere in its topology and also point to its parameter sources.
CB: This implies that parameter IDs will be tracked (yes, they will). Might not be essential, but it would be really useful to know where forces come from, and to do other things like separating out similar-looking parameters that come from different sources.
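To make CB's request concrete, here is a minimal sketch of a single force term that can report its energy, first derivative, and second derivative, and that also points back to its atoms in the topology and to its parameter source. It assumes a harmonic bond, E = k/2 (r - r0)^2, and NumPy; the class name, field names, and the parameter_id value are hypothetical illustrations, not an existing OpenFF class.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class HarmonicBondTerm:
    """One bond term: points at two atoms in the topology and at its parameter source."""

    atoms: tuple  # (i, j) indices into the topology
    k: float  # force constant, energy / length^2
    length: float  # equilibrium length r0
    parameter_id: str  # hypothetical provenance tag, e.g. a force field parameter id

    def _geometry(self, positions):
        """Return the bond length r and the unit vector from atom j to atom i."""
        i, j = self.atoms
        d = positions[i] - positions[j]
        r = np.linalg.norm(d)
        return r, d / r

    def energy(self, positions):
        """E = k/2 (r - r0)^2."""
        r, _ = self._geometry(positions)
        return 0.5 * self.k * (r - self.length) ** 2

    def gradient(self, positions):
        """dE/dx as an (N, 3) array; rows for atoms outside this term are zero."""
        r, u = self._geometry(positions)
        grad = np.zeros_like(positions, dtype=float)
        i, j = self.atoms
        grad[i] = self.k * (r - self.length) * u
        grad[j] = -grad[i]
        return grad

    def hessian(self, positions):
        """Second derivatives as a (3N, 3N) matrix, flattened atom-major."""
        r, u = self._geometry(positions)
        outer = np.outer(u, u)
        block = self.k * (outer + (r - self.length) / r * (np.eye(3) - outer))
        n = positions.shape[0]
        hess = np.zeros((3 * n, 3 * n))
        i, j = self.atoms
        for a, b, sign in ((i, i, 1), (j, j, 1), (i, j, -1), (j, i, -1)):
            hess[3 * a:3 * a + 3, 3 * b:3 * b + 3] = sign * block
        return hess


positions = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
bond = HarmonicBondTerm(atoms=(0, 1), k=100.0, length=1.0, parameter_id="b1")
print(bond.energy(positions), bond.parameter_id)
print(bond.gradient(positions))
```

Because each term is immutable and carries its parameter_id, a system built from many such terms can be traced back to the parameters that produced it. Writing derivatives by hand is only one option; exporting the same energy expression to an autodiff library such as JAX (jax.grad, jax.hessian) would be another, closer to the multi-backend export JC described.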

CB: This design hinges on a really good molecule/topology representation. What goes in there?
JC: A goal here is, even in the final system, to retain parameter sources in order to re-create a system, know where components come from, etc.
CB: Here's a test case: trying to make a force object for a protein. The protein you're giving it is a single OEMole, with thousands of conformers, each conformer is a snapshot of a trajectory, and is a large file (45 MB). Would this be packaged inside the System/force-containing object? (Implying here: what would be nice to do, what is essential to do? This seems like an edge case)
JC: What you need to set up a simulation is a topology, a set of forces, and positions (and box vectors). If you want to do a relative FE calculation, you need a set of those, with mapping, or some sort of hybrid topology mapping. Setting up a simulation is, by comparison, an easy task. How to facilitate FE calculations is trickier.
CB: (back and forth) trying to isolate the minimalist view (i.e. bare minimum forces?) and build out from that the things that would be nice (i.e. topology ordering) and see what of those are necessary.
JC: (Conveniently showing off his new iPad Pro) draws a diagram of a System, which contains a Topology (rich chemical representation, conformers, etc.) and something like a ParameterSet, containing something like handlers for each type of force; inside those handlers go the individual forces. There is a mapping between each force object and where it comes from.
CB: What would be helpful for us is separating out collections of forces (sounds like what he's asking for is satisfied) from the rest of the system. For example, extract out the bond and angle forces of a system and serialize/convert those out. (Right now, kinda stitching together these components using different libraries, not desirable).
JC: That could be done, but it would make a lot more sense to use the entire System object, since forces don't make as much sense decoupled from their topology. Explains some of the other design features.
CB: (on the idea that everything should be the entire System object, not allowed to splice things out). Seems very cool, elegant, aesthetic. But for some short-term and/or practical uses, there is some utility in separating out components. Maybe I want to splice out the force field and do something, or other "puzzle pieces" sort of approaches. Aware that it's a fragile approach.
JC: What's the driving motivator here? Concerned about the object getting too big with a ton of positions?
CB: Would like more generality, be able to plug in a part of the System, but not be married to use the entire thing.
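A rough sketch of the structure JC drew, combined with CB's request to splice out a subset of handlers for serialization. It assumes a System holding an immutable Topology, optional positions and box vectors, and a dict mapping handler names to lists of plain-dict force terms; all class, method, and handler names are hypothetical, and the numbers are placeholders rather than real force field parameters.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Topology:
    """Immutable chemical graph: element symbols plus bonds (bond orders omitted here)."""

    elements: tuple
    bonds: tuple


@dataclass
class System:
    """Hypothetical container marrying one topology to per-handler force terms."""

    topology: Topology
    handlers: dict = field(default_factory=dict)  # handler name -> list of term dicts
    positions: Optional[list] = None  # not needed until energies are evaluated
    box_vectors: Optional[list] = None  # only for periodic systems

    def extract(self, *handler_names):
        """Return a JSON-friendly dict holding the topology and only the named handlers."""
        missing = set(handler_names) - set(self.handlers)
        if missing:
            raise KeyError(f"unknown handlers: {sorted(missing)}")
        return {
            "topology": {
                "elements": list(self.topology.elements),
                "bonds": [list(bond) for bond in self.topology.bonds],
            },
            "handlers": {name: self.handlers[name] for name in handler_names},
        }


# A water-like molecule with placeholder parameters (not real force field values).
water = System(
    topology=Topology(elements=("O", "H", "H"), bonds=((0, 1), (0, 2))),
    handlers={
        "Bonds": [
            {"atoms": [0, 1], "k": 1000.0, "length": 0.1, "parameter_id": "b1"},
            {"atoms": [0, 2], "k": 1000.0, "length": 0.1, "parameter_id": "b1"},
        ],
        "Angles": [
            {"atoms": [1, 0, 2], "k": 100.0, "angle": 1.8, "parameter_id": "a1"},
        ],
        "vdW": [
            {"atoms": [0], "epsilon": 0.6, "sigma": 0.3, "parameter_id": "n1"},
        ],
    },
)

payload = water.extract("Bonds", "Angles")
print(json.dumps(payload, indent=2))
```

The design choice here splits the difference between the two positions: CB gets only the bond and angle forces out, but the topology always travels with them, which keeps the atom indices meaningful (JC's concern) at the cost of a somewhat larger payload.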

CB: Thought of another thing. From a potentially really big set of forces, would like to separate out a part, like the parameters applied to a ligand, a particular component of the potential energy function, or maybe a combination of the two (the GBSA for the ligand).
JC: Okay, could be an API point for selecting a part of a system (lig_sys = my_sys.subset(ligand_indices)). Could then do lig_sys.compute_energy().
CB: I'd like to be able to piece together OpenFF infrastructure with some OpenEye tools like OEFF, Szybki (gold medal: some OpenEye tools can be optional backends for System functionality; silver medal: something that an OpenEye wizard can use to compose OpenEye
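A minimal sketch of the subset API JC floated, assuming every term is a harmonic pair term tagged with the handler it came from. Here subset keeps only the terms that lie entirely inside the selection (so protein-ligand cross terms are dropped) and renumbers the atoms, and compute_energy can optionally be restricted to some handlers, covering CB's "particular component" case. The class names, the "Restraints" handler, and all numbers are hypothetical; a real GBSA term would not be a simple pair term.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairTerm:
    """Harmonic pair term E = k/2 (r - r0)^2, tagged with the handler it came from."""

    handler: str
    atoms: tuple
    k: float
    r0: float


@dataclass
class System:
    """Hypothetical system: per-atom positions plus a flat list of tagged terms."""

    positions: list  # one (x, y, z) tuple per atom
    terms: list

    def subset(self, indices):
        """Keep the selected atoms and the terms lying entirely inside them.

        Atoms are renumbered 0..len(indices)-1 in the order given.
        """
        remap = {old: new for new, old in enumerate(indices)}
        kept = [
            PairTerm(t.handler, tuple(remap[a] for a in t.atoms), t.k, t.r0)
            for t in self.terms
            if all(a in remap for a in t.atoms)
        ]
        return System([self.positions[i] for i in indices], kept)

    def compute_energy(self, handlers=None):
        """Sum term energies, optionally only for the named handlers."""
        total = 0.0
        for t in self.terms:
            if handlers is not None and t.handler not in handlers:
                continue
            i, j = t.atoms
            r = math.dist(self.positions[i], self.positions[j])
            total += 0.5 * t.k * (r - t.r0) ** 2
        return total


# Atoms 0-1 stand in for the protein, atoms 2-3 for the ligand (toy numbers).
my_sys = System(
    positions=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0), (6.25, 0.0, 0.0)],
    terms=[
        PairTerm("Bonds", (0, 1), 100.0, 1.0),
        PairTerm("Bonds", (2, 3), 100.0, 1.0),
        PairTerm("Restraints", (1, 2), 10.0, 3.0),  # protein-ligand cross term
    ],
)

ligand_indices = [2, 3]
lig_sys = my_sys.subset(ligand_indices)
print(my_sys.compute_energy())  # all terms
print(lig_sys.compute_energy())  # ligand-only bond term
print(my_sys.compute_energy(handlers={"Bonds"}))  # one component of the potential
```

Dropping terms that straddle the selection is one policy choice; an alternative would be to keep them and freeze the outside atoms, which matters for something like "the GBSA for the ligand" where the environment still contributes.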

CB: Generally concerned that the fundamental object contains everything. Would like to be able to carry around something smaller and deal with the responsibility that comes with the associated fragility.
JC: It's not necessarily bad to have a class that contains a lot of stuff, that can be optimized, compressed, composed of many optional fields if desired. For example, OEChem is huge but optimized, and can feel light if you're only using 1% of the API. Are you worried about size in memory, on disk? Those can generally be dealt with on the software side.
CB: Kinda concerned about slow interoperability when converting/composing things to other libraries.

Important take-aways

Need to design a "simple, here's your MD starting point" approach that doesn't itself do FE, but can enable FE infrastructure to be built on top of it
CB would like to be able to splice out a part of a System and stitch it into other tools
Be generally concerned with scope creep, particularly for data that cannot be optimized/compressed.