2020-09-15 System/Interoperability notes

Date

15 Sep 2020

Participants

(This was fairly impromptu, otherwise would have a few more people)

Link to ~1 hr video here: https://openforcefieldgroup.slack.com/archives/C8NE3J96U/p1600196037022400

Matt’s notes from watching the video

Things in parentheses are mostly my annotations

CB: FFs apply patterns to molecular graphs. What I want is an object that understands the relationship between parameters and where on the graph they are applied. I.e. an edge in a graph is a bond and there is a way to know its parameters.
JC: How does this extend to angles, torsions, etc.? Hypergraph?
CB: Sure, something like that. The graph going in stays the same, and the object can take the minimal graph (just knowing nodes & edges, minimal information otherwise) and process how the FF terms are applied to it.
JC: Summarizes SMIRNOFF-centric view of the world, with topology as an input, and an FF composed of handlers
Going purely from a graph to the "slots" is doable, but tricky because of things like ordering
CB: Concerned that only keeping atoms, not bonds, would be a problem
JC: Oh no, the topology graph will definitely have bonds, also mapping back to the FF parameter applied in a slot
CB: Safe to stipulate that an FF is just a collection of "force types?" (yes) cool, we're on the same page
JC: Where in the FE pipeline are we talking about? Tracking parameters as they're applied, swapping them out for different ones, how are relationships between chemical states stored?
CB: Looking at building up something like a base class. Is just a way to store molecules and chemical data, not directly for FE calculations, but FE calculations will work with some derived class.
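A minimal sketch of the "graph in, slots out" idea discussed above, in plain Python. All names here (`find_slots`, `bond_params`) are hypothetical and are not part of any OpenFF API; the parameter values are illustrative only. It takes a minimal graph (node count and edges), enumerates bond and angle slots with a fixed ordering convention, and attaches parameters keyed by slot.

```python
from itertools import combinations


def find_slots(n_atoms, bonds):
    """Enumerate bond and angle slots from a minimal graph (nodes and edges only)."""
    neighbors = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Ordering convention: store each bond with the lower index first
    bond_slots = sorted({(min(a, b), max(a, b)) for a, b in bonds})
    # An angle is two edges sharing a central atom j; outer atoms are sorted
    angle_slots = sorted(
        (i, j, k)
        for j in range(n_atoms)
        for i, k in combinations(sorted(neighbors[j]), 2)
    )
    return {"bonds": bond_slots, "angles": angle_slots}


# Water-like graph: atom 0 is central, bonded to atoms 1 and 2
slots = find_slots(3, [(1, 0), (0, 2)])

# Hypothetical parameters keyed by slot; values and units are illustrative only
bond_params = {slot: {"k": 1000.0, "length": 0.96} for slot in slots["bonds"]}
```

The ordering convention is the tricky part JC mentions: without one, `(1, 0)` and `(0, 1)` would look like different slots.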

Would also like: for each force in the system, a method for evaluating the function, and also evaluating the first derivative, and possibly for evaluating the second derivative.
JC: This is part of the design, exporting to different engines, i.e. OpenMM, JAX, other NumPy-like libraries, for evaluation and its derivatives.
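As a sketch of what "a method for evaluating the function and its derivatives" could look like for one force type, here is a harmonic bond written in plain Python with analytic derivatives. The function name and signature are assumptions made for illustration; in the design described above, an engine or an autodiff-capable library would presumably supply the derivatives instead.

```python
def harmonic_bond(r, k, r0):
    """Energy, first derivative, and second derivative with respect to r
    for the harmonic form E(r) = 0.5 * k * (r - r0)**2."""
    dr = r - r0
    energy = 0.5 * k * dr * dr
    first = k * dr   # dE/dr
    second = k       # d2E/dr2, constant for a harmonic term
    return energy, first, second


# Illustrative numbers only; units are whatever k and r are expressed in
energy, first, second = harmonic_bond(r=1.1, k=500.0, r0=1.0)
```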
CB: So, you could break down an Amber FF into its terms, and re-build it up again with these force handler/container things?
JC: Yes, with transparent conversions <--> ParmEd, other topology representations
CB: So the topology is separate from the container that stores the force data? (yes) Worried that is fragile. We'd end up carrying around two objects: one with chemical information (top) and one with forces (ff/forces/etc.) and those better match up, always.
JC: Yes, so then the conception is that the System contains a topology representation (now: OFFTop, but can convert out if desired), which is immutable, and marries together the forces and topology.
CB: Until now, we haven't been talking about things that need coords. But to actually get forces out, you'd need them.
JC: The stored topology doesn't necessarily need coords, but will be able to store them.
CB: These two components are quite different. One example: forces don't need to know bond orders once they have a k value. Why would we store bond orders in the forces? Seems not useful, out of scope.
JC: The force will point to somewhere in its topology and also point to its parameter sources.
CB: This implies that parameter IDs will be tracked (yes, they will). Might not be essential, but would be really useful to know where forces come from and other things like separate out similar-looking parameters that come from different sources.
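A small sketch of the parameter-tracking idea, assuming a hypothetical `BondTerm` record (not an existing OpenFF class). Each term stores only what the force needs (atom indices and numeric values, no bond orders) plus a pointer back to where the parameter came from, so numerically identical parameters from different sources can still be told apart.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BondTerm:
    atoms: tuple        # indices into the topology
    k: float
    length: float
    parameter_id: str   # hypothetical ID of the source parameter
    source: str         # hypothetical name of the force field it came from


# Three terms with identical numbers but two different origins
terms = [
    BondTerm((0, 1), 700.0, 1.09, "b1", "ff_A"),
    BondTerm((1, 2), 700.0, 1.09, "b7", "ff_B"),
    BondTerm((2, 3), 700.0, 1.09, "b1", "ff_A"),
]


def group_by_source(terms):
    """Map (source, parameter_id) to the list of atom tuples it was applied to."""
    groups = defaultdict(list)
    for term in terms:
        groups[(term.source, term.parameter_id)].append(term.atoms)
    return dict(groups)


by_source = group_by_source(terms)
```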

CB: This design hinges on a really good molecule/topology representation. What goes in there?
JC: A goal here is, even in the final system, to retain parameter sources in order to re-create a system, know where components come from, etc.
CB: Here's a test case: trying to make a force object for a protein. The protein you're giving it is a single OEMol, with thousands of conformers, each conformer is a snapshot of a trajectory, and is a large file (45 MB). Would this be packaged inside the System/force-containing object? (Implying here: what would be nice to do, what is essential to do? This seems like an edge case)
JC: What you need to set up a simulation is a topology, a set of forces, and positions (and box vectors). If you want to do a relative FE calculation, you need a set of those, with mapping, or some sort of hybrid topology mapping. Setting up a simulation is, by comparison, an easy task. How to facilitate FE calculations is trickier.
CB: (back and forth) trying to isolate the minimalist view (i.e. bare minimum forces?) and build out from that the things that would be nice (i.e. topology ordering) and see what of those are necessary.
JC: (Conveniently showing off his new iPad Pro) draws diagram of System, which contains Topology - rich chemical representation, conformers, etc. - something like a ParameterSet - containing something like handlers for each type of force, and inside of those handlers go the individual forces. Mapping between each force object and where it comes from.
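The containment described in the diagram could be sketched roughly as below. Every class and method name is hypothetical and only mirrors the words used in the discussion (System, Topology, handlers); it is not the actual design. The topology is a frozen dataclass to reflect the "immutable" point, and `add_term` refuses atom indices that are not in the topology, which speaks to CB's worry that forces and topology must always match.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topology:
    n_atoms: int
    bonds: tuple  # tuple of (i, j) index pairs


@dataclass
class Handler:
    name: str
    terms: dict = field(default_factory=dict)  # atom-index tuple -> parameter dict


@dataclass
class System:
    topology: Topology
    handlers: dict = field(default_factory=dict)  # handler name -> Handler

    def add_term(self, handler_name, atoms, params):
        """Attach a force term, rejecting atoms that are not in the topology."""
        if any(i < 0 or i >= self.topology.n_atoms for i in atoms):
            raise ValueError(f"atoms {atoms} are not in the topology")
        handler = self.handlers.setdefault(handler_name, Handler(handler_name))
        handler.terms[tuple(atoms)] = params


top = Topology(n_atoms=3, bonds=((0, 1), (0, 2)))
system = System(top)
# "id" stands in for the mapping back to the source parameter
system.add_term("Bonds", (0, 1), {"k": 1000.0, "length": 0.96, "id": "b1"})
```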
CB: What would be helpful for us is separating out collections of forces (sounds like what he's asking for is satisfied) from the rest of the system. For example, extract out the bond and angle forces of a system and serialize/convert those out. (Right now, kinda stitching together these components using different libraries, not desirable).
JC: That could be done, but it would make a lot more sense to use the entire System object, since forces don't make as much sense de-coupled from their topology. Explains some of the other designed features.
CB: (on the idea that everything should be the entire System object, not allowed to splice things out). Seems very cool, elegant, aesthetic. But for some short-term and/or practical uses, there is some utility in separating out components. Maybe I want to splice out the force field and do something, or other "puzzle pieces" sort of approaches. Aware that it's a fragile approach.
JC: What's the driving motivator here? Concerned about the object getting too big with a ton of positions?
CB: Would like more generality, be able to plug in a part of the System, but not be married to use the entire thing.

CB: Thought of another thing. Of a potentially-really-big set of forces, would like to separate out a part of those, like parameters applied to a ligand, or a particular component of the potential energy function, or maybe a combination of the two (the GBSA for the ligand).
JC: Okay, could be an API point for selecting a part of a system (lig_sys = my_sys.subset(ligand_indices)). Could then do lig_sys.compute_energy().
CB: I'd like to be able to piece together OpenFF infrastructure with some OpenEye tools like OEFF, Szybki (gold medal: some OpenEye tools can be optional backends for System functionality; silver medal: something that an OpenEye wizard can use to compose OpenEye
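A toy sketch of what such a `subset` API point might look like. `ToySystem` and `Term` are hypothetical stand-ins, and each term carries a precomputed energy in place of a real evaluation. The `handlers` argument is an added assumption to cover CB's "particular component of the potential energy function" case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    handler: str     # e.g. "Bonds", "Angles"
    atoms: tuple     # atom indices the term acts on
    energy: float    # precomputed energy, standing in for a real evaluation


class ToySystem:
    def __init__(self, terms):
        self.terms = list(terms)

    def subset(self, atom_indices=None, handlers=None):
        """Keep terms that lie entirely within atom_indices and, optionally,
        belong to one of the named handlers."""
        keep = set(atom_indices) if atom_indices is not None else None
        selected = [
            t for t in self.terms
            if (keep is None or set(t.atoms) <= keep)
            and (handlers is None or t.handler in handlers)
        ]
        return ToySystem(selected)

    def compute_energy(self):
        return sum(t.energy for t in self.terms)


my_sys = ToySystem([
    Term("Bonds", (0, 1), 1.5),
    Term("Angles", (0, 1, 2), 0.5),
    Term("Bonds", (2, 3), 2.0),   # crosses out of the ligand, so it is dropped
    Term("Bonds", (3, 4), 4.0),
])
ligand_indices = [0, 1, 2]
lig_sys = my_sys.subset(ligand_indices)
lig_bonds = my_sys.subset(ligand_indices, handlers={"Bonds"})
```

This sketch silently drops terms that span the selection boundary and does not renumber atoms; a real design would need to decide both.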

CB: Generally concerned that the fundamental object contains everything. Would like to be able to carry around something smaller and deal with the responsibility that comes with the associated fragility.
JC: It's not necessarily bad to have a class that contains a lot of stuff, that can be optimized, compressed, composed of many optional fields if desired. For example, OEChem is huge but optimized, and can feel light if you're only using 1% of the API. Are you worried about size in memory, on disk? Those can generally be dealt with on the software side.
CB: Kinda concerned about slow interoperability when converting/composing things to other libraries.

Important take-aways

Need to design a "simple, here's your MD starting point" approach that doesn't itself do FE, but can enable FE infra to be built on top of it
CB would like to be able to splice out a part of a System and stitch it into other tools
Be generally concerned with scope creep, particularly for data that cannot be optimized/compressed.

2020_09_16 meeting notes

JC
- Interoperability – can provide readers/writers in different languages
- Pure Python, fully-informed object, which can be converted (perhaps lossily) to other formats
JC – Can’t have converters before API is settled
- MS – Not necessarily, will depend on timeline with Vanderbilt.
- JC – Will depend on incentives. Can make sure we have an increasingly useful product at the end of each step. But also make sure we don’t accumulate technical debt.
MS – Generally, we want a System object that is complete and extensible. We should be able to import it both via API and from existing file formats. MT, how much of this overlaps with GMSO progress/efforts?
- MT – GMSO aims to solve similar problems. Underlying infrastructure is fairly irreconcilable. mBuild → Foyer → GMSO workflow is basically irreconcilable, since Foyer output doesn’t keep track of parameter identity.
- JW – Could a “cauterized” topology be functional? Like, if loaded from an OMM system or Foyer output, could we lose track of the parameter provenance but still have some functionality available?
- JC – A rich System object may be able to handle certain cases of missing information in a systematic way.
- MS – How can we make the Vanderbilt work overlap somewhat with us?
- JW – If GMSO simply doesn’t have data fields for important info in their object model, then any use of their converters will be destined to fail
- MS – It may still have some value, even with a lossy converter
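One way to picture the “cauterized” case is sketched below with hypothetical names (`BondTerm`, `from_plain_records`, `capabilities`); none of this is an existing OpenFF, GMSO, or Foyer API. Terms imported from a source that carries numbers but no parameter identity get `parameter_id=None`, and the object reports which operations remain available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BondTerm:
    atoms: tuple
    k: float
    length: float
    parameter_id: Optional[str] = None  # None means provenance was lost on import


def from_plain_records(records):
    """Build terms from records that carry numbers but no parameter identity."""
    return [BondTerm(tuple(r["atoms"]), r["k"], r["length"]) for r in records]


def capabilities(terms):
    """Report which operations the available information can support."""
    has_provenance = all(t.parameter_id is not None for t in terms)
    return {
        "compute_energy": True,              # only needs numeric values
        "trace_parameters": has_provenance,  # needs parameter identity
    }


rich = [BondTerm((0, 1), 700.0, 1.09, parameter_id="b1")]
cauterized = from_plain_records([{"atoms": [0, 1], "k": 700.0, "length": 1.09}])
```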
MS – Let’s re-focus on finding areas of overlap in effort between us and Vanderbilt
- MS – This funding will help us avoid “sunk cost fallacy”, will also give us the opportunity to identify places the objects aren’t isomorphic and identify whether these are intentional or simply oversights.
- MS – What are likely fundamental incompatibilities?
  - MT – Foyer has nothing that SMIRNOFF doesn’t. Main thing they have is a body of work in data files, like OPLS in XML format. It will be hard but possible to have them standardize on using more of OpenFF infrastructure.
  - MT – Foyer is basically a thin wrapper around OpenMM. Foyer FF class inherits heavily from OpenMM FF class. Agree that there’s a big “sunk cost” mentality around GMSO adoption. From an outside perspective, I see that it would be beneficial for them to move away from it.
  - JC – Looking at GMSO, I see that most of the output converters are incomplete/ParmEd dependent. If we made a GMSO ← → OFF System converter, then they could take advantage of our converters.
  - MT – It’s worth considering what the objects will look like in 6 months. I think our scope is much broader than GMSO’s.
  - JC – We might use the GMSO → OpenFF system converter as an initial goal, with the final goal of making a mBuild → OFF System output functionality.
  - MT – Agree
  - MS – Who are the relevant stakeholders at Vanderbilt and what will they (organizationally/emotionally/individually) want to pursue?
    - MT – My perspective is that I’m most proud of the design of GMSO, but see that the implementation is still largely incomplete. Currently nobody else has taken hard ownership of it.
    - MS – We can kind of view this as a chance to design a new object with the understanding that MT gained from designing GMSO.
    - MT – Stakeholders are Peter and Claire, Ray, Justin, Koh, Prashra. 6 or 7 other universities on MosDeF grant, mostly just a single PI+GS at each. Other institution people are less software-focused. Big difference in the latter group is that nobody uses AMBER or OMM, some use GROMACS. Would need strong support for LAMMPS, HOOMD, CASSANDRA.
    - JC – Some communication with Andrew White of HOOMD, looking at support for ML potentials
    - MT – Those aren’t trivial, but are tractable. Could get support for this and other formats from community. MosDeF grant is up for renewal in 1ish year. Also looking to get a grant in the same category as MolSSI.
    - MT – As we provide more support for more chemical-engineering use cases, we’ll start gathering stakeholders in adjacent fields.
    - MS – So let’s anticipate having a very broad view for our potential scope.
  - MS – Basically everyone wants a replacement for ParmEd. What’s our position on this?
    - MT – I think it’s possible. Our system object will be largely isomorphic to ParmEd’s internal data representation.
    - MT – Initial System goals are to export faithfully to ParmEd, within the same scope as our current OpenMM System → ParmEd route. The next step will be successfully exporting new features like vsites to ParmEd. (JC: Also CMAPs would be nice). The “gold medal” is a full set of direct converters to other formats.
Summary: Identify ways to work on the same code as much as possible. Our desired outcome is to get them investing effort in System and its converters.
- JW – Do we want to move toward full replacement of GMSO with OpenFF system? Like, having mBuild → OFF System directly? And if so, how do we want to present this initially?
- MS – It’s their choice. We should take the position that “we want a single system object”, and let them reason out the paths forward.
- (General) – What’s the “carrot” for Vanderbilt?
  - MT – “we’ll handle the interoperability, you handle GMSO → system”
  - MS + JC – “We’ll provide a modern ParmEd if you can get to our format”
Possibility of help from Jason Swails / Entos?
- Likely we can get limited advisement from Swails
To Dos
- MS will prepare initial contact pitch for Vanderbilt
- MT and JW will work on preparing System object prototype.
- MT will make System object into a weekly meeting – Will send out a whenisgood.