Relationships with Toolkit Classes [WIP]

Landing page for documenting friction between definitions, scope, and uses of existing OpenFF infrastructure

Definitions

Topology (openforcefield.topology.topology.Topology)
Molecule (openforcefield.topology.molecule.Molecule)
ForceField (openforcefield.typing.engines.smirnoff.forcefield.ForceField)
System (openff.system.system.System)

Proposed changes

Topology should only track periodicity as a boolean, not box vectors

The definition of Topology that we are converging toward likely does not include box vectors, since cheminformatics does not care about periodicity. However, it can be useful to know whether a system is intended to be periodic. This would mean removing Topology.box_vectors and adding a boolean Topology.is_periodic. Doing so would break at least some of the API (Topology.from_openmm comes to mind) but would better define which classes are responsible for what. When importing an OpenMM Topology and running it through an OpenFF typing step, it would make sense to save the box vectors somewhere and apply them to a System, but there is no reason for box vectors to be involved in the typing process.
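As a rough illustration of the proposed split, the sketch below uses hypothetical stand-in classes (SketchTopology, SketchSystem) and a hypothetical helper (split_periodicity); none of these are existing OpenFF classes or functions. It only shows the idea that the topology keeps a boolean while the box vectors are set aside and attached to the system after typing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float, float]
BoxVectors = Tuple[Vector, Vector, Vector]


@dataclass
class SketchTopology:
    """Hypothetical stand-in for Topology: it only knows whether it is periodic."""
    n_atoms: int
    is_periodic: bool = False


@dataclass
class SketchSystem:
    """Hypothetical stand-in for System: this is where box vectors would live."""
    topology: SketchTopology
    box_vectors: Optional[BoxVectors] = None


def split_periodicity(n_atoms: int, box_vectors: Optional[BoxVectors]):
    """Hypothetical import step: keep a boolean on the topology and
    hand the box vectors back separately so typing never sees them."""
    topology = SketchTopology(n_atoms=n_atoms, is_periodic=box_vectors is not None)
    return topology, box_vectors


# Made-up cubic box, arbitrary length units
box = ((3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 3.0))
topology, saved_box = split_periodicity(n_atoms=10, box_vectors=box)

# ... a typing step would run here using only `topology` ...

system = SketchSystem(topology=topology, box_vectors=saved_box)
```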

Significant edge cases

Where do particles in virtual sites go?

The existing implementation puts them in Topology - there is no other feasible option at the moment - but that breaks some unwritten rules. If Topology and Molecule are thought of as fairly minimal graph-like representations of chemical systems, then they should never contain virtual sites (they can’t go in SMILES, and arguably shouldn’t come close to the world of cheminformatics - they’re an MM construct). Virtual sites are better thought of as something that results from applying a force field to a topology, not as part of the topology itself. This implies the need for something that carries many of the properties of a Topology (connectivity, positions, forces, etc. for not only atoms, but also virtual site particles) but is separate, since it also carries force field information.
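One possible shape for such a separate object is sketched below. All class names (SketchTopology, SketchVirtualSite, SketchParticleSet) and the "EP" site name are hypothetical and chosen only for illustration; this is not the existing Topology or System API. The graph-like topology holds only atoms and bonds, while virtual sites live on a wrapper that would be produced by applying a force field.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchTopology:
    """Hypothetical minimal graph: atoms and bonds only, no virtual sites."""
    atoms: List[str]
    bonds: List[Tuple[int, int]]


@dataclass(frozen=True)
class SketchVirtualSite:
    """Hypothetical virtual site, defined relative to parent atom indices."""
    name: str
    parent_atoms: Tuple[int, ...]


@dataclass
class SketchParticleSet:
    """Hypothetical result of applying a force field: wraps a topology
    and adds the MM-only particles without modifying the topology."""
    topology: SketchTopology
    virtual_sites: List[SketchVirtualSite] = field(default_factory=list)

    @property
    def n_particles(self) -> int:
        # Atoms from the chemical graph plus particles added by the force field
        return len(self.topology.atoms) + len(self.virtual_sites)


water = SketchTopology(atoms=["O", "H", "H"], bonds=[(0, 1), (0, 2)])
typed = SketchParticleSet(topology=water)
typed.virtual_sites.append(SketchVirtualSite(name="EP", parent_atoms=(0, 1, 2)))
```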

Application of parameters requiring bond order interpolation

Most force field parameters have a roughly 1:1 mapping between their representation in a force field (ForceField object, XML file, or something similar) and their representation in a system after a force field has been applied to a topology. For example, you could inspect the LJ parameters of a random carbon in a small molecule and (with some legwork) find identical values in a corresponding force field representation. Now consider a torsion applied using bond order interpolation: if a WBO of 1.7 is estimated, the actual value of k for that torsion in the system is interpolated 70% of the way from k1 to k2, the values stored in the force field (i.e. 0.3 k1 + 0.7 k2), so it matches neither of them. The source (i.e. data) can be backed out from the force field with some effort, and how the value of k was calculated (i.e. logic) can also be determined from the WBO stored somewhere in the parametrization process. This is one example that implies we should consider putting some restrictions on trusting and allowing “backwards parametrization” lookups (given a parametrized system, back out the force field parameters). The “forward” mapping can be stored with dict- or matrix-like representations, and that should be trustworthy.
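The arithmetic in the WBO example can be sketched as follows, assuming simple linear interpolation between the values at bond orders 1 and 2. The function name interpolate_k, the record dict, and the numerical values are made up for illustration and do not come from any real force field or toolkit API.

```python
def interpolate_k(k1: float, k2: float, wbo: float) -> float:
    """Linearly interpolate between k1 (bond order 1) and k2 (bond order 2)."""
    fraction = wbo - 1.0  # 0.0 at bond order 1, 1.0 at bond order 2
    return k1 + fraction * (k2 - k1)


# "Forward" mapping: store the inputs alongside the result (dict-like record)
record = {"k1": 1.0, "k2": 3.0, "wbo": 1.7}  # made-up values, arbitrary units
k = interpolate_k(**record)

# A different set of inputs that happens to produce the same k, which is why
# looking only at k in the system cannot identify the force field values.
k_other = interpolate_k(k1=2.4, k2=5.0, wbo=1.0)
```

With the inputs stored, the forward calculation is reproducible; without them, the value of k alone is ambiguous.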

In the similar case of interpolating bond stretching force constants based on WBOs (or perhaps other bond order estimates in the future), @Jeffrey Wagner notes on a thread in Slack:

[there is a] troubling trend of “certain steps in parameterizations modify the topology with information that other steps could misinterpret, but the topology doesn’t have a way to communicate the context”

A starting point could be to make a copy of the input topology in the first step of parametrization. This would draw a clearer distinction between the “topology” and “force field” worlds. It would not solve all issues relating to topology-ish data (i.e. bond orders), including:

How should an arbitrary parameter handler know if it is permitted to modify encoded bond orders?
Should some handlers have flags for re-computing bond orders?
Should the above permissions be based on if the existing bond orders are from the topology or from an upstream parameter handler?

It would, however, make it possible to tell whether a carried topology’s bond orders come from a source topology (i.e. cheminformatics world) or from some other step in the parametrization process (somewhere in force field world).
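A minimal sketch of the copy-first idea is given below. The classes (SketchBond, SketchTopology), the bond_order_source field, the helper functions, and the handler name are all hypothetical and exist only for this illustration. The sketch leaves the permission questions listed above open; it only shows that the input topology stays untouched and that the working copy can record where a bond order came from.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SketchBond:
    """Hypothetical bond record with a provenance tag for its bond order."""
    atoms: Tuple[int, int]
    fractional_bond_order: Optional[float] = None
    bond_order_source: str = "topology"  # or the name of a parameter handler


@dataclass
class SketchTopology:
    """Hypothetical stand-in for Topology holding only bonds."""
    bonds: List[SketchBond]


def begin_parametrization(source: SketchTopology) -> SketchTopology:
    """First step of parametrization: work on a deep copy, never the input."""
    return copy.deepcopy(source)


def assign_bond_order(working: SketchTopology, index: int,
                      wbo: float, handler_name: str) -> None:
    """Write a bond order onto the working copy and record who wrote it."""
    bond = working.bonds[index]
    bond.fractional_bond_order = wbo
    bond.bond_order_source = handler_name


source = SketchTopology(bonds=[SketchBond(atoms=(0, 1))])
working = begin_parametrization(source)
assign_bond_order(working, 0, wbo=1.7, handler_name="hypothetical_torsion_handler")
```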

Relationship between parameters in a force field and system

The system object currently tracks Potential objects that are effectively mirrors of parameters in the force field used to generate it.

Infrastructure