2020-12-02 System Design

Date

Dec 3, 2020

Participants

@Matt Thompson
@Jeffrey Wagner

Goals

Establish next steps for System development

Discussion topics

See second slide:

https://docs.google.com/presentation/d/1P-DMu7hmecExRBXpRteOzef-9LJRjt-WQTS3Y3Yh8lU/edit#slide=id.p

Can a “legacy atom-typed” topology be shoved into an OFFMol?

Maybe that can just go into the extras/properties/etc. field of the OpenFF Atom object

General question - is there anything that a GMSO object would bring to an OpenFF System that it wouldn’t be able to digest?

General answer: no, not really. Cases to care about include (but shouldn’t be blocking):

Virtual sites!

Off-center charges
Off-center LJ interaction sites
Polarizability
Tabulated potentials
- In general, there will be a lot of potentials that will not be analytically differentiable (and that’s ok)
What about topology? Any irreconcilable differences?

Positions?

Positions themselves should be handled, but the bookkeeping around them might be tricky:
- Are atom indices tracked?
- Where are virtual sites processed? Always listed after atoms?
- What happens when an atom is added? (Can atoms be added?)

Where to store periodicity?

Just call it a property of box vectors, or ask “are these None”?
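A minimal sketch of the "are these None" option, in which periodicity is not stored separately but inferred from the box vectors. The class name `SketchSystem` and the attribute `box` are hypothetical and used only for illustration; they are not the actual OpenFF System API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass
class SketchSystem:
    """Hypothetical container, not the real OpenFF System class."""

    # Three box vectors (units left unspecified), or None if there is no box.
    box: Optional[Tuple[Vector, Vector, Vector]] = None

    @property
    def is_periodic(self) -> bool:
        # Periodicity is derived from the box rather than stored as its own flag.
        return self.box is not None


vacuum = SketchSystem()
boxed = SketchSystem(box=((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)))
```

One consequence of this choice is that a system cannot carry a box and be non-periodic at the same time; if that case matters, a separate flag would be needed.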

Non-bonded exceptions

Will be stored within (or very near) the non-bonded potential handlers

(Summary of above): Physics, box vectors, positions should all be convertible GMSO -> OpenFF System

What about chemical topology?

OpenFF cares about some cheminformatics things (that GMSO may not):
1. Bond order
2. Stereochemistry
For re-parametrization, what do we need?
1. An entire OFFMol
For interop, we’ll need to store atom types (as an optional field)
1. This is to make export to GROMACS/AMBER/etc. possible
For valence terms, just need to track constituent atom indices and the relevant parameters (not necessarily the source FF params)
How to deal with residues (also chains/segments/other fun stuff)?
1. OFFTop idea (intention?) is to put a dict on each atom (or particle) that indicates what residue it’s in
  1. Have methods that do fast lookups so users can ask “which atoms are in residue XYZ?”
  2. Bypasses the responsibility of enforcing everything
  3. User/exporter/etc. will have a lot of responsibilities for implementation
  4. Given an Amber protein, put it into an OpenFF system and combine it with a Parsley-typed ligand. How is this to be exported to e.g. ParmEd?
    1. Protein will be easy (in this example…) given the structure of the input data
    2. Store the ligand as a separate residue named LIG?
      1. and give it a residue number that doesn’t clash with proteins
      2. and maybe need to create a new chain, just for that ligand
2. Positive: internal representation does not concern itself with the mess of residues
3. Negative: we’ll need to care about this implementation where we convert out to ParmEd/files/etc.
4. In-between: can’t reasonably expect all exports to match up well, so this will continue to be an annoyance to some users (acceptable, this isn’t really our fault)
  1. We can make reasonable attempts to sanitize state (e.g. a method that sanitizes for Amber output, another one for GROMACS output, etc.) but also expose these settings to the user, should they wish to turn some features on/off, modify how a particular step happens, etc. (“complexity has to live somewhere”)
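The per-atom dict idea and the "which atoms are in residue XYZ?" lookup could look roughly like the sketch below. All names here (`TaggedAtom`, `TaggedTopology`, `atoms_in_residue`, `next_free_residue_number`, and the metadata keys) are hypothetical and chosen for illustration; this is not the OFFTop implementation. The usage at the end mirrors the protein-plus-ligand example: the ligand is stored as a residue named LIG with a number that does not clash with the protein.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaggedAtom:
    index: int
    # Free-form residue/chain information; nothing here is validated.
    metadata: Dict[str, object] = field(default_factory=dict)


class TaggedTopology:
    def __init__(self, atoms):
        self.atoms = list(atoms)
        # Cache of residue name -> atom indices, built on first lookup.
        # It is not invalidated automatically if metadata changes later.
        self._by_residue: Optional[Dict[str, List[int]]] = None

    def atoms_in_residue(self, residue_name: str) -> List[int]:
        if self._by_residue is None:
            index = defaultdict(list)
            for atom in self.atoms:
                name = atom.metadata.get("residue_name")
                if name is not None:
                    index[name].append(atom.index)
            self._by_residue = dict(index)
        return list(self._by_residue.get(residue_name, []))

    def next_free_residue_number(self) -> int:
        numbers = [
            atom.metadata["residue_number"]
            for atom in self.atoms
            if "residue_number" in atom.metadata
        ]
        return max(numbers, default=0) + 1


atoms = [
    TaggedAtom(0, {"residue_name": "ALA", "residue_number": 1, "chain_id": "A"}),
    TaggedAtom(1, {"residue_name": "ALA", "residue_number": 1, "chain_id": "A"}),
    TaggedAtom(2, {"residue_name": "GLY", "residue_number": 2, "chain_id": "A"}),
    TaggedAtom(3),  # ligand atoms arrive with no residue information
    TaggedAtom(4),
]
topology = TaggedTopology(atoms)

# Tag the ligand as its own residue, in its own chain, before any lookup.
lig_number = topology.next_free_residue_number()
for atom in topology.atoms[3:]:
    atom.metadata.update(residue_name="LIG", residue_number=lig_number, chain_id="B")

ligand_atoms = topology.atoms_in_residue("LIG")
alanine_atoms = topology.atoms_in_residue("ALA")
```

Because the topology never enforces consistency of these dicts, an exporter would still need its own sanitizing step before writing Amber or GROMACS files.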

What should the “topology in system” have at its interface? What should its API look like?

Barebones needs:
- Iterators over:
  - Atoms
  - Particles
  - Bonds
    - From this, get angles
- Atom model:
  - Index (maybe implicit by place in iterator?)
  - Mass
  - Optional:
    - Element
    - Pre-defined charge
    - Atom type
    - Stereochemistry
    - Aromaticity
    - Residue/chain/etc data (name, index, id, who the heck knows)
    - More space for other user-defined properties
    - Know its molecule
- Bond model:
  - Constituent atom indices
  - Optional:
    - Pre-defined partial bond order
    - Pre-defined integer bond order
    - Stereochemistry
    - Bond index
    - Aromaticity
    - Constrained boolean (maybe this should only be stored in the constraint handler?)
- Angle model:
  - Open question: should this exist? Or can everything we care about be generated on the fly from bond data?
- Virtual site model:
  - TBD
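As a rough illustration of the barebones needs above, the sketch below models atoms and bonds with mostly optional fields, exposes iterators, and generates angles on the fly from bond data instead of storing them. Every class, field, and method name is hypothetical; this is one possible reading of the list, not the agreed interface. Virtual sites and the particle iterator are left out because they are still TBD.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Set, Tuple


@dataclass
class SketchAtom:
    mass: float  # the only required field; index is implicit by position
    element: Optional[str] = None
    partial_charge: Optional[float] = None
    atom_type: Optional[str] = None
    extras: Dict[str, object] = field(default_factory=dict)  # residue data, etc.


@dataclass(frozen=True)
class SketchBond:
    atom1: int
    atom2: int
    bond_order: Optional[int] = None
    fractional_bond_order: Optional[float] = None


class SketchTopology:
    def __init__(self) -> None:
        self._atoms: List[SketchAtom] = []
        self._bonds: List[SketchBond] = []

    def add_atom(self, atom: SketchAtom) -> int:
        self._atoms.append(atom)
        return len(self._atoms) - 1  # index of the atom just added

    def add_bond(self, atom1: int, atom2: int, **optional) -> None:
        self._bonds.append(SketchBond(atom1, atom2, **optional))

    @property
    def atoms(self) -> Iterator[SketchAtom]:
        return iter(self._atoms)

    @property
    def bonds(self) -> Iterator[SketchBond]:
        return iter(self._bonds)

    @property
    def angles(self) -> Iterator[Tuple[int, int, int]]:
        # Build neighbor sets from bonds, then pair up the neighbors
        # of each central atom; nothing about angles is stored.
        neighbors: Dict[int, Set[int]] = {}
        for bond in self._bonds:
            neighbors.setdefault(bond.atom1, set()).add(bond.atom2)
            neighbors.setdefault(bond.atom2, set()).add(bond.atom1)
        for center in sorted(neighbors):
            for i, k in combinations(sorted(neighbors[center]), 2):
                yield (i, center, k)


# Usage: a single water molecule.
water = SketchTopology()
oxygen = water.add_atom(SketchAtom(mass=15.999, element="O"))
h1 = water.add_atom(SketchAtom(mass=1.008, element="H"))
h2 = water.add_atom(SketchAtom(mass=1.008, element="H"))
water.add_bond(oxygen, h1, bond_order=1)
water.add_bond(oxygen, h2, bond_order=1)
water_angles = list(water.angles)
```

In this sketch an angle is just a tuple of atom indices, which is one way to answer the open question of whether a dedicated angle model needs to exist.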

Medium term goals (OpenMM system re-creation, GMSO conversion/round-tripping, energy evaluation, more comprehensive JAX-based fitting) all rely on filling out the rest of the handlers.

Action items

Incorporate the above into the spec draft
Matt will set up meeting with Michael and John for next week
Work on building out more potential handlers
May explore a new, barebones topology