Interoperability Requirements [WIP]

Essential features

  • Object model and all components can be serialized to (at minimum) dict and JSON

  • Object model and all components can be hashed

  • Atoms must

    • Have elements

    • Have masses (think “isotopes”)

    • Have topology indices

    • Be able to store a 4-character atom type name → JW – We can provide a “standardize” method that truncates atom names/types to 4 characters

  • Molecule representations must not require that molecules be representable as SMILES

  • Multi-residue structures can be treated as individual molecules
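
A minimal Python sketch of the atom and serialization requirements above, assuming frozen dataclasses as the storage model. `Atom`, `Molecule`, `standardize`, `to_dict`, `to_json`, `from_dict`, `content_hash` and `MAX_ATOM_TYPE_LEN` are hypothetical names chosen for illustration, not an existing toolkit API. Frozen dataclasses get `__hash__` automatically, which covers in-memory hashing (sets, dict keys); the SHA-256 `content_hash` is shown separately because Python's built-in `hash()` of strings is randomized per process, so it is not suitable for comparing objects across runs.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, replace

MAX_ATOM_TYPE_LEN = 4  # the 4-character atom type limit discussed above


@dataclass(frozen=True)
class Atom:
    element: str          # e.g. "C"; nothing stops this from holding a bead label
    mass: float           # stored per atom, so isotopes need not match the element's mass
    topology_index: int   # position of the atom in the whole topology
    atom_type: str = ""   # force-field atom type name

    def standardize(self) -> Atom:
        # Return a copy whose atom type is truncated to MAX_ATOM_TYPE_LEN characters
        return replace(self, atom_type=self.atom_type[:MAX_ATOM_TYPE_LEN])


@dataclass(frozen=True)
class Molecule:
    # Atoms plus a bond list; building one never requires a SMILES string
    atoms: tuple[Atom, ...]
    bonds: tuple[tuple[int, int], ...]  # pairs of topology indices

    def to_dict(self) -> dict:
        return asdict(self)

    def to_json(self) -> str:
        # sort_keys makes the JSON deterministic, which content_hash relies on
        return json.dumps(self.to_dict(), sort_keys=True)

    @classmethod
    def from_dict(cls, data: dict) -> Molecule:
        # JSON turns tuples into lists, so convert them back for equality and hashing
        atoms = tuple(Atom(**a) for a in data["atoms"])
        bonds = tuple(tuple(b) for b in data["bonds"])
        return cls(atoms, bonds)

    def content_hash(self) -> str:
        # Stable across processes, unlike the built-in hash() of objects containing strings
        return hashlib.sha256(self.to_json().encode()).hexdigest()


# Example: a water molecule built from atoms and bonds only
water = Molecule(
    atoms=(Atom("O", 15.999, 0, "OW"), Atom("H", 1.008, 1, "HW"), Atom("H", 1.008, 2, "HW")),
    bonds=((0, 1), (0, 2)),
)
restored = Molecule.from_dict(json.loads(water.to_json()))
```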

Essential behaviors

These can either be explicitly part of the API or enabled by downstream code

  • Create iterators over the atom indices of any valence term, with indices referring to positions in the topology

    • e.g. a big list of tuples containing the indices of the atoms in each bond; this is a little ugly at the moment since

  • Look up the following based only on an atom’s index

    • What molecule it is a part of

    • What residue (if any) it is a part of

    • What chain (if any) it is a part of

    • What other atoms it is bonded to
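
One way the essential behaviors could be met, sketched in Python under the assumption that per-atom membership is stored as flat lists indexed by topology index. `Topology`, its fields (`molecule_of`, `residue_of`, `chain_of`, `bonds`) and its methods are hypothetical. Angles and proper torsions are derived from the bond graph; impropers are left out because conventions for ordering their atoms differ between force fields.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Iterator, Optional


@dataclass
class Topology:
    # Flat layout: each per-atom list is indexed by topology index
    molecule_of: list[int]             # atom index -> molecule index
    residue_of: list[Optional[int]]    # atom index -> residue index, or None
    chain_of: list[Optional[str]]      # atom index -> chain ID, or None
    bonds: list[tuple[int, int]]       # one (i, j) tuple of topology indices per bond

    def __post_init__(self) -> None:
        # Build the adjacency map once so neighbor lookups do not rescan the bond list
        self._neighbors: dict[int, set[int]] = defaultdict(set)
        for i, j in self.bonds:
            self._neighbors[i].add(j)
            self._neighbors[j].add(i)

    # Lookups based only on an atom's index
    def molecule(self, atom: int) -> int:
        return self.molecule_of[atom]

    def residue(self, atom: int) -> Optional[int]:
        return self.residue_of[atom]

    def chain(self, atom: int) -> Optional[str]:
        return self.chain_of[atom]

    def bonded_to(self, atom: int) -> set[int]:
        # .get avoids inserting empty entries into the defaultdict for unbonded atoms
        return set(self._neighbors.get(atom, ()))

    # Valence-term iterators, all in topology indices
    def iter_bonds(self) -> Iterator[tuple[int, int]]:
        yield from self.bonds

    def iter_angles(self) -> Iterator[tuple[int, int, int]]:
        # Every pair of neighbors (a, c) around a central atom b gives the angle a-b-c
        for b in sorted(self._neighbors):
            for a, c in combinations(sorted(self._neighbors[b]), 2):
                yield (a, b, c)

    def iter_propers(self) -> Iterator[tuple[int, int, int, int]]:
        # For each bond b-c, pair every other neighbor of b with every other neighbor of c
        for b, c in self.bonds:
            for a in sorted(self._neighbors[b] - {c}):
                for d in sorted(self._neighbors[c] - {b}):
                    if a != d:  # skip degenerate terms from three-membered rings
                        yield (a, b, c, d)


# Example: a 4-atom chain (molecule 0, residue 0, chain A) plus a water (molecule 1)
top = Topology(
    molecule_of=[0, 0, 0, 0, 1, 1, 1],
    residue_of=[0, 0, 0, 0, None, None, None],
    chain_of=["A", "A", "A", "A", None, None, None],
    bonds=[(0, 1), (1, 2), (2, 3), (4, 5), (4, 6)],
)
```

Because the neighbor map is built once in `__post_init__`, it would need rebuilding if bonds were added later; a mutable design would have to keep it in sync on every change.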

Ambiguous behavior/open questions

  • When molecules are converted to residues (e.g. when writing Amber files), is the information that an entity was a molecule rather than a residue lost permanently?

  • What information is it acceptable to lose in round-trips with

    • OpenMM

    • MDTraj

    • other objects?

    • PDB files

    • other files?

    • For all converters, JW and IP should make tables like those for the Molecule core properties, showing which data is preserved and how fields are converted.

  • What assumptions are made about each component?

    • Connectivity within/between molecules and residues?

    • SMILES-ability of molecules (JW: NetworkX graph hash – We will provide at least an atom-order-dependent solution for this)

    • Non-element/isotope/bead “atoms”
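
The atom-order-dependent hash mentioned under SMILES-ability could look something like the following Python sketch. `order_dependent_graph_hash` is a hypothetical helper, and hashing the element list plus a normalized bond list is an assumed design, not necessarily what JW has in mind. Because elements are plain strings, it also accepts bead or dummy "atoms". For an order-independent alternative, NetworkX provides `weisfeiler_lehman_graph_hash` (with `node_attr` pointing at an element attribute), though Weisfeiler-Lehman hashes can collide for some non-isomorphic graphs.

```python
from __future__ import annotations

import hashlib
import json


def order_dependent_graph_hash(elements: list[str], bonds: list[tuple[int, int]]) -> str:
    """Hash a molecular graph without going through SMILES.

    The result depends on atom order: renumbering the atoms of the same
    molecule generally produces a different hash.
    """
    # Write each bond as (low, high) and sort the list so bond order and direction do not matter
    canonical_bonds = sorted((min(i, j), max(i, j)) for i, j in bonds)
    # Elements stay in atom-index order; any string works, so beads or dummy atoms are fine too
    payload = json.dumps({"elements": elements, "bonds": canonical_bonds})
    return hashlib.sha256(payload.encode()).hexdigest()
```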

Feature wishlist

  • Type annotations → JW – Will the methods that atom-typed molecules inherit have return types that correctly indicate atom-typed molecules, or will their signatures only indicate the base class?

  • Atoms can have masses not equal to their element’s mass

  • Atoms can have non-physical elements

  • Residues know whether they have any bonds to other residues

  • Atom type names limited to 4 characters

  • Fast generators and membership checks for atoms, bonds, angles, dihedrals, propers, impropers
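
A Python sketch touching three wishlist items: return-type annotations that follow subclasses, fast bond membership checks via a frozenset, and finding a residue's bonds to other residues. `Molecule`, `AtomTypedMolecule`, `copy`, `has_bond` and `bonds_to_other_residues` are hypothetical names. Annotating `self` with a bound `TypeVar` is one standard answer to the type-annotation question; on Python 3.11+, `typing.Self` (PEP 673) expresses the same thing more compactly.

```python
from __future__ import annotations

from copy import deepcopy
from typing import TypeVar

# Bound TypeVar: "Molecule or some subclass of it"
TMolecule = TypeVar("TMolecule", bound="Molecule")


class Molecule:
    def __init__(self, bonds: list[tuple[int, int]]) -> None:
        # Frozenset of (low, high) pairs gives average O(1) bond membership checks
        self._bonds = frozenset((min(i, j), max(i, j)) for i, j in bonds)

    def has_bond(self, i: int, j: int) -> bool:
        return (min(i, j), max(i, j)) in self._bonds

    def copy(self: TMolecule) -> TMolecule:
        # Annotating self with the TypeVar tells type checkers that copy() on a
        # subclass instance returns that subclass rather than the base class
        return deepcopy(self)


class AtomTypedMolecule(Molecule):
    def __init__(self, bonds: list[tuple[int, int]], atom_types: list[str]) -> None:
        super().__init__(bonds)
        self.atom_types = atom_types


def bonds_to_other_residues(
    residue: int, residue_of: list[int], bonds: list[tuple[int, int]]
) -> list[tuple[int, int]]:
    # A bond crosses the residue boundary when exactly one of its atoms is inside the residue
    return [(i, j) for i, j in bonds if (residue_of[i] == residue) != (residue_of[j] == residue)]
```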