2021-02-11 Topology refactor Meeting notes

Date

Feb 11, 2021

Participants

  • @Jeffrey Wagner

  • @Matt Thompson

  • @Simon Boothroyd

Last meeting:

Discussion topics

Item

Notes

Topics identified from last meeting

  • Continue defining difference between pre- and post-parameterization topology

    • Positions? Box vectors?

      • MT – Want box vectors and periodicity boolean tuple

    • Virtual sites?

    • How do we let people slip in stuff like pre-defined partial charges?

  • Biopolymer tie-ins?

  • 3D mols vs. graph mols? Both? Just one?

  • Isotopes?

  • What are other hard questions about tracking state? What happens when we find another one in several years? Will the proposed object model make it possible to wrangle this complexity?

  • What from the existing API can stay? What behavior changes (e.g. does offmol.generate_conformers return one conformer? Multiple molecules?)

Biopolymer tie-ins

  • SB – Presents

  • JW – I like the behavior that this exposes, but I’m still kinda in favor of storing residue/chain/resid info on the atoms, and then just providing these API points as a view of the fully-detailed topology.

  • (General) – How can we decide which use cases we’ll need to support/will be common?

    • Could start coming up with concrete use cases and seeing which design(s) support their functionality.

  • Use case brainstorming:

    • Load from SDF, perceive residues

    • Load from PDB, fill in details according to known residues?

    • Load a structure with missing atoms

    • Detect unnatural AA and assign parameters, gracefully handle backbone interface w/ natural AAs

    • Take a canonical protein and covalently modify with sugar/PTM

      • Modified AA becomes a single residue

      • Modified AA becomes several residues (maybe one for each monosaccharide)

    • Create/break disulfide/other covalent bond between distant parts of chain

    • Support manually filling in missing atoms/loops

    • Change protonation state of a residue

    • Rename a residue

    • Support iteration over Biopolymer.residues when residues are separated by

      • resname/residue + resnum/resid

    • Attach a cofactor like heme, which connects to 4 other residues

    • Get the graphs of unique monomers in this biopolymer/topology.

    • Semi-Persistent biopolymer-level info (eg secondary structure prediction) →

      • Could store on all-atom, Biomonomer-instance, or whole-biopolymer level.

    • Arbitrary metadata attached to biopolymer (eg. “Ala131 has metadata indicating it’s my favorite, then I chemically modify Ala131. Is there still a way to see it’s my favorite?”) → Don’t store arbitrary metadata. Arbitrary metadata will clash with cached data (under what conditions is it persistent?) and pydantic data models (data formats must be well-defined)

    • Add a PTM to this one alanine, and then find the new graph in the biopolymer/topology. → Biomonomer instance vs Biomonomer type. (SB + JW – Don’t persistently store Biomonomer type, though we could offer on-the-fly grouping of identical biomonomer graphs through an API point.)

    • All of these residues are HIS, but they have different protonation → flexible atom/residue/molecule bookkeeping

    • Prepare a system for export to X format, and comply with that format’s residue names where possible → Plugins. Atom group perception / flexible atom/molecule bookkeeping and grouping

    • I want to build a biopolymer from “AlaValGly” string. → PolymerConstructor plugins/classmethods for Biopolymers

    • I want to build a custom biopolymer with non-standard residues. → PTM. PolymerConstructor plugins/classmethods for Biopolymers. Possible reaction SMARTS support? Possible support for other manual API points for modifying chemistry.

    • I pulled this janky PDB from the internet which may or may not be complete, I want to solvate it, parameterize it, add a ligand, and simulate. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping. We will make no effort to fill in information if a molecule is incomplete.

    • I want to get the atom indices of residue X / I want to pull out all of the ‘backbone’ atom indices. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping

    • I want to know the secondary structure of my protein. → Can offer export to MDTraj. Could also have an API point to do the analysis automatically and label the OFFBiopolymer accordingly.

    • I want to know where all of the strong h-bonds / disulfide bonds are. → We don’t want to handle h-bonds. Can offer export to MDTraj. Could offer API points that do SMARTS matching to find disulfide bonds.

    • I want to cluster groups of chains together so I can identify my aggregate protein blob I pulled from the PDB. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping.

      • eg. labeling molecules by micelle in a multi-micelle simulation, other spatial/custom molecule groupings

      • dummy strawman:

        • "groups": { "solvent": [0-6,7,9], "alchemical-molecule": [10], },

    • Biopolymers in mixtures of non-aqueous solvents → Interaction with bulk materials

    • I stuck a fork in an electrical outlet and would like to covalently link a ligand to a biopolymer → PTM (medium)

    • I want to modify an arbitrary set of amino acids with this weird modification, like deprotonation, fluorination, or replacing a functional group → PTM (small)

    • Nothing would make me happier than slapping a protein on a metal surface and just seeing what happens (physisorption) → Interaction with bulk materials

    • That didn’t work, so I’m going to form some chemical bonds between the surface and the protein (chemisorption) → Covalent attachment to bulk materials

    • I half-assed a system preparation and I have coordinates for each atom by index and can probably match up the indices with a corresponding PDB file → Loading correctly-indexed coordinates from PDB

    • Something something membrane proteins → PTM infrastructure

    • I have a bunch of micelles and I’d like to group up each of the micelles separately → position-based atom group perception / molecule vs. residue bookkeeping and grouping.

  • (2021_03_10 post @Lily Wang talk)

    • Defining connection points on residues (IF we grow the chain from this residue, then HERE’s where a H gets removed and another atom added) (this may not be a hard requirement, we don’t like molecule mutability anyway)

    • Taking the RESULTS of the above process and perceiving the underlying monomers in a desired way

    • Perceiving residues that may be defined not just by the atoms IN the residue, but also by the atoms NEIGHBORING the residue.

    • Connecting residues?

  • Later in the day, with John Chodera

    • Change one residue into another (OK if this produces a new topology and atom mapping)

    • Phosphorylate

    • Covalent ligands

    • Take a residue, and get a capped molecule with atom mapping

    • Take a residue, and get a UNcapped molecule with atom mapping → Make sure that these fragments can make it to an OEMol/RDMol with some basic capabilities.

    • Can get further use case info/feedback from Dominic Rufa
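
A minimal sketch of the option JW describes above (residue/chain/resid info stored on the atoms, with residues exposed only as a view), to make the trade-off concrete. The `Atom` class, its field names, and `residue_view` are hypothetical illustrations, not part of any existing toolkit API. The view is rebuilt from atom-level data on each call, so the "rename a residue" use case only needs to touch the atoms.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical residue key: (chain ID, residue name, residue number).
ResidueKey = Tuple[str, str, int]


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom record that carries its own residue metadata."""
    index: int
    element: str
    chain_id: str
    residue_name: str
    residue_number: int


def residue_view(atoms: List[Atom]) -> Dict[ResidueKey, List[int]]:
    """Group atom indices by (chain, resname, resnum) in first-seen order."""
    view: Dict[ResidueKey, List[int]] = {}
    for atom in atoms:
        key = (atom.chain_id, atom.residue_name, atom.residue_number)
        view.setdefault(key, []).append(atom.index)
    return view


atoms = [
    Atom(0, "N", "A", "ALA", 1),
    Atom(1, "C", "A", "ALA", 1),
    Atom(2, "N", "A", "HIS", 2),
    Atom(3, "C", "A", "HIS", 2),
    Atom(4, "N", "B", "ALA", 1),
]
view = residue_view(atoms)

# Renaming a residue edits only atom-level metadata; the view is recomputed.
renamed_atoms = [
    replace(a, residue_name="HID")
    if (a.chain_id, a.residue_number) == ("A", 2)
    else a
    for a in atoms
]
renamed_view = residue_view(renamed_atoms)
```

Because nothing residue-level is stored persistently, this sketch has no place for per-residue metadata, which matches the "don't store arbitrary metadata" conclusion above but would need revisiting for semi-persistent data such as secondary structure labels.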

Box vectors/periodicity in System?

  • MT – Are octahedral box vectors technically “triclinic”? LAMMPS and HOOMD have a different representation than AMBER and GROMACS. My understanding is that LAMMPS/HOOMD octahedral boxes have 6 angle parameters rather than 3. Do we care about supporting these?

    • SB – I’m inclined to say “no”

    • MT – Agree, this is an uncommon use case and we shouldn’t support it initially.

    • SB – could put a poll in #general to gather use cases

    • JW + SB – Mostly interested to ensure that folks can subclass/use the plugin interface to make octahedral/other weird boxes

  • MT – What should “own” box vectors? Currently I see them as “MUST/MAY” be on Topology. In the future I’d like the box vectors to NOT be on Topology, and instead be somewhere else in System.

    • SB – Basically, I think coordinates and box vectors (if any) have to go together. So one idea would be to have something similar to the graph vs. 3d molecule, where one type of Topology doesn’t have coords/box vectors, and the other must have coords and may have box vectors. But that would introduce some ambiguity into what a System could expect to find in its data fields. Another idea is that a System could have fully separate Topology and Coordinates members

    • MT – Currently System has 4 major data members – Topology, Coordinates, Box Vectors, and Handlers.

    • Possible mergings?

      • Coordinates+box vectors?

        • There doesn’t seem to be a problem that this solves.

  • MT – Does box_vectors = None necessarily mean “nonperiodic”? Or could it mean “currently unset”?

  • MT – There are different ways that a box can be “partially periodic”, like being sandwiched between two slabs of metal. So we do want to be able to handle 2D periodicity.

  • SB – Maybe, instead of box vectors specifying periodicity, have booleans for X/Y/Z_is_periodic
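
A minimal sketch of SB's suggestion combined with MT's request for a periodicity boolean tuple: periodicity is stated explicitly per axis, so `vectors = None` no longer has to encode "nonperiodic". The `Box` class, its field names, and the rule that a periodic axis requires box vectors are assumptions made for illustration; none of this was decided in the meeting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical container pairing box vectors with per-axis periodicity."""
    vectors: Optional[Tuple[Vector3, Vector3, Vector3]] = None
    periodic: Tuple[bool, bool, bool] = (False, False, False)

    def __post_init__(self) -> None:
        # Assumed rule: a periodic axis is meaningless without box vectors.
        if any(self.periodic) and self.vectors is None:
            raise ValueError("periodic axes require box vectors")

    @property
    def periodic_dimensions(self) -> int:
        """Number of axes along which the box is periodic (0 to 3)."""
        return sum(self.periodic)


# A system sandwiched between two slabs: periodic in X and Y only.
slab = Box(
    vectors=((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 10.0)),
    periodic=(True, True, False),
)

# No vectors and no periodic axes: explicitly nonperiodic.
vacuum = Box()
```

This keeps box vectors separate from coordinates, in line with the note above that merging them does not obviously solve a problem. It does not by itself distinguish "nonperiodic" from "currently unset"; that would need an additional state.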

Action items

Decisions