Continue defining difference between pre- and post-parameterization topology
Positions? Box vectors?
MT – Want box vectors and periodicity boolean tuple
Virtual sites?
How do we let people slip in stuff like pre-defined partial charges?
Biopolymer tie-ins?
3D mols vs. graph mols? Both? Just one?
Isotopes?
What are other hard questions about tracking state? What happens when we find another one in several years? Will the proposed object model make it possible to wrangle this complexity?
What from the existing API can stay? What behavior changes? (eg. offmol.generate_conformers returns one conf? multiple molecules?)
Biopolymer tie-ins
SB – Presents
JW – I like the behavior that this exposes, but I’m still kinda in favor of storing residue/chain/resid info on the atoms, and then just providing these API points as a view of the fully-detailed topology.
(General) – How can we decide which use cases we’ll need to support/will be common?
Could start coming up with concrete use cases and seeing which design(s) support their functionality.
Use case brainstorming:
Load from SDF, perceive residues
Load from PDB, fill in details according to known residues?
Load a structure with missing atoms
Detect unnatural AA and assign parameters, gracefully handle backbone interface w/ natural AAs
Take a canonical protein and covalently modify with sugar/PTM
Modified AA becomes a single residue
Modified AA becomes several residues (maybe one for each monosaccharide)
Create/break disulfide/other covalent bond between distant parts of chain
Support manually filling in missing atoms/loops
Change protonation state of a residue
Rename a residue
Support iteration over Biopolymer.residues when residues are separated by resname/residue + resnum/resid
Attach a cofactor like heme, which connects to 4 other residues
Get the graphs of unique monomers in this biopolymer/topology.
Semi-persistent biopolymer-level info (eg. secondary structure prediction) → Could store on all-atom, Biomonomer-instance, or whole-biopolymer level.
Arbitrary metadata attached to biopolymer (eg. “Ala131 has metadata indicating it’s my favorite, then I chemically modify Ala131. Is there still a way to see it’s my favorite?”) → Don’t store arbitrary metadata. Arbitrary metadata will clash with cached data (under what conditions is it persistent?) and pydantic data models (data formats must be well-defined)
Add a PTM to this one alanine, and then find the new graph in the biopolymer/topology. → Biomonomer instance vs Biomonomer type. (SB + JW – Don’t persistently store Biomonomer type, though we could offer on-the-fly grouping of identical biomonomer graphs through an API point.)
All of these residues are HIS, but they have different protonation → flexible atom/residue/molecule bookkeeping
Prepare a system for export to X format, and comply with that format’s residue names where possible → Plugins. Atom group perception / flexible atom/molecule bookkeeping and grouping
I want to build a biopolymer from “AlaValGly” string. → PolymerConstructor plugins/classmethods for Biopolymers
I want to build a custom biopolymer with non-standard residues. → PTM. PolymerConstructor plugins/classmethods for Biopolymers. Possible reaction SMARTS support? Possible support for other manual API points for modifying chemistry.
I pulled this janky PDB from the internet which may or may not be complete, I want to solvate it, parameterize it, add a ligand, and simulate. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping. We will make no effort to fill in information if a molecule is incomplete.
I want to get the atom indices of residue X / I want to pull out all of the ‘backbone’ atom indices. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping
I want to know the secondary structure of my protein. → Can offer export to MDTraj. Could also have an API point to do the analysis automatically and label the OFFBiopolymer accordingly.
I want to know where all of the strong h-bonds / disulfide bonds are. → We don’t want to handle h-bonds. Can offer export to MDTraj. Could offer API points that do SMARTS matching to find disulfide bonds.
I want to cluster groups of chains together so I can identify my aggregate protein blob I pulled from the PDB. → PDB loading / atom group perception / flexible atom/molecule bookkeeping and grouping.
eg. labeling molecules by micelle in a multi-micelle simulation, other spatial/custom molecule groupings
Biopolymers in mixtures of non-aqueous solvents → Interaction with bulk materials
I stuck a fork in an electrical outlet and would like to covalently link a ligand to a biopolymer → PTM (medium)
I want to modify an arbitrary set of amino acids with this weird modification, like deprotonation, fluorination, or replacing a functional group → PTM (small)
Nothing would make me happier than slapping a protein on a metal surface and just seeing what happens (physisorption) → Interaction with bulk materials
That didn’t work, so I’m going to form some chemical bonds between the surface and the protein (chemisorption) → Covalent attachment to bulk materials
I half-assed a system preparation and I have coordinates for each atom by index and can probably match up the indices with a corresponding PDB file → Loading correctly-indexed coordinates from PDB
I have a bunch of micelles and I’d like to group up each of the micelles separately → position-based atom group perception / molecule vs. residue bookkeeping and grouping.
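Several of the use cases above resolve to "atom group perception / flexible bookkeeping and grouping", and JW's suggestion was to store residue/chain/resid info on the atoms and expose residues as a view. A minimal sketch of that idea follows. Everything in it is hypothetical: the `Atom` record, the `metadata` keys, and the helper names are illustrations only, not the OpenFF toolkit's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Atom:
    """Hypothetical atom record; NOT the OpenFF toolkit's Atom class."""
    element: str
    name: str
    # Residue/chain bookkeeping lives on the atom itself (assumed key names).
    metadata: Dict[str, object] = field(default_factory=dict)


# (chain id, residue name, residue number)
ResidueKey = Tuple[str, str, int]


def group_by_residue(atoms: List[Atom]) -> Dict[ResidueKey, List[int]]:
    """Build a residue view on the fly: residue key -> atom indices."""
    groups: Dict[ResidueKey, List[int]] = defaultdict(list)
    for index, atom in enumerate(atoms):
        key = (
            atom.metadata["chain"],
            atom.metadata["resname"],
            atom.metadata["resnum"],
        )
        groups[key].append(index)
    return dict(groups)


def backbone_indices(atoms: List[Atom],
                     backbone_names=("N", "CA", "C", "O")) -> List[int]:
    """Return indices of atoms whose name marks them as protein backbone."""
    return [i for i, atom in enumerate(atoms) if atom.name in backbone_names]


ala = {"chain": "A", "resname": "ALA", "resnum": 1}
his = {"chain": "A", "resname": "HIS", "resnum": 2}
atoms = [
    Atom("N", "N", dict(ala)),
    Atom("C", "CA", dict(ala)),
    Atom("C", "CB", dict(ala)),
    Atom("N", "N", dict(his)),
    Atom("C", "CA", dict(his)),
]

residues = group_by_residue(atoms)
```

Because the grouping is computed from per-atom metadata rather than stored, renaming a residue or changing its protonation label only means editing the atoms' metadata; the next call to `group_by_residue` reflects the change. Grouping by a different key (for example a format-specific residue name) would be another function over the same atoms.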
Box vectors/periodicity in System?
MT – Are octahedral box vectors technically “triclinic”? LAMMPS and HOOMD have a different representation than AMBER and GROMACS. My understanding is that LAMMPS/HOOMD octahedral boxes have 6 angle parameters rather than 3. Do we care about supporting these?
SB – I’m inclined to say “no”
MT – Agree, this is an uncommon use case and we shouldn’t support it initially.
SB – could put a poll in #general to gather use cases
JW + SB – Mostly interested to ensure that folks can subclass/use the plugin interface to make octahedral/other weird boxes
MT – What should “own” box vectors? Currently I see them as “MUST/MAY” be on Topology. In the future I’d like the box vectors to NOT be on topology, and instead be somewhere else in System.
SB – Basically, I think coordinates and box vectors (if any) have to go together. So one idea would be to have something similar to the graph vs. 3d molecule, where one type of Topology doesn’t have coords/box vectors, and the other must have coords and may have box vectors. But that would introduce some ambiguity into what a System could expect to find in its data fields. Another idea is that a System could have fully separate Topology and Coordinates members
MT – Currently System has 4 major data members – Topology, Coordinates, Box Vectors, and Handlers.
Possible mergings?
Coordinates+box vectors?
There doesn’t seem to be a problem that this solves.
MT – Does box_vectors = None necessarily mean “nonperiodic”? Or could it mean “currently unset”?
MT – There are different ways that a box can be “partially periodic”, like being sandwiched between two slabs of metal. So we do want to be able to handle 2D periodicity.
SB – Maybe, instead of box vectors specifying periodicity, have booleans for X/Y/Z_is_periodic
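One way to picture the "box vectors plus periodicity boolean tuple" idea is sketched below. It is an illustration under assumptions, not a decided design: the `Box` class, its field names, and the validation rule are all hypothetical, and units are deliberately left out. Per-axis flags, rather than the presence of vectors, say whether the system is periodic, so `vectors=None` can mean "currently unset" instead of doubling as "nonperiodic".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical container; NOT an existing OpenFF class."""
    # None means "currently unset"; periodicity is carried by the flags below.
    vectors: Optional[Tuple[Vector, Vector, Vector]] = None
    # One flag per box vector (X, Y, Z in SB's suggestion).
    periodic: Tuple[bool, bool, bool] = (False, False, False)

    def __post_init__(self):
        # Assumed rule: a periodic dimension is meaningless without vectors.
        if any(self.periodic) and self.vectors is None:
            raise ValueError("periodic dimensions require box vectors")

    @property
    def is_periodic(self) -> bool:
        """True if the box is periodic along at least one dimension."""
        return any(self.periodic)


# Slab geometry (2D periodicity): periodic in x and y, open along z.
slab = Box(
    vectors=((3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 10.0)),
    periodic=(True, True, False),
)

# No box set and no periodic dimension.
vacuum = Box()
```

Whether such an object would live on Topology, next to Coordinates, or as its own System member is the open ownership question from the discussion above; the sketch takes no position on it.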