Working session on API examples | “User has their final protein structure, and wants to start a simulation”: 75% of potential users will start with a “dirty” PDB structure, with missing protons and other problems. This will be very hard to accommodate. 20% of potential users will start with a PDB with protons assigned, such that substructure matching can help us fill in bond orders. This will be “medium difficulty” to accommodate (add bond orders/formal charges). 5% of potential users will somehow start with a protein as an SDF, and this requires little extra work to accommodate (just add residue names/numbers). JW – We want to support the 20% and guide the 75% to another tool that will clean up their PDB structures.
“User wants to modify an existing protein structure by covalently modifying a residue”: modifying one canonical amino acid to another canonical amino acid (including protonation state changes); modifying a canonical amino acid to an unnatural amino acid.
“User wants to stick a protein on a surface”
In the above cases, what are the user’s expectations for perceiving, iterating over, and modifying residues or other groups? Can these be handled by MDTraj/MDA? How can we ensure that the residue info that we assign/load will be compatible with MDTraj’s iterators? We can make the default behavior of protein.perceive_residues MDTraj-compatible. In file loaders that we implement, we can ensure that the data fields are loaded in an MDTraj-compatible way. In the developer docs, we can state which metadata fields we expect to be populated in order to successfully run to_mdtraj (and list expected behavior in edge cases, like converting a metal surface without residue names to MDTraj). Can MDTraj see a bunch of atoms with residue information and generate the residue iterators over them? Or would WE need to explicitly provide both the atom info and the hierarchy info before its residue iterators would work?
Matt Thompson can help us figure out what the “minimal hierarchy info” looks like, since the first major place where residue info can have an impact is in system export to other formats.
|
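To make the “atom info vs. hierarchy info” question concrete, here is a minimal sketch of deriving residue groupings from flat per-atom metadata. It is plain Python and does not call MDTraj. The metadata keys (`residue_name`, `residue_number`, `chain_id`), the function name, and the `UNK` placeholder for atoms without residue info (e.g. a metal surface) are all assumptions for illustration, not decided behavior. As far as we know, MDTraj topologies are built by explicitly adding chains, residues, and atoms, so a grouping step like this would likely be needed before its residue iterators work.

```python
from itertools import groupby


def group_atoms_into_residues(atoms, default_resname="UNK"):
    """Group a flat, ordered atom list into residue records.

    Each atom is a dict with a "name" key and optional metadata keys
    (hypothetical names): "chain_id", "residue_number", "residue_name".
    Returns a list of (chain_id, residue_number, residue_name, atom_names).
    """

    def residue_key(atom):
        # Atoms lacking metadata fall back to placeholder values (an assumption).
        return (
            atom.get("chain_id", ""),
            atom.get("residue_number", 0),
            atom.get("residue_name", default_resname),
        )

    # groupby merges only consecutive atoms, so file order is preserved.
    return [
        (chain, number, name, [a["name"] for a in members])
        for (chain, number, name), members in groupby(atoms, key=residue_key)
    ]


atoms = [
    {"name": "N", "residue_name": "ALA", "residue_number": 1, "chain_id": "A"},
    {"name": "CA", "residue_name": "ALA", "residue_number": 1, "chain_id": "A"},
    {"name": "N", "residue_name": "GLY", "residue_number": 2, "chain_id": "A"},
    {"name": "Au"},  # surface atom with no residue metadata
]
residues = group_atoms_into_residues(atoms)
```

Each resulting record could then be handed to whatever exporter builds the target topology; what that exporter does with placeholder residues is exactly the edge case the developer docs would need to spell out.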