General updates | JW Not much generally to report. I’m working on getting evaluator jobs to run on NRP with LW, fixing forcebalance CI, and some general maintenance stuff. I haven’t made progress on the substructure processing/PDB loading stuff in the last week. Try out some Interchange interop stuff and see if there’s a PR that’s natural to open.
JM – Sounds good, I’ve already got ideas. It’d be cool to write MDP files.
JW – Sounds good, also it’d be good to look at user experience polishing and/or parameter import/export validation.
JM Update on PDB census - Working on the flowchart for this. Kinda overwhelming.
- Working here
- PDBFixer PR opened
- Made a PDB reader of some sort
- Trying to figure out what the “ground truth” is for PDB files in the Worldwide PDB… One idea is to implement a reader that tries to match each residue to its CCD reference and records errors informatively, so that we can group the errors and build functionality to handle them.
JW –
JM – My current implemented reader loads a file from the wwPDB correctly:
- Missing atoms, including hydrogens, are filled in (without coordinates) from the CCD
- Atoms missing from the CCD residue definition raise an error
- Custom residues or residues that do not conform to the CCD are not supported, but a residue definition could easily be provided by the user or software to fill this gap
- Bonds between residues are defined from the CCD’s “linking type”
- Performance is very slow: Molecule objects are used to build up the chemical graph, and this could be easily improved
- The only chemical databases needed are the CCD (which is downloaded and cached as needed) and a dictionary from linking types to the bond between residues
JW – This is a bit broader than just the PDB loading census context I was thinking of. We do have improved PDB/polymer loading as one of our roadmap goals, but its full list of requirements is a bit longer:
- (optionally) support fast lookup by name
- Support lookup with “incorrect” atom/residue names (match by CONECT+elements)
- Support user-specified residues
- Be performant for stuff from materials-science-land (200+ atom residues)
- Be able to read from current substructure format
- If there’s a symmetric way to assign chemistry, it’s OK as long as both matchings assign identical bond order+formal charge (modulo trivial stuff like carboxylates)
- Inter-residue bonds are checked to ensure that both “neighbors” agree on the order
- Stereochemistry and aromaticity from templates should be ignored. Instead they get re-perceived from 3D (stereo) and using the MDL aromaticity model.
JW – Do we need to do missing atom replacement at all? It seems like we’d be duplicating PDBFixer’s functionality there.
JW – Generally, the goal of “load the PDB” is a bit of a misnomer. We want to develop a broad understanding of what types of chemistries we can and can’t load+model, and get a strategic picture of how we can invest effort to get the most benefit. So expecting only, e.g., the protonated forms of things might be “CCD compliant”, but it’s not functionally useful to our users.
JW – I’m thinking about how this fits into our roadmap - Probably the best way to go about this would be to wrap up the PDB loading study/census, make some slides/a recording I can show to the ad board, and say “I recommend we let Josh build this loader that he has in mind”
|
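
Illustrative sketch of the census idea discussed above (matching each residue against a reference template, filling in missing atoms without coordinates, raising on atoms the template does not define, and grouping the failures). This is not JM's reader: the template table is toy data rather than anything read from the CCD, and every name here (`TOY_TEMPLATES`, `match_residue`, `census`, the two error classes) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


class UnknownResidueError(KeyError):
    """No template exists for this residue name."""


class UnknownAtomError(ValueError):
    """The file contains an atom that the residue template does not define."""


@dataclass
class MatchedResidue:
    name: str
    present: list  # template atoms found in the file, in template order
    missing: list  # template atoms absent from the file (no coordinates)


# Toy stand-in for a reference dictionary (hypothetical data, not read from the CCD).
TOY_TEMPLATES = {
    "GLY": ["N", "CA", "C", "O", "OXT", "H", "H2", "HA2", "HA3", "HXT"],
}


def match_residue(name, atom_names, templates=TOY_TEMPLATES):
    """Match one residue's atom names against its template."""
    if name not in templates:
        raise UnknownResidueError(name)
    template = templates[name]
    observed = set(atom_names)
    # Atoms in the file that the template does not define are an error.
    unknown = sorted(observed - set(template))
    if unknown:
        raise UnknownAtomError(f"{name}: atoms not in template: {unknown}")
    present = [atom for atom in template if atom in observed]
    missing = [atom for atom in template if atom not in observed]
    return MatchedResidue(name, present, missing)


def census(residues, templates=TOY_TEMPLATES):
    """Tally outcomes over (residue_name, atom_names) pairs, grouped by failure type."""
    tally = Counter()
    for name, atom_names in residues:
        try:
            match_residue(name, atom_names, templates)
            tally["ok"] += 1
        except UnknownResidueError:
            tally["unknown_residue"] += 1
        except UnknownAtomError:
            tally["unknown_atom"] += 1
    return tally
```

For example, `census([("GLY", ["N", "CA", "C", "O"]), ("XYZ", ["C1"])])` counts one matched residue and one unknown residue. Grouping by exception type is only one possible bucketing; a real census would likely want finer categories.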