2022-07-06 Mitchell/Wagner Check-in meeting notes

Participants

  • @Josh Mitchell

  • @Jeffrey Wagner

Discussion topics

Item

Notes

General updates

  • JW – We’ve made release candidate packages for the biopolymer refactor toolkit. I’m drafting the announcement now.

  • JM – Solvation methodologies? Trying to figure out how to solvate a box containing a protein+ligand for the new toolkit showcase.

    • JW – I think I used pdbfixer’s solvate functionality for this before (not Modeller). It copes OK with all sorts of junk. If it can handle the protein but not the ligand, it may be worth solvating just the protein, then putting in the ligand and deleting any clashing waters.

  • JM – from_polymer_pdb refactor?

    • JW – Some reasons I like the current functionality:

      • OpenMM has a great PDB reader, but it can only provide connectivity for things it knows about (namely protein residues), so for anything else the user will need to provide CONECT records.

      • A lot of people have workflows that produce non-canonical PDBs (lots of tools write PDBs with “incorrect” atom names, but from which the element can be inferred).

      • I get the sense that perceiving inter-residue bonds is really hard, and I like that OpenMM handles them for us. If we rolled our own functionality, that would be one place where it depends on correct PDB hierarchy info. I’d very much prefer that our functionality never depend on PDB hierarchy info.

  • Priorities for our PDB reading functionality:

    • Load a spec-compliant protein PDB containing a HIS residue

    • Load a protein PDB containing an HID residue

    • Load a spec-compliant non-protein PDB containing a non-protein residue

    • Load a non-protein PDB containing a misnamed non-protein residue

    • Not require us to directly parse PDB

  • Scenarios:

    • Loading a nucleic acid

      • Element graph matching land: We just extend our substructure library, since OpenMM already recognizes these

      • Atom name matching land: We just extend our substructure library

    • Loading a protein with PTMs

      • Element graph matching land: User adds the PTM to our substructure library and has to provide CONECT records for OpenMM

      • Atom name matching land: User adds the PTM to our substructure library and the atom names have to match

    • Loading a non-protein polymer

      • Element graph matching land: User has to provide CONECT records and add “residues” to our substructure library

      • Atom name matching land: User has to add “residues” to our substructure library and the atom names have to match

  • (General) – Basically, the tradeoff is that we EITHER need atom names specified, OR CONECT records provided, in order to successfully assign the chemical graph of a polymer.

    • JW – I’ve found that a lot of the materials world doesn’t use residue names at all, and doesn’t use unique atom names either.

    • JM – Basically, if we’re going to accept HID as a histidine, then we’re already not spec-compliant. It’s also worth noting that OpenMM renames HID to HIS.

    • JW – Do we need to roll our own PDB parser if something doesn’t match OpenMM’s input expectations?

      • JM – No, OpenMM can load atoms from partially or fully unrecognized residues, it just won’t assign bonds to them.

    • JM – The UX will be a lot better if we use atom names. Right now we can’t tell users which residue template came closest to matching their unknown residue. Also, the assumptions behind atom name matching are much easier to explain.

    • JW – Element graph matching will let us load non-spec-compliant structures from the materials science world, and there are lots of biophysics programs that also give the wrong atom/residue names.

  • Summary: There are benefits to both. JW wants to get the release candidate out the door as soon as possible, so we’ll do the initial release with element graph matching. Since we’re only supporting “vanilla” proteins in this release, we can change the matching behavior/backend later once we’ve discussed it more.

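JW’s fallback for solvation (solvate only the protein, put the ligand in, then delete any clashing waters) comes down to a distance filter on the final step. The sketch below is a minimal stdlib illustration under stated assumptions: coordinates are in angstroms, each water is a list of atom positions, and the 2.5 Å clash cutoff is an illustrative choice, not a value from the discussion. `remove_clashing_waters` is a hypothetical helper, not a pdbfixer or OpenMM API.

```python
import math


def remove_clashing_waters(waters, ligand_coords, cutoff=2.5):
    """Drop every water that has any atom closer than `cutoff` (angstroms) to any ligand atom.

    waters: list of waters, each a list of (x, y, z) atom positions (e.g. O, H, H).
    ligand_coords: list of (x, y, z) positions of the inserted ligand's atoms.
    Returns the waters that survive, in their original order.
    """
    kept = []
    for water in waters:
        clashes = any(
            math.dist(water_atom, ligand_atom) < cutoff
            for water_atom in water
            for ligand_atom in ligand_coords
        )
        if not clashes:
            kept.append(water)
    return kept


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_water = [(1.0, 1.0, 0.0), (1.5, 1.5, 0.0), (0.5, 1.5, 0.0)]  # overlaps the ligand
far_water = [(8.0, 8.0, 8.0), (8.6, 8.0, 8.0), (7.4, 8.0, 8.0)]   # well clear of it
print(len(remove_clashing_waters([near_water, far_water], ligand)))  # 1
```

The all-pairs loop is quadratic, which is fine for a showcase-sized box; a spatial grid or KD-tree would be the usual choice for large systems.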
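The (General) tradeoff depends on what a CONECT record actually carries. Because the priority list says we would rather not parse PDB ourselves, this is only an illustration of the information OpenMM needs from the user for residues it doesn’t recognize, not a proposal to write our own reader. It assumes the standard fixed-width layout (columns 7-11 hold the atom serial number, columns 12-31 hold up to four bonded serial numbers); `bonds_from_conect` is a hypothetical name.

```python
def bonds_from_conect(pdb_lines):
    """Collect bonds from fixed-width PDB CONECT records; other record types are skipped.

    Bonds are returned as sorted (serial, serial) pairs, so a bond that is listed
    from both of its atoms is only counted once.
    """
    bonds = set()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        origin = int(line[6:11])  # columns 7-11: the atom this record describes
        for start in (11, 16, 21, 26):  # columns 12-31: up to four bonded atoms
            field = line[start:start + 5].strip()
            if field:
                bonds.add(tuple(sorted((origin, int(field)))))
    return bonds


records = [
    "CONECT    1    2    3",
    "CONECT    2    1",
    "CONECT    3    1",
    "END",
]
print(sorted(bonds_from_conect(records)))  # [(1, 2), (1, 3)]
```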
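The scenarios above (we EITHER need atom names OR CONECT records) can be made concrete with a toy matcher. This is a hypothetical sketch, not the toolkit’s implementation: `BACKBONE_TEMPLATE` stands in for one entry of the substructure library (heavy atoms of a peptide backbone fragment only), and the element-graph matcher is a naive backtracking search rather than whatever graph-matching backend the toolkit actually uses.

```python
# Hypothetical substructure-library entry: atom name -> element, plus bonds by name.
BACKBONE_TEMPLATE = {
    "elements": {"N": "N", "CA": "C", "C": "C", "O": "O"},
    "bonds": {("N", "CA"), ("CA", "C"), ("C", "O")},
}


def match_by_atom_names(atom_names, template):
    """Atom-name matching: succeed only if the input names are exactly the template's names.

    Returns {input atom index: template atom name}, or None if the names don't match.
    Bonds then come from the template, so no CONECT records are needed.
    """
    if sorted(atom_names) != sorted(template["elements"]):
        return None
    return {i: name for i, name in enumerate(atom_names)}


def match_by_element_graph(elements, bonds, template):
    """Element-graph matching: ignore atom names, match elements plus connectivity.

    `bonds` are pairs of input atom indices (e.g. derived from CONECT records).
    Returns {input atom index: template atom name}, or None if no mapping exists.
    """
    t_elements = template["elements"]
    t_bonds = {frozenset(b) for b in template["bonds"]}
    in_bonds = {frozenset(b) for b in bonds}
    if len(elements) != len(t_elements) or len(in_bonds) != len(t_bonds):
        return None
    mapping = {}

    def extend(i):
        if i == len(elements):
            return True
        for name, element in t_elements.items():
            if element != elements[i] or name in mapping.values():
                continue
            # Atom i must be bonded to an already-mapped atom j exactly when
            # the corresponding template atoms are bonded.
            consistent = all(
                (frozenset((i, j)) in in_bonds) == (frozenset((name, mapping[j])) in t_bonds)
                for j in mapping
            )
            if consistent:
                mapping[i] = name
                if extend(i + 1):
                    return True
                del mapping[i]
        return False

    return dict(mapping) if extend(0) else None


# Generic atom names (as in many materials-science files): name matching fails...
print(match_by_atom_names(["X1", "X2", "X3", "X4"], BACKBONE_TEMPLATE))  # None
# ...but elements plus CONECT-derived bonds are enough for graph matching.
print(match_by_element_graph(["O", "C", "C", "N"], [(0, 1), (1, 2), (2, 3)], BACKBONE_TEMPLATE))
# {0: 'O', 1: 'C', 2: 'CA', 3: 'N'}
# Without bonds (no CONECT records), graph matching has nothing to go on.
print(match_by_element_graph(["O", "C", "C", "N"], [], BACKBONE_TEMPLATE))  # None
```

The two failure modes mirror the discussion: name matching breaks on “incorrect” atom names, while graph matching breaks when the connectivity is missing.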
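JM’s UX point is that atom names let us report which residue template came closest to an unknown residue. One possible way to do that is sketched below, assuming name sets are compared with a Jaccard index (an illustrative choice, not something decided in the meeting); `TEMPLATES` and `closest_template` are hypothetical toy stand-ins for the substructure library and an error-reporting helper.

```python
# Hypothetical toy templates: residue name -> heavy-atom names.
TEMPLATES = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "O", "CB"],
    "SER": ["N", "CA", "C", "O", "CB", "OG"],
}


def closest_template(atom_names, templates):
    """Return (template name, Jaccard score, missing names, unexpected names) for the
    template whose atom-name set overlaps most with `atom_names`, or None if there
    are no templates."""
    names = set(atom_names)
    best = None
    for template_name, template_atoms in templates.items():
        expected = set(template_atoms)
        union = names | expected
        score = len(names & expected) / len(union) if union else 0.0
        if best is None or score > best[1]:
            best = (template_name, score, sorted(expected - names), sorted(names - expected))
    return best


# An unknown residue whose atom names don't exactly match any template.
name, score, missing, unexpected = closest_template(["N", "CA", "C", "OXT", "CB", "OG"], TEMPLATES)
print(f"Closest template: {name} (score {score:.2f}); missing {missing}, unexpected {unexpected}")
# Closest template: SER (score 0.71); missing ['O'], unexpected ['OXT']
```

Element graph matching could in principle produce a similar report (e.g. via maximum common subgraph), but that is far more expensive and harder to explain, which is part of JM’s argument.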

PR stuff

JW gave early feedback on

JM will ping JW on slack when the above PR is ready for review.

Todos

  1. (high) Topology refactor - better errors from PDB loading

  2. (high) refactor toolkit showcase to not use ParmEd/OMMFFs (from 2022-06-22 Mitchell/Wagner Check-in meeting notes)

  3. (high) Make bespokefit FAQ page and add answer from https://openforcefieldgroup.slack.com/archives/C01G0J25S1Z/p1656442911965619?thread_ts=1656432717.350439&cid=C01G0J25S1Z

  4. (high) Resolve

  5. (medium) Skunkworks notebook bespokefit integration

  6. (medium) Centralized examples (from 2022-04-27 Mitchell/Wagner Check-in meeting notes)

    1. (high) bespokefit protein-ligand or ligand-in-water example

  7. (medium) Toolkit docs cleanup

  8. (medium) Toolkit revised user guide (+-unifying/centralizing package user guides)

  9. (medium) Come up with “milestones” for making the first of three videos this year (like, “first video on covid spike protein and small molecule, filmed this day, edited in that range, etc…”)

  10. (medium) BibTeX blocks for website “how to cite” page on http://openforcefield.org

  11. (low) Propose policy for using GH citation machinery

  12. (low) Check main website for broken links

  13. (low) Conda env yamls for each release (automated inside of Toolkit’s single-file-installer action)

Action items

Decisions