2021-09-28 Pulido/Wagner check-in

Date

Sep 28, 2021

Participants

  • @Jeffrey Wagner

  • @Iván Pulido

Discussion topics

Item

Notes

Updates

  • JW

    • HierarchySchemes and Elements prototyped

    • Working on TopologyMolecule deprecation

  • IP

    • Checked out CCTBX. It DOES apply residue names and similar annotations, but it DOESN’T assign bond orders or formal charges. It's largely aimed at crystallographers with messy data. One module handles PDB, another handles mmCIF.

      • IP – Kind of tricky, because it pulls in a lot of dependencies

      • JW – Licensing issues?

        • (General) – It’s MIT or BSD3 licensed

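
The grouping idea behind HierarchySchemes can be sketched in plain Python, assuming each atom carries metadata such as chain ID, residue number, and residue name, and that a hierarchy element (a residue, a chain) is defined by a unique tuple of those values. `Atom`, `iter_hierarchy_elements`, and the metadata key names are hypothetical illustrations, not the OpenFF Toolkit's actual classes or methods.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    # Hypothetical minimal atom: an element symbol plus free-form metadata
    element: str
    metadata: dict = field(default_factory=dict)


def iter_hierarchy_elements(atoms, uniqueness_criteria):
    """Group atom indices by the tuple of metadata values named in
    uniqueness_criteria, keeping groups in first-seen order."""
    groups = {}
    for index, atom in enumerate(atoms):
        key = tuple(atom.metadata[name] for name in uniqueness_criteria)
        groups.setdefault(key, []).append(index)
    return list(groups.items())


# Usage: two residues in one chain (hypothetical data)
atoms = [
    Atom("N", {"chain_id": "A", "residue_number": 1, "residue_name": "ALA"}),
    Atom("C", {"chain_id": "A", "residue_number": 1, "residue_name": "ALA"}),
    Atom("N", {"chain_id": "A", "residue_number": 2, "residue_name": "GLY"}),
]
residues = iter_hierarchy_elements(atoms, ("chain_id", "residue_number", "residue_name"))
# → [(("A", 1, "ALA"), [0, 1]), (("A", 2, "GLY"), [2])]
chains = iter_hierarchy_elements(atoms, ("chain_id",))
# → [(("A",), [0, 1, 2])]
```

Keying on a tuple of metadata means the same atoms can be viewed as residues or as chains just by changing the criteria, without storing separate residue or chain objects.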

Next steps

  • What loading pathways do we WANT to offer?

    • Load from Element + bond existence

      • Element PDB w/ CONECT

      • mmCIF w/ bonds

    • Load from an atomtyped representation with atom names matching a known typing scheme

      • Atomtyped PDB w/o CONECT (Perses entry point)

      • mmCIF w/o bonds

  • What loading pathways CAN we offer?

    • (slow) Molecule.from_pdb (matching to residue templates)

      • Element PDB w/ CONECT → OFFMol

      • (would require more work) mmCIF with bonds → OFFMol

    • CCTBX

      • Element PDB → Atomtyped PDB w/ CONECT

  • What prep method will people have used beforehand?

    • AMBER tleap protein prep → SDF that can probably be fixed

    • Schrodinger protein prep → ?

    • Chimera protein prep → ?

    • CCTBX cleanup →

    • PDBFixer → Atomtyped PDB w/ CONECT

    • PyMOL mutagenesis wizard output → Atomtyped PDB w/ sometimes-messy CONECT

  • IP – We could speed up subgraph matching by splitting at peptide bonds.

    • JW – Agree. But how do we empower users to handle their own corner cases?

    • IP – Could let users add new residue SMILES

    • IP – Could let people match only a range of atoms for complex molecules

      • JW – This could work well, but then we may end up with partially annotated molecules. That could get really messy if people assign chemical information in different steps: when they try to convert to a full OFFMol, it will be hard to communicate which parts didn’t get bonds and formal charges.

  • JW plans to have TypedMolecules optionally hold element, formal charge, stereo, and bond info, and potentially let them be upgraded to OFFMols when all of that info is present.

  • IP will try to speed up subgraph matching by splitting at peptide bonds. This will give early users a prototype to try, so they can start providing feedback and finding corner cases.
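
IP's proposal to speed up subgraph matching by splitting at peptide bonds could look roughly like the sketch below. It assumes the "element + bond existence" input (element symbols and bonded index pairs, no bond orders), finds backbone amide C-N bonds with a simple connectivity heuristic, removes them, and returns the connected components as residue-sized fragments that could each be matched against a residue template. The heuristic and the names `find_peptide_bonds` and `split_at_peptide_bonds` are assumptions for illustration, not existing OpenFF or Perses code.

```python
from collections import defaultdict


def find_peptide_bonds(elements, bonds):
    """Return (carbonyl C, amide N) index pairs that look like peptide
    bonds, using only element symbols and bond existence."""
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    peptide_bonds = set()
    for c, element in enumerate(elements):
        # Heuristic: a carbonyl carbon has exactly three neighbors: C, N, O
        if element != "C" or len(neighbors[c]) != 3:
            continue
        if sorted(elements[k] for k in neighbors[c]) != ["C", "N", "O"]:
            continue
        oxygen = next(k for k in neighbors[c] if elements[k] == "O")
        nitrogen = next(k for k in neighbors[c] if elements[k] == "N")
        # The carbonyl oxygen must be terminal (bonded only to this carbon)
        if len(neighbors[oxygen]) != 1:
            continue
        # A backbone N also bonds to the next alpha carbon; side-chain
        # amides (Asn, Gln) have no other carbon on the nitrogen
        if not any(elements[k] == "C" for k in neighbors[nitrogen] - {c}):
            continue
        peptide_bonds.add((c, nitrogen))
    return peptide_bonds


def split_at_peptide_bonds(elements, bonds):
    """Drop peptide bonds and return connected components
    (residue-sized fragments) as sorted lists of atom indices."""
    cut = find_peptide_bonds(elements, bonds)
    neighbors = defaultdict(set)
    for i, j in bonds:
        if (i, j) in cut or (j, i) in cut:
            continue
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen, fragments = set(), []
    for start in range(len(elements)):
        if start in seen:
            continue
        # Depth-first traversal of one connected component
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nbr in neighbors[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        fragments.append(sorted(component))
    return fragments


# Usage: heavy atoms of Gly-Gly (N, CA, C, O | N, CA, C, O, OXT)
elements = ["N", "C", "C", "O", "N", "C", "C", "O", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (6, 7), (6, 8)]
fragments = split_at_peptide_bonds(elements, bonds)
# → [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
```

The check on the nitrogen matters because Asn and Gln side chains also look like C(=O)-N; requiring a carbon neighbor besides the carbonyl carbon keeps them intact. Matching each fragment against a template would still need to account for the dangling connection left by every cut bond.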

 

  • IP – I spoke with DHahn the other day. I’ll be contributing a bit to the PLBenchmarks repo, and will probably also be involved in the continuous benchmarking efforts. I’m thinking about making a PLBenchmarks conda package.

  • JW – PLBenchmarks has a bunch of protein structures prepared in Schrodinger, so that will be a great source of example input data.

    • IP – CCTBX refused to read these; they probably violate the PDB spec in some way.

  • JW – I don’t think the biopolymer stuff will be in a major OpenFF Toolkit release in 2021. Instead, we should either direct people to development builds from the topology-refactor branch, or I can make omnia conda packages from that branch.
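
Example inputs such as the PLBenchmarks structures could be triaged into the loading pathways listed under "What loading pathways do we WANT to offer?" with a cheap pre-check before any expensive matching. This is a minimal sketch, assuming plain PDB text and the PDB fixed-column layout (atom name in columns 13-16, residue name in columns 18-20). `KNOWN_ATOM_NAMES`, `choose_loading_pathway`, and the returned labels are hypothetical, and the name table is deliberately tiny.

```python
# Hypothetical, deliberately tiny table of residue -> expected PDB atom names.
# A real implementation would load a complete naming scheme instead.
KNOWN_ATOM_NAMES = {
    "GLY": {"N", "CA", "C", "O", "OXT", "H", "HA2", "HA3"},
    "ALA": {"N", "CA", "C", "O", "OXT", "CB", "H", "HA", "HB1", "HB2", "HB3"},
}


def choose_loading_pathway(pdb_text):
    """Return a hypothetical pathway label for a PDB file:
    "element+bonds" if it has CONECT records, "atomtyped" if every
    (residue name, atom name) pair is known, else "unsupported"."""
    n_atoms = 0
    has_conect = False
    all_names_known = True
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record == "CONECT":
            has_conect = True
        elif record in ("ATOM", "HETATM"):
            n_atoms += 1
            atom_name = line[12:16].strip()     # PDB columns 13-16
            residue_name = line[17:20].strip()  # PDB columns 18-20
            if atom_name not in KNOWN_ATOM_NAMES.get(residue_name, set()):
                all_names_known = False
    if n_atoms == 0:
        return "unsupported"
    if has_conect:
        return "element+bonds"
    if all_names_known:
        return "atomtyped"
    return "unsupported"
```

Checking for CONECT first is a design choice of this sketch: a file with explicit bonds doesn't need to rely on atom names. Many PDB files only carry CONECT records for HETATM entries, though, so a real check would also need to confirm that the bonds are complete.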

Action items

Decisions