2021-05-25 Topology Refactor Working Session Meeting notes

Date

May 25, 2021

Participants

  • @Jeffrey Wagner

  • @Iván Pulido

Discussion topics

Item

Notes

Processing the CCD

  • IP – Found aa-variants.cif from PDB – 218 entries with various protonation states. Contains main chain and terminal variants.

    • This file sometimes has long codes for variants, like “his_ll_dhd1”.

      • What does the suffix mean?

        • We don’t know, and it’s probably not necessary to know. The _chem_comp.name field has hints and a longer description.

        • Our default dictionary should just take the three_letter_code as the primary ID.

      • Should we always call histidine “HIS”, or should we separate it into “HIE/HID/HIP”?

        • This will affect whether we can export residue schemes that can be interpreted by existing programs

        • Same with Lysine, Asn, Gln, Asp, Glu

        • Produce two substructure dictionaries, and offer a kwarg to perceive_residues for which to use (AMBER vs. CCD)

          • Let’s just make the CCD dictionary for now. Can do AMBER later if we have time.

      • How do we handle the leaving atoms?

        • Make SMARTS both WITH all leaving atoms, and also WITHOUT any leaving atoms.

      • How do we filter out unphysical/unnecessary SMARTS?

        • Options for removing

          • Do they all have “fragment” in the name?

          • Filter out SMARTS that have N-, C-, or C+

          • Filter out CIF structures with FRAGMENT in the name

        • Not sure why these are in the database at all. If we need them later, we can re-add them.

  • How do we validate the substructure library?

    • Checking for equivalence with the AMBER-derived SMARTS

    • Checking that we can label the AMBER structures' residues

    • Checking against T4
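A minimal sketch of the two filtering options discussed above (dropping entries with FRAGMENT in the name, and dropping SMARTS that contain N-, C-, or C+). It assumes that "N-, C-, C+" refers to formal charges on nitrogen and carbon written as bracket atoms, and that entries are held as a plain dict of code to (name, SMARTS). The function names, the dict layout, and the example entries are all hypothetical and are not taken from the CCD or from the toolkit.

```python
import re

# Assumption: "N-", "C-" and "C+" appear as bracket atoms such as
# [N-], [NH-], [C-], [CH2-], [C+] or [CH+]. Positively charged nitrogen
# (e.g. [NH3+]) is deliberately NOT matched. Isotope-labelled or otherwise
# prefixed atoms (e.g. [13C-]) are not handled by this simple pattern.
_UNWANTED_CHARGE = re.compile(r"\[(?:N(?:H\d?)?-|C(?:H\d?)?[-+])[^\]]*\]")


def keep_entry(name, smarts):
    """Return True if an entry passes both filters."""
    if "fragment" in name.lower():
        return False
    return _UNWANTED_CHARGE.search(smarts) is None


def filter_entries(entries):
    """Filter a {code: (name, smarts)} dict, keeping only accepted entries."""
    return {
        code: (name, smarts)
        for code, (name, smarts) in entries.items()
        if keep_entry(name, smarts)
    }


# Illustrative entries only; these are not real CCD records.
EXAMPLE_ENTRIES = {
    "ALA": ("ALANINE", "[NH2][CH]([CH3])C(=O)[OH]"),
    "LYS": ("LYSINE", "[NH3+]CCCC[CH](N)C(=O)O"),
    "CL": ("CHLORIDE ION", "[Cl-]"),
    "XXA": ("ALANINE FRAGMENT", "[NH2][CH]([CH3])C=O"),
    "XXB": ("DEPROTONATED AMIDE", "[NH-]C(=O)C"),
    "XXC": ("CARBOCATION", "C[CH+]C"),
}

kept = filter_entries(EXAMPLE_ENTRIES)
print(sorted(kept))
```

Matching on the text of the SMARTS keeps the sketch dependency-free. A more robust version would probably parse each pattern with a cheminformatics toolkit and inspect formal charges per atom.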

Creating feature branch

Opened

Next steps:

  • Making substructure dictionary from CCD

    • Which method(s) to make public?

      • Make it all private (maybe a file _cif_to_substructure_dict.py with the classes+methods)

    • How to handle extra dependencies?

      • @requires_package decorator

    • How to package data for testing?

      • Could have a way to download release tarball from amber_ff_porting

      • Normal tests could be done on t4-protein.pdb and t4-protein.sdf (we can make the sdf)

        • Decorate these tests with pytest.mark.slow

    • IP will open a PR to bring in CIF-reading functionality, and include the current substructure dictionary.

    • JW made t4-protein.sdf using OpenEye

      • IP will review + merge

  • Loading from PDB

    • Where can we get a PDB with CONECT records?

      • IP – Maybe an output from GLYCAM/CHARMM-GUI?

      • JW – Otherwise we may be able to hack something using AmberTools?

      • (Tried CHARMM-GUI, Chimera, GLYCAM, other tools, no luck getting CONECT records)

      • We’ll keep looking into where to get PDB files with CONECT records

  • Adding tests

  • Adding atom metadata

  • HierarchyElements and HierarchySchemes implementation

  • TypedAtom+TypedMolecule

  • merge_molecules
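A rough illustration of the @requires_package idea from the "How to handle extra dependencies?" item above: the optional package is checked only when the decorated function is called, so the module itself can still be imported without it. This is a self-contained stand-in written with the standard library, not the toolkit's own decorator. MissingDependencyError, count_data_blocks, and the package name "not_a_real_cif_parser" are hypothetical names used only for this example.

```python
import functools
import importlib


class MissingDependencyError(ImportError):
    """Raised when an optional package is needed but not installed."""


def requires_package(package_name):
    """Decorator factory: verify at call time that package_name imports."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                importlib.import_module(package_name)
            except ImportError as error:
                raise MissingDependencyError(
                    f"{func.__name__} requires the optional package "
                    f"'{package_name}'"
                ) from error
            return func(*args, **kwargs)

        return wrapper

    return decorator


# "json" stands in for an optional dependency that happens to be installed.
@requires_package("json")
def count_data_blocks(cif_text):
    """Count CIF data blocks, which begin with a line starting 'data_'."""
    return sum(1 for line in cif_text.splitlines() if line.startswith("data_"))


# Hypothetical package name that is assumed not to be installed.
@requires_package("not_a_real_cif_parser")
def parse_with_missing_backend(cif_text):
    return cif_text


print(count_data_blocks("data_ALA\n_chem_comp.id ALA\ndata_HIS\n"))
```

Because the check runs at call time, private helpers in a file such as _cif_to_substructure_dict.py could be decorated individually, and the slow tests that exercise them could be skipped when the dependency is absent.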

Canonical atom ordering

 

Antechamber issues

 

Action items

Decisions