2021-05-18 Topology refactor Meeting notes

Date

May 18, 2021

Participants

  • @Jeffrey Wagner

  • @Iván Pulido

Goals

  •  

Discussion topics

Item

Notes

Residue substructure perception

  • Some difficulty with residue perception due to the different resonance forms of ARG/HIS

    • Tried the “bond order = 4” trick to give those resonance-prone substructures “any” bond order in the final SMARTS, but the original trick used OpenEye, whereas this one uses RDKit, and bond orders of 4 choke sanitization in offmol.to_rdkit

    • So, should we keep trying to give only the ARG and HIP substructures “any bond order” patterns? OR should we give ALL substructures “any bond order” patterns?

      • If we do only ARG and HIP, then we’re a bit safer

      • If we do all substructures, then it’s a little bit dangerous (since, without bond orders, GLU and GLH will both match the same atoms), but then we’ll only have one dict to handle reading both SDF and PDB.

      • So, let’s try to rewrite the bond-order-4 logic to use RDKit instead of OFFTK

    • (We rewrote it using the RDKit API)

    • IP – Running this takes a long time now (like, ~1 minute for T4 lysozyme)

      • JW – working on improving that. You may get better performance with



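The “bond order = 4” convention can be illustrated without OpenEye or RDKit: keep the sentinel order in a plain atom/bond description and translate it to the SMARTS any-bond primitive `~` only when writing the pattern, so the sentinel never has to survive sanitization. This is a minimal pure-Python sketch under stated assumptions, not the OFFTK or RDKit code: the input format, `ANY_BOND_ORDER` and `substructure_to_smarts` are hypothetical, and atoms are written by atomic number only (no charges or hydrogens).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sentinel: bond order 4 means "match any bond order".
ANY_BOND_ORDER = 4
BOND_SYMBOLS = {1: "-", 2: "=", 3: "#", ANY_BOND_ORDER: "~"}

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def substructure_to_smarts(atomic_numbers: Sequence[int], bonds: Sequence[Bond]) -> str:
    """Write a connected substructure as SMARTS, emitting '~' for sentinel bonds."""
    adjacency: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(atomic_numbers))}
    for a, b, order in bonds:
        adjacency[a].append((b, order))
        adjacency[b].append((a, order))

    # Pass 1: depth-first search from atom 0, splitting bonds into tree bonds
    # (written as chains/branches) and ring-closure bonds (written as digits).
    children: Dict[int, List[Tuple[int, int]]] = {i: [] for i in adjacency}
    ring_labels: Dict[int, List[str]] = {i: [] for i in adjacency}
    visited, used = set(), set()
    next_digit = 1

    def visit(atom: int) -> None:
        nonlocal next_digit
        visited.add(atom)
        for nbr, order in adjacency[atom]:
            edge = frozenset((atom, nbr))
            if edge in used:
                continue
            used.add(edge)
            if nbr in visited:
                # Back edge to an ancestor: open the ring at the ancestor, close it here.
                if next_digit > 9:
                    raise ValueError("sketch only supports ring-closure digits 1-9")
                ring_labels[nbr].append(BOND_SYMBOLS[order] + str(next_digit))
                ring_labels[atom].append(str(next_digit))
                next_digit += 1
            else:
                children[atom].append((nbr, order))
                visit(nbr)

    visit(0)
    if len(visited) != len(atomic_numbers):
        raise ValueError("substructure must be connected")

    # Pass 2: emit atoms in DFS order; all but the last child go in parentheses.
    def write(atom: int) -> str:
        text = f"[#{atomic_numbers[atom]}]" + "".join(ring_labels[atom])
        kids = children[atom]
        for i, (child, order) in enumerate(kids):
            branch = BOND_SYMBOLS[order] + write(child)
            text += branch if i == len(kids) - 1 else f"({branch})"
        return text

    return write(0)
```

For an ARG-like fragment (CD-NE single, the three CZ-N bonds marked 4), `substructure_to_smarts([6, 7, 6, 7, 7], [(0, 1, 1), (1, 2, 4), (2, 3, 4), (2, 4, 4)])` gives `[#6]-[#7]~[#6](~[#7])~[#7]`, which does not care which nitrogen carries the double bond; writing explicit orders instead pins the double bond to one resonance form.
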
On Fri, May 14, 2021, Jeffrey Wagner wrote:
> Sorry to run this off on a tangent, but OpenFF is also trying to
> incorporate the standard definitions from components.cif into some of our
> own work. It's turning out to be not-entirely-trivial -- basically, we're
> struggling to distinguish when entries in that file are describing residue
> _substructures_ (as they'd appear in the middle of a chain), versus just an
> uncapped instance of the residue as would be found floating around in
> solution.

Hi Jeff:

I'm guessing that just seeing if the chem_comp.type contains the word "LINKING" is not enough, then. Does it help to look at the chem_comp.pdbx_type field, searching for "ATOMP"?

Examples of failures would be helpful, as we (along with many others) are trying to automate MM setup procedures.

...thx...dac

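
The two signals dac mentions can be pulled out of components.cif for side-by-side comparison across entries. This is a minimal sketch under assumptions: `chem_comp_types` and `linking_flags` are hypothetical helpers, and the parser assumes each field is a single-line "key value" pair, which the mmCIF format does not guarantee in general. It only reports the flags; whether either flag cleanly separates chain substructures from free residues is exactly the open question in the thread.

```python
import shlex
from typing import Dict, Optional, Tuple


def chem_comp_types(cif_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each data block ID to its (_chem_comp.type, _chem_comp.pdbx_type) values.

    Sketch only: assumes each field is a single-line "key value" pair; values
    inside loops or semicolon-delimited text fields are not handled.
    """
    result: Dict[str, Tuple[str, str]] = {}
    block: Optional[str] = None
    fields: Dict[str, str] = {}

    def flush() -> None:
        # Record the fields collected for the current data block, if any.
        if block is not None:
            result[block] = (
                fields.get("_chem_comp.type", ""),
                fields.get("_chem_comp.pdbx_type", ""),
            )

    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("data_"):
            flush()
            block, fields = line[len("data_"):], {}
        elif line.startswith(("_chem_comp.type", "_chem_comp.pdbx_type")):
            tokens = shlex.split(line)  # strips '...' or "..." quoting
            if len(tokens) >= 2:
                fields[tokens[0]] = tokens[1]
    flush()
    return result


def linking_flags(comp_type: str, pdbx_type: str) -> Tuple[bool, bool]:
    """Return the two signals from the thread: 'LINKING' in type, pdbx_type == 'ATOMP'."""
    return "LINKING" in comp_type.upper(), pdbx_type.upper() == "ATOMP"
```

Entries where the two flags disagree would be natural candidates for the "examples of failures" dac asks for.
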
  • Deterministic conformer generation

    • IP will make a table of how many conformers are generated for different aliphatic carbon chain lengths.

  • Other implementation toward new topology



Improving caching in parametrization runtime

  • #881 is really complicated. IP shouldn’t try to finish it; JW needs to do that. If protein-searching tests end up taking a long time, IP can check it out, knowing that it’s unstable.
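
The notes don't say how #881 caches, so as a generic illustration only (an assumption, not #881's design): run the expensive parametrization step once per unique molecule key and reuse the result for every copy. `parametrize_with_cache`, `key_fn` and `parametrize_fn` are hypothetical names.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")  # molecule-like input
P = TypeVar("P")  # parameters produced for it


def parametrize_with_cache(
    molecules: Iterable[M],
    key_fn: Callable[[M], Hashable],
    parametrize_fn: Callable[[M], P],
) -> Tuple[List[P], int]:
    """Parametrize each molecule, reusing results for molecules with equal keys.

    Returns the per-molecule results and the number of parametrize_fn calls made.
    """
    cache: Dict[Hashable, P] = {}
    results: List[P] = []
    for mol in molecules:
        key = key_fn(mol)
        if key not in cache:
            cache[key] = parametrize_fn(mol)  # expensive step runs once per unique key
        results.append(cache[key])
    return results, len(cache)
```

The key has to capture everything the parameters depend on. If per-atom parameters are reused, the key should also encode atom order (for example a mapped SMILES), or the cached result must be remapped onto each copy's atom indices.
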

Canonical atom ordering before conformer generation

  • Canonical ordering is breaking the current toolkit test for generating multiple conformers.

  • IP will create a PR showing the changes and a table with the number of conformers being generated for a chain of carbons with increasing size.

    • Do this for old and new behavior.

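    • Comparison snippet (needs both the OpenEye and RDKit toolkits installed):

        from openff.toolkit.topology import Molecule
        from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper, RDKitToolkitWrapper

        # Count conformers generated for linear alkanes C5..C14 with each toolkit
        for i in range(5, 15):
            smiles = "C" * i
            mol_oe = Molecule.from_smiles(smiles)
            mol_rdk = Molecule.from_smiles(smiles)
            mol_oe.generate_conformers(n_conformers=100, toolkit_registry=OpenEyeToolkitWrapper())
            mol_rdk.generate_conformers(n_conformers=100, toolkit_registry=RDKitToolkitWrapper())
            print(i, mol_oe.n_conformers, mol_rdk.n_conformers)
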
  • JW – Is this a “good” change? Many people using the current generate_conformers may get unexpected behavior and complain.
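
To put old and new behavior in one table, the counts printed by the comparison loop could be collected into a dict on each branch (e.g. `counts[i] = (mol_oe.n_conformers, mol_rdk.n_conformers)`) and formatted together. A minimal sketch; `comparison_table` and its column layout are hypothetical.

```python
from typing import Dict, Tuple

# chain length -> (OpenEye conformer count, RDKit conformer count)
Counts = Dict[int, Tuple[int, int]]


def comparison_table(old: Counts, new: Counts) -> str:
    """Format old-vs-new conformer counts side by side as a plain-text table."""
    lines = [f"{'n_C':>4} {'OE old':>7} {'OE new':>7} {'RDK old':>8} {'RDK new':>8}"]
    for n in sorted(set(old) | set(new)):
        # A chain length missing from one run is shown as "-".
        oe_old, rdk_old = old.get(n, ("-", "-"))
        oe_new, rdk_new = new.get(n, ("-", "-"))
        lines.append(f"{n:>4} {oe_old:>7} {oe_new:>7} {rdk_old:>8} {rdk_new:>8}")
    return "\n".join(lines)
```

Putting each toolkit's old and new columns next to each other makes a change from canonical ordering visible at a glance.
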

Action items

Decisions