2021-05-20 Topology Working Session Meeting notes

Date

May 20, 2021

Participants

@Jeffrey Wagner
@Iván Pulido
@Lily Wang

Discussion topics

Notes

IP – I’ve made substructures that fully match T4 lysozyme, based on the AMBER substructures.
IP – There’s a problem with excessive runtime when doing all this substructure matching.
JW – This is hard to avoid because we can’t reduce the protein as we label parts of it, and we will likely have to search through the entire SMARTS list because a real protein will have at least one of each amino acid.
IP – It would be very helpful to have the caching implementation done.
JW – I could merge the incomplete implementation into the biopolymer topology feature branch and leave a failing test for it, so that we know to fix it before we merge.
Let’s do this – Could even try during today’s working session.
JW – Instead of using `find_smarts_matches`, which runs `to_rdkit` every time, we could also make `find_multi_smarts_matches`, which takes a LIST of SMARTS as input and runs `to_rdkit` only once for the whole thing.
IP – When matching, if multiple substructures match the same atoms, I take the largest one.
JW – This is a pretty good idea. But what if their residue database has one residue that looks like alanine, and another that looks like alanine plus a neighboring backbone? How would this know what to do with an ALA-ALA sequence?

Other sources of protein structures for testing:
PDBs: https://github.com/MCompChem/fep-benchmark/
SDFs:
To get SDF from PDB:
in tleap:
`source leaprc.protein.ff14SB`
`mol = loadPdb ALA.pdb`
`saveMol2 mol "ALA.mol2" 0`
in python:
`from openff.toolkit.topology import Molecule`
`# Will need to fix carboxylate bond orders here`
`mol.to_file('ALA.sdf', file_format='sdf')`
Where else could protein SDFs come from? IP will ask on developers channel, possibly also directly ask perses devs.
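
To make JW’s ALA-ALA question concrete, here is a minimal pure-Python sketch of one possible reading of the "take the largest match" rule. It is not IP’s actual implementation: the function name `keep_largest_matches`, the greedy strategy, the substructure labels, and the atom indices are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (substructure name, indices of the atoms it matched) -- hypothetical format
Match = Tuple[str, Tuple[int, ...]]


def keep_largest_matches(matches: List[Match]) -> Dict[int, str]:
    """Greedily label atoms, letting larger matches win over smaller ones.

    Assumption: a match is rejected entirely if any of its atoms was
    already claimed by a larger match that was accepted earlier.
    """
    assignment: Dict[int, str] = {}
    # Visit matches from largest to smallest.
    for name, atoms in sorted(matches, key=lambda m: len(m[1]), reverse=True):
        if any(atom in assignment for atom in atoms):
            continue  # overlaps a larger, already accepted match
        for atom in atoms:
            assignment[atom] = name
    return assignment


# Toy ALA-ALA "dipeptide": atoms 0-4 are residue 1, atoms 5-9 are residue 2.
# The made-up database also holds an entry covering alanine plus two
# backbone atoms of the next residue (atoms 0-6).
matches = [
    ("ALA", (0, 1, 2, 3, 4)),
    ("ALA", (5, 6, 7, 8, 9)),
    ("ALA+backbone", (0, 1, 2, 3, 4, 5, 6)),
]
assignment = keep_largest_matches(matches)
unassigned = [i for i in range(10) if i not in assignment]
print(assignment)
print("unassigned atoms:", unassigned)
```

In this toy case the larger entry claims atoms 0-6, both plain ALA matches are rejected for overlapping it, and atoms 7-9 end up with no label, which is the kind of ambiguity JW raised.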

IP – I’ve also tried out some mmcif parsers. None of them are particularly friendly, but I think biopython is the best.
(IP gave demo of using mmcif to read T4 lysozyme and show iterators)
JW – Can it read components.cif?
IP – It doesn’t do a very good job. It seems to overwrite a lot of what it reads because it doesn’t understand the multi-entry format of components.cif.
(General) – We could either chunk up this file using python `readlines` ourselves, or keep looking into the API docs for biopython to see if it gives a different kind of iterator.
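
The "chunk it up ourselves" option could look roughly like the sketch below. It relies on the fact that each entry in a multi-entry CIF file starts with a `data_<name>` line. The function name `iter_cif_blocks` is hypothetical, and the sketch assumes that no line inside a multi-line text field happens to start with `data_`.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_cif_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (block name, block lines) for each data block in a CIF file.

    Assumption: every line that starts with "data_" opens a new block.
    """
    name = None
    block: List[str] = []
    for line in lines:
        if line.startswith("data_"):
            if name is not None:
                yield name, block  # emit the block that just ended
            name = line[len("data_"):].strip()
            block = [line]
        elif name is not None:
            block.append(line)  # lines before the first block are skipped
    if name is not None:
        yield name, block  # emit the final block


# Tiny stand-in for a multi-entry file; real entries are much longer.
sample = [
    "data_ALA\n",
    "_chem_comp.id ALA\n",
    "data_GLY\n",
    "_chem_comp.id GLY\n",
]
blocks = dict(iter_cif_blocks(sample))
print(list(blocks))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so the whole file never has to be held in memory. Each yielded chunk could then be handed to a single-entry parser.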

RDKit deterministic confs PR
The reduction in the number of generated conformers is not due to the canonical ordering – It’s all due to changes in the 2021.03 RDKit release.
OpenEye tests started failing when we set `omega.SetCanonOrder(True)`. This is the OPPOSITE of what we’d expect.
IP will revert the OE change in the PR.
IP will contact support@eyesopen.com with a reproducing example of the behavior – We shouldn’t even need to run AM1 calculations, just the omega conformer generation (did more digging, more details in PR comments).
