IP – I’ve made substructures, based on the AMBER substructures, that fully match T4 lysozyme.
IP – The problem is the excessive runtime of all this substructure matching.
JW – This is hard to avoid: we can’t reduce the protein as we label parts of it, and we will likely have to search through the entire SMARTS list, because a real protein will have at least one of each amino acid.
IP – It would be very helpful to have the caching implementation done.
JW – I could merge the incomplete implementation into the biopolymer topology feature branch and leave a failing test for it, so that we know to fix it before we merge.
JW – Alternatively, instead of using find_smarts_matches, which runs to_rdkit every time, we could make find_multi_smarts_matches, which takes a list of SMARTS as input and runs to_rdkit only once for the whole list.
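A rough sketch of the find_multi_smarts_matches idea, for discussion only. The signature is an assumption, not the existing API: `convert` and `match_one` are hypothetical hooks standing in for to_rdkit and the per-SMARTS search, and the toy string-based stand-ins exist only so the sketch runs without a cheminformatics toolkit.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Mol = TypeVar("Mol")              # the toolkit's own molecule type
Converted = TypeVar("Converted")  # the converted form, e.g. an RDKit molecule
Match = Tuple[int, ...]           # atom indices of one match


def find_multi_smarts_matches(
    molecule: Mol,
    smarts_list: Sequence[str],
    convert: Callable[[Mol], Converted],
    match_one: Callable[[Converted, str], List[Match]],
) -> Dict[str, List[Match]]:
    """Match every SMARTS in smarts_list, converting the molecule only once."""
    converted = convert(molecule)  # the expensive step, paid a single time
    return {smarts: match_one(converted, smarts) for smarts in smarts_list}


# Toy stand-ins (assumptions for illustration): the "molecule" is a string
# and a "pattern" matches wherever it occurs as a substring.
conversions = 0


def toy_convert(molecule: str) -> str:
    global conversions
    conversions += 1  # count how often the expensive step runs
    return molecule.upper()


def toy_match(converted: str, pattern: str) -> List[Match]:
    hits: List[Match] = []
    start = converted.find(pattern)
    while start != -1:
        hits.append(tuple(range(start, start + len(pattern))))
        start = converted.find(pattern, start + 1)
    return hits


matches = find_multi_smarts_matches("gavga", ["GA", "V", "W"], toy_convert, toy_match)
```

In this shape the conversion cost no longer scales with the length of the SMARTS list; only the matching itself does. A pattern with no hits maps to an empty list, so callers can still tell which substructures were searched.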
IP – When matching, if multiple substructures match the same atoms, I take the largest one.
Other sources of protein structures for testing: