Discussion topics

Item

Notes

IP – I’ve made substructures that fully match T4 lysozyme based on AMBER substructures
- IP – There's a problem with excessive runtime to do all this substructure matching.
- JW – This is hard to avoid because we can’t reduce the protein as we label parts of it, and we will likely have to search through the entire SMARTS list because a real protein will have at least one of each amino acid.
- IP – It would be very helpful to have the caching implementation done
- JW – I could merge the incomplete implementation into the biopolymer topology feature branch and leave a failing test for it, so that we know to fix it before we merge.
  - Let’s do this – Could even try during today’s working session
- JW – Instead of using find_smarts_matches, which runs to_rdkit every time, we could add find_multi_smarts_matches, which takes a LIST of SMARTS as input and runs to_rdkit only once for the whole list.
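A minimal sketch of the "convert once, match many" idea behind the proposed find_multi_smarts_matches. The signature and the `convert` / `match_one` callables are hypothetical stand-ins, not toolkit API; in the toolkit the conversion would presumably be to_rdkit and the matcher an RDKit substructure search. The toy converter and matcher below exist only to show that the expensive step runs once.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Matches = List[Tuple[int, ...]]


def find_multi_smarts_matches(
    molecule: Any,
    smarts_list: Sequence[str],
    convert: Callable[[Any], Any],
    match_one: Callable[[Any, str], Matches],
) -> Dict[str, Matches]:
    """Run every pattern against a single converted copy of the molecule."""
    converted = convert(molecule)  # expensive step, done exactly once
    return {smarts: match_one(converted, smarts) for smarts in smarts_list}


# Toy stand-ins (assumptions for illustration only): a "molecule" is a string
# of element symbols and a "pattern" is a single symbol.
calls = {"convert": 0}


def toy_convert(mol: str) -> List[str]:
    calls["convert"] += 1
    return list(mol)


def toy_match(converted: List[str], pattern: str) -> Matches:
    return [(i,) for i, atom in enumerate(converted) if atom == pattern]


result = find_multi_smarts_matches("CCO", ["C", "O", "N"], toy_convert, toy_match)
```

Here `calls["convert"]` stays at 1 however long the pattern list is, which is the saving JW describes.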
IP – When matching, if multiple substructures match the same atoms, I take the largest one.
- JW – This is a pretty good idea. But what if their residue database has one residue that looks like alanine, and another that looks like alanine plus a neighboring backbone? How would this know what to do with an ALA-ALA sequence?
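A rough sketch of the "take the largest match" rule, and of the ALA-ALA case JW raises. This is not IP's implementation: the function name, the greedy largest-first strategy, and the atom indices are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (substructure name, matched atom indices)
Match = Tuple[str, Tuple[int, ...]]


def pick_largest_matches(matches: List[Match]) -> Dict[int, str]:
    """Greedily accept matches from largest to smallest, skipping overlaps."""
    assigned: Dict[int, str] = {}
    # sorted() is stable, so equally sized matches keep their input order
    for name, atoms in sorted(matches, key=lambda m: len(m[1]), reverse=True):
        if any(atom in assigned for atom in atoms):
            continue  # shares atoms with a larger match that was already accepted
        for atom in atoms:
            assigned[atom] = name
    return assigned


# Hypothetical ALA-ALA dipeptide with 10 atoms: residue 1 is atoms 0-4,
# residue 2 is atoms 5-9. "ALA_plus_backbone" also covers two backbone
# atoms (5 and 6) of the neighboring residue.
matches = [
    ("ALA", (0, 1, 2, 3, 4)),
    ("ALA", (5, 6, 7, 8, 9)),
    ("ALA_plus_backbone", (0, 1, 2, 3, 4, 5, 6)),
]
assigned = pick_largest_matches(matches)
unlabeled = [atom for atom in range(10) if atom not in assigned]
```

In this toy case the larger pattern claims atoms 0-6, both plain ALA matches are then rejected as overlapping, and atoms 7-9 end up unlabeled, which is one way the ambiguity could show up.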
Other sources of protein structures for testing:
- PDBs:
  - https://github.com/openforcefield/protein-ligand-benchmark/tree/master/data
  - https://github.com/MCompChem/fep-benchmark/
- SDFs:
  - To get SDF from PDB:
    - in tleap:
      Code Block
      source leaprc.protein.ff14SB
      mol = loadPdb ALA.pdb
      saveMol2 mol "ALA.mol2" 0
    - in python:
      Code Block
      from openff.toolkit.topology import Molecule
      # Assumed step (missing from these notes): load the mol2 written by tleap above
      mol = Molecule.from_file('ALA.mol2')
      # Will need to fix carboxylate bond orders here
      mol.to_file('ALA.sdf', file_format='sdf')
  - Where else could protein SDFs come from? IP will ask on developers channel, possibly also directly ask perses devs.

IP – I’ve also tried out some mmCIF parsers. None of them is particularly friendly, but I think Biopython is the best.
- (IP gave demo of using mmcif to read T4 lysozyme and show iterators)
- JW – Can it read components.cif?
- IP – It doesn’t do a very good job. It seems to overwrite a lot of what it reads because it doesn’t understand the multi-entry format of components.cif.
  - (General) – We could either chunk up this file ourselves using Python's readlines, or keep looking through the Biopython API docs to see if it offers a different kind of iterator.
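The "chunk it up ourselves" option could look roughly like the sketch below. It relies on the CIF convention that each entry starts with a `data_<name>` line; the function name is hypothetical, and the sample text is a tiny made-up stand-in for components.cif, not real file content.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_cif_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (block_name, block_lines) for each data_ block in a CIF stream."""
    name = None
    block: List[str] = []
    for line in lines:
        if line.startswith("data_"):
            if name is not None:
                yield name, block  # emit the block that just ended
            name = line[len("data_"):].strip()
            block = [line]
        elif name is not None:
            block.append(line)
    if name is not None:
        yield name, block  # emit the final block


# Made-up two-entry sample in the multi-entry layout
sample = """data_ALA
_chem_comp.id ALA
_chem_comp.name ALANINE
data_GLY
_chem_comp.id GLY
_chem_comp.name GLYCINE
""".splitlines(keepends=True)

blocks = dict(iter_cif_blocks(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so the whole file never has to be held in memory. Each chunk could then be handed to whichever parser is chosen. One caveat: a line inside a multi-line text field that happens to start with `data_` would be split incorrectly by this simple check.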

RDKit deterministic confs PR
- The reduction in the number of generated conformers is not due to the canonical ordering – It’s all due to changes in the 2021.03 RDKit release
- OpenEye tests started failing when we set omega.SetCanonOrder(True). This is the OPPOSITE of what we’d expect.
  - IP will revert the OE change in the PR
  - IP will contact support@eyesopen.com with a reproducing example of the behavior – We shouldn’t even need to run AM1 calculations, just the omega conformer generation
  - (Did more digging; more details in PR comments: https://github.com/openforcefield/openff-toolkit/pull/942)

Action items

Decisions