2020-09-03 Wagner Thompson Cheminformatics Meeting notes
Date
Sep 3, 2020
Participants
@Jeffrey Wagner
@Matt Thompson
Discussion topics
Item | Notes |
---|---|
Can we do bottom-to-top matching? |
|
Possible performance gains |
|
SMIRKS equivalence checking | Example of two equivalent SMIRKS:
<Bond smirks="[H][C@@]([C]=O)([C:1]([H:2])([H])[S])[N][H]" length="1.09 * angstrom" k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-MainChain_CYX-2C_H1"></Bond>
<Bond smirks="[H][C@@]([C]=O)([C:1]([H])([H:2])[S])[N][H]" length="1.09 * angstrom" k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-MainChain_CYX-2C_H1"></Bond>
Concerns about aromaticity in SMIRKS (more of a vague concern that an easy solution would overlook something important with respect to aromaticity).
Closest example of a concrete problem with aromaticity in protein SMARTS:
The problem above is that the parameter below didn’t match a structure of ARG in a different resonance form.
I’m not sure whether the problem above is directly related to one we’d encounter in SMARTS deduplication. Really it’s a question of what we expect from different representations at different steps in the protein FF porting/parameter application pipeline.
“Guanidinium”: https://en.wikipedia.org/wiki/Guanidine
[N+](=C(N([H])[H])N([H])[H])[H])[N][H]"
from openforcefield.topology import Molecule
Molecule.from_smiles("[N+](=C(N([H])[H])N([H])[H])[H]")
mol = Molecule.from_smiles("[N+]([H])([H])(=C(N([H])[H])N([H])[H])")
mol.to_smiles()
Difference between interpreting the above in SMILES vs. tagged SMARTS:
How does our current machinery interpret aromaticity in SMIRKS?
|
Initial implementation |
IMPORTANTLY: False NEGATIVES are OK. Saying that two SMIRKS aren’t equivalent when they really are will just be an inconvenience in our planned workflows. However, false POSITIVES are really bad, since they’ll have us deleting parameters that don’t actually have a replacement/aren’t really redundant with another one. |
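
A minimal sketch of the "false negatives are OK, false positives are not" policy, using only the Python standard library. Every name here (`strip_tags`, `same_skeleton`, `exact_match`, `deduplicate`) is hypothetical and not part of the toolkit. Two assumptions are ours rather than something decided in the meeting: a parameter is only dropped when its `length` and `k` values also match, and the default equivalence predicate is plain string equality, which can miss real duplicates but cannot delete a non-redundant parameter. The text-level screen only flags pairs for follow-up; it does not understand SMIRKS chemistry, aromaticity, or symmetry.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches an atom-map tag such as ":1" sitting right before a closing bracket.
_TAG = re.compile(r":(\d+)(?=\])")

Param = Dict[str, str]


def strip_tags(smirks: str) -> str:
    """Return the SMIRKS text with atom-map tags removed."""
    return _TAG.sub("", smirks)


def same_skeleton(a: str, b: str) -> bool:
    """Text-level screen: same untagged pattern text and same tag indices.

    True only means "worth checking with a real toolkit", NOT "equivalent".
    """
    return (strip_tags(a) == strip_tags(b)
            and sorted(_TAG.findall(a)) == sorted(_TAG.findall(b)))


def exact_match(a: str, b: str) -> bool:
    """Deliberately strict predicate: false negatives possible, no false positives."""
    return a.strip() == b.strip()


def deduplicate(
    params: List[Param],
    is_equivalent: Callable[[str, str], bool] = exact_match,
) -> Tuple[List[Param], List[Tuple[str, str]]]:
    """Drop a parameter only if an already-kept one is equivalent AND has the
    same values (assumption). Also return pairs flagged for manual review."""
    kept: List[Param] = []
    candidates: List[Tuple[str, str]] = []
    for p in params:
        is_dup = any(
            p["length"] == k["length"] and p["k"] == k["k"]
            and is_equivalent(p["smirks"], k["smirks"])
            for k in kept
        )
        if is_dup:
            continue  # safe to drop: a kept parameter fully replaces it
        for k in kept:
            if same_skeleton(p["smirks"], k["smirks"]):
                candidates.append((k["smirks"], p["smirks"]))
        kept.append(p)
    return kept, candidates


# The two bond parameters from the notes, plus a verbatim copy of the first.
SMIRKS_A = "[H][C@@]([C]=O)([C:1]([H:2])([H])[S])[N][H]"
SMIRKS_B = "[H][C@@]([C]=O)([C:1]([H])([H:2])[S])[N][H]"
VALUES = {"length": "1.09 * angstrom",
          "k": "680.0 * angstrom**-2 * mole**-1 * kilocalorie"}
bond_a = {"smirks": SMIRKS_A, **VALUES}
bond_b = {"smirks": SMIRKS_B, **VALUES}

kept, candidates = deduplicate([bond_a, bond_b, dict(bond_a)])
print(len(kept), "kept;", len(candidates), "pair(s) flagged for review")
```

With the strict default, the verbatim copy is dropped, while the two example SMIRKS are both kept and reported as a candidate pair. Passing a toolkit-backed graph comparison as `is_equivalent` would be the natural next step once it is trusted not to produce false positives.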