Project updates | OpenEye (OE) vs AmberTools (AT), AM1-Mulliken, never using optimization, with symmetry on and off. Only taking BCC contributions from each toolkit.
CD: It looks like the BCCs are the problem. They vary a lot between OE and AT, depending on the molecule.
JW: It’s weird that this looks like a continuum along the y=x line. I’d expect a really clumpy distribution that looks like either a grid or a set of vertical lines.
LW: There are 141 unique BCC values (charge values) for SMIRKS involving N in some way (either including or excluding), so it may look smoother than JW expects. There are 251 unique BCC values in total.
We get perfect agreement between OE and AT with no symmetry, no optimization, and no BCCs (i.e. plain AM1). A small difference appears when we tell OE to perform symmetrization. A large difference appears when we introduce BCCs.
JW hypothesis: AMBER is not loading the molecules properly, i.e. it is not reading the symmetry properly and not applying BCCs correctly. Alternatively, OE is changing the molecular graph after we give it to QUACPAC.
LW: I wonder whether RDKit is assigning bond orders correctly. “While it is possible for RDKit to erroneously label these properties for some molecules (particularly for nuanced concepts such as aromaticity, as shown by the rightmost “aromatic” molecule as classified by RDKit, which is in fact not aromatic) […] As an example, aromaticity is a concept that RDKit acknowledges is difficult to capture algorithmically, and thus may be misclassified for some molecules (molecules which have their aromaticity broken by steric strain is an intuitive example).” (doi.org/10.1021/acs.jcim.1c00519)
JW: How does this tie into the overall goals of this project?
Give Connor research experience. If this is a technical bug, then it’s not a great research project. JW and/or LW should probably write the bug down somewhere searchable.
Get the toolkit to produce more consistent charges between backends.
Determine whether ELF1 is better than random conformer selection.
Using no optimization, no symmetrization, and random conformers:
JW: This looks great. The RMSE has dropped a ton.
CD: The second plot only covers part of the dataset so far. I’ll ping you when it’s complete. |
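The comparison discussed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the script used to make the plots. It assumes that the BCC contribution of an atom is its AM1-BCC charge minus its AM1-Mulliken charge, and that symmetrization amounts to averaging charges over atoms sharing a symmetry class. The function names, the symmetry labels, and all charge values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from math import sqrt


def bcc_contribution(am1bcc, am1):
    """Per-atom BCC contribution: AM1-BCC charge minus AM1-Mulliken charge."""
    if len(am1bcc) != len(am1):
        raise ValueError("charge lists must have the same length")
    return [full - base for full, base in zip(am1bcc, am1)]


def symmetrize(charges, symmetry_classes):
    """Average charges over atoms that share a symmetry class label.

    Assumption: this is only a stand-in for what a toolkit does when
    symmetrization is switched on.
    """
    groups = defaultdict(list)
    for index, label in enumerate(symmetry_classes):
        groups[label].append(index)
    result = list(charges)
    for members in groups.values():
        mean = sum(charges[i] for i in members) / len(members)
        for i in members:
            result[i] = mean
    return result


def rmse(a, b):
    """Root-mean-square error between two equal-length charge lists."""
    if len(a) != len(b) or not a:
        raise ValueError("charge lists must be non-empty and equal in length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Made-up charges for a four-atom molecule, for illustration only.
oe_am1 = [-0.30, 0.10, 0.10, 0.10]
at_am1 = [-0.30, 0.10, 0.10, 0.10]
oe_am1bcc = [-0.40, 0.14, 0.13, 0.13]
at_am1bcc = [-0.46, 0.16, 0.15, 0.15]

# Identical AM1 charges give zero error between the two backends.
am1_error = rmse(oe_am1, at_am1)

# Any remaining disagreement then sits entirely in the BCC contributions.
oe_bcc = bcc_contribution(oe_am1bcc, oe_am1)
at_bcc = bcc_contribution(at_am1bcc, at_am1)
bcc_error = rmse(oe_bcc, at_bcc)

# Symmetrizing one side only (atoms 1-3 treated as equivalent) can by
# itself introduce a small difference.
oe_sym = symmetrize(oe_am1bcc, [0, 1, 1, 1])
sym_error = rmse(oe_sym, oe_am1bcc)
```

Splitting the error this way mirrors the order in which the settings were switched on in the notes: AM1 only, then symmetrization, then BCCs.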