Organometallic Tasks - Draft

"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"

Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.

Decide on Level of Theory

"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"

Is BP86/def2-TZVP for the primary level and B3LYP-D3BJ for the final level good enough?

Search through existing databases

Use Gemmi to handle dataset

PDB Chemical Component Dictionary (CCD)
download and filter (Brent)
Submit dataset
figure out what's going on with our PDB dataset; all issues are SCF convergence
tmQM [paper, dataset] (which sources from CSD) - 86K transition metal complexes
Use this existing dataset to begin testing the NNP strategy for encoding spin state (all spin states = 0).
Currently being worked on by the Chodera lab; should get into QCArchive at some level
Crystallography Open Database (COD) – CC0 licensed
CSD (Cambridge Structural Database) filtered for stable small molecules, size, elements
might need to discuss release of subset as open data
Look for structures neglected by tmQM of interest?
MPtrj: Materials Project Trajectory Dataset
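
A minimal sketch of the size/element filtering mentioned for the database searches above, using only the Python standard library. The function name keep_entry, the element sets, and the thresholds (max_atoms, max_metals) are illustrative assumptions, not values agreed in these notes; stability filtering is not covered.

```python
# Hypothetical element/size filter for candidate structures.
# The element sets and thresholds are placeholders, not agreed values.
TRANSITION_METALS = {  # d-block subset, groups 3-12; f-block omitted
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}
ORGANIC_ELEMENTS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}


def keep_entry(symbols, max_atoms=100, max_metals=1):
    """Return True if a structure passes the size and element filters."""
    if not symbols or len(symbols) > max_atoms:
        return False
    metals = [s for s in symbols if s in TRANSITION_METALS]
    others = [s for s in symbols if s not in TRANSITION_METALS]
    # Require at least one metal center, and no more than max_metals.
    if not 1 <= len(metals) <= max_metals:
        return False
    # Every non-metal atom must come from the allowed ligand elements.
    return all(s in ORGANIC_ELEMENTS for s in others)


ferrocene = ["Fe"] + ["C"] * 10 + ["H"] * 10
print(keep_entry(ferrocene))                   # one metal, C/H ligands
print(keep_entry(["C", "H", "H", "H", "H"]))   # no metal center
```

The element symbols could come from any parser (for example the atom-site loop of a CIF file); the filter itself only needs a list of symbols.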

Simple augmentations [start working on in the background? Or clean existing and "batch" produce]

Swap any transition metal for any other transition metal, let it run, and see what happens (will get something relevant most of the time).
Switching within column is almost always okay.
[Will need in house tmQM for this]
substitutions: H->F, Ph->pMeOPh.
Consider RDKit's ReplaceSubstructs function (Chem.ReplaceSubstructs)
Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
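
The within-column metal swap described above could be sketched as follows, again with the standard library only. The table D_BLOCK_GROUPS and the helper same_column_swaps are hypothetical names introduced for illustration. The sketch only relabels element symbols; geometries are left untouched, so each variant would still need re-optimization.

```python
# Groups (columns) of the d-block. Swapping within a column is the
# "almost always okay" case from the notes.
D_BLOCK_GROUPS = {
    3: ["Sc", "Y"],
    4: ["Ti", "Zr", "Hf"],
    5: ["V", "Nb", "Ta"],
    6: ["Cr", "Mo", "W"],
    7: ["Mn", "Tc", "Re"],
    8: ["Fe", "Ru", "Os"],
    9: ["Co", "Rh", "Ir"],
    10: ["Ni", "Pd", "Pt"],
    11: ["Cu", "Ag", "Au"],
    12: ["Zn", "Cd", "Hg"],
}
GROUP_OF = {m: g for g, metals in D_BLOCK_GROUPS.items() for m in metals}


def same_column_swaps(symbols):
    """Yield copies of `symbols` with one metal replaced by a group congener."""
    for i, sym in enumerate(symbols):
        group = GROUP_OF.get(sym)
        if group is None:
            continue  # not a d-block metal; leave ligand atoms alone
        for other in D_BLOCK_GROUPS[group]:
            if other != sym:
                swapped = list(symbols)
                swapped[i] = other
                yield swapped


variants = list(same_column_swaps(["Fe", "C", "O"]))
print(variants)  # [['Ru', 'C', 'O'], ['Os', 'C', 'O']]
```

Extending this to "any metal for any metal" would only require iterating over all groups instead of the metal's own group.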

Conformer search:

take from CIF files (CIF is the new PDB)

CMILES Issue: Organometallics are difficult, not supported

SMILES All Around: Structure to SMILES conversion for Transition Metal Complexes

Jeff thinks cif -> rdkit -> qca
- not sure that's necessary because QCArchive has a hash to represent the molecules.
- Don't have to use CMILES as the name (i.e., index), can be arbitrary.
- QCArchive doesn't need CMILES. Prepare to pair program with JClark on a cif --> QCA pipeline that bypasses the openff molecule structures that depend on CMILES and directly compares cif files to QCArchive.
RDKit had an organometallic class when assessing implicit hydrogens, reverse engineer an expression?
From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record.
- QCArchive has a metadata field for adding such information. Also, the openff Molecule.conformers attribute holds a list of pint.Quantity objects with coordinates and units. However, the list usually seems to contain only the coordinates of the main molecule.
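
To illustrate the two points above (an arbitrary, CMILES-free index name and metadata attached per conformer), here is a minimal standard-library sketch. It is not QCArchive's actual hashing scheme or record model: geometry_hash, ConformerEntry, the metadata keys, and the coordinates are all hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import dataclass, field


def geometry_hash(symbols, coords, decimals=4):
    """Hash element symbols plus rounded coordinates (illustrative scheme only)."""
    # Adding 0.0 turns -0.0 into 0.0 so both serialize identically.
    rounded = [[round(x, decimals) + 0.0 for x in xyz] for xyz in coords]
    payload = json.dumps({"symbols": list(symbols), "geometry": rounded})
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


@dataclass
class ConformerEntry:
    """One entry per conformer, carrying its own provenance metadata."""
    symbols: list
    coords: list  # one [x, y, z] per atom; units are the caller's choice
    metadata: dict = field(default_factory=dict)

    @property
    def name(self):
        # Arbitrary index name: source label plus a short geometry hash.
        source = self.metadata.get("source", "unknown")
        return f"{source}-{geometry_hash(self.symbols, self.coords)[:10]}"


entry = ConformerEntry(
    symbols=["Fe", "C", "O"],
    coords=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.8], [0.0, 0.0, 2.95]],  # placeholder
    metadata={"source": "example_cif", "charge": 0, "multiplicity": 1},
)
print(entry.name)
```

Because the name is derived from the geometry, two conformers read from the same CIF source get distinct names while sharing the same source metadata.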

Computed/stored properties

  • energies

  • forces

  • other properties

    • atomic spin density

    • partial charges (multiple methods)

    • Dipole moment / polarizability

    • orbital energies (+/- 5 molecular orbitals around highest occupied molecular orbital)

    • Allows us to see the electronic structure of complexes

      • relative contributions of each atom to each orbital

      • coefficients? - too large!

  • This level of theory was used to compute the following properties: electronic and dispersion energies, HOMO and LUMO energies, HOMO/LUMO gap, dipole moment, and metal center charge, which was derived from NBO
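
As a sketch of the "+/- 5 orbitals around the HOMO" selection listed above, the helper below slices a window out of a list of orbital energies. It assumes a closed-shell system with energies sorted in ascending order and a known number of occupied orbitals; the name frontier_window and the energy values are hypothetical placeholders.

```python
def frontier_window(orbital_energies, n_occupied, width=5):
    """Return (homo_index, energies within +/- width orbitals of the HOMO)."""
    if not 0 < n_occupied <= len(orbital_energies):
        raise ValueError("n_occupied must be between 1 and the number of orbitals")
    homo = n_occupied - 1
    lo = max(0, homo - width)                           # clip at lowest orbital
    hi = min(len(orbital_energies), homo + width + 1)   # clip at highest orbital
    return homo, orbital_energies[lo:hi]


energies = [-10.0, -8.0, -6.0, -5.0, -1.0, 0.5, 2.0]  # placeholder values
homo, window = frontier_window(energies, n_occupied=4, width=2)
gap = energies[homo + 1] - energies[homo]  # HOMO/LUMO gap
print(window, gap)  # [-8.0, -6.0, -5.0, -1.0, 0.5] 4.0
```

Storing only this window keeps the per-record payload small compared with full orbital coefficients, which the notes flag as too large.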