"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"
Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.
Decide on Level of Theory
"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"
- Is BP86 / def2-TZVP for primary and B3LYP-d3bj for final good enough?
Search through existing databases
Use Gemmi to handle dataset
- PDB Chemical Component Dictionary (CCD)
- Brent started; need to figure out what's going on with our PDB dataset. All issues are SCF convergence.
- tmQM [paper, dataset] (which sources from CSD): 86K transition metal complexes
Utilize this existing dataset to begin testing out the NNP strategy for encoding spin state. All spin states = 0.
- Crystallography Open Database (COD) – CC0 licensed
- CSD (Cambridge Structural Database) filtered for stable small molecules, size, elements
might need to discuss release of subset as open data
Look for structures of interest neglected by tmQM?
- MPtrj: Materials Project Trajectory Dataset
Simple augmentations [start working on in the background? Or clean existing and "batch" produce]
- Any transition metal for any transition metal, let it run and see what happens (will get something relevant most of the time).
Switching within column is almost always okay.
[Will need in-house tmQM for this]
- Substitutions: H->F, Ph->pMeOPh.
Consider RDKit, which has a ReplaceSubstructs function.
- Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
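The metal-swap augmentation above can be sketched in a few lines. This is a minimal illustration, not an agreed implementation: the names GROUPS, SAME_GROUP and swap_metals are hypothetical, group 3 is limited to Sc and Y, and only element symbols are changed. Coordinates, charge and spin state are left untouched, so every swapped structure would still need re-optimization and a sanity check.

```python
from typing import Dict, Iterator, List, Tuple

# d-block metals by periodic-table group (3d, 4d, 5d rows).
# Assumption: group 3 is restricted to Sc and Y.
GROUPS: Dict[int, List[str]] = {
    3: ["Sc", "Y"],
    4: ["Ti", "Zr", "Hf"],
    5: ["V", "Nb", "Ta"],
    6: ["Cr", "Mo", "W"],
    7: ["Mn", "Tc", "Re"],
    8: ["Fe", "Ru", "Os"],
    9: ["Co", "Rh", "Ir"],
    10: ["Ni", "Pd", "Pt"],
    11: ["Cu", "Ag", "Au"],
    12: ["Zn", "Cd", "Hg"],
}

# For each metal, the other metals in the same column.
SAME_GROUP: Dict[str, List[str]] = {
    metal: [other for other in members if other != metal]
    for members in GROUPS.values()
    for metal in members
}


def swap_metals(
    symbols: List[str], same_group_only: bool = True
) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (atom_index, new_metal, new_symbols) for every single-metal swap.

    same_group_only=True keeps swaps within a column ("almost always okay");
    False tries any transition metal for any transition metal.
    """
    all_metals = sorted(SAME_GROUP)
    for i, sym in enumerate(symbols):
        if sym not in SAME_GROUP:
            continue  # not a transition metal, leave it alone
        if same_group_only:
            candidates = SAME_GROUP[sym]
        else:
            candidates = [m for m in all_metals if m != sym]
        for new_metal in candidates:
            swapped = list(symbols)  # copy so the input is not mutated
            swapped[i] = new_metal
            yield i, new_metal, swapped


if __name__ == "__main__":
    for index, metal, new_symbols in swap_metals(["Fe", "C", "O"]):
        print(index, metal, new_symbols)
```

Ligand substitutions (H->F, Ph->pMeOPh) need substructure matching and are better left to a cheminformatics toolkit; they are not covered by this sketch.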
Conformer search:
- take from CIF files (CIF is the new PDB)
CMILES Issue: Organometallics are difficult, not supported
https://chemrxiv.org/engage/chemrxiv/article-details/676050a56dde43c9085b4ccd
- Jeff thinks cif -> rdkit -> qca
- not sure that's necessary because QCArchive has a hash to represent the molecules.
- Don't have to use CMILES as the name (i.e., index), can be arbitrary.
- QCArchive doesn't need CMILES. Prepare to pair program with JClark on the cif --> QCA pipeline, bypassing the OpenFF molecule structures that depend on CMILES and comparing CIF files directly to QCArchive.
- RDKit had an organometallic class when assessing implicit hydrogens; reverse engineer an expression?
- From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record.
- QCArchive has a metadata file field to add such information. Also, the OpenFF Molecule.conformers attribute is a list of pint.Quantity objects holding coordinates and units. However, the list usually seems to contain only the coordinates of the main molecule.
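To make the "arbitrary name plus hash" idea concrete, here is a minimal sketch of an index that names entries freely, identifies each conformer by a content hash, and carries a metadata dict per record. The hashing scheme shown (SHA-1 over rounded coordinates, symbols, charge and multiplicity) is an assumption for illustration only; it is not QCArchive's own molecule hash, and geometry_hash and build_index are hypothetical helper names.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Sequence, Tuple

Coords = Sequence[Sequence[float]]


def geometry_hash(
    symbols: Sequence[str],
    coords: Coords,
    charge: int = 0,
    multiplicity: int = 1,
    decimals: int = 6,
) -> str:
    """Return a deterministic hash of one conformer.

    Coordinates are rounded so that noise below the tolerance does not change
    the hash; adding 0.0 turns a rounded -0.0 into 0.0.
    """
    payload = {
        "symbols": list(symbols),
        "geometry": [[round(float(x), decimals) + 0.0 for x in xyz] for xyz in coords],
        "charge": charge,
        "multiplicity": multiplicity,
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()


def build_index(
    entries: Iterable[Tuple[str, Sequence[str], Coords, dict]]
) -> Dict[str, dict]:
    """Map an arbitrary entry name to its hash and metadata (one entry per conformer)."""
    index: Dict[str, dict] = {}
    for name, symbols, coords, metadata in entries:
        if name in index:
            raise ValueError(f"duplicate entry name: {name}")
        index[name] = {
            "hash": geometry_hash(symbols, coords),
            "metadata": dict(metadata),
        }
    return index


if __name__ == "__main__":
    water: List[List[float]] = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96], [0.93, 0.0, -0.24]]
    idx = build_index([("water-conf-0", ["O", "H", "H"], water, {"source": "example.cif"})])
    print(idx["water-conf-0"]["hash"])
```

With a layout like this, the entry name can be a CIF identifier or any other label, and the metadata dict is where the source file, spin state, or conformer number would go.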
Computed/stored properties
energies
forces
other properties
atomic spin density
partial charges (multiple methods)
Dipole moment / polarizability
orbital energies (+/- 5 molecular orbitals around highest occupied molecular orbital)
Allow us to see electronic structure of complexes
relative contributions of each atom to each orbital
coefficients? - too large!
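Storing only a window of orbital energies around the HOMO could look like the sketch below. It assumes a single list of orbital energies sorted in ascending order together with the number of occupied orbitals (a restricted, closed-shell picture); for open-shell complexes the alpha and beta sets would each need the same treatment. It also reads "+/- 5" as HOMO-5 through HOMO+5. The function name frontier_window is hypothetical.

```python
from typing import Dict, List, Optional, Union

Window = Dict[str, Union[int, List[int], List[float], Optional[float]]]


def frontier_window(
    orbital_energies: List[float], n_occupied: int, width: int = 5
) -> Window:
    """Select orbital energies from HOMO-width to HOMO+width (clipped to the list)."""
    n_orb = len(orbital_energies)
    if not 0 < n_occupied <= n_orb:
        raise ValueError("n_occupied must be between 1 and the number of orbitals")

    homo = n_occupied - 1                 # zero-based index of the HOMO
    lo = max(0, homo - width)             # clip at the lowest orbital
    hi = min(n_orb, homo + width + 1)     # clip at the highest orbital (exclusive)

    # The LUMO only exists if there is at least one virtual orbital.
    lumo_energy = orbital_energies[homo + 1] if homo + 1 < n_orb else None
    gap = None if lumo_energy is None else lumo_energy - orbital_energies[homo]

    return {
        "homo_index": homo,
        "indices": list(range(lo, hi)),
        "energies": orbital_energies[lo:hi],
        "homo": orbital_energies[homo],
        "lumo": lumo_energy,
        "gap": gap,
    }


if __name__ == "__main__":
    energies = [-10.0 + i for i in range(20)]  # made-up values, arbitrary units
    print(frontier_window(energies, n_occupied=10))
```

Keeping at most 11 energies per record (or per spin channel) stays small, in contrast to the full coefficient matrix flagged above as too large.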
This level of theory was used to compute the following properties: electronic and dispersion energies, HOMO and LUMO energies, HOMO/LUMO gap, dipole moment, and metal center charge, which was derived from NBO
- Make a new dataset type for these properties, run opt as this dataset
- Determine the keywords for psi4 to obtain these, change provided output? (Note tmQM used Gaussian NBO analysis for these, where output is trivial.)
Use OptimizationResultCollection.create_basic_dataset to pull the final geometry from each optimization and then create the input for a QCA single point (SP) with these properties.