"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"
Has this been done? [Ask Chodera Lab]
Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.
Decide on Level of Theory
...
- Is BP86 / def2-TZVP for primary and B3LYP-D3BJ for final good enough?
Search through existing databases
Use Gemmi to handle dataset
- PDB Chemical Component Dictionary (CCD)
[Brent achieved] - download and filter (Brent)
- Submit dataset
Figure out what's going on with our PDB dataset; all issues are SCF convergence.
- tmQM [paper, dataset] (which sources from CSD): 86K transition metal complexes
Utilize this existing dataset to begin testing out the NNP strategy for encoding spin state. All spin states = 0.
Currently being worked with by Chodera lab; should get into QCArchive at some level.
- Crystallography Open Database (COD), CC0 licensed
- CSD (Cambridge Structural Database) filtered for stable small molecules, size, elements
might need to discuss release of subset as open data
[Repeat of tmQM?] Look for structures of interest neglected by tmQM?
- MPtrj: Materials Project Trajectory Dataset
...
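The size and element filtering mentioned for the database sources above could be prototyped along these lines before committing to a specific parser. This is only a rough sketch: the allowed non-metal set, the `max_atoms` and `max_metals` cutoffs, and the function name `keep_entry` are all assumptions for illustration, not agreed criteria.

```python
# Transition metals, groups 3-12, periods 4-6 (lanthanides/actinides left out on purpose).
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}

# Assumption: a placeholder set of "organic" elements; the real list is still to be decided.
ALLOWED_NONMETALS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}


def keep_entry(elements, max_atoms=100, max_metals=1):
    """Return True if a structure passes the (hypothetical) size/element filter.

    elements: list of element symbols, one per atom.
    """
    n_metals = sum(1 for e in elements if e in TRANSITION_METALS)
    if n_metals == 0 or n_metals > max_metals:
        return False  # need at least one metal center, but not too many
    if len(elements) > max_atoms:
        return False  # too large for the intended level of theory
    # Every atom must be either a transition metal or an allowed non-metal.
    return all(e in TRANSITION_METALS or e in ALLOWED_NONMETALS for e in elements)


# Ferrocene-like composition: Fe + 10 C + 10 H
ferrocene = ["Fe"] + ["C"] * 10 + ["H"] * 10
print(keep_entry(ferrocene))
```

The element lists themselves would come from whichever reader ends up handling the CIF/CCD files; only the filtering logic is shown here.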
- Any transition metal for any transition metal, let it run and see what happens (will get something relevant most of the time).
Switching within column is almost always okay.
[Will need in-house tmQM for this]
- Substitutions: H->F, Ph->pMeOPh.
Consider that RDKit has a "ReplaceSubstructs" method.
- Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
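The "switching within column" idea from the substitution notes above can be sketched without any chemistry toolkit, working only on the list of element symbols. This is a hedged illustration: `same_group_swaps` is a made-up helper name, group 3 is limited to Sc/Y here, and coordinates are deliberately left out because a swapped structure would still need re-optimization.

```python
# Transition-metal groups (columns) 3-12; tuple order is period 4, 5, 6.
GROUPS = {
    3: ("Sc", "Y"),
    4: ("Ti", "Zr", "Hf"),
    5: ("V", "Nb", "Ta"),
    6: ("Cr", "Mo", "W"),
    7: ("Mn", "Tc", "Re"),
    8: ("Fe", "Ru", "Os"),
    9: ("Co", "Rh", "Ir"),
    10: ("Ni", "Pd", "Pt"),
    11: ("Cu", "Ag", "Au"),
    12: ("Zn", "Cd", "Hg"),
}
GROUP_OF = {sym: group for group, syms in GROUPS.items() for sym in syms}


def same_group_swaps(symbols):
    """Return one new symbol list per (metal atom, same-group replacement) pair.

    Non-metal atoms are never touched; only one metal is swapped per variant.
    """
    variants = []
    for i, sym in enumerate(symbols):
        group = GROUP_OF.get(sym)
        if group is None:
            continue  # not a transition metal
        for other in GROUPS[group]:
            if other != sym:
                swapped = list(symbols)
                swapped[i] = other
                variants.append(swapped)
    return variants


for variant in same_group_swaps(["Fe", "C", "O"]):
    print(variant)
```

Cross-column swaps ("any transition metal for any transition metal") would just replace the per-group lookup with the full metal list, at the cost of more failed calculations.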
Conformer search:
- take from CIF files (CIF is the new PDB)
CMILES Issue: Organometallics are difficult, not supported
https://chemrxiv.org/engage/chemrxiv/article-details/676050a56dde43c9085b4ccd
- Jeff thinks cif -> rdkit -> qca
- not sure that's necessary because QCArchive has a hash to represent the molecules.
- Don't have to use CMILES as the name (i.e., index), can be arbitrary.
- QCArchive doesn't need CMILES. Prepare to pair program with JClark on the cif --> QCA pipeline by bypassing OpenFF molecule structures that are dependent on CMILES and directly compare CIF files to QCArchive.
- RDKit had an organometallic class when assessing implicit hydrogens; reverse engineer an expression?
- From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record. [QCArchive has a metadata file field to add such information.]
- Also, the OpenFF Molecule.conformers attribute contains pint.Quantity objects with coordinates and units. However, the list seems to usually only contain the coordinates of the main molecule.
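To make the "hash instead of CMILES" and "one record per conformer with metadata" points above concrete, here is a small stand-in sketch. It is not QCArchive's actual hashing scheme; `geometry_key`, `build_index`, and the dictionary field names (`name`, `source_cif`, `coords`) are hypothetical and only illustrate keying conformers by content while keeping an arbitrary name.

```python
import hashlib
import json


def geometry_key(symbols, coords, charge=0, multiplicity=1, decimals=6):
    """Content-based key for one conformer (illustrative, not QCArchive's hash)."""
    payload = {
        "symbols": list(symbols),
        # Round to absorb float noise; "+ 0.0" turns -0.0 into 0.0 so both serialize the same.
        "geometry": [[round(float(x), decimals) + 0.0 for x in xyz] for xyz in coords],
        "charge": charge,
        "multiplicity": multiplicity,
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()


def build_index(conformers):
    """Map each conformer's key to its metadata; the name is arbitrary, not a CMILES."""
    index = {}
    for entry in conformers:
        key = geometry_key(entry["symbols"], entry["coords"])
        index[key] = {"name": entry["name"], "source_cif": entry["source_cif"]}
    return index


conformers = [
    {"name": "complex-0001-conf0", "source_cif": "example_a.cif",
     "symbols": ["Fe", "C", "O"], "coords": [[0, 0, 0], [0, 0, 1.8], [0, 0, 2.9]]},
    {"name": "complex-0001-conf1", "source_cif": "example_a.cif",
     "symbols": ["Fe", "C", "O"], "coords": [[0, 0, 0], [0, 0, 1.9], [0, 0, 3.0]]},
]
index = build_index(conformers)
print(len(index))
```

Because the key depends on the geometry, two conformers of the same complex get separate entries, which mirrors the one-record-per-conformer layout mentioned in the notes.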
Computed/stored properties
...
- Make a new dataset type for these properties, run opt as this dataset
- Determine the keywords for Psi4 to obtain these; change the provided output? (Note tmQM used Gaussian NBO analysis for these, where output is trivial.)
OptimizationResultCollection.create_basic_dataset: pull the final geometry from each optimization and then create input for a QCA single point (SP) with these properties.