
"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"

Has this been done? [Ask Chodera Lab] Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.

Decide on Level of Theory

...

Is BP86/def2-TZVP for primary and B3LYP-D3BJ for final good enough?

Search through existing databases

Use Gemmi to handle dataset

PDB Chemical Component Dictionary (CCD)
[Brent achived] - Brent started; figure out what's going on with our PDB dataset. All issues are SCF convergence.
tmQM [paper, dataset] (which sources from CSD) - 86K transition metal complexes
Utilize this existing dataset ~~to begin testing out NNP strategy regarding encoding spin state~~ all spin states = 0.
Crystallography Open Database (COD) – CC0 licensed
CSD (Cambridge Structural Database), filtered for stable small molecules, size, elements
might need to discuss release of subset as open data
[Repeat of tmQM?] Look for structures neglected by tmQM of interest?
MPtrj: Materials Project Trajectory Dataset
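The "filtered for stable small molecules, size, elements" step could be sketched as a simple predicate over element lists. This is only an illustration: the function name keep_structure, the allowed ligand elements, and the 100-atom cutoff are assumptions, not decisions recorded in these notes, and stability is not checked here.

```python
# Hand-written set of 3d/4d/5d transition metals (groups 3-12, without La/Lu).
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}

# Assumption: a typical organic/ligand element set; adjust as needed.
ALLOWED_LIGAND_ELEMENTS = {
    "H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I",
}


def keep_structure(symbols, max_atoms=100):
    """Return True if the structure passes the element and size filters.

    symbols: list of element symbols, one per atom.
    max_atoms: hypothetical size cutoff (not taken from the notes).
    """
    metals = [s for s in symbols if s in TRANSITION_METALS]
    others = [s for s in symbols if s not in TRANSITION_METALS]
    return (
        len(metals) >= 1                      # at least one transition metal
        and len(symbols) <= max_atoms         # small-molecule size limit
        and all(s in ALLOWED_LIGAND_ELEMENTS for s in others)
    )


candidates = {
    "metal_carbonyl": ["Fe", "C", "O"],
    "methane": ["C", "H", "H", "H", "H"],
    "actinide_mix": ["Fe", "U"],
}
kept = [name for name, symbols in candidates.items() if keep_structure(symbols)]
```

The same predicate can be applied to any source (CCD, COD, CSD, tmQM) once each entry is reduced to a list of element symbols.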

...

Swap any transition metal for any other transition metal, let it run, and see what happens (will get something relevant most of the time).
Switching within column is almost always okay.
[Will need in house tmQM for this]
substitutions: H->F, Ph->pMeOPh.
Consider that RDKit has a ReplaceSubstructs function (Chem.ReplaceSubstructs).
Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
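The within-column metal swap noted above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the group table is written out by hand, the function name same_column_variants is hypothetical, and only element symbols are swapped, so each variant would still need to be re-optimized ("let it run and see what happens").

```python
# Hand-written d-block columns (groups 3-12); La/Lu are left out on purpose.
TM_GROUPS = {
    3: ["Sc", "Y"],
    4: ["Ti", "Zr", "Hf"],
    5: ["V", "Nb", "Ta"],
    6: ["Cr", "Mo", "W"],
    7: ["Mn", "Tc", "Re"],
    8: ["Fe", "Ru", "Os"],
    9: ["Co", "Rh", "Ir"],
    10: ["Ni", "Pd", "Pt"],
    11: ["Cu", "Ag", "Au"],
    12: ["Zn", "Cd", "Hg"],
}
SYMBOL_TO_GROUP = {sym: group for group, syms in TM_GROUPS.items() for sym in syms}


def same_column_variants(symbols):
    """Yield copies of `symbols` with one transition metal replaced by a congener.

    Coordinates are not touched here; the caller keeps the original geometry
    as a starting guess for the substituted complex.
    """
    for index, sym in enumerate(symbols):
        group = SYMBOL_TO_GROUP.get(sym)
        if group is None:
            continue  # not a transition metal in the table above
        for other in TM_GROUPS[group]:
            if other != sym:
                variant = list(symbols)
                variant[index] = other
                yield variant


iron_fragment = ["Fe", "C", "C", "H"]
variants = list(same_column_variants(iron_fragment))
```

Here an iron centre yields ruthenium and osmium variants; swapping across columns would only require iterating over all groups instead of the metal's own.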

Conformer search:

take from CIF files (CIF is the new PDB)

CMILES Issue: Organometallics are difficult, not supported

https://chemrxiv.org/engage/chemrxiv/article-details/676050a56dde43c9085b4ccd

Jeff thinks cif -> rdkit -> qca
- not sure that's necessary because QCArchive has a hash to represent the molecules.
- Don't have to use CMILES as the name (i.e., index), can be arbitrary.
- QCArchive doesn't need CMILES. Prepare to pair program with JClark on a cif -> QCA pipeline that bypasses the OpenFF Molecule structures that depend on CMILES and compares CIF files directly to QCArchive.
RDKit had an organometallic class when assessing implicit hydrogens, reverse engineer an expression?
From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record. [QCArchive has a metadata file field to add such information.]
- Also, the OpenFF Molecule.conformers attribute is a list of pint.Quantity objects holding coordinates with units. However, the list usually seems to contain only the coordinates of the main molecule.
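Since the entry name can be arbitrary, one option is to derive it from the geometry itself and keep per-conformer metadata keyed by that name. The sketch below is only an illustration: geometry_key is a hypothetical helper, the hashing scheme is not QCArchive's own molecule hash, and the file name and rounding precision are assumptions.

```python
import hashlib
import json


def geometry_key(symbols, coords, charge=0, multiplicity=1, decimals=4):
    """Return a short CMILES-free name from a hash of the molecule definition.

    Coordinates are rounded so that numerical noise below `decimals`
    does not change the key. Adding 0.0 turns -0.0 into 0.0, which would
    otherwise serialize differently.
    """
    payload = {
        "symbols": list(symbols),
        "coords": [[round(float(x), decimals) + 0.0 for x in xyz] for xyz in coords],
        "charge": charge,
        "multiplicity": multiplicity,
    }
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


symbols = ["Fe", "C", "O"]
coords = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.8], [0.0, 0.0, 2.95]]

singlet = geometry_key(symbols, coords, multiplicity=1)
quintet = geometry_key(symbols, coords, multiplicity=5)

# One metadata entry per conformer/record, keyed by the hash-based name.
# "example.cif" is a placeholder, not a real file from the dataset.
metadata_by_key = {
    singlet: {"source": "example.cif", "conformer": 0, "multiplicity": 1},
    quintet: {"source": "example.cif", "conformer": 0, "multiplicity": 5},
}
```

Including charge and multiplicity in the key keeps different spin states of the same geometry as separate entries, which CMILES alone would not distinguish.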

Computed/stored properties

...

Make a new dataset type for these properties, run opt as this dataset
Determine the Psi4 keywords needed to obtain these; change the provided output? (Note: tmQM used Gaussian NBO analysis for these, where the output is trivial.)
Use OptimizationResultCollection.create_basic_dataset to pull the final geometry from each optimization, then create input for QCA single points (SP) with these properties.
