"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"
Has this been done? [Ask Chodera Lab]
Search through existing databases
- PDB Chemical Component Dictionary (CCD)
[Brent archived] - tmQM [paper, dataset] (which sources from CSD) - 86K transition metal complexes
Utilize this existing dataset to begin testing out NNP strategy regarding encoding spin state.
Chodera Lab has it? - Crystallography Open Database (COD) – CC0 licensed
- CSD (Cambridge Structural Database) filtered for stable small molecules, size, elements
might need to discuss release of subset as open data
[Repeat of tmQM?] - MPtrj: Materials Project Trajectory Dataset
Simple augmentations [start working on in the background? Or clean existing and "batch" produce]
- Any transition metal for any transition metal, let it run and see what happens (will get something relevant most of the time).
Switching within column is almost always okay.
[Will need in-house tmQM for this] - substitutions: H->F, Ph->pMeOPh.
Consider RDKit's "ReplaceSubstructs" function for this.
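A minimal sketch of the "switch within a column" augmentation, assuming each structure is available as a plain list of element symbols. The group table covers only the 3d/4d/5d metals of groups 4-12, and the names TM_GROUPS and same_column_variants are hypothetical, not part of any existing pipeline. Only the element labels are changed here; the swapped structures would presumably still need re-optimization, since metal-ligand distances differ between congeners.

```python
# Transition-metal columns (groups 4-12), 3d/4d/5d rows only.
TM_GROUPS = {
    4: ("Ti", "Zr", "Hf"),
    5: ("V", "Nb", "Ta"),
    6: ("Cr", "Mo", "W"),
    7: ("Mn", "Tc", "Re"),
    8: ("Fe", "Ru", "Os"),
    9: ("Co", "Rh", "Ir"),
    10: ("Ni", "Pd", "Pt"),
    11: ("Cu", "Ag", "Au"),
    12: ("Zn", "Cd", "Hg"),
}

# Reverse lookup: element symbol -> group number.
METAL_TO_GROUP = {m: g for g, metals in TM_GROUPS.items() for m in metals}


def same_column_variants(symbols):
    """Return copies of `symbols` with one metal element swapped for a congener.

    Every occurrence of the chosen metal is replaced at once; only one
    element type is swapped per variant.
    """
    variants = []
    metals_present = sorted({s for s in symbols if s in METAL_TO_GROUP})
    for metal in metals_present:
        for congener in TM_GROUPS[METAL_TO_GROUP[metal]]:
            if congener == metal:
                continue
            variants.append([congener if s == metal else s for s in symbols])
    return variants


if __name__ == "__main__":
    for variant in same_column_variants(["Fe", "C", "O", "C", "O"]):
        print(variant)
```

The same loop could be widened to "any transition metal for any transition metal" by iterating over all of METAL_TO_GROUP instead of a single column.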
Conformer search:
- take from CIF files (CIF is the new PDB)
CMILES Issue: Organometallics are difficult, not supported
- Jeff thinks cif -> rdkit -> qca
- not sure that's necessary because QCArchive has a hash to represent the molecules.
- Don't have to use CMILES as the name (i.e., index), can be arbitrary.
- QCArchive doesn't need CMILES.
- Prepare to pair program with JClark on the cif --> QCA pipeline: bypass the OpenFF Molecule structures that depend on CMILES and compare CIF files directly to QCArchive.
- RDKit had an organometallic class when assessing implicit hydrogens; reverse engineer an expression?
- From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record. [QCArchive has a metadata file field to add such information]
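To make the "arbitrary index plus hash" idea concrete, here is a rough sketch of one entry per conformer, each keyed by an arbitrary name and carrying a geometry hash and a metadata dict. This is not QCArchive's actual hashing scheme or record layout; geometry_hash, build_entries, and the metadata key source_cif are all hypothetical names used for illustration only.

```python
import hashlib
import json


def geometry_hash(symbols, coords, charge=0, multiplicity=1, decimals=6):
    """Hash elements, rounded coordinates, charge, and multiplicity.

    Illustrative only: not the hash QCArchive computes internally.
    """
    # Adding 0.0 turns a rounded -0.0 into 0.0 so both serialize identically.
    rounded = [[round(x, decimals) + 0.0 for x in xyz] for xyz in coords]
    payload = json.dumps(
        {
            "symbols": list(symbols),
            "geometry": rounded,
            "charge": charge,
            "multiplicity": multiplicity,
        },
        sort_keys=True,
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def build_entries(name, symbols, conformers, metadata):
    """Create one entry per conformer, each with a copy of the shared metadata."""
    entries = {}
    for i, coords in enumerate(conformers):
        key = f"{name}-{i}"  # arbitrary index; no CMILES involved
        entries[key] = {
            "hash": geometry_hash(symbols, coords),
            "symbols": list(symbols),
            "geometry": coords,
            "metadata": dict(metadata),
        }
    return entries


if __name__ == "__main__":
    conformers = [
        [[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]],
        [[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]],
    ]
    for key, entry in build_entries(
        "example", ["C", "O"], conformers, {"source_cif": "example.cif"}
    ).items():
        print(key, entry["hash"])
```

Because the key is just a string, the CIF filename or database identifier could serve as the entry name, with the hash used only to detect duplicate structures.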
Computed/stored properties
energies
forces
other properties
atomic spin density
partial charges (multiple methods)
Dipole moment / polarizability
orbital energies (+/- 5 molecular orbitals around highest occupied molecular orbital)
Allow us to see electronic structure of complexes
relative contributions of each atom to each orbital
coefficients? - too large!
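As an illustration of the orbital-energy window in the list above, the sketch below picks out the orbitals from HOMO-5 to HOMO+5, assuming the energies are sorted in ascending order and the number of occupied orbitals is known. Reading "+/- 5" as five orbitals on each side of the HOMO is an assumption, and frontier_window is a hypothetical helper name. For open-shell complexes the same function would presumably be applied to the alpha and beta sets separately.

```python
def frontier_window(orbital_energies, n_occupied, width=5):
    """Return (offset from HOMO, energy) pairs for orbitals near the HOMO.

    `orbital_energies` must be sorted ascending; offset 0 is the HOMO and
    offset +1 is the LUMO. The window is clipped at the ends of the list.
    """
    n_orbitals = len(orbital_energies)
    if not 0 < n_occupied <= n_orbitals:
        raise ValueError("n_occupied must be between 1 and the number of orbitals")
    homo = n_occupied - 1
    start = max(0, homo - width)
    stop = min(n_orbitals, homo + width + 1)
    return [(i - homo, orbital_energies[i]) for i in range(start, stop)]


if __name__ == "__main__":
    # Made-up energies, purely to show the shape of the result.
    energies = [-1.0 + 0.1 * i for i in range(20)]
    for offset, energy in frontier_window(energies, n_occupied=10):
        print(f"HOMO{offset:+d}: {energy:.2f}")
```

Storing 11 energies per spin channel stays small regardless of basis-set size, which is the contrast with storing full coefficient matrices.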
This level of theory was used to compute the following properties: electronic and dispersion energies, HOMO and LUMO energies, HOMO/LUMO gap, dipole moment, and metal center charge, which was derived from NBO
- Make a new dataset type for these properties
- Determine the Psi4 keywords needed to obtain these; does the provided output need to change? (Note: tmQM used Gaussian NBO analysis for these, where the output is trivial.)
Use OptimizationResultCollection.create_basic_dataset to pull the final geometry from each optimization, then create QCArchive single-point (SP) inputs that request these properties.
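A library-free sketch of the same data flow: take a final optimized geometry and wrap it in a single-point input dict. The field names follow the QCSchema input layout as we understand it and should be checked against the schema before use. single_point_input is a hypothetical helper, and the method and basis values in the usage example are placeholders, not a chosen level of theory. The keywords dict is left empty because the Psi4 keywords are still to be determined.

```python
def single_point_input(symbols, final_geometry_bohr, charge, multiplicity,
                       method, basis, driver="energy", keywords=None):
    """Build a QCSchema-style single-point input from a final geometry.

    `final_geometry_bohr` is a list of [x, y, z] rows in Bohr; it is
    flattened because the schema stores geometry as a flat list.
    """
    flat = [float(x) for xyz in final_geometry_bohr for x in xyz]
    if len(flat) != 3 * len(symbols):
        raise ValueError("expected one [x, y, z] row per atom")
    return {
        "schema_name": "qcschema_input",
        "schema_version": 1,
        "molecule": {
            "symbols": list(symbols),
            "geometry": flat,
            "molecular_charge": charge,
            # Spin state travels with the molecule.
            "molecular_multiplicity": multiplicity,
        },
        "driver": driver,
        "model": {"method": method, "basis": basis},
        # Property-related keywords would go here once decided.
        "keywords": dict(keywords or {}),
    }


if __name__ == "__main__":
    sp = single_point_input(
        symbols=["Fe", "C", "O"],
        final_geometry_bohr=[[0.0, 0.0, 0.0], [0.0, 0.0, 3.4], [0.0, 0.0, 5.6]],
        charge=0,
        multiplicity=3,
        method="b3lyp",    # placeholder only
        basis="def2-svp",  # placeholder only
    )
    print(sp["molecule"]["molecular_multiplicity"], sp["model"])
```

Keeping charge and multiplicity explicit in each input is one way to make sure the spin state of every complex is recorded alongside its computed properties.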