
"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"

Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.

Decide on Level of Theory


  • Is BP86/def2-TZVP for the primary calculations and B3LYP-D3BJ for the final ones good enough?

Search through existing databases

Use Gemmi to handle dataset

  • PDB Chemical Component Dictionary (CCD)
    - Brent has started on this. Figure out what is going on with our PDB dataset; all issues so far are SCF convergence failures.
  • tmQM [paper, dataset] (which sources from CSD) - 86K transition metal complexes
    Use this existing dataset to begin testing the NNP strategy for encoding spin state (all spin states = 0).
  • Crystallography Open Database (COD) – CC0 licensed
  • CSD (Cambridge Structural Database), filtered for stable small molecules, size, elements
    Might need to discuss release of a subset as open data.
    Look for structures of interest that tmQM neglected?
  • MPtrj: Materials Project Trajectory Dataset
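
A minimal sketch of the kind of size/element filter mentioned for the CSD subset, which could be applied to entries from any of the sources above. The allowed-element set, the atom-count cap, and the function name `passes_filter` are all assumptions for illustration; the notes do not fix actual criteria.

```python
# Hypothetical filter criteria; the notes do not specify actual thresholds.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}
# Assumed set of ligand elements of interest.
ALLOWED_NONMETALS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}
MAX_ATOMS = 100  # assumed size cap


def passes_filter(symbols, max_atoms=MAX_ATOMS):
    """Return True if a structure is small enough, contains at least one
    transition metal, and uses only allowed elements."""
    if len(symbols) > max_atoms:
        return False
    if not any(s in TRANSITION_METALS for s in symbols):
        return False
    return all(s in TRANSITION_METALS or s in ALLOWED_NONMETALS for s in symbols)
```

The element symbols would come from whatever reader handles the dataset (e.g. Gemmi for CIF files); that step is left out here.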

Simple augmentations [start working on in the background? Or clean existing and "batch" produce]

  • Swap any transition metal for any other transition metal, let it run, and see what happens (will get something relevant most of the time).
    Switching within column is almost always okay.
    [Will need in house tmQM for this]
  • Substitutions: H->F, Ph->pMeOPh.
    Consider RDKit's ReplaceSubstructs function (Chem.ReplaceSubstructs).
  • Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
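
The "switching within column" augmentation above can be sketched in plain Python as follows. This is only an illustration: the `GROUPS` table and the function name `same_group_swaps` are assumptions, the geometry is left untouched, and any swapped structure would still need to be re-optimized.

```python
# Transition metals by periodic-table group (column), 3d/4d/5d rows only.
GROUPS = {
    3: ["Sc", "Y"],
    4: ["Ti", "Zr", "Hf"],
    5: ["V", "Nb", "Ta"],
    6: ["Cr", "Mo", "W"],
    7: ["Mn", "Tc", "Re"],
    8: ["Fe", "Ru", "Os"],
    9: ["Co", "Rh", "Ir"],
    10: ["Ni", "Pd", "Pt"],
    11: ["Cu", "Ag", "Au"],
    12: ["Zn", "Cd", "Hg"],
}


def same_group_swaps(symbols):
    """Return new element lists in which one metal at a time is replaced
    by another metal from the same column."""
    variants = []
    for i, sym in enumerate(symbols):
        for group in GROUPS.values():
            if sym not in group:
                continue
            for other in group:
                if other != sym:
                    swapped = list(symbols)
                    swapped[i] = other
                    variants.append(swapped)
    return variants


# Example: an Fe complex yields Ru and Os analogues.
variants = same_group_swaps(["Fe", "C", "O"])
```

Swapping one metal center at a time keeps the number of generated structures linear in the number of metals; swapping across columns would only require replacing `GROUPS` with a single flat list.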

Conformer search:

  • Take from CIF files (CIF is the new PDB)

CMILES Issue: Organometallics are difficult, not supported

https://chemrxiv.org/engage/chemrxiv/article-details/676050a56dde43c9085b4ccd

  • Jeff suggests a CIF -> RDKit -> QCA pipeline
    - Not sure that's necessary, because QCArchive has a hash to represent the molecules.
    - Don't have to use CMILES as the name (i.e., index); it can be arbitrary.
    - QCArchive doesn't need CMILES. Prepare to pair program with JClark on a CIF --> QCA pipeline that bypasses the OpenFF Molecule structures that depend on CMILES and compares CIF files directly against QCArchive.
  • RDKit had an organometallic class when assessing implicit hydrogens; reverse engineer an expression?
  • From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record.
    - QCArchive has a metadata file field to add such information. Also, the OpenFF Molecule.conformers attribute is a list of pint.Quantity objects holding coordinates and units. However, the list usually seems to contain only the coordinates of the main molecule.
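
To illustrate the point above that the index does not have to be a CMILES string, here is a minimal sketch of naming each conformer by a hash of its elements and coordinates and carrying metadata alongside it. This is not QCArchive's own hashing scheme; `geometry_hash`, `ConformerEntry`, the rounding precision, and the metadata keys are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def geometry_hash(symbols, coords, decimals=4):
    """Deterministic hash of elements plus rounded coordinates.
    Illustrative only; not the hash QCArchive computes internally."""
    payload = {
        "symbols": list(symbols),
        # Adding 0.0 turns a rounded -0.0 into 0.0 so both serialize identically.
        "geometry": [[round(float(x), decimals) + 0.0 for x in xyz] for xyz in coords],
    }
    text = json.dumps(payload, sort_keys=True)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class ConformerEntry:
    """One entry per conformer, with free-form metadata attached."""
    symbols: list
    coords: list
    metadata: dict = field(default_factory=dict)

    @property
    def name(self):
        # Arbitrary index name derived from the geometry, not from CMILES.
        return geometry_hash(self.symbols, self.coords)


entry = ConformerEntry(
    symbols=["Fe", "C", "O"],
    coords=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.8], [0.0, 0.0, 2.95]],
    metadata={"source": "CIF", "spin_multiplicity": 1},  # hypothetical keys
)
```

Because the name depends only on elements and coordinates, two conformers of the same complex get different names, which matches the one-record-per-conformer layout described in the notes.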

Computed/stored properties

  • energies

  • forces

  • other properties

    • atomic spin density

    • partial charges (multiple methods)

    • Dipole moment / polarizability

    • orbital energies (+/- 5 molecular orbitals around highest occupied molecular orbital)

    • Allow us to see electronic structure of complexes

      • relative contributions of each atom to each orbital

      • coefficients? - too large!

  • This level of theory was used to compute the following properties: electronic and dispersion energies, HOMO and LUMO energies, HOMO/LUMO gap, dipole moment, and metal center charge, which was derived from NBO

  • Make a new dataset type for these properties, run opt as this dataset
  • Determine the Psi4 keywords needed to obtain these; change the provided output? (Note: tmQM used Gaussian NBO analysis for these, where the output is trivial.)
    OptimizationResultCollection.create_basic_dataset pulls the final geometry from each optimization and then creates the input for QCA single-point (SP) calculations with these properties.
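
The orbital-energy window listed above (+/- 5 molecular orbitals around the HOMO) could be extracted along these lines. This is a sketch that assumes a closed-shell system with orbital energies sorted in ascending order; the function name `frontier_window` and the offset-keyed dictionary are illustrative choices, not an existing API.

```python
def frontier_window(orbital_energies, n_occupied, width=5):
    """Return {offset: energy} for orbitals within `width` of the HOMO.
    Offset 0 is the HOMO, +1 the LUMO, -1 the HOMO-1, and so on.
    Assumes closed-shell, ascending orbital energies."""
    homo = n_occupied - 1
    start = max(0, homo - width)
    stop = min(len(orbital_energies), homo + width + 1)
    return {i - homo: orbital_energies[i] for i in range(start, stop)}


# Example with placeholder energies (not real data): 20 orbitals, 10 occupied.
energies = [float(i) for i in range(20)]
window = frontier_window(energies, n_occupied=10)
gap = window[1] - window[0]  # HOMO/LUMO gap from the same window
```

Storing only this window keeps the record small compared with full orbital coefficients, which the notes flag as too large.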