"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"

Has this been done? [Ask Chodera Lab] Need a variety of metal centers and organic moieties to achieve this; better to have a large swath at a low level of theory.

Decide on Level of Theory

...

  •  Is BP86 / def2-TZVP for primary and B3LYP-D3BJ for final good enough?

Search through existing databases

Use Gemmi to handle dataset

  •  PDB Chemical Component Dictionary (CCD)
    [Brent achived]
    •  download and filter (Brent)
    •  Submit dataset
      figure out what's going on with our PDB dataset; all issues are SCF convergence
  •  tmQM [paper, dataset] (which sources from CSD) - 86K transition metal complexes
    Use this existing dataset to begin testing the NNP strategy for encoding spin state; note that all spin states = 0.
    Currently being worked on by the Chodera Lab; should get into QCArchive at some level
  •  Crystallography Open Database (COD) – CC0 licensed
  •  CSD (Cambridge Structural Database) filtered for stable small molecules, size, elements
    might need to discuss release of subset as open data
    [Repeat of tmQM?] Look for structures neglected by tmQM of interest?
  •  MPtrj: Materials Project Trajectory Dataset
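
The size and element filters mentioned for these databases could look roughly like the sketch below. It assumes each candidate structure has already been reduced to a list of element symbols (for example after reading a CIF file with Gemmi). The function name, the allowed-element sets, and the cutoffs are placeholders for illustration, not decisions recorded in these notes.

```python
# Hypothetical pre-filter: names, element sets, and cutoffs are assumptions.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}
ORGANIC_ELEMENTS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"}


def passes_filter(symbols, max_atoms=100, max_metals=1):
    """Return True if a structure is small, has 1..max_metals metal atoms,
    and contains only allowed elements."""
    # Reject empty or oversized structures.
    if not symbols or len(symbols) > max_atoms:
        return False
    # Require at least one transition metal, but not too many.
    n_metals = sum(1 for s in symbols if s in TRANSITION_METALS)
    if not 1 <= n_metals <= max_metals:
        return False
    # Every atom must be a transition metal or an allowed ligand element.
    return all(s in TRANSITION_METALS or s in ORGANIC_ELEMENTS for s in symbols)
```

For example, passes_filter(["Fe", "C", "O"]) accepts the structure, while a purely organic molecule or one containing an element outside both sets is rejected.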

...

  •  Swap any transition metal for any other transition metal, let it run and see what happens (will get something relevant most of the time).
    Switching within column is almost always okay.
    [Will need in house tmQM for this]
  •  substitutions: H->F, Ph->pMeOPh.
    Consider RDKit's ReplaceSubstructs function (Chem.ReplaceSubstructs)
  •  Use MD to search for transition states that are hard to find with conventional DFT, then minimize for a few steps. Better than rough conformations from RDKit. [Pursued by Chodera Lab]
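
The "switching within column" idea above can be sketched as a simple enumeration over element symbols. This is only an illustration under stated assumptions: the function name is hypothetical, structures are represented as plain lists of element symbols, and coordinates are left untouched, so every generated variant would still need re-optimization.

```python
# Transition-metal columns (groups 3 to 12) of the periodic table.
GROUPS = [
    ["Sc", "Y"], ["Ti", "Zr", "Hf"], ["V", "Nb", "Ta"], ["Cr", "Mo", "W"],
    ["Mn", "Tc", "Re"], ["Fe", "Ru", "Os"], ["Co", "Rh", "Ir"],
    ["Ni", "Pd", "Pt"], ["Cu", "Ag", "Au"], ["Zn", "Cd", "Hg"],
]
# Map each metal to the other metals in its column.
CONGENERS = {m: [o for o in col if o != m] for col in GROUPS for m in col}


def swap_within_column(symbols):
    """Yield copies of `symbols` with one metal replaced by a same-column metal."""
    for i, symbol in enumerate(symbols):
        for new_metal in CONGENERS.get(symbol, []):
            variant = list(symbols)
            variant[i] = new_metal
            yield variant
```

For an iron carbonyl fragment written as ["Fe", "C", "O"], this yields the Ru and Os analogues; a metal-free input yields nothing.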

Conformer search:

  •  take from CIF files (CIF is the new PDB)

CMILES Issue: Organometallics are difficult, not supported

https://chemrxiv.org/engage/chemrxiv/article-details/676050a56dde43c9085b4ccd

  •  Jeff thinks cif -> rdkit -> qca
    - not sure that's necessary because QCArchive has a hash to represent the molecules.
    - Don't have to use CMILES as the name (i.e., index), can be arbitrary.
    - QCArchive doesn't need CMILES. Prepare to pair program with JClark on a cif -> QCA pipeline that bypasses the CMILES-dependent openff molecule structures and compares cif files directly to QCArchive.
  •  RDKit had an organometallic class when assessing implicit hydrogens, reverse engineer an expression?
  •  From 09-05 notes, CI: this is something we’ll need to consider for QCArchive too. It’s one Record per conformer, so we need to be able to associate a metadata record with a record.
    - [QCArchive has a metadata file field to add such information.]
    - Also, the openff Molecule.conformers attribute is a list of pint.Quantity objects holding coordinates and units. However, the list usually seems to contain only the coordinates of the main molecule.
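
To illustrate the point that an index does not have to be a CMILES string, here is a minimal sketch that keys each conformer by a hash of its symbols and rounded coordinates and attaches per-conformer metadata. This is not QCArchive's actual hashing scheme; the function names, the rounding precision, and the metadata fields (source_cif, conformer) are assumptions made for illustration.

```python
import hashlib
import json


def geometry_key(symbols, coords, decimals=4):
    """Hash symbols plus rounded coordinates into a stable hex string."""
    payload = {
        "symbols": list(symbols),
        # Round to absorb numerical noise; adding 0.0 turns -0.0 into 0.0.
        "geometry": [[round(float(x), decimals) + 0.0 for x in xyz] for xyz in coords],
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()


def build_index(entries):
    """Map geometry hash -> metadata, one entry per conformer."""
    index = {}
    for entry in entries:
        key = geometry_key(entry["symbols"], entry["coords"])
        index[key] = {"source_cif": entry["source_cif"], "conformer": entry["conformer"]}
    return index
```

Two conformers of the same complex get different keys, while the same geometry re-read from the same CIF file maps back to the same key, so the metadata record can be recovered without any SMILES-like name.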

Computed/stored properties

...

  •  Make a new dataset type for these properties, run opt as this dataset
  •  Determine the keywords for psi4 to obtain these, change provided output? (Note tmQM used Gaussian NBO analysis for these, where output is trivial.)
    OptimizationResultCollection.create_basic_dataset: pull the final geometry from each optimization, then create input for a QCA single point (SP) with these properties.