...
Has this been done? [Ask Chodera Lab]
Decide on Level of Theory
"Include a small subset (~100 molecules) of small molecules relevant to OpenFF (for distinguishing levels of theory etc below)"
- Is the plan BP86/def2-TZVP for the primary level of theory and B3LYP-D3BJ for the final one?
Search through existing databases
- PDB Chemical Component Dictionary (CCD)
[Brent archived] - tmQM [paper, dataset] (sourced from the CSD) - 86K transition metal complexes
Use this existing dataset to begin testing the NNP strategy for encoding spin state.
[Chodera Lab has it?] - Crystallography Open Database (COD) – CC0 licensed
- CSD (Cambridge Structural Database), filtered for stable small molecules, size, and elements
May need to discuss releasing the filtered subset as open data.
[Repeat of tmQM?] - MPtrj: Materials Project Trajectory Dataset
...
- Metal swaps: substitute any transition metal for any other, let it run, and see what happens (this should give something relevant most of the time).
Switching within a column (periodic-table group) is almost always okay.
[Will need in-house tmQM for this] - substitutions: H->F, Ph->pMeOPh.
Consider RDKit's "ReplaceSubstructs" function for this.
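A minimal sketch of the within-column metal swap noted above, in plain Python with no chemistry toolkit. Everything here is an assumption for illustration: the names `D_BLOCK_GROUPS`, `same_column_metals`, and `column_variants` are hypothetical, a structure is represented as a list of `(symbol, x, y, z)` tuples, and each complex is assumed to contain exactly one transition-metal element. Only the element label is swapped; the geometry is left unchanged, so each variant would still need re-optimization and a decision on charge and spin state.

```python
# Transition metals by periodic-table group (d-block, groups 3-12).
# Lanthanides and actinides are left out of this sketch.
D_BLOCK_GROUPS = {
    3: ["Sc", "Y"],
    4: ["Ti", "Zr", "Hf"],
    5: ["V", "Nb", "Ta"],
    6: ["Cr", "Mo", "W"],
    7: ["Mn", "Tc", "Re"],
    8: ["Fe", "Ru", "Os"],
    9: ["Co", "Rh", "Ir"],
    10: ["Ni", "Pd", "Pt"],
    11: ["Cu", "Ag", "Au"],
    12: ["Zn", "Cd", "Hg"],
}

# Reverse lookup: element symbol -> group number.
GROUP_OF = {sym: grp for grp, syms in D_BLOCK_GROUPS.items() for sym in syms}


def same_column_metals(symbol):
    """Return the other transition metals in the same group as `symbol`."""
    group = GROUP_OF[symbol]
    return [s for s in D_BLOCK_GROUPS[group] if s != symbol]


def column_variants(atoms):
    """Yield (new_metal, new_atoms) for every within-column metal swap.

    `atoms` is a list of (symbol, x, y, z) tuples. The input list is not
    modified; coordinates are copied unchanged into each variant.
    """
    metals = {sym for sym, *_ in atoms if sym in GROUP_OF}
    if len(metals) != 1:
        raise ValueError("expected exactly one transition-metal element")
    (metal,) = metals
    for new_metal in same_column_metals(metal):
        swapped = [
            (new_metal if sym == metal else sym, x, y, z)
            for sym, x, y, z in atoms
        ]
        yield new_metal, swapped


if __name__ == "__main__":
    # Toy Fe-C-O fragment with made-up coordinates, for illustration only.
    fragment = [("Fe", 0.0, 0.0, 0.0), ("C", 1.9, 0.0, 0.0), ("O", 3.05, 0.0, 0.0)]
    for new_metal, new_atoms in column_variants(fragment):
        print(new_metal, new_atoms)
```

Restricting the swap to the same group follows the note that within-column switches are almost always okay; the "any metal for any metal" variant would just iterate over every symbol in `GROUP_OF` instead.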
...