Initial approach, as suggested by stakeholder consensus.

👀 Overview

Summary

Create a QM dataset from existing chemical structure databases, running optimization, torsion-drive, and new electronic-property calculation types. Datasets will be split into two sets: those with metal centers of primary interest and those with metal centers of secondary interest. Datasets of secondary interest will only be run as OPT calculations with GFN2-xTB. Datasets of primary interest will be run at a target level of theory, to be determined in this work, and will have the target electronic properties evaluated with single-point calculations. Datasets of primary interest will contain:

  • Metal centers of primary interest: Pd, Fe, Zn, Mg, Cu, Li

  • Single metal centers

  • Ligand chemical space with organic compound elements: C, H, P, S, O, N, F, Cl, Br

  • Net molecular charges of −1, 0, or +1 e

  • QM output with properties such as: energies, forces, partial charges, multipole moments, spin states

  • High-spin Fe complexes (e.g., total spin up to S = 5/2)
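A first-pass filter for the primary-interest chemical space might look like the sketch below. The function names and the input format (element symbols, net charge, spin multiplicity) are hypothetical and not part of any OpenFF tool; the element sets and charge window come from the list above. The spin check only verifies that the multiplicity 2S + 1 has the right parity for the electron count (e.g., multiplicity 6 for S = 5/2 high-spin Fe(III)); it does not cap the spin or judge whether a spin state is physically reasonable.

```python
"""Hypothetical first-pass filter for the primary-interest chemical space."""

PRIMARY_METALS = {"Pd", "Fe", "Zn", "Mg", "Cu", "Li"}
LIGAND_ELEMENTS = {"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"}
ALLOWED_CHARGES = {-1, 0, 1}

# Atomic numbers for every allowed element, used to count electrons.
ATOMIC_NUMBERS = {
    "H": 1, "Li": 3, "C": 6, "N": 7, "O": 8, "F": 9, "Mg": 12, "P": 15,
    "S": 16, "Cl": 17, "Fe": 26, "Cu": 29, "Zn": 30, "Br": 35, "Pd": 46,
}


def spin_is_consistent(symbols, charge, multiplicity):
    """Check that 2S + 1 matches the parity of the electron count."""
    n_electrons = sum(ATOMIC_NUMBERS[s] for s in symbols) - charge
    n_unpaired = multiplicity - 1  # multiplicity 2S + 1 means 2S unpaired electrons
    return 0 <= n_unpaired <= n_electrons and (n_electrons - n_unpaired) % 2 == 0


def in_primary_space(symbols, charge, multiplicity=1):
    """Return True if a complex matches the primary-interest criteria."""
    # Anything outside the ligand element set is treated as a metal center.
    non_ligand = [s for s in symbols if s not in LIGAND_ELEMENTS]
    return (
        len(non_ligand) == 1                  # exactly one metal center
        and non_ligand[0] in PRIMARY_METALS   # a metal of primary interest
        and charge in ALLOWED_CHARGES         # net charge of -1, 0, or +1
        and spin_is_consistent(symbols, charge, multiplicity)
    )
```

For example, `in_primary_space(["Fe", "Cl", "Cl", "Cl", "Cl"], charge=-1, multiplicity=6)` returns True for high-spin [FeCl4]⁻, while the same call with `multiplicity=1` returns False because 95 electrons cannot all be paired.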

The final model chemistry (level of theory + basis set) must still be determined before production.

These datasets will be run with the following workflow:

  1. Run OPT with GFN2-xTB

  2. Run SP with GFN2-xTB

  3. Run OPT at the final level of theory

  4. Run SP at the final level of theory
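The staging above, together with the rule from the summary that secondary-interest datasets stop after the GFN2-xTB optimization, can be sketched as plain data. The `Stage` class and `stages_for` helper are hypothetical illustrations rather than part of qc-submit, `FINAL_LEVEL` is a placeholder until the model chemistry is chosen, and the way geometries are passed between stages is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Placeholder until the final model chemistry (level of theory + basis set) is chosen.
FINAL_LEVEL = "TBD"


@dataclass(frozen=True)
class Stage:
    calculation: str  # "OPT" (geometry optimization) or "SP" (single point)
    method: str       # level of theory used for this stage


def stages_for(primary_interest: bool) -> list[Stage]:
    """Return the ordered calculation stages for a dataset.

    Assumption: each OPT stage starts from the geometry of the previous OPT
    stage, and each SP stage uses the geometry of the OPT stage before it.
    """
    stages = [Stage("OPT", "GFN2-xTB")]
    if primary_interest:
        # Only primary-interest datasets go beyond the GFN2-xTB optimization.
        stages += [
            Stage("SP", "GFN2-xTB"),
            Stage("OPT", FINAL_LEVEL),
            Stage("SP", FINAL_LEVEL),
        ]
    return stages
```

Keeping the final level in one constant means that, once the model chemistry is chosen (BP86/def2-TZVP is the first candidate in the milestones), only `FINAL_LEVEL` needs to change.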

To achieve this, the standard OpenFF QCA dataset submission pipeline must be adapted in several ways:

  • Address the incompatibility of existing OpenFF infrastructure with organometallic CMILES.

  • Add the ability to import conformers into qc-submit.

  • Create a new dataset type with the relevant properties.

    🚩 Milestones and metrics

    Datasets will be labeled DS#-XXX-#; for example, DS1-CCD-1 denotes dataset 1, taken from the CCD database, with the chemical space of primary interest defined above. The first number and the three-letter database code in the middle are always paired, to avoid confusion between similar database abbreviations such as CCD, COD, and CSD. The last number denotes the chemical space: 1 for the metal centers of primary interest, and 2 for the secondary chemical-space expansion, which is outside the scope of this approach.

    Stage

    Milestone/Benchmark

    Contributors

    Deadline

    Status

    Add ability for conformers to be imported into qc-submit

    Assess the ability to import conformers into qc-submit

    Jennifer A Clark

    Status: Not started

    Resolve qc-submit CMILES incompatibility with organometallic complexes

    Determine if RDKit functionality will perform adequately

    Jennifer A Clark

    Status: Not started

    If RDKit cannot handle CMILES, skip CMILES generation for the CIF-to-QCA interaction

    Jennifer A Clark

    Status: Not started

    If RDKit can handle CMILES, assess a workaround for OpenEye or implement error handling

    Jennifer A Clark

    Status: Not started

    Curate OPT training dataset

    Filter PDB Chemical Component Dictionary (CCD) and submit DS1-CCD-1 and DS1-CCD-2 at BP86 / def2-TZVP (split metal centers of primary and secondary interest)

    Jennifer A Clark, Brent Westbrook

    Jan. 15, 2025

    Status: In progress

    Submit DS1-CCD-1 and DS1-CCD-2 at alternative model chemistries for assessment

    Jennifer A Clark

    Status: Not started

    Choose model chemistry based on DS1-CCD-1 and DS1-CCD-2

    Jennifer A Clark, Lily Wang

    Status: Not started

    Filter the Crystallography Open Database (COD) and submit OPT DS2-COD-1 and DS2-COD-2 at GFN2-xTB

    Jennifer A Clark

    Status: Not started

    Filter the Cambridge Structural Database (CSD) and submit OPT DS3-CSD-1 and DS3-CSD-2 at GFN2-xTB, using structures excluded from tmQM

    Jennifer A Clark

    Status: Not started

    Filter the Materials Project Trajectory dataset (MPtrj) and submit OPT DS4-MPT-1 and DS4-MPT-2 at GFN2-xTB

    Jennifer A Clark

    Status: Not started

    Submit DS2-COD-1 OPT at target model chemistry

    Jennifer A Clark

    Status: Not started

    Submit DS3-CSD-1 OPT at target model chemistry

    Jennifer A Clark

    Status: Not started

    Submit DS4-MPT-1 OPT at target model chemistry

    Jennifer A Clark

    Status: Not started

    Curate electronic properties training dataset

    Define primary and secondary properties of interest

    Jennifer A Clark, Chris Iacovella

    Status: Completed

    Determine and implement the output protocol for primary properties of interest

    Jennifer A Clark

    Status: In progress

    Determine and implement the output protocol for secondary properties of interest

    Jennifer A Clark

    Status: In progress

    Submit DS1-CCD-1 Electronic Property calculation at target model chemistry

    Jennifer A Clark

    Status: Not started

    Submit DS2-COD-1 Electronic Property calculation at target model chemistry

    Jennifer A Clark

    Status: Not started

    Submit DS3-CSD-1 Electronic Property calculation at target model chemistry

    Jennifer A Clark

    Status: Not started

    Submit DS4-MPT-1 Electronic Property calculation at target model chemistry

    Jennifer A Clark

    Status: Not started
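The DS#-XXX-# naming convention is easy to enforce mechanically when datasets are created. The sketch below assumes the number/code pairings used in the milestones (CCD is 1, COD is 2, CSD is 3, MPT is 4) and reads the final number as 1 = primary and 2 = secondary interest, based on the DS1-CCD-1 / DS1-CCD-2 split. `parse_label` and the constant names are hypothetical.

```python
from __future__ import annotations

import re

# Fixed pairing of dataset number and database code, taken from the milestones.
DATABASE_NUMBERS = {"CCD": 1, "COD": 2, "CSD": 3, "MPT": 4}

# Assumed meaning of the final number, read from the DS1-CCD-1 / DS1-CCD-2 split.
CHEMICAL_SPACES = {1: "primary", 2: "secondary"}

LABEL_PATTERN = re.compile(r"DS(\d+)-([A-Z]{3})-(\d+)")


def parse_label(label: str) -> tuple[int, str, str]:
    """Split a DS#-XXX-# label and enforce the number/code pairing."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"malformed dataset label: {label!r}")
    number, code, space = int(match.group(1)), match.group(2), int(match.group(3))
    if code not in DATABASE_NUMBERS:
        raise ValueError(f"unknown database code: {code!r}")
    if DATABASE_NUMBERS[code] != number:
        raise ValueError(f"{code} must be labeled DS{DATABASE_NUMBERS[code]}, not DS{number}")
    if space not in CHEMICAL_SPACES:
        raise ValueError(f"unknown chemical-space number: {space}")
    return number, code, CHEMICAL_SPACES[space]
```

For example, `parse_label("DS1-CCD-1")` returns `(1, "CCD", "primary")`, while `parse_label("DS2-CCD-1")` raises `ValueError` because CCD is always dataset 1.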

    📊 Progress and findings

    Curated data