Excerpt

Initial approach as suggested by stakeholder consensus.

👀 Overview

Summary

Create a QM dataset from existing chemical structure databases by running optimization, torsion-drive, and *new* electronic property calculation types. The datasets will be split into two sets: those with metal centers of primary interest (i.e., Pd, Fe, Zn, Mg, Cu, Li) and those with metal centers of secondary interest (i.e., Rh, Ir, Pt, Ni, Cr, Ag). Datasets of secondary interest will be run only as OPT calculations with GFN2-xTB. Datasets of primary interest will be run at a target level of theory, to be determined in this work, and will have the target electronic properties evaluated. In addition to the metal centers, the datasets will include the organic elements C, H, P, S, O, N, F, Cl, and Br.

To achieve this, the standard OpenFF QCA dataset submission pipeline must be adapted in several ways:

  1. Address the incompatibility between existing OpenFF infrastructure and organometallic CMILES.

  2. Add the ability to import conformers into qc-submit.

  3. Create a new dataset type with the relevant properties.
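
Independent of those pipeline changes, the primary/secondary split described in the summary can be sketched as a simple element filter. This is a minimal illustration, not part of qc-submit: the function name `classify_subset` is hypothetical, and the handling of structures that mix primary and secondary metals (excluded here) is an assumption, since this page does not specify it.

```python
# Element sets copied from the summary on this page.
PRIMARY_METALS = frozenset({"Pd", "Fe", "Zn", "Mg", "Cu", "Li"})
SECONDARY_METALS = frozenset({"Rh", "Ir", "Pt", "Ni", "Cr", "Ag"})
ORGANIC_ELEMENTS = frozenset({"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"})


def classify_subset(elements):
    """Return 1 (primary), 2 (secondary), or None if the structure is filtered out.

    `elements` is any iterable of element symbols found in one structure.
    ASSUMPTION: structures that mix primary and secondary metals, or that
    contain any element outside the three sets above, are excluded.
    """
    symbols = set(elements)
    metals = symbols - ORGANIC_ELEMENTS
    if not metals:
        return None  # purely organic: no metal center of interest
    if metals <= PRIMARY_METALS:
        return 1
    if metals <= SECONDARY_METALS:
        return 2
    return None  # unsupported element, or mixed primary/secondary metals
```

For example, `classify_subset(["Pd", "C", "H", "P"])` returns `1`, while `classify_subset(["Rh", "C", "O"])` returns `2`.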

GitHub link

Status

NOT STARTED / IN PROGRESS / COMPLETED / WON'T PROGRESS

🚩 Milestones and metrics

Datasets will be labeled as DS#-XXX-{1,2}, e.g. DS1-CCD-2, which denotes dataset 1, taken from the CCD database, with the metal centers of secondary interest. The first number and the middle three-letter code are always paired to avoid confusion between similar database abbreviations, e.g. CCD vs COD vs CSD. The last number indicates whether the dataset contains metal centers of primary or secondary interest:

1: primary interest: {Pd, Fe, Zn, Mg, Cu, Li}
2: secondary interest: {Rh, Ir, Pt, Ni, Cr, Ag}
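
The labeling convention above can be checked mechanically. The sketch below is illustrative only: `DatasetLabel`, `parse_label`, and `format_label` are hypothetical names, and the number/code pairs are the ones that appear in the milestones on this page.

```python
import re
from typing import NamedTuple

# Dataset number -> database code, as paired in the milestones on this page.
DATABASE_CODES = {1: "CCD", 2: "COD", 3: "CSD", 4: "MPT"}

_LABEL_RE = re.compile(r"^DS(\d+)-([A-Z]{3})-([12])$")


class DatasetLabel(NamedTuple):
    number: int    # dataset number, e.g. 1
    database: str  # three-letter database code, e.g. "CCD"
    subset: int    # 1 = primary-interest metals, 2 = secondary-interest metals


def parse_label(label: str) -> DatasetLabel:
    """Split a label such as 'DS1-CCD-2' and verify the number/code pairing."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a DS#-XXX-{{1,2}} label: {label!r}")
    number, database, subset = int(match[1]), match[2], int(match[3])
    if DATABASE_CODES.get(number) != database:
        raise ValueError(f"dataset {number} is not paired with code {database!r}")
    return DatasetLabel(number, database, subset)


def format_label(number: int, subset: int) -> str:
    """Build a label from a dataset number and subset (1 or 2)."""
    if number not in DATABASE_CODES or subset not in (1, 2):
        raise ValueError(f"unknown dataset {number} or subset {subset}")
    return f"DS{number}-{DATABASE_CODES[number]}-{subset}"
```

Here `parse_label("DS1-CCD-2")` yields dataset 1, code `CCD`, subset 2, and a mismatched pairing such as `DS1-COD-1` raises a `ValueError`, which is the confusion the paired number and code are meant to prevent.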

March 30, 2025; April 15, 2025; April 30, 2025

Stage

Milestone/Benchmark

Contributors

Deadline

Status

Curate training dataset

Work out best level of theory for the training dataset

Alexandra McIsaac, Lily Wang

November 10, 2024

Status: IN PROGRESS

Compute training dataset

Alexandra McIsaac

December 31, 2024

Add ability for conformers to be imported into qc-submit

Assess ability for conformers to be added into qc-submit

Jennifer A Clark

Status: NOT STARTED

Resolve qc-submit CMILES incompatibility with organometallic complexes

Determine if RDKit functionality will perform adequately

Jennifer A Clark

Status: NOT STARTED

Curate testing dataset

Compile QM dataset

November 30, 2024

If RDKit will not handle CMILES, skip for CIF to QCA interaction

Jennifer A Clark

Status: NOT STARTED

Compute QM dataset

January 31, 2025

If RDKit will handle CMILES, assess a workaround for OpenEye or implement error handling

Jennifer A Clark

Status: NOT STARTED

Compile simulation test set (FreeSolv, maybe non-hydration solvation free energy sets that are harder to reproduce)

April

Curate OPT training dataset

Filter PDB Chemical Component Dictionary (CCD) and submit DS1-CCD-1 and DS1-CCD-2 at BP86 / def2-TZVP

Jennifer A Clark, Brent Westbrook

Jan. 15, 2025

Status: NOT STARTED

Determine best NN architecture

Implement attention-based GNN

Alexandra McIsaac, Brent Westbrook, Lily Wang

December 31, 2024

Submit DS1-CCD-1 and DS1-CCD-2 at alternative model chemistries for assessment

Jennifer A Clark

Status: NOT STARTED

Choose model chemistry based on DS1-CCD-1 and DS1-CCD-2

Jennifer A Clark, Lily Wang

Status: NOT STARTED

Implement bond features in GraphSAGE (?)

Alexandra McIsaac, Brent Westbrook, Lily Wang

December 31, 2024

Filter Crystallography Open Database (COD) and submit OPT DS2-COD-1 and DS2-COD-2 at GFN2-xTB

Jennifer A Clark

Status: NOT STARTED

Filter the Cambridge Structural Database (CSD) for structures neglected by tmQM and submit OPT DS3-CSD-1 and DS3-CSD-2 at GFN2-xTB

Jennifer A Clark

Status: NOT STARTED

Determine best architecture

Alexandra McIsaac, Brent Westbrook, Lily Wang

January 31, 2025

Filter MPtrj (Materials Project Trajectory Dataset) and submit OPT DS4-MPT-1 and DS4-MPT-2 at GFN2-xTB

Jennifer A Clark

Status: NOT STARTED

First pass at NN training

Train using just ESPs, dipoles, quadrupoles

Feb 28, 2025

Submit DS2-COD-1 OPT at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Regularize to RESP charges or MBIS charges if buried atoms are a problem

Submit DS3-CSD-1 OPT at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Train directly to charge model if still having issues

Submit DS4-MPT-1 OPT at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Benchmark 1: QM

Neural network charge model with low testing error on QM data (ESPs, dipoles)

March 15, 2025

Re-train VDW terms

Curate electronic properties training dataset

Define primary and secondary properties of interest

Jennifer A Clark, Chris Iacovella

Status: COMPLETED

Determine output protocol of primary properties of interest and implement

Jennifer A Clark

Status: IN PROGRESS

Determine output protocol of secondary properties of interest and implement

Jennifer A Clark

Status: IN PROGRESS

Submit DS1-CCD-1 Electronic Property calculation at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Re-train valence terms

Submit DS2-COD-1 Electronic Property calculation at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Benchmark 2: Simulation

Neural network charge model with equivalent or better performance to NAGL in simulations

Submit DS3-CSD-1 Electronic Property calculation at target model chemistry

Jennifer A Clark

Status: NOT STARTED

Submit DS4-MPT-1 Electronic Property calculation at target model chemistry

Jennifer A Clark

Status: NOT STARTED

📊 Progress and findings

Curated data (or similar title)