WBO Dataset Design:

Project members: @Jessica Maat @Pavan Behara @Simon Boothroyd @Trevor Gokey @David Mobley @Christopher Bayly

Substituted WBO dataset design:

The goal of these proposed datasets is to explore the effects of conjugation on the Wiberg bond order (WBO) and on torsion barrier height. The molecules probe a range of chemical substituents. Each proposed dataset is listed below by name, SMILES, and visualization.

Each dataset consists of 12 molecules, one per substituent, drawn from the following groups: (1) hydroxy, (2) thiol, (3) carboxylic, (4) primary amine, (5) protonated amine, (6) urea, (7) secondary amine, (8) hydroxyl amine, (9) nitrile, (10) alkene, (11) sulfone, (12) ethoxy, (13) hydroxide.
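The substituent list above can be applied to a scaffold programmatically. The following is a minimal Python sketch, assuming each substituent is written as a SMILES fragment that attaches through its first atom. The `SUBSTITUENTS` table and the `substitute` helper are hypothetical names used only for illustration, and the pairing of group names to fragments is an assumption inferred by position from the SMILES lists further down this page.

```python
# Hypothetical table: substituent name -> SMILES fragment.
# Fragments are read off the SMILES lists on this page; the name/fragment
# pairing is assumed from list position, not stated by the authors.
SUBSTITUENTS = {
    "hydroxy": "O",
    "thiol": "S",
    "carboxylic": "C(=O)O",
    "primary amine": "N",
    "protonated amine": "[N+](C)(C)C",
    "urea": "NC(=O)N",
    "secondary amine": "NC",
    "hydroxyl amine": "NON",
    "nitrile": "C#N",
    "alkene": "C",
    "sulfone": "S(=O)=O",
    "ethoxy": "OCC",
}


def substitute(scaffold, fragments=SUBSTITUENTS):
    """Append each fragment to the open end of a scaffold SMILES string."""
    return {name: scaffold + frag for name, frag in fragments.items()}


# Scaffold for the enyl dataset: the substituent is appended para to the
# propenyl group on the benzene ring.
enyl = substitute("CC=Cc1ccc(cc1)")
for name, smiles in enyl.items():
    print(f"{name:18s} {smiles}")
```

The same helper could be reused with other scaffolds, for example `substitute("C=CC(=O)")` for the conjugated carbonyl series. Plain string concatenation does not canonicalize the result, so a generated string may differ in form from the one listed on this page while still describing the same molecule.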

Summary of proposed datasets:

  1. Substituted enyl dataset

  2. Substituted styrene dataset

  3. Substituted conjugated carbonyl dataset

  4. Substituted conjugated nitrogen datasets

    1. Amide

      1. primary

      2. secondary

      3. tertiary

    2. Carbamate

    3. Urea

 

Future dataset ideas:

  1. Pyridone dataset

    1. 2-pyridone

    2. 4-pyridone

  2. Substituted imidazole dataset

 

Dataset Details:

Substituted enyl dataset:

Smiles: CC=Cc1ccc(cc1)O | CC=Cc1ccc(cc1)S | CC=Cc1ccc(cc1)C(=O)O | CC=Cc1ccc(cc1)N | CC=Cc1ccc(cc1)[N+](C)(C)C | CC=Cc1ccc(cc1)NC(=O)N | CC=Cc1ccc(cc1)NC | CC=Cc1ccc(cc1)NON | CC=Cc1ccc(cc1)C#N | CC=Cc1ccc(cc1)C | CC=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=CC
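The SMILES lists on this page are pipe-separated, so they are easy to split and sanity-check before building a dataset. The following is a minimal Python sketch, assuming the label is spelled "smiles:" in any capitalization; `parse_smiles_line` is a hypothetical helper, not part of any toolkit. It splits the enyl list into entries, counts them, and looks for repeated strings.

```python
def parse_smiles_line(line):
    """Split a 'Smiles: A | B | C' line into a list of SMILES strings."""
    text = line.strip()
    # Drop the leading label if present (case-insensitive match).
    if text.lower().startswith("smiles:"):
        text = text[len("smiles:"):]
    # Entries are separated by '|'; surrounding whitespace is not significant.
    return [entry.strip() for entry in text.split("|") if entry.strip()]


# The enyl dataset line, copied from this page.
ENYL_LINE = (
    "smiles: CC=Cc1ccc(cc1)O | CC=Cc1ccc(cc1)S | CC=Cc1ccc(cc1)C(=O)O | "
    "CC=Cc1ccc(cc1)N | CC=Cc1ccc(cc1)[N+](C)(C)C | CC=Cc1ccc(cc1)NC(=O)N | "
    "CC=Cc1ccc(cc1)NC | CC=Cc1ccc(cc1)NON | CC=Cc1ccc(cc1)C#N | "
    "CC=Cc1ccc(cc1)C | CC=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=CC"
)

entries = parse_smiles_line(ENYL_LINE)
# String-level duplicate check only; it does not compare molecular identity.
duplicates = sorted({s for s in entries if entries.count(s) > 1})
print(len(entries), "entries;", "duplicates:", duplicates)
```

This check compares strings only. Two different SMILES strings can describe the same molecule, so catching true duplicates or invalid entries would require canonicalizing each entry with a cheminformatics toolkit.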

Visualization:

Substituted styrene dataset:

Smiles: C=Cc1ccc(cc1)O | C=Cc1ccc(cc1)S | C=Cc1ccc(cc1)C(=O)O | C=Cc1ccc(cc1)N | C[N+](C)(C)c1ccc(cc1)C=C | C=Cc1ccc(cc1)NC(=O)N | CNc1ccc(cc1)C=C | C=Cc1ccc(cc1)NON | C=Cc1ccc(cc1)C#N | Cc1ccc(cc1)C=C | C=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=C

Visualization:

 

Substituted conjugated carbonyl groups:

Smiles: C=CC(=O)O | C=CC(=O)S | C=CC(=O)C(=O)O | C=CC(=O)N | C[N+](C)(C)C(=O)C=C | C=CC(=O)NC(=O)N | CNC(=O)C=C | C=CC(=O)NON | C=CC(=O)C#N | CCC(=O)C=C | C=CC(=O)S(=O)=O | CCOC(=O)C=C

Visualization:

 

Substituted conjugated nitrogen groups:

a) amide:

Primary:

Smiles: C(=O)(N)O | C(=O)(N)S | C(=O)(C(=O)O)N | C(=O)(N)N | C[N+](C)(C)C(=O)N | C(=O)(N)NC(=O)N | CNC(=O)N(C)C | C(=O)(N)NON | C(#N)C(=O)N | CCC(=O)N | CN(C)C(=O)S(=O)=O | CCOC(=O)N

Visualization:

 

Secondary:

Smiles: CNC(=O)O | CNC(=O)S | CNC(=O)C(=O)O | CNC(=O)N | CNC(=O)[N+](C)(C)C | CNC(=O)NC(=O)N | CNC(=O)NC | CNC(=O)NON | CNC(=O)C#N | CCC(=O)NC | CNC(=O)S(=O)=O | CCOC(=O)NC

Visualization:

 

Tertiary:

Smiles: CN(C)C(=O)O | CN(C)C(=O)S | CN(C)C(=O)C(=O)O | CN(C)C(=O)N | CN(C)C(=O)[N+](C)(C)C | CN(C)C(=O)NC(=O)N | CNC(=O)N(C)C | CN(C)C(=O)NON | CN(C)C(=O)C#N | CCC(=O)N(C)C | CN(C)C(=O)S(=O)=O | CCOC(=O)N(C)C

Visualization:

 

b) carbamate:

Smiles: CN(C)C(=O)OO | CN(C)C(=O)OS | CN(C)C(=O)OC(=O)O | CN(C)C(=O)ON | CN(C)C(=O)O[N+](C)(C)C | CN(C)C(=O)ONC(=O)N | CNOC(=O)N(C)C | CN(C)C(=O)ONON | CN(C)C(=O)OC#N | CCOC(=O)N(C)C | CN(C)C(=O)OS(=O)=O | CCOOC(=O)N(C)C

Visualization:

c) urea:

Smiles: CN(C)C(=O)NO | CN(C)C(=O)NS | CN(C)C(=O)NC(=O)O | CN(C)C(=O)NN | CN(C)C(=O)N[N+](C)(C)C | CN(C)C(=O)NNC(=O)N | CNNC(=O)N(C)C | CN(C)C(=O)NNON | CN(C)C(=O)NC#N | CCNC(=O)N(C)C | CN(C)C(=O)NS(=O)=O | CCONC(=O)N(C)C

Visualization: