
WBO Dataset Design:

Updated Feb 03, 2021


Project members: @Jessica Maat @Pavan Behara @Simon Boothroyd @Trevor Gokey @David Mobley @Christopher Bayly

Substituted WBO dataset design:

The goal of these proposed datasets is to explore the effects of conjugation on Wiberg bond order (WBO) and torsion barrier height. The molecules probe a range of substituent chemistries. The proposed datasets are organized by name, SMILES, and visualization.

Each dataset consists of 12 molecules, with substituents drawn from the following groups: (1) hydroxy, (2) thiol, (3) carboxylic acid, (4) primary amine, (5) protonated amine, (6) urea, (7) secondary amine, (8) hydroxylamine, (9) nitrile, (10) alkene, (11) sulfone, (12) ethoxy, (13) hydroxide.

Summary of proposed datasets:

Substituted enyl dataset
Substituted styrene dataset
Substituted conjugated carbonyl dataset
Substituted conjugated nitrogen datasets
1. Amide
  1. primary
  2. secondary
  3. tertiary
2. Carbamate
3. Urea

Future dataset ideas:

Pyridone dataset
1. 2-pyridone
2. 4-pyridone
Substituted imidazole dataset

Dataset Details:

Substituted enyl dataset:

Smiles: CC=Cc1ccc(cc1)O | CC=Cc1ccc(cc1)S | CC=Cc1ccc(cc1)C(=O)O | CC=Cc1ccc(cc1)N | CC=Cc1ccc(cc1)[N+](C)(C)C | CC=Cc1ccc(cc1)NC(=O)N | CC=Cc1ccc(cc1)NC | CC=Cc1ccc(cc1)NON | CC=Cc1ccc(cc1)C#N | CC=Cc1ccc(cc1)C | CC=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=CC
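
The enyl series above can be read as one fixed scaffold with a substituent appended at the para position. The following is a minimal sketch of that construction in plain Python. The names `ENYL_SCAFFOLD`, `SUBSTITUENT_FRAGMENTS`, and `build_series` are hypothetical, and the fragments are simply read off the SMILES line above; this is not necessarily how the list was originally generated.

```python
# Scaffold and substituent fragments read off the enyl SMILES line above.
# Treating each molecule as "scaffold + fragment" is an assumption of this sketch.
ENYL_SCAFFOLD = "CC=Cc1ccc(cc1)"

SUBSTITUENT_FRAGMENTS = [
    "O", "S", "C(=O)O", "N", "[N+](C)(C)C", "NC(=O)N",
    "NC", "NON", "C#N", "C", "S(=O)=O", "OCC",
]


def build_series(scaffold, fragments):
    """Append each substituent fragment to the open position of the scaffold."""
    return [scaffold + fragment for fragment in fragments]


enyl_series = build_series(ENYL_SCAFFOLD, SUBSTITUENT_FRAGMENTS)
for smiles in enyl_series:
    print(smiles)
```

The first eleven generated strings match the listed SMILES character for character. The last one comes out as `CC=Cc1ccc(cc1)OCC`, which is the same ethoxy-substituted molecule as the listed `CCOc1ccc(cc1)C=CC`, written from the other end.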

Visualization:

Substituted styrene dataset:

Smiles: C=Cc1ccc(cc1)O | C=Cc1ccc(cc1)S | C=Cc1ccc(cc1)C(=O)O | C=Cc1ccc(cc1)N | C[N+](C)(C)c1ccc(cc1)C=C | C=Cc1ccc(cc1)NC(=O)N | CNc1ccc(cc1)C=C | C=Cc1ccc(cc1)NON | C=Cc1ccc(cc1)C#N | Cc1ccc(cc1)C=C | C=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=C
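
Because each dataset is given as a single pipe-separated line, a quick sanity check before submitting it can be useful. The sketch below splits the styrene line into individual SMILES, confirms the expected count of 12, and checks that every entry contains the vinyl and para-phenylene motifs. The helper names `parse_smiles_line` and `missing_motifs` are hypothetical, and the motif check is a plain substring search, not real substructure matching, so it only works because every entry happens to spell the motifs the same way.

```python
# SMILES line copied from the styrene dataset above.
STYRENE_LINE = (
    "C=Cc1ccc(cc1)O | C=Cc1ccc(cc1)S | C=Cc1ccc(cc1)C(=O)O | C=Cc1ccc(cc1)N | "
    "C[N+](C)(C)c1ccc(cc1)C=C | C=Cc1ccc(cc1)NC(=O)N | CNc1ccc(cc1)C=C | "
    "C=Cc1ccc(cc1)NON | C=Cc1ccc(cc1)C#N | Cc1ccc(cc1)C=C | "
    "C=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=C"
)

# Text motifs every styrene entry is expected to contain (assumption of this sketch).
REQUIRED_MOTIFS = ["C=C", "c1ccc(cc1)"]


def parse_smiles_line(line):
    """Split a pipe-separated SMILES line into a list of stripped entries."""
    return [entry.strip() for entry in line.split("|") if entry.strip()]


def missing_motifs(smiles_list, motifs):
    """Return {smiles: [motifs not found]} for entries lacking any motif."""
    problems = {}
    for smiles in smiles_list:
        absent = [motif for motif in motifs if motif not in smiles]
        if absent:
            problems[smiles] = absent
    return problems


styrene = parse_smiles_line(STYRENE_LINE)
problems = missing_motifs(styrene, REQUIRED_MOTIFS)
print(len(styrene), "molecules;", len(problems), "entries missing a motif")
```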

Visualization:

Substituted conjugated carbonyl groups:

Smiles: C=CC(=O)O | C=CC(=O)S | C=CC(=O)C(=O)O | C=CC(=O)N | C[N+](C)(C)C(=O)C=C | C=CC(=O)NC(=O)N | CNC(=O)C=C | C=CC(=O)NON | C=CC(=O)C#N | CCC(=O)C=C | C=CC(=O)S(=O)=O | CCOC(=O)C=C

Visualization:

Substituted conjugated nitrogen groups:

a) amide:

Primary:

Smiles: C(=O)(N)O | C(=O)(N)S | C(=O)(C(=O)O)N | C(=O)(N)N | C[N+](C)(C)C(=O)N | C(=O)(N)NC(=O)N | CNC(=O)N(C)C | C(=O)(N)NON | C(#N)C(=O)N | CCC(=O)N | CN(C)C(=O)S(=O)=O | CCOC(=O)N

Visualization:

Secondary:

Smiles: CNC(=O)O | CNC(=O)S | CNC(=O)C(=O)O | CNC(=O)N | CNC(=O)[N+](C)(C)C | CNC(=O)NC(=O)N | CNC(=O)NC | CNC(=O)NON | CNC(=O)C#N | CCC(=O)NC | CNC(=O)S(=O)=O | CCOC(=O)NC

Visualization:

Tertiary:

Smiles: CN(C)C(=O)O | CN(C)C(=O)S | CN(C)C(=O)C(=O)O | CN(C)C(=O)N | CN(C)C(=O)[N+](C)(C)C | CN(C)C(=O)NC(=O)N | CNC(=O)N(C)C | CN(C)C(=O)NON | CN(C)C(=O)C#N | CCC(=O)N(C)C | CN(C)C(=O)S(=O)=O | CCOC(=O)N(C)C

Visualization:

b) carbamate:

Smiles: CN(C)C(=O)OO | CN(C)C(=O)OS | CN(C)C(=O)OC(=O)O | CN(C)C(=O)ON | CN(C)C(=O)O[N+](C)(C)C | CN(C)C(=O)ONC(=O)N | CNOC(=O)N(C)C | CN(C)C(=O)ONON | CN(C)C(=O)OC#N | CCOC(=O)N(C)C | CN(C)C(=O)OS(=O)=O | CCOOC(=O)N(C)C
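
Since the nitrogen datasets share scaffolds and substituents, the same molecule can end up in more than one list. The sketch below flags SMILES strings that appear in more than one of the primary amide, tertiary amide, and carbamate lists above. The dataset labels and the helper `find_shared_smiles` are hypothetical names for this example. The comparison is on exact strings only, so the same molecule written with two different SMILES spellings would not be caught; that would need canonicalization with a cheminformatics toolkit.

```python
from collections import defaultdict

# SMILES lines copied from the lists above; the dataset labels are this sketch's own.
DATASETS = {
    "primary amide": (
        "C(=O)(N)O | C(=O)(N)S | C(=O)(C(=O)O)N | C(=O)(N)N | C[N+](C)(C)C(=O)N | "
        "C(=O)(N)NC(=O)N | CNC(=O)N(C)C | C(=O)(N)NON | C(#N)C(=O)N | CCC(=O)N | "
        "CN(C)C(=O)S(=O)=O | CCOC(=O)N"
    ),
    "tertiary amide": (
        "CN(C)C(=O)O | CN(C)C(=O)S | CN(C)C(=O)C(=O)O | CN(C)C(=O)N | "
        "CN(C)C(=O)[N+](C)(C)C | CN(C)C(=O)NC(=O)N | CNC(=O)N(C)C | CN(C)C(=O)NON | "
        "CN(C)C(=O)C#N | CCC(=O)N(C)C | CN(C)C(=O)S(=O)=O | CCOC(=O)N(C)C"
    ),
    "carbamate": (
        "CN(C)C(=O)OO | CN(C)C(=O)OS | CN(C)C(=O)OC(=O)O | CN(C)C(=O)ON | "
        "CN(C)C(=O)O[N+](C)(C)C | CN(C)C(=O)ONC(=O)N | CNOC(=O)N(C)C | "
        "CN(C)C(=O)ONON | CN(C)C(=O)OC#N | CCOC(=O)N(C)C | CN(C)C(=O)OS(=O)=O | "
        "CCOOC(=O)N(C)C"
    ),
}


def find_shared_smiles(datasets):
    """Map each SMILES string to the datasets it appears in, keeping only repeats."""
    seen = defaultdict(list)
    for name, line in datasets.items():
        for smiles in (entry.strip() for entry in line.split("|")):
            if smiles and name not in seen[smiles]:
                seen[smiles].append(name)
    return {smiles: names for smiles, names in seen.items() if len(names) > 1}


shared = find_shared_smiles(DATASETS)
for smiles, names in shared.items():
    print(f"{smiles}: {', '.join(names)}")
```

For the three lists as written, the check reports `CNC(=O)N(C)C` and `CN(C)C(=O)S(=O)=O` in both the primary and tertiary amide lists, and `CCOC(=O)N(C)C` in both the tertiary amide and carbamate lists.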

Visualization:

c) urea:

Smiles: CN(C)C(=O)NO | CN(C)C(=O)NS | CN(C)C(=O)NC(=O)O | CN(C)C(=O)NN | CN(C)C(=O)N[N+](C)(C)C | CN(C)C(=O)NNC(=O)N | CNNC(=O)N(C)C | CN(C)C(=O)NNON | CN(C)C(=O)NC#N | CCNC(=O)N(C)C | CN(C)C(=O)NS(=O)=O | CCONC(=O)N(C)C

Visualization: