WBO Dataset Design:
Project members: Jessica Maat, Pavan Behara, Simon Boothroyd, Trevor Gokey, David Mobley, Christopher Bayly
Substituted WBO dataset design:
The goal of these proposed datasets is to explore the effects of conjugation on the Wiberg bond order (WBO) and on torsion barrier height. The molecules probe a range of substituent groups. Each proposed dataset is listed with its name, SMILES, and a visualization.
Each dataset consists of 12 molecules, which substitute the following groups: (1) hydroxy, (2) thiol, (3) carboxylic acid, (4) primary amine, (5) protonated amine, (6) urea, (7) secondary amine, (8) hydroxylamine, (9) nitrile, (10) alkene, (11) sulfone, (12) ethoxy, (13) hydroxide.
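One way to read the SMILES lists under Dataset Details is as a fixed scaffold with a substituent fragment appended at the position para to the alkene. The sketch below assumes that reading. The fragment list is taken from the enyl SMILES list, in list order, and the names `FRAGMENTS`, `SCAFFOLDS`, and `build_dataset` are illustrative rather than part of any existing tool. Some entries on this page are written in a different atom order (the ethoxy entry, for example), so the generated strings describe the same molecules but do not always match character for character.

```python
# Substituent fragments in the order they appear in the enyl SMILES list.
# Treating them as suffixes of a scaffold is an assumption made for this sketch.
FRAGMENTS = [
    "O", "S", "C(=O)O", "N", "[N+](C)(C)C", "NC(=O)N",
    "NC", "NON", "C#N", "C", "S(=O)=O", "OCC",
]

# Scaffold prefixes; the appended fragment bonds to the ring atom
# para to the alkene (the atom written just before the "(cc1)" branch).
SCAFFOLDS = {
    "enyl": "CC=Cc1ccc(cc1)",
    "styrene": "C=Cc1ccc(cc1)",
}


def build_dataset(scaffold: str) -> list[str]:
    """Return one SMILES string per fragment by plain string concatenation."""
    return [scaffold + fragment for fragment in FRAGMENTS]


enyl = build_dataset(SCAFFOLDS["enyl"])
styrene = build_dataset(SCAFFOLDS["styrene"])

# Print in the same pipe-delimited layout used on this page.
print(" | ".join(enyl))
print(" | ".join(styrene))
```

String concatenation keeps the example free of dependencies. A cheminformatics toolkit would be needed to confirm that two differently written SMILES strings denote the same molecule.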
Summary of Proposed datasets:
Substituted enyl dataset
Substituted styrene dataset
Substituted conjugated carbonyl dataset
Substituted conjugated nitrogen datasets
    Amide
        primary
        secondary
        tertiary
    Carbamate
    Urea
Future dataset ideas:
Pyridone dataset
    2-pyridone
    4-pyridone
Substituted imidazole dataset
Dataset Details:
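Each dataset in this section is given as a single line of SMILES strings separated by " | ". A minimal sketch for splitting such a line and running two simple sanity checks (entry count and entries shared between two lists) is shown here. The helper name `parse_smiles_line` is hypothetical, and the two input strings are copied from the primary and tertiary amide lists on this page. The comparison is purely textual: it finds identical strings only, not the same molecule written in a different atom order, and it does not say whether a shared entry is intended.

```python
def parse_smiles_line(line: str) -> list[str]:
    """Split a pipe-delimited SMILES line into a list of stripped entries."""
    return [entry.strip() for entry in line.split("|") if entry.strip()]


# Copied from the primary and tertiary amide lists on this page.
primary = parse_smiles_line(
    "C(=O)(N)O | C(=O)(N)S | C(=O)(C(=O)O)N | C(=O)(N)N | "
    "C[N+](C)(C)C(=O)N | C(=O)(N)NC(=O)N | CNC(=O)N(C)C | C(=O)(N)NON | "
    "C(#N)C(=O)N | CCC(=O)N | CN(C)C(=O)S(=O)=O | CCOC(=O)N"
)
tertiary = parse_smiles_line(
    "CN(C)C(=O)O | CN(C)C(=O)S | CN(C)C(=O)C(=O)O | CN(C)C(=O)N | "
    "CN(C)C(=O)[N+](C)(C)C | CN(C)C(=O)NC(=O)N | CNC(=O)N(C)C | "
    "CN(C)C(=O)NON | CN(C)C(=O)C#N | CCC(=O)N(C)C | CN(C)C(=O)S(=O)=O | "
    "CCOC(=O)N(C)C"
)

# Check 1: every list on this page is expected to hold 12 entries.
for name, entries in (("primary", primary), ("tertiary", tertiary)):
    print(name, len(entries))

# Check 2: strings that occur verbatim in both lists.
shared = sorted(set(primary) & set(tertiary))
print("shared:", shared)
```

Entries reported as shared are worth a second look before submission, since a molecule that appears in two datasets would be computed twice.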
Substituted enyl dataset:
Smiles: CC=Cc1ccc(cc1)O | CC=Cc1ccc(cc1)S | CC=Cc1ccc(cc1)C(=O)O | CC=Cc1ccc(cc1)N | CC=Cc1ccc(cc1)[N+](C)(C)C | CC=Cc1ccc(cc1)NC(=O)N | CC=Cc1ccc(cc1)NC | CC=Cc1ccc(cc1)NON | CC=Cc1ccc(cc1)C#N | CC=Cc1ccc(cc1)C | CC=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=CC
Visualization:
Substituted styrene dataset:
Smiles: C=Cc1ccc(cc1)O | C=Cc1ccc(cc1)S | C=Cc1ccc(cc1)C(=O)O | C=Cc1ccc(cc1)N | C[N+](C)(C)c1ccc(cc1)C=C | C=Cc1ccc(cc1)NC(=O)N | CNc1ccc(cc1)C=C | C=Cc1ccc(cc1)NON | C=Cc1ccc(cc1)C#N | Cc1ccc(cc1)C=C | C=Cc1ccc(cc1)S(=O)=O | CCOc1ccc(cc1)C=C
Visualization:
Substituted conjugated carbonyl groups:
Smiles: C=CC(=O)O | C=CC(=O)S | C=CC(=O)C(=O)O | C=CC(=O)N | C[N+](C)(C)C(=O)C=C | C=CC(=O)NC(=O)N | CNC(=O)C=C | C=CC(=O)NON | C=CC(=O)C#N | CCC(=O)C=C | C=CC(=O)S(=O)=O | CCOC(=O)C=C
Visualization:
Substituted conjugated nitrogen groups:
a) amide:
Primary:
Smiles: C(=O)(N)O | C(=O)(N)S | C(=O)(C(=O)O)N | C(=O)(N)N | C[N+](C)(C)C(=O)N | C(=O)(N)NC(=O)N | CNC(=O)N(C)C | C(=O)(N)NON | C(#N)C(=O)N | CCC(=O)N | CN(C)C(=O)S(=O)=O | CCOC(=O)N
Visualization:
Secondary:
Smiles: CNC(=O)O | CNC(=O)S | CNC(=O)C(=O)O | CNC(=O)N | CNC(=O)[N+](C)(C)C | CNC(=O)NC(=O)N | CNC(=O)NC | CNC(=O)NON | CNC(=O)C#N | CCC(=O)NC | CNC(=O)S(=O)=O | CCOC(=O)NC
Visualization:
Tertiary:
Smiles: CN(C)C(=O)O | CN(C)C(=O)S | CN(C)C(=O)C(=O)O | CN(C)C(=O)N | CN(C)C(=O)[N+](C)(C)C | CN(C)C(=O)NC(=O)N | CNC(=O)N(C)C | CN(C)C(=O)NON | CN(C)C(=O)C#N | CCC(=O)N(C)C | CN(C)C(=O)S(=O)=O | CCOC(=O)N(C)C
Visualization:
b) carbamate:
Smiles: CN(C)C(=O)OO | CN(C)C(=O)OS | CN(C)C(=O)OC(=O)O | CN(C)C(=O)ON | CN(C)C(=O)O[N+](C)(C)C | CN(C)C(=O)ONC(=O)N | CNOC(=O)N(C)C | CN(C)C(=O)ONON | CN(C)C(=O)OC#N | CCOC(=O)N(C)C | CN(C)C(=O)OS(=O)=O | CCOOC(=O)N(C)C
Visualization:
c) urea:
Smiles: CN(C)C(=O)NO | CN(C)C(=O)NS | CN(C)C(=O)NC(=O)O | CN(C)C(=O)NN | CN(C)C(=O)N[N+](C)(C)C | CN(C)C(=O)NNC(=O)N | CNNC(=O)N(C)C | CN(C)C(=O)NNON | CN(C)C(=O)NC#N | CCNC(=O)N(C)C | CN(C)C(=O)NS(=O)=O | CCONC(=O)N(C)C
Visualization: