Goal

Generation of a “simple molecule” torsiondrive dataset

TODO (2021-04-01)

Approach 2. Substituent + substituent → new molecule

Test generation of a molecule set using the Roche set only

1. Using the concept of a scaffold, generate a list of substituents and filter out complex ones.


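            # mols   filter1 (harsh filter)   filter2
Roche       468      106                      139

filter1 (harsh filter): # rot bonds = 0 and # rings <= 1

filter2: (# rot bonds = 0 and # rings = 1) or (# rot bonds < 2 and # rings = 0)

The filter columns give the number of substituents that pass each filter.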

2. From the filtered list of substituents, combine two substituents into a molecule.

* Note that the only substituted phenyls in the list generated from the Roche set are ortho-substituted. → “will remove phenyls with ortho substituents and add phenyls with meta/para substituents”
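A minimal sketch of the two Approach 2 steps, assuming each substituent has already been reduced to a record with precomputed rotatable-bond and ring counts (computing those from structures is not shown). The `Substituent` record, the function names, and the descriptor values are hypothetical. `harsh_filter` and `filter2` restate the criteria from the table, `pair_substituents` enumerates unordered pairs (pairing a substituent with itself is an assumption), and `ring_relation` shows how the ortho/meta/para distinction in the note follows from positions on a six-membered ring.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class Substituent:
    # Hypothetical record; descriptor counts are assumed to be precomputed.
    name: str
    n_rot_bonds: int
    n_rings: int


def harsh_filter(s: Substituent) -> bool:
    # filter1 (harsh): # rot bonds = 0 and # rings <= 1
    return s.n_rot_bonds == 0 and s.n_rings <= 1


def filter2(s: Substituent) -> bool:
    # filter2: (# rot bonds = 0 and # rings = 1) or (# rot bonds < 2 and # rings = 0)
    return (s.n_rot_bonds == 0 and s.n_rings == 1) or (
        s.n_rot_bonds < 2 and s.n_rings == 0
    )


def pair_substituents(subs):
    # Unordered pairs, self-pairs included (assumption):
    # n substituents give n * (n + 1) / 2 pairs.
    return list(combinations_with_replacement(subs, 2))


def ring_relation(attach_pos: int, other_pos: int) -> str:
    # Benzene ring positions 1-6; the shortest distance around the ring
    # is 1 for ortho, 2 for meta, 3 for para.
    if not (1 <= attach_pos <= 6 and 1 <= other_pos <= 6):
        raise ValueError("ring positions must be 1-6")
    d = abs(attach_pos - other_pos)
    d = min(d, 6 - d)
    if d == 0:
        raise ValueError("positions must differ")
    return ("ortho", "meta", "para")[d - 1]


# Illustrative descriptor values, not computed from real structures.
subs = [
    Substituent("phenyl", 0, 1),
    Substituent("methoxy", 1, 0),
    Substituent("n-butyl", 2, 0),
]
kept = [s for s in subs if filter2(s)]
pairs = pair_substituents(kept)
print([s.name for s in kept])  # ['phenyl', 'methoxy']
print([(a.name, b.name) for a, b in pairs])
# [('phenyl', 'phenyl'), ('phenyl', 'methoxy'), ('methoxy', 'methoxy')]
print(ring_relation(1, 2), ring_relation(1, 3), ring_relation(1, 4))  # ortho meta para
```

Under this pairing scheme, the Roche counts in the table would give 106 × 107 / 2 = 5671 pairs after the harsh filter and 139 × 140 / 2 = 9730 after filter2, before any check that a pair forms a sensible molecule.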


Approach 1. Using the Constructure enumeration method

Combination of a set of substructures and a scaffold → new molecule 
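Constructure's own enumeration interface is not described in these notes, so the sketch below does not reproduce it; it only illustrates the scaffold + substituents → new molecule idea with plain string templating on SMILES. The `[*:k]` placeholder convention, the function names, and the example fragments are assumptions for illustration. Outputs are neither validated nor canonicalized, so symmetric duplicates (e.g. R1 = methyl and R2 = phenyl on a ketone versus the reverse) are not merged; that would need a cheminformatics toolkit.

```python
import re
from itertools import product

# Hypothetical placeholder convention: "[*:k]" marks R group k in a scaffold.
PLACEHOLDER = re.compile(r"\[\*:(\d+)\]")


def r_group_labels(scaffold: str) -> list[int]:
    # Sorted, de-duplicated R-group numbers that appear in the scaffold.
    return sorted({int(k) for k in PLACEHOLDER.findall(scaffold)})


def enumerate_scaffold(scaffold: str, substituents: list[str]) -> list[str]:
    # Fill every R-group slot with every substituent: n ** k raw strings
    # for n substituents and k R groups. Assumes each substituent SMILES
    # starts with its attachment atom, each placeholder sits in a branch or
    # at the end of the scaffold, and the substituent's ring-closure digits
    # are not still open in the scaffold at that point.
    labels = r_group_labels(scaffold)
    results = []
    for choice in product(substituents, repeat=len(labels)):
        smi = scaffold
        for label, sub in zip(labels, choice):
            smi = smi.replace(f"[*:{label}]", sub)
        results.append(smi)
    return results


substituents = ["C", "CC", "c1ccccc1"]
print(enumerate_scaffold("O=C[*:1]", substituents))
# ['O=CC', 'O=CCC', 'O=Cc1ccccc1']
print(len(enumerate_scaffold("O=C([*:1])[*:2]", substituents)))  # 9
```

The aldehyde and ketone templates correspond to the 1- and 2-R-group scaffold classes listed in step 3; with n substituents, a scaffold with k R groups yields n^k raw combinations, so the number of R groups dominates the size of the enumerated set.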

1. Scaffolds: used 151 scaffolds obtained from Constructure

2. Substituents (functional groups)

: To obtain a reasonable list of substituents, I generated it from existing molecule sets (Roche set, Pfizer discrepancy set, and eMolecules discrepancy set).

(1) Generation of the list of substituents

harsh filter: # of rotatable bonds = 0, # of rings <= 1


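            # mols   filter1   harsh filter
Roche       468      271       106
Pfizer      100      102       57
eMolecules  2904     2343      748

filter1: # rot bonds <= 1 and # rings <= 1

harsh filter: # rot bonds = 0 and # rings <= 1

The filter columns give the number of substituents that pass each filter.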

3. Enumerate combinations using the Constructure tool

(1) scaffolds w/ 1 r_group (54, such as aldehyde, alcohol, ...)

(2) scaffolds w/ 2 r_groups (37, such as ketone, oxime, ...)

(3) scaffolds w/ # of r_groups > 2
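The substituent-list generation in step 2 pools substituents from the three sets and then filters them. A minimal sketch, assuming each substituent is already a record holding a canonical SMILES (so the same substituent from different sets compares equal) plus precomputed rotatable-bond and ring counts; the record type, function names, and values below are hypothetical. Because a substituent with 0 rotatable bonds also has at most 1, everything that passes the harsh filter also passes filter1, consistent with each harsh-filter count in the table being no larger than the corresponding filter1 count (106 ≤ 271, 57 ≤ 102, 748 ≤ 2343).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    # Hypothetical record; smiles is assumed canonical so that the same
    # substituent coming from different molecule sets compares equal.
    smiles: str
    n_rot_bonds: int
    n_rings: int


def filter1(s: Substituent) -> bool:
    # filter1: # rot bonds <= 1 and # rings <= 1
    return s.n_rot_bonds <= 1 and s.n_rings <= 1


def harsh_filter(s: Substituent) -> bool:
    # harsh filter: # rot bonds = 0 and # rings <= 1
    return s.n_rot_bonds == 0 and s.n_rings <= 1


def pool_substituents(*sets: list[Substituent]) -> list[Substituent]:
    # Merge lists from several sets, keeping the first record per SMILES.
    seen: dict[str, Substituent] = {}
    for subs in sets:
        for s in subs:
            seen.setdefault(s.smiles, s)
    return list(seen.values())


def tally(subs: list[Substituent]) -> dict[str, int]:
    # Number of substituents passing each filter.
    return {
        "filter1": sum(filter1(s) for s in subs),
        "harsh": sum(harsh_filter(s) for s in subs),
    }


# Illustrative descriptor values, not computed from the real sets.
roche = [Substituent("c1ccccc1", 0, 1), Substituent("OC", 1, 0)]
pfizer = [Substituent("c1ccccc1", 0, 1), Substituent("CCCC", 2, 0)]
pooled = pool_substituents(roche, pfizer)
print(len(pooled), tally(pooled))  # 3 {'filter1': 2, 'harsh': 1}
```

Pooling before filtering means a substituent shared by several sets is counted once, so pooled totals cannot be read off the per-set table rows, whose overlaps are not reported.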