...

...

Goal

Generation of a torsion scan dataset containing only simple molecules

  • The contamination of torsion parameters by large internal non-bonded interactions has been raised repeatedly in past discussions.

  • Because of the nature of the Roche set, many molecules used to generate the current torsion parameter training set contain a phenyl group with an ortho substituent, and ortho substituents cause large steric hindrance. In addition, molecular complexity was not considered when the current training dataset was designed.

  • A new torsion parameter training set that excludes (1) complex molecules and (2) molecules with high steric hindrance is therefore needed.

Scheme

(1) For each molecule in an input molecule set, use its scaffold to identify substituents.
(2) List all substituents.
(3) Filter out complex substituents (by checking the number of rotatable bonds and the number of rings).
(4) In the enumeration stage, instead of enumerating molecules by attaching substituents to a scaffold, combine two substituents into one molecule. The bond formed during the combination becomes the central bond, which is rotated during the torsion scan.

...

[1*]c1ccccc1 + [*:1]Nc1ccccc1 → c1cc[c:1](cc1)[NH:2]c2ccccc2
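The combination step in (4) can be pictured as a graph operation. The sketch below is a minimal illustration under stated assumptions: each substituent is reduced to a hypothetical `Fragment` (atom symbols, a bond list, and the index of the atom that carried the dummy atom `[*]`), and bond orders, hydrogens, and aromaticity are ignored. Joining two fragments appends one new bond and records it as the central bond to drive in the torsion scan. A real workflow would more likely operate on RDKit molecules (for instance with `Chem.molzip`, which joins fragments at matching dummy atoms).

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical minimal substituent: atom symbols, bonds, one attachment atom."""
    atoms: list[str]
    bonds: list[tuple[int, int]]
    attach: int  # index of the atom that was bonded to the dummy atom [*]


@dataclass
class Combined:
    atoms: list[str]
    bonds: list[tuple[int, int]]
    central_bond: tuple[int, int]  # the bond to rotate in the torsion scan


def combine(a: Fragment, b: Fragment) -> Combined:
    """Join two substituents at their attachment atoms; the new bond is the central bond."""
    offset = len(a.atoms)
    atoms = a.atoms + b.atoms
    # Shift B's atom indices so they follow A's atoms.
    bonds = a.bonds + [(i + offset, j + offset) for i, j in b.bonds]
    central = (a.attach, b.attach + offset)
    bonds.append(central)
    return Combined(atoms, bonds, central)


# Phenyl ([*]c1ccccc1) + anilino ([*]Nc1ccccc1), mirroring the example above.
phenyl = Fragment(["c"] * 6, [(i, (i + 1) % 6) for i in range(6)], attach=0)
anilino = Fragment(
    ["N"] + ["c"] * 6,
    [(0, 1)] + [(1 + i, 1 + (i + 1) % 6) for i in range(6)],
    attach=0,
)
mol = combine(phenyl, anilino)
print(mol.central_bond)  # (0, 6): aromatic carbon to N, like [c:1]-[NH:2]
```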

1. Generation of substituent list

  • Removed phenyl groups with ortho substituents.

  • Filter: keep cyclic substituents with (1) zero rotatable bonds and (2) exactly one ring, and acyclic substituents with fewer than two rotatable bonds.

  • Combined lists from the Roche, Coverage, Pfizer, and Bayer sets: 361 substituents (1. acyclic aliphatic: 183; 2. aliphatic rings: 100; 3. 6-membered aromatic rings: 50; 4. 5-membered aromatic rings: 28).

Attachment: combined.pdf
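The substituent filter above can be written as a single predicate. This is a sketch, assuming the ring and rotatable-bond counts are computed beforehand (for example with RDKit's `rdMolDescriptors.CalcNumRings` and `rdMolDescriptors.CalcNumRotatableBonds`); the function name is hypothetical, and the ortho-phenyl removal is a separate step not shown here. The final line checks that the four reported category counts add up to the 361 substituents.

```python
def keep_substituent(is_cyclic: bool, n_rotatable: int, n_rings: int) -> bool:
    """Substituent filter as stated in the text:
    cyclic  -> zero rotatable bonds and exactly one ring;
    acyclic -> fewer than two rotatable bonds."""
    if is_cyclic:
        return n_rotatable == 0 and n_rings == 1
    return n_rotatable < 2


# Category counts reported for the combined Roche/Coverage/Pfizer/Bayer list.
category_counts = {
    "acyclic aliphatic": 183,
    "aliphatic rings": 100,
    "6-membered aromatic rings": 50,
    "5-membered aromatic rings": 28,
}
assert sum(category_counts.values()) == 361
```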

2. Generation of molecule set

  • Combined two substituents into one molecule.

  • Generated 59,086 molecules from the 361 substituents.

Attachments: molecules_out.smi, run.log
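For scale, the pairwise enumeration can be bounded with simple combinatorics. The sketch below (function name hypothetical) enumerates unordered substituent pairs. With 361 substituents there are 361 × 360 / 2 = 64,980 pairs of distinct substituents, or 65,341 if a substituent may also pair with itself. The reported 59,086 molecules is below both numbers, so the original workflow evidently excluded or merged some combinations; this sketch does not attempt to reproduce that step.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def enumerate_pairs(substituents, allow_self_pairs=False):
    """Return unordered substituent pairs; each pair would become one molecule."""
    if allow_self_pairs:
        return list(combinations_with_replacement(substituents, 2))
    return list(combinations(substituents, 2))


n = 361
print(comb(n, 2))      # 64980 pairs of distinct substituents
print(comb(n, 2) + n)  # 65341 when self-pairs are allowed
```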

3. Curation of molecule set

  • Before clustering, two filters will be added to exclude (1) molecules that form internal hydrogen bonds and (2) molecules that are not chemically synthesizable.
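One possible form of the internal hydrogen-bond filter is a purely topological check. The sketch below is an assumption, not the planned method: it flags a molecule when a donor heavy atom and an acceptor atom are 4 or 5 bonds apart, which is the spacing that would close a 6- or 7-membered pseudo-ring through the hydrogen (as in 2-hydroxybenzaldehyde). Donor and acceptor indices are assumed to be supplied by the caller, and a 3D geometric check on the scanned conformers would be more reliable. The synthesizability filter is not sketched.

```python
from collections import deque


def adjacency_from_bonds(n_atoms, bonds):
    """Build an undirected adjacency list from (i, j) bond pairs."""
    adj = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def topological_distance(adj, start, goal):
    """Shortest path length in bonds, via breadth-first search (None if disconnected)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def may_form_internal_hbond(adj, donors, acceptors, distances=(4, 5)):
    """Heuristic (assumption): a donor 4 or 5 bonds from an acceptor could close
    a 6- or 7-membered pseudo-ring through its hydrogen."""
    for d in donors:
        for a in acceptors:
            if d != a and topological_distance(adj, d, a) in distances:
                return True
    return False


# Heavy-atom graphs: hydroxyl O (0), benzene ring (1-6), aldehyde C (7) and O (8).
ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
ortho = adjacency_from_bonds(9, [(0, 1)] + ring + [(2, 7), (7, 8)])  # 2-hydroxybenzaldehyde
para = adjacency_from_bonds(9, [(0, 1)] + ring + [(4, 7), (7, 8)])   # 4-hydroxybenzaldehyde
print(may_form_internal_hbond(ortho, donors=[0], acceptors=[0, 8]))  # True
print(may_form_internal_hbond(para, donors=[0], acceptors=[0, 8]))   # False
```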

3.1. Remove similar molecules using MACCS key fingerprints and check coverage of torsion parameters
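Similarity-based removal can be sketched with Tanimoto similarity on fingerprint bits. In this sketch each fingerprint is a set of on-bit indices; in practice the MACCS bits could come from RDKit's `MACCSkeys.GenMACCSKeys` and the similarity from `DataStructs.TanimotoSimilarity`. Where the text mentions clustering (RDKit offers Butina clustering in `rdkit.ML.Cluster.Butina`), this sketch swaps in a simpler greedy leader filter, and the 0.9 threshold is a hypothetical value.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def remove_similar(fingerprints: dict, threshold: float = 0.9) -> list:
    """Greedy leader filter: keep a molecule only if its similarity to every
    already-kept molecule is below the threshold. Input order sets priority."""
    kept = []
    for mol_id, fp in fingerprints.items():
        if all(tanimoto(fp, fingerprints[k]) < threshold for k in kept):
            kept.append(mol_id)
    return kept


fps = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({1, 2, 3, 4}),  # identical bits to m1, so it is dropped
    "m3": frozenset({10, 11}),
}
print(remove_similar(fps))  # ['m1', 'm3']
```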

(1) List the molecules matching each torsion parameter.
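Listing the molecules per torsion parameter amounts to inverting a molecule-to-parameter assignment map. The sketch assumes those per-molecule assignments already exist (for example, the OpenFF toolkit's `ForceField.label_molecules` reports the parameters applied by each handler, such as `ProperTorsions`); the molecule names and parameter ids `t1`, `t2`, `t3` are placeholders. Parameters that map to an empty list are the ones not covered by the dataset.

```python
def molecules_per_parameter(assignments: dict, all_params: list) -> dict:
    """Invert {molecule: set of torsion parameter ids} into
    {parameter id: sorted list of molecules}; unmatched parameters map to []."""
    coverage = {p: [] for p in all_params}
    for mol, params in assignments.items():
        for p in params:
            coverage.setdefault(p, []).append(mol)
    return {p: sorted(mols) for p, mols in coverage.items()}


assignments = {"mol_a": {"t1", "t2"}, "mol_b": {"t2"}}
coverage = molecules_per_parameter(assignments, ["t1", "t2", "t3"])
uncovered = [p for p, mols in coverage.items() if not mols]
print(coverage)   # {'t1': ['mol_a'], 't2': ['mol_a', 'mol_b'], 't3': []}
print(uncovered)  # ['t3']
```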

...