...

1. Generation of substituent list

  • Removed phenyls with ortho-substituents

  • Filter: keep cyclic substituents with (1) zero rotatable bonds and (2) exactly one ring, or acyclic substituents with fewer than 2 rotatable bonds

  • Combined Roche, Coverage, Pfizer, Bayer: 361 substituents (1. acyclic aliphatic: 183, 2. aliphatic rings: 100, 3. 6-membered aromatic rings: 50, 4. 5-membered aromatic rings: 28)

    • Roche set

      • 1. acyclic aliphatic: 61, 2. aliphatic rings: 21, 3. 6-membered aromatic rings: 5, 4. 5-membered aromatic rings: 11

    • Coverage set

      • 1. acyclic aliphatic: 76, 2. aliphatic rings: 2, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 2

    • Pfizer set

      • 1. acyclic aliphatic: 24, 2. aliphatic rings: 9, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 7

    • eMolecules set (open question: is it okay not to include this set?)

      • 1. aliphatic chain: 148, 2. aliphatic rings: 75, 3. 6-membered aromatic rings: 90, 4. 5-membered aromatic rings: 51

    • Bayer set

      • 1. acyclic aliphatic: 116, 2. aliphatic rings: 86, 3. 6-membered aromatic rings: 42, 4. 5-membered aromatic rings: 16
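
The filter and the merging of the per-vendor lists can be sketched roughly as below. This is only an illustration, not the script that was actually used: the `Substituent` record, its precomputed ring and rotatable-bond counts, and the de-duplication by SMILES string are all assumptions (in practice the counts would come from a cheminformatics toolkit and the SMILES would need to be canonical). De-duplication is included because the per-set counts for Roche, Coverage, Pfizer and Bayer add up to more than 361, which presumably reflects overlap between the sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    """Hypothetical record; descriptor values are assumed to be precomputed."""
    smiles: str             # used only as an identifier here, never parsed
    n_rings: int            # number of rings in the substituent
    n_rotatable_bonds: int  # number of rotatable bonds


def passes_filter(sub: Substituent) -> bool:
    """Cyclic: exactly one ring and zero rotatable bonds.
    Acyclic: fewer than 2 rotatable bonds."""
    if sub.n_rings > 0:
        return sub.n_rings == 1 and sub.n_rotatable_bonds == 0
    return sub.n_rotatable_bonds < 2


def combine(*sources):
    """Merge several substituent lists, keeping only those that pass the
    filter and dropping duplicates by SMILES (first occurrence wins)."""
    seen = {}
    for source in sources:
        for sub in source:
            if passes_filter(sub):
                seen.setdefault(sub.smiles, sub)
    return list(seen.values())


# Toy input with hand-entered descriptor values (assumptions, not real data).
set_a = [Substituent("C1CC1", 1, 0), Substituent("CC", 0, 0)]
set_b = [
    Substituent("CC", 0, 0),              # duplicate of an entry in set_a
    Substituent("CCCC", 0, 2),            # acyclic, too many rotatable bonds
    Substituent("c1ccc2ccccc2c1", 2, 0),  # more than one ring
]
combined = combine(set_a, set_b)
print([s.smiles for s in combined])
```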

Attachment: combined.pdf

2. Generation of molecule set

  • Using the 361 substituents, generated 59086 molecules (ordered by molecular weight)

Attachments: molecules_out.smi, run.log

3. Curation of molecule set

3.1. Remove similar molecules using MACCS keys fingerprints and check coverage of torsion parameters

(1) List the molecules matching each torsion parameter

(2) using MACCS keys fingerprints, cluster each molecule list into ~20 clusters and pick center molecule from

...

(3) Pick one molecule per cluster to generate a subset of around 20 molecules for each torsion parameter

(3) check coverage of torsion parameter (missing torsions)

→ now running

  • (1) We want to include all substituents in the final torsiondrive dataset. (2) When choosing one molecule from each cluster, picking the center molecule (the one with the largest sum of similarity indices) or the simplest molecule → does this consistently choose certain substituents?

    • Choosing center molecules

      • 38 substituents out of 361 not included. (26 scaffolds missing)

Attachments: t70c_cluter_core_selected.pdf, t70c_all.pdf, missing_scaffolds.pdf

  • Choosing simple molecules

    • 65 substituents out of 361 not included. (28 scaffolds missing)

  • Random picking

    • ~26 substituents out of 361 not included. (27 scaffolds missing)
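
The "center molecule" choice and the count of substituents left out can be illustrated with a small sketch. It assumes each fingerprint is represented as a set of on-bit indices (real MACCS keys would come from a cheminformatics toolkit), uses Tanimoto similarity as the similarity index, and takes the clusters as given; the function names and toy data are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_center(cluster: dict) -> str:
    """Return the molecule whose summed similarity to all members of the
    cluster (itself included) is largest."""
    def score(name):
        return sum(tanimoto(cluster[name], fp) for fp in cluster.values())
    return max(cluster, key=score)


def missing_substituents(all_substituents, selected):
    """List substituents that appear in none of the selected molecules.
    `selected` maps a molecule name to the set of substituents it carries."""
    used = set()
    for subs in selected.values():
        used |= set(subs)
    return sorted(set(all_substituents) - used)


# Toy cluster: fingerprints are made-up bit sets, not real MACCS keys.
cluster = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({2, 3, 4}),
    "C": frozenset({2, 3}),
}
center = pick_center(cluster)
print(center)

# Toy coverage check with hypothetical substituent labels.
not_covered = missing_substituents(["s1", "s2", "s3"], {center: {"s1"}})
print(not_covered)
```

Because the center is by construction the most "typical" member, it is plausible that this rule keeps favoring the same common substituents, which would be consistent with the question raised above; random picking is the natural comparison.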

(5) check coverage of torsion parameter (missing torsions)

→ Generate torsiondrive dataset to submit
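
A minimal sketch of the coverage check, assuming the matching step has already produced a mapping from torsion parameter id to the set of molecules that exercise it. The parameter ids and molecule names below are placeholders, and `missing_torsions` is a hypothetical helper, not part of any toolkit.

```python
def missing_torsions(parameter_ids, matches, selected):
    """Return the torsion parameter ids that no selected molecule matches.

    parameter_ids: iterable of all torsion parameter ids to be covered
    matches:       dict, parameter id -> set of molecules matching it
    selected:      set of molecules kept after clustering
    """
    return [
        pid for pid in parameter_ids
        if not (matches.get(pid, set()) & selected)
    ]


# Placeholder data for illustration only.
parameter_ids = ["t1", "t2", "t3"]
matches = {
    "t1": {"mol_a", "mol_b"},
    "t2": {"mol_c"},
    # "t3" has no matching molecule at all
}
selected = {"mol_a"}
uncovered = missing_torsions(parameter_ids, matches, selected)
print(uncovered)
```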

...

3.2. Molecules forming internal H bonds: better SMIRKS needed. How to account for the spatial arrangement of a 1-n chain?

(1) Test filtering w/ oversimplified SMIRKS

...

(2) More specific SMIRKS patterns

  • Filter [n,N,o,O,F]([H])[!#1][!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched: 1430 (out of 59086)

    • [Image] The molecules on the right-hand side do not seem to form an internal H bond

  • Filter [n,N,o,O,F]([H])[!#1]~!@[!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched: 1060 (<2% of total)

    • [Image] How to exclude the molecule on the right?
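
For reference, the matched fractions quoted above follow directly from the counts in these notes (59086 molecules in total); only the dictionary labels are made up for the sketch.

```python
TOTAL_MOLECULES = 59086

# Match counts reported for the two more specific patterns.
matched = {"first pattern": 1430, "second pattern": 1060}


def percent(count: int, total: int = TOTAL_MOLECULES) -> float:
    """Share of the molecule set matched by a filter, in percent."""
    return 100.0 * count / total


for label, count in matched.items():
    print(f"{label}: {count} matched, {percent(count):.1f}% of total")
```

The second, stricter pattern matches about 1.8% of the set, which is where the "<2% of total" figure comes from; the first matches about 2.4%.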

...

TODO (2021-04-01)

  •  1. Remove ortho substituents from substituent list, add ones with meta/para substituents
  •  2. Remove similar molecules using MACCS keys fingerprints
  •  3. Check coverage of torsion parameters → generate a draft of molecule set (~3000 entries)
  •  4. Add an intramolecular H bond filter using SMIRKS pattern matching
  •  5. Check the coverage of problematic substituents, which showed large discrepancies in Pavan’s 1.3.0 benchmarks
  •  6. Range of WBOs of each training data subset, a list of scans training a certain torsion parameter
  •  7. Add torsion scans that rotate about double bonds

...

damn installation

1. conda create --name constructure -c conda-forge -c openeye -c omnia pydantic openeye-toolkits cmiles ipykernel python=3.8

2. python setup.py develop (constructure)

3. python setup.py install (fragmenter)

4. conda install -c conda-forge pyyaml

* Additionally openforcefield has been installed