1. Generation of substituent list

  • Removed phenyls with ortho-substituents

  • Filter: keep cyclic substituents with (1) zero rotatable bonds and (2) exactly one ring, or acyclic substituents with fewer than two rotatable bonds
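The filter rule above can be sketched as a small predicate. This is only an illustration: the `Substituent` record, the `passes_filter` name, and the ring and rotatable-bond counts in the toy list are assumptions made up for the example. In practice the counts would come from a cheminformatics toolkit, and the rotatable-bond count depends on that toolkit's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    """Hypothetical record; counts are assumed to be precomputed elsewhere."""
    smiles: str
    n_rings: int
    n_rotatable_bonds: int


def passes_filter(sub: Substituent) -> bool:
    """Apply the substituent filter described in the notes."""
    if sub.n_rings > 0:
        # Cyclic: exactly one ring and no rotatable bonds.
        return sub.n_rings == 1 and sub.n_rotatable_bonds == 0
    # Acyclic: fewer than two rotatable bonds.
    return sub.n_rotatable_bonds < 2


# Toy input with made-up descriptor values (not taken from the real sets).
candidates = [
    Substituent("c1ccccc1", n_rings=1, n_rotatable_bonds=0),
    Substituent("c1ccc2ccccc2c1", n_rings=2, n_rotatable_bonds=0),
    Substituent("CCC", n_rings=0, n_rotatable_bonds=1),
    Substituent("CCCCO", n_rings=0, n_rotatable_bonds=3),
]
kept = [s.smiles for s in candidates if passes_filter(s)]
```

With these toy values only the single-ring and the short acyclic entries survive the filter.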

  • Combined Roche, Coverage, Pfizer, Bayer: 361 substituents (1. acyclic aliphatic: 183, 2. aliphatic rings: 100, 3. 6-membered aromatic rings: 50, 4. 5-membered aromatic rings: 28)

    • Roche set

      • 1. acyclic aliphatic: 61, 2. aliphatic rings: 21, 3. 6-membered aromatic rings: 5, 4. 5-membered aromatic rings: 11

    • Coverage set

      • 1. acyclic aliphatic: 76, 2. aliphatic rings: 2, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 2

    • Pfizer set

      • 1. acyclic aliphatic: 24, 2. aliphatic rings: 9, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 7

    • eMolecules set (is it okay not to include the eMolecules set?)

      • 1. aliphatic chain: 148, 2. aliphatic rings: 75, 3. 6-membered aromatic rings: 90, 4. 5-membered aromatic rings: 51

    • Bayer set

      • 1. acyclic aliphatic: 116, 2. aliphatic rings: 86, 3. 6-membered aromatic rings: 42, 4. 5-membered aromatic rings: 16

...

  • Using the 361 substituents, generated 59086 molecules (aligned by molecular weight)

Attached file: run.log

3. Curation

...

(2) Using MACCS keys fingerprints, cluster each molecule list into ~20 clusters and pick the center molecule of each cluster, giving a subset of around 20 molecules for each torsion parameter
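A minimal sketch of the "pick the center molecule" step, assuming the clusters already exist and each fingerprint is available as a set of on-bit indices (for example, derived from MACCS keys by whatever toolkit is in use). The names `tanimoto` and `pick_center` and the toy fingerprints are hypothetical; the center is taken to be the member with the largest summed Tanimoto similarity to the other members of its cluster.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two on-bit sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_center(cluster: List[str], fps: Dict[str, Fingerprint]) -> str:
    """Return the member with the largest sum of similarities to the others."""
    def total_similarity(name: str) -> float:
        return sum(
            tanimoto(fps[name], fps[other]) for other in cluster if other != name
        )
    # max() keeps the first member on ties, so the result is deterministic.
    return max(cluster, key=total_similarity)


# Toy fingerprints (made up for illustration, not real MACCS keys).
fps = {
    "mol_a": frozenset({1, 2, 3}),
    "mol_b": frozenset({1, 2, 3, 4}),
    "mol_c": frozenset({2, 3, 4}),
    "mol_d": frozenset({7, 8}),
}
center = pick_center(["mol_a", "mol_b", "mol_c", "mol_d"], fps)
```

Because ties resolve to the first member, the same cluster always yields the same center, which is the behaviour questioned in the note on random selection.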

...

(3) Check coverage of torsion parameters (missing torsions)

→ now running

  • (1) Want to include all substituents in the final torsiondrive dataset. (2) When choosing one molecule from each cluster, the current choice is the center one (the one with the largest sum of similarity indices) → does this consistently pick certain substituents? Random selection instead?
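The random-selection alternative raised above could look like the following sketch. The function name `pick_random_per_cluster`, the cluster ids, and the molecule labels are assumptions for illustration; a fixed seed is used so that a given selection can be reproduced.

```python
import random
from typing import Dict, List


def pick_random_per_cluster(
    clusters: Dict[int, List[str]], seed: int = 0
) -> Dict[int, str]:
    """Pick one member at random from every cluster, reproducibly."""
    rng = random.Random(seed)
    # Iterate in sorted cluster order so the draw sequence is stable.
    return {cid: rng.choice(members) for cid, members in sorted(clusters.items())}


# Toy clusters (hypothetical labels).
clusters = {
    0: ["mol_a", "mol_b", "mol_c"],
    1: ["mol_d"],
    2: ["mol_e", "mol_f"],
}
picks = pick_random_per_cluster(clusters, seed=42)
```

Changing the seed per torsion parameter would spread the picks over different substituents, at the cost of no longer selecting the most representative member of each cluster.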

Attached file: t70c_cluter_core_selected.pdf
Attached file: t70c_all.pdf

3.2. Molecules forming internal H bonds: better SMIRKS needed. How to account for the spatial arrangement of a 1-n chain?

(1) Test filtering w/ oversimplified SMIRKS

...

  • Filter: [n,N,o,O,F]([H])[!#1][!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched: 1430 (out of 59086)

    • [Image]

      The molecules on the right-hand side do not seem to form an internal H bond

  • Filter: [n,N,o,O,F]([H])[!#1]~!@[!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched: 1060 (<2% of total)

      [Image]
    • How to exclude the molecule on the right?
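One possible way to bring spatial arrangement into the check, offered only as a sketch and not as the method used here, is a geometric post-filter applied to a 3D conformer of each SMIRKS match. It assumes Cartesian coordinates (in Å) for the donor heavy atom, its hydrogen, and the acceptor are available. The function name `is_hbond_geometry` and the cutoffs (H···acceptor distance ≤ 2.5 Å, donor-H···acceptor angle ≥ 120°) are assumptions chosen for illustration and would need tuning.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def is_hbond_geometry(
    donor: Vec,
    hydrogen: Vec,
    acceptor: Vec,
    max_h_acc_dist: float = 2.5,   # assumed cutoff, in angstrom
    min_angle_deg: float = 120.0,  # assumed cutoff for the D-H...A angle
) -> bool:
    """Return True if the D-H...A geometry satisfies both assumed cutoffs."""
    h_to_acc = _sub(acceptor, hydrogen)
    h_to_don = _sub(donor, hydrogen)
    dist = _norm(h_to_acc)
    bond = _norm(h_to_don)
    if dist == 0.0 or bond == 0.0 or dist > max_h_acc_dist:
        return False
    cos_angle = sum(p * q for p, q in zip(h_to_acc, h_to_don)) / (dist * bond)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) >= min_angle_deg


# Toy geometries: linear and close, too far, and bent at 90 degrees.
linear_close = is_hbond_geometry((0, 0, 0), (1, 0, 0), (3, 0, 0))
too_far = is_hbond_geometry((0, 0, 0), (1, 0, 0), (5, 0, 0))
bent = is_hbond_geometry((0, 0, 0), (1, 0, 0), (1, 2, 0))
```

A check of this kind depends on the conformer used, so it would complement the SMIRKS pattern rather than replace it.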

...

TODO (2021-04-01)

  • 1. Remove ortho substituents from the substituent list; add ones with meta/para substituents
  • 2. Remove similar molecules using MACCS keys fingerprints
  • 3. Check coverage of torsion parameters → generate a draft of the molecule set (~3000 entries)
  • 4. Add an intramolecular H bond filter using SMIRKS pattern matching
  • 5. Check the coverage of problematic substituents, which showed large discrepancies in Pavan’s 1.3.0 benchmarks
  • 6. Range of WBOs of each training data subset, i.e. the list of scans training a certain torsion parameter
  • 7. Add torsion scans that rotate double bonds

...

damn installation

1. conda create --name constructure -c conda-forge -c openeye -c omnia pydantic openeye-toolkits cmiles ipykernel python=3.8

2. python setup.py develop (constructure)

3. python setup.py install (fragmenter)

4. conda install -c conda-forge pyyaml

* Additionally, openforcefield has been installed