1. Generation of substituent list

  • Removed phenyls with ortho-substituents

  • Filter: keep cyclic substituents with (1) zero rotatable bonds and (2) # rings = 1, or acyclic substituents with # rotatable bonds < 2

  • Combined Roche, Coverage, Pfizer, Bayer: 361 substituents (1. acyclic aliphatic: 183, 2. aliphatic rings: 100, 3. 6-membered aromatic rings: 50, 4. 5-membered aromatic rings: 28)

    • Roche set

      • 1. acyclic aliphatic: 61, 2. aliphatic rings: 21, 3. 6-membered aromatic rings: 5, 4. 5-membered aromatic rings: 11

    • Coverage set

      • 1. acyclic aliphatic: 76, 2. aliphatic rings: 2, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 2

    • Pfizer set

      • 1. acyclic aliphatic: 24, 2. aliphatic rings: 9, 3. 6-membered aromatic rings: 6, 4. 5-membered aromatic rings: 7

    • eMolecules set (okay not to include eMolecules set?)

      • 1. aliphatic chain: 148, 2. aliphatic rings: 75, 3. 6-membered aromatic rings: 90, 4. 5-membered aromatic rings: 51

    • Bayer set

      • 1. acyclic aliphatic: 116, 2. aliphatic rings: 86, 3. 6-membered aromatic rings: 42, 4. 5-membered aromatic rings: 16
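
The substituent filter stated above (cyclic: zero rotatable bonds and exactly one ring; acyclic: fewer than two rotatable bonds) can be sketched as a simple predicate. This is only an illustration: the `Substituent` record and its field names are hypothetical, and the ring and rotatable-bond counts are assumed to be precomputed by whichever toolkit processed the sets. The example entries use hand-assigned values, not computed descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    """Hypothetical record; counts are assumed to be precomputed elsewhere."""
    name: str
    n_rings: int
    n_rotatable_bonds: int


def passes_filter(sub: Substituent) -> bool:
    """Apply the filter rule from the notes to one substituent."""
    if sub.n_rings > 0:
        # Cyclic: exactly one ring and no rotatable bonds.
        return sub.n_rings == 1 and sub.n_rotatable_bonds == 0
    # Acyclic: fewer than two rotatable bonds.
    return sub.n_rotatable_bonds < 2


# Illustrative entries only; the descriptor values are hand-assigned.
candidates = [
    Substituent("single ring, rigid", n_rings=1, n_rotatable_bonds=0),
    Substituent("short chain", n_rings=0, n_rotatable_bonds=1),
    Substituent("long chain", n_rings=0, n_rotatable_bonds=3),
    Substituent("fused bicycle", n_rings=2, n_rotatable_bonds=0),
]
kept = [s.name for s in candidates if passes_filter(s)]
print(kept)
```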

...

  • Using the 361 substituents, generated 59086 molecules (aligned by molecular weight)

Attached files: molecules_out.smi, run.log

3. Curation of molecule set

3.1. Remove similar molecules using MACCS keys fingerprints and check coverage of torsion parameters

...

  • Picking a center molecule (the one with the largest sum of similarity indices) or the simplest molecule → does this consistently favor certain substituents? Need to check coverage of the substituent list for each strategy (center molecules, simple molecules, random picking?)

  • Choosing center molecules

    • 38 substituents out of 361 not included. (26 scaffolds missing)

Attached file: missing_scaffolds.pdf

  • Choosing simple molecules

    • 65 substituents out of 361 not included. (28 scaffolds missing)

  • Random picking

    • ~26 substituents out of 361 not included. (27 scaffolds missing)

  • maybe, if the molecule covers all scaffolds (3-4 PM)
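
The center-molecule choice and the substituent coverage check compared above can be sketched in a few lines. This is a minimal sketch under stated assumptions: fingerprints are represented as sets of on-bit indices (MACCS keys would be generated by a cheminformatics toolkit, which is not shown), the clustering step that produces each group of similar molecules is assumed to exist already, and all function and variable names here are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def pick_center(cluster: List[str], fps: Dict[str, FrozenSet[int]]) -> str:
    """Return the member with the largest sum of similarities to the others."""
    def total_similarity(mol: str) -> float:
        return sum(tanimoto(fps[mol], fps[o]) for o in cluster if o != mol)
    return max(cluster, key=total_similarity)


def missing_substituents(selected: Iterable[str],
                         mol_to_subs: Dict[str, Set[str]],
                         all_subs: Iterable[str]) -> Set[str]:
    """Substituents from the full list that no selected molecule contains."""
    covered: Set[str] = set()
    for mol in selected:
        covered |= mol_to_subs[mol]
    return set(all_subs) - covered


# Toy data, purely illustrative.
fps = {
    "molA": frozenset({1, 2, 3}),
    "molB": frozenset({2, 3, 4}),
    "molC": frozenset({3, 4, 5}),
}
mol_to_subs = {"molA": {"s1", "s2"}, "molB": {"s2", "s3"}, "molC": {"s3", "s4"}}

center = pick_center(list(fps), fps)
missing = missing_substituents([center], mol_to_subs, {"s1", "s2", "s3", "s4"})
print(center, sorted(missing))
```

Running the same coverage function on the output of each picking strategy (center, simplest, random) would give directly comparable "not included" counts.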

(5) check coverage of torsion parameter (missing torsions)

→ Generate torsiondrive dataset to submit

...

3.2. Molecules forming internal H-bonds: better SMIRKS needed. How to account for the spatial arrangement of a 1-n chain?

(1) Test filtering w/ oversimplified SMIRKS

...

(2) More specific SMIRKS patterns

  • Filter  [n,N,o,O,F]([H])[!#1][!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched : 1430 (out of 59086)

    • The molecules on the right-hand side don't seem to form an internal H-bond

  • Filter [n,N,o,O,F]([H])[!#1]~!@[!#1]~!@[!#1;r]([#7X2;r])

    • # molecules matched : 1060 (< 2% of total)

    • How to exclude the molecule on the right?
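
One possible way to take the spatial arrangement into account, and to drop SMIRKS matches that cannot actually close an internal H-bond, is a geometric post-check on a 3D conformer. The sketch below is not the method used in these notes: the distance and angle cutoffs are illustrative assumptions, the function names are hypothetical, and the coordinates (in Å) are assumed to come from a conformer of each matched molecule.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def looks_like_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
                     max_h_acc: float = 2.5, min_angle: float = 120.0) -> bool:
    """Geometric test: short H...acceptor distance and a wide donor-H...acceptor angle.

    The default cutoffs are assumptions chosen for illustration only.
    """
    close = math.dist(hydrogen, acceptor) <= max_h_acc
    wide = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close and wide


# Toy coordinates: donor atom at the origin, H 1.0 Å away along x.
donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear_contact = looks_like_hbond(donor, hydrogen, (2.9, 0.0, 0.0))
too_far = looks_like_hbond(donor, hydrogen, (1.0, 3.0, 0.0))
bad_angle = looks_like_hbond(donor, hydrogen, (0.5, 1.5, 0.0))
print(linear_contact, too_far, bad_angle)
```

A check like this would only be as good as the conformers it is given, so it complements rather than replaces a more specific SMIRKS pattern.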

...

TODO (2021-04-01)

  •  1. Remove ortho substituents from substituent list, add ones with meta/para substituents
  •  2. Remove similar molecules using MACCS keys fingerprints
  •  3. Check coverage of torsion parameters → generate a draft of molecule set (~3000 entries)
  •  4. Addition of an intramolecular H-bond filter using SMIRKS pattern matching
  •  5. Check the coverage of problematic substituents, which showed large discrepancies in Pavan’s 1.3.0 benchmarks
  •  6. Range of WBOs of each training data subset, a list of scans training a certain torsion parameter
  •  7. addition of double bond rotating torsion scans

...

damn installation

1. conda create --name constructure -c conda-forge -c openeye -c omnia pydantic openeye-toolkits cmiles ipykernel python=3.8

2. python setup.py develop (constructure)

3. python setup.py install (fragmenter)

4. conda install -c conda-forge pyyaml

* Additionally, openforcefield has been installed.