
Contributors: David Mobley, Lee-Ping Wang, Hyesu Jang, Jeff Wagner, Chris Bayly, Josh Horton, Chaya Stern, Jessica Maat

Background: The Open Force Field Initiative is developing optimization training data sets using a fingerprint and clustering method. The aim of this project is to pull chemically diverse molecules from a range of data sets so that the training data surveys a large chemical space for our May release force field.

Aim: This sub-experiment aims to limit the number of conformers in a patented data set from Bayer.

Problem: The Bayer set contains large, flexible drug molecules ranging from 12 to 30 heavy atoms. The current fingerprint and clustering methods yield 525 molecules and 16,242 conformers (about 31 conformers per molecule). We need to reduce the data set to ~3 conformers/molecule.

Hypothesized contributors to large number of conformers:

  • Large molecule size

  • Excessive rotatable bonds

Approach:

  1. Once I have a reasonably sized set of optimized molecules (through size filtering or rotatable bond filtering), I will submit it.

  2. I will use fragmentation as a second data set iteration, or earlier if I am unable to reach 3 confs/mol in step 1. I will report how fragmentation affects conformer generation. I hypothesize that it will reduce the number of conformers because the resulting molecules are smaller.
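The filtering idea in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the `MolRecord` type, the helper names, the thresholds (20 heavy atoms, 5 rotatable bonds), and the four toy records are all hypothetical and are not taken from the Bayer set. In practice the heavy atom and rotatable bond counts would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MolRecord:
    """Hypothetical per-molecule summary; values below are made-up toy data."""
    name: str
    heavy_atoms: int
    rotatable_bonds: int
    n_conformers: int


def filter_molecules(
    records: List[MolRecord], max_heavy_atoms: int, max_rotatable_bonds: int
) -> List[MolRecord]:
    """Keep molecules at or below both the size and flexibility thresholds."""
    return [
        r
        for r in records
        if r.heavy_atoms <= max_heavy_atoms
        and r.rotatable_bonds <= max_rotatable_bonds
    ]


def conformers_per_molecule(records: List[MolRecord]) -> float:
    """Average number of conformers per molecule (0.0 for an empty set)."""
    if not records:
        return 0.0
    return sum(r.n_conformers for r in records) / len(records)


# Toy records for illustration only (not real data).
records = [
    MolRecord("mol_a", heavy_atoms=12, rotatable_bonds=2, n_conformers=3),
    MolRecord("mol_b", heavy_atoms=18, rotatable_bonds=4, n_conformers=9),
    MolRecord("mol_c", heavy_atoms=25, rotatable_bonds=8, n_conformers=40),
    MolRecord("mol_d", heavy_atoms=30, rotatable_bonds=11, n_conformers=72),
]

# Assumed thresholds; the real cutoffs would need to be tuned.
kept = filter_molecules(records, max_heavy_atoms=20, max_rotatable_bonds=5)

print([r.name for r in kept])
print(conformers_per_molecule(records), conformers_per_molecule(kept))
```

Comparing the average before and after filtering gives a quick check of whether a given pair of cutoffs moves the set toward the ~3 confs/mol target.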

Experimental notes:

Clustering method: DBSCAN (eps = 0.3, min_samples = 4)

Fingerprint method: MACCS (supported by previous experiments from Hyesu Jang)
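To make the clustering settings concrete, here is a minimal pure-Python sketch of DBSCAN with eps = 0.3 and min_samples = 4. It is not the pipeline used for these experiments. It assumes the distance is Tanimoto distance (1 minus Tanimoto similarity) between fingerprints, which these notes do not state, and it represents each fingerprint as a set of on-bit indices. The five toy fingerprints are invented and are not real MACCS keys. As in scikit-learn's convention, a point counts toward its own min_samples.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def dbscan(fps: List[Set[int]], eps: float = 0.3, min_samples: int = 4) -> List[int]:
    """Return one cluster label per fingerprint; -1 marks noise."""
    n = len(fps)
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if tanimoto_distance(fps[i], fps[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    visited = [False] * n
    cluster = -1
    for i in range(n):
        # Only unvisited core points can seed a new cluster.
        if visited[i] or len(neighbors[i]) < min_samples:
            continue
        cluster += 1
        labels[i] = cluster
        visited[i] = True
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


# Toy fingerprints (hypothetical bit sets): four near-duplicates and one outlier.
base = set(range(9))
fps = [base | {9}, base | {10}, base | {11}, base | {12}, set(range(20, 30))]

labels = dbscan(fps, eps=0.3, min_samples=4)
print(labels)
```

With these settings a cluster needs at least four fingerprints within a distance of 0.3 of a core point, so isolated molecules are labeled as noise (-1) rather than forming their own clusters.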

Method                      # of molecules    # of conformers    Notes
Randomized size selection   524               10,454
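The counts reported on this page can be converted into conformers per molecule and compared with the ~3 confs/mol target. The short sketch below does that arithmetic; the `summarize` helper and its dictionary keys are names introduced here for illustration only.

```python
TARGET_CONFS_PER_MOL = 3  # target stated in the problem description


def summarize(n_molecules: int, n_conformers: int, target: int = TARGET_CONFS_PER_MOL) -> dict:
    """Compute conformers per molecule and the reduction needed to hit the target."""
    target_total = n_molecules * target
    return {
        "confs_per_mol": n_conformers / n_molecules,
        "target_total": target_total,
        "reduction_factor": n_conformers / target_total,
    }


# Counts taken from this page.
fingerprint_clustering = summarize(n_molecules=525, n_conformers=16242)
size_selected = summarize(n_molecules=524, n_conformers=10454)

for label, s in [
    ("fingerprint + clustering", fingerprint_clustering),
    ("randomized size selection", size_selected),
]:
    print(
        f"{label}: {s['confs_per_mol']:.2f} confs/mol, "
        f"target total {s['target_total']}, "
        f"needs a {s['reduction_factor']:.2f}x reduction"
    )
```

Randomized size selection lowers the average from roughly 31 to roughly 20 conformers per molecule, which is still well above the target.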

Conclusion:
