Contributors: David Mobley, Lee-Ping Wang, Hyesu Jang, Jeff Wagner, Chris Bayly, Josh Horton, Chaya Stern, Jessica Maat

Background:

  • The Open Force Field Initiative is developing optimization training data sets using a fingerprint-and-clustering method.

  • The aim of this project is to pull chemically diverse molecules from a range of data sets to survey a larger chemical space for our May release force field.

  • The Bayer set contains 5054 large, flexible, pharmaceutically relevant molecules ranging from 12 to 30 heavy atoms.

Aim:

  • Limit the number of conformers taken from a patented Bayer data set for use in the optimization data set.

    • Reduce the data set to ~3 conformers per molecule.

Problem:

  • The current fingerprint and clustering methods yield 525 molecules and 16,242 conformers.

  • Hypothesized contributors to the large number of conformers:

    • Large molecule size

    • Excessive rotatable bonds

Approach:

  1. Try several molecule-size filtering strategies that aim to preserve chemical diversity, and record the resulting number of molecules and conformers for each.

  2. If #1 is not successful, move on to rotatable-bond filtering.

  3. If #1 and #2 are not successful, move on to fragmentation.
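The staged approach above can be sketched as a small decision routine. This is only an illustration under stated assumptions: the `MolRecord` fields, the `max_heavy_atoms` and `max_rotatable` thresholds, the toy records, and the choice to judge "success" by the mean number of conformers per molecule are all hypothetical and do not come from these notes. It also assumes the rotatable-bond filter is applied on top of the size filter rather than instead of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolRecord:
    """Hypothetical per-molecule summary; field names are illustrative only."""
    name: str
    heavy_atoms: int
    rotatable_bonds: int
    n_conformers: int


def conformers_per_molecule(records):
    """Mean number of conformers per molecule in a list of records."""
    return sum(r.n_conformers for r in records) / len(records)


def staged_filter(records, target=3.0, max_heavy_atoms=20, max_rotatable=5):
    """Apply the size filter first, then the rotatable-bond filter if needed.

    Returns the name of the stage that reached the target and the kept records.
    The thresholds are placeholders, not values used in the project.
    """
    # Stage 1: molecule-size filter.
    stage1 = [r for r in records if r.heavy_atoms <= max_heavy_atoms]
    if stage1 and conformers_per_molecule(stage1) <= target:
        return "size", stage1

    # Stage 2: rotatable-bond filter, applied to the size-filtered set.
    stage2 = [r for r in stage1 if r.rotatable_bonds <= max_rotatable]
    if stage2 and conformers_per_molecule(stage2) <= target:
        return "rotatable_bonds", stage2

    # Stage 3: neither filter reached the target, so fragmentation is next.
    return "fragmentation_needed", stage2


# Toy records (made up) to show the control flow.
records = [
    MolRecord("a", heavy_atoms=12, rotatable_bonds=1, n_conformers=2),
    MolRecord("b", heavy_atoms=18, rotatable_bonds=3, n_conformers=4),
    MolRecord("c", heavy_atoms=25, rotatable_bonds=8, n_conformers=40),
    MolRecord("d", heavy_atoms=19, rotatable_bonds=7, n_conformers=30),
]
stage, kept = staged_filter(records)
print(stage, [r.name for r in kept])
```

With these toy records the size filter alone leaves too many conformers per molecule, so the routine falls through to the rotatable-bond stage.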

Experimental notes:

  • Clustering method: DBSCAN (eps = 0.3, min_samples = 4)

  • Fingerprint method: MACCS (supported by previous experiments from Hyesu Jang)
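To make the role of the two DBSCAN parameters concrete, here is a minimal pure-Python sketch of DBSCAN over fingerprint distances. It is not the project's pipeline: it assumes each fingerprint is represented as a set of on-bit indices (a stand-in for MACCS keys), assumes Tanimoto distance as the metric (the notes do not state which metric was used), and counts a point as part of its own neighborhood. The function names and toy fingerprints are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def dbscan(fingerprints, eps=0.3, min_samples=4):
    """Label each fingerprint with a cluster index, or -1 for noise."""
    n = len(fingerprints)
    # Neighborhoods within eps; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n)
         if tanimoto_distance(fingerprints[i], fingerprints[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Toy fingerprints (made up): four near-identical bit sets and one outlier.
base = set(range(1, 11))
fingerprints = [
    frozenset(base),
    frozenset(base | {11}),
    frozenset(base | {12}),
    frozenset(base | {13}),
    frozenset({100, 101, 102}),
]
labels = dbscan(fingerprints, eps=0.3, min_samples=4)
print(labels)
```

In this toy case the four similar fingerprints fall within eps = 0.3 of one another and form one cluster, while the outlier has too few neighbors and is labeled as noise.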

| Method | # of molecules | # of conformers | Notes |
|---|---|---|---|
| Randomized size selection | 524 | 10454 | WIP |
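As a quick check against the ~3 conformers/molecule goal, the counts reported on this page can be converted to conformers per molecule. The snippet below is a small worked calculation using only numbers stated above; the dictionary labels and variable names are illustrative.

```python
TARGET_CONFORMERS_PER_MOLECULE = 3  # goal stated under "Aim"

# (number of molecules, number of conformers) as reported on this page
results = {
    "Current fingerprint & clustering": (525, 16242),
    "Randomized size selection": (524, 10454),
}

ratios = {
    method: n_conformers / n_molecules
    for method, (n_molecules, n_conformers) in results.items()
}

for method, ratio in ratios.items():
    factor = ratio / TARGET_CONFORMERS_PER_MOLECULE
    print(f"{method}: {ratio:.1f} conformers/molecule ({factor:.1f}x the target)")
```

Both results remain well above the target: roughly 31 conformers per molecule for the current method and roughly 20 for randomized size selection.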

Conclusion:
