Contributors: Hyesu Jang, Lee-Ping Wang

...

  1. Neighboring set: a set of torsions that are chemically similar to those in the training sets;

  2. Diverse set: a set that covers broader chemical diversity.

*This change was made based on feedback from David Mobley and Jessica Maat.

Selection Scheme:

...

1. Neighboring set selection scheme

...

Lee-Ping raised one concern about the size of the diverse set when it includes every torsion that can be generated from the input molecule sets: depending on how large the input molecule set is, the diverse set can become too big to handle. One possible way to address this is to (1) randomly select a portion of the molecules from the input molecule set and (2) generate all possible torsions from the selected molecules. The benefit is that the size of the test set can be controlled through the portion of molecules selected.
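The two-step subsampling idea above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the sets: each molecule is represented as a plain list of bonds between atom indices, torsions are enumerated as four-atom paths in that bond graph, and the names `enumerate_torsions`, `select_diverse_set`, and the `fraction` parameter are hypothetical. A real workflow would presumably obtain the molecular graph from a cheminformatics toolkit.

```python
import random


def enumerate_torsions(bonds):
    """Return all proper torsions (i, j, k, l) as 4-atom paths in a bond graph.

    `bonds` is a list of (atom_a, atom_b) index pairs (toy representation).
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    torsions = set()
    for j, k in bonds:
        for i in neighbors[j] - {k}:
            for l in neighbors[k] - {j, i}:
                t = (i, j, k, l)
                # Store one canonical direction so i-j-k-l and l-k-j-i count once.
                torsions.add(min(t, t[::-1]))
    return sorted(torsions)


def select_diverse_set(molecules, fraction, seed=0):
    """Step (1): randomly pick a fraction of molecules.
    Step (2): generate all torsions for the picked molecules only.

    `molecules` maps a molecule name to its bond list (hypothetical format).
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    n_pick = max(1, round(fraction * len(molecules)))
    picked = rng.sample(sorted(molecules), n_pick)
    return {name: enumerate_torsions(molecules[name]) for name in picked}


# Toy input: heavy-atom chains of different lengths (assumed data).
toy_molecules = {
    "butane": [(0, 1), (1, 2), (2, 3)],
    "pentane": [(0, 1), (1, 2), (2, 3), (3, 4)],
    "propane": [(0, 1), (1, 2)],
    "hexane": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
}
diverse_set = select_diverse_set(toy_molecules, fraction=0.5, seed=1)
```

Here `fraction` is the knob that bounds the test-set size: the number of torsions grows with the number of molecules kept, not with the full input set.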

Comment from David Mobley: “I think for selecting the diverse set, it would be quite reasonable to simply pick random molecules from our input sets until we reach the desired number of molecules or – if parameter coverage is a concern – to pick random molecules utilizing targeted parameters. I don’t think there is any reason we have to use chemical similarity/clustering to pick the diverse set since that’s handled by the neighboring set; the point of the diverse set is to get diversity.”
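A minimal sketch of the random selection described in this comment, with an optional filter for targeted parameters, might look like the following. The function name `pick_random_molecules`, the input mapping from molecule name to the set of torsion parameter labels it uses, and the labels themselves are all hypothetical placeholders; how parameters are actually assigned to molecules is outside this sketch.

```python
import random


def pick_random_molecules(params_by_molecule, n_molecules, targeted=None, seed=0):
    """Pick up to `n_molecules` molecules at random.

    If `targeted` is given, only molecules that use at least one of the
    targeted parameter labels are eligible (the parameter-coverage variant).
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    candidates = sorted(params_by_molecule)
    if targeted is not None:
        wanted = set(targeted)
        candidates = [m for m in candidates if params_by_molecule[m] & wanted]
    rng.shuffle(candidates)
    return candidates[:n_molecules]


# Assumed toy data: molecule name -> parameter labels it exercises.
toy_params = {
    "mol_a": {"p1", "p2"},
    "mol_b": {"p2"},
    "mol_c": {"p3"},
    "mol_d": {"p1", "p3"},
}
random_pick = pick_random_molecules(toy_params, n_molecules=2, seed=1)
targeted_pick = pick_random_molecules(toy_params, n_molecules=2, targeted={"p3"}, seed=1)
```

No similarity measure or clustering appears anywhere in the selection, which matches the point of the comment: the neighboring set handles similarity, and the diverse set only needs to be a broad random draw.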

Selected sets:

...

File: diverse_selected_torsions.pdf

(Note that the diverse set shared here was generated using the first approach.)