03-05-2020: Short discussion regarding data set clustering

DLM: Quantify diversity using graph similarity, then eventually incorporate WBO (Wiberg bond order).

CB: LINGOs - a similarity measure computed directly from substrings of the SMILES string (Citation to LINGOs method: Link)
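
A minimal sketch of the LINGO idea, to make the comparison concrete: each SMILES string is cut into overlapping substrings of length q = 4, and two molecules are compared by how many of those substrings they share. The function names, the choice to replace every digit with "0", and the multiset Tanimoto score used here are assumptions made for illustration. The published method and production toolkits may normalise and score differently.

```python
import re
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping length-q substrings ("LINGOs") of a SMILES string."""
    # Assumption: map every digit to '0' so that ring-closure numbering does
    # not change the substrings. This also touches digits in isotopes and
    # charges, which a careful implementation would treat separately.
    normalised = re.sub(r"\d", "0", smiles)
    return Counter(normalised[i:i + q] for i in range(len(normalised) - q + 1))


def lingo_similarity(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Multiset Tanimoto over LINGO counts: shared counts / combined counts."""
    counts_a, counts_b = lingos(smiles_a, q), lingos(smiles_b, q)
    combined = sum((counts_a | counts_b).values())  # per-substring maximum
    if combined == 0:
        # Both strings are shorter than q, so there is nothing to compare.
        return 0.0
    shared = sum((counts_a & counts_b).values())  # per-substring minimum
    return shared / combined


if __name__ == "__main__":
    # Butanol vs. butylamine, written as canonical-looking SMILES.
    print(lingo_similarity("CCCCO", "CCCCN"))
```

Because the score depends on the text of the SMILES string, the same molecule written two different ways can score below 1, so the input would need to be canonicalised consistently first.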

DLM: Why should we use LINGOs versus graph fingerprint similarity?

CB: We use SMILES for everything, so LINGOs might be simpler. It links the data representation directly to the clustering. Graph similarity is better for 2D similarity, though.

DLM: Then let’s proceed with graph fingerprint similarity.

DLM: We have Bayer's patented collection, which has higher similarity within its own data set than to the other data sets.

DS: We are running 1000 torsion drives for Silicon Therapeutics in QCA. It could be a good data set to use for upcoming fitting, although we might want to consider running Fragmenter on the data set.

DLM: We should make sure to consider updating the benchmark set for training.