DLM: Quantify diversity using graph similarity, then eventually incorporate WBO.
CB: LINGOs - similarity computed from substrings of the SMILES string (Citation to LINGOs method: Link)
DLM: Why should we use LINGOs versus graph fingerprint similarity?
CB: We used SMILES for everything, so it might be simpler. It ties the data representation directly to the clustering. Graph similarity is better for 2D similarity.
DLM: Then let’s proceed with graph fingerprint similarity.
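Note: for reference, the SMILES-substring idea behind LINGOs discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the published method: the function names `lingos` and `lingo_tanimoto` are hypothetical, the substring length of 4 is an assumption, and the score is a plain count-based Tanimoto (multiset intersection over multiset union), which may differ from the scoring in the cited LINGOs paper. It also skips any SMILES preprocessing.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping q-character substrings of a SMILES string."""
    if len(smiles) < q:
        # Strings shorter than q contribute themselves as a single fragment.
        return Counter([smiles]) if smiles else Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Simplified count-based Tanimoto between two SMILES strings.

    Assumption: multiset intersection / multiset union of substring counts.
    """
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())  # per-substring minimum of the counts
    total = sum((a | b).values())   # per-substring maximum of the counts
    return shared / total if total else 0.0


if __name__ == "__main__":
    print(lingo_tanimoto("CCCCO", "CCCCO"))  # identical strings
    print(lingo_tanimoto("CCCCO", "CCCCN"))  # one substring in common
```

Because the comparison is purely textual, both inputs would need to be canonical SMILES written by the same toolkit; two different SMILES strings for the same molecule can score below 1. A graph fingerprint comparison does not have this dependence on how the string is written.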
DLM: We have Bayer's patented collection, which has higher similarity within its own data set than to the other data sets.
DS: We are running 1000 torsion drives for Silicon Therapeutics in QCA. It could be a good data set to use for upcoming fitting, although we might want to consider running Fragmenter on the data set.
DLM: We should make sure to consider updating the benchmark set for training.