03-05-2020: Data set clustering & data set selection
Participants: @David Mobley @Hyesu Jang @Lee-Ping Wang @Christopher Bayly @Daniel Smith @Jessica Maat
Discussion:
DLM: Quantify diversity using graph similarity, then eventually incorporate WBO (Wiberg bond order).
CB: LINGOs - graph based similarity based on the SMILES string (Citation to LINGOs method: Link)
DLM: Why should we use LINGOs versus graph fingerprint similarity?
CB: We used SMILES for everything, so LINGOs might be simpler. It links the data representation to the clustering. Graph similarity is better for 2D similarity.
DLM: Then let’s proceed with graph fingerprint similarity.
DLM: We have Bayer's patented collection, which shows higher similarity within its own data set compared to the other data sets.
DS: We are running 1000 torsion drives for Silicon Therapeutics in QCA. It could be a good data set to use for upcoming fitting, although we might want to consider running Fragmenter on the data set.
DLM: We should consider updating the training data set for future releases.
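For reference, the similarity comparison discussed above can be sketched in a few lines. This is only an illustration, not the group's workflow: it scores molecules with a simplified LINGO-style measure (a Tanimoto coefficient over counts of 4-character SMILES substrings) and omits the SMILES preprocessing used in the published LINGOs method. The function names (lingos, lingo_tanimoto, mean_similarity_within, mean_similarity_between) are hypothetical. Since the decision was to proceed with graph fingerprints, a fingerprint-based score from a cheminformatics toolkit would take the place of lingo_tanimoto; the within-set and between-set averages would stay the same.

```python
from collections import Counter
from itertools import combinations, product


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping q-character substrings of a SMILES string."""
    if len(smiles) < q:
        # Strings shorter than q contribute themselves as a single fragment.
        return Counter([smiles]) if smiles else Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto: sum of min counts divided by sum of max counts.

    Assumption: this is a simplified stand-in for the published LINGO
    similarity and skips its SMILES normalization step.
    """
    ca, cb = lingos(a, q), lingos(b, q)
    shared = sum((ca & cb).values())  # & keeps the minimum count per fragment
    total = sum((ca | cb).values())   # | keeps the maximum count per fragment
    return shared / total if total else 0.0


def mean_similarity_within(smiles_list):
    """Average pairwise similarity inside one data set."""
    pairs = list(combinations(smiles_list, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return sum(lingo_tanimoto(a, b) for a, b in pairs) / len(pairs)


def mean_similarity_between(set_a, set_b):
    """Average similarity over all cross pairs of two data sets."""
    pairs = list(product(set_a, set_b))
    if not pairs:
        raise ValueError("both data sets must be non-empty")
    return sum(lingo_tanimoto(a, b) for a, b in pairs) / len(pairs)
```

Because this score works on the text of the SMILES string, all inputs should come from the same canonicalization; two different SMILES strings for the same molecule would otherwise score below 1. A data set whose within-set average is much higher than its between-set averages is internally similar relative to the others, which is the kind of comparison raised for the patented collection.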