Background
Sage 2.0 was trained on Gen2 data (List of QM training/benchmark datasets), while Sage 2.1 was trained on a combination of Gen1 and Gen2 data (GitHub link).
...
NCI 250K: https://cactus.nci.nih.gov/download/nci/
ChEMBL 30: https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_30/chembl_30_release_notes.txt
PDB: http://ligand-expo.rcsb.org/dictionaries/Components-smiles-stereo-oe.smi
ZINC:
Pre-filtered by Riniker lab (also ChEMBL): https://www.research-collection.ethz.ch/handle/20.500.11850/230799
MMFF (Merck Molecular Force Field) (maybe not for training, but for validation). Possible reference?
MOPAC training set
Proper Torsions
Assessing coverage
Starting from version 2 of the torsion multiplicity force field, which includes split torsion parameters for every incorrect torsion identified by Pavan and Meghan, I computed the coverage for every proper torsion parameter (see the attached files).
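As a rough illustration of what "coverage" means here, the sketch below counts how many molecules exercise each proper torsion parameter at least once. It assumes the parameter assignments have already been produced (for example with the OpenFF Toolkit's `ForceField.label_molecules`); the function name, the SMILES keys, and the assigned parameter IDs are hypothetical placeholders, not results from the actual data sets.

```python
from collections import Counter


def count_parameter_coverage(labelled_molecules, all_parameter_ids):
    """Count the number of molecules that use each parameter at least once.

    labelled_molecules: dict mapping a molecule identifier (e.g. SMILES)
        to the list of torsion parameter IDs assigned to that molecule.
    all_parameter_ids: every parameter ID in the force field, so that
        unused parameters appear in the result with a count of zero.
    """
    coverage = Counter({pid: 0 for pid in all_parameter_ids})
    for param_ids in labelled_molecules.values():
        # Use a set so a parameter matched several times in one molecule
        # still counts that molecule only once.
        for pid in set(param_ids):
            if pid in coverage:
                coverage[pid] += 1
    return coverage


# Hypothetical labels, for illustration only (not real assignments).
labelled = {
    "CCO": ["t1", "t1", "t2"],
    "CCCC": ["t1", "t3"],
    "c1ccccc1": ["t4"],
}
parameter_ids = ["t1", "t2", "t3", "t4", "t123"]

coverage = count_parameter_coverage(labelled, parameter_ids)
uncovered = [pid for pid in parameter_ids if coverage[pid] == 0]
print(coverage)
print("Uncovered:", uncovered)
```

Counting molecules rather than individual torsion matches is a design choice in this sketch; counting matches instead would only require dropping the `set(...)` call.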
Searching the ChEMBL 33 database for molecules matching these parameters reveals that 41 of them (see the attached file) ...
As a final check, I also searched for these parameters in our industry benchmarking data set. This time, only 13 parameters are not covered by ChEMBL, the training set, or the benchmarking set: t115h, t116i, t116j, t123, t130g, t130h, t132g, t133g, t133h, t142j, t142k, t142l, t143i. In other words, despite not being found in ChEMBL, t59g, t60g, t61g, and t62g are all covered by our industry benchmark.
In light of this, I think t115h, t116i, t116j, t123, t130g, t130h, t132g, t133g, t133h, t142j, t142k, t142l, t143i are good candidates for deletion, while t59g, t60g, t61g, and t62g likely need some kind of coverage in the training set.
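The triage above can be written as a small set-based check. This is a minimal sketch under stated assumptions: the `triage` function and its rule (a parameter absent from the training set is a deletion candidate only if it also appears in neither ChEMBL nor the benchmark; otherwise it is flagged as needing training coverage) are my reading of the reasoning, and the coverage sets are restricted to the 17 parameters named in the text rather than the full search results.

```python
def triage(parameter_ids, chembl, training, benchmark):
    """Split parameters lacking training coverage into two groups.

    Returns (delete_candidates, needs_training_coverage), each in the
    order the IDs appear in parameter_ids.
    """
    delete_candidates = []
    needs_training_coverage = []
    for pid in parameter_ids:
        if pid in training:
            continue  # already covered by the training set
        if pid in chembl or pid in benchmark:
            # Seen in real chemistry somewhere, so worth training on.
            needs_training_coverage.append(pid)
        else:
            # Not found in any of the three data sources.
            delete_candidates.append(pid)
    return delete_candidates, needs_training_coverage


# The 17 parameters discussed in the text.
candidates = [
    "t59g", "t60g", "t61g", "t62g",
    "t115h", "t116i", "t116j", "t123", "t130g", "t130h", "t132g",
    "t133g", "t133h", "t142j", "t142k", "t142l", "t143i",
]

# Coverage restricted to these 17 parameters, as described in the text:
# none were found in ChEMBL or the training set, and four appear in the
# industry benchmark.
chembl_hits = set()
training_hits = set()
benchmark_hits = {"t59g", "t60g", "t61g", "t62g"}

to_delete, to_train = triage(candidates, chembl_hits, training_hits, benchmark_hits)
print("Deletion candidates:", to_delete)
print("Need training coverage:", to_train)
```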
...
To do
Chemistry | ||
---|---|---|
Sulfonic and phosphonic acids | ||
Sulfur functional groups – sulfones, sulfonates, sulfinyl, sulfoxy, sulfoximines, sulfonamides, thioethers, thiazoles, sulfonimidamides, … | ||
Nitrogen functional groups common in drugs | ||
Attachments
View file | ||
---|---|---|
|
...