Background
Sage 2.0 was trained on Gen2 data (List of QM training/benchmark datasets), while Sage 2.1 was trained on a combination of Gen1 and Gen2 data. These datasets have known coverage gaps for particular chemistries, including molecules containing halogens, hypervalent sulfur groups, and sulfonic and phosphonic acids. These gaps may account for such problems as:
Many of these rarer functional groups are still important in medicinal chemistry:
Goal
To check and possibly increase the coverage of various chemistries in our training and test sets. Example molecules could be constructed (e.g. ) or filtered from various datasets, such as ChEMBL30 or https://enamine.net/building-blocks/medchem/view-all/sulfoximines .
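A coverage check of this kind could be sketched with substructure matching, as below. This is a minimal sketch that assumes RDKit is available and that the dataset can be reduced to a list of SMILES strings. The SMARTS patterns and the `count_coverage` helper are illustrative assumptions, not the project's agreed definitions of these chemistries, and would need review before use.

```python
from rdkit import Chem

# Illustrative SMARTS patterns (assumptions for this sketch, not vetted definitions).
PATTERNS = {
    "halogen": "[F,Cl,Br,I]",
    "sulfonic acid": "[#6][SX4](=O)(=O)[OX2H1,OX1-]",
    "phosphonic acid": "[#6][PX4](=O)([OX2H1,OX1-])[OX2H1,OX1-]",
    "sulfonamide": "[SX4](=O)(=O)[NX3]",
    "sulfoximine": "[SX4](=O)(=[NX2])([#6])[#6]",
    "sulfone": "[#6][SX4](=O)(=O)[#6]",
}


def count_coverage(smiles_list, patterns=PATTERNS):
    """Count how many molecules contain each chemistry at least once."""
    queries = {name: Chem.MolFromSmarts(smarts) for name, smarts in patterns.items()}
    counts = {name: 0 for name in patterns}
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Skip entries that RDKit cannot parse.
            continue
        for name, query in queries.items():
            if mol.HasSubstructMatch(query):
                counts[name] += 1
    return counts


# Hypothetical toy dataset standing in for a training or test set.
example_smiles = [
    "CS(=O)(=O)O",   # methanesulfonic acid
    "CP(=O)(O)O",    # methylphosphonic acid
    "Clc1ccccc1",    # chlorobenzene
    "CS(C)(=N)=O",   # S,S-dimethyl sulfoximine
    "CCO",           # ethanol
]
counts = count_coverage(example_smiles)
gaps = [name for name, n in counts.items() if n == 0]
print(counts)
print("No coverage for:", gaps)
```

Chemistries that end up in `gaps` would be candidates for adding constructed molecules or molecules filtered from an external source. The same patterns could be applied to such a source to select the molecules to add.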
To do
Chemistry | ||
---|---|---|
Sulfonic and phosphonic acids | ||
Sulfur functional groups – sulfones, sulfonates, sulfinyl, sulfoxy, sulfoximines, sulfonamides, thioethers, thiazoles, sulfonimidamides, … | ||
Nitrogen functional groups common in drugs | ||