Background

Sage 2.0 was trained on Gen2 data (see List of QM training/benchmark datasets), while Sage 2.1 was trained on a combination of Gen1 and Gen2 data. These datasets have known coverage gaps for particular chemistries, including molecules containing halogens, hypervalent sulfur groups, and sulfonic and phosphonic acids. These gaps may account for problems such as:

  • https://openforcefield.atlassian.net/wiki/spaces/FF/pages/2592604191/Further+sulf+on+amide+improvements

Many of these rarer functional groups are still important in medicinal chemistry.

Goal

Check, and where needed increase, the coverage of these chemistries in our training and test sets. Example molecules could be constructed or filtered from existing datasets, such as ChEMBL30 or the Enamine sulfoximine building blocks (https://enamine.net/building-blocks/medchem/view-all/sulfoximines).
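
A coverage check of this kind could be sketched as a SMARTS substructure count over the SMILES strings in a dataset. The example below is a minimal sketch, not an existing workflow: it assumes RDKit is available for the substructure matching, and the SMARTS patterns, the `FUNCTIONAL_GROUP_SMARTS` table, and the `rdkit_matcher` and `count_coverage` helpers are all hypothetical names introduced for illustration. The patterns are rough approximations and would need review before being used to select molecules.

```python
from typing import Callable, Dict, Iterable

# Illustrative SMARTS patterns (assumptions, not curated definitions).
FUNCTIONAL_GROUP_SMARTS: Dict[str, str] = {
    "sulfonamide": "[#16X4](=[OX1])(=[OX1])([#6])[#7]",
    "sulfone": "[#16X4](=[OX1])(=[OX1])([#6])[#6]",
    "sulfoxide": "[#16X3](=[OX1])([#6])[#6]",
    "sulfoximine": "[#16X4](=[OX1])(=[NX2])([#6])[#6]",
    "sulfonic acid": "[#16X4](=[OX1])(=[OX1])([#6])[OX2H1,OX1-]",
    "phosphonic acid": "[PX4](=[OX1])([#6])([OX2H1,OX1-])[OX2H1,OX1-]",
    "halogen": "[F,Cl,Br,I]",
}


def rdkit_matcher(smiles: str, smarts: str) -> bool:
    """Return True if the molecule contains the SMARTS pattern (uses RDKit)."""
    # Imported lazily so the counting logic can be used with another toolkit.
    from rdkit import Chem

    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"Could not parse SMARTS: {smarts}")
    mol = Chem.MolFromSmiles(smiles)
    # Unparseable SMILES are treated as non-matching rather than raising.
    return mol is not None and mol.HasSubstructMatch(pattern)


def count_coverage(
    smiles_list: Iterable[str],
    patterns: Dict[str, str] = FUNCTIONAL_GROUP_SMARTS,
    matcher: Callable[[str, str], bool] = rdkit_matcher,
) -> Dict[str, int]:
    """Count how many molecules contain each functional group at least once."""
    counts = {name: 0 for name in patterns}
    for smiles in smiles_list:
        for name, smarts in patterns.items():
            if matcher(smiles, smarts):
                counts[name] += 1
    return counts
```

Running `count_coverage` on the SMILES of a training set and of a candidate source such as ChEMBL30 would give per-group molecule counts that can be compared side by side. Groups with zero or very low counts in the training set are the candidates for new datasets.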

Dataset ideas:

To do

Chemistry

Sulfonic and phosphonic acids

Sulfur functional groups – sulfones, sulfonates, sulfinyl, sulfoxy, sulfoximines, sulfonamides, thioethers, thiazoles, sulfonimidamides, …

Nitrogen functional groups common in drugs
