BespokeFit SMIRKS generation

BespokeFit generates bespoke SMIRKS patterns for torsions of interest using a ChemPer single graph. The pattern includes all attributes of the 4 atoms in the torsion quartet and the attributes of atoms up to one bond away (ChemPer option layers=1). This was chosen so that the substructure in the SMIRKS would transfer from the fragment to the parent without the capping groups confusing the matching. However, in some cases this leads to parameters that miss the intended atoms. It may be better to encode the maximum common substructure (MCS) between the fragment and the parent in the SMIRKS to ensure that the patterns transfer well; this may also mean that the parameters can be better reused in new molecules (needs validation). The pattern then more closely matches the environment described by the fragmentation scheme, meaning that the minimum electronically coupled environment needed to describe the torsion is consistent.

Example of where the current scheme fails

During development, the current scheme performed well on the TYK2 ligand series, but as we move to new sets such as PTP1B, failures in the SMIRKS generation have come to light.

Take this parent molecule

and the fragment corresponding to the biaryl torsion

The workflow will identify 4 possible torsions passing through the central bond highlighted above. Each of these combinations of 4 atoms will then have a separate ChemPer graph made, with an associated SMIRKS. Some of these SMIRKS patterns are equivalent, and BespokeFit will deduplicate them, resulting in 2 unique torsion SMIRKS parameters.
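The enumeration step can be sketched in a few lines of Python. This is a minimal illustration, not BespokeFit's implementation: the function name torsions_through_bond, the adjacency dictionary and the atom indices are all hypothetical, and only the neighbours of the two central atoms are listed.

```python
def torsions_through_bond(adjacency, bond):
    """Return every (a, b, c, d) quartet whose central bond is b-c."""
    b, c = bond
    return [
        (a, b, c, d)
        for a in sorted(adjacency[b]) if a != c
        for d in sorted(adjacency[c]) if d not in (a, b)
    ]


# Hypothetical indices for a biaryl bond: atom 2 is the phenyl carbon
# (neighbours 1 and 6 are the ortho carbons), atom 3 is the 5-membered
# ring carbon (neighbours 4 = carbon, 5 = sulphur).
adjacency = {
    1: {2},
    6: {2},
    2: {1, 3, 6},
    3: {2, 4, 5},
    4: {3},
    5: {3},
}

torsions = torsions_through_bond(adjacency, (2, 3))
print(torsions)
```

With two neighbours on each side of the central bond this gives the 4 torsions described above; each one would then get its own graph and SMIRKS.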

smirks1=[#6aH1X3x2r6+0:1](-!@[#1AH0X1x0!r+0])(:@[#6aH1X3x2r6+0]):@[#6aH0X3x2r6+0:2](:@[#6aH1X3x2r6+0])-!@[#6AH0X3x2r5+0:3](=@[#6AH0X3x2r5+0])-@[#16AH0X2x2r5+0:4]-@[#6AH1X3x2r5+0]

hits atoms

 

and smirks2=[#6aH1X3x2r6+0:1](-!@[#1AH0X1x0!r+0])(:@[#6aH1X3x2r6+0]):@[#6aH0X3x2r6+0:2](:@[#6aH1X3x2r6+0])-!@[#6AH0X3x2r5+0:3](-@[#16AH0X2x2r5+0])=@[#6AH0X3x2r5+0:4](-!@[#35AH0X1x0!r+0])-@[#6AH0X3x2r5+0]

hits atoms

Now we can use the toolkit to find where these new smirks patterns will hit the parent molecule.

smirks1

We see that the parameter does not match the parent at all, because of the last part of smirks1: @[#16AH0X2x2r5+0:4]-@[#6AH1X3x2r5+0]. This says that the sulphur in the 5-membered ring should be bonded to a ring carbon that has 3 connections, 1 of which is to a hydrogen. This is true in the fragment but not in the parent. We see something similar with smirks2.

Here the parameter matches one of the torsions but not the second, because of this part of the SMIRKS: [#6aH1X3x2r6+0:1](-!@[#1AH0X1x0!r+0])(:@[#6aH1X3x2r6+0]). This requires the terminal carbon in the phenyl ring to be bonded to a carbon atom that has 3 connections, one of which is to a hydrogen, which again is true in the fragment but not in the parent.

To avoid this, we can use the MCS between the parent and the fragment to define the SMIRKS patterns. This will lead to more complex patterns containing OR groups at the locations of the capping groups, but should better capture the entire fragment and lead to better transferability. To make the deduplication of the SMIRKS patterns easier, we can use the toolkits (RDKit or OpenEye) to get the symmetry classes of the atoms and deduplicate the torsions by the combination of their 4 symmetry classes. Running this again, using a cluster graph to capture the MCS in a SMIRKS for each of the symmetry-unique torsions, we get 2 SMIRKS patterns that correctly hit the target atoms.
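The symmetry-class deduplication can be sketched as follows. This is a hedged illustration rather than the BespokeFit code: the helper name deduplicate_torsions, the atom indices and the class labels are hypothetical. In practice the classes would come from a toolkit; with RDKit, for instance, Chem.CanonicalRankAtoms(mol, breakTies=False) assigns the same rank to symmetry-equivalent atoms. The sketch also assumes that a torsion and its reverse should count as the same torsion.

```python
def deduplicate_torsions(torsions, symmetry_classes):
    """Keep one torsion per unique combination of 4 symmetry classes."""
    unique = {}
    for torsion in torsions:
        key = tuple(symmetry_classes[index] for index in torsion)
        # (a, b, c, d) and (d, c, b, a) describe the same torsion,
        # so use one canonical orientation of the key.
        key = min(key, key[::-1])
        unique.setdefault(key, torsion)
    return list(unique.values())


# Hypothetical example: atoms 1 and 6 are symmetry-equivalent ortho
# carbons on the phenyl ring; atoms 4 (carbon) and 5 (sulphur) are not.
symmetry_classes = {1: 0, 6: 0, 2: 1, 3: 2, 4: 3, 5: 4}
all_torsions = [(1, 2, 3, 4), (1, 2, 3, 5), (6, 2, 3, 4), (6, 2, 3, 5)]

unique_torsions = deduplicate_torsions(all_torsions, symmetry_classes)
print(unique_torsions)
```

Here the 4 torsions through the central bond collapse to 2 symmetry-unique ones, so only 2 cluster SMIRKS need to be generated.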

cluster_smirks1=[#6H1X3x2r6+0a:1](-;!@[#1H0X1x0!r+0A])(:;@[#6H0X3x2r6+0a,#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A,#7H1X3x0!r+0A]):;@[#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A]):;@[#6H0X3x2r6+0a,#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A,#7H1X3x0!r+0A]):;@[#6H1X3x2r6+0a]-;!@[#1H0X1x0!r+0A]):;@[#6H0X3x2r6+0a:2]-;!@[#6H0X3x2r5+0A:3](=;@[#6H0X3x2r5+0A](-;!@[#35H0X1x0!r+0A])-;@[#6H0X3x2r5+0A](=;@[#6H0X3x2r5+0A,#6H1X3x2r5+0A]-;!@[#1H0X1x0!r+0A,#6H0X3x0!r+0A])-;!@[#8H0X2x0!r+0A]-;!@[#6H2X4x0!r+0A,#6H3X4x0!r+0A](-;!@[#1H0X1x0!r+0A,#6H0X3x0!r+0A])(-;!@[#1H0X1x0!r+0A])-;!@[#1H0X1x0!r+0A])-;@[#16H0X2x2r5+0A:4]

which hits atoms

and cluster_smirks2=[#6H1X3x2r6+0a:1](-;!@[#1H0X1x0!r+0A])(:;@[#6H0X3x2r6+0a,#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A,#7H1X3x0!r+0A]):;@[#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A]):;@[#6H0X3x2r6+0a,#6H1X3x2r6+0a](-;!@[#1H0X1x0!r+0A,#7H1X3x0!r+0A]):;@[#6H1X3x2r6+0a]-;!@[#1H0X1x0!r+0A]):;@[#6H0X3x2r6+0a:2]-;!@[#6H0X3x2r5+0A:3](-;@[#16H0X2x2r5+0A]-;@[#6H0X3x2r5+0A,#6H1X3x2r5+0A](-;!@[#1H0X1x0!r+0A,#6H0X3x0!r+0A])=;@[#6H0X3x2r5+0A]-;!@[#8H0X2x0!r+0A]-;!@[#6H2X4x0!r+0A,#6H3X4x0!r+0A](-;!@[#1H0X1x0!r+0A,#6H0X3x0!r+0A])(-;!@[#1H0X1x0!r+0A])-;!@[#1H0X1x0!r+0A])=;@[#6H0X3x2r5+0A:4]-;!@[#35H0X1x0!r+0A]
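
The comma-separated alternatives in the cluster patterns, for example [#6H0X3x2r6+0a,#6H1X3x2r6+0a], are the OR groups that appear where the fragment and parent differ. The sketch below shows only the idea of merging the decorators seen for one atom position in each molecule; it is not ChemPer's implementation. The helper name or_atom_expression is hypothetical, and the assignment of each decorator to the fragment or the parent is an assumption based on the capping described earlier.

```python
def or_atom_expression(decorators):
    """Merge the decorators seen at one atom position into a single
    bracketed SMIRKS atom, joining distinct alternatives with ','."""
    alternatives = sorted(set(decorators))
    return "[" + ",".join(alternatives) + "]"


# Ring carbon next to a cap: carries a hydrogen in the fragment (H1),
# assumed to carry a substituent instead in the parent (H0).
ring_carbon = or_atom_expression(["#6H1X3x2r6+0a", "#6H0X3x2r6+0a"])

# The capping position itself: hydrogen in the fragment, assumed to be
# nitrogen in the parent.
cap_atom = or_atom_expression(["#1H0X1x0!r+0A", "#7H1X3x0!r+0A"])

# An atom that is identical in both molecules needs no OR group.
sulphur = or_atom_expression(["#16H0X2x2r5+0A", "#16H0X2x2r5+0A"])

print(ring_carbon)
print(cap_atom)
print(sulphur)
```

Atoms that are identical in both molecules keep a single set of decorators, so the extra complexity is confined to the capping positions.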