Missing parameter coverage

t129

As reported in Known issues and bugs (@striketeam), t129 ([*:1]-[#8X2r5:2]-;@[#7X2r5:3]~[*:4]) in Sage 2.1.0 has a suspicious value of V = -19.907. Based on the investigation in Redundant parameters in Sage 2.1, this force constant looks reasonable for the aromatic rings in the training data. However, as shown below, the pattern also matches some non-aromatic rings in the industry dataset, and for those the value is more questionable. We should probably add at least some of these to the training set for Sage 2.2.

Industry data

The remaining question is whether this pattern applies only to aromatic rings like those in the training data. The pattern matches 295 molecules in the industry benchmarking data set, as shown in the series of images below.

At least a few of these, shown below, are not aromatic.

Molecule

SMILES

mol00.png

[H]c1c(c(c(c(c1[H])[H])N2C(=NOS2)C3=C(C(=O)Oc4c3c(c(c(c4[H])[H])[H])[H])[H])[H])[H]

mol16.png

[H]c1c(c(c(c(c1C2=NO[C@@]3([C@]2(C(N(C3([H])[H])C(=O)OC(C([H])([H])[H])(C([H])([H])[H])C([H])([H])[H])([H])[H])[H])[H])[H])[H])Cl)[H]

mol24.png

[H]c1c(c(c(c(c1[H])c2c(c3c(c(c2[H])Cl)N(C(=N3)C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])[H])F)[H])[H]

mol43.png

[H]c1c(c(c(c(c1C(=O)C2=C([N-]N(C2=O)C([H])([H])[H])[H])C([H])([H])[H])C3=NOC(C3([H])[H])([H])[H])S(=O)(=O)C([H])([H])[H])[H]

mol55.png

[H]c1c(c(c(c(c1[H])c2c(c(c3c(n2)N=C(N3[H])C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])[H])C(F)(F)F)[H])[H]

mol91.png

[H]c1c(c(c(c(c1[H])c2c(c(c3c(c2C([H])([H])[H])N(C(=N3)C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])C([H])([H])[H])[H])C(F)(F)F)[H])[H]

mol106.png

[H]c1c(c(c(c(c1[H])c2c(c3c(c(c2[H])Br)N(C(=N3)C4=NOC5(C4([H])[H])C(C(OC(C5([H])[H])([H])[H])([H])[H])([H])[H])[H])[H])C(F)(F)F)[H])[H]

mol107.png

[H]c1c(c(c(c(c1[H])c2c(c3c(c(c2[H])Cl)N=C(N3[H])C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])(C([H])([H])[H])C([H])([H])[H])([H])[H])([H])[H])[H])C(F)(F)F)[H])[H]

mol110.png

[H]c1c(c(c2c(c1[H])C3=NOC@([H])C([H])([H])N4C(C(N(C(C4([H])[H])([H])[H])C([H])([H])C([H])([H])[H])([H])[H])([H])[H])[H])[H]

mol123.png

[H]c1c(c(c(c(c1[H])F)c2c(c(c3c(c2[H])N=C(N3[H])C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])[H])OC(F)(F)F)[H]

mol139.png

[H]c1c(c(c(c(c1[H])[H])[C@]2(C(C(=NO2)c3c(c(c4c(c3[H])N=C(N(C4=O)[H])[H])[H])[H])([H])[H])C([H])([H])[H])[H])[H]

mol156.png

[H]c1c(c(c(c(c1[H])c2c(c(c3c(c2[H])N(C(=N3)C4=NOC5(C4([H])[H])C(C(C(C(C5([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[H])C(F)(F)F)[H])Cl)[H])[H]

mol185.png

[H]c1c(c(c(c(c1[H])[H])[C@@]2(C(C(=NO2)c3c(c(c4c(c3[H])C(C(C(=C4[H])[H])([H])[H])([H])[H])[H])[H])([H])[H])[H])[H])[H]

mol199.png

[H]c1c(c(c(c(c1[H])[H])C2=NOC@([H])c3c(c(c4c(c3[H])C(=O)N(C(=N4)[H])[H])[H])[H])[H])[H]

mol223.png

[H]c1c(c(c(c(c1C2=NOC@@([H])C3=Nc4c(c(c(c(c4C(=O)O3)[H])Cl)[H])[H])[H])[H])Cl)[H]

mol239.png

[H]c1c(c(c(c(c1[H])c2c(c(c3c(c2[H])N=C(N3[H])C4=NOC5(C4([H])[H])C(C(OC(C5([H])[H])([H])[H])([H])[H])([H])[H])[H])[H])Cl)[H])[H]
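The aromatic versus non-aromatic split in the table above can be checked programmatically. The following is a minimal sketch using RDKit, not the script used to produce this page. It assumes that matching should be done under an MDL-style aromaticity model, to approximate how parameters are assigned, while RDKit's default model decides whether the ring counts as aromatic in the chemical sense. The function name `t129_report` and the two example SMILES (a bare isoxazoline and a bare isoxazole ring) are illustrative and are not taken from the industry set.

```python
from rdkit import Chem

# t129 SMIRKS as quoted on this page; RDKit reads it as a SMARTS query.
T129 = Chem.MolFromSmarts("[*:1]-[#8X2r5:2]-;@[#7X2r5:3]~[*:4]")


def t129_report(smiles):
    """Return (number of t129 matches, whether any matched O/N is aromatic)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")

    # Aromaticity flags under RDKit's default model (isoxazole is aromatic here).
    default_aromatic = [atom.GetIsAromatic() for atom in mol.GetAtoms()]

    # Assumption: re-perceive aromaticity with the MDL model before matching,
    # so five-membered heterocycles keep explicit single and double bonds.
    mdl = Chem.Mol(mol)
    Chem.Kekulize(mdl, clearAromaticFlags=True)
    Chem.SetAromaticity(mdl, Chem.AromaticityModel.AROMATICITY_MDL)

    matches = mdl.GetSubstructMatches(T129)
    # Positions 1 and 2 of each match are the ring O and N (map indices 2 and 3).
    aromatic = any(default_aromatic[i] for m in matches for i in m[1:3])
    return len(matches), aromatic


for smiles in ("C1CC=NO1", "c1ccno1"):
    print(smiles, t129_report(smiles))
```

Running the same function over the SMILES column above would flag the entries where the pattern matches but the O-N ring is not aromatic. The choice of aromaticity model matters: with RDKit's default model alone, the O-N bond in an aromatic isoxazole is an aromatic bond and would not match the explicit single bond in the pattern.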

t164

t164 is covered only by three conformations of a single molecule: [H]C1=C(N(C(=C1[H])C(=S)N=P(N(C([H])([H])[H])C([H])([H])[H])(N(C([H])([H])[H])C([H])([H])[H])N(C([H])([H])[H])C([H])([H])[H])C([H])([H])[H])[H]

It also doesn’t apply to any molecules in the industry dataset (both by my checks and as mentioned in the Sage 2.0 paper), so it needs either much more training data or, more likely, refinement and/or splitting into multiple parameters.
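The distinction between conformer count and molecule count is what makes t164 under-covered, and it can be tallied with a few lines of plain Python. This is a sketch only: the record layout `(parameter id, SMILES, conformer index)`, the function names, and the `min_molecules` threshold of 5 are hypothetical, and comparing SMILES as strings assumes they have already been canonicalized.

```python
from collections import defaultdict

# The single training molecule for t164, as quoted above.
T164_SMILES = (
    "[H]C1=C(N(C(=C1[H])C(=S)N=P(N(C([H])([H])[H])C([H])([H])[H])"
    "(N(C([H])([H])[H])C([H])([H])[H])N(C([H])([H])[H])C([H])([H])[H])"
    "C([H])([H])[H])[H]"
)

# Hypothetical record layout: (parameter id, SMILES, conformer index).
records = [("t164", T164_SMILES, conformer) for conformer in range(3)]


def unique_molecule_counts(records):
    """Count distinct molecules (not conformers) per parameter id."""
    molecules = defaultdict(set)
    for parameter_id, smiles, _conformer in records:
        molecules[parameter_id].add(smiles)
    return {pid: len(found) for pid, found in molecules.items()}


def under_covered(records, min_molecules=5):
    """Return parameter ids trained on fewer than min_molecules molecules."""
    counts = unique_molecule_counts(records)
    return sorted(pid for pid, n in counts.items() if n < min_molecules)


print(unique_molecule_counts(records))
print(under_covered(records))
```

Counting molecules rather than records keeps three conformers of one structure from looking like three independent training points.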

Possible Training Data

From ChEBI

Name

SMILES

Image

trimethyl(phenylimino)phosphorane

P(=NC1=CC=CC=C1)(C)(C)C

 

P,P-diphenylphosphinimidic amide

NP(=N)(c1ccccc1)c1ccccc1

 

N,N',P,P-tetraphenylphosphinimidic amide

N(c1ccccc1)P(=Nc1ccccc1)(c1ccccc1)c1ccccc1

 

apholate

C1CN1P1(=NP(=NP(=N1)(N1CC1)N1CC1)(N1CC1)N1CC1)N1CC1

 

phosphenodiimidic amide

P(N)(=N)=N

 

hexakis(2,2,3,3-tetrafluoropropoxy)cyclotriphosphazene

FC(F)C(F)(F)COP1(OCC(F)(F)C(F)F)=NP(OCC(F)(F)C(F)F)(OCC(F)(F)C(F)F)=NP(OCC(F)(F)C(F)F)(OCC(F)(F)C(F)F)=N1

 

From ChEMBL

These SMILES should be in the same order as the images below.

Not #15X4

Only two of these molecules, shown below, contain a nitrogen-phosphorus double bond in which P does not have four substituents. The original bug report's suggestion that #15X4 is more appropriate may therefore be correct, but this chemistry is very rare.
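A filter of this kind can be sketched with RDKit. The SMARTS `[#7]=[#15;!X4]` is an assumption about how to express "N=P double bond where P is not four-coordinate"; it is not taken from the force field or the bug report. Because the ChEMBL SMILES are not reproduced on this page, three of the ChEBI candidates listed earlier are used as stand-ins, and the function name is illustrative.

```python
from rdkit import Chem

# Hypothetical query: N=P double bond where P does not have four connections.
NON_X4_PN = Chem.MolFromSmarts("[#7]=[#15;!X4]")


def has_non_x4_pn_double_bond(smiles):
    """True if any N=P double bond involves a phosphorus that is not X4."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return mol.HasSubstructMatch(NON_X4_PN)


# Stand-in inputs: ChEBI candidates from the table on this page.
candidates = {
    "trimethyl(phenylimino)phosphorane": "P(=NC1=CC=CC=C1)(C)(C)C",
    "P,P-diphenylphosphinimidic amide": "NP(=N)(c1ccccc1)c1ccccc1",
    "phosphenodiimidic amide": "P(N)(=N)=N",
}

flagged = [
    name for name, smiles in candidates.items()
    if has_non_x4_pn_double_bond(smiles)
]
print(flagged)
```

Molecules that are not flagged have only four-coordinate phosphorus in their N=P bonds, which is the case a #15X4 pattern would cover.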

 
