New parameters from Espaloma/MSM analysis


Small rings (Lexie)

Summary

First round of experiments

In general, I have two versions of a small ring FF that I’m iterating on.

Both of these force fields have the following changes:

a3 [*;r3:1]1~;@[*;r3:2]~;@[*;r3:3]1 --> a43 [*;r3:1]1~;@[*;r3:2]~;@[*;r3:3]1 (same SMIRKS, moved to the end to catch heteroatoms)

a7 [#6r4:1]-;@[#6r4:2]-;@[#6r4:3] --> a42 [*;r4:1]-;@[*;r4:2]-;@[*;r4:3] (made generic & moved to the end to catch heteroatoms)

New parameter: a41: [*;r5:1]@[*;r5:2]@[*;r5:3]

New parameter: a41a: [*;r5:1]@[#16;r5:2]@[*;r5:3]

New parameter: a13a: [*;r6:1]~;@[*;r5;x4:2]~;@[*;r5;x2:3] (splits spiro rings from fused rings)

The two FFs differ in how they treat the 4-membered ring “external” angles, i.e. angles where one atom is in a 4-membered ring but one or more of the others are not.

Version 1 stays closer to Sage but corrects some over-specificity.

a8: [!#1:1]-[#6r4:2]-;!@[!#1:3] --> [!#1:1]-[*;r4:2]-;!@[!#1:3] (same param ID)
a9: [!#1:1]-[#6r4:2]-;!@[#1:3] --> [!#1:1]-[*;r4:2]-;!@[#1:3] (same param ID)

These angles don’t distinguish between ring-ring-nonring and nonring-ring-nonring; instead they differentiate between H and non-H.

Version 2 aims to make the ring/nonring distinction by introducing two new parameters: a44 for nonring-r4-nonring angles and a45 for r4-r4-nonring angles.

Parameter removed: a8

Parameter removed: a9

New parameter: a44: [*;!r4:1]~[*;r4:2]~[*;!r4:3]

New parameter: a45: [*;r4:1]@[*;r4:2]~;!@[*:3]
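A toy sketch of how the two versions partition the 4-membered ring external angles. It does not parse SMIRKS: each rule is a hand-written stand-in for the corresponding pattern above, bond orders are ignored, the central atom is assumed to be in a 4-membered ring, and the `End` descriptor and `assign` helper are hypothetical names introduced only for illustration. As in SMIRNOFF, the last matching rule wins.

```python
from collections import namedtuple

# One outer atom of an angle whose central atom is in a 4-membered ring.
#   is_h      : outer atom is hydrogen
#   in_r4     : outer atom is itself in a 4-membered ring
#   ring_bond : its bond to the central atom is a ring bond
End = namedtuple("End", "is_h in_r4 ring_bond")

# Hand-written stand-ins for the SMIRKS above (assumption: bond orders ignored).
V1_RULES = [
    ("a8", lambda a, b: not a.is_h and not b.is_h and not b.ring_bond),
    ("a9", lambda a, b: not a.is_h and b.is_h and not b.ring_bond),
]
V2_RULES = [
    ("a44", lambda a, b: not a.in_r4 and not b.in_r4),
    ("a45", lambda a, b: a.in_r4 and a.ring_bond and not b.ring_bond),
]

def assign(rules, end1, end3):
    """Return the last rule that matches the angle in either orientation."""
    winner = None
    for param_id, rule in rules:
        if rule(end1, end3) or rule(end3, end1):
            winner = param_id
    return winner

ring_c = End(is_h=False, in_r4=True, ring_bond=True)
methyl_c = End(is_h=False, in_r4=False, ring_bond=False)
hydrogen = End(is_h=True, in_r4=False, ring_bond=False)

angles = {
    "ring-ring-H": (ring_c, hydrogen),
    "ring-ring-C": (ring_c, methyl_c),
    "C-ring-H": (methyl_c, hydrogen),
    "C-ring-C": (methyl_c, methyl_c),
}
for name, (e1, e3) in angles.items():
    print(name, assign(V1_RULES, e1, e3), assign(V2_RULES, e1, e3))
```

In this toy model, Version 1 puts ring-ring-H and C-ring-H together in a9 (and the two non-H cases in a8), while Version 2 puts ring-ring-H and ring-ring-C together in a45 (and the two nonring cases in a44).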

Second round of experiments

The second round of experiments is aimed at distinguishing between H and non-H atoms.

Both force fields have the following modifications (in addition to those described in iteration 1):

New parameter: a4a: [*;r3:1]~;@[*;r3:2]~;!@[#1:3] r3 atom - r3 atom - H

New parameter: a6a: [#1:1]-[*;r3:2]~;!@[#1:3] H - r3 atom - H

a13a: [*;r6:1]~;@[*;r5;x4:2]~;@[*;r5;x2:3] --> [*;r6:1]~;@[*;r5;x4,*;r5;X4:2]~;@[*;r5;x2:3]

a42: [*;r4:1]-;@[*;r4:2]-;@[*;r4:3] --> [*;r4:1]-;@[*;r4x2:2]-;@[*;r4:3]

New parameter: a14a: [#1:1]~!@[*;X3;r5:2]~;@[*;r5:3]

Version 1 expands a8 and a9 to distinguish nonring-ring-nonring from ring-ring-nonring angles, and to distinguish further between H and non-H:

New parameter: a8a: [*;r4:1]@[*;r4:2]-;!@[!#1:3]

New parameter: a9a: [*;r4:1]@[*;r4:2]-;!@[#1:3]

New parameter: a9b: [#1:1]-[*;r4:2]-;!@[#1:3]

Version 2 has expanded a44 and a45 to distinguish between H/nonH:

New parameter: a44a: [#1:1]~[*;r4:2]~[#1:3]

New parameter: a45a: [*;r4:1]@[*;r4:2]~;!@[#1:3]

3-membered rings

First iteration of experiments

Moving a3 ([*;r3:1]1~;@[*;r3:2]~;@[*;r3:3]1) to the end so that it also picks up the epoxy C-O-C angle (which was previously covered by a28). An alternative to moving it would be to restrict a28 to atoms that are not in 3- (or 4-) membered rings.
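A minimal sketch of why the position of the parameter matters, assuming the SMIRNOFF convention that the last matching parameter in the list is the one assigned. The predicates are hand-written stand-ins rather than real SMIRKS matching, the angle descriptor is hypothetical, and the a28 stand-in assumes only that a28 is a generic oxygen-centered angle.

```python
def assign(parameters, angle):
    """Return the ID of the last parameter whose predicate matches the angle."""
    winner = None
    for param_id, matches in parameters:
        if matches(angle):
            winner = param_id
    return winner

# Hypothetical descriptor for the epoxide C-O-C angle.
epoxide_coc = {"central_element": "O", "all_in_same_r3": True}

def r3_internal(angle):
    # Stand-in for [*;r3:1]1~;@[*;r3:2]~;@[*;r3:3]1
    return angle["all_in_same_r3"]

def oxygen_centered(angle):
    # Assumed stand-in for a28 (generic oxygen-centered angle)
    return angle["central_element"] == "O"

# Original order: the r3 internal angle comes before a28.
sage_order = [("a3", r3_internal), ("a28", oxygen_centered)]
# New order: the same pattern moved to the end and renamed a43.
small_ring_order = [("a28", oxygen_centered), ("a43", r3_internal)]

print(assign(sage_order, epoxide_coc))        # a28
print(assign(small_ring_order, epoxide_coc))  # a43
```

Restricting a28 instead would amount to making `oxygen_centered` return False for ring atoms, which gives the same assignment without changing the order.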

Chris Bayly also suggested the following generic parameters be added for 3-membered rings, to cover situations where you have r3-r3-not r3 and not r3 - r3 - not r3:

[*;r3:1]-[*;r3:2]-[*;!r3:3]
[*;!r3:1]-[*;r3:2]-[*;!r3:3]

Some of these are already covered with existing parameters, need to figure out how they interact with the suggested ones:

a4: [*;r3:1]~;@[*;r3:2]~;!@[*:3] r3 atom - r3 atom - not in (same) ring

a5: [*:1]~;!@[*;r3:2]~;!@[*:3] not in (same) ring - r3 atom - not in (same) ring

a6: [#1:1]-[*;r3:2]~;!@[*:3] H - r3 atom - not in (same) ring

For now, it appears that these capture the desired chemistry based on inspecting the captured molecules, so I’m leaving them as is for the first iteration.

New parameter: Renamed a3 to a43, due to moving it to the end, though it sounds like the numbering is arbitrary so could probably keep the same name. This parameter is after the 4- and 5- membered ring internal angles below.

Second iteration of experiments

For the next iteration of experiments, I will split these parameters based on H vs nonH, adding the following new parameters:

a4a: [*;r3:1]~;@[*;r3:2]~;!@[#1:3] r3 atom - r3 atom - H

a6a: [#1:1]-[*;r3:2]~;!@[#1:3] H - r3 atom - H

Plot of MSM k vs angle (deg) for a4 from Small Ring v1/2. Blue indicates the molecules that would be covered by the original parameter a4, while red indicates molecules that would be covered by the new parameter a4a. MSM data is for the training set.
Plot of Espaloma k vs angle (rad) for a4 from Small Ring v1/2. Blue indicates the molecules that would be covered by the original parameter a4, while red indicates molecules that would be covered by the new parameter a4a. Espaloma data is for the benchmark set.
Plot of Espaloma k vs angle (rad) for a6 from Small Ring v1/2. Blue indicates the molecules that would be covered by the original parameter a6, while red indicates molecules that would be covered by the new parameter a6a.
Plot of MSM k vs angle (deg) for a6 from Small Ring v1/2. Blue indicates the molecules that would be covered by the new parameter a6a, while red indicates molecules that would be covered by the original parameter a6.

 

4-membered rings

First round of experiments

Moving a7 ([#6r4:1]-;@[#6r4:2]-;@[#6r4:3]) to the end, and changing the SMIRKS pattern to [*;r4:1]-;@[*;r4:2]-;@[*;r4:3] so that it also catches heteroatoms, which were previously covered by a1, a18a, and a28. An alternative to moving it would be to keep the SMIRKS change but restrict a28 to atoms that are not in 4- (or 3-) membered rings and remove a18a (or make it specific to 5-membered rings).

Chris Bayly also suggested the following generic parameters be added for 4-membered rings, to cover situations where you have r4-r4-not r4 and not r4 - r4 - not r4:

[*;r4:1]-[*;r4:2]-[*;!r4:3]
[*;!r4:1]-[*;r4:2]-[*;!r4:3]

Our existing parameters are:

a8: [!#1:1]-[#6r4:2]-;!@[!#1:3]
a9: [!#1:1]-[#6r4:2]-;!@[#1:3]

These are both too specific (the central atom must be C) and too broad (the first atom could be in the ring or out of it).

Looking at the distributions for a8 and a9 below, it’s not clear whether specifying ring-ring-nonring and nonring-ring-nonring separately will make a difference.

Plot of MSM k vs angle (deg) for a8 in Sage 2.1.0. Blue indicates the molecules that are nonring-r4-nonring angles, while red indicates molecules that are r4-r4-nonring angles.
Plot of Espaloma k vs angle (rad) for a8 in Sage 2.1.0. Blue indicates the molecules that are nonring-r4-nonring angles, while red indicates molecules that are r4-r4-nonring angles.

 

Plot of MSM k vs angle (deg) for a9 in Sage 2.1.0. Blue indicates the molecules that are nonring-r4-nonring angles, while red indicates molecules that are r4-r4-nonring angles.
Plot of Espaloma k vs angle (rad) for a9 in Sage 2.1.0. Red indicates the molecules that are nonring-r4-nonring angles, while blue indicates molecules that are r4-r4-nonring angles.

 

For now, I am trying two approaches:

  1. Keep a8 and a9, but replace the central atom with a wildcard so it can be any 4-membered ring atom.

  2. Remove a8 and a9, and add two new parameters to the end: a44 ([*;!r4:1]~[*;r4:2]~[*;!r4:3]) and a45 ([*;r4:1]@[*;r4:2]~;!@[*:3]). I specified !r4 instead of !@ in a44 because specifying !@ led to it not picking up fused rings. I left a45 with !@ because specifying !r4 led to it missing two attached 4-membered rings (i.e. rings connected by a single non-ring bond).

New parameter: Renamed a7 to a42, due to moving it to the end. In (2) above, added a44 and a45 for the respective SMIRKS patterns listed.
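The !r4 versus !@ choice for a44 and a45 can be illustrated on two toy heavy-atom graphs. This is a sketch under stated assumptions, not real SMIRKS matching: `r` is approximated as the size of the smallest cycle through an atom, a ring bond is a bond that lies on any cycle, the "alternative" functions are my reading of the variants that were rejected, and all function names and atom numberings are hypothetical.

```python
from collections import deque

def build_adj(bonds):
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def detour_length(adj, a, b):
    """Shortest path (in bonds) from a to b that avoids the direct a-b bond."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if (node == a and nbr == b) or nbr in seen:
                continue
            if nbr == b:
                return dist + 1
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return None

def is_ring_bond(adj, a, b):
    return detour_length(adj, a, b) is not None

def smallest_ring(adj, atom):
    """Size of the smallest ring containing atom (0 if not in a ring)."""
    sizes = []
    for nbr in adj[atom]:
        d = detour_length(adj, atom, nbr)
        if d is not None:
            sizes.append(d + 1)
    return min(sizes) if sizes else 0

def a44_not_r4(adj, i, j, k):        # [*;!r4:1]~[*;r4:2]~[*;!r4:3]
    return (smallest_ring(adj, j) == 4 and smallest_ring(adj, i) != 4
            and smallest_ring(adj, k) != 4)

def a44_not_ring_bond(adj, i, j, k):  # assumed alternative using !@ on both bonds
    return (smallest_ring(adj, j) == 4 and not is_ring_bond(adj, i, j)
            and not is_ring_bond(adj, j, k))

def a45_not_ring_bond(adj, i, j, k):  # [*;r4:1]@[*;r4:2]~;!@[*:3]
    return (smallest_ring(adj, i) == 4 and smallest_ring(adj, j) == 4
            and is_ring_bond(adj, i, j) and not is_ring_bond(adj, j, k))

def a45_not_r4(adj, i, j, k):         # assumed alternative using !r4 on atom 3
    return (smallest_ring(adj, i) == 4 and smallest_ring(adj, j) == 4
            and is_ring_bond(adj, i, j) and smallest_ring(adj, k) != 4)

# Two cyclobutanes joined by the non-ring bond 0-4.
attached = build_adj([(0, 1), (1, 2), (2, 3), (3, 0),
                      (4, 5), (5, 6), (6, 7), (7, 4), (0, 4)])
# Fused 4/5 system (shared bond 0-1) with a methyl (atom 7) on fusion atom 0.
fused = build_adj([(0, 1), (1, 2), (2, 3), (3, 0),
                   (1, 4), (4, 5), (5, 6), (6, 0), (0, 7)])

print(a45_not_ring_bond(attached, 1, 0, 4), a45_not_r4(attached, 1, 0, 4))
print(a44_not_r4(fused, 6, 0, 7), a44_not_ring_bond(fused, 6, 0, 7))
```

In both printed pairs the first value is True and the second is False: the inter-ring angle in the attached rings is only caught when atom 3 is specified through a non-ring bond, and the r5 atom - fusion atom - methyl angle is only caught when the outer atoms are specified as !r4.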

Second round of experiments

First, I will change a42: [*;r4:1]-;@[*;r4:2]-;@[*;r4:3] --> [*;r4:1]-;@[*;r4x2:2]-;@[*;r4:3] to specify non-fused 4-membered rings.

Second, I will explore making both sets of exocyclic angle parameters more specific.

  1. Add new parameters a8a: [*;r4:1]@[*;r4:2]-;!@[!#1:3] and a9a: [*;r4:1]@[*;r4:2]-;!@[#1:3] to be specific to ring-ring-nonring angles, and a9b: [#1:1]-[*;r4:2]-;!@[#1:3] to catch H-ring-H angles that are currently treated by the generic a2. Based on the parameter distributions from MSM/Espaloma, I’m not sure the a8 split will do much.

Plot of MSM k vs angle (deg) for a8 from Small Ring v1. Blue indicates the molecules that would be covered by the new parameter a8a, while red indicates molecules that would be covered by the original parameter a8
Plot of Espaloma k vs angle (rad) for a8 from Small Ring v1. Blue indicates the molecules that would be covered by the new parameter a8a, while red indicates molecules that would be covered by the original parameter a8
Plot of MSM k vs angle (deg) for a9 from Small Ring v1. Blue indicates the molecules that would be covered by the new parameter a9a, while red indicates molecules that would be covered by the original parameter a9
Plot of Espaloma k vs angle (rad) for a9 from Small Ring v1. Blue indicates the molecules that would be covered by the original parameter a9, while red indicates molecules that would be covered by the new parameter a9a

 

Plot of MSM k vs angle (deg) for a2 from Small Ring v1. Blue indicates the molecules that would be covered by the original parameter a2, while red indicates molecules that would be covered by the new parameter a9b
Plot of Espaloma k vs angle (rad) for a2 from Small Ring v1. Blue indicates the molecules that would be covered by the original parameter a2, while red indicates molecules that would be covered by the new parameter a9b

 

  2. Add new parameters a44a: [#1:1]~[*;r4:2]~[#1:3] and a45a: [*;r4:1]@[*;r4:2]~;!@[#1:3] to split out H vs non-H parameters.

Plot of MSM k vs angle (deg) for a44 from Small Ring v2. Blue indicates the molecules that would be covered by the new parameter a44a, while red indicates molecules that would be covered by the original parameter a44
Plot of Espaloma k vs angle (rad) for a44 from Small Ring v2. Blue indicates the molecules that would be covered by the original parameter a44, while red indicates molecules that would be covered by the new parameter a44a

 

Plot of MSM k vs angle (deg) for a45 from Small Ring v2. Blue indicates the molecules that would be covered by the original parameter a45, while red indicates molecules that would be covered by the new parameter a45a
Plot of Espaloma k vs angle (rad) for a45 from Small Ring v2. Blue indicates the molecules that would be covered by the original parameter a45, while red indicates molecules that would be covered by the new parameter a45a

 

TODO: Might be worth putting a44 and a45 where a8 and a9 are in the order. They may be picking up different things because of their position, which would confound the comparison.

5-member rings

First iteration of experiments

Currently we don’t have any internal r5-r5-r5 ring angles, so I made one. I just made a generic one: [*;r5:1]@[*;r5:2]@[*;r5:3] but we may want to break it down further. Looking at the MSM parameter distribution, it seemed like the non-aromatic rings were clustered together, but the aromatic rings were all over the place in a way that made it not obvious how to split them.

 

Plot of MSM k vs angle (deg) for a41 in Small Ring v1/2. The distribution is broad but it’s hard to see how to break it down further.
Plot of Espaloma k vs angle (rad) for parameter a41 in Small Ring v1/2. The distribution is very wide, but it’s hard to see how to break it down further.

New parameter: Added a new parameter to the end called a41.

Additionally, five-membered rings with S typically have a 90-degree angle around the S, rather than ~105 degrees for other atoms. As a result I added a new parameter a41a with the pattern [*;r5:1]@[#16;r5:2]@[*;r5:3].

New parameter: Added a parameter a41a after a41.
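Since the MSM plots on this page use degrees and the Espaloma plots use radians, a quick conversion helps locate the two angle values quoted above on the Espaloma axis. This is only a convenience sketch; the labels and the `angles_rad` name are mine.

```python
import math

angles_deg = {"S-centered r5 angle": 90.0, "other r5 angles": 105.0}
angles_rad = {label: math.radians(deg) for label, deg in angles_deg.items()}

for label, rad in angles_rad.items():
    print(f"{label}: {angles_deg[label]:.0f} deg = {rad:.3f} rad")
```

This gives roughly 1.571 rad for the S-centered angle and 1.833 rad for the others.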

I’ve also looked into splitting a13, as it currently covers both fused and spiro rings. Splitting them into two separate categories seems clear via the MSM parameters, so I added a13a ([*;r6:1]~;@[*;r5;x4:2]~;@[*;r5;x2:3]), which separates out the spiro rings. However, the split is less clear using Espaloma, as there is a lot of variation even within fused or spiro rings that is not present in the MSM data.

Plot of MSM k vs angle (deg) for a13 in Sage. Blue indicates the molecules that are fused rings, red indicates molecules that are spiro rings covered by the a13a pattern above.
Plot of Espaloma k vs angle (rad) for a13 in Sage. Blue indicates the molecules that are fused rings, red indicates molecules that are spiro rings covered by the a13a pattern above.

 

New parameter: Added a new parameter after a13 called a13a.
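A small sketch of why x4 on the central atom separates spiro from fused junctions. It assumes x counts the bonds of an atom that lie on any cycle and r is the smallest cycle through the atom, computed here on toy heavy-atom graphs rather than by SMIRKS matching. The a13 stand-in is simply the a13a pattern without the x4 requirement, which is an assumption about the original parameter; function names and atom numberings are hypothetical.

```python
from collections import deque

def build_adj(bonds):
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def detour_length(adj, a, b):
    """Shortest path (in bonds) from a to b that avoids the direct a-b bond."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if (node == a and nbr == b) or nbr in seen:
                continue
            if nbr == b:
                return dist + 1
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return None

def smallest_ring(adj, atom):
    sizes = [d + 1 for d in (detour_length(adj, atom, n) for n in adj[atom])
             if d is not None]
    return min(sizes) if sizes else 0

def ring_bond_count(adj, atom):
    """Number of bonds on this atom that are part of a ring (the x primitive)."""
    return sum(1 for n in adj[atom] if detour_length(adj, atom, n) is not None)

def matches(adj, i, j, k, central_x=None):
    """r6 - r5 - (r5, x2) via ring bonds; optionally require x on the centre."""
    ok = (smallest_ring(adj, i) == 6 and smallest_ring(adj, j) == 5
          and smallest_ring(adj, k) == 5 and ring_bond_count(adj, k) == 2
          and detour_length(adj, i, j) is not None
          and detour_length(adj, j, k) is not None)
    if central_x is not None:
        ok = ok and ring_bond_count(adj, j) == central_x
    return ok

# Spiro 5/6 system: atom 0 is shared by both rings.
spiro = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                   (0, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 0)])
# Fused 5/6 system: bond 0-1 is shared by both rings.
fused = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                   (1, 5), (5, 6), (6, 7), (7, 8), (8, 0)])

print(ring_bond_count(spiro, 0), ring_bond_count(fused, 0))
print(matches(spiro, 5, 0, 1, central_x=4), matches(fused, 8, 0, 4, central_x=4))
print(matches(fused, 8, 0, 4))
```

The spiro atom has four ring bonds and the fusion atom three, so the x4 version picks up only the spiro angle, while the version without x4 still matches the fused one.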

Chris Bayly suggested looking into the ring-ring-nonring and nonring-ring-nonring parameters for 5-membered rings as well. I took a look and they didn’t look too different from the distributions they were a part of.

 

MSM k vs angle (deg) for a1 in Sage. Red indicates matches for [*;r5:1]@[*;r5:2]~;!@[*:3]. There is a clear distinction between r5-r5-H (bottom cluster) and r5-r5-nonH (more diffuse top cluster). But these belong to larger X-X-H and X-X-X clusters and I don’t see a reason to split out the r5 angles.
MSM k vs angle (deg) for a1 in Sage. Red indicates matches for [*:1]~!@[*;r5:2]~;!@[*:3]. There is a clear distinction between nonH-r5-H (bottom cluster) and nonH-r5-nonH (more diffuse top cluster). But these belong to larger X-X-H and X-X-X clusters and I don’t see a reason to split out the r5 angles.
MSM k vs angle (deg) for a2 in Sage. Red indicates matches for [*:1]~!@[*;r5:2]~;!@[*:3], which here is H-r5-H. It isn’t particularly distinct from the rest of the angles.
MSM k vs angle (deg) for a19 in Sage. Red indicates matches for [*;r5:1]@[*;r5:2]~;!@[*:3], which aren’t particularly distinct from the rest of the angles.

These external 5-member ring angles appear in almost every angle parameter distribution and usually aren’t very distinct. I think it would require a lot of care to separate them, as there is a lot of diversity currently being captured by the different parameters assigned to the angles, and I don’t want to lose that by lumping them together. Left for later.

Second iteration of experiments

Found a SMIRKS that captures all the a13a molecules: [*;r6:1]~;@[*;r5;x4,*;r5;X4:2]~;@[*;r5;x2:3]

Plot of MSM k vs angle (deg) for a13 in Sage. Blue indicates the molecules that are covered by the original parameter, red indicates molecules that are covered by the a13a pattern above.
Plot of Espaloma k vs angle (rad) for a13 in Sage. Blue indicates the molecules that are covered by the original parameter, red indicates molecules that are covered by the a13a pattern above.

 

a14 [*:1]~!@[*;X3;r5:2]~;@[*;r5:3] treats r5-r5-nonring angles. I split it into H vs nonH by introducing a14a: [#1:1]~!@[*;X3;r5:2]~;@[*;r5:3]

Plot of MSM k vs angle (deg) for a14 for Small Ring v1/2. Blue indicates the molecules that will be covered by the original parameter a14, red indicates molecules that would be covered by the new parameter a14a.
Plot of Espaloma k vs angle (rad) for a14 for Small Ring v1/2. Blue indicates the molecules that will be covered by the new parameter a14a, red indicates molecules that would be covered by the original parameter a14.

 

Issue with fused rings

One issue I have noticed with separating the small ring parameters is that there is no way to specify in a SMARTS pattern that a given atom is in a ring of a given size. The primitive r indicates the size of the smallest ring the atom is a part of, but if it is part of a fused or spiro ring, this may lead to issues. The primitive R denotes that an atom is part of a ring, but can only be modified by the number of ring bonds, not the size of the ring.

The two atoms that are part of the shared fuse bond are labeled as r4, since that is the smallest ring they are part of, so the highlighted angle is not caught by the pattern [r5:1]@[r5:2]@[r5:3]
Chapin suggested getting around this by using the pattern [R:1]@1@[R:2]@[R:3]@[R]@[R]@1 for 5-membered ring internal angles. However, that pattern picks up the highlighted angle above, as the “outer” ring in the fused ring system is 5-membered.

After a lot of experimenting I haven’t been able to find a solution that involves a single elegant SMARTS pattern. To get these right, we may have to add a number of very specific parameters, and increase coverage for fused rings.
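The r-primitive behaviour described above can be reproduced on a toy graph. This sketch assumes r is the size of the smallest cycle through an atom and uses a fused 4/5 heavy-atom skeleton with a hypothetical numbering (atoms 0 and 1 form the shared bond). It is a plain graph calculation, not SMARTS matching.

```python
from collections import deque

def build_adj(bonds):
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def detour_length(adj, a, b):
    """Shortest path (in bonds) from a to b that avoids the direct a-b bond."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if (node == a and nbr == b) or nbr in seen:
                continue
            if nbr == b:
                return dist + 1
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return None

def smallest_ring(adj, atom):
    """Size of the smallest ring containing atom (0 if not in a ring)."""
    sizes = []
    for nbr in adj[atom]:
        d = detour_length(adj, atom, nbr)
        if d is not None:
            sizes.append(d + 1)
    return min(sizes) if sizes else 0

# 4-membered ring: 0-1-2-3; 5-membered ring: 0-1-4-5-6; shared bond: 0-1.
fused = build_adj([(0, 1), (1, 2), (2, 3), (3, 0),
                   (1, 4), (4, 5), (5, 6), (6, 0)])
r = {atom: smallest_ring(fused, atom) for atom in sorted(fused)}

# Internal angles of the 5-membered ring, walking around it in order.
five_ring = [0, 1, 4, 5, 6]
angles = [(five_ring[i - 1], five_ring[i], five_ring[(i + 1) % 5])
          for i in range(5)]
all_r5 = [ang for ang in angles if all(r[a] == 5 for a in ang)]

print(r)
print(all_r5)
```

Both fusion atoms come out as r4, so only one of the five internal angles of the 5-membered ring, (4, 5, 6), has all three atoms labeled r5 and would be caught by [r5:1]@[r5:2]@[r5:3].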

Results

First iteration of experiments

Benchmarks for both versions of the Small ring FF are shown below. For DDE, Small ring v1 improves performance over Sage, and this improvement persists regardless of whether or not small rings are present in the benchmark set. This suggests that treating the small ring parameters appropriately leads to improvement in other parameters, which were perhaps pulled in a non-optimal direction to compensate for the incorrectly treated small rings. RMSD and TFD performance slightly improves over Sage, or stays the same.

I believe the worse performance of Small Ring v2 is due to grouping together H and non-H angles in a44 and a45, which are treated separately in Small Ring v1.

DDE Benchmark on the Industry dataset for Small Ring v1 and v2, compared to Sage 2.1.0. Left panel shows performance on the whole dataset, while the top right panel shows performance on molecules without small rings and the bottom right panel shows performance on molecules with small rings. Small ring v1 substantially improves performance over Sage, while Small ring v2 matches Sage, and this performance persists regardless of the presence of small rings in the dataset.
RMSD Benchmark on the Industry dataset for Small Ring v1 and v2, compared to Sage 2.1.0. Left panel shows performance on the whole dataset, while the top right panel shows performance on molecules without small rings and the bottom right panel shows performance on molecules with small rings. Performance slightly improves over Sage or stays the same.
TFD Benchmark on the Industry dataset for Small Ring v1 and v2, compared to Sage 2.1.0. Left panel shows performance on the whole dataset, while the top right panel shows performance on molecules without small rings and the bottom right panel shows performance on molecules with small rings. Performance slightly improves over Sage or stays the same.

 

Other parameters I’ve looked at

All parameters:

High-priority parameters:
