Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

...

...

...

...

...

...

...

...

1. Tracing the origin of the amide issue

...

(1) Addition of new amide-specific torsion parameters

Torsion profile of N-methylacetamide amide bond generated using v1.2.0 fails failed to reproduce the QM torsion profile(left plots), forming a small hump at the minimum. Re180 degree, which doesn’t appear in the QM profile. Energy decoupling(bottom left plot) showed that torsion term contributes the most to the formation of the hump, which indicated re-fitting the related torsion parameters might solve the issue.

Before this amide issue was raised, I found that there’s a missing torsion term for dialkyl amide in the current parameter set and to fix the dialkyl amide issue(for more details, please refer to here), we’ve decided to add new amide-specific torsion parameters. The preliminary study showed that the addition of the new torsion terms improved the performance of reproducing QM optimized geometries of dialkyl amides.

To see if the new amide-specific torsion terms would also solve the amide issue, re-fitting to the selected torsion targets which rotate an amide bond couldn’t remove the hump(targets scanning amide bond dihedral) was carried out with the new amide-specific torsion terms, then the test calculation was done to see if the re-fitted parameter set improved the torsion profile of N-methylacetamide amide bond(removal of the hump + reproduction of QM basin depth at the minimum). The test calculation(right plots) showed that while the re-fitting improved the torsion energy term(bottom right plot), it deformed other terms to force the formation the hump at 180 degree.

(2) Investigation of fitting targets

The next step of the tracing was investigating targets.

...

From a closer look at the targets used to train amide bond parameters(t69s, t70s), in the re-fitting, I found that some targets behave not as expected at around their planar geometries (having a small peak, instead of parabolic shape near the minimum. right figure), which leads the parameter set fitted to the targets to fail to reproduce the simple torsion profiles with make a small hump even when scanning amide bonds in simple molecules whose torsion profiles have parabolic shape near the minimum (like N-methyacetamide case).

So one One easy fix of the problem is using simple targets which behave exactly as we expect , whose torsion profiles have parabolic shape near the minimum (planar amide bond geometry at the minimum) and here’s what I did:

1. filtering non-planar structure at minimum geometry

since the 2nd generation torsion training set is too small to filter, i pulled Roche torsion set (1st gen) and filtered with various scheme.

...

scheme 1. check improper dihedral angle of input.mol2 (which is generated using openeye toolkits) take targets whose C-center and N-center improper dihedral angles are both between -5.0, 5.0 degree;

...

. + Expecting that this experiment to be one proof of the need for using simple-as-possible torsiondrive targets in general torsion parameter fitting.

2. Filtering out non-planar structures at the minimum geometry

Several schemes to filter non-planar molecules:

  • Scheme 2

    • Check a planarity of the geometry at the minimum point of a QM torsion profile;

    • Planar geometry: C-center improper dihedral angle, N-center improper dihedral angle between -5.0, 5.0 degree;

    scheme 3. check QM data to see if the profile has its
    • number of targets selected: 51 - 1(N-methylacetamide) = 50

  • Scheme 3 (rough)

    • Check if the QM profile has local minima at 0

    or
    • and 180 degree;

    scheme 4. targets passed
    • Number of targets selected: 38

  • Scheme 4

    • Targets passing either scheme 2 or scheme 3

fb-fit2 vs fb-fit2-2

...

SMIRKS
 

...

 initial guess

...

fb-fit2

...

fb-fit4

...

t69a 

...

[*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]

...

2.5 (1+cos(2x-180)) 

...

2.5 (1+cos(2x-180)) 

...

2.516923679597e+00(1+cos(2x-180)) 

...

t69b

...

[*:1]-[#7X3:2]-!@[#6X3$(*=[#8,#16,#7]):3]~[*:4]

...

2.5 (1+cos(2x-180))

...

1.652160092455e+00 (1+cos(2x-180))

...

1.799860521668e+00(1+cos(2x-180))

...

t70

...

[#1:1]-[#7X3:2]-[#6X3:3]=[#8,#16,#7:4]

...

3.459249459574e+00 (1+cos(2x-180))  + 1.356955617521e+00 (1+cosx)

...

3.459249459574e+00 (1+cos(2x-180))  + 1.356955617521e+00 (1+cosx)

...

3.458787578663e+00(1+cos(2x-180))  + 1.357031068121e+00(1+cosx)

...

t70a

...

[#1:1]-[#7X3:2]-!@[#6X3:3]=[#8,#16,#7:4]

...

2.5 (1+cos(2x-180))  + 2.0 (1+cosx)

...

-2.081991778915e-01 (1+cos(2x-180))  + 1.400460484703e+00 (1+cosx)

...

-1.615997113282e+00(1+cos(2x-180))  + 1.289607014774e+00(1+cosx)

...

t70b

...

[*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

...

2.5 (1+cos(2x-180)) 

...

4.091207931356e+00 (1+cos(2x-180)) 

...

4.079468007826e+00(1+cos(2x-180))

...

t70c

...

[#1:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

...

2.5 (1+cos(2x-180))  + 2.0 (1+cosx)

...

1.301446828323e+00 (1+cos(2x-180))  + 9.476830965983e-01 (1+cosx)

...

1.020531593037e+00(1+cos(2x-180))  + 7.399229872960e-01(1+cosx)

...

t70d

...

[*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#7X3]

...

2.5 (1+cos(2x-180)) 

...

1.271374777009e+00 (1+cos(2x-180)) 

...

1.142115082029e+00 (1+cos(2x-180))

...

Although the k1 value for t70a seems unphysical, the result below shows the plots got improved after the optimization. checked improvement for all of the four fitting targets.

...

...

SMIRKS
 

...

 initial guess

...

fb-fit2-2

...

fb-fit4-2

...

t69a 

    • Number of targets selected: 63 -1(N-methylacetamide) = 62

(Since the 2nd generation torsion training set is small to filter, i pulled Roche torsion set (1st generation) and filtered with various scheme. )

...

For example, the target #540 is filtered out in scheme 3 (hump at 0 degree), but not in scheme 2, since the geometry at the minimum is planar.

3. Comparison of different filtering schemes

  • Test set 1: (1) the simplest molecule, N-methyacetamide, (2) non-planar molecules filtered out in both scheme 2 and scheme 3;

  • Test set 2: all amide torsions in the 2nd generation training dataset;

  • Overall quality of fitting and test calculation:

v1.2.0

fb-fit2 (scheme 2)

fb-fit3 (scheme3)

fb-fit4 (scheme 4)

Unfiltered

test cal 1

144.59

91.81

82.39

88.84 

-

test cal 2

25.37

15.62

14.75

14.15

23.21

test cal 2-1 simple

7.10

6.21

7.49

8.29

11.04

test cal 2-2 complex

18.26

9.40

7.26

6.87

12.17

final X2 (fitting)

-

1.31e+02 → 2.22e+01 

1.30e+02 → 1.73e+01

2.09e+02 → 3.01e+01

5.28e+02 → 6.90e+01

(1) N-methylacetamide

...

(2) Test set 1 (N-methyacetamide, non-planar molecules in the 1st gen. Roche torsion set)

...

(3) Test set 2 (All amide torsions in the 2nd gen. training set)

...

(4) Check performance on non-planar molecules

...

(5) Direct comparison of final parameters

F(θ) = k1*(1+cos(2θ-180)) + k2*(1+cosθ)

459249459574e+00 (1+cos(2x-180))  + 1.356955617521e+00 (1+cosx)

ID, SMIRKS

 Initial guess

fb-fit2

fb-fit3

fb-fit4

t69a [*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]

k1 = 2.5 (

k1= 1+cos(2x-180)) 

1.679285484776e+00 (1+cos(2x-180)) 

1.778613695219e+00(1+cos(2x-180)) 

t70

.6793e+0

k1 = 1.3647e+0

k1 = 1.7786e+0

t70 [#1:1]-[#7X3:2]-[#6X3:3]=[#8,#16,#7:4]

k1 = 3.

3.566161051804e-01 (1+cos(2x-180))  + 1.354248559088e+00 (1+cosx)

-1.357682774067e+00 (1+cos(2x-180))  + 1.266337423993e+00 (1+cosx)

t70b

4592e+0

k2 = 1.3570e+0

(from 1.2.0)

k1 = 3.5662e-1

k2 =1.3542e+0

k1 = 2.4868e-1

K2 = 9.4280e-1

k1 = -1.3577e+0

k2 = 1.2663e+0

t70b [*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

k1 = 2.5 (1+cos(2x-180)) 

4.169159533591e+00 (1+cos(2x-180)) 

4.107144298045e+00(1+cos(2x-180))

t70c

k1 = 4.1692e+0

k1 = 4.4151e+0

k1 = 4.1071e+0

t70c [#1:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]

k1 = 2.5 (1+cos(2x-180))  +

k2 = 2.0 (1+cosx)

k1 = 1.240609067243e+00 (1+cos(2x-180))  + 9.065552775144e-01 (1+cosx)

1.115929833406e+00(1+cos(2x-180))  + 7.646525300953e-01(1+cosx)

t70d

2406e+0

k2 = 9.0656e-1

k1 = 1.5283e+0

k2 = 7.3406e-1

k1 = 1.1159e+0

k2 = 7.6465e-1

t70d [*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#7X3]

k1 = 2.5 (

k1= 1+cos(2x-180)) 

1.303139175986e+00 (1+cos(2x-180)) 

1.167960735598e+001+cos(2x-180))

...

  • For the test set (amide-rotating Roche torsion targets which are not selected either from scheme 2 or from scheme 3), can’t see the effect of splitting endo-cyclic rotation and exo-cyclic rotation

  • Parameter sets fitted to scheme 2 and scheme 4 performs similarly, while the parameter set fitted to the scheme 4 slightly more prefer to form hump at the near-planar structure. (could be good or bad)

...

v1.2.0

...

fb-fit2

...

fb-fit2-2

...

fb-fit4

...

fb-fit4-2

...

test cal.

...

144.59

...

90.79 

...

91.81

...

88.54

...

88.84 

...

final obj. fn. (fitting)

...

-

...

1.38e+02 → 2.21e+01

...

1.31e+02 → 2.22e+01 

...

2.16e+02 → 2.95e+01

...

2.09e+02 → 3.01e+01

...

.3031e+0

k1 = 1.6149e+0

k1 = 1.1680e+0

* About the counter-intuitive negative k1 value of t70, [#1:1]-[#7X3:2]-[#6X3:3]=[#8,#16,#7:4]

...

Although the k1 value for t70a seems unphysical, the result below shows the plots got improved after the optimization. checked improvement for all of the four fitting targets.

...

4. Conclusions

+ Selecting simple geometries (no strong electrostatic interaction) may need to be considered in the next torsion training dataset selection procedure.