2021-05-07 WBO/Impropers meeting notes

Date

May 10, 2021

Participants

  • @Pavan Behara

  • @David Mobley

  • @Jessica Maat (Deactivated)

  • @Christopher Bayly

  • @Simon Boothroyd

  • @Trevor Gokey

Goals

  •  

Discussion topics

Time

Item

Presenter

Notes

Time

Item

Presenter

Notes

 

 

@Jessica Maat (Deactivated)

  • JM: I looked at all of the molecules in the aniline set and calculated the average of the impropers, most of them are around 30 and only one of them is planar

    does this mean we need to select more molecules or the optimization is not giving a full picture?

  • DLM: Here’s Chris, we are looking through JM’s aniline set structures and the improper angle for the optimized geometry for the nitrogen.

  • CB: The central one or the top one?

  • JM: Sorry, I didn’t highlight the nitrogens

  • CB: So, that’s only in cases of substituents with two nitrogens, we can still look at the others, so the set is for a wide spectrum of electron withdrawing substituents with the aniline fixed on one end?

  • JM: Yeah, that’s right.

  • CB: So, the electronic structure calculations are at a decent level of theory that show flattening/pyramidilization, which is what?

  • DLM: This is with our current level of theory we are using for FF fits.

  • PB: Yeah, B3LYP-D3BJ/DZVP

  • CB: Okay, so this is a level of theory which should show the correct pyramidalization within a stone’s throw, and these are all minimizations right?

  • JM: Yeah.

  • CB: So, the terminal amine is allowed to get its most favoured level of pyramidalization, and this is where we are finding that except one case of flat, rest all are pyramidal

  • DLM: So, this structure has 3 nitrogens and we are not sure for which the improper was calculated.

  • JM: So, this is not like grid optimization where I had the indices and I used openeye to check which nitrogen is invertible in the molecule and I calculated for the first one I encountered, that was a mistake, will correct it.

  • DLM: That’s okay.

  • CB: So, firstly this seems a bit counterintuitive to me, the pyramidalization of nitrogen is sensitive or should be sensitive to how conjugated it is and that should be affected by the substituent in the para position.
    N(+)#N is the most EWG and I am expecting a continuous scanning of the pyramidalization, most EDG are supposed to be highly pyramidal and most EWG should be planar.
    OSO-CF3 is EWG and it has 23.8 deg, aldehyde etc are also EWG.

  • DLM: I think there are two things here,
    1. most EWGs have values of around 20-25, and most EDGs have 30+
    2. another issue is having a broad flat minima spanning -20 or below to 20+ degrees and depends on where the minima lies.
    That’s probably how some of these are looking like.

  • CB: I think you’re right, -25 to +25 seems to be the range. All EDGs have highest values and EWGs have lower 20’s, I am wondering why there aren’t many other lower 20’s or more flat ones from the list of EWGs.
    On this slide at the bottom a pyridine with -NH-CH3 at the bottom is supposed to be flat.
    Having the difference between top row center, which is flat, top row right hand side which doesn't want to go flat, and then the bottom row left hand side (flat bottom) that’s the spectrum I want our WBO interpolated improper to cover.
    JM, do you have the corresponding wbo values for these structures you’re showing? Can we find anything which has a well behaved relationship between the improper angle and wbo, and I was proposing to use the minima with the para substituted EWGs, whether planar, pyramidal and in between.

  • DLM: I think where we are right now is we are about to run a set of 2D scans, and then we said we should enumerate a set of substituents and pick up a set of molecules we bother to run the 2D scans.

  • JM: Yeah, exactly.

  • CB: Okay, I want to throw a strawman proposal, so far what I have seen is the N(+)#N is one extreme, planar, and at the other end of the spectrum we have phenoxide, -O-, which is pyramidal, in between you got some other, and the pyridine with the -NH-CH3, these are all cheap computations, I didn’t see anything in between that’s like slightly planar, we have the long list of substituents and I think we can run them quickly so that we may sample in the region of 0 to 20 degrees.

  • DLM: We’re talking about the Nitrogen and ring wbo, right?

  • CB: Yeah, aniline nitrogen and the ring.

  • JM: I didn’t calculate that, just to clarify do you want the AM1 wbo or from QM?

  • CB: I want both actually, I have a mental mapping of AM1 to QM wbos.

  • DLM: Visualization I am wishing I could have is a 2D plot, x-axis has AM1 bond order and y-axis has the average of impropers you are showing now, and each point is color-coded on how much electron withdrawing it is.

  • CB: Sure, Taft-Hammett parameter gives a value for the sigma withdrawing and pi withdrawing, we are particularly interested in pi-withdrawing if you have got a choice.

  • DLM: I would say we can have two plots with AM1 wbo and the second one with QM wbo, and we can pick molecules that follow linear correlation and outliers for our 2D scans, also planarity. If there is a good correlation then we pick less molecules otherwise more.

  • CB: JM, if the N(+)#N has a planar structure and yet nitro another strong EWG is showing us 24 degrees, so we should poke more substituents to sample around 0 to 20 degrees.

  • DLM: I think if we have an idea of the computation time for the 2D scans and if it is cheap then we should go ahead and do that instead of theorizing without numbers.

  • SB: Yeah, I completely agree, we have a subset of the dataset with 12 2D scans in the queue and we can add more to that.

  • DLM: Sounds good.

  • CB: So, we were hoping to get 0 to 37 degrees and we go more in the 20 to 37 range. We do have something to look at here,
    - choosing substituents with planarity we have the values here
    - another aspect is to look at the range of WBOs

  • DLM: Okay, what this means is in the 12 molecules you picked for 2D scan make sure you have the flat one, something between 20-25, 25-30, etc., and run them, and while they’re running do the analysis of wbo versus pyramidalization.

  • JM/SB: Okay, sounds good.





@Pavan Behara

  • Summary: https://openforcefield.atlassian.net/wiki/spaces/FF/pages/1691254806

  • PB: I looked at the molecules that went into training the a30, a31 parameters in 1.2.0/1.3.0, and a30 spans a range of 97 to 127, while a31 spans from 80 to 115. Here is a molecule which shows a wide range of values for a30.

  • CB: So, one of those is an angle of 97 deg?

  • PB: Yeah, between (15, 24, 22), NSO angle is 97.51

  • CB: It seems fishy though.

  • DLM: You do have some weird sterics going on here, the two oxygens coming out of Sulfur are pretty congested.

  • CB: I am still surprised to see that.

  • DLM: Do you have it in 3D viewer?

  • PB: Just a second, I can grab that.

  • DLM: Yeah, I agree if we see the geometry then we can decide.

  • PB: Here you go.

  • CB/DLM: Yeah, there is nothing fishy here. That’s a really interesting molecule.

  • CB: And, this is minimized with a good level of electronic structure theory, that’s interesting.
    Also, I appreciate the fix SB put in with 109.5 and it works as a first aid.
    Now, we have to figure out whether the chemistry is contributing to the parameter values or the force constant needs to be low so as to allow the bond to be more flexible such that a wide range of angle values are picked up.
    Let’s look at the angles of others, what I can see is that the O=S=O is always high at 120+, and we may need a child parameter a30a for these sulfone oxygens and Trevor’s method could help us figure this out without getting deviated by the intramolecular stuff.

  • TG: Yeah, I can do that easily, to split a parameter when the angles are different by 10 degrees.

  • CB: The second molecule, looks like it has a formal charge of -2, that looks weird, might be a double bond between the nitrogen and sulfur with a broken valence.

  • DLM: I think we have some of those with weird charges due to tautomer enumeration.

  • CB: So, even if we have a O=S=O parameter it wouldn’t match this, so we should have to include this too. And, when we open up a tetrahedron wide then it compresses the other angles.
    The data you showed PB has a minimum value around 99 (97 degrees) and still the parameter takes up the minimum value, instead of an intermediate value, this needs to be checked.

  • PB: So, a31 has these ring molecules that have a wide range of angles.

  • CB: Okay, even if there are cyclic molecules the equilibrium angle should be 109.5 degrees and we shouldn’t bother about the ring molecules, and forcebalance too shouldn’t pick this value.
    When you look at the parameter a31 it is quite general and caters to both cyclic and acyclic molecules, and the value should be much higher than the 90 degrees we have now. So, what are the values of the acyclic a31 s?

  • PB: Around 100-104 degrees.

  • CB: The equilibrium angle of a 4 membered ring is 90 degrees and it shouldn’t dictate what value this parameter should take.
    So, even if you put 109 as parameter value and a strong force constant when you put it in a ring it would still optimize to 90 degrees no matter what. I wonder why 1.3.0 picked up this value with all the cyclic and acyclic molecules in the training set, and we should look into that direction.

  • DLM: So, at some level my sense of what’s going on here, Sulfur occurs in a wide variety of environments and lot of the other atoms we deal with which ends up weird angles and

  • SB: I agree with Chris, this is the question I want answered as well, what I would like to see PB is when you optimize this molecule with 1.3.0 what does the angle actually come to?

does our too small angle parameter really drive the angle too small? What I would like to see is how different are these angles wrt QM? IIRC we saw the geometries to match closely and the error occurs while doing MD.

We focused more on the equilibrium angle but we should look at the force constant too. And one of my observations where small angles happen is not just sulfones but primary sulfonamides, it seems the hydrogen oxygen pinching, I would love some more data. One small experiment is take a bunch of very small rings 4-, 5-, 6-membered alkanes plus a bunch of straight chains and optimize this with forcebalance and see why this is happening, is it the geometries, or vibrational frequencies? I recently found the rmsd to be worse with vib freq and without vib freq it was good.

  • PB: Yeah, when we did geometry optimizations with 1.3.0 and the changed parameters on the JACS set they are almost aligning with each other. (link)

  • CB: My recollection on what Hyesu was saying is that with Parsley 1.2 and 1.3 is that there were some perturbed molecules.
    SB, I agree with both your suggestions, lets compare FF minima with QM minima, and secondly about vib freq.

  • SB: Yeah, exactly. Here I can show the benchmark plots where without using vibfreq targets we have a very good RMSD value.

    There is definitely something going on here and I am very curious on why this is happening.

  • CB: Yeah, that’s a very motivating example.

  • PB: These are the ones with the LJ’s refit?

  • SB: Yeah, from 1.3.0 and refitting LJ params.

  • CB: So, is this with the vib freq itself or with the internal hessian stuff.

  • SB: This is still with the vib freq.

  • CB: Ahaan, I have been questioning vib freqs for a long time and this is a smoking gun.

  • SB: So, if we don’t include vib freqs do we get our force constants correct from torsiondrives, optgeo only.

  • CB: So, my take on this is that when you’re working on a prototype method you use two or three structures then the equlibrium geometries cannot inform the force constant, but when you are fitting to a span of geometries then that’s enough to explain the differences in equilibrium geometries.

  • SB: That’s perfect! I was also hoping that. Another thing we talked about in the past is to include the geometries in the optimization trajectory so that along with the minimum we are also overwhelming with data along the potential wells.

  • CB: I was proposing that early on too for the same reasons you are saying, as long as we do it the way Hyesu was suggesting for vib freq, I would not look at the gradient directly, I mean not the x, y, z of the atoms of the gradient but rather at the internal coordinate gradients.

  • SB: Yeah, sounds good.

  • CB: If you go with the x,y,z then you will have the same issue again with frequencies.

  • SB: Yeah, I am also thinking the same way.

  • CB: It looks like even with out the gradient we would get better results. The scary thing to me is the how qualitatively wrong sulfonamides are and what could make it qualitatively wrong and I was thinking about rings, but your slides imply vib freq might be the culprit.

  • SB: Yeah, I agree. But, I think we need to explore the rings direction too to show that the parameters are not pushed in that direction. PB, can you look into that if you have time?

  • PB: Yeah, sure. So, in general for alkanes, right? or in the context of sulfonamides?

  • SB: Yeah, in general, I think you can pick simple rings from QCA and do test fits, with subsets, and also including optgeo, torsions, and vib freq targets.

  • CB: Trevor has a nice dataset of alkanes, would that be a right dataset for PB to use?

  • TG: PhalkethOH has them, from 3- to 9-membered rings.

Action items

@Jessica Maat (Deactivated) will look into the aniline dataset analysis, plots with wbo, avg. improper, and electron withdrawing-ness (color coding with Taft Hammett pi- or sigma- values)
@Jessica Maat (Deactivated) will submit 2D scans picking substitutents from current list covering a range of planarity and wbo values
@Pavan Behara will do some test fits with cyclic and acyclic compounds
@Pavan Behara will compile the QM-MM RMSD differences with 1.3.0
@Pavan Behara will do a refit of 1.3.0 without vib freq, starting from SmirnoffFrosst param values for a30, a31
@Pavan Behara another experimental fit with a child parameter for *=S=* matches

Decisions