Sulfonamide issue in Parsley
HJ: During optimization, the NH2 group is pulled in toward the SO2 group, the sulfur loses its tetrahedral geometry, and the resulting distance between the oxygens and the NH2 group is not chemically intuitive. This did not happen with SMIRNOFF99Frosst; it was introduced in Parsley by bad parameters.
The parameters involved are a30 and a31, whose values in 1.2 and 1.3 moved far from their SMIRNOFF99Frosst values.
This may point to a need for higher-order terms in the angle potential.
A while back we discussed a higher-order term when we saw the sum of angles around sp2 carbons going over 360.
CB also suggested a basic check on molecules containing sulfonamides or hypervalent nitrogen, similar to the HMR canary test.
BS: Is it for the gas phase too?
HJ: Yeah.
CB: Thank you HJ for digging into this. First-aid to stop the bleeding, even if it is not perfect, is to go back to 1.3.0 and correct the equilibrium angle for the tetrahedron to 109.5 from 99.1.
The first thing is to find out how and where this is happening, and how we can prevent it from happening. Something has gone terribly wrong for those equilibrium angles to be 99.1 and 89.8 in Parsley 1.3.
We saw a similar thing with phosphonic acid before and you fixed it.
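Illustrative note on the first-aid fix: assuming the usual harmonic angle term E = (k/2)(theta - theta0)^2, the sketch below shows how much strain an ideal 109.5 degree angle carries when the equilibrium angle is 99.1 or 89.8 instead. The force constant is a made-up placeholder, not a Parsley value, and the function name is hypothetical.

```python
import math


def harmonic_angle_energy(theta_deg, theta0_deg, k):
    """Harmonic angle energy 0.5 * k * (theta - theta0)**2 with angles in degrees.

    k is per radian squared, so the difference is converted to radians first.
    """
    delta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k * delta ** 2


# Placeholder force constant (kcal/mol/rad^2); NOT taken from any force field file.
K_PLACEHOLDER = 100.0
TETRAHEDRAL = 109.5

for theta0 in (109.5, 99.1, 89.8):
    strain = harmonic_angle_energy(TETRAHEDRAL, theta0, K_PLACEHOLDER)
    print(f"theta0 = {theta0:5.1f} deg -> strain at {TETRAHEDRAL} deg: {strain:.2f} kcal/mol")
```

Because the penalty grows with the square of the deviation, and several angles around the same sulfur share the parameter, a tight force constant combined with a low equilibrium angle can plausibly outweigh the other terms that favor a tetrahedral center.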
SB: Surely something has gone wrong in training. The force constants are tight.
BS: You have two hydrogens on the nitrogen; what are the charges on the oxygens?
CB: I checked; the charges on the oxygens are normal sulfonamide values, IIRC -0.75. Both oxygens being pulled out symmetrically seems wrong. I would immediately change the equilibrium value as first aid.
BS: That certainly is an easy thing to try.
CB: The real question is how this could happen and how we prevent it from happening again. I have been using this since it is our best version. And guess how many of those JACS sets have sulfonamides.
DLM: Quite a few. I think we should go beyond the current metrics such as RMSD and look at specific benchmark systems like this one that we know can be problematic and can throw us off the rails.
CB: Yeah, I completely agree. In the phosphoric acid case, it was moving away from a tetrahedral structure, and that is how I found it.
DLM: SB, do you agree?
SB: Yeah, completely. We should do this.
CB: What happens to sulfonamide structures at the minimum?
HJ: I did a simple minimization with 1.2 and 1.3. With 1.2, even with the weird equilibrium values, the final minimized geometry is not that weird. But with 1.3 that’s not the case.
CB: Yeah, then we need to look at SB’s suggestion of what structures contributed to this pathology.
So, what other pathologies might we be missing, since a30 is a general parameter with wildcards? Especially for hypervalent structures.
Put one of these inside a ring and I would bet it would behave well.
DLM: I think the in-ring training molecules are forcing it to change.
CB: We need one valence angle that gives a good value outside of rings and squeezes when it is matched to a ring.
DLM: I still think it’s a case of in-ring training molecules.
CB: A molecule that has an in-ring sulfone is still valuable for us.
DLM: We don’t have the infrastructure to filter training data. Another practical option is to split out the in-ring parameters separately.
CB: I completely agree. But does that mean we take those molecules out completely?
SB: I can see why it might be attractive to do a split but careful automated design of training datasets is the way to go.
DLM: Yeah, splitting things is not the better way.
SB: I was talking to Josh Horton this morning about a more elaborate way of training than dumping a dataset into the black box of ForceBalance, with some insights beyond just the loss function, like individual gradients, etc.
CB: I need a patch for 1.3 immediately. Here are some free energy results where I see the bad performance.
SB: I agree with the first aid, but be cautious that it doesn’t break anything else. We should also have a small set of canary tests, and pick a few sulfonamide-containing ligands from the JACS set and run free energy calculations. After those, I would greenlight a patch.
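Illustrative note on the canary idea: one minimal form of such a test is to take a minimized geometry, measure every neighbor-center-neighbor angle around the four-coordinate atom, and fail if any of them drifts too far from tetrahedral. The sketch below uses idealized coordinates rather than a real sulfonamide; the function names and the 15 degree tolerance are assumptions for illustration only.

```python
import itertools
import math


def angle_deg(a, center, b):
    """Angle a-center-b in degrees from three Cartesian points."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


def tetrahedral_canary(center, neighbors, ideal=109.5, tol=15.0):
    """Return (passed, angles) over all pairs of neighbors around the center.

    ideal and tol are in degrees; tol is a hypothetical placeholder threshold.
    """
    angles = [angle_deg(a, center, b)
              for a, b in itertools.combinations(neighbors, 2)]
    passed = all(abs(t - ideal) <= tol for t in angles)
    return passed, angles


CENTER = (0.0, 0.0, 0.0)
# Idealized tetrahedron (alternating cube corners), not a real molecule.
IDEAL = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
# Same arrangement with one substituent pulled toward another.
DISTORTED = [(1, 1, 1), (1, 1, -0.2), (-1, 1, -1), (-1, -1, 1)]

for label, neighbors in (("ideal", IDEAL), ("distorted", DISTORTED)):
    passed, angles = tetrahedral_canary(CENTER, neighbors)
    print(label, "PASS" if passed else "FAIL", [round(t, 1) for t in angles])
```

In practice the coordinates would come from a minimization with the candidate force field, and the tolerance would need to be set per chemistry, since real sulfonamide angles are not exactly 109.5.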
DLM: David Hahn mentioned some compute availability.
SB: Yeah, he reached out to me.
DLM: Maybe CB can run some on Orion.
CB: I can; absolutely looking forward to it. I am doing a benchmark and I could rerun these with the patch. What other X4 molecules may have these small equilibrium angles (far from tetrahedral)? Any atoms with square planar geometries would be candidates, but I don’t think we have any of those chemistries.
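Illustrative note on the question above: one way to look for other affected parameters is to scan the angle section of a force field file for entries whose central atom is four-coordinate (X4) and whose equilibrium angle is well below tetrahedral. The sketch below runs on a made-up XML snippet written in the general style of an OFFXML angle section; the ids, SMIRKS patterns, and values are placeholders and are not copied from any released file. The function name and the 105 degree threshold are also assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Made-up example data; NOT taken from any released force field.
SNIPPET = """
<Angles>
  <Angle smirks="[*:1]~[#6X4:2]-[*:3]" angle="109.5 * degree" id="demo1"/>
  <Angle smirks="[*:1]~[#16X4:2]~[*:3]" angle="99.1 * degree" id="demo2"/>
  <Angle smirks="[*:1]~[#7X3:2]~[*:3]" angle="118.0 * degree" id="demo3"/>
</Angles>
"""

# Matches a bracketed atom that mentions X4 and carries the central-atom tag :2.
CENTRAL_X4 = re.compile(r"\[[^\]]*X4[^\]]*:2\]")


def flag_small_x4_angles(xml_text, threshold_deg=105.0):
    """List (id, smirks, angle) for X4-centered angles below threshold_deg."""
    flagged = []
    root = ET.fromstring(xml_text.strip())
    for elem in root.iter("Angle"):
        smirks = elem.get("smirks")
        # The attribute looks like "99.1 * degree"; keep the numeric part.
        angle = float(elem.get("angle").split("*")[0])
        if CENTRAL_X4.search(smirks) and angle < threshold_deg:
            flagged.append((elem.get("id"), smirks, angle))
    return flagged


for entry in flag_small_x4_angles(SNIPPET):
    print(entry)
```

A text match on SMIRKS only catches parameters that spell out X4 on the central atom; a more reliable version would apply the force field to a set of molecules and report which parameters are actually assigned to four-coordinate centers.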
SB: So, if we were to make an alpha patch release 1.3.1 and send it to you, would you suggest any tests we can do?
CB: I can run two offending datasets within 2 weeks and see if there are any improvements in free energies.
SB: That’s great. We will probably get it to you later today or tomorrow morning.
DLM: I would like to say that the time scale of modifications going from years to days is amazing.
SB: I really like this too.
CB/BS: This is impressive.