Time

Item

Presenter

Notes

Sulfonamide issue

Hyesu Jang

  • Sulfonamide issue in Parsley

  • HJ: During optimization the NH2 group is pulled in by the SO2 group, the sulfur loses its tetrahedral geometry, and the distance between the oxygens and the NH2 group is not chemically intuitive. This was not the case with SMIRNOFF99Frosst; it was introduced in Parsley by bad parameters.
    The parameters involved are a30 and a31, whose values in 1.2 and 1.3 moved far away from the SMIRNOFF99Frosst values.
    This may point to a need for higher-order terms in the angle potential.
    A while back, when we saw the sum of angles around sp2 carbons going over 360, we also discussed a higher-order term.
    CB also suggested a basic check on sulfonamides and molecules containing hypervalent nitrogen, similar to the HMR canary test.

  • BS: Does this happen in the gas phase too?

  • HJ: Yeah.

  • CB: Thank you, HJ, for digging into this. The first aid to stop the bleeding, even if it is not perfect, is to go back to 1.3.0 and correct the tetrahedral equilibrium angle from 99.1 to 109.5.
    The first thing is to find out how and where this is happening, and how we can prevent it from happening again. Something has gone terribly wrong for those equilibrium angles to be 99.1 and 89.8 in Parsley 1.3.
    We saw a similar thing with phosphonic acid before and you fixed it.

  • SB: Surely something has gone wrong in training. The force constants are tight.

  • BS: You have two hydrogens on the nitrogen; what are the charges on the oxygens?

  • CB: I checked; the charges on the oxygens are normal sulfonamide values, IIRC -0.75… Both oxygens being pulled out symmetrically seems wrong. I would immediately change the equilibrium value as first aid.

  • BS: That certainly is an easy thing to try.

  • CB: The real question is how this could happen and how we prevent it from happening again. I have been using this since it is our best version. And guess how many of those JACS sets have sulfonamides?

  • DLM: Quite a few. I think we should go beyond the current metrics like RMSD, etc., and look at specific benchmark systems like this one that we know can be problematic and throw us off the guardrails.

  • CB: Yeah, I completely agree. In the case of phosphoric acid, I found it because the structure was going away from tetrahedral.

  • DLM: SB, do you agree?

  • SB: Yeah, completely. We should do this.

  • CB: What happens to sulfonamide structures at the minimum?

  • HJ: I ran a simple minimization with 1.2 and 1.3. With 1.2, even with the weird equilibrium values, the final minimized geometry is not that weird. But in 1.3 that’s not the case.

  • CB: Yeah, then we need to follow SB’s suggestion and look at which structures contributed to this pathology.
    So, what other pathologies might we be missing, since a30 is a general parameter with wildcards? Especially in hypervalent structures.
    Put one of these inside a ring and I would bet it would behave well.

  • DLM: I think the in-ring training molecules are forcing it to change.

  • CB: We need one valence angle that gives a good value outside rings and can squeeze when it matches inside a ring.

  • DLM: I still think it’s a case of in-ring training molecules.

  • CB: A molecule that has an in-ring sulfone is still valuable for us.

  • DLM: We don’t have infrastructure to filter training data. Another practical option is to split out the in-ring parameters separately.

  • CB: I completely agree. But does that mean we take molecules out completely?

  • SB: I can see why it might be attractive to do a split, but careful automated design of training datasets is the way to go.

  • DLM: Yeah, splitting things is not the better way.

  • SB: I was talking to Josh Horton this morning about a more elaborate way of training, rather than dumping a dataset into the black box of ForceBalance: getting insights beyond just the loss function, such as individual gradients, etc.

  • CB: I need a patch for 1.3 immediately. Here are some free energy results where I see the bad performance.

  • SB: I agree with the first aid, but we should be careful that it doesn’t break anything else. We should also run a small set of canary tests, pick a few sulfonamide-containing ligands from the JACS set, and do free energy calculations on them. After those I would greenlight a patch.

  • DLM: David Hahn mentioned some compute availability.

  • SB: Yeah, he reached out to me.

  • DLM: Maybe CB can run some on Orion.

  • CB: I can, and I am absolutely looking forward to it. I am doing a benchmark and could rerun these with the patch. What other X4 molecules may have such small equilibrium angles (away from tetrahedral)? Any atoms with square planar geometries would be those. I don’t think we have any of those chemistries.

  • SB: So, if we were to make an alpha patch release 1.3.1 and send it to you, would you suggest any tests we can do?

  • CB: I can run the two offending datasets within 2 weeks and see if there is any improvement in free energies.

  • SB: That’s great. We will probably get it to you later today or by tomorrow morning.

  • DLM: I would like to say that the time scale of modifications, going from years to days, is amazing.

  • SB: I really like this too.

  • CB/BS: This is impressive.
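
The sulfonamide canary check discussed above could be sketched roughly as follows. This is only an illustration, not something decided in the meeting: the function names, the 15 degree tolerance, and the use of raw Cartesian coordinates for the central atom and its four neighbors are all assumptions. It measures the six angles around a four-coordinate center in a minimized geometry and flags any that stray too far from the ideal tetrahedral value.

```python
import math
from itertools import combinations

# Ideal tetrahedral angle, arccos(-1/3), roughly 109.47 degrees.
TETRAHEDRAL_DEG = math.degrees(math.acos(-1.0 / 3.0))


def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def canary_check(center, neighbors, tolerance_deg=15.0):
    """Return (i, j, angle) for each neighbor pair whose angle at the
    center deviates from tetrahedral by more than tolerance_deg.

    tolerance_deg is a hypothetical threshold chosen for illustration.
    """
    failures = []
    for (i, p), (j, q) in combinations(enumerate(neighbors), 2):
        theta = angle_deg(p, center, q)
        if abs(theta - TETRAHEDRAL_DEG) > tolerance_deg:
            failures.append((i, j, theta))
    return failures


# Idealized test geometries (not real sulfonamide coordinates).
tetrahedral = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
square_planar = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

print(canary_check((0, 0, 0), tetrahedral))         # no angles flagged
print(len(canary_check((0, 0, 0), square_planar)))  # all six angles flagged
```

In practice the tolerance would need tuning per chemistry, since sulfonyl O-S-O angles are normally wider than the ideal tetrahedral value, and the coordinates would come from minimizing each canary molecule with the candidate force field.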


Training set feedback from advisory board

Simon Boothroyd

  • DLM: Feedback from advisory board

  • SB: Alberto Gobbi looked at the substituents list and gave some feedback about some exotic-looking polyhalogenated molecules to discard. I will post them on the ff-release Slack.

  • CB: Hydrazines are encountered but not so frequently. R2#40-R2#52, we screen them frequently but I don’t remember seeing any of those in a drug.

  • DLM: I think it falls into the category of not training but validating.

  • CB: Chlorine on nitrogen, R1#101 and R1#102, that’s crazy.

  • HJ: So, I would remove them all?

  • SB: Yeah, exactly.

  • CB: R1#91 is a horrible pentavalent nitrogen.

  • CB: Trihalomethyl, dihalomethyl, and monohalomethyl groups are important.

  • SB: If it’s a non-carbon with halogens attached I think it is safe to filter out, like halogenated phosphorus, etc.

  • CB: Those are precursors, like the reagents people order from Enamine.

  • SB: Okay, we get rid of multiple halogens on the same carbon instead of same halogen.

  • DLM: R1#65 has a particular isotope of fluorine.

  • SB: I don’t think TK handles isotopes.

  • SB: JH, do you know whether qcsubmit would remove isotopes during deduplication?

  • JH: It is InChI-based, so it depends on that.

  • SB: Okay. The other thing: for stereoisomers, we may just keep one to keep the load light.

  • CB: In the short term, because of our need for chemical diversity, yeah, we can remove stereoisomers.

  • HJ: Yeah, I agree with that. But, I think there aren’t many added since the molecules are so small.
    I got a comment from John Chodera, that I might be missing some substituents since I am not using the complete set?

  • SB: I don’t think we are missing anything significantly right now and we can go ahead with the current set.

  • DLM: I think the question was why we are not doing all of Enamine or something like that.

  • SB: I think that sounds good.

  • DLM: CB, are you happy with the dataset?

  • CB: I am very happy. I think we are making some short-term decisions here based on current priorities, and we need to keep in view which long-term decisions we are postponing or putting on the back burner (or later burner).

  • SB: I completely agree, we should document those.
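
The filter SB proposed above for halogens attached to non-carbon atoms (for example the N-Cl entries R1#101 and R1#102) could be sketched along these lines. This is a minimal illustration under stated assumptions: molecules are represented by a hypothetical pair of an element-symbol list and a bond list of atom index pairs, and the function name is made up here. A real pipeline would more likely use SMARTS matching in a cheminformatics toolkit.

```python
HALOGENS = {"F", "Cl", "Br", "I"}


def has_halogen_on_non_carbon(elements, bonds):
    """Return True if any halogen is bonded to an atom other than carbon.

    elements: list of element symbols, one per atom (hypothetical format).
    bonds: list of (i, j) atom index pairs (hypothetical format).
    """
    for i, j in bonds:
        # Check the bond in both directions.
        for halogen, partner in ((i, j), (j, i)):
            if elements[halogen] in HALOGENS and elements[partner] != "C":
                return True
    return False


# A trifluoromethyl fragment: halogens only on carbon, so it is kept.
cf3 = (["C", "F", "F", "F", "H"], [(0, 1), (0, 2), (0, 3), (0, 4)])

# An N-chloro amine fragment (hydrogens omitted): flagged for removal.
n_chloro = (["N", "Cl", "C", "C"], [(0, 1), (0, 2), (0, 3)])

print(has_halogen_on_non_carbon(*cf3))       # False
print(has_halogen_on_non_carbon(*n_chloro))  # True
```

Note that this sketch deliberately leaves halomethyl groups alone, since CB flagged those as important; the separate rule about multiple halogens on the same carbon is not implemented here.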

Bill Swope

  • BS: Polyhalogenated carbons are typically encountered beyond pharma, certainly in atmospheric chemistry, where radicals do all sorts of things. I hope we don’t get blindsided by focusing on drugs.

  • DLM: I thought about doing some studies beyond pharma where a general FF is being used.
    ThermoML may have some; SB can look into this.

Action items

  •  

Decisions