2021-11-05 Meeting notes

 Date

Nov 5, 2021

 Participants

  • @Pavan Behara

  • @Jessica Maat

  • @Simon Boothroyd

  • @Daniel Cole

  • @David Mobley

  • @Christopher Bayly

 Goals

  •  

 Discussion topics

Presenter

Notes

@Pavan Behara
@Jessica Maat

  • Slides: https://docs.google.com/presentation/d/13-pcW8uGxtwweIzHvyYKzwjmkYNsiMw5k-wLuixjBvw/edit#slide=id.gf791a9c705_0_62

  • DM: Pavan, Jessica, do you have some background we can show Danny? I should've thought more about this. Sharing the background would probably help. Chris.

    CB: Morning.

    DM: So we have Danny Cole on today too, so we can get his input on some of the tricky things related to interpolation.

    CB: Hi, Danny.

    DC: Hello. I can try at least anyway. Yeah.

    DM: Okay. So maybe I'll take this one to get us going. This is some work put in your dissertation, looking at some different biphenyls. On the bottom left you see these torsion scans where the barrier height varies dramatically as you change the bond order for the central bond, going from neutral to cation to anion to zwitterion. This is the barrier height from the QM torsion scan in the top right, and in the force field, obviously, that's the same bond.

    DC: Oh, I think we've lost your sound, David.

  • PB:

    CB: Well, if I could just give my $0.02. Those four structures in the original work are ones that I gave her, just to challenge her with the notion that the electron withdrawing and donating capabilities across the central bond of the biphenyl change its bond order, and that this changes the barrier height. This she confirmed, and it went on to become a part of her PhD work. So this was the genesis of the whole effort: for conjugated bonds, where electron pushing can vary the bond between single-bond-ish and double-bond-ish, let's get that torsion barrier dependence incorporated into the force field through the Wiberg bond order. What we're seeing on these graphs, with the green points (K1 general) and the purple points (openFF-2.0.0), is the default behavior that we are trying to move away from. Because there's no inclusion of any kind of bond order effect, the torsion barrier is more or less invariant to whatever the bond order is. When we begin to use the electronic structure component from the Wiberg bond order calculated at the AM1 level, we get the orange dots and the red dots, am I right, Pavan? The orange dots and the red dots show what you get using the quick-to-calculate Wiberg bond order to map the torsion potential from QM. They at least include, qualitatively, the phenomenology of having the torsion barrier increase as the Wiberg bond order increases. What you'd really like from this very basic, some would say compromised, Wiberg bond order from AM1 is for it to do the mapping. Well, it doesn't get the QM barrier exactly correct, but it gets close. And this is why we're looking at Wiberg bond orders in the first place.
But what stopped us for so many months was that when we tried a general fit of this kind of behavior on the general data set that had ortho substituents, we were not successful in getting a useful model for the increasing torsion barrier, because of the confounding effects. So what we're seeing here, Pavan, am I right, is that with the orange dots and the red dots we are getting what looks to be a pretty reasonable correlation with the QM torsion barrier height for the unsubstituted biphenyl population, using the Wiberg bond order calculated at the AM1 level. So am I right? Okay. So my first question to you is, what's the difference between the orange dots and the red dots?
    PB: The difference is in what is optimized: in k1_wbo only the torsion is optimized, and in k1_wbo_optgeo_td I optimized the bonds and angles as well.
    CB: Good. So what we're seeing is that when you include these factors, this actually helps the overall model approach closer to the QM behavior. So Daniel, are we bringing you up to speed on this, or are we revisiting? Have you seen this before?

    DC: I've seen sort of plots like this. I think the last ones I saw, they didn't work. So this is news to me. Okay.
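
For reference, a minimal sketch of the idea being discussed: a torsion force constant that depends on the Wiberg bond order instead of being fixed. It assumes a simple linear interpolation between two reference bond orders; the function name, the anchor bond orders, and the force constants are hypothetical values chosen for illustration, not fitted parameters from the slides.

```python
def interpolated_k(wbo, bo1, k1, bo2, k2):
    """Linearly interpolate (or extrapolate) a torsion force constant from a bond order.

    (bo1, k1) and (bo2, k2) are two reference points; wbo is the bond order
    computed for the molecule of interest (e.g. an AM1 Wiberg bond order).
    """
    slope = (k2 - k1) / (bo2 - bo1)
    return k1 + slope * (wbo - bo1)


# Hypothetical anchor points, for illustration only (not fitted values).
K_AT_BO_1 = 1.0  # kcal/mol at bond order 1.0
K_AT_BO_2 = 5.0  # kcal/mol at bond order 2.0

for wbo in (1.00, 1.10, 1.30):
    k = interpolated_k(wbo, 1.0, K_AT_BO_1, 2.0, K_AT_BO_2)
    print(f"WBO {wbo:.2f} -> k = {k:.2f} kcal/mol")
```

A fixed-parameter force field corresponds to the special case k1 == k2, which is why the green and purple points are flat with respect to bond order.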

    CB: So to me, the really good news here, for the whole group, is those orange and red lines when we include the WBO with the unsubstituted biphenyls. We've got a bunch of compounds with a very low barrier height, at 5 kcal/mol or smaller, meaning the biphenyl is easy to flip around; we would expect a fairly large amount of rotation of the phenyl about that biphenyl bond. But then as you go up in Wiberg bond order, to about 1.15, or on the left-hand plot to about 1.3, with the training set targets and the test set targets we've got this dynamic range of Wiberg bond order, and at the high end we need to capture a species which would actually not flip the biphenyl very much at all. And that's what we're getting with the orange dots and the red dots: this range of behavior in how much the phenyl would want to rotate, which is quite significant, going from 5 kcal/mol or just under up to 15. The interesting thing is that K1 general, the green dots, would have all of those at a very high barrier where they wouldn't want to flip, and OpenFF 2.0.0 actually has them all at a very low barrier height where it would be too easy for them to rotate. What I'm trying to do with that descriptive feedback is say where I expect this Wiberg bond order dependence to actually matter to users in the field, where they have a biphenyl ligand in the active site. With OpenFF 2.0.0, we would be expecting that biphenyl to flip much more often than it really would, and with K1 general, it would flip a lot less often than we would expect, for most of the structures.
But with either the orange model or the red model, we would get the spectrum of behavior which more correctly reflects the chemistry. More in some cases than others; it isn't perfect. But it is at least getting it somewhat correct compared to the QM torsion barrier. The QM torsion barrier is what we would get from bespoke fitting. So if you compare just the QM with the K1 general green dots or the OpenFF 2.0.0 purple dots, we would be looking at a graph which says why we should always do the bespoke fitting. People who are advocates of bespoke fitting would say, here is why we never try to use one of these lame general force fields for real work where we actually care about things, like free energy calculations. But with the Wiberg bond order dependence, we are able to address this phenomenology in a very fast way, making bespoke fitting less essential. That should matter to people who want to do real models, but it should also matter to people in our consortium and in industry who ultimately want to do free energy calculations but want better general force fields going in, when they're doing faster assessments of many, many compounds. Okay, so I've given you my diatribe on it, David. I'm giving value judgments. I'm doing comparative analysis. Is this a compelling case for putting this into our force field, these graphs?
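
To put the 5 versus 15 kcal/mol range in perspective, here is a rough back-of-the-envelope sketch using a plain Boltzmann factor exp(-E/RT) at room temperature. Treating the torsion barrier as the only thing that sets the flipping frequency is an assumption (it ignores prefactors, solvent, and the binding site), so the number is only an order-of-magnitude guide.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_factor(barrier_kcal, temperature_k=298.15):
    """exp(-E/RT) for a barrier in kcal/mol: a crude proxy for crossing frequency."""
    return math.exp(-barrier_kcal / (R_KCAL * temperature_k))


# Ratio of crossing frequencies for a 5 kcal/mol vs a 15 kcal/mol barrier.
ratio = boltzmann_factor(5.0) / boltzmann_factor(15.0)
print(f"5 kcal/mol barrier crossed about {ratio:.1e} times more often than 15 kcal/mol")
```

Under these assumptions the ratio comes out on the order of 10^7, which is why a force field that places every biphenyl at the same barrier height will badly misjudge how often the ring flips for part of the series.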

    DM: So I think part of the issue, and I don't know if you have slides on this, Pavan, is that while it's doing something good here, if we look broadly at what chemistry occurs in our data sets, we don't have that many cases where there are higher bond orders. You have more slides you're going to show on WBO?

    DC: Can I just have one more piece of information to fill in the picture? Could you go back to the molecules? I just want a picture of what these various substituents might look like. Do you happen to know one that... What happened?

    DM: So he wanted to know, do you have an example of one that would have a low barrier and one that would have a high barrier?

  • PB:

    DC: Okay. That's a yellow, is it?

    DM: Yeah. Something with a higher barrier.

  • PB:

  • DM: Oh. At the positively charged nitrogen on the other end. Yeah.

    DC: Okay. Do you always have to change the charge state to get those high barriers? I guess the relevance to whether it should be included in the force field is how often a medicinal chemist designing a generic series of molecules would actually observe a big change in WBO, and therefore a big change in the torsion.

  • PB: Mostly by changing the EWGs and EDGs on the other end we get this wide range of WBOs, not necessarily by changing the charge state. And here are the benchmarks on the subset of unsubstituted biphenyls in the industry benchmark set.

  • DC: So these are RMSDs from minimum energy structures?

  • PB: Yeah.

  • DC: Yeah. So, maybe improvements in barriers may not show up in.

    CB: That's true.

    DM: Yeah, maybe. I guess one question is how many molecules this is. You said it's the subset of molecules from the industry set that actually use this parameter. How many is it? The other question is how much torsion drive data we have, that we wouldn't be using in fitting, that would involve molecules that use these parameters. Not necessarily drives of these torsions, but torsion drive data. Because you can imagine, as Danny said, we should see more differences as we look at off-minimum geometries, I think. And you may want to check how much data we have to assess that.

    CB: Could I just briefly show something? It has to do with relevance here.

    DM: Yep. Can you stop sharing? Chris wants to show us something. If you could stop sharing, that'd be great.

    CB: Can you see my screen? Yeah. I'm in Orion, and I'm showing the tile representation of a result from a short trajectory MD calculation where we're trying to look at the beta secretase data set of the JAX 15 data.

    Okay. Real molecules, real data set. We want to get some correlation with affinity, looking at binding. So I'm just going to home in on this molecule, CAT-4c, an unsubstituted biphenyl.

    I'm sure all of you remember all your undergraduate chemistry. The ether at the top is a fairly significantly electron donating substituent, which will give this single bond more double bond character. The other one at the bottom can be inductively withdrawing, but not pi withdrawing, over in the meta position here. But to the extent that it withdraws at all, it will increase the double bond character of this central bond in this unsubstituted biphenyl. Okay. So there we have a case where we've got an electron donating group in a real molecule. Here is another compound in their series. This has a pair of nitrogens. This is strongly electron withdrawing here. So this would be pulling density across the bond, except we've got the cation inductively pulling as well. So we would expect this bond to be less double bond in character. So we go back here, and in the same data set, where is that unsubstituted ring? Somewhere in this is the substituted ring.

    CB: Oh, here we've got a cyano group, strongly electron withdrawing. I'm having trouble finding it in my list here. Okay, well, I can go with this guy.

    Here's the meta chloro. This isn't much electron withdrawing or donating, and it's in the meta position, so this bond would be pretty much undisturbed in terms of its double bond or single bond character. So I'm just coming in and saying, in the 3000 set that you're looking at, one could look at that set and say, if these 3000 molecules in our OpenFF test set are representative, maybe we don't have to care too much. But what I'm trying to show you is one of the no-brainer data sets routinely used in the industry, the beta secretase data set. It's absolutely looking at substituents across this bond. In fact the R-groups being exercised are all here: it's exercising this ring and also somewhat this substituent, so two R-groups, but all the ones here are influencing the double bond character of this unsubstituted biphenyl. So it really matters. And when we go into the data, I want to show you that the correlation is pretty bad. Here's our correlation with MMPBSA; it's terrible.

    If I went into the non-equilibrium switching results, which are in a different project, so I don't have time to do it today, the correlation is also pretty bad. So we could all be asking ourselves for years, why is the correlation so bad? Could it have to do with this bad behavior of the phenyl group? And maybe I could do what people are beginning to believe they have to do, this is from the Mckrell group and from Schrodinger: that really the only way to deal with this is with bespoke fitting, and you'd only do bespoke fitting for the small number of compounds you're actually looking at. So I've taken a few moments of your time. What I wanted to show you is that you're never going to make me believe that these Wiberg bond order contributions are not important. If our test set is telling us that these aren't very significant, then there's something wrong with our test set. And if the question is whether the qualitative behavior of these biphenyls is going to matter, I can show you that one of the standard data sets we use is a data set where it matters.

    CB: So what I'm trying to say is I think we need to stay the course for OpenFF. With some of the decisions being made in the collaborative effort, we're kind of ending up soft-shoeing the Wiberg bond order, because we had these initial problems dealing with the fact that we were training on sets with highly convoluted steric interactions and so on. But now that we've fixed that, I would wish that we would see that this is an important added value to give to a general force field. The red lines, the red dots, and the orange dots on those graphs say to me that we're there, and we can begin to put this in the force field as it is. Do we still have to prove or show anything? When we show slides that say, well, we've got this correlation, but it's not showing a big contribution overall because we don't have many substituted biphenyls in our test set, what I would hope is that we get the message that we've really got to address our test set, and not the message that maybe the Wiberg bond order dependence is not very important.

    DM: Yeah. So we have Danny here for his expertise, and I want to try to jump to the part that we really needed him for. I think we're at that point with torsions: from what Pavan was showing today, it looks like we may finally be seeing signs of life, that it may be helping after a great deal of effort. It took us a while to get there. We've also been doing this work on impropers. So Danny, if you do this same kind of thing for impropers, and Jessica, you could show the series that you constructed, looking at the improper angle as a function of electron withdrawing and donating.

  • JM:

    DM: If you make a similarly carefully constructed series of molecules, where around a specific nitrogen, like an aniline nitrogen, you carefully modulate the amount of electron withdrawing or donating substituents neighboring it, you can take it from pretty flat to pretty tetrahedral. So that works well in carefully curated sets. You have this plot: the vertical axis is Wiberg bond order, the horizontal is the average improper angle, for the amount of pucker, and it's colored by electron withdrawing or donating level. And Jessica, this is by the Hammett coefficients? But then if you flip down to the histograms of the datasets we have, we just don't see much range of Wiberg bond order in CHEMBL or the data we have on QCArchive.
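
As a reference for the quantity on the horizontal axis, here is a minimal sketch of one way to compute an average improper angle around a trivalent nitrogen from Cartesian coordinates. The definition used here (the mean absolute dihedral over the three cyclic orderings of the neighbors, with the central atom listed first) is an assumption and may differ from the one used for the slides; the coordinates are an idealized toy geometry.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    # Components of b0 and b2 perpendicular to the rotation axis.
    v = tuple(a - d0 * b for a, b in zip(b0, b1))
    w = tuple(a - d2 * b for a, b in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def average_improper_deg(center, n1, n2, n3):
    """Mean |dihedral(center, a, b, c)| over the three cyclic neighbor orderings."""
    orderings = [(n1, n2, n3), (n2, n3, n1), (n3, n1, n2)]
    return sum(abs(dihedral_deg(center, a, b, c)) for a, b, c in orderings) / 3.0


# Toy geometry: three neighbors on a unit circle in the xy-plane.
s = math.sqrt(3.0) / 2.0
neighbors = [(1.0, 0.0, 0.0), (-0.5, s, 0.0), (-0.5, -s, 0.0)]

flat = average_improper_deg((0.0, 0.0, 0.0), *neighbors)
puckered = average_improper_deg((0.0, 0.0, 0.3), *neighbors)
print(f"planar: {flat:.1f} deg, puckered: {puckered:.1f} deg")
```

With this definition a perfectly planar center gives 0 degrees, and lifting the central atom out of the plane of its neighbors increases the angle.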

    DM: We're kind of running into this issue where if we carefully curate a set, there seem to be signs of life. Maybe you can show 19 also.

    But then if we look at the real molecules we have on hand, we don't see very much. That's interesting: looking at all the anilines in our data sets, they are all covering a pretty narrow range of Wiberg bond order. Simon might be able to ask this question better than me, but I think the question we wanted to ask was, do you have chemical intuition, or electronic properties we should be looking at, to do a better job sorting through this and figuring out what the trends are? It seems like we keep getting stuck, and that's stuck with an asterisk, in both the proper torsions and the improper torsions, on this issue that carefully curated datasets show nice trends, but just processing lots of molecules from our large scale data sets doesn't show nice trends. So then we end up having to invest a bunch of time and human effort into figuring out what to do next, and it feels like there should be a better way forward. In the torsions we may finally be coming out the other side of that time and effort and getting something that works, but not yet on impropers. So any tips? I mean, was that the right question?

    SB: Yeah, I think that sums it up pretty well. For the torsions, it kind of makes sense to use the Wiberg bond order because it captures the electronic structure around that bond. But when one thinks of an improper, you really want a number that describes the electronic structure around the atom.

    DM: Oh, yeah.
    SB: One that in some ways incorporates the electronic structure of the bonds surrounding it. So the WBO doesn't necessarily seem appropriate there. But we thought that you especially, maybe from the atoms-in-molecules side, would have a metric: can we get a number that describes the electronic characteristics of an atom as opposed to, say, a bond?

    DM: That's very important. I'm glad you brought that up; I knew I was forgetting something. In the proper torsions, you're looking at a bond that's rotating. In improper torsions you have several bonds coming out of the nitrogen. In the aniline case we can look at a bond that we know is important, because we've designed the set. But if we look at all of our nitrogens, which bond do we look at? Which bond's WBO? That's what Simon is asking.

    DC: Okay. So in the plot where you've got correlation, what's the WBO doing there then? Is it a surrogate for how much charge is on the nitrogen, or how much electron density is on the nitrogen?

    DM: Yeah.

    DC: Because it's measuring how much the neighboring groups are withdrawing electron density. And another fundamental question: I can see, in the case of torsions, why you want to get rid of the need to do the whole torsion scan to decide what the barrier heights should be. But what's the goal here? You want to be able to say what an improper angle should be in a molecule without having to optimize it?

    DM: It's partly that. As you change what's going on around a nitrogen, we know that it goes, or can go, from very flat to very tetrahedral and anywhere in between. You could either make a custom improper for each new molecule you look at in order to capture that, or you could get something out of the charging calculation you're doing anyway to tell you how flat it should be. So you could use interpolated improper torsions, using the charge information or bond order information or whatever other information you're getting out of your QM or AM1 calculation to drive that interpolation.

    DC: Okay.

    DM: So it's kind of like getting bespoke parameters without them actually being bespoke.

    DC: But I've still got my bespoke thinking hat on. So we can't cheat and just take the optimized geometry as the input.

    DM: And even if you did, currently that tends to make things very stiff. In the few cases where we've looked at this, there are actually many cases where you might have a fairly soft improper, where it's happy being flat or being tetrahedral.

    CB: Dan, do you prefer Danny?

    DC: Anything, Danny is most common.

    CB: Danny, the way to think about this for a general force field is that we want to make a force field that does a reasonable job of completely parameterizing, and I mean completely parameterizing, small molecules at the rate of 100 per hour on one CPU. That is in the realm of doing an AM1 calculation for all of those molecules, but it completely rules out any possibility of doing a QM calculation for any of them. So think of it at the level of throughput. What we're trying to do with the OpenFF effort is come up with a general force field that will parameterize small molecules at the rate of 100 per hour. And we are already there; now we're just trying to make the force field even better. So everything that we're trying to do cannot have even a single QM calculation on one of those molecules. But I want to ask a question. You brought something up which I thought was pretty interesting. I'm the person who's always looked at this nitrogen pucker in terms of the Wiberg bond orders of the constituent bonds, because it's through conjugation that I think we're making it planar. But you talked about the electron density on the nitrogen, and that's an interesting surrogate measure as well. I'm wondering, have we ever looked at just the AM1 population on these nitrogens? Many people think it's a Mulliken population, but it's actually a Coulson population. Could that correlate? And that is simple: it's just sitting on the nitrogen and looking at one quantity, rather than trying to figure out the different Wiberg bond orders around it, which has to do with the associated bonds. Do we have those AM1 population densities?

    DM: I do not know. Do you know if they're exposed in the openeye toolkits?

    CB: Yes, you can use a charge engine which is just AM1.

    DM: But then you just get the charges, right? You're talking about the populations, are they the same?

    CB: They're the same.

    DM: Okay.

    CB: So I would want to believe that it's the conjugation that makes a nitrogen flat, rather than the population on it. That's why I've always been looking at the Wiberg bond order. But Danny is right. Even if it's not a direct measure, it might be a pretty good surrogate measure, because anything that conjugates with that nitrogen is pulling density off it. And it's a lot easier than the Wiberg bond order, and it doesn't matter so much what's bonded to it.

    DM: It's worth checking.

    DC: Yeah. To me, with my physics background, it was thinking of a competition between conjugation and the lone pair sitting on the nitrogen. You can do things like natural bond orbital analysis, which gives you the population of the lone pair on the nitrogen, which might be absolutely ideal. But I don't think you want NBO analysis; I think it's pretty much only available through Gaussian, which was the reason for suggesting electron density as a surrogate for how many electrons are sitting in the lone pair.

    CB: Well, and added to that issue is also the fact that traditional semiempirical methods in general, and AM1 in particular, I think have a known defect: they don't do all that well with nitrogen pyramidalization. That would be a specific screw-up on the lone pair population, but it might do okay with the overall charge density.

    DM: I don't think we necessarily need to restrict ourselves to AM1. So for example if we said we know we have a lot of data sitting in the archives. So if there were anything else we could get. Anything else we should be looking at instead that we couldn't get from AM1. If it turns out that that other thing is a silver bullet that can solve all of our problems, then we could think about building a model that's going to be able to predict that.

    DC: Yes. So Simon mentioned the atoms-in-molecules stuff. That's now available in Psi4, so you can do a Psi4 calculation and get MBIS charges. This is the atoms-in-molecule electron density, from which you can count up how many electrons are on the atom to get the charge. In addition, it gives you the dipole and quadrupole of that atomic electron density, which might give some information on the lone pair.

    DM: It'd be interesting to see. I don't have any intuition for this at all, but say you looked at all trivalent nitrogens. We more or less know there's not a correlation between Wiberg bond order and improper angle across all trivalent nitrogens, for the logical reason that we don't know which WBO to look at. But maybe there would be a correlation between improper angle and something like MBIS electron density.

    DC: Maybe, yeah, if the AM1 charges didn't already do it.

    DM: Yeah.

    DC: That would be important.

    DM: So we should look at AM1 charges and geometry, and we should look at MBIS charges and geometry.
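
The check proposed here, whether the nitrogen charge tracks the improper angle, amounts to a simple correlation. A minimal sketch follows; the `charges` and `angles` lists are made-up placeholder numbers, not values from any data set discussed in the meeting, and would be replaced by AM1 (or MBIS) charges and measured improper angles.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder values only (NOT real data):
# nitrogen partial charge (e) and average improper angle (degrees).
charges = [-0.90, -0.85, -0.78, -0.70, -0.62]
angles = [31.0, 24.0, 15.0, 8.0, 2.0]

print(f"Pearson r = {pearson_r(charges, angles):.2f}")
```

As CB notes in the following remarks, the charge only has to follow the same trend as the QM pyramidalization to be useful as a mapping, so a rank-based measure could also be reasonable here.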

    CB: Well I guess all this is about mapping so AM1-BCC is now over 20 years old, it's about mapping. It's not saying the AM1 charge is any good at all, it needs to be corrected by something, it's mapping. So the AM1 charge I would not hold out a lot of hope for directly correlating with the QM pyramidalization, but it only has to map, it only has to be following the same trend. In a way that looks like this beautiful graph here.

    DM: Yeah.

    CB: And then we're okay.

    DM: Yeah.

    CB: So that's how I'm thinking we should look. The semiempirical methods, AM1 included, don't do well on nitrogen pyramidalization. So if we look at something which is known to be bad, I would expect it to be bad compared to the QM pyramidalization. But we're looking for surrogate measures, and I've been very conjugation-centric. I think Danny's point from quite a while ago is that maybe it just has to do with the density. Let's see if that can map.

    DM: Yeah. So one simple thing we ought to do, Jessica, if you flip back to 17, is to redo those graphs where we look at the AM1 charge on the nitrogen instead of the bond order. You think you could do that?

    JM: Oh, yeah, I can do that.

    DM: Okay. Because that will be a quick way of telling us if we're going to end up back in exactly the same place. Which is to say. The charge on the nitrogen doesn't vary very much in CHEMBL or QC archive, and it only varies a lot when we design a data set.

    CB: Well, but this one graph here really catches my attention. It's the top right, QCA; this is all our compounds. And this is saying how much difference in Wiberg bond order we're getting over the whole set, right? So one of the things that I would love to see for QCA is how much variation there is in the pyramidalization. All this is about the pyramidalization of the nitrogen.

    DM: Yeah.

    CB: So do we have a readout? Does your slide 19 show the spectrum of pyramidalization that we have for QCA?

    DM: Just as a disclaimer though, these three slides are only for aniline nitrogens in these datasets.

    CB: Yeah. Okay, but that's what we're looking at. So this is the spectrum of the average improper angle in degrees on the x axis, the horizontal axis. So it varies by a whopping four degrees.

    DM: Not very much.

    CB: And even the compounds where it varies by a whopping four degrees, there's two, four, six, eight. Actually there's two, four, six higher than three degrees, and the vast proportion are within two degrees of planar.

    DM: And Jessica, do you have the updated plot here that has the conformers removed? It turns out that this one is showing multiple conformers of some of the molecules with the same bond order, because the bond order is only calculated once, which makes it look a little weird, but it also makes it broader.

    CB: Okay, but let me just take that graph as it is, and I'm going to interpret it in two completely different ways. The first way I'm going to interpret it is to say these compounds from QCA are representative of pharmaceutical space, and what we're seeing is that aniline nitrogens actually just don't change their pyramidalization very much. Whoever thinks that nitrogen pyramidalization on aniline nitrogens changes much, and Chris Bayly is the guy who thinks that, is obviously wrong, because look, we're looking at a representation of pharmaceutical chemistry and the nitrogen just doesn't change its pyramidalization much. Another way one could look at the same data is to say, well, if the nitrogen pyramidalization actually does change in pharmaceutical space, it's not being represented in our data set very well. And this means that any measure we use to decide whether a model is important that's based on this test set won't make it look important.

    DM: Yeah.

    CB: So those are two ways of looking at it, and I think the overall research effort at some level has to decide, maybe before we build models around the pyramidalization of aniline nitrogens, whether it's important. Because if we're being objective, we would say maybe it's just not that important to pharmaceutical chemistry.

    DM: Yeah. So this is the updated version of the plot that removes the extra conformers.

    And so you only have now one conformer per molecule. So it's even less broad.

    CB: Okay. So this says, firstly, that pyramidalization of aniline nitrogens is unimportant. It goes between zero and two degrees; who cares? That's the first thing it says. The other thing that graph says is that while this nitrogen pyramidalization is not changing very much, the Wiberg bond order is actually changing substantially. So if there's one thing this graph shows, it's that WBO does not have very much to do with nitrogen pyramidalization, which isn't happening anyway.

    DM: Yeah.

    CB: I'm just saying I could tell you war stories of projects where the pyramidalization of an aniline-type nitrogen was important. So I have my war stories, and they're all anecdotes.

    DM: Yeah. So Jessica, a minute ago we talked about doing updated versions of those plots. I think I'm actually interested in looking at all of our data for trivalent nitrogens, not just the anilines. Because, as Chris is saying, our aniline data set is just pretty flat.

    JM: Yeah. I can show that.

    DM: Oh, you already have that. Wow.

    JM: So this is. Well this is all the impropers that have like a central nitrogen.

    DM: Nice.

    JM: The plot on the left has conformers, and on the right, the green one has no conformers.
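
For reference, a minimal sketch of one way an out-of-plane (pyramidalization) angle for a trivalent nitrogen could be computed from Cartesian coordinates. The notes do not say how the improper angles on these plots were calculated, so the definition used here (the angle between one N-substituent bond and the plane through the nitrogen and its other two substituents), the function name `out_of_plane_angle`, and the coordinates are all assumptions for illustration only.

```python
import math


def _sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def out_of_plane_angle(center, a, b, c):
    """Angle in degrees between the bond center->a and the plane (center, b, c).

    Returns 0 for a perfectly planar center; larger values mean a more
    pyramidal center. This is an assumed definition, not necessarily the
    one used for the plots discussed in the meeting.
    """
    va, vb, vc = _sub(a, center), _sub(b, center), _sub(c, center)
    normal = _cross(vb, vc)
    denom = math.sqrt(_dot(va, va)) * math.sqrt(_dot(normal, normal))
    if denom == 0.0:
        raise ValueError("degenerate geometry: zero-length bond or collinear atoms")
    # Clamp to guard against floating-point values slightly above 1.
    ratio = min(1.0, abs(_dot(va, normal)) / denom)
    return math.degrees(math.asin(ratio))


# Hypothetical coordinates: nitrogen at the origin, two substituents in the xy-plane.
origin = (0.0, 0.0, 0.0)
sub_b = (-0.5, 0.866, 0.0)
sub_c = (-0.5, -0.866, 0.0)

planar = out_of_plane_angle(origin, (1.0, 0.0, 0.0), sub_b, sub_c)

tilt = math.radians(30.0)  # third substituent lifted 30 degrees out of the plane
pyramidal = out_of_plane_angle(
    origin, (math.cos(tilt), 0.0, math.sin(tilt)), sub_b, sub_c
)

print(round(planar, 1), round(pyramidal, 1))
```

Under this definition, the flat anilines discussed above would sit near zero, while the structures Chris asks about later would sit in the 20 to 30 degree range.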

    DM: So I'd really be very interested in looking at what happens if we pull the AM1 charge for all of these.

    CB: And I'd be interested in seeing just what those are. What is the chemistry of those trivalent nitrogens that have large improper angles? Wait a minute, are you capturing aliphatic nitrogens in here, Jessica? It looks like this would include all nitrogens, which we would expect to be pyramidal. Right?

    DM: And then you have some plots on the subsequent ones that are slightly different chemistry, right?

    JM: Yeah.

    CB: So what I think we need to see, Jessica: I'm looking at this graph and my chemistry intuition wants to say, oh, this initially surprising pyramidalized nitrogen, that's actually the usual behavior for an aliphatic nitrogen, that is, a nitrogen which is not conjugated. So if in your SMARTS pattern one of those substituents, maybe :1, were not the wildcard * but a #6X3, it's got to be some kind of trivalent carbon. So the nitrogen has to have at least one trivalent carbon attached, which suggests that it's conjugated. That would be the graph that I would like; I think that would change the graph substantially. Or at least I want to believe it would.

    DM: Do you want to flip to any other graphs so we can see what you have there? Really? Okay. That's all right. Let's keep going.

  • JM:

    CB: Oh, that's it. Wait.

    DM: Go back.

    CB: Is that #6X3 for the second carbon?

    DM: Yeah.

    CB: Okay, so that says it is a conjugated nitrogen. We still have substantial populations up at an improper angle of 30 degrees.

    DM: Yeah. It sounds like Chris might want to stare at one of those mouse-over plots of this, where you can mouse over the molecules. Would that be right, Chris? Yeah, he would. Oh, she has one already.

    JM: You have it? I have it for the two.

    DM: We just love that. Yeah. Okay.

    CB: So let's look at one of those. Let's look at some chemistry between 20 and 30 degrees improper angle. So that is a diazyl compound. Let's look at another one, which isn't diazyl. So in this structure, all the nitrogens are being shown, right?

    DM: Yeah.
    JM:

    CB: So, 12. We'd have a strong steric interaction between the methyl and the carbonyl, and it would want to go out of plane. I'm guessing that, possibly. So there's a guanidinium substructure in the ring. Do you see the carbon C4 has three nitrogens attached? See that in that structure, over on the right-hand side?

    DM: Yep.

    CB: Those, I couldn't imagine being anything other than planar. So hovering over this structure at an improper angle of between 20 and 30 kind of is going to restrict it to either N12 or N13.

    And so I'm kind of encouraged by the fact that we've got some non-planar nitrogens here, but just as with the biphenyls, where we had confounding steric interactions that threw off how we originally thought it might go just in terms of the electronics, I'm likewise concerned that... oh, those are hydrogens, H27, H26.

    DM: Yeah.

    CB: Oh, then that's real, that would be sterically forced anywhere.

    DM: Can you also go down to the one at the very bottom, about 25 degrees. Yeah.

    CB: Yeah. That's a nitroso compound. So what you're hearing from me, and I apologize for afflicting you with my stream of consciousness here, is that the diazyl compounds are weird and nitroso compounds are weird. We don't need to have a good force field for them. But my chemical intuition on those isn't very good, and they're all not that commonly found. So we're seeing this non-planarity; this is an amide in this structure. Amides we've normally considered to have a pretty planar nitrogen, and here we've got a pretty non-planar one, I'm guessing as a consequence of having the oxygen, the hydroxy, attached to the nitrogen.

    DM: Jessica can you go up one on the graph. Yeah, that one. Okay. That's one of the big ones again. I was just looking for. Smallish one at a low improper angle in a higher improper angle. But the X is near 20.

    CB: But the one you were just on was good, because when I said I have war stories, the war stories involved groups that looked kind of like 19. In other words, it's very much an aniline nitrogen and it's attached to a ring that's substituted. So you'd expect a lot of confounding factors. But what's happening is that we've got the planarity of the nitrogen being affected in a way that doesn't really line up with what you'd expect from a confounding steric interaction. So what we're looking at is a six-membered ring with 19. See that? Or is it a six-membered ring? Or is that actually a ring over there? Maybe it's not a ring.

    DM: So are you asking if N19 is in a ring?

    CB: Yes, I'm asking that question.

    DM: It looks like it is. I could just barely see black in between those two things.

    CB: Okay. So my war story would be about something which had a five-membered ring with the nitrogen, with the N. Imagine 19 in a five-membered saturated carbon ring with the phenyl coming off it. The inability of the force field that I was using at the time, which was an Amber-like force field, to make that nitrogen non-planar seriously screwed up my modeling and my predictivity: all the QM results had that nitrogen pyramidalized and all the molecular mechanics had it planar. So that was one war story, one instance of being screwed up by this. Which is why I brought it up to the Open Force Field effort a number of years ago.

    DM: So, Jessica, afterwards, do you want to send these plots along in the channel, in case we want to stare at them some more ourselves?

    CB: So does this graph convince us that pharmaceutically relevant fragments have enough non-planarity in their aniline nitrogens that we should care about them?

    DM: Well, this is not necessarily aniline nitrogens in this graph, but this convinced me that we have enough non-planarity that we should care about it.

    CB: I'll take it.

    CB: So in the case we were looking at above, we were seeing graphs which said aniline nitrogens just aren't very non-planar. Maybe me looking at that structure and interpreting it as an aniline-type nitrogen, which is the non-planar nitrogen, I could be wrong. So, Jessica, I think this is a very important graph here, but can we begin to focus it on aniline-type nitrogens?

    DM: Well, I think we maybe should look more broadly for now. So the thing I'm interested in following up on: Jessica is also separately working on whether we should just be introducing more impropers in the force field. For example, right now this graph is one improper in the force field, and it covers things that are very flat and things that are very not flat. And that's probably bad. But I'd also be interested in looking at what happens if we pull the AM1 charge on the nitrogen in all of these cases, you know, like if you overlaid that on this as a color coding or something.

    CB: And if we pulled the AM1 charge, what if we pulled the Wiberg bond order as well?

    DM: For which bond?

    CB: Well, start with all of them, and then at least we have the data. Basically, if we're going to harvest the data, harvest it once and then work with that.

    DM: So, Jessica, does that make sense? We can just pull all of that data and then play around with it.

    JM: Yeah, that makes sense.

    DM: Yeah. We can maybe even share some of it with Chris and he can play around with it too. Danny, any other ideas for stuff we should be pulling at the same time?

    DC: Uh, yes. Not at the same time, but yeah. If you want to go beyond that in terms of expense, then what you should be able to do is pull down the optimized geometries from QCArchive and do a single-point Psi4 calculation to give the MBIS charges, dipoles, quadrupoles even, that might be of interest. I can say that's possible, but I can put you in touch with Josh to show you how to do that, if you like.

    DM: And can those be obtained from... So are those only available for new calculations on the archive, or do we already have what's needed for calculating them?

    DC: Um. Well, we need the wave function, so we would have to do a one-off new calculation anyway. And you would have to use an updated version of Psi4.

    DM: Okay. Yeah.

    DC: Yeah.

    DM: Okay. So that's probably, it's something we may want to do, but we won't do it at the same time.

    DC: Yeah. And we're looking to put a data set on QCA at some point with wave function information. To try and feed into espaloma. So if you're interested in giving us a dataset with lots of trivalent nitrogen, then we could upload. Then we would have wave function and we could process it however we liked it.

    DM: Okay. How many do you want?

    DC: Don't know. I think we're only looking at 800 molecules for the Espaloma fitting.

    DM: Okay.

    DC: Um, so on that order again would be good.

    DM: Jessica, do you know how many are on this plot?

    JM: Yes. So around 2000.

    DM: And then you subsetted them. Yeah, maybe, because one thing that seems like it would be interesting, even if we just had the data in the right-hand plot: you could draw evenly from each QM improper angle bin. So you downsample the stuff in the 0 to 5 degree bin and probably sample roughly evenly across the other bins. Does that make sense, what I'm saying, Danny?

    DC: Yeah, that makes sense. Yeah. We take a subset of that 2000. You can expect to give me space. Yeah.
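
A minimal sketch of the bin-wise downsampling David describes above, assuming the plot data is available as `(molecule_id, qm_improper_angle)` pairs. The function name `sample_evenly_across_bins`, the 5 degree bin width applied to every bin, the per-bin cap, and the synthetic records are all hypothetical choices for illustration; the notes only say to draw roughly evenly from each QM improper angle bin and to downsample the crowded 0 to 5 degree bin.

```python
import random
from collections import defaultdict


def sample_evenly_across_bins(records, bin_width=5.0, per_bin=2, seed=0):
    """Pick at most `per_bin` records from each improper-angle bin.

    records: iterable of (molecule_id, angle_in_degrees) pairs.
    Bins are [0, bin_width), [bin_width, 2 * bin_width), ... on |angle|.
    Crowded bins are downsampled; sparse bins are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    bins = defaultdict(list)
    for mol_id, angle in records:
        bins[int(abs(angle) // bin_width)].append((mol_id, angle))

    chosen = []
    for index in sorted(bins):
        members = bins[index]
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return chosen


# Synthetic stand-in for the plot data: many near-planar points, few pyramidal ones.
records = [(f"mol{i}", 0.5 * i) for i in range(10)]  # all fall in the 0 to 5 degree bin
records += [("mol_a", 22.0), ("mol_b", 26.0), ("mol_c", 27.5), ("mol_d", 29.0)]

subset = sample_evenly_across_bins(records, bin_width=5.0, per_bin=2)
for mol_id, angle in subset:
    print(mol_id, angle)
```

With a cap per bin, the heavily populated near-planar bin contributes no more molecules than any other bin, so a subset of the roughly 2000 points would be spread across the full range of improper angles rather than dominated by flat nitrogens.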

    DM: How would it be if Jessica just fired the data from that plot to you? The molecules and the improper angle.

    DC: Yep. That's fine. Yeah, it wouldn't happen immediately because we're still working on the QCA scheme. But yeah, it's important we do that.

    DM: Jessica, does that sound doable? Just send the data from the plot. So molecule and QM improper angle.

    JM: Yeah, I think.

    DM: Yeah, I know. We're sort of overtime, so we should probably wrap up, but any last things we need to.

    DC: So just to check: each point on that plot maps to an optimized geometry on QCA?

    JM: Yeah, that's correct.

    DC: Okay. Yeah. So as long as we know how to get that, that's all.

    CB: Well, as we do more QM calculations to find derived properties from a really good electronic wave function that can tell us that this nitrogen is pyramidalized, I think we also want, at the same time, to keep monitoring the closest corresponding semi-empirical property, some kind of number that we would use for mapping. The espaloma direction doesn't need to do any mapping because it's going to learn it through AI. But if we're still going to do any mapping, then as we get a heartbeat from an advanced QM calculation that we can override for high throughput, we need to see if we can map it from a simpler property.

    DM: Yeah, that's right. So either we need something we get out of AM1 or we need to come up with a standard for it. Simon, any last thoughts from you before we wrap?

    SB: No. Nothing else from me.

    DC: Oh. Do you have a Slack channel for this? Just in case I need to point Josh to it.

    DM: We do not have a WBO channel, do we? Well, this is impropers, so there's a torsions improper channel; use that one, I think. Yeah.





 Action items

 Decisions