2021-10-29 Meeting notes

 Date

Oct 29, 2021

 Participants

  • @Pavan Behara

  • @David Mobley

  • @Simon Boothroyd

  • @Jessica Maat

 Goals

  •  

 Discussion topics

Presenter

Notes

@Pavan Behara

  • Slides - https://docs.google.com/presentation/d/1P8wUbRJwbCnHc2W0u21nXB-VX0xyAVggR3rA0oP4xxE/edit#slide=id.gfb9a0d918d_0_14

  • DM: We can recap what we talked about yesterday. A few minutes ago I posted a sort of brain dump about counting how many torsions pass through each central bond into the Open Force Field channel for Megan to take a look at. I think she can probably figure out how to do it just by looking at some of the examples that I pointed her to, but if you have any other code or examples that immediately strike you as something that would really help her, please send them along. It should be fairly straightforward. I was scrolling through the force field and I am not sure how often this is going to come up. In any case where the two atoms neighbouring the central bond have explicit valence, I don't think we should see this coming up, at least when they don't vary in formal charge, because I think it's only when you group together things of different valence that you'll see the number of torsions passing through a central bond change. There are some parameters like that, but they may only cover small, minor pieces of chemistry, or there may be things that I'm missing. Do you think it will come up there?
    PB: Yeah, there are a lot of parameters with mixed valence and no explicit valence.
    DM: Is that what you think? It could be. Anyway, I'm not sure how often it'll happen. I found a couple of cases where it looked like some of the nitrogen and phosphorus bonds and so on didn't have explicit valence, but a lot of them do. Has anything else come up since yesterday that we should be talking about?
    PB: Turning off the attenuate option while fitting shows an improved fit (the one on the right). This is the option that down-weights repulsive interactions. From the ForceBalance manual: "Normalize interaction energies using 1/(denom**2 + reference**2) only for repulsive interactions greater than denom"


    DM: What do you mean by low weight for repulsive interactions?
    PB: The steep parts of a potential are weighted down.
    DM: Okay. So this is the option that reduces the importance of points above some energy height. When you said repulsive interactions, I thought you meant steric repulsions or something. So maybe say strongly unfavourable interactions, or strongly unfavourable parts of the potential? The part that threw me was that it isn't necessarily an interaction, I guess.
    DM: Yeah. So you'd probably better backtrack slightly then.
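A minimal sketch of the kind of down-weighting discussed above, written as a literal reading of the quoted manual sentence. The function name, the example energies, and the treatment of points at or below denom (a plain 1/denom**2 weight) are assumptions made for illustration; this is not ForceBalance's own code, which may differ in detail.

```python
def attenuated_weights(energies, denom):
    """Return one normalization weight per reference energy.

    Assumption: energies are relative to the profile minimum and use
    the same units as `denom`.
    """
    weights = []
    for energy in energies:
        if energy > denom:
            # Steep, strongly unfavourable points are down-weighted.
            weights.append(1.0 / (denom**2 + energy**2))
        else:
            # Assumed behaviour at or below the threshold: plain 1/denom**2.
            weights.append(1.0 / denom**2)
    return weights


# Hypothetical relative energies along a torsion profile.
profile = [0.0, 1.0, 2.0, 5.0, 10.0]
weights = attenuated_weights(profile, denom=2.0)
```

With attenuation switched off, every point would keep the same weight, so the high-energy parts of the profile would pull harder on the fit.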
    SB: Can I just clarify what I'm seeing? Sorry, it's a bit late here. First, could you point to the colors, because I want to make sure I'm looking at the right things. And what's the composition of the test set?

    DM: If I recall correctly, what's going on here is that he had this fairly large hand-built biphenyl set, which had relatively many molecules with fairly low WBO and relatively few with high WBO. Because that was making the fitting somewhat difficult, he broke it up into a training set that has roughly uniform coverage of molecules at different WBOs, and a test set, which is probably everything left. So his training set, as you say, is relatively uniform in terms of WBO, and what's left is the test set, so it's more biased towards low WBO.
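The split described above can be sketched as follows: bin the molecules by WBO, draw the same number from each bin for training, and leave the remainder as the test set. The bin width, the per-bin count, and the molecule names and WBO values are all hypothetical; the notes do not record how the actual split was made.

```python
import math
import random


def split_by_wbo(wbo_by_molecule, bin_width=0.05, per_bin=1, seed=0):
    """Return (train, test) name lists with uniform WBO coverage in train."""
    rng = random.Random(seed)
    bins = {}
    for name, wbo in wbo_by_molecule.items():
        bins.setdefault(math.floor(wbo / bin_width), []).append(name)

    train = []
    for index in sorted(bins):
        members = sorted(bins[index])
        rng.shuffle(members)
        # Take at most `per_bin` molecules from every WBO bin.
        train.extend(members[:per_bin])

    # Everything not drawn for training becomes the test set.
    test = sorted(set(wbo_by_molecule) - set(train))
    return sorted(train), test


# Hypothetical set that is crowded at low WBO, as in the discussion.
molecules = {
    "mol_a": 1.01, "mol_b": 1.02, "mol_c": 1.03, "mol_d": 1.04,
    "mol_e": 1.11, "mol_f": 1.21,
}
train, test = split_by_wbo(molecules)
```

Because the crowded low-WBO bin contributes only one molecule to training, the leftover test set ends up biased towards low WBO, which is the effect noted above.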

    SB: One thing, maybe it's just because I can't distinguish the points particularly well. But, I wonder why the red points are assigned such a high barrier.

    I guess it is because the training instances all looked like they had ...

    PB: The general parameter tries to strike a balance between the high barrier and low barrier, that’s why it’s kind of in the middle.
    SB: Okay, that makes sense, because in the training set you're just trying to fit the best of a bad bunch. Just out of curiosity, when you apply our current force field to your training set, what does that look like? Do all points below a WBO of about 1.2 get one torsion parameter, and all points between that and 1.3 get another? Or is k1_general what they would be assigned, so they all just get one torsion parameter?
    PB: Yeah, the range is something like 1.5 to 6.

    DM: Yeah, it's t43 and the k1 is 1.16, so if you multiply that by four, it's about five or six. Your average barrier height is higher in this set.

    SB: Yeah, that makes sense. I mean, this definitely looks like a heartbeat. My concern is that we've kind of seen these pictures before. I think this is a positive, cautiously good first step, but until we test on real data, that's the kicker. If we find that the real test cases actually sit all the way down at a WBO of 1.0, it doesn't really matter that we're getting better performance on our contrived test set at 1.1. If that was the case, we could probably just add one extra torsion parameter and be done with it.
    PB: Yeah, I will run a benchmark on the biphenyl matches in the industry benchmark set. Moving to the next one, a fit with Trevor’s new parameters.

    DM: Yeah, I agree with that. This is overall, though. Have you done anything where you look at specific subsets? This doesn't make a dramatic improvement in overall quality, but one thing that's increasingly going to happen is that, as the force field gets better, it's going to be harder for any single change to make a dramatic improvement in overall quality. If we do something that improves the thing we're working on without really breaking anything else, that may still be worth doing.

    SB: Yeah, I think so. I think that's why it's so difficult with the current benchmarking infrastructure. In some ways it's going to be hard to make any progress on this until, hopefully, one of the software scientists frees up some time to rebuild it to get more granular metrics out of it, because I think we just don't see the clear picture anymore. One interesting thing that I haven't looked at, and I feel like, David, you or your group has done this before: do we have a good sense of what's out in the tails? Have we checked if there's any low-hanging fruit out there, any specific chemistries that we could maybe just knock off?

    DM: It is a good thing to keep going back to. We can look at easily what's overrepresented in the tails. What parameters are overrepresented out there?
    PB: Yeah, I shared a set with Hyesu before Sage training.
    DM: Do you know if we've done that again since Sage? You should probably check that again.
    PB: Okay.
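One way to look at what is over-represented in the tails is to compare how often each parameter appears among the worst-performing molecules with how often it appears overall. The sketch below assumes each benchmark record is an (error, parameter ids) pair; the record layout, the cutoff, and the example values are hypothetical and are not taken from the benchmark itself.

```python
from collections import Counter


def tail_enrichment(records, tail_cutoff):
    """Ratio of a parameter's frequency in the tail to its overall frequency.

    `records` is a list of (error, parameter_ids) pairs, one per molecule.
    A ratio above 1 means the parameter is over-represented in the tail.
    """
    overall = Counter()
    tail = Counter()
    n_tail = 0
    for error, parameter_ids in records:
        unique_ids = set(parameter_ids)  # count each parameter once per molecule
        overall.update(unique_ids)
        if error >= tail_cutoff:
            tail.update(unique_ids)
            n_tail += 1
    if n_tail == 0:
        return {}
    n_all = len(records)
    return {
        pid: (tail[pid] / n_tail) / (overall[pid] / n_all)
        for pid in tail
    }


# Hypothetical per-molecule errors and the torsion parameters they use.
records = [
    (0.2, ["t43", "t44"]),
    (0.3, ["t43"]),
    (3.0, ["t75", "t43"]),
    (4.0, ["t75"]),
]
enrichment = tail_enrichment(records, tail_cutoff=2.0)
```

In this toy input the parameter that appears only in high-error molecules gets a ratio of 2, while the widely used one falls below 1.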
    DM: Yeah, maybe it's worth resurrecting that since she's gone off now. The one thing we were going to do that relates to your question about low-hanging fruit is that Megan in my group is taking a look at what I call something like the multiplicity of torsions. If you look at central bonds through which some number of torsions pass, you can count how many torsions pass through each central bond. In any case where you see the same torsional parameter being used across molecules with different numbers of torsions passing through the central bond, you may have a problem. Part of why is that the Amber force fields actually had an explicit multiplicity in them, like nine or six or four, depending on how many torsions there are, and you divide by that number when you combine your torsions. If you mix things with different multiplicity, they add or combine differently than you are expecting them to.

    SB: Really?

    DM: So it's just a quick thing to check. There may not be many cases where we mix things with different multiplicity, but in any case where we do, we should look at that chemistry and figure out whether it is behaving as intended.
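The multiplicity check described above can be sketched on a plain adjacency map: count the proper torsions a-i-j-b through each central bond i-j, then flag any parameter that is assigned to bonds with different counts. The adjacency dictionaries and the parameter ids are hypothetical, and a real analysis would take the connectivity and parameter assignments from the toolkit rather than typing them in by hand.

```python
from collections import defaultdict


def torsions_through_bond(adjacency, i, j):
    """Number of proper torsions a-i-j-b passing through the bond i-j."""
    left = [a for a in adjacency[i] if a != j]
    right = [b for b in adjacency[j] if b != i]
    # a != b guards against three-membered rings.
    return sum(1 for a in left for b in right if a != b)


def mixed_multiplicity(assignments):
    """Map parameter id -> sorted multiplicities, only where more than one occurs."""
    seen = defaultdict(set)
    for parameter_id, multiplicity in assignments:
        seen[parameter_id].add(multiplicity)
    return {pid: sorted(m) for pid, m in seen.items() if len(m) > 1}


# Ethane-like connectivity: two tetravalent centres (atoms 0 and 1).
ethane = {0: [1, 2, 3, 4], 1: [0, 5, 6, 7],
          2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}
# Ethene-like connectivity: two trivalent centres.
ethene = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0], 3: [0], 4: [1], 5: [1]}

n_ethane = torsions_through_bond(ethane, 0, 1)
n_ethene = torsions_through_bond(ethene, 0, 1)

# "t_hyp" is a made-up parameter id that matches both central bonds.
flagged = mixed_multiplicity([("t_hyp", n_ethane), ("t_hyp", n_ethene),
                              ("t_other", n_ethene)])
```

Any parameter that shows up in the flagged dictionary is one whose chemistry is worth inspecting by hand.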

    SB: Is this so strange? I'm trying to think back to the force field. We've got this idivf factor somewhere in the force field. Is that not the kind of scaling factor you're thinking of?

    DM: Yeah, that's why that was there. The Amber force fields used it explicitly to divide this out. We dropped it; we had it before because it made it easier to port the Amber FFs in, since you didn't have to do the division manually. We haven't continued using it because there's no reason to when fitting: the fit would just absorb the idivf, and you don't necessarily want to track the idivf. So it's really sort of a vestigial organ at this point, and it's also probably not the right way to deal with the issue anyway. But that's why it's there: because Amber had it.

    SB: Okay, that's cool. I think that was one of the bits of the code that I pointed Jeff to, because I think we include it in the force field and do nothing with it. That's right. Okay, cool. That's good.

    DM: Yeah, so when we were porting, when we were creating smirnoff99Frosst, we used it fairly extensively to make sure that we were copying parameters correctly. One other issue relating to this: buried deep in my notes somewhere, I have some notes from Chris from 2016 about some of these issues relating to how many torsions pass through a bond. We may not be dealing with it right. Right now, we just add all of the different torsions passing through a bond, and there are some arguments, which I should probably dig up again, that maybe it would be better to average them or something. But this really only becomes an issue, I think, if there are cases where we're looking at different multiplicity.

    SB: Yeah, that makes sense. That's really interesting. I hadn't really thought about that before, but thanks for mentioning that.
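To make the sum-versus-average point concrete, here is a small sketch using the usual periodic torsion form, (k / idivf) * (1 + cos(n * phi - phase)), summed over every torsion through the bond. It assumes, as an idealization, that all torsions through the bond are in phase with each other; the force constant, periodicity, and torsion counts are hypothetical.

```python
import math


def torsion_term(phi_deg, k, periodicity, phase_deg, idivf=1.0):
    """Energy of one torsion term at dihedral angle phi (degrees)."""
    angle = math.radians(periodicity * phi_deg - phase_deg)
    return (k / idivf) * (1.0 + math.cos(angle))


def profile_maximum(n_torsions, k, periodicity, phase_deg, idivf=1.0):
    """Largest summed energy over a 1-degree scan of the central bond.

    Assumption: every torsion through the bond sees the same effective
    angle, so the n_torsions terms simply add up.
    """
    return max(
        n_torsions * torsion_term(phi, k, periodicity, phase_deg, idivf)
        for phi in range(360)
    )


# Same force constant applied to bonds with 9 and with 4 torsions.
summed_9 = profile_maximum(9, k=1.0, periodicity=3, phase_deg=0.0)
summed_4 = profile_maximum(4, k=1.0, periodicity=3, phase_deg=0.0)

# Dividing by the multiplicity (idivf) averages instead of summing.
averaged_9 = profile_maximum(9, k=1.0, periodicity=3, phase_deg=0.0, idivf=9.0)
averaged_4 = profile_maximum(4, k=1.0, periodicity=3, phase_deg=0.0, idivf=4.0)
```

With plain summation the same k produces a total that scales with the number of torsions, whereas dividing by the multiplicity gives the same total for both bonds. That is the difference that matters when one parameter is shared across multiplicities.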

    DM: Yeah. There's a bunch of complicated, tricky things here, because it's really only one bond you are rotating, right? But then you have a bunch of torsions passing through it, and not all of the forces are necessarily fitted for exactly that context. So there's a bunch of tricky things in how they combine.

    SB: Yeah, that makes sense. I think maybe I had slightly thought of this before when we did WBO. The question gets even trickier if you then add WBO and some of the torsions passing through the bond are interpolated and so on. Then are you going to have things cancelling each other out or going in different directions? Lots of fun stuff there.

    DM: Yeah. Anyway, I guess we'll see. Eventually I'll revisit those notes, and if this starts turning up things with multiple multiplicities then I ought to get to them right away. Sorry to go off on a tangent there, but it seemed to relate to the question about low-hanging fruit.
    DM: Yeah. So I thought we were mostly explicitly avoiding fitting our torsions to in-ring torsions. Do you know what happened here?
    PB: We do train for ring puckering.


    DM: Mm. That's a good point. So we're not avoiding fitting to them; we are explicitly avoiding driving them, I guess. t75 concerns me because your second-from-the-left example and your second-from-the-right example are very different torsions from those ring ones. The one in the second-from-the-right molecule doesn't seem terribly important; that is to say, the conformers that you separate by rotation around that bond are uninteresting. But the second-from-the-left one is quite important. Simon, any thoughts?

    SB: Nothing. Nothing immediately springs to mind at the moment.

    DM: Yeah. I think I would probably want to look at some of these torsion drives and some of the dataset quality stuff, because, as you said, some of these look like ones where this torsion may be important for ring planarity, but others look like ones where the torsion is going to be important for how this R-group is oriented. Those things don't chemically seem like they should be lumped together. I think we were talking about that yesterday, and we think that just accidentally got dropped somehow.

    SB: So we think that the force constant got dropped too low in there?

    DM: I think this is a case we had looked into and concluded, oh yeah, we should be increasing the force constant on that. I think we'd planned on including it, and then we probably just forgot.

    SB: Yeah, definitely. I feel like I had a conversation with Josh at some point about whether to include it or not. I'm just trying to think back. There was a change in this parameter between Parsley 1.2 and 1.2.1, right? And wasn't the force constant we said was problematic closer to a value of like 2000? Maybe this was a different thing, though.

    DM: I think that was a different one, but I'm not sure. Yeah.

    SB: Yeah, it looks like the force constant was dropped explicitly to 1000. We're just trying to see what the value was. Well, let me see if I can find it.

    DM: Yes. There was this strike team discussion around August 2020 about the resonance issue, and we hand-modified it. It looks like it's openforcefields issue 19.

    SB: Oh yeah. I can't remember if we cut the constant or increased it. It looks like what used to be b27, which is now b28, used to have a force constant around 822, and then this PR bumped it up to 1000. So it seems like it would be safe to bump it even higher, to 2200. Interesting. It's not moved too much, to be honest. I wonder if the bond priors are pretty high.

    DM: Yeah, I think that's right. Like I think it can't get to several thousand from 1000.

    SB: Yeah. It's outrageous. I would have expected it to move more than 0.6. There's a lot of difference between the two.

    DM: Yeah. I wish we had a record somewhere of the 2200 that you have on this slide. You don't have to look for it right now, but I know we dealt with it somehow.
    PB: Yeah, here it is, https://openforcefield.atlassian.net/wiki/spaces/~868974935/pages/1759936519
    DM: Oh, yeah. Okay. Maybe get that in the queue for one of those tests. Then we should make sure to run some dynamics with the result and make sure nothing blows up.
    PB: General question about why we didn’t retrain the parameters after the hand modification of a16 in Sage?

    SB: I think the main reason is that it shouldn't really change anything. I think the history there was that the parameter was allowed to get above 180 by just a small amount, and presumably OpenMM clips it in certain places but not others, which was leading to an explosion in one particular case. So it was basically seen as probably not going to change things if we didn't refit, because it really was just moving the value back into a sensible range when it had probably been clipped into a sensible range by OpenMM anyway. If it had been a larger fix rather than just clipping the value, then yes, re-optimization probably would have been warranted. But I think it wouldn't change much here.

@Jessica Maat

  • DM: Then yesterday, Jessica, Pavan and I were talking through some of Jessica's latest data on WBOs and impropers and more or less we're still not seeing any kind of good signal or hope or light at the end of the tunnel. So we were thinking we're going to head off more in the direction of just working on conventional impropers. Is there more you wanted to say on that, Jessica?

    JM: I don't think so. What I'm working on now is finding all the matches to the current impropers on QCA, and then I'm benchmarking the QM versus MM improper angles.

    DM: Yeah. So we were thinking we would look through these. One question is: are there cases where we have nitrogens that are planar that we're making tetrahedral, or tetrahedral nitrogens that we're making planar, and do we need to fix those? And then also, are there cases of intermediate planarity, or that are more flexible, that we need to address?
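A sketch of how such a QM versus MM comparison could be set up from Cartesian coordinates alone. It measures the improper as the dihedral neighbor1-center-neighbor2-neighbor3 (central atom listed second) and reports how far that is from planar. The atom ordering convention, the 10-degree planarity cutoff, and the example coordinates are assumptions chosen for illustration, not values from the benchmark.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central axis b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def planarity_deviation(neighbor1, center, neighbor2, neighbor3):
    """Degrees away from planar; 0 means the centre is flat."""
    return 180.0 - abs(dihedral_deg(neighbor1, center, neighbor2, neighbor3))


def disagrees(qm_deviation, mm_deviation, planar_cutoff=10.0):
    """True when one geometry is planar and the other is not (assumed cutoff)."""
    return (qm_deviation < planar_cutoff) != (mm_deviation < planar_cutoff)


# Hypothetical trivalent centre with three neighbours in the z = 0 plane.
h = math.sqrt(3.0) / 2.0
n1, n2, n3 = (1.0, 0.0, 0.0), (-0.5, h, 0.0), (-0.5, -h, 0.0)
flat = planarity_deviation(n1, (0.0, 0.0, 0.0), n2, n3)
pyramidal = planarity_deviation(n1, (0.0, 0.0, 0.4), n2, n3)
```

Running `disagrees` over matched QM and MM geometries would pick out the cases where a planar nitrogen is made pyramidal or the reverse. Centres whose deviation sits near the cutoff correspond to the intermediate cases mentioned above.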

    SB: That sounds really interesting. Just out of curiosity, what would you say about this no-signal-in-the-noise result? Was the issue that there was no clear linear trend between WBO and bond or angle? Was there maybe some discreteness to it? Even if it wasn't a linear trend, if you just chunked based on WBO, could you see that a WBO of 1 to 1.3 gives you non-flat, and the next chunk gives you semi-flat? Could we get away with not interpolating, and basically introduce a single parameter that picks its value based on which region the WBO falls in? And maybe not WBO, but some quantity that describes the electron density around the atom.
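The two options being contrasted here can be sketched side by side: a force constant interpolated linearly in bond order between values defined at bond orders 1 and 2, and a stepwise parameter that picks its value from the WBO region. The function names, region edges, and force constants are hypothetical and chosen only to show the difference in behaviour.

```python
import bisect


def interpolated_k(wbo, k_bondorder1, k_bondorder2):
    """Linear interpolation in bond order between the values at 1 and 2."""
    return k_bondorder1 + (k_bondorder2 - k_bondorder1) * (wbo - 1.0)


def binned_k(wbo, edges, values):
    """Pick a value by WBO region.

    `edges` must be sorted ascending and `values` needs one more entry
    than `edges`; a WBO equal to an edge falls in the upper region.
    """
    if len(values) != len(edges) + 1:
        raise ValueError("need exactly one more value than edges")
    return values[bisect.bisect_right(edges, wbo)]


# Hypothetical numbers: three WBO regions with one force constant each.
edges = [1.3, 1.6]
values = [10.0, 5.0, 1.0]

k_interp = interpolated_k(1.5, k_bondorder1=2.0, k_bondorder2=6.0)
k_low = binned_k(1.1, edges, values)
k_mid = binned_k(1.3, edges, values)
k_high = binned_k(1.9, edges, values)
```

The binned form only needs the descriptor to separate regions. It does not need the descriptor to track the geometry linearly, and the same lookup would work for a descriptor other than WBO.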

    DM: Do you want to show the data?
    JM:


    DM: So this is for that aniline set.

    DM: Yeah. And we aren't seeing very much variation in improper angle here either. So maybe this is just telling us anilines are mostly flat.

    SB: That's kind of crazy. So this was the optimization dataset that we did on the molecules that we contrived to all have very strongly electronic groups, right?

    JM: Ah, no, this is all matches to anilines in the QCArchive optimization datasets.

    SB: Oh, okay, that makes sense. So basically, even if we did see a strong correlation with WBO, aniline is just planar most of the time.

    DM: Yeah. I guess this is telling us that industry folks, or the data we have in QCArchive, don't do that much to modulate the planarity of aniline, even though in principle you could.

    SB: That makes sense. It's so frustrating when you get these beautiful conceptual ideas and the data just doesn't support them. This could have been such a beautiful result. It's amazing how much value the pharma partners' data is adding. As long as we can be confident that it's somewhat representative of what they're interested in, it does help to inform us early on.

    DM: It does bring up an important point. If you think about what might normally happen in academia, somebody might make that first dataset and say, look how well we can fit this with this type of force field, isn't this great? We'll write a paper. Then it goes into the literature for some number of years, and eventually somebody tries to resurrect it and use it on real-world data and finds out that it doesn't actually end up being that important. We just shortcut that whole process into a couple of months or so. So one takeaway, Jessica, is that while you still have it, you really should collect datasets like this, and also the WBO-interpolated torsion results on broader datasets, and dump them into a chapter of what will become your dissertation, because otherwise nobody will ever see them. It's one of those things that didn't work but was a good idea, and it's important to save it so that we can say, yes, we tried to do more, and this is the reason we stopped at the paper on WBO-interpolated proper torsions. Anything else?

    SB: Maybe just one slight, possible hope for this. I remember there was some conversation about setting up a chat with Danny Cole about metrics other than WBO. I think this result is pretty clear, but maybe it's just that WBO doesn't correlate, and if you took a different metric of electron density you'd see a stronger signal. The number of degrees is still quite small, though.

    DM: We do have all that data. Jessica, do you want to do a poll for a time to talk with Danny? The big question being something like: are there electron density properties we can look at that tell us things about geometry that we should perhaps capture better in the force field?

    JM: Okay. Yeah, I can do that.

    SB: It's also a really stupid thing, but it's something I'd want to check myself on if I had plotted this. Have you visualized some of these to make sure that they're actually flat? I was just thinking that if I did this late at night, I'd have a chance of plotting radians and labeling them degrees. If it was radians and not degrees, you'd have a lot more bend in the improper.

    DM: It'd be too much then, though, right? Because I think our max should go up to about, what, like 40 degrees? What's the largest flex you see in an improper Jessica?

    JM: Yeah, I think it's around 40.

    DM: Yeah, maybe check really quickly. Checking some visualizations is probably a good thing. Okay. Anything else? All right.
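A quick unit sanity check along the lines suggested above. The heuristic assumes the largest genuine improper deflection is around 40 degrees, as mentioned in the discussion. If no value in a dataset exceeds that number expressed in radians, the data may have been left in radians and is worth a visual check. The function name and example values are hypothetical, and a set of genuinely flat molecules would also trip this flag, so it is only a prompt to look, not a verdict.

```python
import math


def possibly_radians(angles, max_expected_deg=40.0):
    """True when every |angle| fits inside the expected range in radians."""
    largest = max(abs(angle) for angle in angles)
    return largest <= math.radians(max_expected_deg)


# Hypothetical improper angles: one list left in radians, one in degrees.
suspicious = [0.05, -0.4, 0.69]
plausible = [2.0, -15.0, 38.0]

flag_suspicious = possibly_radians(suspicious)
flag_plausible = possibly_radians(plausible)

# If the suspicious list really is in radians, this is what it means in degrees.
converted = [math.degrees(angle) for angle in suspicious]
```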

     

 Action items

 Decisions