2022-10-25 Protein-ligand benchmarks meeting notes
Participants
Jenke Scheen
@Diego Nolasco (Deactivated)
@Richard Gowers
@John Chodera
Mary Pitman
@Mike Henry
@Irfan Alibay
Ben Ries
Levi Naden
@Iván Pulido
Forrest York
@Jeffrey Wagner
Benjamin Ries
@David W.H. Swenson
@David Dotson
Yutong Zhao
Melissa Boby
Goals
RG : atom mapper landscape, current state, relationship to network planning
looking for opportunities to coordinate efforts, avoid duplication of effort
DD :
fah-alchemy
- current board status: one week out from Nov. 1 deadline
@David Dotson development effort now focused on FahAlchemyAPIServer, FahAlchemyClient, and FahAlchemyComputeServer
Currently implementing task queue system; FahAlchemyComputeService instances pull from this for work to do, and use it as the basis for pushing results back
trying to do transactions as atomically as possible, idempotently where appropriate, though this is a challenge; will consider locking mechanisms later to support concurrent service use
Building out test suite for fah-alchemy; tests with Docker-based neo4j in CI now working
using openfe-benchmark networks while we work toward protein-ligand-benchmark 0.3.0
IA :
protein-ligand-benchmark: blockers and priorities
IP :
Nonequilibrium Cycling Protocol (perses #1066) update: Still refactoring RelativeFEPSetup to accept single phases and optional receptor.
MH :
ProtocolSettings taxonomy (gufe #37) update:
Discussion topics
Item | Notes |
---|---|
RG : atom mapper landscape, current state, relationship to network planning | |
DD : | (Meeting ends here) |
Transcripts with no edits | F@H interface meeting - October 25
SCREEN SHARING: David started screen sharing
@00:37 - John Chodera (he/him/his) Thank you.
@01:04 - David Dotson One second, everyone; I'm getting attendance and then we'll start. Are we doing the atom mapping discussion first? We are. Yeah, that's top of the list here, and we're happy to spend most of this meeting on it, even if we use all of it. Jeff, would you be willing to take notes?
@01:43 - Jeffrey Wagner Yeah, sure.
@01:45 - David Dotson Okay. I'll also tag-team with you on that, and I think Diego got the bot running, so that should help. Okay, thank you everyone for joining. If you've never been to one of these calls before, welcome to the protein-ligand alchemical free energy calculations via Folding@home call. This is a joint project between OpenFF, the Chodera lab, ASAP, and OpenFE, so we've got a lot of folks on here and we are coordinating a lot of activities. Before we jump into the agenda, I just want to ask if there are any items folks want to add or modify. Okay. If not, we'll go ahead and jump into the first item. Richard, do you want to start us off on the atom mapper landscape discussion?
@02:43 - Richard Gowers Yeah, sure. I think we've got some slides ready; maybe I can share my screen, that's probably a good way to go forward. SCREEN SHARING: Richard started screen sharing. Hopefully you can all see that; there's a QR code. Now, talking about atom mapping, maybe we can start with a few definitions, because a lot of the time these things are all jumbled together. When we think about planning a relative calculation network, or a campaign of simulations, you can split that into three distinct roles. You have an atom mapper, which takes two small molecules (or two molecules in general) and provides a mapping, a correspondence between the atoms of those molecules; that's one component. Separately, you have a scorer, essentially a function which scores those mappings. This is separate from the process of defining the route between two molecules. The term score can mean lots of things: it could mean the difficulty of transforming between two molecules, the anticipated error of transforming between two molecules along a given route, or something really simple like Tanimoto similarity, if you wanted. So score is actually not very well defined, which makes it difficult. And then finally you have a network planner, which takes the previous two components and proposes a network across a given set of small molecules, or chemical states in general.
@04:46 - John Chodera (he/him/his) So is this something that is entirely divorced from the score, or could it be used as part of the score? There's this missing idea: the actual spatial overlap within the binding site. That might be a missing component of your picture here.
@05:07 - Richard Gowers Where do you envision that fitting in? Overlap between, say, two small molecules?
@05:14 - John Chodera (he/him/his) You could provide an atom mapping, for example, but it might be impossible to place the atoms of the molecules in a way that doesn't clash with the receptor. Given the atom mapping, does it presuppose that you've already aligned the molecules in space, or that you take the atom mapping and then somehow generate a valid positioning of the ligand?
@05:38 - Richard Gowers Yeah, sure, that's a good question. I think we're considering all of the small molecules to be already docked, so the input is a predicted ligand pose with a 3D conformation defined which is acceptable to the protein. And is the requirement that all of the ligands have to share the same alignment of...
@06:00 - John Chodera (he/him/his) Or does each individual pair of ligands just need some atomic correspondence that is allowable for the atom mapping? No, I think the core doesn't have to be constant across the network. Thank you.
@06:18 - Richard Gowers Cool. So as part of gufe (grand unified free energy), the package that OpenFE is making, we've defined some abstract base classes for how these components should look. Quickly, these aren't very complex. The atom mapping has two components, one at either end of the mapping; these could be RDKit molecules, OpenFF molecules, or OEMols, and transferring between those three is all equivalent. The main thing you get out of it is a mapping of int to int, which could be as simple as a dictionary from the index of an atom in molecule A to the index of an atom in molecule B. That's really all this abstract base class needs to provide; it also provides the unique (unmapped) indices, more as a convenience. The second abstract base class, which again is quite simple, describes how a mapper should behave: you pass it the two components, the two small molecules, and it gives you an iterator, so it can propose one or more mappings of the atom mapping class I just described. That's how we're proposing these two components should behave. I've got links in these slides and I imagine they'll be distributed, so you don't have to keep up. In terms of concrete implementations, in the openfe package, which is our application layer, we've wrapped up LOMAP, we've wrapped up perses, and we're working on a new atom mapper. So we've got three mappers in this API already; two of them are wrapped-up versions of existing tools, and the third is a new atom mapper that my team and I are creating, which is based on the RDKit MCS routine but calls it a bit differently from LOMAP.
So this is a different entry point to the same algorithm, which gives you slightly better mappings, slightly faster, because we can do a few things at the same time in the MCS algorithm. It's in progress; here's a pull request if you're interested in how that works. If you want to read about the RDKit MCS module, there's a recent blog post on that which is pretty good. Here's an example of why we're doing this, something LOMAP will get terribly wrong. Here we've got two fused rings. If you put that into LOMAP, it gets confused and maps the fused ring the wrong way around; this is a pathological example of what LOMAP sometimes does. It happens because when LOMAP does its MCS it doesn't consider the 3D spatial alignment, so it will match the fused ring but put it on the wrong side, and when you try to align it in 3D it doesn't align at all. If you consider your 3D coordinates when you do your MCS, you get the mapping on the right. LOMAP is usually pretty good; this is just one pathological example to show what happens if you don't consider 3D in the MCS. As I said before, scoring is a separate task from proposing the mapping. This is another way of saying that a mapper doesn't get to mark its own homework. Essentially, this is because you might have very different opinions on what constitutes a good or bad mapping depending on, for example, what you're planning to do with that mapping, i.e. what method you're going to apply to it. Equally, a scorer could just forbid certain mappings if your particular method or your particular MD engine doesn't allow a certain transformation: can you have an element swap? Can you change the mapping in some other way? Yes or no? You can use a scorer for that, so the engine doesn't need its own direct response to seeing a mapping it doesn't like.
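A minimal Python sketch of the two abstract roles described above: an atom mapping (two components plus an int-to-int index dictionary, with convenience helpers for the unmapped indices) and a mapper that yields one or more candidate mappings. The names `ToyMolecule`, `AtomMapping`, `AtomMapper`, `suggest_mappings`, and `PrefixElementMapper` are illustrative stand-ins, not the actual gufe API, and molecules are reduced to tuples of element symbols instead of RDKit/OpenFF/OpenEye objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for an RDKit/OpenFF/OpenEye molecule: element symbols only."""
    name: str
    elements: Tuple[str, ...]


@dataclass
class AtomMapping:
    """Two components plus a dict from atom indices in A to atom indices in B."""
    mol_a: ToyMolecule
    mol_b: ToyMolecule
    a_to_b: Dict[int, int]

    def unique_a(self) -> List[int]:
        """Convenience: indices in A that have no partner in B."""
        return [i for i in range(len(self.mol_a.elements)) if i not in self.a_to_b]

    def unique_b(self) -> List[int]:
        """Convenience: indices in B that have no partner in A."""
        mapped = set(self.a_to_b.values())
        return [j for j in range(len(self.mol_b.elements)) if j not in mapped]


class AtomMapper(ABC):
    """A mapper takes two components and yields zero or more candidate mappings."""

    @abstractmethod
    def suggest_mappings(self, mol_a: ToyMolecule, mol_b: ToyMolecule) -> Iterator[AtomMapping]:
        ...


class PrefixElementMapper(AtomMapper):
    """Hypothetical toy mapper: pairs atoms index-by-index while their elements agree."""

    def suggest_mappings(self, mol_a: ToyMolecule, mol_b: ToyMolecule) -> Iterator[AtomMapping]:
        a_to_b = {}
        for i, (ea, eb) in enumerate(zip(mol_a.elements, mol_b.elements)):
            if ea != eb:
                break
            a_to_b[i] = i
        yield AtomMapping(mol_a, mol_b, a_to_b)


a = ToyMolecule("A", ("C", "C", "O"))
b = ToyMolecule("B", ("C", "C", "N", "H"))
mapping = next(PrefixElementMapper().suggest_mappings(a, b))
print(mapping.a_to_b, mapping.unique_a(), mapping.unique_b())  # {0: 0, 1: 1} [2] [2, 3]
```

Because the mapper only proposes mappings, deciding which one is best is left to a separate scorer, consistent with the "doesn't mark its own homework" point.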
@10:52 - John Chodera (he/him/his) Is a high score good, or is a low score good?
@10:55 - Richard Gowers We're going for low scores; we're doing golf scoring, so a low score is good. It can be thought of as a distance, i.e. a dissimilarity.
@11:08 - John Chodera (he/him/his) If you're normalizing it between zero and one, dissimilarity might be the optimal choice, since that's completely unambiguous, right?
@11:17 - Richard Gowers So low scores are good, right?
@11:20 - John Chodera (he/him/his) Yes. Normally that's good.
@11:25 - Richard Gowers I mean, it's easy to flip it. So yeah, ideally we would like the scores to be normalized between zero and one, because then you can obviously combine them easily, although that is sometimes not trivial to do. Here's an example of what using a scorer would look like: it's just a function that takes a mapping and outputs a float, which is pretty trivial, actually.
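A minimal sketch of a golf-style scorer of the kind described here: a plain function from a mapping to a float, normalized to [0, 1] with 0 as the best score. The scoring rule (fraction of atoms left unmapped) and the function name are hypothetical, chosen only to illustrate the shape of a scorer, not any existing LOMAP or perses metric.

```python
from typing import Dict


def unmapped_fraction_score(a_to_b: Dict[int, int], n_atoms_a: int, n_atoms_b: int) -> float:
    """Hypothetical golf score: fraction of all atoms (A plus B) that are left unmapped.

    0.0 means every atom is mapped; 1.0 means nothing is mapped.
    """
    total = n_atoms_a + n_atoms_b
    if total == 0:
        raise ValueError("both molecules are empty")
    return 1.0 - 2 * len(a_to_b) / total


# Three atoms in A, four in B, two mapped pairs: 3 of 7 atoms are unmapped.
print(unmapped_fraction_score({0: 0, 1: 1}, n_atoms_a=3, n_atoms_b=4))  # about 0.43
```

Flipping to a similarity convention, as mentioned above, is just `1 - score`.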
@11:57 - Mary Pitman If I could just interject about the score: will there be some way to keep it self-consistent with the type of simulation that the person will run? I know you're gearing this towards relative binding free energy calculations, but are you expecting the user to define what type of scoring algorithm they want, or will you have some way of handling what they're planning to do downstream? Because that kind of changes how you score the mapping.
@12:34 - Richard Gowers Yeah, I think this is why the scorer should be separate from the mapper. So yes, I think people will have to define how they want to score things, according to what they're intending to do with the mapping. Perhaps a given method should come with a recommended scorer: if you're using this method, here is the scorer you should use. It is down to the person to define what a good or bad mapping is, and I don't think that's going to be universal either. Cool. In terms of work that's been done, all of the existing scoring metrics from LOMAP have been ported into this new API. What people often call the LOMAP score is actually a combination of nine different functions, which we have exposed individually, and the LOMAP score is then just the product of these nine functions. These are all available in the LOMAP scorers module. There is also a scoring metric in perses that we've wrapped up as well, which is in the perses scorers module; it uses a completely different algorithm and is quite interesting to look at. So there are lots of ways to define good or bad, and we've got a few examples. Cool. In general, if you look at the weightings on the scoring functions, they're quite basic exponential functions, and the weights are usually just 0.1; they're not highly tuned or optimized. I think this is probably because there's not a great deal of data to do any sort of regression on. I think I raised this in one of the first Folding@home calls: it would be nice if Folding@home was creating a log of the calculations that have been done, described with something like a SMIRKS pattern.
And then the difficulty or trouble that a given simulation had. Obviously, defining how well a simulation went is still an open question: what is the rate of convergence, and how do you define these things? But if Folding@home were generating that database of transformations and their perceived difficulties, you could build better scorers. The reason it's quite hard to build these scoring functions is that it's hard to tell ahead of time whether something is going to work, whereas if you had a log of thousands of calculations that have been done, you might have a chance of a pretty good regression there.
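To make the "product of basic exponential functions" point concrete, here is a sketch of a LOMAP-style composite: each rule turns a count describing the transformation into exp(-beta * count) with beta = 0.1, and the rules are multiplied together. The feature names (`atoms_changed`, `heavy_atoms_changed`) are hypothetical and these are not the nine actual LOMAP rules. Note that this composite is a similarity (1 = ideal), the opposite convention to the golf scoring discussed earlier.

```python
import math
from typing import Callable, Dict, List

# A rule maps a summary of the transformation to a similarity in (0, 1].
Rule = Callable[[Dict[str, int]], float]


def exp_rule(feature: str, beta: float = 0.1) -> Rule:
    """Hypothetical rule of the form exp(-beta * count) for one integer feature."""
    def rule(summary: Dict[str, int]) -> float:
        return math.exp(-beta * summary[feature])
    return rule


def composite_similarity(summary: Dict[str, int], rules: List[Rule]) -> float:
    """Multiply the individual rules together into a single similarity."""
    result = 1.0
    for rule in rules:
        result *= rule(summary)
    return result


rules = [exp_rule("atoms_changed"), exp_rule("heavy_atoms_changed")]
summary = {"atoms_changed": 4, "heavy_atoms_changed": 1}
print(composite_similarity(summary, rules))  # exp(-0.4) * exp(-0.1) = exp(-0.5), about 0.607
```

Since every weight here is the same untuned 0.1, a log of completed calculations like the one proposed is what one would need to fit such coefficients by regression.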
@15:17 - John Chodera (he/him/his) Right. And that may help us set a good standard for how we would want to export this as JSON, or keep track of this, for all the calculations run on Folding@home.
@15:26 - Richard Gowers Yes. I know that Folding@home is probably not planning on keeping all the data it ever creates, because that would be a huge amount of data, but this is the kind of data that we should retain for everything. This is my ask as well: if we can store this, that would be incredibly valuable.
@15:44 - David Dotson I think that would be a function of how many tasks you execute for a given transformation. I know everybody's aware of the data model here, but Richard, let's say you did a bunch of extensions on a given transformation: you created a ProtocolDAG, and then you extended it, extended it, and extended it again. From the number of times you extended it and the combination of Delta G values, you could plot the Delta G obtained from using more and more of the data, and that can get you a rate of convergence.
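A sketch of the convergence trace described here: recompute the transformation's Delta G from more and more of the extensions and see how far each partial estimate sits from the all-data estimate. For simplicity it assumes, hypothetically, that each extension yields its own independent Delta G estimate that can be averaged; a real analysis would more likely pool the underlying work values and re-run the free energy estimator on each cumulative subset. The values below are placeholders, not results.

```python
from statistics import mean
from typing import List


def running_estimates(per_extension_dg: List[float]) -> List[float]:
    """Delta G estimate after 1, 2, ..., k extensions (cumulative mean, kcal/mol)."""
    return [mean(per_extension_dg[: k + 1]) for k in range(len(per_extension_dg))]


def drift_from_final(per_extension_dg: List[float]) -> List[float]:
    """|estimate after k extensions - estimate from all extensions|: a crude convergence trace."""
    if not per_extension_dg:
        return []
    running = running_estimates(per_extension_dg)
    final = running[-1]
    return [abs(x - final) for x in running]


# Placeholder values for illustration only.
dg = [-1.0, -2.0, -1.5, -1.5]
print(running_estimates(dg))  # [-1.0, -1.5, -1.5, -1.5]
print(drift_from_final(dg))   # [0.5, 0.0, 0.0, 0.0]
```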
@16:30 - Richard Gowers Right. And you could do that across all transformations in a network, which would tell you something about the quality of your atom mapping for each of those transformations, i.e. an estimate of the computational difficulty. But score could be orthogonal to that: score could be talking about eventual accuracy, which is different from computational cost.
@16:50 - David Dotson Fair enough.
@16:50 - Richard Gowers Okay.
@16:51 - David Dotson At that point this goes back to comparing to experiment.
@16:59 - Richard Gowers Yeah. Or just the variability in the results, or the confidence you get after something has converged. Something could converge with a large error bar, or it could converge to a tight result, so this needs to be well defined.
@17:12 - David Dotson The good news is that I think the model we're already pursuing could enable all of that analysis, so there's nothing new that I will have to do.
@17:20 - Jenke Scheen That's fine. I really like this idea, obviously, but do you think it would be valuable, or even possible, to add more information on the free energy algorithm or force field being used?
@17:39 - Richard Gowers Yeah, I mean, you can keep as much metadata as you want. This is my bare minimum, and obviously more columns are going to be better. Some info on the method being used would be handy, though on Folding@home I think there are only going to be two methods lying around.
@17:54 - John Chodera (he/him/his) So that's either going to be A or B; yeah, this would be protocol-specific. One other thing is the target: whether you're doing this in a solvent environment versus the protein, and depending on which protein. Maybe you could pre-detect the quality of an atom mapping by running in a cheap environment like solvent. Yeah, that's a good question. Our work in the past suggested, at least for our alchemical protocols or alchemical functional forms, that the solvent is often highly predictive of the difficulty in the protein. But that could just be due to a highly unoptimized functional form.
@18:43 - Richard Gowers I think we should put some thought into what information we store, because we're going to be throwing out a lot of data and we want to keep the right data.
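One possible shape for the per-transformation log record under discussion, combining the proposed minimum (a SMIRKS-like description of the transformation plus how hard it turned out to be) with the suggested additions (method, force field, environment, target), serialized to JSON. The schema and every field name are hypothetical, and the values below are placeholders rather than results.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TransformationRecord:
    """Hypothetical per-edge log entry; field names are illustrative only."""
    smirks: str                # the transformation, e.g. as a reaction SMIRKS
    protocol: str              # free energy method used
    force_field: str
    environment: str           # e.g. "solvent" or "complex"
    target: Optional[str]      # protein target; None for solvent-only legs
    n_extensions: int          # how many times the task was extended
    dg_kcal_mol: float
    dg_err_kcal_mol: float


record = TransformationRecord(
    smirks="[c:1][H]>>[c:1]F",
    protocol="nonequilibrium-cycling",
    force_field="example-force-field",
    environment="solvent",
    target=None,
    n_extensions=3,
    dg_kcal_mol=0.0,
    dg_err_kcal_mol=0.0,
)
text = json.dumps(asdict(record))
restored = TransformationRecord(**json.loads(text))
print(restored == record)  # True
```

A flat record like this round-trips through JSON without custom encoders, which keeps it easy to append to a log and later load into a table for regression.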
@18:51 - John Chodera (he/him/his) Is that a whole meeting action, like Jenke setting up a wiki or something like that for what information we'd like to keep? Or should we just open an issue in gufe? It's probably a fah-alchemy logging thing that we need to figure out.
@19:15 - David Dotson That would be the appropriate place for it.
@19:24 - Richard Gowers Yeah. If Jenke could take the lead on that, that would be great; he's the right guy for that.
@19:29 - Jenke Scheen Yeah. I didn't hear all of it, but I'll ping David about it. Thank you. Cool.
@19:37 - Richard Gowers Moving on to this idea of network planning. So now you've got how to plot routes and how to evaluate a given route; network planning is a different step from that. Again, we've got some examples of how network planning will look when you have these components. Maybe an interesting example here is how to do a minimal spanning graph: it takes an iterable of small molecules and a list of different mappers, because you could have more than one mapper at play here, and then you give it a single scorer, which is your rating function, and you can create a minimal spanning graph in that many lines of code. That's the power of having these reusable components: I can now plot a minimal spanning graph with different scorers, different mappers, and different chemistry toolkits, for example trying OpenEye versus RDKit at the same time, all in about 20 lines of code. Obviously we also want to be able to add redundancy, so that every node is ideally within multiple cycles.
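A dependency-free sketch of the minimal spanning network example: every mapper proposes mappings for every ligand pair, a single golf scorer keeps the best mapping per pair, and Kruskal's algorithm (with union-find) picks the lowest-scoring edges that connect all ligands. The toy mapper and scorer treat SMILES strings as plain character sequences purely for illustration; they and the function signature are hypothetical, and this is not the OpenFE implementation.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple

Mapper = Callable[[str, str], Iterable[Dict[int, int]]]  # yields candidate mappings
Scorer = Callable[[str, str, Dict[int, int]], float]      # golf score: lower is better


def minimal_spanning_network(ligands: List[str], mappers: List[Mapper], scorer: Scorer):
    # Keep the best-scoring mapping (over all mappers) for every ligand pair.
    edges = []
    for a, b in combinations(ligands, 2):
        best = None
        for mapper in mappers:
            for mapping in mapper(a, b):
                score = scorer(a, b, mapping)
                if best is None or score < best[0]:
                    best = (score, a, b, mapping)
        if best is not None:
            edges.append(best)

    # Kruskal: take edges cheapest-first, skipping any that would close a cycle.
    parent = {lig: lig for lig in ligands}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Tuple[str, str, Dict[int, int], float]] = []
    for score, a, b, mapping in sorted(edges, key=lambda e: e[0]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, mapping, score))
    return tree


def prefix_mapper(a: str, b: str):
    """Hypothetical toy mapper: map the shared leading characters."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    yield {i: i for i in range(n)}


def unmapped_fraction(a: str, b: str, mapping: Dict[int, int]) -> float:
    """Hypothetical golf scorer: fraction of characters left unmapped."""
    return 1.0 - 2 * len(mapping) / (len(a) + len(b))


tree = minimal_spanning_network(["CCO", "CCN", "CCCO"], [prefix_mapper], unmapped_fraction)
for a, b, _, score in tree:
    print(a, b, round(score, 3))
# CCO CCN 0.333
# CCO CCCO 0.429
```

Swapping in a different scorer or adding a second mapper changes only the arguments, not the planner, which is the reuse argument made above.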
@20:53 - John Chodera (he/him/his) Or do you envision that being a whole new network planner?
@20:58 - Richard Gowers No, I mean, this is something we're going to be working on next year: putting in the cycles that are required. The minimal spanning tree is just a toy example really; the cycles would be future work. We currently don't have a strong API for how to store the network planner while it's in flight either, because if you want to extend the network, you need to serialize and deserialize that network planner, so that's again something we'll be working on this year. Adding cycles is obviously required and would be part of a network planner. One way of defining a network planner is that it's trying to achieve a desired topology: a minimal spanning tree topology, or a radial one just like a star map, and obviously cycles are another designed topology. So it's about guiding the network towards a given topology, and this is what OpenFE is currently working on for the next year. On this roadmap here, cycle closure and optimized networks are something we're going to be working on next. Some of that might just be taking the network planner out of LOMAP, because that's still buried in LOMAP. We'll be looking at graph theory approaches, which is something Ben is quite passionate about. There are also quite a few multistate free energy network approaches coming out which don't follow the notion of a network as I've described it today, and whose algorithms don't really follow the rules of this network planner, so we'll have to look at how those work and how mapping will work in something like that. We're also working on how to serialize and deserialize things like planner objects and the atom mappings themselves.
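Of the topologies listed, the radial (star) one is the simplest to sketch: choose the hub ligand whose summed golf score to all other ligands is lowest, then connect every other ligand to it. This is a hypothetical illustration; `radial_network` and the pairwise-score callable are not an existing API, the scores are placeholders, and adding cycles for redundancy would be a further step not shown here.

```python
from typing import Callable, List, Sequence, Tuple


def radial_network(
    ligands: Sequence[str], pair_score: Callable[[str, str], float]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Star topology: the hub minimises its total (golf) score to every other ligand."""
    if not ligands:
        raise ValueError("need at least one ligand")

    def total_score(hub: str) -> float:
        return sum(pair_score(hub, other) for other in ligands if other != hub)

    hub = min(ligands, key=total_score)
    return hub, [(hub, other) for other in ligands if other != hub]


# Placeholder symmetric scores for three ligands.
scores = {frozenset("AB"): 0.2, frozenset("AC"): 0.9, frozenset("BC"): 0.3}
hub, edges = radial_network(["A", "B", "C"], lambda x, y: scores[frozenset((x, y))])
print(hub, edges)  # B [('B', 'A'), ('B', 'C')]
```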
@22:51 - John Chodera (he/him/his) What kind of tests are you guys thinking of doing to test its robustness? And what kind of checks are you going to add to ensure that mappings are actually valid?
@23:07 - Richard Gowers Yes. Checking that an atom mapping is valid is more of a problem for the scorer, but we're going to be benchmarking the mappers as we develop them. It's quite easy to recognize when an atom mapping is wrong, as I showed with that fused-ring example, so you often know quite early on when you've gone wrong.
@23:32 - John Chodera (he/him/his) So each code can also check whether it has a valid mapping? Sorry: there are atom mappers, right, and those might be paired with a particular code that is capable of those transformations. Is that the idea? Well, one very cheap and sturdy thing to do is to just run the vacuum transformation and see whether it converges: just measure the BAR error for the vacuum leg alone. If that does not converge, you have no chance of converging in solvent, period. You could even just run that locally. Can you guys do that computation right now? Yes. Which part of this validates that the actual protocol is able to execute the given atom mapping? Is there some checker that's supposed to run, or is the protocol itself supposed to reject it at that point?
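The cheap sanity check suggested here can be phrased as a pre-screen: run each proposed edge in vacuum first and only keep it if the BAR uncertainty there is acceptable, on the reasoning that an edge that cannot converge in vacuum has no chance in solvent. In this sketch `vacuum_bar_error` is a hypothetical stand-in for whatever would actually run that calculation, and the 0.5 kcal/mol threshold is an arbitrary placeholder, not a recommended value.

```python
import math
from typing import Callable, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def screen_edges_in_vacuum(
    edges: List[Edge],
    vacuum_bar_error: Callable[[Edge], float],
    max_error_kcal_mol: float = 0.5,  # placeholder threshold
) -> Tuple[List[Edge], List[Edge]]:
    """Split edges into (kept, rejected) by their vacuum BAR uncertainty."""
    kept, rejected = [], []
    for edge in edges:
        err = vacuum_bar_error(edge)
        # Non-finite errors (nan/inf) are treated as "did not converge".
        if math.isfinite(err) and err <= max_error_kcal_mol:
            kept.append(edge)
        else:
            rejected.append(edge)
    return kept, rejected


# Placeholder uncertainties standing in for real vacuum calculations.
fake_errors = {("A", "B"): 0.1, ("A", "C"): 2.4, ("B", "C"): float("nan")}
kept, rejected = screen_edges_in_vacuum(list(fake_errors), fake_errors.__getitem__)
print(kept, rejected)  # [('A', 'B')] [('A', 'C'), ('B', 'C')]
```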
@24:40 - Richard Gowers Yeah, I think the protocol will reject things it can't do. But I guess the protocol could also supply a scorer, which you then have to mix into the combination of different scoring functions.
@24:56 - John Chodera (he/him/his) Do the network planners take multiple scorers?
@24:58 - Richard Gowers No, it's just a single scorer, so there will be some sort of product of many different scorers. I think they're made to be composable, because each just returns a float.
@25:08 - John Chodera (he/him/his) So you'd have a hybrid composable scorer, which is a product of other scorers, with the idea that they would be multiplicative?
@25:17 - Richard Gowers Yes, I think that would work, because then a single scorer can reject: if they're multiplicative, a single scorer effectively has a veto, which is nice.
@25:28 - John Chodera (he/him/his) You'd have to switch the values, right? If you want a score of one to mean "can't do this transformation", that would just be the identity under multiplication. If you have zero mean "I can't do this transformation", then it would zero out anything that it can't do.
@25:44 - Richard Gowers So you want a dissimilarity, sorry, a similarity measure. Yeah. I guess you could also just make one term of the product infinite if you wanted; that usually stops things in their tracks.
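The exchange about multiplying scores can be checked in a few lines. With golf scores (0 = ideal), a plain product is unsafe, because one ideal term hides a bad one; with similarities (1 = ideal, 0 = forbidden), a product lets any single scorer veto an edge. For the golf convention, this sketch sums the terms so that an infinite term acts as the veto, which is one reading of the "make it infinite" suggestion (multiplying 0 by infinity would give nan). The protocol-supplied "no element swaps" rule is a hypothetical example of the kind of restriction mentioned earlier.

```python
import math
from typing import Dict, Iterable, Sequence


def combine_similarities(scores: Iterable[float]) -> float:
    """Similarity convention (1 = ideal, 0 = forbidden): any single 0 vetoes the product."""
    return math.prod(scores)


def combine_golf(scores: Iterable[float]) -> float:
    """Golf convention (0 = ideal): summing lets a single math.inf term act as a veto."""
    return sum(scores)


def no_element_swap(a_to_b: Dict[int, int], elements_a: Sequence[str], elements_b: Sequence[str]) -> float:
    """Hypothetical protocol-supplied rule, as a similarity: 0.0 if any mapped pair changes element."""
    return 1.0 if all(elements_a[i] == elements_b[j] for i, j in a_to_b.items()) else 0.0


# Why a plain product of golf scores is unsafe: the ideal 0.0 masks the bad 0.9.
print(math.prod([0.0, 0.9]))               # 0.0
# Similarity product: the protocol's veto wins regardless of the other scorers.
veto = no_element_swap({0: 0, 1: 1}, ["C", "O"], ["C", "N"])
print(combine_similarities([0.95, veto]))  # 0.0
print(combine_golf([0.1, math.inf]))       # inf
```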
@25:57 - Benjamin Ries I have a question regarding the vacuum simulations, if I might interrupt. Do you have a feeling or an idea of when it might fail? Because usually those are super easy to converge, right?
@26:16 - John Chodera (he/him/his) Ring opening and closing. If you do a ring-breaking transformation in vacuum, unless you already have core-hopping protocols, it's not going to converge. So scaffold-hopping cases, in theory, or the pathological examples that Richard shared: if one of those pathological examples creeps through, you want it to fail. If your core becomes disconnected for whatever reason and you're just mapping molecules that float around in space, you won't converge either.
@26:51 - Benjamin Ries But in the hybrid topology approach we're using, I'm not necessarily seeing whether it's a real ring breaking. So for the example that Richard showed... SCREEN SHARING: Richard started screen sharing
@27:02 - John Chodera (he/him/his) Richard, if you go back to your slides, you had the LOMAP versus the RDKit mapping. Even the valid mapping on the right side is something some codes could do and some codes can't, because they can't do the ring breaking, right? Whereas the one on the left is invalid for all codes.
@27:25 - Benjamin Ries What I meant is what we have here in red. There are basically two sets of coordinates, one for each state, so you don't really have a ring breaking. So that should actually always converge in this case, no matter what kind of variant you have.
@27:46 - John Chodera (he/him/his) No, these are both going to be ring-opening and -closing cases. Sorry, there are maybe two things to clarify here. There's the atom mapping itself, which defines the core and the dummy atoms. Then there's the actual code that takes the core and the dummy atoms and figures out which bonded and nonbonded terms to turn off in the end-state definitions. And then there's the third problem of actually coming up with intermediates that let you go from one to the other. Let's just assume you have a valid step two and step three. I think finding a valid atom mapping in the absence of the actual alchemical protocol may not actually be a well-defined problem, because certain protocols will be able to do things like ring opening and closing and certain protocols will not. I think that's important to quantify: some atom mappings are going to be allowed for some packages and disallowed by others. People are saying I can't believe that this is just a question.
@29:16 - Irfan Alibay Yes, I was just saying, as was kind of mentioned: we do anticipate that engines should have their own checks. We can have these checks higher up, but when you enter an engine, or when a protocol is defined in the gufe system, it should have its own validation checks to make sure that things don't explode, which could be, for example, running in gas phase or wherever else.
@29:37 - John Chodera (he/him/his) How is that information going to propagate back up to rewiring the network to complete the set of transformations? If you kill an edge at this point, you may have disconnected part of the network completely, whether that's one ligand or several, so you need to repair the network.
@30:00 - Richard Gowers At that point, yes. For that you're going to need a persistent network planner which has some idea of history. Currently we don't have this idea of history built in.
@30:11 - John Chodera (he/him/his) Couldn't you just use the protocol's scorer ("is this possible?") right in the original network plan, so that if it can't do the transformation, the plan would just work around that during the original design?
@30:28 - Richard Gowers Yeah, but if it fails after trying to simulate it, you're probably past the planning stage, right? You have to go back around for a second swing.
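The concern about killing an edge mid-campaign can at least be detected cheaply: after dropping a failed edge, recompute the connected components of the ligand network, and if there is more than one, some ligands are no longer connected and the network needs replanning. This sketch only detects the problem; it does not implement the history-aware planner mentioned above, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def connected_components(nodes: List[Hashable], edges: List[Edge]) -> List[Set[Hashable]]:
    """Breadth-first search over an undirected graph given as an edge list."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[Hashable] = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def drop_failed_edge(nodes: List[Hashable], edges: List[Edge], failed: Edge):
    """Remove a failed edge (either orientation) and report the resulting components."""
    remaining = [e for e in edges if set(e) != set(failed)]
    return remaining, connected_components(nodes, remaining)


# A star network around "B"; the B-C edge fails in simulation.
remaining, components = drop_failed_edge(["A", "B", "C"], [("B", "A"), ("B", "C")], ("C", "B"))
print([sorted(c) for c in components])            # [['A', 'B'], ['C']]
print("needs replanning:", len(components) > 1)   # needs replanning: True
```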
@30:35 - John Chodera (he/him/his) It would be nice to be able to do that too. But if the transformation is valid, it should hopefully be able to complete at some point. But yeah, ideally we want these to be fully adaptive.
@30:50 - Richard Gowers Right.
@30:50 - John Chodera (he/him/his) Where, after something has run, you can continue to replan how much effort to put along the edges. So another question is: does the network have a concept of weight for the edges, so that you would be able to reallocate effort in a way that is commensurate with how easy each transformation is? Some can differ by orders of magnitude in terms of difficulty. I think my fear here is this: is it a goal to support multiple engines in the back end, or just perses? If the goal is to support multiple free energy engines, they all have different efficiencies and abilities to do certain things, and they even set up the end-state definitions themselves differently. So it seems like there's going to be a lot of ground to cover here just to make the mapping itself robust. So I don't know; I would actually advise focusing on one engine first and working from there, because I think trying to support all of that at once is a lot of different things.
@32:00 - Mary Pitman And to add to that, for the similarity scoring: in this paper I'm writing, I've had people ask about so-called optimal similarity scoring. But the problem is it's going to change with each way that you run the protocol. So if you try to support all the different engines, you would have to collect information on all of those different ways of running these free energy simulations with Folding@home as well.
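On reallocating effort along edges of very different difficulty: one textbook option (not something proposed in the meeting) is Neyman-style allocation. If edge i has variance sigma_i**2 / n_i for n_i units of effort and the goal is to minimise the summed variance under a fixed budget N, the optimum is n_i = N * sigma_i / sum_j sigma_j, i.e. effort proportional to each edge's per-unit standard deviation. This sketch assumes independent edges and ignores how cycles couple edges in a real network; the function names and sigma values are hypothetical placeholders.

```python
from typing import Dict, Hashable


def allocate_effort(edge_sigma: Dict[Hashable, float], total_budget: float) -> Dict[Hashable, float]:
    """Split a budget across edges in proportion to each edge's per-unit standard deviation."""
    total_sigma = sum(edge_sigma.values())
    if total_sigma <= 0:
        raise ValueError("need at least one edge with positive sigma")
    return {edge: total_budget * s / total_sigma for edge, s in edge_sigma.items()}


def summed_variance(edge_sigma: Dict[Hashable, float], effort: Dict[Hashable, float]) -> float:
    """Sum over edges of sigma_i**2 / n_i, the quantity the allocation minimises."""
    return sum(edge_sigma[e] ** 2 / effort[e] for e in edge_sigma)


sigma = {"A-B": 1.0, "B-C": 3.0}  # placeholder per-unit standard deviations
neyman = allocate_effort(sigma, 100.0)
uniform = {e: 50.0 for e in sigma}
print(neyman)  # {'A-B': 25.0, 'B-C': 75.0}
print(summed_variance(sigma, neyman), summed_variance(sigma, uniform))  # about 0.16 versus about 0.2
```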
@32:49 - John Chodera (he/him/his) I think this is a great start, though. It's nice to have a clean, very simple API that can support multiple downstream applications, so I think it's definitely a step in the right direction. There are other challenges with atom mapping; you had enumerated and showed us some of these earlier, including chirality inversions and some other things one might need to worry about. I could share slides, if you guys want to see what you're getting yourselves into here. Is that the most effective use of your time? 15 minutes left, so I think that's fine, just to show you guys some of these things. We have our own atom mapping protocol based on RDKit that we've been having trouble with as well, so maybe just having some user stories here would be useful. Could you send that in the calendar invite or on Slack? Okay, I'll just send it by Slack. These are just examples from public systems, general stuff. SCREEN SHARING: Yutong started screen sharing. This one probably isn't super important to you guys, but it's just something that we found: there's a certain thing you have to do to turn off certain bonds to ensure that the end states are actually well defined, and it affects a couple of situations. For something like this kind of mapping, you have to be very careful: if you try to do this alchemical transformation with this as the core region, you have to turn off one of the bonds here to ensure that you don't need the ability to do bond breaking or ring opening and closing. You actually have to do all or nothing here; for the ring group you can't just do a partial map, that's actually illegal. So here's one case where the protocol that you have for the alchemical transformation will actually affect what's valid and what's not valid.
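The all-or-nothing rule for rings can be written as a simple check for protocols that cannot open or close rings: for each ring in each molecule, either every ring atom is mapped or none is. Ring atom sets are passed in directly here (in practice they would come from a toolkit's ring perception), the function names are hypothetical, and whether partial ring mappings are acceptable depends on the protocol, as discussed above.

```python
from typing import Dict, Iterable, List, Set


def partially_mapped_rings(rings: Iterable[Set[int]], mapped_atoms: Set[int]) -> List[Set[int]]:
    """Rings (as sets of atom indices) with some, but not all, atoms mapped."""
    bad = []
    for ring in rings:
        n_mapped = len(ring & mapped_atoms)
        if 0 < n_mapped < len(ring):
            bad.append(ring)
    return bad


def ring_mapping_is_all_or_nothing(
    a_to_b: Dict[int, int], rings_a: Iterable[Set[int]], rings_b: Iterable[Set[int]]
) -> bool:
    """Apply the check on both sides: keys are atoms of A, values are atoms of B."""
    return not partially_mapped_rings(rings_a, set(a_to_b)) and not partially_mapped_rings(
        rings_b, set(a_to_b.values())
    )


benzene_ring = {0, 1, 2, 3, 4, 5}
full = {i: i for i in range(6)}
partial = {i: i for i in range(5)}  # one ring atom left unmapped
print(ring_mapping_is_all_or_nothing(full, [benzene_ring], [benzene_ring]))     # True
print(ring_mapping_is_all_or_nothing(partial, [benzene_ring], [benzene_ring]))  # False
```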
There are also certain things you have to do to support the software in the exceptional cases, when you do certain kinds of atom mapping that involve core-core bonds being broken. With RDKit's FMCS module, this is just a snippet showing certain situations where setting complete-rings-only to true on the atoms or the bonds leads either to a loop that doesn't terminate, or just times out, or it returns very fast but gives you a trivial core, so you just get a single bond in the MCS. This has actually been the bane of our production experience in the past several weeks. Be very careful with chirality; you can do certain things with atom mapping... Sorry, I can't see what Jeff is saying there. Let me check what I can share; I think these two slides are probably fine. Chirality is going to be a major issue too, because you can get atom mappings that invert chiral centers if you're not careful. If you do it based on graphs, that doesn't know about chirality, and if RDKit gets a match, the chiral tags actually don't do what you think they probably do. As far as I can tell, their MCS also doesn't really work with chirality; it ignores it. Yes, I'll be sure to reach out to you. Thanks. So we've seen issues with chirality too that just require a lot of different tricks to overcome. What you actually want for the atom mapping here is not the full mapping that maps all six atoms to each other, because then you're going to have an accidental chirality inversion. In the instance here, you want to restrict yourself to five of the six atoms and introduce a few additional dummy atoms to do this transformation. You may say, well, just align them so that only atoms that are close to each other in space match.
But there are other issues with different combinations that you have in rings, like [inaudible] and whatnot, that can also be a problem, even when you try distance-aware atom mapping. There are other things we could talk about, but these are just some of the issues that we have seen with atom mapping. Right now we only support our own backend for the actual alchemical transformation; I think trying to support other engines too is just going to be a lot of different corner cases and a lot of pain, so make sure you have the engineering capacity for that. The problem here is that for something like this, it doesn't even show up as high BAR errors. There's a whole class of errors that you can't even detect by measuring the convergence of the alchemical protocol: it will just look like your simulation has converged. These kinds of bugs are the worst kind, because you tell people, here's a good set of ΔGs, and you don't know what actually went wrong until you go back and look very carefully. Those are the two not-so-contrived examples that we've actually seen in production that made atom mapping very difficult for us. Do you have any more comments on how to map and align? I think you covered it. There are a lot of issues with RDKit; I can't be more specific than that. Yeah. And Forrest also, I think, has seen that people like to do crazy chemistry at Relay, and the more rings you have, the more fused they are, and the more spiro centers, bridges, and chiral bridgeheads you have, the NP-hardness of the MCS problem becomes very evident very quickly.
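The suggestion raised in this exchange, aligning the two ligands and only keeping matches between atoms that are close in space, can be sketched as a post-filter on a candidate mapping. This is a minimal illustration under assumptions of our own: coordinates are already aligned in a common frame and given in ångströms, and the 0.5 Å default cutoff is a hypothetical value, not one anyone in the meeting proposed. As Yutong notes, a distance filter alone does not catch every problem, and pruning atoms this way can itself leave a ring partially mapped.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def filter_mapping_by_distance(
    mapping: Dict[int, int],
    coords_a: Sequence[Coord],
    coords_b: Sequence[Coord],
    cutoff: float = 0.5,  # hypothetical cutoff in angstroms
) -> Dict[int, int]:
    """Keep only mapped atom pairs whose aligned positions lie within cutoff."""
    return {
        i: j
        for i, j in mapping.items()
        if math.dist(coords_a[i], coords_b[j]) <= cutoff
    }


a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
b = [(0.1, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(filter_mapping_by_distance({0: 0, 1: 1}, a, b))  # the distant pair is dropped
```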
@40:40 - Richard Gowers NP-hard is always fun. With the chirality inversion, I think we were looking at using the atom compare callback. There's a callback in RDKit's FMCS where you can give it a function that does a final check on whether an atom can be matched. I think that's how we were planning on getting around that.
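The callback Richard describes judges one pair of atoms at a time. A hypothetical pure-Python version of such a per-atom predicate is sketched below; the `Atom` fields are our own simplification and not RDKit's API (RDKit's own hook for this is, as far as we know, a subclass of `rdFMCS.MCSAtomCompare` passed in through the MCS parameters). The point of the sketch is the signature: the predicate only ever sees two atoms, so even when it compares chiral tags it cannot tell whether the surrounding mapping inverts a stereocenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical, simplified atom record used only for this sketch."""
    element: str
    aromatic: bool
    in_ring: bool
    chiral_tag: int  # 0 = no stereo tag, +1 / -1 = the two stereo tags


def atoms_compatible(a: Atom, b: Atom) -> bool:
    """Local check: may atom a be mapped onto atom b?

    Only properties of the two atoms themselves are compared. Whether a
    stereocenter keeps its handedness also depends on how its neighbours
    are mapped, which this function cannot see.
    """
    return (
        a.element == b.element
        and a.aromatic == b.aromatic
        and a.in_ring == b.in_ring
        and a.chiral_tag == b.chiral_tag
    )


print(atoms_compatible(Atom("C", False, True, 1), Atom("C", False, True, 1)))
```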
@40:56 - John Chodera (he/him/his) Yeah. So the problem there is that chirality is not a local property. It's not a comparison between two atom-only properties; chirality is actually defined in a non-local way, right? It goes beyond the atom itself: you're looking at its environment to say, what's my chiral encoding here? And that can break a lot of the assumptions RDKit makes when it's actually doing the MCS. So you can actually get into situations where you still get an invalid MCS, even though you've set all these chirality tags. Go ahead. No. So in preparing the docked poses, we've been talking a lot with Melissa about the appropriate way to prepare all of these. I think in the simplest case, where you have a few ligands and a single, highly well-defined core, you can fit them all in. But there's this tricky chicken-and-egg problem: okay, I need to dock these all so that the core is in exactly the same place, but the core may be different for different pairs of ligands, and then I derive mappings from those common cores; versus, if I specify the common core, I need to know which common core to specify to dock the ligands. So we're in a bit of a chicken-and-egg problem here. Hopefully we can sort it out by hand. But in general, I was wondering if anybody had other thoughts about how to do this in a self-consistent way. We were essentially trying to maximize the core alignment of individual pairs of ligands, but also derive a single, uniformly docked set of all ligands in the same single receptor or multiple receptors. Because in principle, in the limit, you can have a different receptor geometry plus a different docked pair for every transformation you want to carry out, right? That could be completely different for every transformation in the network.
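John's point that chirality is non-local can be made concrete: whether a mapped stereocenter keeps its handedness depends on the stereo tags of both centers and also on how their neighbours are mapped. The sketch below assumes a tag convention similar in spirit to RDKit's clockwise/counterclockwise tags, where the tag (+1 or -1) is defined relative to a listed neighbour order; the function and argument names are hypothetical, and only the case where all four neighbours are mapped is handled.

```python
from typing import Dict, List, Optional


def permutation_parity(perm: List[int]) -> int:
    """+1 for an even permutation, -1 for an odd one (counted via inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def chirality_preserved(
    center_a: int, neighbors_a: List[int], tag_a: int,
    center_b: int, neighbors_b: List[int], tag_b: int,
    mapping: Dict[int, int],
) -> Optional[bool]:
    """Check whether a mapping keeps the handedness of one stereocenter.

    tag_a / tag_b are +1 or -1, defined relative to the neighbour order
    given in neighbors_a / neighbors_b. Returns None when the neighbours
    are not all mapped onto the neighbours of center_b.
    """
    if mapping.get(center_a) != center_b:
        raise ValueError("the two stereocenters are not mapped onto each other")
    if any(n not in mapping for n in neighbors_a):
        return None
    mapped = [mapping[n] for n in neighbors_a]
    if sorted(mapped) != sorted(neighbors_b):
        return None
    # Position of each mapped neighbour within ligand B's neighbour order.
    perm = [neighbors_b.index(m) for m in mapped]
    # Reordering neighbours by an odd permutation flips the stereo tag.
    return tag_a * permutation_parity(perm) == tag_b


nb_a, nb_b = [1, 2, 3, 4], [11, 12, 13, 14]
swapped = {0: 10, 1: 12, 2: 11, 3: 13, 4: 14}
# Identical tags, yet the mapping inverts the center: prints False.
print(chirality_preserved(0, nb_a, 1, 10, nb_b, 1, swapped))
```

The second case illustrates why a per-atom comparison of tags is not enough: both centers carry the same tag, but swapping two neighbours in the mapping inverts the handedness.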
For the protein-ligand benchmark set, we're obviously trying to make this as simple as possible: a single receptor and a single self-consistent set of all docked ligands from which we can derive these mappings. But even getting there is a bit complicated, because it's hard to jam multiple ligands in exactly the same way with exactly the same core.
@43:28 - David Dotson Well, don't you have this problem, John, where if you say, okay, for this pair of ligands, I'll try to make their core alignment the same; in our network, this one is connected to that one, so it needs to have the same core alignment, and that one's connected to two others, so those core alignments need to be the same too. Doesn't it just end up that, for a given alchemical network, you need the same core alignment throughout the whole thing? Or no: if you do have the same core alignment for everything, it makes it especially easy.
@44:00 - John Chodera (he/him/his) But it's certainly not required. You only need to be able to carry out each individual transformation using a receptor conformation, a reference ligand, and a target ligand with some aligned core, right? And that can be completely different for every edge in your network.
@44:19 - David Dotson You can do it differently for every edge in your network, but if you're using a given chemical system for many edges, which is the intention of the network, you'd want to use the same coordinate set for that chemical system across all of the transformations. I think you can't conclude that you would want to.
@44:38 - John Chodera (he/him/his) It just means that if you can, it's especially easy. But in an earlier discussion, Dimitri, who had to take off, pointed out that you can do an ensemble refinement against an X-ray structure and get a variety of structures that are all fully consistent with your X-ray data, right? But they may have subtle deformations that permit other ligands to fit especially well. It's perfectly valid to run an edge transformation using one of those ensemble structures, which we assume has absolutely no energy penalty to access.
@45:11 - David Dotson There is that trick for building networks: you can make a connection between two chemical systems that have the same ligand, with a zero ΔG, just in a slightly different conformation, so it can connect up to the other ones. So you still get a consistent network.
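David's workaround can be sketched as a graph in which each node is a chemical system (here a hypothetical (ligand, receptor conformation) pair) and each edge carries a relative free energy. Linking two conformations of the same ligand with a ΔG of zero, which encodes John's assumption that the alternative receptor structure costs nothing to access, joins otherwise separate parts of the network. The node labels and numbers below are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, str]  # (ligand name, receptor conformation); hypothetical labels
Graph = Dict[Node, List[Tuple[Node, float]]]


def add_edge(graph: Graph, a: Node, b: Node, dg: float) -> None:
    """Add an edge a -> b with free energy difference dg (and b -> a with -dg)."""
    graph.setdefault(a, []).append((b, dg))
    graph.setdefault(b, []).append((a, -dg))


def relative_dg(graph: Graph, start: Node, end: Node) -> Optional[float]:
    """Sum edge values along a breadth-first path; None if end is unreachable."""
    totals = {start: 0.0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return totals[node]
        for neighbor, dg in graph.get(node, []):
            if neighbor not in totals:
                totals[neighbor] = totals[node] + dg
                queue.append(neighbor)
    return None


g: Graph = {}
add_edge(g, ("A", "conf1"), ("B", "conf1"), 1.0)
add_edge(g, ("C", "conf2"), ("B", "conf2"), -0.5)
print(relative_dg(g, ("A", "conf1"), ("C", "conf2")))  # None: not connected yet
add_edge(g, ("B", "conf1"), ("B", "conf2"), 0.0)  # same ligand, zero-ΔG link
print(relative_dg(g, ("A", "conf1"), ("C", "conf2")))  # 1.5
```

Summing along a single path is only a sketch: in a network with cycles, different paths disagree unless cycle closure holds, so a least-squares estimate over all edges would normally be used instead.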
@45:24 - John Chodera (he/him/his) Okay, yeah, that can work exactly. We're obviously targeting the simplest possible case: a single receptor and a single alignment of all docked ligands with a single overlapping core. But I'm wondering if anybody has other thoughts about, first of all, what's the right way to find that core? And in general, what's the best way to find cores, especially if there are hierarchies of cores, because we frequently run into this problem. I will show you what this looks like. SCREEN SHARING: John started screen sharing. We frequently run into this problem where you have many ligands that match one core, and then some others where you have to take a hierarchical step back to a different core in order to fit this guy, and so it uses a somewhat smaller core. Thank you, Tom. Take care. So right now, we're just trying to abuse all the docking programs until they kind of work. But this is a more general problem to solve as a community too, because you're offloading the problem onto the person who's doing the docking, which is Melissa in this case, right? Yeah. Not simple, but a naive way that might work for this, if you have multiple structures, is an ensemble docking approach even for the initial docking. That could inform both what your core is and, potentially, the way you construct your edges. This means you'll end up with a different reference structure for every edge. Exactly. So what I decided to do, because I remembered it existed and we had the license, was induced-fit docking to see if that might help, but even then, that changes the actual structure for every ligand. Yeah. So my understanding of the way Schrödinger's induced-fit protocol works is that it actually gives you a consensus structure for all of the ligands. Okay, so in that case, we would go back to a single reference structure used for everything.
Yeah, that's probably what we want for the benchmark at least, but it's something for people to think about.
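The hierarchical step-back John describes (try the largest shared core first, and fall back to a smaller one for ligands that don't fit) can be sketched as follows. To keep the example self-contained, ligands and cores are represented as sets of hypothetical substructure labels, and "contains the core" is plain set inclusion; a real implementation would use a substructure search instead.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def assign_cores(
    ligands: Dict[str, FrozenSet[str]],
    cores: List[Tuple[str, FrozenSet[str]]],
) -> Dict[str, Optional[str]]:
    """Assign each ligand to the largest core it contains (None if no core fits)."""
    by_size = sorted(cores, key=lambda core: len(core[1]), reverse=True)
    return {
        name: next((core_name for core_name, core in by_size if core <= features), None)
        for name, features in ligands.items()
    }


cores = [
    ("small", frozenset({"ring1"})),
    ("large", frozenset({"ring1", "ring2", "amide"})),
]
ligands = {
    "lig1": frozenset({"ring1", "ring2", "amide", "Cl"}),
    "lig2": frozenset({"ring1", "ester"}),
    "lig3": frozenset({"furan"}),
}
print(assign_cores(ligands, cores))
```

Ligands left with `None` are the ones that need yet another, smaller core, or a manual decision.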
@47:43 - David Dotson Maybe that's a key difference between benchmarking systems and real systems, John, for example in the ASAP effort, where we're trying to get these 150 [inaudible] structures.
@47:53 - John Chodera (he/him/his) Yeah.
@47:57 - Jenke Scheen So, John, similarly: I don't know if anyone has ever tried it, but in the RDKit MCS code there's a parameter that I think is something like seed_MCS. You can feed it an MCS to start from when computing the MCS for an individual pair, right? So a pretty efficient approach would be to do an MCS calculation on the whole set, which will probably give a pretty small MCS if the set is large, then use that as the reference structure in 3D that you were talking about just now, and then refine the MCS pair by pair and build from that reference pose. Yeah, I mentioned this to Melissa; we had an early discussion about this.
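Jenke's idea is to compute one small core for the whole set and then use it as the starting point for each pairwise match. In RDKit, `rdFMCS.FindMCS` takes a `seedSmarts` argument that we believe is the parameter he has in mind. The sketch below does not use RDKit: it is a toy, greedy extension of a seed mapping on atom-labelled graphs, written only to show the seed-then-grow structure. It is not a maximum common substructure search, and the graph encoding (label lists plus adjacency sets) is our own.

```python
from typing import Dict, List, Set


def grow_from_seed(
    labels_a: List[str], adj_a: Dict[int, Set[int]],
    labels_b: List[str], adj_b: Dict[int, Set[int]],
    seed: Dict[int, int],
) -> Dict[int, int]:
    """Greedily extend a seed atom mapping (A -> B) one neighbour at a time.

    A new pair (na, nb) is accepted only if the labels match and the bonds
    to already-mapped atoms agree in both directions.
    """
    mapping = dict(seed)
    inverse = {b: a for a, b in mapping.items()}
    changed = True
    while changed:
        changed = False
        for a, b in list(mapping.items()):
            for na in sorted(adj_a[a]):
                if na in mapping:
                    continue
                for nb in sorted(adj_b[b]):
                    if nb in inverse or labels_a[na] != labels_b[nb]:
                        continue
                    bonds_ok = all(
                        mapping[x] in adj_b[nb] for x in adj_a[na] if x in mapping
                    ) and all(
                        inverse[y] in adj_a[na] for y in adj_b[nb] if y in inverse
                    )
                    if bonds_ok:
                        mapping[na] = nb
                        inverse[nb] = na
                        changed = True
                        break
    return mapping


# Toy chains: A is C-C-N-O and B is C-C-N-C; the shared seed is the first carbon.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(grow_from_seed(["C", "C", "N", "O"], chain, ["C", "C", "N", "C"], chain, {0: 0}))
```

Because the extension is greedy, the order in which neighbours are tried can change the result; a real MCS search explores alternatives instead.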
@48:44 - Irfan Alibay The main thing is that you may not want to use your global MCS for all your pairs because, as you just showed, John, that can really mess with your atom map: at some point you end up with things rather disconnected. We had this issue when we did the first round of atom mappings with LOMAP on this, where sometimes we just weren't generating edges at all because things were completely shifted from each other. SCREEN SHARING: John started screen sharing. So the naive approach here would be to do a sort of all-to-all docking for all the potential common substructures and then choose the highest-overlap ones.
@49:23 - John Chodera (he/him/his) But obviously that would be very expensive and also not guaranteed to give the right mapping. There was a workflow, which they have since gotten rid of in the newest release, probably for justifiable reasons, but which was clever in its inception: they called it their MCS docking workflow. They used it for this purpose: they would do an MCS-constrained docking, and then subsequently use a ligand that had been docked as the new core constraint for ligands further down the pipeline that have good overlap with the original reference. So I've tried that. For some reason, I'm getting really weird results: stuff is going in places it shouldn't be. But in principle, that would be a really useful thing to do. Then we dug into their scripts, and that's when we figured out that they literally just use RDKit for it. Yeah, it seems like it must be iterative, though, even with Jenke's suggestion about this iterative seeding. It is iterative, and you can't run it in parallel; it has to be run serially. You can't distribute the job. Richard, one way to fit this idea into the framework that you proposed is maybe not to presuppose that you already have the docked structures. If you have a bunch of potential atom mappings, you can, in principle, do core-constrained docking and then score them based on clashes, right? So you could then get three-dimensional feedback that this mapping doesn't fit for a given pair. And that would allow you to self-consistently select the best mapping that's spatially reasonable, even within your current framework. Would you have previously generated a large ensemble of conformations and then sorted those? Either that, or you just generate them on the fly, right? So this is what our docking pipeline does right now.
If you give it a core constraint between two sets of ligands, it will use one of the ligands it has a structure for, place the other ligand, and then score, based on all the conformers it can generate, what the docking score and overlap are. So I wonder if there's a way to fit this within your framework.
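John's suggestion, placing the target ligand under each candidate mapping and keeping the mapping that fits best in 3D, reduces to a scoring loop once placements exist. The sketch below assumes the placement step has already been done elsewhere (the coordinates are inputs) and uses a deliberately crude score: the number of ligand-receptor atom pairs closer than a hypothetical 2.0 Å cutoff. A docking score, as in the pipeline John describes, would replace this in practice.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def clash_count(ligand: Sequence[Coord], receptor: Sequence[Coord], cutoff: float = 2.0) -> int:
    """Number of ligand-receptor atom pairs closer than cutoff (hypothetical, in angstroms)."""
    return sum(1 for p in ligand for q in receptor if math.dist(p, q) < cutoff)


def rank_mappings(
    placements: Dict[str, List[Coord]],
    receptor: Sequence[Coord],
    cutoff: float = 2.0,
) -> List[Tuple[str, int]]:
    """Return (mapping name, clash count) pairs, fewest clashes first."""
    scored = [(name, clash_count(xyz, receptor, cutoff)) for name, xyz in placements.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


receptor = [(0.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
placements = {
    "mapping_1": [(0.5, 0.0, 0.0), (0.0, 2.5, 0.0)],
    "mapping_2": [(3.0, 0.0, 0.0), (3.0, 3.0, 0.0)],
}
print(rank_mappings(placements, receptor))
```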
@51:43 - Richard Gowers Yeah, I was trying to treat the inputs as sort of read-only, like you couldn't modify them. But it sounds like what you're saying is that you might have to jiggle your inputs around to make them line up the way you want them to. The structures?
@51:57 - John Chodera (he/him/his) Yeah, you might have to redock all the structures. If you're asking the atom mapping to drive the docking, then that still might be possible.
@52:07 - Richard Gowers Yeah, it feels like a step backwards to start thinking about docking again. I thought I had docked stuff, but yeah, your point is that you might have to go back and play with that again.
@52:24 - John Chodera (he/him/his) Any other thoughts on alignment? If not, then I think we're finished with that discussion. Thank you.
@52:33 - David Dotson Excellent discussion. Thank you, everyone. Any other questions or comments?
@52:40 - John Chodera (he/him/his) There was one question from Relay: they were using the protein-ligand benchmark for some evaluations, and then all of the structures changed. SCREEN SHARING: David started screen sharing. So I think there's an action item to either create another repository or cut a release for the previous state of the repository for protein-ligand-benchmark.
@53:01 - Irfan Alibay There was a release before 0.3, right?
@53:13 - David Dotson Yeah, there is release 0.2.1 that we released earlier this year.
@53:18 - John Chodera (he/him/his) So we just... I see, there is the release tag. [inaudible] That's not it.
@53:25 - Mike Henry (he/him) Yeah, probably just raise the visibility in the README, or even pin an issue pointing people to that older version of the code base. Because there are a couple of different groups now that have stumbled into this and been like, why is everything different?
@53:42 - David Dotson Well, they're just straight up pulling the master branch? I mean, before, were they using LFS and everything, too?
@53:48 - Forrest York I assume so. Yes, that was what we were doing previously. No, actually, no: we were using LFS rather than anything else before.
@53:56 - David Dotson Okay. Or git LFS. Yes. So we've moved away from that with what we're doing in release 0.3. Here, I'll show you the activity. We have a milestone for this, and we're trying to get the remaining items out. One of the major things we've done, and I want to give credit to Melissa here, is that she's worked on taking a consistent approach to docking all the ligands into the available targets, and then flagging the cases where this hasn't worked out. This addresses a lot of issues that have been raised over the last year or so with ligands clashing with proteins and other problems with the existing ligand and target structures. We also got rid of LFS usage: we removed all those large files, all of those GROMACS-specific files. We're trying to make this much simpler to use as a repo. This is intended to be a repo that we want to use ourselves, and we want it to be simple for users. David, does that data live somewhere? Is there a tarball with all the data that used to live in LFS? You can still... okay, this gets tricky with LFS. The LFS stuff should still exist, but it's a bit of a pain to get to. What you'd have to do, if you wanted to pull it, is clone the repo and switch to the tag, so 0.2.1, for example, which is the last release, and then run git LFS; or actually, if you already have LFS installed on your system, it should automatically start pulling LFS objects down, because you'll see the smudge step running. So that should work for the Relay folks if they're trying to use the old data.
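The recovery steps David describes (clone, check out the last release tag, then let Git LFS fetch the large files) are sketched below as a small Python helper around the `git` command line, to keep all examples in one language. Assumptions: the repository lives at `https://github.com/openforcefield/protein-ligand-benchmark`, the release tag is spelled `0.2.1` (our reading of "zero two one"; check `git tag` for the exact name), and Git LFS is installed. The helper only builds the command lists; nothing runs until `run` is called.

```python
import subprocess
from typing import List

# Assumed repository URL; verify it and the tag name before running.
REPO_URL = "https://github.com/openforcefield/protein-ligand-benchmark.git"


def old_release_commands(tag: str = "0.2.1", dest: str = "protein-ligand-benchmark") -> List[List[str]]:
    """Commands to fetch a pre-0.3 release together with its Git LFS files."""
    return [
        ["git", "lfs", "install"],  # registers the LFS smudge/clean filters
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", tag],
        ["git", "-C", dest, "lfs", "pull"],  # download LFS objects for this tag
    ]


def run(commands: List[List[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(old_release_commands())`. If LFS was already set up before cloning, the checkout alone may pull the files, as David notes; the explicit `git lfs pull` is then harmless.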
@56:24 - Forrest York We've moved away from using this. We went back to using the original Merck and JACS sets from the data.
@56:36 - David Dotson Okay. Yeah. So we recognized that it was difficult to use, and that's what we're trying to address here. And we also plan to be heavy users of it. So that was the reason for us pursuing this path. Does that answer your question?
@56:58 - Forrest York Yeah, I mean, I think... Actually, there is one question I had. Is there an easier way to know? There were some comments that kind of alluded to why things were removed, and we talked to John earlier today, which cleared up a lot of it. But is there a better way of knowing the rationale behind these changes?
@57:23 - John Chodera (he/him/his) Yeah, one of the things we're working on is a changelog that will describe the major changes and the main drivers behind them. We talked through some of those earlier, like the assay conditions and the crystallographic structure deficiencies, but we'll make a detailed changelog. Great. That's something that the Open Free Energy industry folks have been asking for as well.
@57:50 - David Dotson Yes, and we've had discussions on how we want to do this changelog, and how we want to do subsequent releases of this repo. We do want it to be a living dataset that is able to move forward. Obviously we do releases such that you can always go back in time and pin the specific version you are using, but we need to be able to evolve it as well. Along with that comes a changelog recording the decisions made: new targets added, new ligands added, maybe targets removed as well. So being able to do both, but with the rationale documented in the changelog: would that be sufficient for you?
@58:26 - Forrest York Yeah, that would help a lot.
@58:29 - David Dotson Okay. Yeah, that's our desired end state here. We're still working toward that, right? We're trying to get 0.3 out; that would be the first iteration of this, and from then on it would be a more incremental approach. Yeah, great. Okay, we're at time, folks, so I don't think we have time for status updates. We are one week out from our November 1 deadline, and I'm pushing hard to get there myself. Iván, Mike, are you free to meet today on the perses side for the usual call? Never mind. In that case, that's fine. We can also... oh, I see it now. Okay. Yup. We can move that later, if that's possible.
@59:21 - John Chodera (he/him/his) Yeah, I'm still working on writing the other unit tests.
@59:25 - David Dotson Okay, that's fine, thank you. Otherwise, Mike, how are the protocol settings?
@59:30 - Mike Henry (he/him) Yeah, it's ready for review now. I've gotten the last round of docstrings in, cleaned some stuff up, and got the docs to build.
@59:38 - David Dotson Thank you. Irfan, do we want to meet later this week on protein-ligand-benchmark, or do we want to kick it to next week?
@59:45 - Irfan Alibay Yeah, if I cancel it, I'll contact you.
@59:49 - David Dotson Okay. Thank you. Any other questions or comments, folks?
@59:56 - John Chodera (he/him/his) Thanks for watching.
@59:58 - David Dotson Yeah, this is fantastic.
@59:59 - John Chodera (he/him/his) You can share that. |
IA : | |
IP : Nonequilibrium Cycling Protocol ( |
|
MH : |
|