2022-12-06 Protein-ligand benchmarks meeting notes
Participants
@John Chodera
@Jeffrey Wagner
@David Dotson
@Mike Henry
@Irfan Alibay
@Richard Gowers
@David W.H. Swenson
Levi Naden
Benjamin Ries
@Iván Pulido
Goals
DD : current sprint status - deadline 12/13
architecture overview : https://drive.google.com/file/d/1Elw5vWYXuGKSuO-E3jNMxkYnSQaioVF8/view?usp=share_link
coordination board status : fah-alchemy : Phase 1 - MVP
updates on In Progress cards
seeking volunteers for unassigned cards in Available
DD : fah-alchemy 0.1.0 milestone
DS : deployment - CLI for service startups
MH : deployment - Docker image build and push
LN : security - scope enforcement by APIs
IP : Nonequilibrium Cycling Protocol (perses #1066) update
IA : protein-ligand-benchmark : blockers and priorities
DD : cards in play
Discussion topics
Item | Notes |
---|---|
IA : protein-ligand-benchmark : blockers and priorities | |
DD : current sprint status - deadline 12/13 | |
DD : fah-alchemy 0.1.0 milestone | |
DS : deployment - CLI for service startups | |
MH : deployment - Docker image build and push | |
LN : security - scope enforcement by APIs | |
IP : Nonequilibrium Cycling Protocol (perses #1066) update | |
DD : cards in play | |
Transcripts without edits | F@H interface meeting - December 06  @00:00 - Jenke Scheen (John Chodera) ...and feel easy. So the general feeling was just that they were very interested in the work. I think that's why I took so long. So yeah. Done. Actually, not done. I need to get corrections now.  @00:18 - Mike Henry (he/him) Awesome. SCREEN SHARING: David started screen sharing  @00:24 - David Dotson Hey folks. Okay. Thanks for joining. We'll go ahead and get started. So, Jeff, I don't know if you'd be willing to take some notes.  @00:48 - Jeffrey Wagner Yeah, can do.  @00:50 - David Dotson Thank you so much. Okay. All right. I've got the agenda on the left; I've sort of restructured how I'm doing things at the moment, very much around sprints. We are currently in a sprint with a deadline of 12/13. All the items in the agenda on the left should have representation in the sprint itself. If you're interested in how the sprint is defined, it's whatever is in this In Progress column on the MVP board. So we've got three sprint columns now. Available is what's available for the next sprint, or even this sprint if folks get whatever they're currently working on done and are ready to move on to something else. In Progress is what we're already committed to doing this sprint. And then In Review is anything that has basically hit the review stage, so if you're just waiting for review, that's the place to put it. I think we can probably set up some kind of automation to move things to In Review when you say you're ready. But anyway, I'm trying to orient our work around this, since we're now coordinating lots of people and trying to do so on a tight time scale. So the meeting notes and the agenda now reflect that as well. Any questions or comments before we jump into the items?  
@02:31 - richard Yeah, I've got a quick one. Can we do the protein-ligand-benchmark stuff first? We've got someone who needs to do that and then leave.  @02:40 - David Dotson That sounds good. So, Irfan, do you want to go ahead and tell us the current status on protein-ligand-benchmark?  @02:49 - Irfan Alibay So I think we're just waiting on 82. I think it needs a review. Yeah.  @02:59 - Iván Pulido (John Chodera) Yeah. So I checked that, but we still haven't removed the binary files.  @03:08 - Irfan Alibay So my understanding is that we're doing that just before merge, right? Because with some of the remaining changes still moving in, doing it now might be difficult. I can do it; it's up to folks. We can do it now, as an initial thing, or we do it before we... sorry, not before merge. Before we do the release, so we can just clean up history completely.  @03:31 - David Dotson Good point. I guess that's better.  @03:37 - Iván Pulido (John Chodera) I think it's better to remove them before, because otherwise we have this big diff in the repo that we don't really want and get no benefit from.  @03:49 - Irfan Alibay Does that happen even if we do the LFS merge? And I guess the BFG tool thing.  @04:00 - Iván Pulido (John Chodera) Oh, I don't know about that.  @04:03 - Irfan Alibay Like the one for deleting files that might have secrets in them.  @04:09 - Iván Pulido (John Chodera) Oh, like deleting them from history after the fact. Oh, okay. Yeah. That should also remove the diffs. Yeah. Okay.  @04:20 - John Chodera (he/him/his) Were you able to verify, Iván, that all of these can actually be set up and run? Or did you run into any issues with any of the systems?  @04:28 - Iván Pulido (John Chodera) So, yeah, I did test that. It's only that the edges have to be regenerated, because there are some ligands that are in one branch but not in the one that has the edges. But the structures were corrected. 
I think only one of the proteins had a troublesome structure, and that was already corrected. I reviewed the PR; I just didn't formally approve it, because I thought we were waiting for the binary files to be removed. But I'll do that now.  @05:00 - Irfan Alibay Great. We're removing the binary files separately, so we'll just clean main directly. That'd be great.  @05:07 - Iván Pulido (John Chodera) You can just keep moving. Yeah, that makes sense.  @05:09 - Irfan Alibay Awesome.  @05:10 - David Dotson Okay, cool. So it sounds like this one's clear for merge. Iván, can you go ahead and follow up on that? Actually, that would unlock us.  @05:20 - Iván Pulido (John Chodera) Yeah, definitely. Thank you, Irfan.  @05:25 - David Dotson Any other comments on protein-ligand-benchmark?  @05:28 - Irfan Alibay Yeah. So the next step here will be to regenerate these edges. As I mentioned, I assume that should be fine; I don't know if you're going to run through that again. Otherwise, as long as the edges are readable, we can merge that in. So I fixed the proteins; I mean, I fixed things like the residue names, those types of things, to make them more PDB compliant, and I think we might be good to go for a point release.  @06:08 - Iván Pulido (John Chodera) Once you generate the edges again, I can rerun them. It wouldn't take more than a few hours, I guess, depending on demand, but I'd prefer to double-check that.  @06:23 - Irfan Alibay Okay, I'll ping you as soon as they're up.  @06:31 - David Dotson Okay, cool. Is there anything you two need from the rest of the group to coordinate on these two things? Okay, thank you. Any other questions for Iván and Irfan? All right, thank you both. So we'll go ahead and jump into the overall board status, starting from the top here. 
Actually, first, Ben, did you have any comments or questions you needed addressed on PLB?  @07:11 - Benjamin Ries No, thanks, I'm fine. I'm on board. But I also have to go now, so thanks a lot.  @07:16 - David Dotson See you. Thanks so much. Take care. Okay, we'll skip the architecture review; we did that last week. Anybody can chime in if they want another review of that, but it's the overall architecture diagram, and it hasn't changed since last week. As far as board status, the way I'd like to do this: we already have folks in the agenda, and I want to make sure people are represented in the agenda for the things they're working on. So we'll walk through each of those, as we've done previously. What I want to do here is just ask yay or nay whether these things are ready to be moved into a different column, so keep that in mind. First up is Swenson on CLI for service startups.  @08:04 - David Swenson Is this currently in progress? Still in progress, yes.  @08:08 - David Dotson Thank you. Iván, is nonequilibrium cycling in progress, or is this waiting on review?  @08:14 - Iván Pulido (John Chodera) I have to do the nonequilibrium protocol result part that we discussed last week. I was at a conference last week, so I couldn't work on it.  @08:25 - David Dotson Okay, thank you. And we just talked about these two, so we've got the status there. Levi, are you on the call? Yes, he is. Yeah, I'm here. Is this in progress, or are we ready for review?  @08:42 - Levi Naden That is still in progress. I had some blockers that prevented me from working on it this last week, but I'll be able to get some time in on it this week.  @08:49 - David Dotson Okay, cool, thank you. And for the rest of my tickets here, I'll cover some of these; they're currently in progress as well. So, okay, thank you. We do have some items available. Are there any of these items folks would like to move to In Progress? 
So Mike, for example, we've got deployment: Docker Compose.  @09:13 - Mike Henry (he/him) Yeah, that can move to In Progress. Perfect. Thank you.  @09:19 - David Dotson And I know that your Docker image build is now in review, so I'll review that later today. Thank you so much for doing that.  @09:27 - Mike Henry (he/him) It's looking good.  @09:29 - David Dotson Yeah, it should be good to go. Perfect. In a working session with Iván and Mike last week, we discovered some improvements needed for settings and gufe. We don't have anybody assigned to this at the moment, but I think it is critical for usability. I'm building out a tutorial for building alchemical networks, and I think some of the improvements in this PR are going to be key. Is this something anyone's able to take on? Mike, I think you may be best positioned for this.  @10:06 - Mike Henry (he/him) Yeah.  @10:09 - David Dotson But it looks like there's some debate as to what the right approach might be.  @10:12 - John Chodera (he/him/his) Would it be OK if we had a quick chat between Iván, Mike, and me to sort through any remaining questions here?  @10:22 - David Swenson Can I be involved in that as well?  @10:24 - Mike Henry (he/him) Sure. If I remember correctly, we just need to make a few decisions on how best to implement it, although I think I actually have a good path forward. So maybe I'll take a stab at that and then try to arrange a meeting, maybe next week, with those who just identified themselves.  @10:52 - John Chodera (he/him/his) Wait, maybe we could just invite David to the perses dev meeting and spend the first few minutes on that.  @11:02 - Mike Henry (he/him) Yeah, that could work.  @11:03 - John Chodera (he/him/his) If we need to resolve anything next week, at that meeting, right? 
Next week's meeting.  @11:07 - Mike Henry (he/him) Yeah, perfect. And that way we just use that meeting, so I think that should be okay.  @11:14 - John Chodera (he/him/his) But you may be able to resolve this before then, and we can just talk about anything else that's still outstanding.  @11:19 - Mike Henry (he/him) That's perfect.  @11:21 - David Swenson Yeah, actually, I think this is something that will be really easy to resolve, but it'd be worth having, you know, 15 minutes with you guys so I can describe the plan.  @11:35 - John Chodera (he/him/his) So we could do that sooner, too. Yeah, yeah, that's fine.  @11:38 - David Dotson Yeah, you probably don't need me for that, then. This touches a few things: there's the technical side of serialization, and there's also the settings structure, I think. So it hits a few areas where I think, Swenson, you'll be really key for this.  @11:54 - David Swenson On the technical side, I think there are things that not everyone is aware of that we've already implemented, and it's just: please use those things, because they will make your life easier.  @12:02 - David Dotson Perfect.  @12:03 - John Chodera (he/him/his) Yes.  @12:04 - Mike Henry (he/him) Yes, and honestly, David, you've been telling me to do this this whole time, and it wasn't until I actually started on this that it all clicked. So I totally see it now, with the [unintelligible] and setting that up. Now it all makes sense, but honestly, it wasn't until I took a stab at it that it all fell into place. So thank you. I completely understand. That's what I plan on.  @12:25 - David Swenson OK, I completely understand how that can happen, too, because it is kind of a weird thing at first.  @12:31 - David Dotson So can I get an owner for this PR?  @12:34 - Mike Henry (he/him) You can put me down.  
@12:35 - David Dotson OK, thank you. Sweet, thank you so much. I'll leave that in the Available column, unless you want me to move it to In Progress.  @12:50 - Mike Henry (he/him) You can move it to In Progress.  @12:52 - David Dotson Thank you. OK, so that's the current state of the board. Thanks, everyone. As I said, the sprint deadline we're aiming for is 12/13, which is next Tuesday. We're trying to knock out all of these things in progress. I know that seems like a tall order, but just focus on the things you're assigned to and try to get them out as much as possible. That's where we want to maintain focus. Any questions on board status? Okay, thank you. So, specifically on fah-alchemy, I just wanted to talk about the 0.1.0 milestone. A lot of this stuff is already represented on the board, so we don't necessarily need to hit anything new, but there are a few themes here. We've got deployment up here, and we've already got owners working on those. Also compute: the synchronous compute service. I'm still working on this, and also building out the object store, which this will now be using; I'll talk a little bit about that later. I've got an example notebook for users, then the minimal necessary API endpoints for the fah-alchemy client, and then scope enforcement, which is Levi's. So all of these things are represented on our board. This is our first release. There is another milestone, 0.2, which I've started populating with things that will hit later, but I'm not anticipating anything additional going into 0.1. Any questions on the release milestone? Okay, thank you. Swenson, do you want to give us an update on the CLI for service startups PR?  @14:43 - David Swenson Yeah, so I haven't been working on this much, because we've been getting ready for the board meeting we just had and trying to finish up our annual report. 
But I believe the status now, after some improvements that you did, is that we just need to add a few [unintelligible] conveniences, so that the options you can provide to the commands are properly linked to environment variables, which we've done in a few other places but didn't do for these commands. But I think that's all there is to be done. Is that consistent with how you remember this, David?  @15:21 - David Dotson Yeah, I think it's really just wiring things up. Yeah, exactly: the environment variables and the options for the CLI. And then if you see any other gaps in the testing, go ahead and fill those if you can.  @15:36 - David Swenson Well, I'll probably get to that tomorrow.  @15:39 - David Dotson Perfect. Thank you. Otherwise it's looking really slick. So, just to show it off real quick... oh, sorry, I declared conda bankruptcy yesterday, so I don't have a conda environment to show folks. But basically, it's now possible to do something like... I might be able to show this; I don't know if I can demo this. Basically, doing something like fah-alchemy... what is it... api start, and then it will fully start up the Gunicorn-based Uvicorn workers for the user-facing API. And if you also want to start up the compute API, that just works. So it's really slick. David, you should be proud of this. I wish I could just show it, but I can try to show it at the end; I just have to slap the environment together real quick. Any other questions or comments for David? Well, thank you. Mike, do you want to tell us about the Docker image build and push? I know it's in review.  
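The remaining CLI work described above is linking each command's options to environment variables. A minimal sketch of that precedence rule (explicit flag, then environment variable, then built-in default), written with the standard-library `argparse` so it runs without extra dependencies. The program name, the option names, and the `FA_API_HOST`/`FA_API_PORT`/`FA_API_WORKERS` variables are hypothetical and are not the actual fah-alchemy CLI; if the real CLI is built on click, its `envvar=` option argument gives the same behavior directly.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Hypothetical option -> environment variable names (not the real fah-alchemy ones).
ENV_VARS = {
    "host": "FA_API_HOST",
    "port": "FA_API_PORT",
    "workers": "FA_API_WORKERS",
}


def build_parser(env: Mapping[str, str]) -> argparse.ArgumentParser:
    """Build a parser whose defaults are taken from `env` when the variables are set."""
    parser = argparse.ArgumentParser(prog="api-start")  # hypothetical command name
    parser.add_argument(
        "--host",
        default=env.get(ENV_VARS["host"], "127.0.0.1"),
        help=f"bind address (env: {ENV_VARS['host']})",
    )
    parser.add_argument(
        "--port",
        type=int,
        default=int(env.get(ENV_VARS["port"], "8000")),
        help=f"bind port (env: {ENV_VARS['port']})",
    )
    parser.add_argument(
        "--workers",
        type=int,
        default=int(env.get(ENV_VARS["workers"], "1")),
        help=f"number of worker processes (env: {ENV_VARS['workers']})",
    )
    return parser


def parse_settings(argv: Optional[Sequence[str]] = None,
                   env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Resolve settings: command-line flag > environment variable > built-in default."""
    parser = build_parser(os.environ if env is None else env)
    # argv=None makes argparse read sys.argv[1:], as a real entry point would.
    return parser.parse_args(None if argv is None else list(argv))
```

For example, `parse_settings(["--port", "9000"], env={"FA_API_HOST": "0.0.0.0"})` takes the host from the environment, the port from the flag, and the default of one worker. Resolving the environment when the parser is built keeps the precedence rule in one place.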
@16:57 - Mike Henry (he/him) Yeah, it's set up. I haven't done anything to try to optimize its size, mostly because I just wanted to make sure we get things working. One thing that's a little weird is that, with the way the testing configuration is currently set up, pytest spawns an instance of Neo4j. So running the tests as written inside a Docker container requires Docker-in-Docker, which is technically possible, and I kind of got it to work. But I think I would just need to rewrite the test suite so that it first checks whether there's a Neo4j database it can connect to, and if it can, uses that instead of always spinning up its own, because then in GitHub Actions CI I can set up Neo4j as a service. That creates its own little database that the Docker image can then talk to, which is how we're going to have it set up for the Docker Compose setup anyway. So that will make testing the Docker image easier. But I did test the Docker image locally, and things seem to work fine.  @18:06 - David Dotson Okay, sweet. As far as testing the Docker image goes, is it necessary to run the unit tests? In your view, is it necessary to run the test suite fully with the built image?  @18:17 - Mike Henry (he/him) You know, probably not. I wanted to make sure I didn't tell you the Docker image worked just because it had built, without even bothering to test anything at all. So it's possible that it's not necessarily required. If you play with the Docker image and you're satisfied with it, we might call it fine and not worry about retooling the testing infrastructure; or if you find some deficiencies, then we can work on it. 
And as I mentioned, it is exceptionally useful to be able to just run pytest and not worry about setting up your database, just like with Django, where you don't have to set up a Postgres database every time you want to test something real quick. So I didn't want to rewrite it in a way that limited the local workflow, but it's certainly more dev work, and we're near the end of the sprint, so that time is probably better spent elsewhere.  @19:11 - David Dotson Yeah, you've done more thinking about this than I have, so you're a better judge, but I'd say let's probably not prioritize running the unit tests, or the full test suite, within the image.  @19:29 - Mike Henry (he/him) Yeah, and that's fine. I poked around at it to see how it started up, to see if I could check first before it started one up, and then I was like, okay, I don't quite understand the pytest framework we're using to spin up a Neo4j instance well enough to do this.  @19:49 - David Dotson We're piggybacking off the py2neo infrastructure. They built this library called grolt, which I had to fork because it puts some hard... you encountered this yourself. It puts some hard constraints on cryptography.  @20:03 - Mike Henry (he/him) Yes, yes.  @20:04 - David Dotson I had to create a fork that removed those constraints, and now we're using that in our test environment, which you may have noticed. So we have a bunch of little ugly pins at the moment. Not pins, but we're using a bunch of git branches. We can proceed with that for now. But yeah, we're using grolt as our interface for starting up Neo4j instances in Docker, and that allows for local testing. Would what you're proposing, in terms of changing the test suite, mean that only CI can actually run those things?  
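Mike's proposal is for the test setup to look for a Neo4j database it can already reach (for example, a GitHub Actions service container) and only start its own instance when none is found, so local pytest runs keep working unchanged. A minimal sketch of just that decision, using a standard-library TCP probe. The `NEO4J_URL` environment variable and the `choose_neo4j_source` helper are hypothetical; 7687 is Neo4j's default Bolt port. The fallback that actually starts a throwaway instance (via grolt in the current test suite) is deliberately left out rather than guessing at its API.

```python
import os
import socket
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse

DEFAULT_BOLT_PORT = 7687  # Neo4j's default Bolt port


def neo4j_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    This only proves a listener exists; it does not authenticate against Neo4j.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_neo4j_source(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Decide whether tests reuse an existing Neo4j or start their own.

    Returns ("external", url) when NEO4J_URL (hypothetical variable) points at a
    reachable server; otherwise ("spawn", None), meaning the test setup should
    fall back to starting a local instance as it does today.
    """
    env = os.environ if env is None else env
    url = env.get("NEO4J_URL")
    if url:
        parsed = urlparse(url)
        host = parsed.hostname or "localhost"
        port = parsed.port or DEFAULT_BOLT_PORT
        if neo4j_reachable(host, port):
            return "external", url
    return "spawn", None
```

In a session-scoped pytest fixture, the `"spawn"` branch would keep the existing grolt-based behavior, so local runs need no extra setup, while CI would set `NEO4J_URL` to point at the service container.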
@20:34 - Mike Henry (he/him) No, no, no. What it would do is, in the part of the test setup where it creates a Neo4j database for you, it would first just check for an existing database, and otherwise just run the current local code. And then I can set it up to run as a separate database instance in GitHub CI, which will simulate more closely how it works with Docker Compose. It also removes the crazy arcane magic of getting Docker-in-Docker to work. I see. But how about that?  @21:07 - David Dotson Let's not expand the scope of this PR, because I think this is already great. I'll review it as is, and we can get it merged. If you want, create an additional PR just to try out that idea, because I really like what you're doing with this: spinning up Neo4j as a service with GitHub Actions so that it's already up, and you don't need to rely on grolt on the CI end so much. And if that enables better testing of the continuous deployment infrastructure from the Docker Compose perspective, then that'd be a win, I think. So does that sound reasonable?  @21:46 - Mike Henry (he/him) Make that a separate PR? Yeah, totally. And since the plan eventually is to only build these Docker images at release time, it's also something where, if we don't get to it right away, that's fine, since we're not building these before we release them anyway. We will have them manually tested at minimum.  @22:04 - David Dotson Okay. I would say, yeah, a separate PR for the test suite adjustments, but prioritize building out the Docker Compose deployment first if you can, because that's what we actually want first. We can always build more test apparatus around that afterward.  @22:22 - Mike Henry (he/him) Yeah. I'll have to look. I know there are a lot of groups out there that use Docker Compose. 
So I'll have to see how they test it in CI, to see if there's an easy way to do that, or if it's one of those things you just do before you launch or release.  @22:36 - David Dotson Awesome. Thank you for doing all of this. Fantastic work. Any other questions for Mike? Okay, cool. Thank you. Levi, do you want to give us an update on security? This is the scope enforcement by APIs issue.  @22:56 - Levi Naden Yeah, as I mentioned, I wasn't able to actually commit any time to working on that. I have been looking through the code base, and that's about the extent of what I've been able to do this week. There were a few other commitments that got in the way, which should now be out of the way, so I'll be committing a lot more time to that this week. So that's where that stands right now: I'm catching up on the code base.  @23:15 - David Dotson OK, yeah, that sounds fine. Feel free to reach out to me or others if you have any questions as you're reviewing the code base; I'm happy to answer anything. Are there any questions you want to ask here?  @23:28 - Levi Naden No, none right now.  @23:30 - David Dotson OK, well, thank you again for taking this on.  @23:33 - Levi Naden No problem.  @23:36 - David Dotson Any questions for Levi? OK, Iván, do you want to give us a more detailed update on nonequilibrium cycling? I know we hit it a little bit earlier.  @23:49 - Iván Pulido (John Chodera) Yep. So, last week we met with David and Mike, and we basically finished the protocol part, and we have a notebook with a way of creating the whole workflow, all the objects that are needed for the protocol, and running it. I think David's planning on using that, or something based on it, for tutorials. 
And now what we're missing is writing the nonequilibrium protocol result part, which is getting the free energy estimate from the results, and then it should be good to go.  @24:37 - John Chodera (he/him/his) Is there any way I can help with that part of it, or is it pretty straightforward?  @24:44 - Iván Pulido (John Chodera) I think it's going to be pretty straightforward, but I'll let you know. I would like you to maybe review the simulation unit part. I pointed that out in a previous meeting we had, just to double-check that we are not doing anything crazy with...  @25:00 - John Chodera (he/him/his) ...with the nonequilibrium cycling part. I'll do that.  @25:03 - Iván Pulido (John Chodera) Thank you. Yep. OK.  @25:09 - David Dotson Well, thank you, Iván. So you're working on the ProtocolResult object, which takes the list of ProtocolDAGResults and produces an aggregated delta G. I know we talked about, for a start, doing a statistically aggregated delta G: if you have a single delta G from each ProtocolDAGResult, to combine them you might take the mean, and then if you want to get an uncertainty, you might do...  @25:55 - John Chodera (he/him/his) I'm pretty sure we don't want to do that.  @25:57 - David Dotson Because it's just a single cycle? Each ProtocolDAGResult is enough nonequilibrium cycling to produce a delta G. So, Iván, is that just a single equilibrium/nonequilibrium run?  @26:16 - Iván Pulido (John Chodera) Yes.  @26:17 - David Dotson Okay.  @26:19 - John Chodera (he/him/his) So each cycle is going to produce one forward and one reverse work value, and if you estimate the delta G from that, it's going to be a complete and utter disaster. You will not get anything useful from that, unless you were running infinitely slowly. So let's not do that, because it's just a waste of time.  @26:36 - David Dotson Okay.  
@26:38 - John Chodera (he/him/his) So we need to aggregate all the forward work values with the reverse work values, and then we would bootstrap over the pairs associated with each cycling trajectory if we need uncertainties. But we would definitely want to aggregate those and feed them into BAR. There's no other practical approach.  @27:00 - Iván Pulido (John Chodera) Okay, yeah, that helps.  @27:02 - David Dotson Yeah, that can be done. So still good.  @27:06 - Iván Pulido (John Chodera) Yeah, so this would be done in get_estimate of the nonequilibrium cycling result, right? That's for David Dotson.  @27:17 - David Dotson Yeah, so what we would need to do... I'm just looking at the code base here right now. So it's in your result unit. Yeah, here it is. In the result unit, I see it's using a BAR estimate, and as John noted, this is going to be too noisy. This is probably just crap, but we are keeping the works, right? And you're storing them as paths. That should be fine. What I'm planning to do for the compute services is that any path included in a terminal node of your DAG, meaning a node that nothing depends on, will get translated into an object store reference, and the corresponding files will get shipped to the object store, so they'll be preserved. What we can then do...  @28:10 - John Chodera (he/him/his) Hold on, there are just two numbers we need, two floats, right? We don't need to be shipping them around as files. We just need the two fucking floats.  @28:18 - David Dotson Hold on, hold on, hold on.  @28:24 - Iván Pulido (John Chodera) Yeah, I was thinking about that. We might only... from the simulation unit, just return the forward work and the reverse work.  @28:32 - David Dotson Oh, I see. You don't need a trajectory of values.  @28:39 - John Chodera (he/him/his) Each of the works is a single number, right? 
Yes, for computing the overall free energy estimate, we just need the forward and the reverse work for that one. If this is one cycle, then we just need those two. It would be useful to be able to store files that contain the forward and reverse work trajectories for later optimization purposes, such as optimizing the length of the dynamics. That could be a potential future use case for retaining them, but for estimating the free energies, we just need the two floats.  @29:10 - David Dotson I see. So the changes I'd say to make, Iván, would be: you can probably remove these, because I don't think we're going to use them, right? And instead of doing a BAR estimate here, we'll just return the forward work and reverse work, and these should be single numbers.  @29:29 - Iván Pulido (John Chodera) Forward and reverse works, yeah, and I think that makes sense. Even this unit doesn't make sense anymore, so it will probably be in the other unit.  @29:40 - David Dotson Yep. That's fine too, but this unit is fine as just "here are the results being output," I guess. And you and I can have a working session this week if you'd like, and we can implement some of this stuff. With that, when we have multiple ProtocolDAGResults, we can take those individual forward and reverse works and combine them in here. So we'll be doing the BAR estimate in here.  @30:17 - Iván Pulido (John Chodera) Yeah.  @30:18 - David Dotson Okay. Yeah, that should work fine.  @30:19 - Iván Pulido (John Chodera) Makes sense. Yeah.  @30:22 - John Chodera (he/him/his) Feel free to pull me into any working sessions where it might be helpful to just go through the data flow.  @30:27 - Iván Pulido (John Chodera) Okay.  @30:31 - David Dotson Iván, would you like to schedule a working session? Yeah.  @30:34 - Iván Pulido (John Chodera) Yeah, I will.  @30:35 - David Dotson Yep. 
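John's point is that the unit of aggregation is the pair of work values each cycle produces, not a per-cycle delta G: the forward and reverse works from every ProtocolDAGResult are pooled, BAR is solved once on the pool, and uncertainty comes from bootstrapping over cycles, resampling each forward/reverse pair together. A minimal pure-Python sketch of that data flow, under stated assumptions: works are already in units of kT, each DAG result is represented as a plain list of `(forward, reverse)` tuples rather than the real gufe objects, and the BAR equation is solved by simple bisection. In practice pymbar's BAR estimator would be the natural choice; the helper names here are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

# One nonequilibrium cycle: (forward work A->B, reverse work B->A), both in kT.
WorkPair = Tuple[float, float]


def _fermi(x: float) -> float:
    """Numerically stable 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_delta_f(w_f: Sequence[float], w_r: Sequence[float], tol: float = 1e-10) -> float:
    """Solve Bennett's acceptance ratio equation for dF = F_B - F_A (in kT)."""
    n_f, n_r = len(w_f), len(w_r)
    if n_f == 0 or n_r == 0:
        raise ValueError("BAR needs at least one forward and one reverse work")
    m = math.log(n_f / n_r)

    def imbalance(df: float) -> float:
        # Monotonically increasing in df; its root is the BAR estimate.
        return (sum(_fermi(m + w - df) for w in w_f)
                - sum(_fermi(-m + w + df) for w in w_r))

    # Bracket chosen so that imbalance(lo) < 0 < imbalance(hi).
    lo = min(m + min(w_f), m - max(w_r)) - 50.0
    hi = max(m + max(w_f), m - min(w_r)) + 50.0
    for _ in range(200):  # bisection; the cap guards against floating-point stalls
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pool_pairs(dag_results: Sequence[Sequence[WorkPair]]) -> List[WorkPair]:
    """Flatten the per-cycle work pairs from every DAG result into one pool."""
    return [pair for result in dag_results for pair in result]


def estimate_with_bootstrap(pairs: Sequence[WorkPair], n_boot: int = 200,
                            seed: int = 0) -> Tuple[float, float]:
    """BAR on the full pool plus a bootstrap standard error over whole cycles."""
    estimate = bar_delta_f([f for f, _ in pairs], [r for _, r in pairs])
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample cycles with replacement, keeping each forward/reverse pair together.
        sample = [pairs[rng.randrange(len(pairs))] for _ in range(len(pairs))]
        replicates.append(bar_delta_f([f for f, _ in sample], [r for _, r in sample]))
    return estimate, statistics.stdev(replicates)
```

Because each simulation unit only contributes two floats, `estimate_with_bootstrap(pool_pairs(dag_results))` can simply be re-run whenever more DAG results arrive; each new cycle adds one pair to the pool and tightens the estimate.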
Okay, cool. Thank you. Great work on this so far; it's come a long way. So thank you. And thank you, John, for that. Sorry for that.  @30:44 - John Chodera (he/him/his) Yeah, we just want to avoid... I mean, it would be great to retain these; this is why I wanted to air this. It would be good to have the entire work trajectories available; I think that's going to be very useful in the future. I think the key is that we want to avoid estimating free energies from single pairs of work values, and instead aggregate as many work value pairs as are available whenever we compute the free energy estimate.  @31:08 - David Dotson I see.  @31:11 - John Chodera (he/him/his) And so we can compute the free energy estimate from whatever pool of work values is available at the time, but the more, the better.  @31:20 - David Dotson Cool, yeah, so it really does make sense for all of that aggregation and estimation to happen here, and the unit of aggregation is the works, definitely not the delta Gs from these; we don't want those at all. That actually brings up a good question, then: individual ProtocolDAGResults would be kind of useless for getting delta Gs. I guess they'll just be noisy. That's fine. And as you produce more, you'll get less noisy results.  @32:01 - richard Okay, is there a reason we can't put more than one cycle into a DAG?  @32:06 - John Chodera (he/him/his) We can. It's just that if you're looking to minimize wall-clock time, you want to exploit parallelization as much as possible. That's number one. Number two is that if you're not using restraints, then in the first cycle the ligand may eject from the binding site, and then you'll just be cycling around in solvent. We've tried to accommodate that by using some sort of restraints to prevent that from happening. 
But it is possible that the more cycles you run, the further out of equilibrium you will drive things, because there may not be enough time to return to equilibrium in the equilibrium phases. Certainly, from a wall clock perspective, it's better. And so one of the other things you might think about, Richard, is just the normal nonequilibrium approach, where we run some equilibrium DAG units followed by single nonequilibrium switches, so that we never have to worry about the cycling part of that. That could be a more complicated variant that we eventually fit into this in the future as a different protocol. Yeah, maybe I was thinking of traditional nonequilibrium, which doesn't quite fit this one.
@33:17 - David Dotson One thing that could be done, but this is outside of scope for this sprint, so you can rest easy: you can build a DAG where you run an equilibrium component, and then you run n non-equilibrium forward runs. And then, I guess, for each one you would run an equilibrium segment?
@33:43 - John Chodera (he/him/his) Seems a bit silly, but I guess you could. Yeah, you can have a bunch of things running equilibrium segments and a bunch of things running non-equilibrium segments of the same length, and they would all be about the same expense. And so you would just wire it such that you're always shooting a non-equilibrium segment from the end of an equilibrium segment, but never connecting an equilibrium segment to the end of a non-equilibrium segment. So you would run two equilibrium segments, and you'd spawn off one or more non-equilibrium pairs while you keep running the equilibrium segments.
@34:18 - David Dotson I see, in parallel. I see, so you would run both forward and reverse at the same time. You could basically do equilibrium on chemical system A, equilibrium on chemical system B, and then spawn a bunch of non-equilibrium switches.
@34:33 - John Chodera (he/him/his) As many as you want, yeah. And then you continue to run these with reverse switches, but you'd be able to run them at the same time. And you would never have this problem of cycling and driving yourself further out of equilibrium.
@34:44 - David Dotson Sweet, yeah, and from that you would get a whole bunch of these work values. If you did enough of them, you could get a reasonable delta G from even a single ProtocolDAGResult.
@35:02 - John Chodera (he/him/his) Possibly. There are a bunch of ways to structure this. You could run many equilibrium segments at the same time as well, and each one could spawn a pair of non-equilibrium switches so that you slightly decorrelate, or you can continue to run equilibrium segments as you're going along and decorrelate that way. That's how the Orion folks do it: they pre-generate those equilibrium snapshots and then run a bunch of non-equilibrium switches in parallel. But there are a few dimensions that we can easily enumerate for a protocol based on this: how many equilibrium segments at the same time, how many non-equilibrium switches at the same time, and how many iterations of this you run. We could play with that, but it's much more complicated than just the cycling protocol, so it wasn't the first thing that we picked.
@35:44 - David Dotson Of course. What we've chosen to do here is just get the simple case running, and then a future PR on this can add some dimensions for how you want to structure this. Do you feel like you have...
@36:01 - Iván Pulido (John Chodera) What I need from this discussion? Yeah, and I already scheduled the meeting and invited John, so hopefully we can discuss this in detail in that meeting.
@36:13 - John Chodera (he/him/his) There's one thing that Irfan points out, too, which is: do we only do one cycle per unit, or per protocol unit, here?
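The alternative structure John describes (equilibrium segments on A and B, each spawning non-equilibrium switches, and never re-equilibrating from the end of a switch) can be written as a plain dependency graph. This is a minimal sketch with hypothetical unit names, not the perses ProtocolDAG API; Python's standard `graphlib` is used only to show which units could run concurrently.

```python
from graphlib import TopologicalSorter


def build_parallel_neq_dag(n_switches: int) -> dict[str, set[str]]:
    """Map each hypothetical unit name to the set of units it depends on."""
    dag = {"equil_A": set(), "equil_B": set()}
    for i in range(n_switches):
        # Forward switches start from equilibrated A, reverse switches from B.
        dag[f"neq_forward_{i}"] = {"equil_A"}
        dag[f"neq_reverse_{i}"] = {"equil_B"}
    return dag


def never_reequilibrates_after_switch(dag) -> bool:
    """True if no equilibrium unit starts from the end of a non-equilibrium unit."""
    return all(
        not any(dep.startswith("neq_") for dep in deps)
        for name, deps in dag.items()
        if name.startswith("equil_")
    )


def execution_waves(dag):
    """Group units into waves whose members could all run at the same time."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = set(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

The dimensions John lists (number of simultaneous equilibrium segments, number of switches per segment, number of iterations) would become parameters of a builder like `build_parallel_neq_dag`; this sketch only exposes the switch count.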
@36:24 - Iván Pulido (John Chodera) Yes. So it does equilibrium, forward, equilibrium, reverse in the simulation unit, all in one unit.
@36:34 - John Chodera (he/him/his) One thing that Irfan points out is that it might be useful to have the ability to specify multiple cycles, but we can always add that later. Yeah, sure.
@36:44 - Irfan Alibay Like, otherwise, it's more...
@36:46 - John Chodera (he/him/his) Sorry.
@36:46 - Irfan Alibay Go ahead. So I was going to say, my comment is more that I think OpenFE does it slightly differently, which might explain why Richard had that question. We parallelize within the DAG, as in each of our units would eventually end up on a different worker, if that makes sense, rather than one DAG, one worker. It's fine; we can deal with discrepancies between the two later.
@37:12 - John Chodera (he/him/his) But I think the idea here is that each GPU receives a single protocol DAG, or work unit, here as well, right?
@37:23 - Irfan Alibay Yeah, I need to double check this. I think our protocol might be slightly off on this; that's what's making me comment. We might need to realign.
@37:42 - John Chodera (he/him/his) What's the appropriate venue to do that?
@37:46 - Irfan Alibay I'll have a read through that PR and make sure that it makes sense. If there is actually a discrepancy, we can discuss it and tell you whatever it is.
@37:55 - John Chodera (he/him/his) It doesn't really matter how long it runs.
@37:58 - David Swenson It does matter. I think we can discuss it really quickly. My understanding is that this was designed such that you would send out a DAG to a worker, right? The entire protocol DAG gets sent to the worker.
@38:11 - David Dotson Correct, yes. So the contract we make with protocol authors is that a single DAG gets executed.
All the units in a DAG get executed with the same shared space, as in they can write to context.shared and they can all see that, right? And so for fah-alchemy, what that means is that a single protocol DAG gets executed within a single compute service. So even though there may be many services pulling tasks from, say, the state store here, they pull entire protocol DAGs and they execute each DAG in full. A DAG is never executed across many services; it's always executed by a single service. There can be parallelism within the service: that service can have a process pool, for example, and run a bunch of units in parallel at any given moment, according to the DAG. But a DAG in full is computed by a single service. Does that answer your question?
@39:22 - David Swenson Yeah, and that is the difference with how we're trying to implement it, because we're looking at parallelization at the unit level.
@39:30 - David Dotson You're also targeting HPC, where you might be able to take advantage of a shared file system across HPC nodes.
@39:38 - David Swenson Exactly, we're under the assumption that there is some shared file space that we can use.
@39:44 - David Dotson Yeah, in that case, in the abstraction sense, your compute service is the entire cluster.
@39:49 - David Swenson That's a fair point, yeah.
@39:51 - David Dotson Fair point, thank you. Does that make sense? John, did that answer your question?
@39:57 - John Chodera (he/him/his) I think so.
@40:01 - David Dotson Yeah, so there's just a bit of an architectural difference between OpenFE's exorcist project, which is focused on HPC execution of the same data model, and the fah-alchemy approach, which is more cloud oriented, where we don't have a single shared file system for all these different things. Any additional questions for Iván? Okay, we already covered protein-ligand-benchmark, so we'll go to me then.
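The contract David describes (one compute service pulls a whole ProtocolDAG, runs every unit of it, lets all units see one shared space, and may run independent units concurrently) can be sketched as below. This is a minimal illustration, not the fah-alchemy or gufe executor: a temporary directory stands in for `context.shared`, a thread pool stands in for the process pool mentioned, and `execute_dag_in_one_service` and the unit callables are hypothetical.

```python
import tempfile
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from pathlib import Path


def execute_dag_in_one_service(dag, units, max_workers=4):
    """Run every unit of one DAG inside this single service, honoring dependencies.

    dag:   unit name -> set of upstream unit names (hypothetical format)
    units: unit name -> callable(shared_dir, upstream_results) -> result
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)  # one scratch space visible to every unit of this DAG
        sorter = TopologicalSorter(dag)
        sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            running = {}
            while sorter.is_active():
                # Submit every unit whose dependencies have all finished.
                for name in sorter.get_ready():
                    upstream = {dep: results[dep] for dep in dag.get(name, ())}
                    running[pool.submit(units[name], shared, upstream)] = name
                done, _ = wait(running, return_when=FIRST_COMPLETED)
                for future in done:
                    name = running.pop(future)
                    results[name] = future.result()  # re-raises a unit's exception
                    sorter.done(name)
    return results
```

In OpenFE's HPC-oriented design, units of one DAG may land on different workers that share a file system; in the terms of this sketch, the whole cluster would play the role of the single service.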
For this part, I can answer questions on the things I'm doing. My focus at the moment is the object store. We have integration tests in place that use moto, which is a mock library for boto, the AWS Python API, or rather the Python library interface for AWS. I got this working just yesterday, so we now have a test suite that can execute pushes and pulls: pushing ProtocolDAGResults to S3 and pulling them from S3. This is important because I wrestled with this for a while last week: whether to store ProtocolDAGResults in the state store, that is, Neo4j, or in the object store. There are some complicated issues with trying to put them in Neo4j. For one thing, storing all these DAGs in there is going to proliferate a ton of nodes in our database that aren't really important from a database perspective. They can also get kind of fat because they're holding onto these results, so they just balloon the database's own storage footprint. It just didn't make a lot of sense. So the architecture we're taking is that compute services ship their ProtocolDAGResults to the compute API, which then ships them to the object store, and then a reference to that, what's called an ObjectStoreRef, gets created in the state store. That is basically a pointer to the ProtocolDAGResult that lives in the object store. Then, when a user on the client side over here asks for results, they hit the same API as usual; they don't have to think about where these things live. Once that API hits the state store, it'll know what it needs to grab; it'll pull the result from S3 and ship it back. And the serialization path for this is pretty fast, because we don't have to deal with converting into Neo4j land.
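The push/pull path described here (the compute API uploads the serialized result to the object store and records only a small reference in the state store; the user-facing API follows that reference on the way back) can be sketched with in-memory stand-ins. A plain dict plays the Neo4j state store and `InMemoryObjectStore` plays the S3 bucket; in a real deployment the `put`/`get` calls would be boto3 S3 calls such as `put_object`/`get_object`, which the test suite mocks with moto. The class and function names and the content-addressed key layout are assumptions for illustration, not fah-alchemy's actual code.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStoreRef:
    """Hypothetical pointer kept in the state store instead of the full result."""
    location: str


class InMemoryObjectStore:
    """Stand-in for an S3 bucket: keys map to raw bytes."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def push_result(result: dict, task_key: str, store, state_store: dict) -> ObjectStoreRef:
    """Compute-API side: serialize to JSON bytes, upload, record only a reference."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()  # assumed key scheme
    ref = ObjectStoreRef(location=f"protocoldagresults/{digest}.json")
    store.put(ref.location, payload)
    state_store[task_key] = ref  # the state store never holds the result itself
    return ref


def pull_result(task_key: str, store, state_store: dict) -> dict:
    """User-API side: follow the reference from the state store and fetch the bytes."""
    ref = state_store[task_key]
    return json.loads(store.get(ref.location).decode("utf-8"))
```

Keeping only `ObjectStoreRef` objects in the state store is what stops the graph database from growing with every result, while the client API stays the same whether or not the bytes live elsewhere.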
We just completely sidestep that: ProtocolDAGResults are serialized to JSON and shoved straight into S3 as byte data, and the reverse operation is also pretty fast. So I got that working. I'm now working to wire that machinery up into these two APIs, so that it pushes here and pulls here, with corresponding changes in the clients to make sure those pushes and pulls can happen. That's key for getting the synchronous compute service into a feature-complete state. The next step beyond that would be the path translation stuff that I talked about earlier: if a ProtocolDAGResult's terminal units, that is, units that don't have any children or dependents, have any paths enumerated in their outputs, then those will get translated into ObjectStoreRefs in the same way a ProtocolDAGResult is, and those files will make their way to S3 as well. That way, when the client pulls these things, they have a way to pull in any relevant files they need. I know that's needed for the replica exchange protocol in OpenFE, Richard, so this should accommodate that, and it uses the same machinery. I think we've got a fairly minimal set of machinery needed to do all that gymnastics. I know I just threw that at everyone; does anyone have any questions on the object store? Okay. I'm also working on an example notebook, which is our documentation for the user client. Jenke, I've pushed an update to this as of yesterday as well, so this is what it currently looks like. I'll make this bigger. I'm basically just walking you through the gufe objects; this is basically a gufe object tutorial. So: building a sample network using openfe-benchmarks, using tyk2 in particular and the connections defined in there, building the chemical systems, and defining the individual components.
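The planned path translation (find the terminal units of a DAG, that is, units nothing else depends on, upload any files their outputs point to, and replace each path with a reference) can be sketched as follows. A plain dict stands in for the object store, and the `{"objectstoreref": key}` marker, the key layout, and the function names are all hypothetical; the actual fah-alchemy design may differ.

```python
from pathlib import Path


def terminal_units(dag):
    """Units that no other unit lists as a dependency."""
    upstream = set().union(*dag.values())
    return set(dag) - upstream


def translate_paths(value, store, prefix):
    """Upload files referenced by Path objects and swap each Path for a reference."""
    if isinstance(value, Path):
        key = f"{prefix}/{value.name}"
        store[key] = value.read_bytes()
        return {"objectstoreref": key}  # hypothetical reference marker
    if isinstance(value, dict):
        return {k: translate_paths(v, store, prefix) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [translate_paths(v, store, prefix) for v in value]
    return value  # plain numbers, strings, etc. pass through unchanged


def translate_terminal_outputs(dag, outputs, store, prefix):
    """Translate paths only in terminal-unit outputs; other outputs are left alone."""
    terminal = terminal_units(dag)
    return {
        name: translate_paths(out, store, f"{prefix}/{name}") if name in terminal else out
        for name, out in outputs.items()
    }
```

Restricting translation to terminal units keeps intermediate scratch files out of the object store, while files a client actually needs (for example, replica exchange output) become retrievable through the same reference machinery as the result itself.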
These would be the ligands' small molecule components, and the protein and solvent components. Then building up the individual chemical systems for the complex cases and for the solvated cases, and then defining the transformations between them, using the network edges to build those up. I wanted to get into settings here, and this is where I'm dependent on the work you're doing over in gufe to make the settings a little more user-friendly. This is the same issue we hit when we worked with Iván on Tuesday: just getting those defaults. I'd like to make defining these settings as user-friendly as possible, so that a user like Jenke who's building up an AlchemicalNetwork doesn't have to define 100 lines of stuff; it ends up being pretty short if they're using all the defaults. But this is the current state. Any questions for me?
@46:56 - John Chodera (he/him/his) Is there a way I can jump in on some of the reviews of the settings stuff too? Because, um... sure, it would just be useful to make sure that we can specify everything that we need to through this in an appropriate way.
@47:14 - Mike Henry (he/him) Yeah, right now we're still iterating on all of those pieces. We had the first pass done, which has a lot of force field details and some bare-minimum thermodynamics stuff. But, for example, we don't have the time step yet; obviously you need to specify your time step. Our current plan is that the protocol Richard's writing and the protocol Iván's writing will both stuff all the settings they need into a protocol settings class, and then we will massage that into wherever it fits in the hierarchy. So I think that would be the perfect time to keep you in the loop, because there are some things that perses needs but general protocols don't, and it's hard to figure that out without the rubber meeting the road.
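The notebook's flow (components, then complex and solvated chemical systems per ligand, then one transformation per leg for each network edge) is mostly bookkeeping, sketched below in plain Python. `System`, `build_systems`, `build_network`, and the string identifiers are hypothetical placeholders, not gufe's ChemicalSystem, Transformation, or AlchemicalNetwork API; the sketch only shows how the edges drive the construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """Hypothetical stand-in for a chemical system: a name plus (role, id) pairs."""
    name: str
    components: tuple


def build_systems(ligand, protein, solvent):
    """Return the (complex, solvated) systems for one ligand."""
    complex_system = System(
        f"{ligand}_complex",
        (("ligand", ligand), ("protein", protein), ("solvent", solvent)),
    )
    solvated_system = System(
        f"{ligand}_solvent",
        (("ligand", ligand), ("solvent", solvent)),
    )
    return complex_system, solvated_system


def build_network(ligands, edges, protein, solvent):
    """Build systems once per ligand, then one transformation per leg per edge."""
    systems = {lig: build_systems(lig, protein, solvent) for lig in ligands}
    transformations = []
    for lig_a, lig_b in edges:
        for leg in (0, 1):  # 0: complex leg, 1: solvent leg
            transformations.append((systems[lig_a][leg], systems[lig_b][leg]))
    return systems, transformations
```

Building each ligand's systems once and reusing them across edges mirrors how a ligand that appears in several connections should map to the same chemical system each time.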
@48:00 - John Chodera (he/him/his) And just to double check for sanity: you had force field file equals a single file.
@48:06 - Mike Henry (he/him) That's actually a list, right? Yeah, yeah, yeah. That was one of the first bits that we got hit with, and right now you can just put anything in anyway.
@48:15 - John Chodera (he/him/his) Yeah, yeah. Okay, got it.
@48:18 - Mike Henry (he/him) Because it's confusing: we'll have to figure out how the API specifies that your protein uses one force field and you're using a different force field for the small molecule. That seems like something the protocol is concerned with, so we might have to change where that entry point is for specifying force fields.
@48:36 - John Chodera (he/him/his) Yeah, it depends on whether it's an Amber force field, a GROMACS force field, an OpenMM force field, or Open Force Field force fields.
@48:45 - Mike Henry (he/him) Yeah, exactly. That's where playing around with this will help figure out what that API looks like: which things can be really general and which are specific to the protocols.
@48:56 - John Chodera (he/him/his) But the force field file is separate from, and outside of, the force field settings.
@49:00 - Mike Henry (he/him) Yeah. Currently, that's very much the case, whereas everything else is bundled into a settings object. Yes, yes. The original reason for that was that I set up a convenience method where, if you gave it an OpenFF force field file, it would know how to parse that file and populate everything. But in order to do that, there's a bootstrapping problem: if the force field file is buried inside a sub-object, I can't populate fields without inspecting it. So I just pulled it out. But that revealed some issues, so I'm still working on that design.
@49:36 - John Chodera (he/him/his) Got it. Cool.
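John's sanity check (is the force field "file" a single file or a list?) and Mike's note that "right now you can just put anything in" suggest a small validation step. The sketch below is a hypothetical `ForcefieldSettings` dataclass, not gufe's actual settings class (which, per Mike, is still being iterated on): it always stores a list, accepts a single file name for convenience, and rejects non-string entries. The file names in the usage are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ForcefieldSettings:
    """Hypothetical settings that always hold a list of force field files."""
    forcefield_files: list = field(default_factory=list)

    def __post_init__(self):
        # Accept a single file name for convenience, but always store a list,
        # so that e.g. a protein force field and a small-molecule force field
        # can be given together.
        if isinstance(self.forcefield_files, str):
            self.forcefield_files = [self.forcefield_files]
        else:
            self.forcefield_files = list(self.forcefield_files)
        # Reject entries that are not file names instead of accepting anything.
        for entry in self.forcefield_files:
            if not isinstance(entry, str):
                raise TypeError(
                    f"force field entries must be str, got {type(entry).__name__}"
                )


single = ForcefieldSettings("protein_ff.xml")
combined = ForcefieldSettings(["protein_ff.xml", "ligand_ff.offxml"])
```

Normalizing a bare string matters because iterating over a string would otherwise silently yield single characters; which file applies to which component is left to the protocol, as Mike suggests.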
@49:41 - David Swenson Just looking at what you've got here, David: my understanding was that the settings are not just a generic, abstract settings class; each protocol is supposed to have its own specific settings. That's where the settings should come from.
@49:57 - David Dotson It's right here. Can you see it?
@49:58 - David Swenson I'm going to zoom in. But maybe I'm not reading this correctly.
@50:00 - David Dotson You're good.
@50:02 - David Swenson I'm seeing...
@50:04 - John Chodera (he/him/his) So it's the base-class settings object that bundles them. Okay. Or is it the other way around?
@50:11 - David Dotson So here's... I'll walk you through it, and Mike, correct me if I get any of this wrong. We've got the gufe settings; these are the common settings we're operating from. The Settings object is the high-level container; it's the top level. Then we've got some additional individual thermo settings and force field settings that we're pulling in. The perses protocol for non-equilibrium cycling has its own set of settings, its protocol-specific settings, which are things that are only relevant to that protocol. So when we build up our full settings container, we're passing in some top-level arguments here, and we're also passing in the force field settings and the thermo settings; in the thermo settings, we're specifying that we want this temperature in particular. Then we're passing in our protocol-specific settings here, and we're taking the defaults by just doing this.
@51:10 - David Swenson So does that answer your question, David? My understanding was that on a protocol, there was a default settings class method.
@51:20 - David Dotson That's a good point. So what you're talking about is this thing. I don't know if we have this wired up yet. I don't think so. It could be. Oh, we do.
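The hierarchy walked through here (a top-level settings container bundling force field, thermodynamic, and protocol-specific settings, plus a default-settings class method on the protocol) can be sketched with plain dataclasses. All class and field names and the default values are hypothetical placeholders; gufe's real settings are built differently and were still in flux. The sketch shows two ways a user might override a default (mutating in place, or copying with changes) and a JSON round trip, since settings need to survive serialization as well as live in memory.

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class ThermoSettings:
    temperature_kelvin: float = 300.0  # placeholder default
    pressure_bar: float = 1.0  # placeholder default


@dataclass
class NonEqCyclingSettings:
    """Hypothetical protocol-specific settings (values are placeholders)."""
    timestep_fs: float = 4.0
    eq_steps: int = 1000
    neq_steps: int = 1000


@dataclass
class Settings:
    """Hypothetical top-level container bundling common and protocol settings."""
    forcefield_files: list = field(default_factory=list)
    thermo_settings: ThermoSettings = field(default_factory=ThermoSettings)
    protocol_settings: NonEqCyclingSettings = field(default_factory=NonEqCyclingSettings)

    def to_json(self) -> str:
        return json.dumps(dataclasses.asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Settings":
        data = json.loads(text)
        return cls(
            forcefield_files=data["forcefield_files"],
            thermo_settings=ThermoSettings(**data["thermo_settings"]),
            protocol_settings=NonEqCyclingSettings(**data["protocol_settings"]),
        )


class NonEqCyclingProtocol:
    """Hypothetical protocol exposing a default_settings class method."""

    @classmethod
    def default_settings(cls) -> Settings:
        return Settings(forcefield_files=["protein_ff.xml", "ligand_ff.offxml"])


# Option 1: mutate the defaults in place.
s1 = NonEqCyclingProtocol.default_settings()
s1.thermo_settings.temperature_kelvin = 298.15

# Option 2: copy with changes, leaving the original defaults object untouched.
base = NonEqCyclingProtocol.default_settings()
s2 = dataclasses.replace(
    base,
    thermo_settings=dataclasses.replace(base.thermo_settings, temperature_kelvin=298.15),
)
```

A round-trip check such as `Settings.from_json(s.to_json()) == s` is a cheap guard against the in-memory and on-disk representations drifting apart whenever a settings field is added or renamed.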
@51:44 - David Swenson Yeah, I think so. I think the default...
@51:46 - Iván Pulido (John Chodera) I have to double check that, because I don't know if they are the same settings.
@51:55 - David Dotson But yeah, I could basically do this instead, right? Yeah. And actually, is that... yeah.
@52:05 - David Swenson I think it's that.
@52:11 - David Dotson I would check that this works correctly if you just want the defaults, because if you then wanted to override them, you'd need to take a slightly different path here. Or you could do something like... I'll show you a little bit: take the defaults, modify them, and then create a new instance by passing them in.
@52:32 - David Swenson So, yeah, the approach we've been using has actually been to have them be mutable. Oh, shoot.
@52:38 - David Dotson Okay. So then you could just directly mutate the object at this point in the game, right?
@52:44 - David Swenson Yep. That's what we've been doing in our tests, at least.
@52:46 - David Dotson Sweet. Okay, cool. Yeah. And I think resolving all these little pieces in a way that we're happy with would be the goal of that call you're going to do. I'm also happy to join that call. Mike, I think you said it would be the perses dev call next week.
@53:03 - Mike Henry (he/him) Yeah, I accidentally uninvited you, but I will invite you to it.
@53:08 - David Dotson Okay, that works.
@53:10 - Iván Pulido (John Chodera) Oh, wait, because I set up another one; I thought we ended up saying it wasn't in that meeting, but in another one.
@53:19 - David Dotson I think this is for this.
@53:21 - Mike Henry (he/him) The 13th, right? The one before this meeting. I think it's the perses dev call that we talked about.
@53:29 - David Dotson Is it possible that we can schedule something earlier than that, as in earlier than the 13th?
@53:34 - Iván Pulido (John Chodera) Right.
I just scheduled one for tomorrow.
@53:37 - David Dotson I invited you, David Dotson, David Swenson, John, and Mike.
@53:42 - Mike Henry (he/him) So you scheduled that, unfortunately, right on top of our OMSF staff meeting. Okay, feel free to suggest that. Yeah, let's... yeah, we can find a time for us to meet then. Sure, that's fine.
@53:54 - David Dotson Okay. Just for clarification, we have two separate things: there's the non-equilibrium cycling protocol, and then we also have settings, but I know they're kind of commingled. So maybe it's best to just do these things in one go. Is that even possible, Mike?
@54:09 - Mike Henry (he/him) Honestly, it's probably best to do it in one go, at the same time, because it's really easy to get out of sync with whatever changes you make and then break serialization. I've definitely gone down the path of trying to fix a problem and then realizing, oh, now I need to test serialization, and I broke that. So I think doing them in lockstep makes sense, because they are so tightly coupled: they're both representations of your settings, essentially, one in memory and one on disk.
@54:38 - David Dotson Okay, cool. So can you and Iván work together to find a time on the calendar that works for me? I think John also wanted to join, and then Swenson, where we can resolve this and also hit some remaining items on the non-equilibrium cycling protocol. It would just be a working session for these two things: settings and the protocol.
@54:58 - Mike Henry (he/him) Yep, I will propose a time for that.
@55:00 - David Dotson Yeah. Okay, thank you. And that would come earlier than the next Tuesday call, because I think next Tuesday is what's on the calendar right now. Yeah. Perfect. Thank you, appreciate it. Cool. Any additional questions for me on the tutorial side?
@55:21 - Jenke Scheen (John Chodera) I just wanted to say thank you, David, for pushing this.
I'll have a look at the notebook tomorrow to get my first bearings.
@55:31 - David Dotson Okay, cool. Yeah, I think it should at least get you started, because then you can start building out your own notebook apparatus for trying these things out, even if we're not at the point where you can easily slot in settings for the protocols. So I think you can at least make some headway.
@55:48 - Jenke Scheen (John Chodera) Yeah.
@55:49 - David Dotson Perfect. Okay. I've got other cards in play as well, but those are the two primary ones that I'm focusing on at the moment. So. Two, two three. |