2022-11-29 Protein-ligand benchmarks meeting notes
Participants
@David Dotson
@Jeffrey Wagner
@Diego Nolasco (Deactivated)
@Mike Henry
@David W.H. Swenson
@Irfan Alibay
@Iván Pulido
Jenke Scheen
@John Chodera
@Richard Gowers
Goals
DD : architecture overview : PL Benchmarks on FAH - Architecture v5.drawio
coordination board status : fah-alchemy : Phase 1 - MVP
past 11/29 deadline - seeking extension
would like OpenFE REPEX protocol representation; is there an existing issue or PR for bringing it up to date with gufe Settings?
updates on In Progress cards
seeking volunteers for unassigned cards in Available
DD : fah-alchemy 0.1.0 milestone
IP : Nonequilibrium Cycling Protocol (perses #1066) update
IA : protein-ligand-benchmark : blockers and priorities
Adding missing thrombin entries and then good to go
Waiting on #81
Waiting on one of the networks to return (PR is up)
Discussion topics
Item | Notes
---|---
DD : coordination board status : fah-alchemy : Phase 1 - MVP |
IP : Nonequilibrium Cycling Protocol (perses #1066) |
IA : protein-ligand-benchmark |
Deadline extension | JW + JC + RG – New deadline will be Dec 13. Current In Progress:
Transcripts with no editing | F@H interface meeting - November 29
@00:00 - Irfan Alibay view Hello.
@00:10 - Diego Nolasco (Open Force Field Initiative) Hey, from.
@00:14 - Iván Pulido (John Chodera) Hello. Thank you. [unintelligible] ...you're happy with Brazil's performance in the World Cup?
@00:41 - Diego Nolasco (Open Force Field Initiative) And you know everything is okay right we are winning. Yeah, but they might they might have seen church right. Yeah, but you know what, uh, presidents don't like me more. Okay. And not that I am cheering against him. Not that's not a case but. We are happy that other people are able to carry the team without name or. SCREEN SHARING: David started screen sharing - WATCH
@01:10 - Iván Pulido (John Chodera) Right. But we are.
@01:24 - Diego Nolasco (Open Force Field Initiative) We are most happy that Argentina lost than because of Brazil is winning.
@01:34 - Iván Pulido (John Chodera) Yeah, the big rivalry there.
@01:42 - David Dotson Yeah, thanks everyone. For those of you in the US, I hope you had a good holiday. I did; it was nice to finally unplug for a bit. We're back. So before we jump into the agenda, are there any items folks want to add or modify? Okay, if not, we'll start with the board status. Actually, we'll start with just a diagrammatic view and then we'll turn to the board. I updated this diagram to match the current state of where we are architecturally. Colors indicate, for each of these components, basically how far along we are in maturity. It's a bit of a subjective measure, but our state store is pretty mature at this point. We have that data model worked out; it went through at least one refactor a few weeks back, and it's in a place we're quite happy with — it really forms the core of the whole system. The two APIs, the fah-alchemy API and the fah-alchemy compute API — this is the user-facing API, this is the compute-facing API — are actively being developed, but they do have a clear shape. They have working components and things like this. We're still adding API points to each of these as needed to meet the needs of, in this case, the user client, and in this case, the compute services. And so those are where, at least for me, most of my focus is going at the moment. We do have a lot of other areas to work on; we'll talk about those in a second. I've tried over the last few days to put together multiple issues that we can try to parallelize over. So if I have volunteers in this group, I'd like to certainly recruit you to work on stuff, if this is exciting to you — if you'd like to dive into working on a project that has this kind of architecture. And this could hopefully align with your interests as well. So that will be part of my sales pitch to all of you. But this is where we are; this is the current map.
I also tried to illustrate that we're not just building a system here that can use Folding@home. The idea is that this is generalizable beyond just using Folding@home as the compute engine. In fact, we're actually building the non-Folding@home components first, because those are far simpler to write. Currently I'm working on the synchronous compute service. This is the reference implementation for a compute service. It would typically be run on a debug host; it's used for testing. It's intended to be the dead simple case: everything runs in-process, not too much magic to have to deal with, so it's easier to debug. It can also be used in production, although it would be less efficient just by virtue of its construction. Once we have this synchronous compute service down pat, with all of the API points it needs to talk to the state store, then we can get to work on the async version, which is basically a more efficient version. It can do multiple tasks at once; it takes advantage of an event loop, takes advantage of a process pool, and all of that. So it's the more optimized version, less easy to use for actual debugging. All of that work will factor into our implementation of the Folding@home version, which is even more complicated because it needs to talk to a Folding@home work server. But the value here is that all of these things can be run on HPC or just a localhost. This whole system that we're building here isn't just for Folding@home; it really can be used generally by other folks who are not even involved with Folding@home. At some point I would like to suggest a name change for the project. I'm not going to do that today; it's not important today. But that is kind of where my head is going: the Folding@home components are really additions to a system we're already building here. But we can discuss that in the future; I don't want to eat up too much time on that.
That is the vision. Any questions for me?
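The compute-service hierarchy David describes above (synchronous reference implementation first, async and Folding@home variants later) can be sketched as a minimal claim/execute/push loop. This is a hedged illustration only: the class and method names (`claim_task`, `push_result`, etc.) are assumptions for this sketch, not the real fah-alchemy API.

```python
# Hypothetical sketch of the synchronous compute service loop described above.
# Everything runs in-process, one task at a time, so it is easy to debug.
import time


class SynchronousComputeService:
    """Reference compute service: claims one task at a time, runs it in-process."""

    def __init__(self, client, sleep_interval=5):
        self.client = client              # talks to the compute API (assumed interface)
        self.sleep_interval = sleep_interval

    def cycle(self):
        task = self.client.claim_task()          # place a claim so no other service takes it
        if task is None:
            return False                         # nothing available to do
        result = task.execute()                  # runs in-process: dead simple, debuggable
        self.client.push_result(task, result)    # send the result (or failure) back
        return True

    def start(self, max_cycles=None):
        cycles = 0
        while max_cycles is None or cycles < max_cycles:
            if not self.cycle():
                time.sleep(self.sleep_interval)  # idle until new work appears
            cycles += 1
```

An async version would replace the blocking loop with an event loop and a process pool, as discussed, but keep the same claim/execute/push shape.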
@06:19 - Iván Pulido (John Chodera) In terms of the queue managers on Lilac or a different HPC, would that be handled by the users themselves? Or would fah-alchemy have something to help with that?
@06:43 - David Dotson This model is not too dissimilar from the model we've taken with QCFractal and QCArchive, in that there are users who simply submit things to the system. So they're over here on the left. They're building alchemical networks, they're submitting them to the system. The system on the right is then a black box to them: it's a machine that I submit things to, and then there are results that are available to me later. And it's usually other folks that are involved with firing off workers to, say, Lilac or to other compute systems, and keeping those workers running. Those are usually two different roles; at least that's what we've been doing with QCArchive. Different skill sets, different concerns, basically. Users don't necessarily have to know how their stuff is being computed on the other end, and the people managing the compute don't necessarily need to know too many details about what the users are tossing into it. Occasionally they have to talk to each other if there are problems, but in 99% of cases they don't really need to. So does that answer your question?
@07:53 - Iván Pulido (John Chodera) Yeah, I think so, yeah.
@07:55 - David Dotson Okay. Yeah, so for example with QCArchive, I have workers on Lilac as jobs in a preemptible queue. Once they come online, they talk to QCArchive, they pull work down to do, they do that work, they send it back. And myself, as an operations manager of those workers, I don't necessarily need to know the details of the quantum chemical calculations being done there. So we have a clean separation of concerns.
@08:27 - John Chodera (he/him/his) It does bring up the question of fault tolerance too. So, you know, if something goes down or something happens or it errors out, what's the current concept of how to tolerate computational errors or things that happen with the computers running?
@08:43 - David Dotson So the current approach — this is all gufe — is we get back a ProtocolDAGFailure instead of a ProtocolDAGResult. And that features every protocol unit that did succeed, along with the ones that failed. And so those all get sent back. We do need to make sure there's a clear way for users to pull those down, so there are still opportunities for folks to contribute there, to build out all of the detailed API points to pull these things through. But we do have that factored into the core data model. We did that fairly early on; we wanted failure to be a first-class citizen in here.
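The failure model just described — execution returns either a result or a failure object that still carries every unit that succeeded — can be illustrated with a toy DAG executor. The class names here are simplified sketches; the real gufe `ProtocolDAGResult`/`ProtocolDAGFailure` objects differ in detail.

```python
# Toy illustration of failure-as-first-class-citizen: executing a protocol DAG
# yields either a result, or a failure carrying both successes and failures.
from dataclasses import dataclass


@dataclass
class ProtocolDAGResultSketch:
    successes: list


@dataclass
class ProtocolDAGFailureSketch:
    successes: list          # units that completed before/alongside the failure
    failures: list           # (unit name, exception) pairs for units that raised


def execute_dag(units):
    """Run each unit (a callable); collect successes and failures rather than aborting."""
    successes, failures = [], []
    for unit in units:
        try:
            successes.append(unit())
        except Exception as exc:
            failures.append((unit.__name__, exc))
    if failures:
        return ProtocolDAGFailureSketch(successes, failures)
    return ProtocolDAGResultSketch(successes)
```

The point of the design is visible here: a failure still hands back the completed work, so users can inspect what went wrong without losing what went right.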
@09:23 - John Chodera (he/him/his) And so that's if a job actually produces an exception and then propagates that back up, so you can get stuff like that, right? And then there's also the case where the compute host goes away, and you just lose the job entirely and don't get anything back necessarily.
@09:39 - David Dotson Yeah, that would be a classic fault in that case. So the way that these compute services grab tasks, they put a claim on a task, and that keeps another compute service from also putting a claim on the same task. What needs to be implemented — and you've reminded me, I need to write up my issue for this — is that this API here needs to have at least one background process, which can be done in FastAPI, that prunes claims that are stale: basically walks through all claimed tasks and establishes whether or not they've gone stale. Once a claimed task has its claim removed, another compute service, running on potentially another resource, can then claim it and execute it later.
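The stale-claim pruning David describes — a periodic background job that walks claimed tasks and releases any claim whose heartbeat has gone stale, so another service can pick the task up — can be sketched as a plain function that such a background process would call. Field names (`claimed_by`, `last_heartbeat`) and the timeout are assumptions for illustration, not the real fah-alchemy schema.

```python
# Hedged sketch of stale-claim pruning: release claims older than a cutoff so
# tasks abandoned by a dead compute service become available again.
from datetime import datetime, timedelta, timezone

CLAIM_TIMEOUT = timedelta(minutes=30)   # illustrative cutoff


def prune_stale_claims(tasks, now=None):
    """Release claims whose last heartbeat is older than CLAIM_TIMEOUT.

    `tasks` is a list of dicts standing in for task records in the state store.
    Returns the ids of tasks whose claims were released.
    """
    now = now or datetime.now(timezone.utc)
    released = []
    for task in tasks:
        heartbeat = task.get("last_heartbeat")
        if task.get("claimed_by") and heartbeat and now - heartbeat > CLAIM_TIMEOUT:
            task["claimed_by"] = None     # claim removed; task is claimable again
            released.append(task["id"])
    return released
```

In a FastAPI deployment this would run on a schedule (e.g. a repeating background task started at application startup), with the task records living in the state store rather than in memory.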
@10:41 - John Chodera (he/him/his) Got it. Yeah, I imagine this might get sophisticated, with needing heartbeats or something like that.
@10:48 - David Dotson Heartbeats, yeah — and the synchronous compute service already has a concept of a heartbeat. So even though it's the synchronous version, it does have a sidecar thread that is doing a heartbeat. We have yet to hook that into the database; that needs to go somewhere in order for it to be a meaningful piece, but we do already have this idea in mind. There are a lot of things here where I kind of just need to see what fails for us to then address where effort is needed. So as I said, everyone, we're in death-by-details mode for all of this, so there are usually more things to do than there are hands to do them. Any other questions on the big picture, though?
@11:37 - Jeffrey Wagner When we look at the colors — can I interpret green as something that's pretty much ready to roll, yellow as something that's like a prototype, and red as something that's not a prototype yet? Or does yellow mean not really a working MVP yet?
@11:59 - David Dotson Yellow is not feature complete. As in, not even feature complete for deployment yet.
@12:03 - Jeffrey Wagner Like it's missing key features. Okay, cool.
@12:08 - David Dotson Does that make sense?
@12:09 - Jeffrey Wagner Yep. Yeah.
@12:14 - David Dotson But they do exist, right? So, red doesn't exist yet. Like we don't, you know, we may have a placeholder for it, but there's literally no implementation. Yellow means there is implementation, but there are key pieces of like API points and other things that are missing yet.
@12:33 - Jeffrey Wagner Cool, thanks.
@12:34 - David Dotson Okay. And so this is reflected in our board — sorry, that Zoom window keeps jumping in front of me. I've updated our In Progress column to reflect current state. So we've got things folks are already working on. David, we've got you on CLI for service startup and database init; I know you've got that PR, I have yet to review it. So we can talk later today if you'd like to get unstuck.
@13:06 - David Swenson Okay. Yep.
@13:09 - David Dotson Mike, we've got you on Docker image build and push. Is there, do you have any questions for this?
@13:16 - Mike Henry (he/him) No, I'm just working through my backlog and I'll probably get a PR in for this by the end of the day or tomorrow, just to get a rough GitHub Action which should work. What I'm going to do is a separate action file, because I think they're a little bit easier to maintain that way.
@13:32 - David Dotson That's perfect.
@13:33 - Mike Henry (he/him) I can then have it like target and run based off of like tags and things like that to satisfy the requirements.
@13:41 - David Dotson Yeah. Yeah, I would say, yeah, that would give you the custom trigger because we don't need this to happen on every single push to master or main.
@13:49 - Mike Henry (he/him) Exactly. So I'll start it with just a custom webhook, so you go into the GUI and hit build — perfect, start with that. And then we can automate it once we determine what regexes we want to use to match tags, that kind of thing.
@14:06 - David Dotson Excellent. Thank you for that. Yeah, I'll watch for that PR. Let me know if you hit any snags.
@14:12 - Mike Henry (he/him) Will do.
@14:13 - David Dotson Iván, I know you're working on nonequilibrium cycling; we can hit that here in a bit. Is there anything you just wanted to flag right now?
@14:26 - Iván Pulido (John Chodera) Well, yeah, I'm almost done. I just need a few pointers on how to get the DAG results, but the protocol is running. I just have a few questions on how to get the results and how to actually know they're okay, because I see some contradictory results. And I also have a few questions about the settings, but I think we can talk about that when it's my turn.
@14:59 - David Dotson Okay, thank you. I've also got tasks here for some of the things you're working on, particularly in protein-ligand-benchmark. Irfan, do you have any immediate updates on number 83 and number 82?
@15:16 - Irfan Alibay Number 82 should be good to review now. I think Iván had started reviewing it; I just addressed the comment that Iván made. If we can merge 82 — I just now fixed the CI; it was not related to the PR, I just disabled the one test that was comparing hashes, which I assume weren't stable for some reason. So if we can get this merged as soon as possible, then we can deal with 83, which I think Iván had weighed in on. I mean, that's blocked by 82, because we need to regenerate things. And then we can finish things.
@15:55 - David Dotson Okay, cool. Is there anything you need from this group? Um...?
@16:00 - Irfan Alibay Just someone give it the green tag and then I can go ahead. Oh, actually, yes — there was a question about removing the MTZ files. Do we want to scrub them from history properly? Or, I mean, it's four megs — do we mind just deleting them?
@16:19 - David Dotson We should be squash merging this at the end. So they are already in.
@16:23 - Irfan Alibay They are already in main.
@16:25 - David Dotson I see. That gets tricky. I'm inclined to say that's outside the scope of this PR — trying to rewrite history for the whole branch. We can just do this in a separate PR.
@16:45 - Irfan Alibay Yeah, we could just leave them in the for now and then raise an issue to do it just before release. Does that make sense?
@16:52 - David Dotson I think that can be done. Yes.
@16:53 - Irfan Alibay Yeah, okay.
@16:55 - Mike Henry (he/him) Yeah, let's just make an issue so we don't forget. It will certainly be easy enough to run some git commands to find out if there are any big files in the history that don't exist anymore that we can scrub out.
@17:13 - David Dotson They don't need GZ. Yeah. Yeah.
@17:19 - John Chodera (he/him/his) Yeah. That those should be moved to the release artifacts. If I understand correctly.
@17:31 - David Dotson Just gonna say C number for now. See here.
@17:41 - Jeffrey Wagner Yeah, I think those already would be in the release artifacts for previous releases, right? Because the release artifacts included a tarball of the entire repo.
@17:54 - John Chodera (he/him/his) I thought these were newly introduced as part of the remediation effort.
@18:00 - Iván Pulido (John Chodera) Yeah, these are new.
@18:02 - Jeffrey Wagner Oh, so these haven't been in a release yet. Okay.
@18:04 - John Chodera (he/him/his) But they're intermediate files that are generated by the scripts Melissa wrote, so they're not strictly necessary. She thought they would be nice to have, but they accidentally got committed.
@18:16 - Jeffrey Wagner Yeah, got it.
@18:18 - John Chodera (he/him/his) They're also only useful for the people dealing with my screw-up, so that's not most of our target audience.
@18:31 - David Dotson Yeah, we can go ahead and add this to the milestone — I'll go ahead and put it on 0.3 for now. Thank you. Irfan, any other comments or questions on your end?
@18:48 - Irfan Alibay That's it. I guess, if we merge 82 today, we can get everything else done. Iván, on the data format, 83 — was that waiting on you?
@19:04 - Iván Pulido (John Chodera) It works. I just needed changes because the paths change. So I guess one quick question: do we expect to have different mappings and edge networks in the same dataset? Because I see that we now have a special edges subdirectory that would presumably be populated with different transformation networks or different mappings. So yeah, I guess that's a question.
@19:37 - Irfan Alibay Yes — I think Dotson and I had discussed at some point that that was the plan, so that we could keep a history of old edges but then move forward; for example, if the mapper were to change, then we could use the latest one — you know, these are the better edges — but also keep the history. I don't know if that's what folks want to do; we can also do it another way.
@20:06 - David Dotson Yes, the intention was that in this repo, for a given target, we have pre-baked networks, as in the edges are already established, and those may or may not include a mapping. Is that correct, Irfan?
@20:20 - Irfan Alibay Ideally, they all include mappings, right?
@20:23 - David Dotson So they're as useful as possible.
@20:25 - Irfan Alibay Okay.
@20:26 - David Dotson Yeah, so they all include mappings. Now, of course, a user who's using them could choose to take the edges and leave the mappings behind, and use their own mapper if they wanted. They could also choose to take the whole thing, and then they'd be able to get as close to reproducibility on network plus mappings as possible. That's the intention. And then we can, over time, add new pre-baked networks to any given target and still keep the old ones as well. So it's a directory with a growing collection of pre-baked networks.
@21:09 - John Chodera (he/him/his) We'll have multiple networks then, is the idea.
@21:13 - David Dotson Yeah. Instead of trying to do a "here's the one true network done with this particular tool" — I think that's an impossible standard.
@21:22 - John Chodera (he/him/his) OK, understood.
@21:23 - David Dotson But they're all still for convenience only. Correct, yeah — you could also use these targets plus their ligands and ignore the pre-baked networks entirely, build your own. But obviously for benchmarking, we'll probably use the pre-baked networks for our own uses in OpenFF, because we don't want to have to also re-compute networks if that's not what we're trying to benchmark. Like I said, users can also ignore that information if they like.
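The "pre-baked network" idea above — edges between ligands that optionally carry atom mappings, which a user can either consume as-is for reproducibility or strip and re-map with their own tool — can be shown with a toy data model. This is purely illustrative: it is not the actual protein-ligand-benchmark file format or the gufe network classes.

```python
# Toy model of a pre-baked network: edges with optional atom mappings.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    ligand_a: str
    ligand_b: str
    mapping: Optional[dict] = None   # e.g. {atom_index_in_a: atom_index_in_b}


def strip_mappings(edges):
    """Keep the network topology but discard the pre-baked atom mappings,
    e.g. so a user can apply their own mapper to the same edges."""
    return [Edge(e.ligand_a, e.ligand_b) for e in edges]
```

Keeping old networks alongside new ones in a growing directory, as described, then amounts to versioned collections of such edge lists per target.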
@21:55 - Irfan Alibay Yeah, it's quite important on our side of things, as we're gathering results, that we can point to which network is used for which version of the dataset. We're currently in the process of re-running some of this, and we can definitely see that there are some differences when we do the comparison against previous datasets — even on systems that haven't changed a lot we're getting a lot of differences — so hopefully we can keep track of that and preserve the history.
@22:32 - David Dotson I see. Irfan, are there any points of ambiguity you want resolved here?
@22:41 - Irfan Alibay So, we've got 83; once 82 is done, I'll recreate the edges. And then, yeah, it just needs checking, and once we have the validation script working we can go ahead with that.
@22:58 - Iván Pulido (John Chodera) Yeah. So, I merged these two branches and ran the whole dataset. And it works, other than the ligands that I mentioned there, which will be solved when you regenerate the edges. So yeah, it seems okay so far.
@23:24 - Irfan Alibay Also thanks.
@23:26 - Iván Pulido (John Chodera) And I'll leave my formal review once I've reviewed all of this.
@23:33 - David Dotson Yeah, Iván, if you can give Irfan as much clarity as possible — like "move forward" or "don't move forward" — that way we can move to merge as quickly as possible.
@23:45 - Iván Pulido (John Chodera) Yep.
@23:46 - David Dotson Okay. Well, thank you. Yeah, because these two will unblock a bunch of other issues we can hit as well. Any questions for Irfan on protein-ligand-benchmark? Okay.
@24:03 - John Chodera (he/him/his) Iván, you're going to actually try to run them all to make sure everything actually at least runs one iteration, right? Just to catch any failures early.
@24:16 - Iván Pulido (John Chodera) Yeah, I already did, but since we are regenerating the edges, I just have to do it again. But yeah. Great.
@24:27 - David Dotson Well, thank you. I can talk a bit about current status for things I'm working on. So in protein-ligand-benchmark, number 78: I'm currently waiting for some of these other things to be resolved. This is already an open PR restructuring the repo around its data directory — previously, because it relied on LFS, it used sample data; we got rid of that. This is simplifying the structure and putting it into a form that's more in line with our vision. So I'll return to this work; this will be one of the things unblocked by Irfan's work. On the fah-alchemy side, the synchronous compute service has been the driving force for a lot of my work. Last week I ran into some issues with serialization of paths and things like this. So, David, I've added in use of your JSON codec from gufe; that seems to work well. Now I'm hitting some new issues with trying to shove a ProtocolDAGResult into Neo4j, so I need to chase that one down — but that's just chasing down things as we go. As I work on this particular issue, I'm also hitting a bunch of other API points, so this is driving a lot of the activity in the repo from my perspective. I'm still trying to get to an endpoint on this. I've also used this to lay out some other issues; we'll get to those in a second. Unrelated to that is an example of notebooks-first documentation. We do have a work-in-progress PR; I wanted to get this out for Jenke and John's consumption. It's a Sphinx skeleton, and the notebook ends up in it. I can go ahead and show this in more detail. One thing I wanted to include in here was use of settings. So here's the skeleton of the notebook; I'm still piecing it together. We're using the OpenFE benchmarks for the source files. I do want to show how you actually pull in SDFs, PDBs. And I wanted to use a real protocol if possible, not a dummy protocol.
So, Richard, a question for you: for RBFE — sorry, for the replica exchange protocol — do we have an issue or a PR for bringing it up to date with gufe settings? I know it's not a priority at the moment, but I don't know if there's an anchor issue.
@27:54 - richard I don't think there's an issue for it at the moment.
@27:56 - David Dotson Okay. I can take one.
@28:02 - richard I'm there. I can do it now.
@28:04 - David Dotson Sorry, second.
@28:05 - richard I'm there. I can do it now.
@28:07 - David Dotson OK, thank you. Yeah, because I'd like to use this as a hedge, so we've got both the nonequilibrium cycling protocol we can use, and also repex — trying to get both of these lined up. In the meantime, I'll use a dummy protocol, but that won't include much of the settings stuff, which I think will be a key factor for users, right? They need to define their settings. Mike, could I also meet up with you sometime this week to work on this, and show what the user experience is like for setting up settings?
@28:51 - Mike Henry (he/him) Yes. I think we'll need to get it wired up into a protocol first, because right now I don't think we've integrated it such that it's actually used in any of the protocols. But once we do that, then yes, it makes sense to show users how to create that object and then pass it into a protocol.
@29:22 - David Dotson Yeah. So I have a meeting later today with Iván on nonequilibrium cycling at 3:30 pm — would you be willing to join?
@29:44 - Mike Henry (he/him) One second, let me check.
@29:46 - David Dotson Okay, thank you. Yeah, because that dovetails into this — so we can get settings plugged into the nonequilibrium cycling code.
@29:57 - Mike Henry (he/him) That should work.
@29:58 - David Dotson I'll just send you an invite. Okay, cool. Sorry, one second — I'll just shoot that off to you right now. Thank you. Yeah, because if we can get settings hashed out in nonequilibrium cycling, then that'll clarify things for me in this tutorial as well, because users will have to create that object so they can pass it in. And I think we have an easy route for that, because I think we have the concept of default settings. We want to have a fairly straightforward user experience — that's what I'm getting at. It's okay if it's rough to start, but I just want to know if it's rough. So, any questions for me on this? Jenke, I know this is a bit delayed; you asked for this last week.
@31:17 - Jenke Scheen (John Chodera) No questions yet. I'll have a look at it and raise anything. Thank you.
@31:25 - David Dotson Okay. I'll go ahead and shove in, like I said, a dummy protocol for now — but it's not a real alchemical protocol, so it's not going to give you real delta Gs of any kind. But it will show you how to build alchemical networks and the basic pattern, so you can see.
@31:40 - Jenke Scheen (John Chodera) Yeah, no, this is brilliant.
@31:42 - David Dotson Thank you. And that's it for my active issues. Any questions for me? So, we have some Available cards as well; I'm actually seeking volunteers for these. Mike, if you don't mind, I've already press-ganged you into the Docker Compose deployment. Is this something you're interested in?
@32:13 - Mike Henry (he/him) That's the next evolution, perfect. Yeah, that's totally good.
@32:18 - David Dotson I think that'll be a fun one to put together. It's basically building a full deployment.
@32:25 - Mike Henry (he/him) I've only ever used Docker Compose for building personal projects, like a whole Plex media stack. So this will be fun to use in a more academic context.
@32:36 - David Dotson Thank you for taking that on. I've also got the Folding@home version of nonequilibrium cycling, which is parked until we get some of these other things resolved. The first one I'd like to get a volunteer for is adding scope enforcement for our APIs — so I can talk a little bit about how this works. The concept of a scope is kind of core to fah-alchemy, so let me pull it up for folks. Scopes and scope keys are defined in the top-level models module, and a scope defines basically a boundary within the system. It's a triple of org, campaign, and project. Functionally, this matters for deduplication. So if you are submitting an alchemical network to, say, OpenFF, Sage benchmarking — and then you have a project that's just a little project within that campaign; you could give it whatever name you want — if you submit an alchemical network with that scope, and then later you make another alchemical network and submit it to the same scope, any common elements between those networks will be deduplicated. So you're not storing, say, the same chemical systems over and over again, the same transformations over and over again. That allows you to basically expand — take an initial network, make a bigger network up from it, submit that — and you don't have to redo all that work; you can take advantage of the existing work already done, including the computations and results. If you submit an alchemical network in a different scope, then it isn't deduplicated across scopes; the scope is intended to be the boundary for deduplication. It should also be a permissions boundary: we're designing the system to handle multi-tenancy, right? In this particular deployment we're targeting, we want it to support both ASAP Discovery and OpenFF, and potentially OpenFE. So that's three different organizations all using the same deployed system.
That should be fine, but we do need these things to have permission boundaries attached. I'll show you the user identities here; this touches a few points in the library. If we go into fah-alchemy's security module and look at the security models, we have a couple of different identity types: we've got user identities, and we've got compute identities. And each of these has scopes — a list of these string-ified triples: org, campaign, project, separated by dashes. So for a given user — let's say we have a user named Jenke defined in the system, and by creating the user, I added a set of these triples to his list of scopes — that means if Jenke, using the fah-alchemy client on his machine, tries to submit an alchemical network to a scope that's not in that list, what it should do is not let him do that. Or if he tries to view an alchemical network outside of his scopes, he shouldn't be able to do that. These are our basic security boundaries. This is going to be important for ASAP in particular, because if we want to run jobs through the system where the molecules are a bit sensitive — as in they would jeopardize the patent position for ASAP in fulfilling its mission — it's critical that not just any user who has access to the system can have access to literally everything. Also, it's important that users can't just walk all over each other in terms of orgs and things like that. So this issue is about enforcing the most basic boundaries within the system. Do I have anybody interested in taking this on?
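The scope model described above — an org-campaign-project triple, serialized with dashes, with enforcement meaning an identity may only act within scopes on its list — can be sketched as follows. The class and function names are assumptions for illustration; the real fah-alchemy models differ in detail.

```python
# Hedged sketch of the scope model: a Scope is an org-campaign-project triple,
# string-ified with dashes, and enforcement checks membership in an identity's
# scope list before allowing an action.
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    org: str
    campaign: str
    project: str

    @classmethod
    def from_str(cls, scope_str):
        org, campaign, project = scope_str.split("-")
        return cls(org, campaign, project)

    def __str__(self):
        return f"{self.org}-{self.campaign}-{self.project}"


def is_authorized(identity_scopes, target):
    """Allow an action only if the target scope appears in the identity's scope list.

    `identity_scopes` is the list of string-ified triples stored on a user or
    compute identity; `target` is the Scope the action would touch.
    """
    return str(target) in identity_scopes
```

Deduplication works the same way in this picture: identical components submitted within one scope collapse to a single stored record, while the same components submitted under a different scope are stored separately, so the scope is both the dedup boundary and the permission boundary.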
@37:20 - John Chodera (he/him/his) One quick question, David. Is there a way to also mark data as publicly readable? Because I think that will also help us a lot in the early stages to feel that at least publicly access some of the data without having to worry too much about scope before everything gets locked down.
@37:40 - David Dotson So I'm taking the approach of start from a position of locked and then open things gradually.
@37:49 - John Chodera (he/him/his) Because I think that's a whole other question that's difficult.
@37:52 - David Dotson OK. We don't have a concept of public yet. So in our APIs, as I talked about last week, we don't expose very much without requiring a credential. And I think that's a good security posture to take.
@38:15 - John Chodera (he/him/his) Got it.
@38:16 - David Dotson But you said this is something you'd like: you'd like to basically have some sort of mechanism to mark things as readable without needing a credential at all?
@38:26 - John Chodera (he/him/his) That it might be useful is what I was thinking, especially if we have a lot of stuff and many people interacting where we don't need the security initially. But maybe it's just more hassle than it's worth.
@38:43 - David Dotson If you can articulate a story for this, I think it would be valuable, because I want to really understand at what level, and what things, you would like users to be able to pull, right?
@38:56 - John Chodera (he/him/his) I'm just thinking about things like: we will have a mechanism to make some data public as time goes on for ASAP, and we can manage all of that separately. But then a lot of other stuff we do retrospectively will all be public from the outset. And someone might want to vacuum up all of the results that contain information about, here's the transformation, here's how difficult the transformation was. Can I do some machine learning on the atom mappings, for example, to understand how to optimize that? But they would want to work with the public data. Here everything is locked down and we have to go through separate steps to disclose all that data. That's totally reasonable; we can do that. It just means that people are not going to interact directly with the system to pull that data out.
@39:44 - David Dotson Yeah, so one reason to also operate from a position of credentialed users only is so that you don't get abuse of the system, right? So you don't get people who are hammering it. I mean, we can also put rate limits in and things like that, but it becomes much easier to rate-limit when you know who you're talking to, as opposed to just relying on IPs and things like that. Okay, I think it'd be best if you could articulate this in a story. I'm not trying to pass you off; I'm just trying to say, if we can get that down, then we can think about possible solutions to this: what are the public API points that should be available, if any. Would that design work for you, John? Hello?
@40:40 - Jeffrey Wagner We may have lost him.
@40:43 - David Dotson Oh, he's not here anymore. Okay, go ahead.
@40:45 - Jeffrey Wagner If I could check actually, Jenke: is this a big need for ASAP, to have private datasets? Like, is it more a technical decision of, oh man, it would be really annoying to get DDOSed or something, or is it actually like, oh yeah, we need to keep this nice and secure because we might run into something like that?
@41:08 - Jenke Scheen (John Chodera) Yeah, this is definitely the second. We already are being briefed on confidentiality of certain compounds. So we wouldn't want that public.
@41:23 - David Dotson Yeah, and the reason for this is that in order for ASAP to secure equitable access to the compounds it creates, it can't put things into the public domain for them to get patented by other, more predatory patenters. So it's a defensive patent position to make sure that ASAP can actually pursue its mission, which means that certain compounds can't be exposed, right? If there's a way for folks that aren't part of the organization to get access to these, that presents a huge risk to the whole effort.
@42:03 - John Chodera (he/him/his) Sorry, let me just clarify this a bit further. So the second generation of our open science approach allows us to be more open and control the licensing better if we end up with a single-compound patent, which we can use to make sure that manufacturers and people who fund the development don't use market exclusivity to block other people from manufacturing drugs during an emergent pandemic. At the end of this whole process, we'll end up with a patent with a single compound in it rather than a genus, which means that we have a delayed-release policy where every quarter we sweep out a bunch of data from the late stage; all the early stage gets disclosed immediately. So it only applies to a certain amount of our work. But in order to do this, we just need to avoid basically blasting things onto a web page immediately, until the release process has occurred. There are just certain things that need to be done to protect confidentiality and prevent it from counting as a public disclosure for some of this data, but then it will all eventually make it up online.
@43:05 - Jeffrey Wagner Okay. My understanding of security stuff is that pharma companies are really skittish about using any cloud services or letting their data outside of their walls at all, and the few places that can do this, like Schrödinger and OpenEye, have invested a ton in building out their security.
@43:31 - John Chodera (he/him/his) It's not necessary here. In particular, they need to be able to sue for damages in case they lose the patent position or somebody else gets wind of something and beats them to market. Here the only thing we're trying to do is ensure global equitable access by ending up with a single-compound patent, which means we just need to take some minimal steps to avoid accidentally doing something that counts as a public disclosure, which is a limited list of things that includes posting it on a webpage or giving a public talk with the structure. Those are the things we have to avoid here. We just need the minimal justifiable steps to prevent that from qualifying as a public disclosure.
@44:14 - David Dotson I want to avoid a case where we issue a user identity on the system, and then that user decides, I'm just going to download everything because I can, not because they're doing anything nefarious, just because they're curious, and in so doing maybe makes a git commit to a repo and pushes it somewhere, even though it may be obscure. I want to avoid a case like that, if you understand me, John.
@44:42 - John Chodera (he/him/his) So, yeah, it's just avoiding the accidentally gets on a webpage kind of thing, right?
@44:47 - David Dotson That sort of thing, right? Because that can be disastrous. In any case, this issue just articulates the basic scope boundary. There are additional layers we want to add on top of this; this is just the most basic of the layers, which is to say: you as a user can only access things in the scopes that you've been assigned up front. We'll add flexibility to this later, so this isn't the end of the story, just to be clear. Let me look at the security labels one second. Sorry, folks, I know this is a bit boring for some people, but scope enforcement is our first task for release 0.2. I then want to add scope rules for hierarchy, to enforce that orgs are above campaigns, which are above projects. We also want to add more fine-grained scoped read/write permissions; that comes later. But at the very least we need the boundaries to mean something, and that's what this first issue is for. So I'm looking for a volunteer who wants to focus on this for now. That doesn't mean they have to solve absolutely everything, and they're not going to be on their own; I'm happy to help and provide pointers, but I would like someone to take this on. Going once, going twice. Okay.
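The hierarchy rule mentioned here (orgs above campaigns, campaigns above projects) could be expressed as a wildcard match along these lines. The wildcard syntax is a hypothetical illustration, and component names are assumed not to contain dashes.

```python
def scope_matches(granted: str, requested: str) -> bool:
    # hypothetical sketch of the org > campaign > project hierarchy:
    # a grant of "openff-*-*" covers every campaign and project under
    # the openff org, "openff-sagebench-*" every project under that
    # campaign; "*" may only widen a level, never narrow it
    g_parts = granted.split("-")
    r_parts = requested.split("-")
    if len(g_parts) != 3 or len(r_parts) != 3:
        return False
    return all(g == "*" or g == r for g, r in zip(g_parts, r_parts))


assert scope_matches("openff-*-*", "openff-sagebench-tyk2")       # org-wide grant
assert scope_matches("openff-sagebench-*", "openff-sagebench-tyk2")
assert not scope_matches("openff-sagebench-*", "asap-mpro-sprint1")
```

A user's scope list could then be checked with `any(scope_matches(g, requested) for g in user_scopes)`, so an org-level grant implies access to everything beneath it.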
@46:24 - Levi Naden David, this might actually be something that I might be able to help with, but that's something we can talk about offline, possibly. I'm not sure how much I'd be able to help with this and how detailed it might be. So let's talk about this afterwards, but this seems like something I might be able to assist with and get things going.
@46:46 - David Dotson That was the idea, yeah; I had you in mind for this, actually. So if it's of enough interest to you: it does touch a few different places in the system, but I think it would be a good way to get familiar with how things are structured and how the components fit together. So if you have a broader interest in fah-alchemy going forward, then I think this would be a good entry point.
@47:10 - Levi Naden Yeah, possibly. Yeah, but let's talk after this at some point.
@47:15 - David Dotson OK, OK, sounds good.
@47:19 - Levi Naden Just making sure it's not going to put me out of scope with my other duties for other things as well.
@47:24 - David Dotson Great. Second thing is object store support for our APIs. So, in addition to the state store (and we're coming down to the wire here), the state store is the state of the system, and that's handled with Neo4j. We also have a concept of an object store, and that needs to hold on to file results. This could be trajectory data, this could be snapshots, this could be EDR files in GROMACS, all kinds of stuff like that. You wouldn't store that in a graph database; you'd store that in an object store like S3. You could also store it in a file system and use that as your object store in a more abstract sense. But we need initial support in our APIs for file upload. In particular, the compute API needs the ability to take a file upload from a compute service. We can decide on what the policy is for the paths to which protocol DAG results are actually uploaded. We also want a mechanism for pulling result files, if they're available, down through the user client. So, just to illustrate: the compute API pushes things into the object store, and the user-facing fah-alchemy API pulls things out of it. And that's pretty much one-directional in both cases. You don't really have uploads from users to the object store (there really shouldn't be), and you don't really have downloads via the compute API from the object store, because that gets really inefficient real fast if compute services have to constantly pull big blobs down. That affords us some optimizations there. Is anyone interested in taking this one on? This one's a bit more complex, but I'm happy to be your spirit guide on this as well. Okay, I believe that's it for now. Thank you.
@49:50 - John Chodera (he/him/his) And apologies for not having more support on this yet. We'll get Hugo at 50% starting in January.
@49:55 - David Dotson Okay, great. I mean, I'll probably take it on once I clear some of these other things, because it needs to happen. It's not absolutely critical for getting basic results through, but it is if we want to be able to introspect files downstream. And I know that REPEX in particular needs those files. Correct me if I'm wrong, Richard.
@50:14 - Richard Sorry, yes, it needs the .nc files.
@50:17 - David Dotson Yeah, it needs those files on the client side, the user side, basically in order to do its thing?
@50:23 - Richard Yeah.
@50:23 - David Dotson Okay, yeah, see, that's what I'm thinking about. So it is a critical component: not crucial for initial functionality, but for full-feature functionality it is. Okay, thank you, John. If any of you decide you do want to take something on, feel free to let me know, and like I said, I'm happy to be your guide on this. Any questions on the current state, where we're at and where we're going? Okay. The board represents the current state of the whole MVP project across multiple repos, if you're trying to understand where we are. And the milestone has just what's needed to make fah-alchemy ready for deployment; I'm trying to populate it with only the things that are critically needed to deploy. Anyway, just giving folks reference points here. We've got about 10 minutes left. Iván, did you want to go into more detail on nonequilibrium cycling?
@51:44 - Jeffrey Wagner I think we need to figure out a new deadline.
@51:48 - David Dotson Oh, thank you. Sorry, I completely walked right past this. I'm going to leave it to my approvers to decide what an appropriate deadline is. I know we've blown past several at this point, so you can chop my head off if you'd like, this is your opportunity, but I would like to get your input on what is an appropriate deadline going forward.
@52:18 - John Chodera (he/him/his) Our practical deadline before things shut down for the holidays is the 16th, and we're going to lose a ton of momentum if we don't get to a working state by the 16th of December. So is that feasible for folks, to really try to make something that's at least the MVP by the 16th of December? Or should we aim for after the holidays? I think those are the two choices.
@52:46 - Jeffrey Wagner So December 16 would be two weeks from now.
@52:48 - John Chodera (he/him/his) So the other realistic date is probably something like January 13th.
@53:00 - Jeffrey Wagner Yeah, from our side, with the UC strike, I think our protein force field is now two weeks delayed and probably going to be three weeks delayed, because Chapin is the primary driver on that. So yeah, some slippage there is okay.
@53:33 - David Dotson John, I know this is critical for ASAP; that's been the biggest fire under me here. And of course, as we're working on this, it means that we're also delayed on other components within the ASAP infrastructure. So this is the biggest blocker.
@53:53 - John Chodera (he/him/his) Given what's going on, and that things are going to get slower over the holidays, I'd say we're inclined towards January 13th. We do have our local pipelines we can run until then, and there's still some stuff we need to figure out on the automated setup. I don't know what to say in terms of which one is better. I'm also acutely aware that it's good for folks to have an actual holiday break, because sprinting forever is not possible. So from that perspective, I would be keen to say December 16th, and then we'll lay down our tools for a little while so folks can have an actual bit of a break.
@54:49 - David Dotson Yeah. I know everyone on this call is putting a lot of effort in and you all have other responsibilities as well, but if I could get your attention for the next couple of weeks in particular on the things that you're already working on, I think that would help us immensely. So I think we can probably set the goal of clearing out what's in the In Progress column in the next two weeks. Would that be reasonable for folks? Okay, I'm seeing no objections, so let's make that our goal: December 16th. If you've got something in progress, really aim to get it knocked out. Sooner is better, of course, because there are always unforeseen issues that can crop up, but we'd like to clear what's currently in play by then, and that would clear the board for the next round of things. We can probably work with quite a bit of what's already in place if we've cleared the In Progress board, right? It's not an all-or-nothing proposition of being able to deploy with or without anything beyond that, if that makes any sense. So that's the proposal to the approvers: December 16th, aiming to clear out the current set of in-progress items. I can also record whatever issue numbers are here, because I might add new things over the next couple of weeks, but I don't want that to alarm anyone. Since all of our other deadlines have been Tuesdays, and we can review them at this meeting, maybe we should pick the 20th or the 13th. I would go for the 20th; any objection to that? Are we still planning to hold a meeting on the 20th? Maybe I should ask. I think my plan is to be here. Richard, John, do you guys plan to be in on the
@57:01 - John Chodera (he/him/his) The 20th? I can do it, but I know Iván is going to be in Colombia, and our unofficial shutdown starts the 19th, essentially. So.
@57:14 - Jeffrey Wagner Okay, then let's do the 13th. Okay, that's good for everyone.
@57:24 - David Dotson So that puts us at two weeks from today. So we have a two-week sprint focused on the in-progress items.
@57:39 - Jeffrey Wagner Right. I see Richard nodded. John, does that sound good to you? Okay. Good.
@57:46 - David Dotson Thank you, everyone. I know this is a bit of a pressure cooker, and I know we've got a holiday coming, so let's keep in mind that we do plan to take a break during the Christmas and New Year holiday. But if you give me your time, I'm happy to take it. I'm also here to help you as much as possible, so reach out if there's anything you're stuck on, anything you'd like some additional eyes on; we can try to get more of that. I think we're about at time. Iván, I know we didn't give you much attention today, but we do have our call at 3:30 with Mike. Is there anything you wanted to bring up right now?
@58:30 - Iván Pulido (John Chodera) No, I think, yeah, all my questions should be easily addressable by you and Mike.
@58:40 - David Dotson Okay, cool. And Irfan, I know we talked about protein-ligand-benchmark things. Are there any last items on some of the other stuff you wanted to hit?
@58:49 - Irfan Alibay I don't know. I should be good.
@58:51 - David Dotson Okay. Thank you. With a minute left, any other topics? Okay, thank you all. Thanks for being on this journey with us. Overall, I do want to say I think we've come a long way and we're in a good place, despite what it may feel like. You're doing great work, so keep at it. And yeah, we'll be in touch.
@59:26 - Iván Pulido (John Chodera) Awesome. See you folks next week. Thank you.
@59:29 - David Dotson See you.
@59:33 - Diego Nolasco (Open Force Field Initiative) Hey, David, I think you're selling yourself too short.
@59:42 - David Dotson Really?
@59:43 - Diego Nolasco (Open Force Field Initiative) Yeah, man. You're doing an awesome job. You are coordinating things really well, and you are helping people a lot. And you are also really interested in what people really need. So that's a very good job; you're being a fantastic driver for this project. So don't sell yourself short, man.
@01:00:08 - David Dotson I appreciate that. No, I'll admit that my stress level is pretty high. I took a break over Thanksgiving because I realized my wrists were hurting and my head felt like it was going to explode; I just needed to rest. So I did a lot of sleeping over the last few days and tried to spend time with friends and things like that. But yeah, I do take trying to hit these deadlines seriously. I don't want to give the impression that every extension we ask for is meaningless. I do want us to still be meeting our timelines, because they're meaningful timelines, right? If we don't deliver this soon enough, it delays a lot of other things that we're trying to do; the train sort of runs into itself, right?
@01:01:03 - Diego Nolasco (Open Force Field Initiative) But just so that you know, everybody has great respect for you, and people know that you are really busy. Jeff told us that you would be out of some meetings a couple of weeks ago, and the entire team said, you know, he's really focused and he knows what he's doing. So keep yourself going, work your way, and believe me: there is no pressure on your shoulders but your own.
@01:01:45 - David Dotson Yeah. I know, I know, so I have to remind myself often that I do put more pressure on myself than I think my clients do, which is a good thing, but it sometimes means I'm stressing myself out. You know, that's okay. It's entirely manageable.
@01:02:02 - Diego Nolasco (Open Force Field Initiative) I just want you to take care of yourself.
@01:02:04 - David Dotson Yeah, I appreciate that.
@01:02:06 - Diego Nolasco (Open Force Field Initiative) Nice.
@01:02:07 - David Dotson Levi, before I leave: is there anything you wanted to hit?
@01:02:10 - Levi Naden No, just briefly. I wanted to set up: when do you want to have a quick chat about stuff? Because I just want to make sure I don't overcommit myself to something I may not be able to actually do.
@01:02:23 - David Dotson I understand. I've got a good sense of that as well.
@01:02:26 - Levi Naden Yeah, because I've got a commitment Thursday; I'm going to be chatting with some people, and my other duties might take precedence, so I don't know how much time I'll be able to commit. I wanted to actually just go over what it is you exactly need with this, especially since I'm unfamiliar with the overall structure, before I fully commit to saying yes, I can do this with x, y, z amount of time.
@01:02:50 - David Dotson I see. Okay. I guess I wanted to ask last week, and I wanted to get an answer: I'd like to align whatever I'm asking of you with what your own interests are. Not just technical interests, but also your broader interests; you know, what's your motivation?
@01:03:19 - Levi Naden Right, so in this case it's to address skill diversity and basically full-stack ecosystems. Because with the science parts, at this point, the way my path is going, I'm not going to be the most involved in the scientific side of things anymore, just because other people are better at that than I am. But it ain't the science that actually slows people down a lot of the time; it's everything else. And so that's what I'm trying to get better at: things like security and better full-stack integration, figuring things out and making all the infrastructure work.
@01:03:57 - David Dotson Okay, excellent. Yeah, I can empathize, because I come from science as well, but in my PhD work I enjoyed building machinery to do the science more than I enjoyed doing the science itself, and I realized I was actually better at the machines part than the science. So yeah, I think this project could be a good fit for that. I'm not just saying that; I think it literally is a good fit, if that's your interest. And in terms of, I know you're currently a software scientist at MolSSI, right? So you've obviously got responsibilities to them. Do they give you any sort of discretion to work on what you want?
@01:04:48 - Levi Naden Yes, I do. If I tap into it, I have 20% time I can commit to this.
@01:04:54 - David Dotson Excellent. OK, cool.
@01:04:57 - Levi Naden But with what's going on, I want to make sure I don't overcommit that time.
@01:05:02 - David Dotson Understood. Okay. I'm under no illusions: if you start working on this issue, it's going to be slow to get started, and that's okay, because you're learning the codebase. But that's also a feature of this particular issue. That's why I sort of drew it up with you in mind, because I thought it would work well as a kind of introductory issue. (I'll see you, Diego.) It's going to force you to look through the codebase, build a mental model of how these components work, kind of build your own map in your head, and then I think there's really only a couple of places where you actually need to put down code. So the PR itself probably is not going to be huge when it's all said and done, if that makes any sense. So if you want, we could start with just a minimal commitment and say, for the next couple of weeks, I'll have you assigned to it; take some cracks at it, ask questions. If you decide after the two weeks that, hey, I don't have time for this (because of course you're going to get into Christmas then anyway), then just let me know and we can always reassign. I plan to operate as the backstop for anything that nobody takes on, or if people get stuck, because I do have an idea of how to do this thing; it's just yet another thing to do.
@01:06:43 - Levi Naden Yeah. There's only one of me. So I get, I very much get that.
@01:06:48 - David Dotson And so this would be an opportunity for you to get familiar with the codebase, which could then open you up to future opportunities to contribute, because you'll have a clear idea of how it works. All right, sounds good. Okay, would you mind if I assign you to it? And then I'll chat with you.
@01:07:12 - Levi Naden Then this just supersedes the email I left in our inboxes last week. Oh, you left an email? I'm so sorry if I missed it. Yeah, I shot an email that basically said just a reminder, so it could just sit in our inboxes until after Thanksgiving.
@01:07:33 - David Dotson Oh, got you. I'm so sorry.
@01:07:38 - Levi Naden There wasn't an expectation of response to that.
@01:07:42 - David Dotson Yeah, so I had an action item for myself to put things together with you in mind; this is one of them. So no worries. Cool, thank you for taking that on. I'll go ahead and add that to In Progress, if you don't mind me doing so. And yeah, we'll be in touch then.
@01:08:02 - Levi Naden I'll let you get on with your day.
@01:08:04 - David Dotson Cheers. Cheers. Thank you. |