Participants
Jenke Scheen
Levi Naden
Goals
JW + JC : Sketching out planned submission operations, personnel-time requirements, submission prioritization:
What will actual submission operations look like?
Will we provide submission support/debugging (and to whom)?
How will priority be assigned to submissions?
DD : fah-alchemy - current board status : Phase 1 - MVP, focused on 0.1.0 milestone, with outstanding issues required for deployment:
https://github.com/openforcefield/fah-alchemy/milestone/3
Still driving for 11/29 deadline for MVP
Test suite coverage at 81%.
Now have authentication on both user and compute APIs, abstracted as separate FastAPI APIRouters (see the sketch after this list)
Both user and compute clients feature the same authentication machinery, but different authentication entities
Have ProtocolDAG execution in SynchronousComputeService; hitting issues with serialization of ProtocolUnitResult outputs; would like help from David W.H. Swenson
DS : fah-alchemy CLI - current state and next steps
DD : help wanted - deployment issues : https://github.com/openforcefield/fah-alchemy/issues?q=is%3Aopen+is%3Aissue+label%3Adeployment
IP : Nonequilibrium Cycling Protocol (perses #1066) update: https://github.com/choderalab/perses/pull/1066
DD : requirements for using OpenFE REPEX protocol for initial testing alongside Perses NonEquilibriumCycling?
IA : protein-ligand-benchmark : blockers and priorities
https://github.com/openforcefield/protein-ligand-benchmark/issues/81 - Adding missing thrombin entries and then good to go
https://github.com/openforcefield/protein-ligand-benchmark/issues/68 - Waiting on #81
https://github.com/openforcefield/protein-ligand-benchmark/issues/69 - Waiting on one of the networks to return (PR is up)
https://github.com/openforcefield/protein-ligand-benchmark/issues/77
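A minimal sketch of the authentication layout mentioned in DD's board-status item above: one piece of token machinery reused by both the user-facing and compute-facing APIs, mounted as separate FastAPI routers so that a user credential is rejected by the compute API and vice versa. All names here (create_auth_router, decode_token, the /whoami route) are illustrative assumptions, not the actual fah-alchemy code.

```python
# Sketch of shared auth machinery mounted on two FastAPI routers.
# Names (create_auth_router, decode_token, /whoami) are hypothetical;
# the real fah-alchemy layout may differ.
from fastapi import APIRouter, Depends, FastAPI, HTTPException
from fastapi.security import OAuth2PasswordBearer


def decode_token(token: str) -> dict:
    # Stand-in for real JWT decoding (e.g. with python-jose); a real
    # implementation would verify the signature and expiry.
    return {"sub": "example-identity", "entity_type": token.split(":", 1)[0]}


def create_auth_router(entity_type: str) -> APIRouter:
    """Build an APIRouter whose routes require a token for `entity_type`.

    The same machinery is reused for both routers; only the entity type
    (user vs. compute identity) differs, so user credentials cannot be
    used against the compute API and vice versa.
    """
    oauth2_scheme = OAuth2PasswordBearer(tokenUrl=f"/{entity_type}/token")
    router = APIRouter(prefix=f"/{entity_type}")

    def get_current_entity(token: str = Depends(oauth2_scheme)) -> dict:
        claims = decode_token(token)
        if claims.get("entity_type") != entity_type:
            raise HTTPException(status_code=401, detail="wrong credential type")
        return claims

    @router.get("/whoami")
    def whoami(entity: dict = Depends(get_current_entity)) -> dict:
        return {"identity": entity["sub"], "entity_type": entity_type}

    return router


app = FastAPI()
app.include_router(create_auth_router("user"))     # user-facing API
app.include_router(create_auth_router("compute"))  # compute-facing API
```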
Discussion topics
Item | Notes
---|---
JW + JC : Sketching out planned submission operations, personnel-time requirements, submission prioritization |
DD : fah-alchemy - current board status |
DS : fah-alchemy CLI - current state and next steps |
DD : help wanted - deployment issues | https://github.com/openforcefield/fah-alchemy/issues?q=is%3Aopen+is%3Aissue+label%3Adeployment
IP : Nonequilibrium Cycling Protocol (perses #1066) update |
DD : requirements for using OpenFE REPEX protocol for initial testing alongside Perses NonEquilibriumCycling? |
IA : protein-ligand-benchmark : blockers and priorities |
Transcripts | F@H interface meeting - November 22 @01:54 - David Dotson Okay, thanks folks. Can everyone hear me? Yep. Yep. Sweet. Thank you. Okay, I'll drop the meeting notes link in the chat. Jeff, would you mind taking notes? @02:09 - Jeffrey Wagner Sure. @02:10 - David Dotson Diego, I think you have. Do you have fathom running? You do. Okay, good. Thank you all. We'll go ahead and show in the agendas on the left before we. Get into the first item. Is there anything folks want to add? Or modify. Okay, if not, go ahead. So Jeff and John. This is your item. This is sketching out what submission operations look like. Personal prioritization. Do you want to lay it out? @02:51 - Jeffrey Wagner Yeah, sure. I can. I can open that. This is something we kind of want to figure out is for the falling at home submission pipeline. I think we had sort of been planning on something like the current QC submission, QC data set submission style repo. And so I don't know if people have seen that before, but it's a repository where what's really happening is, you know, there's a live server at MULSE and it's hosting a database and we have a token so that we can access it and write to the database. And by writing to that database, we're putting new jobs in the queue for computation on like a variety of compute workers. And so the submission side is handled using GitHub pull requests. And those are, you know, the person, each new data set, there's a, there's sort of like a compressed file that contains the actual submission and then any sort of metadata like the submitter, the purpose, the details of how the data set was prepared is included in the policy. pull requests. So there's like a Jupyter Notebook and some other files like structure files, text files, workload files that specify exactly what was done to prepare the data set. So this Pure gets opened and then somebody who maintains the QC data set submission repository reviews the submission and says, okay, you know, is this detailed enough? Is it a formatting good and we have some CI that checks the formatting as well? You know, in order do we have any objections and what priority is it's going to run at? So we've got a bunch of different data sets. What priority will go in? So what I wanted to figure out here is who will be doing that kind of work? I think we anticipated for the following at home site or like the OpenFF side, we anticipate it being David for a little while while he worked out sort of the submission automation and the good practices. But then for OpenFF we were planning on having our reviewer be a newly entered grad student in the Mobley Lab who has some experience with simulations and the file formats and the data formats that we need. But what came up the other day was that we're not sure if we're gonna have multiple people submitting jobs to fully get home with different credentials, one for each organization or if all of our organizations are gonna share one submission credential. And if so, where's the person time gonna come from to be the singular reviewer? And in cases of dispute, like if we somehow do have some shortage of time or if we only have, if we have to order the data sets, like if we have to prioritize them, how are we gonna decide that? So that's the intro to the question. Does anybody have any, is that enough context for people to understand what we're looking to solve here? 
@06:01 - John Chodera (he/him/his) You've described the, sort of how it works right now for QC fractal, but the requirements that I think started with the new when you enumerated a while ago is that you mainly intend to benchmark either release candidates of worst fields and actual force fields. It against the protein ligand benchmark set is that right. @06:20 - Jeffrey Wagner And the, yeah, or are there other experiments that you envision doing outside of the protein ligand benchmark set. I think it would be reasonable to say that yeah the stuff that we want to benchmark against would just be the entire or the entire protein ligand benchmark set or a subset of the protein. @06:44 - John Chodera (he/him/his) And it's not only against various either, you know, builds of the force field that are released candidates or experiments or actual releases. And then the goal of that repository is to able to provide automation for the submission and to provide providence. and metadata associated with that. Is that appropriate? @07:05 - Jeffrey Wagner Yes. @07:06 - John Chodera (he/him/his) Cool. So I mean, that does significantly limit the scope, which is great. And it seems like it should be fairly straightforward to engineer that, but it requires a bit of engineering that has to happen after we debug and test and shake down the whole infrastructure. And your nearest term deliverable, as I understand it, is just a benchmark of the Rosemary build, is that right? @07:29 - Jeffrey Wagner Yeah, benchmarks of Rosemary against Sage. Yeah, Rosemary against Sage against GAF, I believe. @07:36 - John Chodera (he/him/his) Against GAF. OK. So would it be possible to say, OK, we can develop a roadmap for how to get to that automation and repository organization that would allow you to do all of that. But the current near-term deliverable is just the infrastructure for running these is ready, but we haven't automated any of these. We haven't automated any of the setup infrastructure, so we can do that through the same. infrastructure that we're going to be using, like basically a few one-off scripts to submit and run some calculations for the code moonshot slash ASAP stuff to begin with. So we can at least get going and testing and shaking everything down initially. But then we have to identify like who's going to build the infrastructure. And I presume that we'll end up in Dotsn's queue for building that as well as managing that. But for right now we can get started by essentially Gankit and I and a couple of other folks working on initial submission scripts that would do what you need to do because we need to do the same thing. @08:41 - Jeffrey Wagner Yeah, I think that answer is part of it. I mean, so if OpenFF has its special interface where all we change is the force field. But then yeah, like ASAP is submitting a bunch of stuff and and folding, you know, we're throttled to some extent and only some of it can go through full. at home at once? How will we resolve priority conflicts on that? @09:05 - John Chodera (he/him/his) I think, you know, we sort of need to fair share at the moment. Is that all right? Where we could say maybe get equal, I mean, then if we use a single work server, we get one slice, one uniform slice of falling at home. And I think we're intending to share a single work server to begin with at least until we realize that we've had grown it and have the resourcing to split that off into separate servers. 
But, you know, our calculations will be very bursty and your calculations will be like a lot more throughput for this matrix. And so we, and I'm sure we can just find a way to prioritize it where, you know, we can alternate essentially because we'll have very bursty workloads that much of the other throughput can go through. And we're talking about calculations that should complete in under a week. So it's not like there's a month of work to be done here. We think that it'll be on the week time scale to complete. each batch of calculations. @10:04 - David Dotson Although that being said, John, I think, ASAPs use of, so ASAPs deliverables are more time sensitive, probably, right? But for running things through the system, then we don't want them sitting behind a whole bunch of other stuff. @10:16 - John Chodera (he/him/his) So... We do have very tight timelines, right? So I would, you know, if possible, we would like to prioritize the ASAP work, but, and then backfill with the, but for field work, if there's not a huge time deadline, but of course, if there is a time deadline, like a advisory meeting coming up, then of course we can switch that or easily split equal allocation. @10:40 - David Dotson So we can do this with weights. At the moment, we have weights implemented at the chemical network level, but... Volcomy has a concept of scope, and scope is a triple of org campaign project. We don't have a concept of weights on those yet, but we could implement such a thing. So it's... kind of like weights all the way down, which might be a good system of raw, because it just makes it clear that you're doing this, you're doing a choice selection based on weight at the top and then good. And this is what the compute service is doing. So it's choosing based on right away and going all the way down there. @11:19 - John Chodera (he/him/his) And that way you get I believe you also have priority in there as well. @11:22 - David Dotson I wanted to bring this up. I don't have it on the agenda, but it is a I can do this offline as well. Okay, but this is something for I mentioned this to Jeff yesterday, or two days ago, yesterday, of dropping task queues and switching to the concept of hubs instead. So you can read this if you want, I'd like some feedback on this issue, because it would simplify our instead of having cues as linked lists as the topology and how these things are implemented. It would simplify to just we have concept of task hubs and then tasks are associated with that. And a given of chemical network has a single task hub. That's how you action tasks to be executed for that network. And so, and that would fit well into that waiting scheme that you wanted. So you wanted, you had this idea of, should we, because we have weights on task, cues at the moment, and then a task queue is a queue. It might just be better to do, there's weights on task hubs. And then subscribe tasks also have weights attached via their relationship. And so it's just weights all the way down makes it real simple as a scheme. @12:41 - John Chodera (he/him/his) So given the ability to do something like that, then, I mean, until we realized that there's a con, or what if until we recognize that there's a real conflict in terms of priorities and throughput that we say, you know, we just have kind of equal allocation between ASAP and open force field. And within each one of those, we can do weighting on. projects in a way that makes sense in terms of priorities. 
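A rough sketch of the "weights all the way down" selection DD describes: a compute service first draws a task hub in proportion to its weight (e.g. the ASAP vs. OpenFF allocation), then draws a task within that hub in proportion to task weight. The class and field names below are illustrative only; they are not the fah-alchemy data model.

```python
# Two-level weighted task selection: hub first, then a task within the hub.
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    weight: float = 1.0


@dataclass
class TaskHub:
    network: str          # the AlchemicalNetwork this hub actions tasks for
    weight: float = 1.0   # hub-level weight, e.g. ASAP vs. OpenFF allocation
    tasks: List[Task] = field(default_factory=list)


def claim_task(hubs: List[TaskHub]) -> Optional[Task]:
    """Weighted draw of a hub, then a weighted draw of a task in that hub."""
    populated = [h for h in hubs if h.tasks]
    if not populated:
        return None
    hub = random.choices(populated, weights=[h.weight for h in populated])[0]
    task = random.choices(hub.tasks, weights=[t.weight for t in hub.tasks])[0]
    hub.tasks.remove(task)  # claimed tasks leave the hub
    return task


# Equal allocation between two orgs, with per-task priorities inside each:
hubs = [
    TaskHub("asap-network", weight=0.5, tasks=[Task("neq-cycle-1", 2.0)]),
    TaskHub("openff-network", weight=0.5, tasks=[Task("repex-edge-7", 1.0)]),
]
print(claim_task(hubs))
```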
And then if there's a conflict where we're not getting enough throughput, then we can address the prioritization issue about which tasks have priority when we need to. @13:18 - Jeffrey Wagner Okay, yeah, that will work for me. I think, so yeah, I would be fine with not worrying about that until it does become an issue. And then the other question is for the manual work involved, I think both part organizations will probably hit a point where we submit something and open force fields like, oh, we really need this force field out or ASAP. So like, oh, we really need this data set, but there's some particular technical problem that needs to be debugged. You know, this is something right now that de facto in open force fields experience de facto has fallen to the QCA data set, the submission people. They. just end up providing free debugging support and sometimes that extends into days or weeks. I would like to make sure that the spending of those days and weeks is, you know, we know, we know where it's going to come from and we've agreed on maybe how much time we can, we can maximally spend or something like that. You know, so it could be each organization debugs around stuff or it could be that we share like one pool of submitters who gain the expertise but also are responsible for debugging stuff. Do you have any initial thoughts on that? @14:45 - John Chodera (he/him/his) I don't know, David, do you have any thoughts about that? I mean, you've been involved with the KCC submit folks for a while. So, I don't think we... @15:00 - David Dotson We have a, yeah, I don't have a clear enough idea to be able to say, you know, here's exactly how we should do it at this time. I mean, we're obviously going to have to have an owner who's. Is responsible for getting things in and building up that infrastructure around a repo for provenance and like what went in for open FF. Like I said, I can be that point person for the initially. No, I want to. I want to offboard that to somebody else. @15:28 - John Chodera (he/him/his) And we have Hugo incoming that can take up that responsibility on the infrastructure side and on the submission side. We have technically Yinka is designated with that responsibility for us. What about for open forest field. @15:42 - David Dotson So I think it's a person that Jeff mentioned. @15:45 - John Chodera (he/him/his) The graduate student who's maybe joining the lab in April or will be joining the lab in April. @15:52 - Jeffrey Wagner Yeah, I think highly likely in April. So initially it would be initially it would be David Dotson. And then in April, we transition over to. to Megan. Yeah. And David, is that very simple to you or is that not? @16:05 - David Dotson Yeah, no, I think that sounds reasonable. I mean, like I said, I think it'll be important to dog food this for a little while anyway. And so that way by the time they come on board, we have a pretty well established process that they can just adopt instead of having to also engineer it, which I don't think there will be in the position to engineer. They're a fresh grad student. So yeah, I think this will work out fine. @16:32 - Jeffrey Wagner Okay, cool. @16:33 - John Chodera (he/him/his) Again, we're bringing on Hugo starting in January at 50%. So he should be able to help both projects from the infrastructure side. @16:49 - Jeffrey Wagner Okay, that's cool. Yeah, I just want to I'm in a thing for just one second about how much we want to prepare for this. But yeah, if we end up. 
getting one of these multi-week monsters and suddenly an organization falls behind on their roadmap because they're missing a person fixing the multi-week monster. I want to make sure we've prepared for that or like we have like an off switch. @17:12 - David Dotson Yeah, we can always reset weights. That's the concept is that if we need to reweight things or up ASAP's weights on their various things are computing, then that would make sure that they get more resource allocation. @17:26 - Jeffrey Wagner That would be the those who are in house for that. I mean like a multi-week debugging monster before it even hits before it even gets to computing. @17:33 - David Dotson Oh, oh, I see. Got you. Okay. I think over time we'll have we'll be able to, just like we have with QC data, that's a mission kind of a cornered of folks who know the landscape well enough, right? And have built up experience with building about chemical networks and making sure things are working. We can also build up like we've. done with QCIDIS and submission. We can build up validation machinery so that it checks things on input, right? There's a lot of machinery we can build that helps us out here that reduces the review burden. Does that make sense? @18:23 - Jeffrey Wagner Yeah. Okay. So it sounds like the plan is kind of no plan, but maybe that's okay until we know the shape of these problems better. There are three organizations that have to benchmark the protein ligand benchmark data set. @18:48 - John Chodera (he/him/his) So it does behoove us to coordinate that at least for open force field 2.0.0. Yes. page. So, you know, if we have to run that once, then we can use those scripts in the short term and then designate someone with the bandwidth to become the infrastructure developer for this Providence repository. And that'll still probably be David once the bugs are checking out, right? But yeah, but I think you're right, like the ongoing maintenance burden will have to have someone whose real responsibility it is to help debug submission issues. And we'll be more bandwidth than just David at that point. @19:36 - Jeffrey Wagner Yeah, I wonder if we can have a pool where we either put people in or money in and it could be like a quarter of a developer per year. And that responsibility would be on somebody to be like the primary submission debugger, but for, you know, about a quarter. of their time, not too much more, not too much less. Richard, does this sound like something that you would want to participate in or are you guys pretty sure that you won't be using the folding at homestack? @20:17 - Richard Gowers I think we can't use the folding at homestack because the degree of freedom we're interested in changing the benchmarking is the method and that's the thing which is set in stone for folding at homestack, I understand it right. @20:32 - John Chodera (he/him/his) It's not set in stone but it does require some some development in order to use some of the features, some of the methods that you might want. For example, if you're using GrowMax for stock for energy calculations then all of those capabilities are already in GrowMax and you just need to write out a tpr file and write it on folding at home, right? So that anything that you want to do with GrowMax is already handled for using OpenMM, there might need to be some C++ code development done to do, for example, exchange, but it's relatively minimal. And Hugo will be taking up the C++ core development. So it would be very reasonable to do that. 
The non-glare-recycling, for example, already exists because it's just an integrator. @21:16 - Richard Gowers Yeah, that would then mean I have to write every method twice, right? Because it had to write a folding at home one, then a non-folding at home one, because the one that... @21:25 - John Chodera (he/him/his) Yeah, but that was your decision by having protocol to entirely encapsulate things and not be separated from engine. @21:30 - Richard Gowers So that's your choice. Yeah, sure. So maybe Gromax might work, but I'll have to check how Gromax compiles for the folding at home core version, but it just doesn't sound like it's going to be useful for us for testing that. @21:44 - John Chodera (he/him/his) It's just Gromax. It's Gromax that takes it to TPR for now. @21:48 - Richard Gowers I know there's a load of if-defs in there when it compiles with the folding at home, and I'm not sure what those if-defs do in order that Gromax did other things. @22:00 - David Dotson So, hey. John, I know that the cores, I think, is the Gromix core also a private repo? @22:09 - John Chodera (he/him/his) It is, but there's a, yeah, this is just because of the wrappers that need to happen. But I mean, Eric Lindel was leading the development of this for a while. And I think there's now a new core maintainer that it brought on. Secret will know who that is. @22:25 - David Dotson I'm just trying to look for an opportunity to Richard, if you're still interested, you know, like to at least give you some way to engage. @22:36 - Richard Gowers Yeah, I'd like to talk to a secret. I think maybe you need to go do an audit of what Gromix is capable of doing on falling at home and seeing if that is enough for us. But I can't make a decision on that today. @22:51 - David Dotson An alternative as well as as we've been, as we talked about previously, where. The Folding at Home Compute Service is just one of the compute services that was compatible with this system. And so we also intend to be able to run on clusters like LILAC, Pacific Research Platform elsewhere, like that should be doable. So in a similar way that we're handling with QC fractal. We can throw workers on traditional compute to run traditional protocols. So that could give us some capacity as well to meet some of your benchmarking needs. @23:39 - Richard Gowers Okay, I think the best you're going to get out of me today is a mate. @23:42 - David Dotson Like I'm going to look into. No, I'm just giving options. @23:47 - John Chodera (he/him/his) And aside from the option of particular engine, there was still the interest potentially in developing better network planning strategies where running at scale could be very useful. To generate all versus all. @24:01 - Richard Gowers Yeah, yeah, that's more of a fun project rather than something we're actually meant to be doing. That's something more I wanted to make sure it was capable of one day, but yeah, it's not a thing I'm even meant to be doing this year. @24:24 - David Dotson Thank you, Jeff. Do you have what you need on this discussion? @24:27 - Jeffrey Wagner Yeah, I think I'll say. At least through. I want to make a guarantee. And I think that guarantee is something like open force field will pay for like 10 developer hours per week. Towards this shared infrastructure for the first half of 2023. So that will be dated up until April and then it will transition over to Megan service. quality with Megan will probably be a bit lower while she's getting up the speed. 
John, are you comfortable making a similar commitment with something like Hugo's time or like can you commit to like 10 hours a week of someone's time as well? @25:15 - John Chodera (he/him/his) Absolutely. But we mean, I mean, we need to make this shit work. So we will likely spend as much time as we need from whatever resourcing we, we have to make it work because we need it. Okay. @25:27 - Jeffrey Wagner Yeah. Yeah. Yeah. I'm fine if it, I think initially we can expect it to exceed that a bit. I just want to make sure like somewhere down the road, it's just not your team doing all the workers, not all our team doing all the work for the other team submissions or something. I want to make sure that there's like a roughly equal split here. @25:44 - John Chodera (he/him/his) As long as the scope doesn't diverge too much, I think I would be happy with that outcome. If we were doing all the work to make it work because the, you know, we need to make this infrastructure reliable enough to robust enough to support ASAP needs. So that means that it should support. or your needs to as long as they're not totally at a scope. @26:05 - Jeffrey Wagner Okay. All right. Yeah, today I'm going to commit to 10 hours a week of open FF time for the first half of 2023 and then we can revisit it as we see kind of how the workload shakes out later. Thank you. And I'll record John as having said, yes, probably more than that. @26:36 - David Dotson Okay, thank you. Any questions for Jeff? Okay, thank you. We'll move on to board status. So, we've currently got David Swenson working on CLI components. So thank you for that, David. We got your first PR in our just last week. I'm working on the synchronous compute service. I know Avon is working on monically room cycling. And then also I'm working on the client components. In the last week, we've got test coverage up to 81% up from, I think, the 70s. We also now have authentication on both user and compute APIs. This has been abstracted as a separate fast API router, so we're using the same machinery. Both user and compute clients also feature the same authentication machinery. They feature different authentication entities, so there's no way for a user credential to be used on a compute API and vice versa. And we also have testing in place to test that authentication is denied for. On credentials, we also have testing in place to show that the token machinery is working as expected. The J-… It's packed to jump up in a quick start. I also now have protocol DAG execution working actually in the synchronous compute service. But I'm hitting issues with serialization of protocol unit result outputs. David Swenson, I'd actually like to work with you today if possible on getting past this, because I think you've worked on some JSON encoding stuff. @28:24 - David Swenson Yeah, and we have all that that should be in Goofy anyway, so that should be available to you. It's pretty straightforward, I think, once you do it once. @28:32 - David Dotson Okay, excellent. Yeah, I figured I remember that you had been working on this, so I figured it was somewhere. So do you mind if we meet briefly after this call? @28:41 - David Swenson Yeah, that works. Perfect, thank you. @28:47 - David Dotson So that's my update. I mean, I could show you the nitty-gritty, but there's just plenty going on. So any questions for me? Okay. David, do you want to discuss CLI in current state and next steps? @29:09 - David Swenson Yeah, I mean, there's not too much to say. 
It's, you know, obviously the goal of the CLI is just to be a very simple wrapper so that, you know, there's very little logic in there and most of the logic is in the actual code. We have one thing in now that handles most the database stuff and I'm working on one that will launch the APIs. It works for launching the APIs, launching the services. What is, what I haven't done yet is testing. I was having a little bit of trouble getting, I was trying to still do it in the single process which is a little bit of a mess and I think I'm not going to be able to do that but that means that error messages that come out of it will be a little bit messier. So the testing parts will be of a challenge still and I also need to add in some more parameters so that you can have as options to the. CLI things like your authentication. So it doesn't have to come from an environment variable. Right now it has to come from an environment variable. @30:08 - David Dotson Okay, got you. Yeah, it may make sense to get that as just as it has to because I think it's better as a practice to set these as environment variables, because if you set them as command line arguments and they show up in the process list. So it makes it makes it possible for an attacker who's landed on the host to then pull credentials out of just PS. @30:27 - David Swenson So, fair enough, but I guess the question is, are there any other parameters other than just sort of the launch parameters, which is the host and and port and number of workers that you're going to want to. Are there any other parameters that you would like exposed up to command line level? @30:45 - David Dotson You might want to add one option. I can also put this here. This would be the JWT security. Okay. To be generated on startup. This is fine if you're running just a single API instance. It's not fine if you're running many because you don't want them all to have different secret keys. You'll want them all to have the same one. But it would make it so that the user doesn't have to do an open SSL, ran, whatever, and then stick it in there. It removes one step. That's the only thing I can think of, though. I think otherwise you've got it covered. @31:30 - David Swenson Yeah. And that's, so again, I've launched them locally and then seen it works in a browser at least. I get the returns that I expect out of the API. So it's working. It's just that you need to testing its left. @32:00 - David Dotson Cool, thank you. Does that make sense? Yeah, yeah. Okay, cool. Now, thank you for taking this on. Is there anything else you wanted to air in this forum? @32:15 - David Swenson Nope. Have a look. @32:17 - David Dotson Awesome. Are there any questions for David? @32:27 - John Chodera (he/him/his) Will there be documentation for the launch API somewhere that will be easily accessible? @32:32 - David Swenson It is easy to create. If you mean in terms of just the CLI documentation, it's easy to add that to Sphinx. It dumps out all of it. If you use the Sphinx click or clicks Sphinx, one of these things, it will output all the documentation automatically. @32:54 - John Chodera (he/him/his) So, we just need it straight forward, Dad. Yeah. Great, great. All right. @33:01 - David Dotson Yeah, and I didn't cover this, but I can just point out that. So for release 1.0, we're prioritizing just what's needed to get the MVP out. So this CLI bit is a key part of deployment. We also have two other deployment issues that I'd like to get my carry on. 
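A sketch of the launch command shape discussed here, assuming a click-based CLI (the docs discussion mentions sphinx-click): host, port, and worker count as options, credentials read from environment variables so they never appear in the process list, and an opt-in flag that generates a JWT secret at startup for single-instance runs. Option and environment-variable names are hypothetical, not the actual fah-alchemy CLI.

```python
# Hypothetical launch command: settings via options, secrets via environment.
import os
import secrets

import click


@click.command()
@click.option("--host", default="127.0.0.1", help="Interface to bind.")
@click.option("--port", default=8000, type=int, help="Port to bind.")
@click.option("--workers", default=1, type=int, help="Number of worker processes.")
@click.option(
    "--generate-jwt-secret/--no-generate-jwt-secret",
    default=False,
    help="Generate a JWT secret on startup (single-instance only; "
    "multiple API instances must share the same key).",
)
def api(host: str, port: int, workers: int, generate_jwt_secret: bool) -> None:
    """Launch the user-facing API service."""
    if generate_jwt_secret:
        # replaces the manual `openssl rand -hex 32` step for a single instance
        jwt_secret = secrets.token_hex(32)
    else:
        # credentials come from the environment, not argv, so they are not
        # visible to other users via `ps`; variable name is hypothetical
        jwt_secret = os.environ["FA_JWT_SECRET_KEY"]

    config = {"host": host, "port": port, "workers": workers, "jwt_secret": jwt_secret}
    click.echo(f"starting API on {host}:{port} with {workers} workers")
    # ... construct the FastAPI app with `config` and hand off to uvicorn here ...


if __name__ == "__main__":
    api()
```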
I just talked to him over Slack, but I need to check if he's responded. For milestone 0.2, that would be our push for docs, basically. We're prioritizing getting the system up and running and deployed first, and then in the next release, focusing on building out all the documentation for it. Is there any objection to this approach? As long as there's a good example, we can edit and work from, I think that's fine. @33:49 - John Chodera (he/him/his) Okay. @33:50 - David Dotson Thank you. Yeah, and I still owe you both you and Yenka notebook as a gist or a better yet as just committed to the repo of like, here's the user API. Here's what it looks like to use the. user client. That's something I didn't get to this last few days. So I actually add that as just an issue here. I'll add details in the event. Thank you. Any other questions for David? Thanks, David. I did mention this just now. I'd like some help on deployment issues. These are tagged with deployment in the zero. One milestone. Um, I did want to tap my canary for this, but he couldn't make today's meeting. Uh, I think he, he, he said he'd be happy to help. @35:11 - John Chodera (he/him/his) Uh, so just reach out to him. @35:16 - David Dotson Okay. I just haven't gotten a affirmative yet. Okay. Um, I'm going to go ahead and just assign him to this. So okay, John. @35:24 - John Chodera (he/him/his) Yes. @35:30 - David Dotson Well, and then I'll also assign him to 25. That just, that's what followed from 24. Cool. @35:47 - Jeffrey Wagner This is me. I remember you were maybe open to sporadic work. Is this a time that would fit into your schedule? And David, would it be, would it be helpful to get Levi up to speed on this? @35:58 - Levi Naden Uh, depends on what it. because you're needing me to do, but probably we'll see. Okay, I'm happy to talk about it either way. @36:08 - David Dotson Okay, what would you be interested in doing? @36:11 - Levi Naden What do you specifically do you think needs most help? Like where are you lacking things right now? I can probably fit in there. @36:19 - David Dotson That's a good question. Things that are easily separable are the deployment picture pieces. Like I just assigned to my Kenray. So those would be the easiest entry points, I think, because they don't depend on a bunch of other development pieces. I'll have to give some thought if there's any other separable items. @36:39 - Levi Naden Yeah, let's chat. Let's chat probably after Thanksgiving. @36:43 - David Dotson Okay. @36:44 - Levi Naden Or that works. Pepper time and we'll figure that out. @36:48 - David Dotson Okay, great. Thank you. @36:52 - Levi Naden Awesome, thanks Levi. Welcome. @36:57 - David Dotson If you don't mind me asking Levi, what kind of interest do you have in the system? @37:02 - Levi Naden So either deployment or anything to help with the robustness of the system and also work deduplication. So anything in that space, I haven't fully looked over all of what needs done. So I can't give you a better clear answer, but I can help with most any of the bits, especially in the non scientific categories. For like really technical new nuanced and also pain in the ass bits of deployment and what not. @37:37 - David Dotson Okay. Okay, cool. I'll keep that in mind. Like I said, I'll expand some issues out then. Yeah. There's probably some separable items that we can we can get you on sooner than later, but you said after Thanksgiving, which would be. 
@37:51 - Levi Naden And probably after Thanksgiving because otherwise I just have tomorrow and rest of today. @37:55 - David Dotson So, of course. Thank you. Yep. Okay, any other any questions for me on deployment issues? Okay. Yvonne, do you want to give us an update on non-equilibrium cycling and purses? @38:18 - Iván Pulido Yep. So. Basically, I finished writing all the settings as goofy settings objects. I'm using just the base protocol settings. I didn't see any way to use like the other settings objects, like thermal settings and integrator settings or first field settings. So I'm basically using the same approach that Richard used for the for his repX protocol. I got the simulation unit work. with that. Now I have to make this actual protocol run. I'm facing some issues with the... Let me see what's the issue. Oh, actually, I don't know what the issue is, so maybe you can help me. Can I share the screen? SCREEN SHARING: Iván started screen sharing - WATCH Sure. Yep. So, basically here I'm creating the systems. I'm using... Sorry? @39:40 - David Dotson Can you zoom in a little bit? @39:42 - Iván Pulido Oh, yeah, sure. Yeah, yeah, yeah. I'm creating the systems from the goofy objects. I'm using a copy and paste code from Richard's League and Item | Presenter | Notesmapping that I think we should have in some centralized repo or... I buried that such that we don't have to, yeah, recreate the same code. I'm creating the mapping from it. I'm creating the setting subjects with the defaults values and I'm running the simulation unit here and it runs fine. You can see the result here. I'm gonna try running the protocol with the objects. I don't get any like error and it just says it's not okay. So I don't know how to debug this. @40:40 - David Dotson Yeah. So, deck result has, you can grab, if you go down to the sub below. Yep. Do deck result. Dot hit 10. @40:53 - Iván Pulido That's sorry, what? @40:56 - David Dotson If you hit the time key. @40:57 - Iván Pulido Oh, yeah. @40:59 - David Dotson Pretty cool. Oh, unit failures. @41:02 - Iván Pulido Oh, okay. @41:04 - David Dotson Yeah. So you have at least one in there, you can grab it out and then you can. If you go ahead and select it with open bracket zero. And you can interrogate the trace back. I think it has. @41:21 - Iván Pulido Oh, good. @41:24 - David Dotson You can print that. @41:26 - Iván Pulido Okay. Okay. Oh, okay. Right. Thank you. @41:34 - David Dotson But that's how you can interrogate these things. @41:36 - Iván Pulido Okay. Yeah. @41:37 - David Dotson That helps. You also have a, I believe we merged it. Or did we David Swanson and goofy. Did we merge the ability to run execute with where it raises the errors? @41:50 - David Swenson I don't think that's merged yet, but that is coming. So that. Yeah. @41:54 - David Dotson Yeah, that'll just be easier, but that's how you can do it. @41:57 - Iván Pulido If you need to. Okay. Thank you. Yep. Yep. That helps. So yeah, so that's where I am. Hopefully I can finish this today. And yeah, I have it working. @42:10 - David Dotson Okay. You did mention that you weren't able to use the settings. @42:18 - Iván Pulido Yeah, I didn't give it much effort, but so the models are based on the same. Yeah, like very basic quantities or objects. So just float in this. And. And in my case, it makes more sense to use like units quantities like either open F or open a man. 
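The debugging recipe David walks Iván through above, collected into one helper: pull the failed unit results off the ProtocolDAGResult and print each stored traceback. Attribute names (ok(), protocol_unit_failures, traceback, name) follow the call discussion and should be treated as assumptions about the gufe API rather than a checked reference.

```python
def report_failures(dag_result) -> None:
    """Print tracebacks from failed ProtocolUnits in a ProtocolDAGResult.

    Assumes gufe-style attributes as discussed in the call: `ok()`,
    `protocol_unit_failures`, and a `traceback` string on each failure.
    """
    if dag_result.ok():
        print("all units succeeded")
        return
    for i, failure in enumerate(dag_result.protocol_unit_failures):
        print(f"--- failure {i}: {getattr(failure, 'name', '<unnamed unit>')} ---")
        print(failure.traceback)
```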
So I just use the base protocol settings object with allowing it to have arbitrary how do you call that like arbitrary attributes, I think the side. for the pedantic model. So I can pass pint of quantity objects or open event units objects. @43:09 - David Dotson So okay so these settings object that that isn't goofy now that my canary work done is is pint based I mean it's pedantic plus pint so that does have unit support. Maybe I don't have the up today. @43:31 - David Swenson It sounded like you were using what Richard had done as opposed to what Mike had done. Now the thing that Richard had done was a stopgap. So if you're really looking at the OpenFE protocol that's using the sort of stopgap while we were waiting for Mike to finish up the settings that he developed and we need to actually migrate hours over to those settings. @43:57 - John Chodera (he/him/his) That question is there a standard set of inputs that we can use to make sure that everything is working because it looks like we're also using different inputs that might exercise different aspects of the APIs where it should be a common API, right? Yeah, a protocol is a common API and settings is a common API for all of us. So is there some sort of common test we can use? Common test inputs? Like you folks on your side, the open free energy side must be using some tested inputs, right? @44:31 - David Swenson So I'm not sure exactly what what you're meaning to test here. @44:38 - John Chodera (he/him/his) So in order to test the protocol, right, and the settings that you feed into it, you need to take some input data and actually run it to make sure it works, right? And so if if the question is still about like, you know, how do what does this, does this thing take pint units or not? Or does it take open MM units? It sounds like we're not using the same input data. to feed into this to test these protocols, right? From, in my totally misunderstanding this. @45:05 - David Swenson I think a lot of the actual testing is done based on whatever the default settings are. So the settings are tested independently from that. And if you're using different settings subsystems, they're both tested, but they're both, they're separate systems. That's the, I think that's the issue here. It's actually, there are two different settings hierarchies. One was the temporary one and one is the intended permanent one. In terms of the actual test process for the protocols, I believe we usually use whatever the default settings. We expect that there are defaults for the settings. And so we're just using that. Of course, if two different settings hierarchies have different defaults, you're going to have a problem. @45:49 - John Chodera (he/him/his) Yeah. So you can't feed the settings from one protocol into another protocol, but the settings are not a member of the protocol. They're something that's alongside the protocol. @45:58 - David Dotson So you parameterize a protocol with settings. So, once a protocol has its settings, it's baked in. That's the idea. And so, no. @46:10 - John Chodera (he/him/his) I'm just thinking about namespace organization for these modules. Do you grab them from different places and expect them to be compatible? Or is it just the naming convention that will mean that one settings is compatible with the protocol? Or do you get the defaults from the protocol from a static method from the protocol? Or what is the philosophy behind if they need to be compatible with each other because they're not general? How do we do that? 
@46:39 - Richard Gowers Yeah, so you get default settings from a class method on a protocol. The settings are probably defined in the same module as the protocols. That's how they're sort of named space groups. I don't think settings are compatible with other protocols because obviously... if you're talking about non EQ you can't shove that into a refx. Because they're hierarchical there will be components of settings which are shared. That's not organized very well at all currently because everything's just in one place and so it won't be easy to sort of compose a new one because you'll have to push some things down to a common level. I think that's what Mike started. So you'll want to build off Mike's sort of common. @47:25 - David Dotson Yeah so if I've got it up here on the right so for anything that falls into these buckets. @47:32 - John Chodera (he/him/his) Sorry I think you're not sharing your screen. SCREEN SHARING: David started screen sharing - WATCH @47:33 - David Dotson Oh I'm so sorry. Apologies. I know we've got about 10 minutes left. So this is the settings mod. These are the settings modules in goofy. These are the settings models in goofy. And the idea here is that for a given protocol you'll want to be pulling from each of these buckets right for vendor wall settings, electrostatics, GBSA. force field thermo. And for your protocol, if you, if there's something, some setting that you need that isn't represented in one of these already, then you need to make a subclass of protocol settings here and put in your, your very protocol specific stuff. So I feel like nonic, library of cycling. No, there's some, some pieces like how long to do the nonic, librarian portion versus how long to run the equilibrium portions. That kind of stuff would go into this protocol into your protocol setting subclass. You bind all these things up together into a single settings object. So this is just a container of all of the above. You've, you build this up and that's what you parameterize your protocol with. And so, I've on what you need to do is you need to make a subclass of this guy here, put in anything that's specific to nonic, librarian cycling. That doesn't fit into one of these other buckets like temperature already has a home here. already going to be found in the thermo settings. So in your protocol, when you're trying to access temperature, it would be coming from settings dot thermo settings dot temperature. @49:15 - Iván Pulido Yeah, yeah, that's what I'm doing. The problem is that, for example, right there, you see that temperature is a float quantity. So I have to, I have to know that it expects a float and not an open FF unit, like, you know, 300 Kelvin's. So, and I think that's like, I think that the base object should be able to handle that. That's that. @49:39 - David Dotson I think the flow of the unit. @49:42 - David Swenson Oh, it's in a float. It's a unit that's that's the whole thing that's from, I guess, Mike and Matt were doing in this open FF models. @49:54 - Iván Pulido Oh, but it failed for me. I thought it was okay. I have to reach it. I have to reach it. Okay. It failed. I had to. Yeah, I had to use the protocol settings because it failed using this thermo settings one. But I look into that. @50:09 - David Dotson We definitely don't want to be putting things like temperature into your protocol settings subclass because then that kills the ability to do what we like to do, which is have common places for these very common components, like temperature. Right? 
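An illustrative, pydantic-style sketch of the settings layout David describes: shared buckets such as thermo settings live at the top level of a single Settings container, and only protocol-specific knobs go into a ProtocolSettings subclass, so temperature is read from settings.thermo_settings.temperature. Class names mirror the discussion but are not the exact gufe models, and the field defaults are made up for illustration.

```python
# Sketch of the settings hierarchy: shared buckets plus a protocol-specific
# subclass, bound together in one container the protocol is parameterized with.
from pydantic import BaseModel


class ThermoSettings(BaseModel):
    temperature: float = 300.0  # kelvin; the real models carry units (pint)


class ProtocolSettings(BaseModel):
    """Base class for protocol-specific settings."""


class NonEquilibriumCyclingSettings(ProtocolSettings):
    # only knobs with no home in the shared buckets go here (illustrative values)
    eq_steps: int = 250_000   # length of the equilibrium portions
    neq_steps: int = 80_000   # length of the nonequilibrium switching portion


class Settings(BaseModel):
    """Container binding all the buckets a protocol is parameterized with."""
    thermo_settings: ThermoSettings = ThermoSettings()
    protocol_settings: ProtocolSettings = NonEquilibriumCyclingSettings()


settings = Settings()
# common quantities come from the shared bucket, not the protocol subclass:
print(settings.thermo_settings.temperature)
```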
So we definitely want to get this working. If it's not working, then we should have the discussion. So you have a direct line of mic, right? @50:29 - Iván Pulido Because you would be the expert on it. Yeah. @50:32 - David Dotson If it's because it could be something as simple as we're just using it wrong. And it might be there's just a correct way to do this. And we're you're the first user, right? So this is a it's kind of expected that you might run into things like this. @50:46 - Iván Pulido Right. Yeah. I also wanted to understand the logic of this. So I said, like, OK, in the base case, I'm using the base class. So I know the class. the object works. Now I will try to use the actual like upper classes that makes sense. @51:08 - David Swenson Do you happen to remember what the error that you got was? Because I can very much imagine this is going to require a specific import that is not standard and it's like you have to do a pip install of it. So if your environment doesn't have this thing in there, obviously that's going to be a problem. @51:25 - Iván Pulido I don't remember. I'll try it and let you know and let Mike know. @51:35 - David Swenson It's a good chance that something that's missing from like our requirements because it exists in the testing environment that we have but maybe it's missing from whatever environment you're using this setup. @51:50 - Iván Pulido Okay, yeah, it might be. It might be that. @51:54 - David Dotson It looks like we will have to check your environment but we do have enough of models at least getting pip installed in the environment.aml for your fees. So. But yeah, it could be that simple as well. So I think your best point of contact for this, if it's a settings issue, hit my cup and just ask him because yeah, we wanna get these things wrapped up quickly. @52:15 - Iván Pulido Yeah, yeah, I get it. Thank you. I do have a benefit from all of them, seeing my environment, but okay. Yeah. @52:22 - Jeffrey Wagner Yeah, I think from what I recall, Matt was thinking that Mike wanted to make a release, but Mike hasn't asked Matt to make a release. And so it might just be that there's stuff on the main branch that's not in the conduct package and we'd be happy to make a new one if that's the case. We just need to check with Mike. @52:46 - David Dotson Okay, thank you, Avon. Are there any other questions for Avon? Okay. Got about five minutes left. I did wanna ask this is a bit of risk mitigation. for me in the project is I wanted to ask here Richard what are the requirements for using OpenFE REPEX as an initial testing protocol? I still want to use non-equilibrium cycling, we want to use that production so if on that work is critical but as a hedge if I wanted to run the replica exchange protocol using synchronous compute services on lilac what kind of goshfas could I expect? @53:30 - Richard Gowers It's going to take longer than five minutes but we don't reduce the short answer is we don't reduce the data within the protocol DAC so we analyze the time series in the results for you and not in the end of the protocol so if you're I think a lot of your model is based around pulling down the results via your API and so that won't work because the time series you want to analyze aren't being pulled down you're pulled down. a path to a file that won't be local to you. So, solution to that would be you'd have to add in a new unit on the end of the DAG. 
You could probably do that by subclassing that would essentially do the time series crunching so you don't pull down the process of time series. @54:16 - David Dotson Okay, you said that you just store paths at the end. I mean, that's kind of okay because it's the job of the compute service to turn those paths into something that actually resolves. So, if we end up taking those paths, pushing them to object storage and replacing them with S3 or RIs, then that can still work. That multi-gigabyte files. That's okay. Cool. I think that can work. Okay. All right. I may use that as a test case alongside not only the insight thing for. file can be just to make sure we can handle it. @55:03 - John Chodera (he/him/his) That'd be great. We do want to be able to compare between the two sets of output data to make sure we get concordant consistent results. So that would be very useful. @55:16 - David Dotson Richard, I'll follow up with you on any any gadgets there, but I think I think that's not a blocker. Thank you. Any other questions for me? Okay, you're fine. I know we're coming down to the wire here, but do you want to give us an update on blockers and priorities for protein-like investment? @55:40 - Irfan Alibay Well, very quick. So PR-82 is up for a period. There's some issues with the AI at no one time. So I need to work out why apparently it's a thing. It's funny in complete molecules. And then on starts, virtually, you can sort them with everything else along. I don't know. We have a chance to look at 83 and whether or not those network files were suitable for you. @56:14 - Iván Pulido I haven't run that, but I will this week. I have that on my roadmap for this week. @56:23 - Irfan Alibay And then hopefully once that's done before. John, there was also a thing about. @56:31 - John Chodera (he/him/his) And the files rights. Someone mentioned that they had trouble with amber files, but I don't know what that trouble is. I think it was Tony may might have mentioned this. @56:44 - Irfan Alibay Right. Yeah, I can look, I can look into it after I finish 82. @56:52 - Richard Gowers I think it was really in particular. @56:56 - John Chodera (he/him/his) I'm sorry. @56:58 - Richard Gowers What in particular? summon So is that 10, 7, probably? @57:02 - John Chodera (he/him/his) Yes. So if somebody can reach out to her and identify exactly what her problem was, that would be useful. @57:09 - Irfan Alibay Yeah, I can do that. @57:11 - David Dotson If possible, could you have her make an issue up for it? @57:14 - Irfan Alibay I think there might already be one, but I'll put one up now. I know we mentioned this a couple of months ago. And I think for, so I brought up, whether or not we wanted to make sure that we could read everything in the way for a method called, on priority at the time, was that we weren't going to do that. And we're just going to just constrain the BMM. So I can put one up for a little bit more. Unless we are trying to, we want to change our priority and put it in look like three. @57:45 - David Dotson I think I'd like to not add new scope to a 0 to 3, if possible. Go ahead. @57:54 - John Chodera (he/him/his) I was just going to say, I mean, the whole hope for 0.3 was that we had PDB compliant files. So that meant the So I think the key question is, is it an issue where Ambers somehow requires non-compliant files or is it something else where we don't, we don't, we have an issue with PDB compliance. @58:15 - Irfan Alibay I mean, I'll reach out to Tony. 
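Returning to the REPEX discussion a few lines above: the workaround Richard sketches is to subclass the protocol and append one final unit to the DAG that reduces the raw time series on the compute host, so clients pull down a small summary instead of paths to multi-gigabyte files. The class below is a plain-Python illustration of that idea (it does not use the real gufe ProtocolUnit base class), and the burn-in and uncertainty treatment is deliberately naive.

```python
import numpy as np


class ResultReductionUnit:
    """Final DAG unit: turn a raw per-iteration time series into a small summary.

    In the real protocol this would subclass gufe's ProtocolUnit and be
    appended to the units the protocol creates; here it is a plain class
    to show the idea only.
    """

    def execute(self, timeseries_path: str) -> dict:
        # load the raw free-energy time series written by the simulation units
        series = np.loadtxt(timeseries_path)
        # discard a 10% burn-in; report mean and a naive standard error
        # (ignores autocorrelation, for illustration only)
        production = series[len(series) // 10:]
        return {
            "estimate": float(production.mean()),
            "uncertainty": float(production.std(ddof=1) / np.sqrt(len(production))),
        }
```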
Um, I suspect based on the initial issues we have will lead a couple of months back. It's that teal that's expected non-PDB compliance. Okay, so we know it is, right? @58:31 - John Chodera (he/him/his) But there's a part of the essential required part of the Ember workflow that is supposed to handle that. And if it's a problem with that component, it's like PDB for Ember. If it's not actually working correctly on PDB compliant files, then we can work with the Ember devs to fix that. @58:48 - Irfan Alibay Yeah, so I only know one issue here, which is when it comes to dealing with protonated rescues. It doesn't buy that fighting problem. Um, but yeah, we, I'll work with Tony and see if we can get a full list of. @59:00 - John Chodera (he/him/his) What needs fixing? Great. And then we can work with the Amber folks and get that fixed because that, I mean, it should, that's the whole purpose of that package is to deal with PDB compliant files. @59:11 - Irfan Alibay Yeah, although do we know who the current data should be for Amber? @59:16 - John Chodera (he/him/his) We can find out from Dave Hayes. We can just find out. And I believe one of us already has access to the Amber code base. Is it Mike Henry or? No, he's not. I'm not sure he's here right now. So we'll find out who. @59:34 - Jeffrey Wagner Yeah, I have access to the Amber GitLab, though I haven't used it in a while. I don't, unfortunately, I don't think I have the bandwidth to be the point person on this. So it may be good to establish a secondary contact. @59:54 - John Chodera (he/him/his) Yeah, we can work with Tony with the Amber folks. But I think the key question is, is it important to check out alerts? problem because we're not producing PDB compliant files or is it an amber problem because they are not reading PDB compliant files? @01:00:07 - David Dotson Yes, and if it's the letter, then we need to push it onto them or have some kind of pathway to, so as John mentioned, PDB for amber is something that maybe not our responsibility to make that, but we then pursue that. We would want this repo to have PDB compliant files not to be, not to have amber compliant files. @01:00:33 - John Chodera (he/him/his) Correct. @01:00:33 - David Dotson So our target. Amber needs to make sure they can read PDB compliant files. Okay, your phone, do you feel like you have what you need on this front? @01:00:50 - Irfan Alibay Yeah, if possible, if someone can I will look at he to do at some point this week, that would be great. @01:00:56 - David Dotson Yeah, we currently have reviewers, so as John and Yvonne, John. down and around you have bandwidth for this in the next week or so. I know that Thanksgiving's coming, so this may be hard to do. @01:01:07 - John Chodera (he/him/his) Can you just post the PR link in the chat? @01:01:12 - David Dotson It's PR number 82. Here it is. @01:01:17 - Iván Pulido Oh, other than me trying to run the benchmarks, is there anything else I should be doing for that? @01:01:22 - Irfan Alibay For 82. I don't, so 82 might not be. So 82 won't have the graph edges yet. That's 83. For 82, it would just be, I think, particularly you had a comment on these GZ files and wherever it needed to be there. @01:01:42 - Iván Pulido I wasn't too sure what needed to be done there. I see. @01:01:46 - Irfan Alibay We're going to move those intermediate maestro files to the release artifact. @01:01:49 - Iván Pulido Yeah, I don't think that's been done. | ||||||||||