2024-07-17 POSE/OpenFF GHA runner setup meeting

Participants

  • @David W.H. Swenson

  • Ethan Holz

  • @Matt Thompson

  • @Jeffrey Wagner

Recording: https://drive.google.com/file/d/1QXFBT1pzHLblh1SUFgY5jPkdPSywaXXq/view?usp=sharing

Discussion topics

Item

Notes

Resources

Fireflies notes

Ethan Holz: 00:00
David, it was fascinating. I was looking at the pricing between AWS and Azure today, and their pricing for the same GPUs is exactly the same, but the CPU and RAM specs are completely different.

David Swenson: 00:39
Interesting.

Ethan Holz: 00:41
For example, if you get a V100 on AWS, you'll get eight cores and, I don't know, let's say (this is not accurate) 64 gigs. But if you did the same on Azure, you'd get six cores but 84 gigs. So they give you way more memory on Azure, but not as much CPU compute, which I think is kind of interesting.

David Swenson: 01:07
This is one of the reasons I'm interested in actually benchmarking. Hey Jeff, right there with you.

Jeffrey Wagner: 01:13
Hey David, sorry. I tried to start the Zoom call with my computer closed and it got very sad.

David Swenson: 01:26
We're waiting on Matt, is that right?

Jeffrey Wagner: 01:28
Correct.

Jeffrey Wagner: 01:29
Open.

David Swenson: 01:52
Jeff, are you aware of anybody having done cloud-focused benchmarks with OpenMM? Because as we were just discussing, you can have the same GPU but different system setups, because different cloud providers kind of give you a different default. And we thought it might be a little bit interesting to see if that led to any variance and to really give a price-to-performance estimate for people.

Jeffrey Wagner: 02:22
Yeah, I'm imagining, let's see. Hopefully OpenEye has done it, but wouldn't have published the results. I think the problem is people who would have done these studies would have been people who were financially incentivized to do the studies. And those are basically people who own clouds that they want you to run OpenMM on. And at that point those benchmarks are probably so gamed that they might not be useful.

David Swenson: 02:58
I mean, again, I don't think this is an expensive thing to do. It's one of the reasons we're thinking about doing it, because you don't need to run, you know, days of compute to get an estimate of performance. I know that Peter Eastman has a benchmark set. I'll have to see how long they run it for. But, you know, you need to run it long enough that it's kind of settled in, but you don't really need to run it long enough to do something scientifically valuable in order to get that estimate. So I'm thinking we can probably get it in a few hours per GPU that we're interested in, which then we're talking about a few bucks for most of them.
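
A rough sketch of the price-to-performance estimate described here, assuming throughput is measured in ns/day and billing is a flat hourly on-demand rate. The class and function names are hypothetical, and the numbers in the usage section are placeholders, not measured results or quoted prices.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One short benchmark on one cloud instance type (hypothetical helper)."""
    label: str               # free-form name, e.g. "G instance"
    hourly_price_usd: float  # assumed flat on-demand price per hour
    hours: float             # how long the benchmark ran
    ns_per_day: float        # measured simulation throughput

    def cost_usd(self) -> float:
        """Total spend for this benchmark run."""
        return self.hourly_price_usd * self.hours

    def ns_per_dollar(self) -> float:
        """Simulated nanoseconds bought per dollar at this throughput."""
        dollars_per_day = self.hourly_price_usd * 24.0
        return self.ns_per_day / dollars_per_day


def rank_by_value(runs):
    """Sort runs from best to worst price-to-performance."""
    return sorted(runs, key=lambda r: r.ns_per_dollar(), reverse=True)


if __name__ == "__main__":
    # Placeholder inputs only: replace with real prices and measured ns/day.
    runs = [
        BenchmarkRun("G instance", 0.50, 3.0, 100.0),
        BenchmarkRun("P instance", 0.90, 3.0, 150.0),
    ]
    for run in rank_by_value(runs):
        print(f"{run.label}: ${run.cost_usd():.2f} spent, "
              f"{run.ns_per_dollar():.2f} ns per dollar")
```

Ranking by ns per dollar rather than raw ns/day is what separates "fastest" from "best value", which is the comparison being discussed.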

Jeffrey Wagner: 03:49
Yeah, it'd be interesting to check out. Yeah, that'd be interesting.

Jeffrey Wagner: 03:59
Cool.

Jeffrey Wagner: 04:00
I can start some meeting notes if we want, or if you guys have a place planned, we can do that.

David Swenson: 04:04
We don't have a place planned. I brought in Fireflies, although it took your name since it's your meeting, but that way I'll have those notes after the fact.

Jeffrey Wagner: 04:17
Okay, well, let's do that. And I'll just record and then I won't worry about notes too much today. Do I want to start a notes document anyway? No, let's YOLO this. This will be fun. Okay, so, yeah, we're interested to try out the GitHub Actions runners. And the first place that we think it would make sense for us would be on OpenFF Evaluator. We have some basic tests for Evaluator, but we would love to do tests that run a short simulation and make sure that approximately the same numbers come out every time. That would have been impractical without a GPU. We want to get started. The place that we're starting from is that we have a root account, I believe, with payment details hooked up for Open Force Field on AWS. But I'm not sure if we just want to have stuff right on that root account, if we should make personal accounts and hook up our CI to that, or if there's another kind of account it sounded like maybe we should be making. So my goal today would be: tell us what to do for account setup, and then if we have extra time, I'd love to see, more hands-on and in detail, what to do.

David Swenson: 05:44
Yeah. So I'll go ahead and share my not-yet-logged-in AWS thing here, just so you can sort of follow along as I'm doing this. So, no, I'm actually not going to use Desmond. So I'm going to log in as root, which is not what I would usually do, but it actually is what I do with this account frequently, because the root account is the only one that can look at the billing information and I want to regularly check and make sure that we're not spending too much money. So, you know, and I've got my whole two-factor going, so I have to pull out that thing. All right. That helps if I didn't click. No, it's just changed. Okay. So also from the root account, we can see everything. And yes, we've spent $3 so far this month. Not bad.

Ethan Holz: 06:44
Almost bankrupt. I know. This month. There we go. I did my job correctly.

David Swenson: 06:49
So there are two. Well, three things you'll have to do. The first thing you're going to want to do as soon as you start doing this is go ahead and go to your service quotas, because you're not going to have EC2 services. You're going to have EC2 instances for what you want. So you go to the EC2 type of quota, and then you're definitely going to want on-demand to start with. And you're going to want either P instances or G instances. Okay, wait, P is a better GPU. G is, what is G again, Ethan? It's not performance.

Ethan Holz: 07:32
Yeah, they're more well-rounded GPUs. So your P series are going to be, think like your V100s; in this case, the K80s are the other ones. G instances have more of a well-rounded feature set. So think T4s, like smaller, more efficient GPUs that maybe have a cut-down feature set, but they have most of the compute capability that you need. So they're accelerated computing, but they're not HPC optimized, or they're not as built for large-VRAM instances. So if you need a decent amount of compute, generally your best price is the G instances. So you can get some T4 GPUs at like $0.50 an hour, whereas the cheapest on a P instance is going to be closer to a dollar an hour. It's like $0.90.

Jeffrey Wagner: 08:29
Yeah.

Ethan Holz: 08:30
So that's kind of the difference there.

David Swenson: 08:34
But in any case, you will want to go through the process of going.

Jeffrey Wagner: 08:38
On, how do I get to this page? Or let's establish this. I would love to follow along on my own computer. I'm on two pages ago, but then I lost track of you.

David Swenson: 08:46
Okay, so if we go back here-ish. So we've got the console. The first thing you do is you just, yep, I can do that. So: Service Quotas, then AWS services. Type EC2. Just type. Yeah. And that's the one you want: Amazon EC2, Elastic Compute Cloud, quotas. And now on either the G or, so you want to go to search by quota name at the top, because the first ones are all spot instances. So search by quota name and type "on-demand". And these are your choices for that. And then, yeah, we think G is going to be a better price-performance thing. P will give you better performance, though.

Jeffrey Wagner: 09:40
I think G's yeah, we want the cheapos.

David Swenson: 09:44
Yeah, that'd be my guess. I mean, you don't have a great urgency to get the results from these things. You just want to make sure you run.

Ethan Holz: 09:51
The other thing, too, is the base tier of the G's and the base tier of the P's: the G tier is a little more performant at the base and it's cheaper. And the other advantage is, remember how I was talking about how we had to make our own machine images to get our operating system updated? On the G instances, those are all supported. So we don't have to build anything; that'll work for the foreseeable future even without us. But yeah, probably G to start. And then if you end up needing more compute, then P is probably a better option for a lot of cases for you.

David Swenson: 10:26
Yeah. And then you can request an increase at the account level here. I do notice that you are in us-east-1, the Northern Virginia region, which is convenient because that's where we've already been putting a few things. It's not convenient, well, I don't know on G instances yet, because we're waiting for our quota to increase. But we had trouble getting instances of the P3 at all. They were just over capacity. And since us-east-1 is the kind of default region in a lot of ways, I wonder if another region might have more capacity. But, yeah, we haven't had a chance.

Jeffrey Wagner: 11:14
To do that yet. I'd be okay to experiment with that. Maybe we try us west.

David Swenson: 11:22
Yeah. A couple of things here. First off, I believe that when you request a quota increase here, it is region specific. And so you'd have to do it again if you change regions. And thing two is that right now, if you need the machine image that we're building (which it sounds like, if you're using G instances, you won't), that's region specific. We can export it to other regions, but we need to know which ones you're in.

Jeffrey Wagner: 11:54
Okay.

David Swenson: 11:55
So it shouldn't be hard for us to make that available in us-west-1. But it's just like if you don't tell us you're there, we don't do it, and we don't know to.

Jeffrey Wagner: 12:07
Okay, then let's keep things maximally simple and I'll stay in us-east. Okay. On-Demand G and VT, and then request increase.

David Swenson: 12:16
Click.

Jeffrey Wagner: 12:16
Yep.

David Swenson: 12:18
And there'll be a simple little form here. How many do you want? More than zero.

Jeffrey Wagner: 12:24
Is this basically how many GPUs would I want at peak?

David Swenson: 12:29
Yes. Yeah. And I would say, you know, expect a smaller number will get approved more quickly.

Ethan Holz: 12:36
How many?

Jeffrey Wagner: 12:37
Like ten might be reasonable.

David Swenson: 12:40
We went up to eight on one of ours and on my current request we're at four and I'm still waiting to hear back. I can tell you how the larger.

Jeffrey Wagner: 12:51
Increase take more time. Okay, so let's do a smaller one then. Let's do four.

David Swenson: 12:56
You can always increase it more later and most likely what's going to happen is you're going to get, almost immediately, you're going to get a rejection and then you're going to have to explain why you want it. That's just the process. But the first request is easy. You just click request and you're done. As soon as this thing is done, it'll say, yes, you've done this request and then you don't have to think about it. You will get emails at whatever email is associated with this, just so you're aware of that. Yeah, whatever that is. It should come in pretty quickly after. Let's see if we look at the AWS thing again.

Ethan Holz: 13:50
It'd be funny if they got approved and we didn't.

David Swenson: 13:52
That would be hilarious.

Ethan Holz: 13:54
Yeah, we also.

David Swenson: 14:04
You can also always check the status if we go back to the AWS thing, the actual quota request forms. There we go.

Jeffrey Wagner: 14:13
Yep.

David Swenson: 14:14
And so this will have the.

Jeffrey Wagner: 14:16
Oh, and then it just sends me back.

David Swenson: 14:17
Yeah, yeah, yeah. But you can also do that from, if you go to your quota request history on the left there and it'll eventually. Yeah, you can go to the support case number and you can also interact with it here. This is actually what I've been doing because I don't check the email address that we have. It's only used for admin things. So.

Jeffrey Wagner: 14:39
Yeah, I.

David Swenson: 14:41
But, yeah, so anyway, we got that process started. That's not the question you had, but that's the thing that might take a week anyway before you can do anything. Okay, let's get going on the question that you had. So if you go back to the main dashboard of that, we can just do this from your side, or actually let me do it from mine for a second, because it's worth showing a couple of things. So there are two different things that are very closely related and very easy to get confused. One is IAM and the other is IAM Identity Center. What we would recommend is any human accounts that you create go into IAM Identity Center, which you'll have to set up. And you'll see that it's also region specific. And so you'll see we have three users here. I have an admin user, then I have two users that are just associated with our email addresses. This is how we would normally use everything with AWS. I actually almost never use the admin user now, because I just use the root account: the admin user can't see the billing information and I want to see how much money we're spending. I use the user account for that. You can set permissions on this, so it's a mess. And honestly, there are tutorials online that will go through it step by step. But there's a built-in power user access and there's a built-in IAM full access, and we are both using a combination of those two that we did manually, which you can just see in here: it's just these two things. So that describes the settings. We have those grouped into a group, so that the permission sets are associated with a group name, which we call power users. And then in that group we've added the two of us. So that's kind of the human management. Does that make sense, the kind of groups and permission sets? And for the most part, could you.

Jeffrey Wagner: 17:10
Give me the overview on this and then if you've got time, we could do it for us?

David Swenson: 17:14
Yeah, okay. That's kind of what I'm trying to do. The only thing I'll say is, you can get into these permission sets and look in great detail at what they actually are under the hood. It's all a lot of JSON. Let's see. Oh, here we go. Yeah, so it's a whole bunch of things like this, but the console makes it so you don't have to think about all this stuff. But if you want to look at it, you can. And it makes sense that it says things like: it allows the IAM create-service-linked-role within the IAM process. It allows list roles within all the organizations. I can understand who are the organizations. And so this is something that just goes on and on through all the different little details that it handles. But hopefully you will almost never need to look at it at that level of granularity; just know that it's there if you do need it.

Ethan Holz: 18:15
David, to note though: for the GitHub Actions runner, we provide you with that JSON. When you go to create the user you will use for your runner, you can just dump the JSON right in. You just copy it right from our docs, put it right in there, and it'll set up with the minimum required permissions to provision new instances on your account. And it has no other permissions other than that. It can't see anything else. It can't see billing or anything like that. All it can do is just create an instance, modify it, and shut it back down. That's all it can do.

David Swenson: 18:50
That way you can't do anything else.

Jeffrey Wagner: 18:54
This is interesting. This is almost suggesting that maybe we don't need human accounts at all and we could just ping pong GitHub actions.

David Swenson: 19:03
I don't know that you do. I'm just saying if you use human accounts, I recommend using IAM Identity Center.

Jeffrey Wagner: 19:08
Yep. Okay.

David Swenson: 19:10
That is what they recommend.

Ethan Holz: 19:14
They're good to have if you want to play around with stuff. Oops, sorry, David.

David Swenson: 19:17
Yeah, but as you say, that's distinct from IAM itself. So it's so confusing: IAM Identity Center, IAM. You'll also see this called SSO, single sign-on. So it's like Amazon's own single sign-on process, which is honestly a much clearer name and less confusing. But now, regular IAM: you'll see we have two users right now, one of whom is the Terraform test setup stuff. This is what Ethan actually uses when he is doing this stuff every day. That gives him a little bit more access to try and set things up and try and debug when things aren't going well. And so you probably will not need a user like this. Although if you do, we can give you suggestions on that. You will want a user like this one, though, which is the one that actually runs it. So this is the thing that has the credentials to do this. This is the policy statement. And you can see it allows you to run instances, terminate instances, describe instances, and describe instance status. That's all it can do. And so, yeah, it's very minimal in that set. And yeah, we do have an access key, which is going to be needed for it to sort of log in. Is that how it's working right now, Ethan? We use that as a secret? Ethan, you're muted, I think.
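
A minimal sketch of what a policy statement with those four permissions could look like, built as a Python dict and printed as JSON. The four EC2 action names correspond to the operations listed above; the wildcard `Resource` and the helper names are assumptions for illustration, and the JSON in the project's own docs is the authoritative version.

```python
import json

# The four operations named in the meeting, as IAM action names.
RUNNER_ACTIONS = [
    "ec2:RunInstances",
    "ec2:TerminateInstances",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceStatus",
]


def runner_policy() -> dict:
    """Build an illustrative least-privilege policy for the runner user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(RUNNER_ACTIONS),
                # Assumption: no resource scoping; real policies may narrow this.
                "Resource": "*",
            }
        ],
    }


def allows(policy: dict, action: str) -> bool:
    """Exact-match check only; this does not evaluate wildcards or Deny."""
    return any(
        stmt.get("Effect") == "Allow" and action in stmt.get("Action", [])
        for stmt in policy["Statement"]
    )


if __name__ == "__main__":
    print(json.dumps(runner_policy(), indent=2))
```

Because nothing outside the EC2 instance lifecycle is listed, a leaked key for this user cannot read billing data or touch other services.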

Jeffrey Wagner: 20:47
Yep, yep, yep.

Ethan Holz: 20:48
Sorry about that. But basically how it works is you can generate an access key. The access key comes with two parts. So you get an access key, which you can think of as your username, and then you'll get a secret access key, which is like your password. Both of them are used as secrets on our end. So when we set up the GitHub.

David Swenson: 21:05
Action, you will set your repository as.

Ethan Holz: 21:08
Secrets in your repository. And then AWS has its own sign-in action. And what that does is it sets the environment variables correctly. There's a couple of reasons for that, not necessarily needed here, but what it does is it allows us to authenticate and make sure that all the AWS environment is set up correctly before it runs our stuff. It basically ensures that any of the things that the SDK might need are set up correctly beforehand. AWS sets that up and then it calls out to our runner, our action, which will set up everything for you. But you get one access key, and then the access key and secret access key both get set in your secrets, and then you will use them to log in on the action itself.
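
As a small illustration of the two-part credential described here: AWS SDKs conventionally read the pair from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The sketch below only checks that both are present before a job continues; the function name is hypothetical and it is not part of the runner action.

```python
import os

# Standard environment variable names read by AWS SDKs and the AWS CLI.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        # Report names only; never print credential values.
        raise SystemExit("Missing credentials: " + ", ".join(missing))
    print("Both credential variables are set.")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.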

David Swenson: 22:01
One thing we are hoping to do in the future, but not too immediately (it depends on the people; if you end up using this and liking it, we'll probably do this): I don't know if you've done anything lately with releasing to PyPI, but there you can also do this thing where GitHub is just trusted, so you no longer have to give a token. And AWS allows the same thing. What is it called?

Ethan Holz: 22:26
Yeah, it's OIDC, it's OpenID Connect. So how it works is, it trusts GitHub, and then GitHub sends an authenticated request to AWS, or to PyPI, that says, here's the repo I'm coming from. And it goes: great, cool, I know that this was set up intentionally by the person who owns this account. And then it can assume those roles, so kind of the policy that we set in place, it can assume that and set that up correctly. So we have some stuff in place and some documentation available to make it so you don't even have to create users; you just create the policy and then the actual action does that automatically. So that way you don't have to rotate any of your access keys. The access keys are only valid for the amount of time that the step is part of the runner, and then those keys get reset. So they're not a long-lived key. They are basically one-time use and then they're done. And it's a lot more secure, but it also makes it a lot easier for you guys as well. But access keys are an easier cognitive understanding. It's easier to talk about, I think, using access keys rather than OIDC.

David Swenson: 23:41
Yeah, I mean, here you're just using a secret as a login and a password. And the beauty of the OIDC thing is that you no longer need the login and password. Right. It knows that it's coming from a GitHub repository that you've already said you know and control. And so, yeah, it allows you to.

Ethan Holz: 23:59
Not need to use any secrets. What it does is AWS generates the secrets on your behalf and then handles it for you. You just have to do some configuration ahead of time so that it knows to provide those secrets when it needs them.
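
The "configuration ahead of time" is typically an IAM role whose trust policy names GitHub's OIDC provider and restricts which repository may assume the role. The sketch below follows the commonly documented pattern for GitHub Actions and AWS; the account ID and repository in the usage section are placeholders, the function name is hypothetical, and the project's own OIDC documentation may differ.

```python
import json

# Issuer host GitHub Actions uses for its OIDC tokens.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """Build an illustrative trust policy letting one repo assume a role."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Token must be intended for AWS STS.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Token must come from the named repository (any branch or tag).
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository name.
    policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
    print(json.dumps(policy, indent=2))
```

The repository condition is what stands in for the login and password: only workflows from that repository receive short-lived credentials.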

Jeffrey Wagner: 24:15
Okay.

Jeffrey Wagner: 24:16
I think I'll stick with the secret based workflow at first until you guys formally recommend to switch over. But thanks for the understanding. Also this. My quota request just got approved already.

David Swenson: 24:32
Jesus. Hold on. Let me go to. Let me take my support stuff here and see if we've been approved yet. Let's see my support cases. Oh, I must have gotten something back on it pending Amazon action. Yeah, yeah. This is what I had happen last time I had one of these. And it took a week after I got this email before they actually approved it. Man, they must just not like me.

Jeffrey Wagner: 25:09
They said it may take up to 30 minutes to propagate.

David Swenson: 25:19
So I think that in terms of the authentication stuff, I think that answers your questions. Is there anything else we had questions on in this?

Jeffrey Wagner: 25:29
What is the state of the runners? I guess is there alpha documentation that we can start working with or do you plan to have that out on a later date?

David Swenson: 25:40
Ethan, can you pull that up and I'll.

Ethan Holz: 25:43
Yep, I can probably get to it.

David Swenson: 25:45
Faster than I can.

Ethan Holz: 25:46
Yes.

David Swenson: 25:51
There we go. I think that we still have a PR open for the most complete documentation. So if you can show that one.

Ethan Holz: 26:07
Share that instead. Can you see this? A little bit bigger?

Jeffrey Wagner: 26:12
It's a little small.

Jeffrey Wagner: 26:13
Okay.

David Swenson: 26:21
One more.

Jeffrey Wagner: 26:22
Maybe that twice more.

Jeffrey Wagner: 26:23
Oh.

Ethan Holz: 26:26
Yeah. There we go.

Jeffrey Wagner: 26:28
And let's see, like one more. There you go.

Ethan Holz: 26:30
Is that okay?

Jeffrey Wagner: 26:31
Perfect.

Jeffrey Wagner: 26:32
Yep.

Ethan Holz: 26:33
So this is the fork of the docs here. So we have the breakdown of the setup process here. So this kind of shows what you can do. I have to make one change here to this, now that I look at it, but basically this is all going to look pretty much the same. The other example we have is the OpenMM GPU test. We have an actual usable example that we're also using internally. This is doing the OpenMM test. This is the bare minimum you need to get it running. This will all be updated; we're going to cut a new version this week so that you can use that instead of using the hash for this. But this is what it looks like. We have this part here, kind of the setup parts, and there's a lot more here if you want to add more things. Ideally you shouldn't need a whole ton of it. This is kind of more of a simpler example. But the OpenMM GPU one, I will include docs on that as well to kind of point to. That's what we use to test. It's something that we are actively developing on and making sure always works. So it's kind of a good point to look at. The other piece of docs that we have is, in this docs repo, we're going to have the list of the different providers we support. So right now it's just AWS. So you'll be able to go to AWS and it walks you through how to go through the AWS console and create all the tokens you need. So you'll need a GitHub access token and then an AWS token. Set everything up. Here we have some AMI stuff, so those are your machine images, and what to do here. Some GPU instance recommendations, if you want to use P2 instances or other instance types, kind of what we recommend here, and what we recommend as useful on those different machine images. Yeah, this is kind of just a list of resources. So if you're interested in setting up the account, this is exactly the document you'll use.
It kind of walks you through that whole process. And then the AMI side is kind of where things get a little hairy. But feel free to let us know and we can definitely adjust that as needed. As for additions, like I said, these in the chat. Yeah, so for right now, it's here. Where is the chat?

David Swenson: 29:15
Yeah, we should link to the two actual repositories as opposed to your fork.

Jeffrey Wagner: 29:20
Yeah, there we go.

David Swenson: 29:23
At the end of what you're seeing is from a branch that Ethan is doing right now then. Yeah, here's the branch.

Ethan Holz: 29:33
So here's the actual runner. This is the PR with the docs.

Jeffrey Wagner: 29:41
In it.

Ethan Holz: 29:41
So keep an eye on when those get merged and any changes we make there, as well as any discussions we're having on that. Feel free to keep an eye on those if you're interested. Then the other one is, here we go. This as well, which is the test bed that we're using for actually using this thing and testing it. We test a couple of things. We test Docker, that the NVIDIA installation is correct, and then also that OpenMM is able to see that NVIDIA installation. So we have all three parts here.

Jeffrey Wagner: 30:26
Okay, I did start a notes doc, and so I'll invite you to add the Fireflies notes as well. Okay. Do you guys consider this ready to use? Like if we start mucking around in this and asking you questions, will you say, aha, thank you for helping us polish, or, oh no, we knew that that was wrong and now you're just annoying us? Which phase is this in?

David Swenson: 31:01
I think it's a lot closer to the helping us polish. I mean, you know, we haven't really tried it on anything that we're testing, but we do have this OpenMM GPU test. So that's enough that we are sure that it is seeing CUDA on the device. We know it works on a P2. Obviously we don't know that it works on the G instances yet, because you got approved before I did. But as soon as we get that approval, we'll check that. In principle it should work. It's just a matter of making sure that it does with an AMI that's reasonable.

Ethan Holz: 31:41
The only thing I will say is that the docs will go through a little bit more change through the end of the week. So our goal for this is to, like I said, cut our release this Friday. David's out of town on Friday, so we're going to have that done by then, just to kind of clean up a couple of docs things: like I said, include links to our OpenMM GPU test, showcase that, and kind of make our examples closer to that, then do more complete examples so that they're more useful, I would say, and get that all kind of polished up by Friday. So my recommendation would be that if you wait until Friday or Monday of next week, that would probably be a better time. But feel free to dig in right now if you want to. But that's kind of where we're at on usability in terms of documentation.

David Swenson: 32:37
I'd say life will be better if you wait until Monday. But that's, you know, we're very close to that. I mean, at that point, we'll have a proper release.

Jeffrey Wagner: 32:47
I'm glad I could ambush you all with a fast AWS approval. I knew that flipping a 20 into their Venmo account would help, apparently.

David Swenson: 32:55
Yeah.

Jeffrey Wagner: 32:58
Okay, then these are good resources, and we can easily wait till Monday. But I'm glad that we have notes on this. And I think looking at this, the hello world, the OpenMM GPU test. Yeah, this guy. Fantastic. I can look at this, and I can understand it. So then the big question mark is account setup, which I think now we're technically able to do, and I'd be happy to try doing that now. I could do it while reading your documentation, and you could see me stumbling on stuff, and maybe that would help, or what have you. Or if you want to call it a day and tell me to read the docs, that's fine. Cause you guys actually.

Ethan Holz: 33:51
No, no. I'm actually okay with, like, us kind of going through. If you want to use the docs that we have available, feel free. And then that way we can kind of see where the pitfalls are. I'd really like to see what hangs you up, if that's okay with you, just so that I can make those changes as we need to.

Jeffrey Wagner: 34:13
Absolutely. David, does that sound good to you?

David Swenson: 34:17
Yeah, yeah. The one thing I did also want to say about all of this is that it depends on what your needs are. I mean, the fact that we're integrating this with GitHub Actions is specific to the idea that that's part of what you need. And if that's not how you want to run stuff, there might be easier ways to get things set up, if that makes sense. I mean, if you just run on AWS directly, if that's a better solution, we do.

Jeffrey Wagner: 34:49
Want to run on GitHub Actions.

David Swenson: 34:51
Okay.

Jeffrey Wagner: 34:52
Yeah, 100%. That's the purpose. Okay. And I want to ask one clarifying question. So, there was IAM identity management, and there was IAM. And as best practice, you recommended IAM identity management for humans and IAM Identity center.

David Swenson: 35:14
Identity center.

Jeffrey Wagner: 35:15
Oh, identity center.

David Swenson: 35:16
Yeah, yeah.

Jeffrey Wagner: 35:17
And then Ethan mentioned that there was a bot user that he sometimes used for interactive sessions, I believe. Is that correct?

Ethan Holz: 35:26
I'm using it with Terraform, which is the CLI tool. I'm using it from my CLI to just do one-off provisioning requests. It would be the equivalent of what I was doing in the action, but I'm running it locally. It's more useful for me testing the things that I want to test locally. It's a similar secret set to what I would actually have on the action itself. It's the same idea, but it's not actually me using it on the actual portal. It doesn't have a way to log in. You can't actually log in with that user. It's only used to authenticate to create resources.

David Swenson: 36:12
It's when.

Jeffrey Wagner: 36:14
Summary of what Terraform is?

Jeffrey Wagner: 36:16
Yeah.

Ethan Holz: 36:18
So Terraform is an infrastructure-as-code platform. So it allows you to write a declarative document that says, I want to provision this instance with these parameters and I want it to do this. And then it saves the state of that so that, when you're done with it, you can tear it back down in the same way that you put it up. That's kind of what Terraform does.

Jeffrey Wagner: 36:45
Okay. Yeah.

Ethan Holz: 36:51
It's kind of out of scope for this.

Jeffrey Wagner: 36:52
Yeah, sure. I'm trying to distinguish. So basically, option A is the very simple option where I just make a bot user for our CI. And when we need to run stuff and debug stuff, we just have to keep pushing changes to a PR on GitHub. And so that might become burdensome. That would annoy me. It'll probably annoy Matt. And so is there a less painful route via making personal accounts or having a manually available bot account?

David Swenson: 37:35
So keep in mind that what Ethan needs to debug here is the code that is our GHA runners code. Ideally, if you've got a problem with that, we'd love to have help if you want to. But also that's where you just create an issue and say, guys, your code's broken. If you're working on that, you might want to be able to provision your own AWS resources, right? If you're not working on that, then that shouldn't be what you need. What you could want, though, is to be able to log into a running action. We haven't tested whether that's even possible. I don't know if you've ever tried that, but there's an action that allows you to do that: you put it in as a step in your GitHub Actions workflow, and it will freeze there and give you an SSH thing. We haven't checked whether that would work on this. I tend to think it might not because we've probably locked down the ports pretty well by default.

Ethan Holz: 38:50
By default, the security group will just.

Jeffrey Wagner: 38:52
Let you.

Ethan Holz: 38:55
You should be able to, unless you change the security settings yourself.

David Swenson: 39:00
But by default, yeah, we'll play with that one and see if it works. But that would be, I mean, that's already what you have to do on GitHub, right? If you wanted to check something out while your action was running, and you want to see if it's specific to the hardware that it was on at GitHub, you need some way to SSH into that box, and there is a tool that allows you to do that. I still don't understand how it works. It seems like magic to me, but I like it.

Jeffrey Wagner: 39:34
Okay, so let's assume, yeah, that's a good point. Like let's assume that we're setting this up for non-interactivity first. I think you guys have made a great thing. I just worry we're going to have obscure errors that are ultimately due to, you know, some particular CUDA firmware or virtualization of CUDA firmware or something. And we will want to be able to investigate it a little bit before we just write to you and yell that it's broken, because it could easily be stuff on our side or on the conda-forge side and.

Jeffrey Wagner: 40:08
Yeah, okay.

Ethan Holz: 40:11
There is some concern. So a couple, one thing to note. So when you're using G instances, they have a pretty comprehensive operating system setup that is basically like Ubuntu with CUDA 12.4 and then some other fun, to make sure that all of that is done ahead of time. If you run into any of those issues, we want to know about it regardless, because it's something that we want to document internally. We want to make sure that if there's any of those errors, that those are documented, because with CUDA errors it just seems that everyone has the same issues, but a place where all of them are documented doesn't seem to exist. So having that there is really helpful to us as well. So feel free to open issues if there is nothing. And then we can help make that environment more reproducible for all of you. But also it helps for us to document for people in the future as well.

Jeffrey Wagner: 41:06
Okay, cool.

Jeffrey Wagner: 41:09
Yeah, we can do that. And yeah, we'll give you lots of feedback and tag you insistently on GitHub.

David Swenson: 41:18
Please do be useful.

Jeffrey Wagner: 41:22
Okay, well, neat. So then I'm going to try setting up an account. Matt, do you have any thoughts as to. Do you want. Do you want a personal AWS compute account to do experiments? Or would you be happy with the bot account, Matt, would there be restrictions on a bot account? The bot account.

David Swenson: 41:51
The default bot account is extremely limited. The thing that we recommend using as the actual thing that talks from GitHub Actions to AWS is extremely limited because we don't want it to do anything other than the things it needs to do for its job.

Jeffrey Wagner: 42:11
Okay, so let's get Matt and I full accounts just in case, and then we'll also make a bot account, and that's what we'll plan to hook up to the GitHub actions. So let's start with the human accounts.

David Swenson: 42:31
That's Identity Center.

Jeffrey Wagner: 42:36
I assume.

Jeffrey Wagner: 42:37
I want to enable this.

Jeffrey Wagner: 42:40
Yep.

David Swenson: 42:41
Note that it is specified to a region, but you can only have it for one region. So it's kind of weird like that. It's both. I don't understand why it's that way. It is like a global thing, but somehow it has to be specified within a region.

Jeffrey Wagner: 43:00
That's weird.

Jeffrey Wagner: 43:00
Okay.

David Swenson: 43:01
Yeah, so your account can. Your account now has IAM Identity Center in us-east-1.

Jeffrey Wagner: 43:08
Us east.

David Swenson: 43:10
You could still use it in principle for the work you were doing in us-west-1, but the login is still associated with us-east-1. It's. I. I found this really, really confusing. And you can't set up a separate one within your account in us-west-2 or us-west-1, because that would just be weird to have two different login processes for yourself. Anyway. So from there you can go ahead and start creating a user. Go to the users on the left. Add a user, standard information.

Jeffrey Wagner: 43:49
Yeah.

David Swenson: 43:49
We've been using our email addresses also as our usernames, but it doesn't matter. So the AWS console also has this thing that feels kind of like a dependency inversion sometimes. So, yeah, you don't need to add all the contact info. It's got all that stuff but keeps not scrolling.

Jeffrey Wagner: 44:22
Okay. Yeah.

David Swenson: 44:26
And so now you can create a group. This is the part that feels like it's a weird inversion thing. And then, yeah, you can create the group. Let's see, where was it? How does this work? I think. Did it pop up a new thing? Go back one. Yeah. Now hit the refresh thing next to create group. Yep. There you go. Now you've got developers. This is this really weird AWS console workflow.

Jeffrey Wagner: 45:03
Yeah, Matt, any typos? That all looks good.

David Swenson: 45:18
Let's see. Add the user. And now is the confusing part, which is adding. Yeah, so, Matt, you'll get all that stuff and you'll have to do that thing. You don't just go to AWS, Amazon, to log in with this. You'll have a special URL that is for your account, associated with your account, but that's also easy enough to deal with. I think you may also have to turn on. I'm trying to remember how this works because I've had to do this a couple of times, but I'm looking through my own just to make sure I can figure out how that works.

Jeffrey Wagner: 46:47
Okay. Yeah, it's telling me to register MFA.

David Swenson: 46:51
Yeah, that is definitely recommended.

Jeffrey Wagner: 47:48
Okay, Matt, have you successfully done multifactor? I'm logged into a console. Anything other than that, I have no idea.

David Swenson: 48:00
All right, so right now you may need to get back to the. Jeff, you need to get back to the root account because you haven't finished giving yourself permissions yet.

Jeffrey Wagner: 48:10
Oh, okay. I wonder if I can be both at the same time.

David Swenson: 48:15
Only if you're lucky. Let's see what happens when you try and reload. Good. And this is the part I have to try and remember. Yeah. This is where you have to turn on AWS Organizations, I believe, in order to be able to do permissions from here. You see related consoles, AWS Organizations. This is the part that got me confused when I had to do this again. Okay. Do we not have. This is weird because it's under OMSF. Click under policy management here.

Jeffrey Wagner: 48:57
Oh, that is weird.

David Swenson: 48:59
Yeah, I think that yours is different from mine. Okay. Which means you can't see your billing information because you're not in charge of it. Okay. I'm going to try to figure this out for a second.

Jeffrey Wagner: 49:25
Oh, interesting. So I'm logged in as info at OpenFF, but my management account email address is info at OMSF.

David Swenson: 49:33
Yeah. So AWS has this nested account system thing. And so that can be nice in principle, because that means that the bills are going to Carmen instead of you. But that changes some of how this works, in ways I don't know, because you're not like the true root user over everything. You're just root within this account. What I don't understand, I'm surprised that the thing I thought I wanted to do here isn't possible. If we get a.

Jeffrey Wagner: 50:33
I can add AWS accounts. Okay, so this might be a Carmen thing.

David Swenson: 50:48
Yeah.

Jeffrey Wagner: 50:53
Okay.

Jeffrey Wagner: 50:56
This could be a good ending that we have reached this point and we.

David Swenson: 50:59
Are confidently confused and yeah, as often happens with AWS.

Jeffrey Wagner: 51:03
We need Carmen. Yeah, no, that's fine. I assume that OpenFE and other OMSF projects will have similar setups. So good that we ran into this.

David Swenson: 51:13
Yeah. If you want a workaround for now, what we can do is do what's not the recommended best practice. So the best practice is to do all of this stuff within IAM Identity Center. But you can also just create a regular IAM account for Matt so you can actually use it.

Jeffrey Wagner: 51:39
Oh, because we're still in. Oh yeah.

David Swenson: 51:43
Again.

Jeffrey Wagner: 51:44
But we can go back to vanilla IAM.

David Swenson: 51:47
Yeah. There, it's a little bit more limited because you're basically, you're very much stuck with this user path type of approach. And so it looks like you have two users now already. What are the two? Probably.

Jeffrey Wagner: 52:11
Yes.

David Swenson: 52:11
David Dotson had an account some two years ago. So yeah, you can just create a user here.

Jeffrey Wagner: 52:23
Okay. And so this will be for Matt, right? This will be a new, this is uncoupled from the previous account.

David Swenson: 52:30
Right. I personally recommend giving it a slightly different name because let's just make the.

Jeffrey Wagner: 52:41
Bot user right now.

David Swenson: 52:42
Okay.

Jeffrey Wagner: 52:44
Yep.

Jeffrey Wagner: 52:45
We'll do that. And then we can.

David Swenson: 52:49
Why don't we.

Ethan Holz: 52:53
Oh, we'll do that in this here. So just name it whatever you want. You can, you can pull up the docs. Yep.

Jeffrey Wagner: 53:00
Perfect.

David Swenson: 53:03
It does not need access to the management console.

Ethan Holz: 53:05
So, no, that means it can't log in. So that's kind of sweet. And then what I've done is attach policies directly. That's what I've used. If you wanted to create multiple users, say you wanted every single instance to have a different user so you can see which users are using more compute, you could set this up so that you create a policy and attach it to a group. Then all of them will get the same. But we'll go here and then you can select create policy on the top right of the permissions policy right there. What I recommend here is I use the JSON editor because I just copy.

David Swenson: 53:56
Paste what we have.

Jeffrey Wagner: 53:59
Yeah. Open. I did have this pr.

David Swenson: 54:06
Open.

Ethan Holz: 54:11
It should be the, you go.

Jeffrey Wagner: 54:20
Is.

Ethan Holz: 54:20
That, okay, that's the base one. So if you go, oh, wait, is.

Jeffrey Wagner: 54:25
That, no, yeah, this is base point, your branch.

Ethan Holz: 54:29
Okay, that's fine. So if you go down to docs, aws.

Jeffrey Wagner: 54:35
Yep.

Ethan Holz: 54:36
So you scroll down, there's that JSON.

David Swenson: 54:38
You should just take the copy button in the corner.

Ethan Holz: 54:43
And then you'll just replace.

Jeffrey Wagner: 54:48
I can find it.

David Swenson: 54:50
Yeah, there it is.

Ethan Holz: 54:52
Just replace it.

Jeffrey Wagner: 54:54
Then there you go somewhere at the.

David Swenson: 54:59
Bottom there should be, oh, what, do you need the ARN?

Jeffrey Wagner: 55:04
Yep. Let's see.

Ethan Holz: 55:08
Have it up here. See this is why we do it live. Score. When, let me check my user.

Jeffrey Wagner: 55:21
Permissions.

Ethan Holz: 55:23
This one. Oh, resource should be star. So what that is.

David Swenson: 55:29
Yeah, I think it should be star, right?

Jeffrey Wagner: 55:31
Yep, yep.

Ethan Holz: 55:37
So in, inside the quotes.

Jeffrey Wagner: 55:39
Yep, right there. Then there you go.

Ethan Holz: 55:43
And then you just hit next.

Jeffrey Wagner: 55:46
Yep.

David Swenson: 55:49
And then give it a name probably, you know, gha runner policy.
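Note: for readers following along without the screen share, the shape of the JSON being pasted into the policy editor can be sketched as below. The action list is an assumption chosen for illustration (these are real EC2 action names, but not necessarily the ones the project uses); the authoritative JSON is the one in the project's AWS docs. Only `"Resource": "*"` comes from the discussion above.

```python
import json

# Assumed action list, for illustration only.
# Copy the real policy JSON from the project's docs instead of using this one.
ASSUMED_ACTIONS = [
    "ec2:RunInstances",
    "ec2:TerminateInstances",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceStatus",
    "ec2:CreateTags",
]


def build_policy(actions):
    """Build a single-statement IAM policy document allowing `actions`."""
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # the "star" discussed in the meeting
            }
        ],
    }


policy_json = json.dumps(build_policy(ASSUMED_ACTIONS), indent=2)
print(policy_json)
```

Keeping the action list short is what makes the bot user "extremely limited": anything not listed is denied by default.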

Jeffrey Wagner: 56:13
Cool.

Ethan Holz: 56:14
So then you can go back up to where we were before it did.

David Swenson: 56:17
The thing where it popped out a new tab. So you gotta go back one and refresh.

Jeffrey Wagner: 56:23
Refresh.

David Swenson: 56:25
If you go filter by type, customer managed. There you go. So this is attaching a permissions policy directly to that user. Now to actually use it, you'll have to go to that user and create access keys. So you see where it says at the top, access key one, the create access keys link. Oh yeah, to do that, yeah, it tries to tell you not to ever do this, which is what all of this is for. So this is really like you surely shouldn't do this. You shouldn't create a user and a password here, but you're going to anyway. So yeah, you can click on one of the options and then say, yes, I really want to do this anyway, or you can click on other and it'll be like, okay, well maybe click on any one of these and they'll be like you have this other option, you know.

Ethan Holz: 57:31
Yeah. You click on something and it'll say hey, here's some great docs on how not to do this.

Jeffrey Wagner: 57:36
Yeah, yeah, yeah.

Jeffrey Wagner: 57:39
Okay, sure.

David Swenson: 57:42
Which is both utterly confusing and actually really useful. I think it'll show us the access key, but not the secret access key. The access key is your username, the secret access key is your password.

Ethan Holz: 58:11
You can also just download the CSV as well and it'll download it for you on the bottom and it'll have.

Jeffrey Wagner: 58:16
Both for you if you want to.

David Swenson: 58:18
I recommend that especially doing it live. You don't have to even hide it from us.

Jeffrey Wagner: 58:26
Yep.

Ethan Holz: 58:27
Okay. And then you're done. That's pretty much it.

David Swenson: 58:32
So from there you would need to put the access key and secret access key into your secrets in the repository that you want to test. And you also have to create a GitHub token as well because it has to go talk to GitHub programmatically as well as AWS programmatically.
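Note: a minimal sketch of checking that the three secrets mentioned above are actually present before a provisioning step runs. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS environment variable names; `GH_PAT` is a hypothetical name for the GitHub token, and the `missing_secrets` helper is not part of any tool discussed here.

```python
import os

# GH_PAT is a hypothetical variable name for the GitHub personal access token.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "GH_PAT")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
    else:
        print("All required secrets are set.")
```

In a workflow, the repository secrets would be mapped to these environment variables on the step that starts the runner.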

Jeffrey Wagner: 58:57
So if you don't mind, do you want to give it a shot? I could open a PR to evaluator that just runs our hello world.

David Swenson: 59:11
Yeah, there's more stuff to set up along the way. There's a reason there's a very long list of do this, do this, do this, do this. Because, again, you have to get your personal access token for GitHub as well and then add everything into the secrets. Of course. I think we're good on the AMI and then it should work. Ethan, isn't there something in here that you have to set up for self-hosted runners in the repository settings?

Ethan Holz: 59:55
No, you don't have to do anything there. It'll just request from GitHub itself. Like GitHub will provide you with a token. So our tool asks GitHub for the token, the token gets returned. We use the token to then re-authenticate with GitHub to say here's a runner. And then you can just set it to say, I want to run on a self-hosted runner, and it'll run self-hosted.
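Note: the "ask GitHub for a token" step can be sketched with the GitHub REST endpoint that creates a runner registration token for a repository. Whether the tool discussed here uses exactly this endpoint is an assumption; the owner, repo, and token values are placeholders, and the request is only constructed, not sent.

```python
import urllib.request


def registration_token_request(owner, repo, token):
    """Build (but do not send) a request for a runner registration token."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        "/actions/runners/registration-token"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values; a real call would use the personal access token secret.
req = registration_token_request("example-org", "example-repo", "placeholder-token")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a short-lived token that the new runner uses to register itself with the repository.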

David Swenson: 01:00:22
If we want to do different runner groups, that is where you have to do some settings. Right?

Ethan Holz: 01:00:28
Yeah. And then the other thing is, so.

David Swenson: 01:00:30
Some things you want to run on self-hosted GPU, some things you want to run on self-hosted CPU. You have the ability to put labels on it that will do that, but that's more advanced to set up, you.

Ethan Holz: 01:00:42
Get a temporary label. But.
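Note: the label idea can be sketched as a subset check. GitHub routes a job to a self-hosted runner only if the runner carries every label the job asks for in `runs-on`. The label names and the `can_run` helper below are hypothetical.

```python
def can_run(job_labels, runner_labels):
    """A runner is eligible only if it has every label the job requests."""
    return set(job_labels) <= set(runner_labels)


# Hypothetical runners with custom "gpu" and "cpu" labels.
gpu_runner = ["self-hosted", "linux", "gpu"]
cpu_runner = ["self-hosted", "linux", "cpu"]

gpu_job = ["self-hosted", "gpu"]

print(can_run(gpu_job, gpu_runner))
print(can_run(gpu_job, cpu_runner))
```

With labels like these, a GPU test job and a CPU-only job in the same workflow would land on different instance types.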

David Swenson: 01:01:02
Jeff, when do you have your next meeting with Carmen, or when would you be doing that? Okay, because I'm just saying we have one, and this has come up a few times in the past. We have one a week from today, except it's probably going to be a day earlier because we're supposed to be changing the date on that. So we might actually bring that up as well, because she's wanted us to bring our root account under the OMSF umbrella as well, which makes sense and would mean that instead of being on my Ramp card, it was on, you know, some official funds somewhere. But yeah, I'll mention this there, because it's the same problem. And we won't be able to do it until she does that either, I think.

Jeffrey Wagner: 01:01:52
Okay. Yeah. Do you want to add me to that meeting?

David Swenson: 01:01:57
I can, yeah. It's part of our standard, it's just a standing meeting with the three of us. So there'll be a bunch of other stuff there that you're not going to care about, but you're welcome to come to the part you do care about if you want to, up to you whether you want to. We can try and order the agenda so that we're not wasting your time, at least with the stuff you don't care about.

Jeffrey Wagner: 01:02:32
Here's the problem. If I schedule my own meeting with Carmen, I'm going to say, Carmen, could you help me fix these user permissions? And she'll say, how do I do that? And I'll be like, I don't know, I need to talk to David.

David Swenson: 01:02:42
Yeah.

Jeffrey Wagner: 01:02:44
And so we'll need to all be in the same place. The question is do we want to hijack your meeting or.

David Swenson: 01:02:51
To a large extent I think that, I mean to, I think what happens is she needs to log in and click that she needs to get another email for that account. Yeah. To be honest, you could probably just do that. Wherever was the, the send verification email, you could probably just send that again and see if she gets it and handles it.

Ethan Holz: 01:03:27
David, this might be something we want to meet with her on anyways and just try and figure out what's going on here because we'll probably have to do this with at least two other organizations.

Jeffrey Wagner: 01:03:38
So.

David Swenson: 01:03:39
Yeah. And if we do it just for ourselves first, then that should get it. Because I think it's just a link. It looks like it's a link in an email that she needed to click. And so she needs to send another one.

Jeffrey Wagner: 01:04:00
Yeah, let's do it with her just so she knows what she's clicking on. Um, could you invite me to your meeting next week and I'll assume that I'm the lowest priority item on the agenda?

David Swenson: 01:04:10
No, no, to the contrary. We'll, we'll try and start with you so that you can, you know, leave and do something productive with your time afterwards.

Jeffrey Wagner: 01:04:20
I don't do productive things. Matt knows that. I just have meetings to schedule.

David Swenson: 01:04:26
More meetings.

Jeffrey Wagner: 01:04:46
Okay.

Jeffrey Wagner: 01:04:46
But I think Matt and I have the tools to set up the GitHub Actions bot. And we can give you feedback if we have any trouble with that.

David Swenson: 01:04:55
Yeah. Barring something missing from our instructions there, which is the feedback that we need, if there is something missing, the only.

Ethan Holz: 01:05:02
Thing I will say is I think up top where we created the user before, you might have to hit done so it actually saves it.

David Swenson: 01:05:13
Did we have one where we hadn't.

Ethan Holz: 01:05:14
Quite, I think it's right there. So I think you just hit done.

Jeffrey Wagner: 01:05:19
That's interesting. It would let me get the access keys without being done.

Ethan Holz: 01:05:24
I think it'll give it to you, but then they're not active until you hit.

Jeffrey Wagner: 01:05:28
Done.

Ethan Holz: 01:05:29
So they're provisioned but not usable. Same thing with GitHub actions. So admittedly, when we provision your self hosted runner, one thing that we do is if it fails, it will get a token. If the token never gets used, then it's not useful and it expires within 30 minutes anyway. If it just fails and fails, and then we don't care.

Jeffrey Wagner: 01:05:54
Okay, cool.

Jeffrey Wagner: 01:05:56
I will, yeah, I will work with Matt to advance this.

David Swenson: 01:06:13
Yeah. Especially if there's anything unclear in the instructions on setup. If there's anything like that, do just reach out to us immediately because if it's not clear, then that's on us to improve.

Jeffrey Wagner: 01:06:28
Okay, cool.

Jeffrey Wagner: 01:06:33
All right, I think that's good. We have a science meeting that maybe we should get to, but yeah. Anything else that we want to touch on today?

David Swenson: 01:06:45
I think that's it.

Jeffrey Wagner: 01:06:47
Yeah.

David Swenson: 01:06:47
Let me know if there's anything else you need from me.

Jeffrey Wagner: 01:06:50
Okay. David, when you get the Fireflies notes, could you put them in here?

David Swenson: 01:06:55
How do you usually do that? Do you just want a copy of. You should actually get a note as well, right? If you don't, let me know, but I can. Do you just copy the, what is, do you put a link to it or what do you do?

Jeffrey Wagner: 01:07:11
Just the raw text.

David Swenson: 01:07:13
Okay. Raw text of the transcript or.

Jeffrey Wagner: 01:07:19
Yes.

David Swenson: 01:07:20
Okay. But yeah, you should have a link to it if you want to as well.

Jeffrey Wagner: 01:07:24
Yeah. Okay.

Jeffrey Wagner: 01:07:26
Yeah, if I can get access to it, then I'll just do it myself.

David Swenson: 01:07:29
I think it'll email you because it's got your name on it anyway.

Jeffrey Wagner: 01:07:33
Okay, perfect. Thanks everyone.

Jeffrey Wagner: 01:07:40
The recording has stopped.

Action items

Decisions