Ethan Holz: 00:00 David, it was fascinating. I was looking at the pricing between AWS and Azure today, and their pricing for the same GPUs is exactly the same, but the CPU and RAM specs are completely different.

David Swenson: 00:39 Interesting.

Ethan Holz: 00:41 For example, if you get a V100 on AWS, you'll get eight cores and, I don't know, let's say this is not accurate, but 64 gigs. But if you did the same on Azure, you'd get six cores but 84 gigs. So they give you just way more memory on Azure, but not as much CPU compute, which I think is kind of interesting.

David Swenson: 01:07 This is one of the reasons I'm interested in actually benchmarking. Oh, hey Jeff, right there with you.

Jeffrey Wagner: 01:13 Hey David, sorry. I tried to start the Zoom call with my computer closed and it got very sad.

David Swenson: 01:26 We're waiting on Matt, is that right?

Jeffrey Wagner: 01:28 Correct.

Jeffrey Wagner: 01:29 Open.

David Swenson: 01:52 Jeff, are you aware of anybody having done cloud-focused benchmarks with OpenMM? Because as we were just discussing, you can have the same GPU but different system setups, because different cloud providers give you a different default. And we thought it might be a little bit interesting to see if that led to any variance, and to really give a price-to-performance estimate for people.

Jeffrey Wagner: 02:22 Yeah, I'm imagining, let's see. Hopefully OpenEye has done it, but they wouldn't have published the results. I think the problem is that the people who would have done these studies would have been people who were financially incentivized to do them. And those are basically people who own clouds that they want you to run OpenMM on. And at that point those benchmarks are probably so gamed that they might not be useful.

David Swenson: 02:58 I mean, again, I don't think this is an expensive thing to do.
It's one of the reasons we're thinking about doing it, because you don't need to run, you know, days of compute to get an estimate of performance. I know that Peter Eastman has a benchmark set; I'll have to see how long they run it for. But you need to run it long enough that it's kind of settled in, but you don't really need to run it long enough to do something scientifically valuable in order to get the numbers. So I'm thinking we can probably get it in a few hours per GPU that we're interested in, which means we're talking about a few bucks for most of them.

Jeffrey Wagner: 03:49 Yeah, it'd be interesting to check out. Yeah, that'd be interesting.

Jeffrey Wagner: 03:59 Cool.

Jeffrey Wagner: 04:00 I can start some meeting notes if we want, or if you guys have a place planned, we can do that.

David Swenson: 04:04 We don't have a place planned. I brought in Fireflies, although it took your name since it's your meeting, but that way I'll have those notes after the fact.

Jeffrey Wagner: 04:17 Okay, well, let's do that. And I'll just record, and then I won't worry about notes too much today. Do I want to start a notes document anyway? No, let's YOLO this. This will be fun. Okay, so, yeah, we're interested to try out the GitHub Actions runners. And the first place that we think it would make sense for us would be on OpenFF Evaluator. We have some basic tests for Evaluator, but we would love to do tests that run a short simulation and make sure that approximately the same numbers come out every time. That would have been impractical without a GPU. We want to get started. The place that we're starting from is that we have a root account, I believe, with payment details hooked up for Open Force Field on AWS. But I'm not sure if we just want to have stuff right on that root account, if we should make personal accounts and hook up our CI to that, or if there's another kind of account it sounded like maybe we should be making.
So my goal today would be: tell us what to do for account setup, and then if we have extra time, I'd love to see in more hands-on detail what to do.

David Swenson: 05:44 Yeah. So I'll go ahead and share my not-yet-logged-in AWS thing here, just so you can sort of follow along as I'm doing this. So, no, I'm actually not going to use Desmond. So I'm going to log in as root, which is not what I would usually do, but it is what I do with this account frequently, because the root account is the only one that can look at the billing information, and I want to regularly check and make sure that we're not spending too much money. So, you know, I've got my whole two-factor thing going, so I have to pull out that thing. All right. That helps if I click the right thing. No, it's just changed. Okay. So also from the root account, we can see everything. And yes, we spent $3 so far this month. Not bad.

Ethan Holz: 06:44 Almost bankrupt. I know. This month. There we go. I did my job correctly.

David Swenson: 06:49 So there are two. Well, three things you'll have to do. The first thing you're going to want to do as soon as you start doing this is go to your service quotas, because you're not going to have the quota you need by default. You're going to want EC2 instances for what you want. So you go to the EC2 type of quota, and then you're definitely going to want on-demand to start with. And you're going to want either P instances or G instances. Okay, wait. The P is the better GPU. What is G again, Ethan? It's not performance.

Ethan Holz: 07:32 Yeah, they're more well-rounded GPUs. So your P series, think like your V100s, or in this case the K80s are the other ones. G instances have more of a well-rounded feature set. So think T4s. Smaller, more efficient GPUs that maybe have a cut-down feature set.
But it has most of the compute capability that you need. So they're accelerated computing, but they're not HPC-optimized, and they're not built for large-VRAM workloads. So if you need a decent amount of compute, generally your best price is the G instances. You can get some T4 GPUs at like $0.50 an hour, whereas the cheapest on a P instance is going to be closer to a dollar an hour. It's like $0.90.

Jeffrey Wagner: 08:29 Yeah.

Ethan Holz: 08:30 So that's kind of the difference there.

David Swenson: 08:34 But in any case, you will want to go through the process of—

Jeffrey Wagner: 08:38 Hold on, how do I get to this page? Or let's establish this. I would love to follow along on my own computer. I was with you two pages ago, but then I lost track of you.

David Swenson: 08:46 Okay, so if we go back here-ish. So we've got the console. The first thing you do is. Yep, I can do that. Let me. So, Service Quotas under AWS services. Type EC2. Just type. Yeah. And that's the one you want: Amazon EC2, Elastic Compute Cloud quotas. And now you want to search by quota name at the top, because the first ones are all spot instances. So search by quota name and type "on-demand". And these are your choices for that. And then, yeah, we think G is going to be a better price-performance thing. P will give you better performance, though.

Jeffrey Wagner: 09:40 I think G's, yeah, we want the cheapos.

David Swenson: 09:44 Yeah, that'd be my guess. I mean, you don't have a great urgency to get the results from these things. You just want to make sure you run.

Ethan Holz: 09:51 The other thing, too, is the base tier of the G's versus the base tier of the P's. The G tier is a little more performant at the base, and it's cheaper.
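As a back-of-the-envelope check on the cost question being discussed, here is a small sketch. The hourly rates are the rough figures from the conversation (~$0.50/hr for a T4 G instance versus ~$0.90/hr for the cheapest P instance), and the hours-per-GPU figure is the "few hours per GPU" estimate from earlier in the call; real AWS pricing varies by instance type and region.

```python
# Rough cost estimate for the GPU benchmarking idea discussed above.
# Rates and hours are the approximate figures from the conversation,
# not authoritative AWS pricing.

def benchmark_cost(hourly_rate: float, hours_per_gpu: float, n_gpus: int) -> float:
    """Total cost of benchmarking n_gpus instance types at a given hourly rate."""
    return hourly_rate * hours_per_gpu * n_gpus

# A few hours per GPU of interest, across four instance types:
g_cost = benchmark_cost(hourly_rate=0.50, hours_per_gpu=3, n_gpus=4)
p_cost = benchmark_cost(hourly_rate=0.90, hours_per_gpu=3, n_gpus=4)

print(f"G instances: ${g_cost:.2f}")
print(f"P instances: ${p_cost:.2f}")
```

Either way the totals land in the "few bucks" range David estimates, which is why an independent price-to-performance benchmark is cheap to run.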
And the other advantage is, remember how I was talking about how we had to make our own machine images to get our operating system updated? On the G instances, those are all supported. So we don't have to build anything; that'll work for the foreseeable future even without us. But yeah, probably G to start. And then if you end up needing more compute, then P is probably a better option for a lot of cases for you.

David Swenson: 10:26 Yeah. And then you can request an increase at the account level here. I do notice that you are in us-east-1, the Northern Virginia region, which is convenient because that's where we've already been putting a few things. Well, I don't know about G instances yet, because we're waiting for our quota to increase. But we had trouble getting quotas, or trouble getting instances, of the P3 at all. They were just over capacity. And since us-east-1 is the kind of default region in a lot of ways, I wonder if another region might have more capacity. But, yeah, we haven't had a chance.

Jeffrey Wagner: 11:14 To do that yet. I'd be okay to experiment with that. Maybe we try us-west.

David Swenson: 11:22 Yeah. A couple of things here. First off, I believe that when you request a quota increase here, it is region-specific, and so you'd have to do it again if you change regions. And thing two: the machine image that we're building is region-specific. It sounds like you won't need it if you're using G instances, but if you do, we can export it to other regions; we just need to know which ones you're in.

Jeffrey Wagner: 11:54 Okay.

David Swenson: 11:55 So it shouldn't be hard for us to make that available in us-west-1. But if you don't tell us you're there, we don't do it, and we don't know to.
Jeffrey Wagner: 12:07 Okay, then let's keep things maximally simple and I'll stay in us-east. Okay. On-demand G and VT, and then Request increase.

David Swenson: 12:16 Click.

Jeffrey Wagner: 12:16 Yep.

David Swenson: 12:18 And there'll be a simple little form here. How many do you want? More than zero.

Jeffrey Wagner: 12:24 Is this basically how many GPUs I would want at peak?

David Swenson: 12:29 Yes. Yeah. And I would say, you know, expect that a smaller number will get approved more quickly.

Ethan Holz: 12:36 How many?

Jeffrey Wagner: 12:37 Like ten might be reasonable.

David Swenson: 12:40 We went up to eight on one of ours, and on my current request we're at four and I'm still waiting to hear back. I can tell you the larger.

Jeffrey Wagner: 12:51 Increases take more time. Okay, so let's do a smaller one then. Let's do four.

David Swenson: 12:56 You can always increase it more later, and most likely what's going to happen is, almost immediately, you're going to get a rejection, and then you're going to have to explain why you want it. That's just the process. But the first request is easy. You just click Request and you're done. As soon as this thing is done, it'll say, yes, you've made this request, and then you don't have to think about it. You will get emails at whatever email is associated with this, just so you're aware of that. Yeah, whatever that is. It should come in pretty quickly after. Let's see if we look at the AWS thing again.

Ethan Holz: 13:50 It'd be funny if they got approved and we didn't.

David Swenson: 13:52 That would be hilarious.

Ethan Holz: 13:54 Yeah, we also.

David Swenson: 14:04 You can also always check the status if we go back to the AWS thing, the actual quota request forms. There we go.

Jeffrey Wagner: 14:13 Yep.

David Swenson: 14:14 And so this will have the.

Jeffrey Wagner: 14:16 Oh, and then it just sends me back.

David Swenson: 14:17 Yeah, yeah, yeah.
But you can also do that from your quota request history, on the left there. Yeah, you can go to the support case number, and you can also interact with it here. This is actually what I've been doing, because I don't check the email address that we have; it's only used for admin things.

Jeffrey Wagner: 14:39 Yeah.

David Swenson: 14:41 But, yeah, so anyway, we got that process started. That's not the question you had, but that's the thing that might take a week before you can do anything. Okay, let's get going on the question that you had. So if you go back to the main dashboard, we can just do this from your side. Or actually, let me do it from mine for a second, because it's worth showing a couple of things. So there are two different things that are very closely related and very easy to get confused. One is IAM, and the other is IAM Identity Center. What we would recommend is that any human accounts you create go into IAM Identity Center, which you'll have to set up. And you'll see that it's also region-specific. And you'll see we have three users here. I have an admin user, then I have two users that are just associated with our email addresses. This is how we would normally use everything with AWS. I actually almost never use the admin user now, because I just use the root account, because the admin user can't see the billing information and I want to see how much money we're spending. I use the user account for that. You can set permissions on this, so it's a mess, and honestly, there are tutorials online that will go through it step by step. But we are both using a combination of the built-in power user access and the built-in IAM full access, which we put together manually. You can just see in here, it's just these two things. So that describes the settings.
We have those grouped into a group, so that the permission sets are associated with a group name, which we call power users. And then in that group we've added the two of us. So that's kind of the human management. Does that make sense, the kind of groups and permission sets? And for the most part, could you.

Jeffrey Wagner: 17:10 Give me the overview on this, and then if you've got time, we could do it for us?

David Swenson: 17:14 Yeah, okay. That's kind of what I'm trying to do. The only thing I'll say is, you can get into these permission sets and look in great detail at what they actually are under the hood. It's all a lot of JSON. Let's see. Oh, here we go. Yeah, so it's a whole bunch of things like this, but the console makes it so you don't have to think about all this stuff. But if you want to look at it, you can. And it makes sense when it says things like: it allows IAM CreateServiceLinkedRole within the IAM process; it allows ListRoles within all the organizations, so I can understand what the organizations are. And this just goes on and on through all the different little details that it handles. But hopefully you will almost never need to look at it at that level of granularity; just know that it's there if you do need it.

Ethan Holz: 18:15 David, to note though, for the GitHub Actions runner, we provide you with that JSON. When you go to create the user you will use for your runner, you can just dump the JSON right in. You copy it right from our docs, put it right in there, and it'll be set up with the minimum required permissions to provision new instances on your account. And it has no other permissions than that. It can't see anything else. It can't see billing or anything like that. All it can do is create an instance, modify it, and shut it back down. That's all it can do.

David Swenson: 18:50 That way you can't do anything else.
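The minimal permission set Ethan describes (create, modify, and shut down instances, nothing else) corresponds to an IAM policy along these lines. This is a sketch consistent with what is said on the call, using real EC2 action names, not the exact JSON from the project's docs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GhaRunnerMinimal",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceStatus"
      ],
      "Resource": "*"
    }
  ]
}
```

A policy like this is pasted into the JSON editor when creating the runner's IAM user, so the credentials it generates can provision and tear down instances but touch nothing else in the account.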
Jeffrey Wagner: 18:54 This is interesting. This is almost suggesting that maybe we don't need human accounts at all and we could just ping-pong GitHub Actions.

David Swenson: 19:03 I don't know that you do. I'm just saying, if you use human accounts, I recommend using IAM Identity Center.

Jeffrey Wagner: 19:08 Yep. Okay.

David Swenson: 19:10 That is what they recommend.

Ethan Holz: 19:14 They're good to have if you want to play around with stuff. Oops, sorry, David.

David Swenson: 19:17 Yeah, but as you say, that's distinct from IAM itself. It's so confusing: IAM, IAM Identity Center. You'll also see this called SSO, single sign-on. It's like Amazon's own single sign-on process, which is honestly a much clearer and less confusing name. But now, regular IAM. You'll see we have two users right now, one of which is the Terraform test setup stuff. This is what Ethan actually uses when he's doing this stuff every day; it gives him a little bit more access to set things up and debug when things aren't going well. You probably will not need a user like this, although if you do, we can give you suggestions on that. You will want a user like this one, though, which is the one that actually runs it. So this is the thing that has the credentials to do this. This is the policy statement, and you can see it allows you to run instances, terminate instances, describe instances, and describe instance status. That's all it can do. And so, yeah, it's very minimal in that set. And we do have an access key, which is going to be needed for it to sort of log in. Is that how it's working right now, Ethan? You use that as a secret. Ethan, you're muted, I think.

Jeffrey Wagner: 20:47 Yep, yep, yep.

Ethan Holz: 20:48 Sorry about that. But basically how it works is you can generate an access key. The access key comes with two parts.
So you get an access key ID, which you can think of as your username, and then you'll get a secret access key, which is like your password. Both of them are used as secrets on our end. So when we set up the GitHub

David Swenson: 21:05 Action, you will set them as

Ethan Holz: 21:08 Secrets in your repository. And then AWS has its own sign-in action, and what that does is set the environment variables correctly. There are a couple of reasons for that; it's not necessarily needed here, but what it does is let us authenticate and make sure that the whole AWS environment is set up correctly before it runs our stuff. It basically ensures that anything the SDK might need is set up correctly beforehand. AWS sets that up, and then it calls our runner, our action, which sets up everything for you. But you get an access key and a secret access key, both of those get set in your repository secrets, and then you use them to log in in the action itself.

David Swenson: 22:01 One thing we are hoping to do in the future, but not too immediately. It depends on the people; if you end up using this and liking it, we'll probably do this. I don't know if you've done anything lately with releasing to PyPI, but there you can also do this thing where GitHub is just trusted, so you no longer have to give a token. And AWS allows the same thing. What is it called?

Ethan Holz: 22:26 Yeah, it's OIDC, OpenID Connect. So how it works is, AWS trusts GitHub, and then GitHub sends an authenticated request to AWS, or to PyPI, and says: this is the repo that I'm authenticating, here's the repo I'm coming from. And it goes, great, cool, I know that this was set up intentionally by the person who owns this account. And then it can assume those roles, so the policy that we set in place, it can assume that and set that up correctly.
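The two sign-in flows described here can be sketched as GitHub Actions workflow steps. A hedged sketch: the AWS sign-in action (`aws-actions/configure-aws-credentials`) is real, but the secret names, region, and role ARN are placeholders, and the step that actually provisions the GPU runner is omitted:

```yaml
jobs:
  gpu-ci:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # only needed for the OIDC variant below
      contents: read
    steps:
      # Variant 1: secret-based. The access key ID and secret access key
      # generated in the IAM console are stored as repository secrets.
      - name: Configure AWS credentials (access keys)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      # Variant 2: OIDC. No stored secrets; GitHub presents a short-lived
      # token and assumes a pre-configured IAM role (ARN is a placeholder).
      # - name: Configure AWS credentials (OIDC)
      #   uses: aws-actions/configure-aws-credentials@v4
      #   with:
      #     role-to-assume: arn:aws:iam::123456789012:role/gha-runner-role
      #     aws-region: us-east-1
```

In both variants the sign-in step exports the credentials as environment variables, so any later step that uses the AWS SDK picks them up automatically; only one of the two variants would be used in a real workflow.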
So we have some stuff in place and some documentation available to make it so you don't even have to create users. You just create the policy, and then the action does that automatically. So that way you don't have to rotate any of your access keys. The access keys are only valid for the amount of time that the runner step is there, and then those keys get reset. So they're not long-lived keys; they're basically one-time use and then they're done. And it's a lot more secure, but it also makes it a lot easier for you guys as well. But this has a lower cognitive load. It's easier to talk about, I think, using access keys rather than OIDC.

David Swenson: 23:41 Yeah, I mean, here you're just giving a login and a password as a secret. And the beauty of the OIDC thing is that you no longer need the login and password. Right. It just knows that it's coming from a GitHub repository that you've already said you know and control. And so, yeah, it allows you to.

Ethan Holz: 23:59 Not need to use any secrets, is what it does. AWS generates the secrets for you on your behalf and then handles them for you. You just have to do some configuration ahead of time so that it knows to provide those secrets when it needs them.

Jeffrey Wagner: 24:15 Okay.

Jeffrey Wagner: 24:16 I think I'll stick with the secret-based workflow at first, until you guys formally recommend switching over. But thanks for the explanation. Also, my quota request just got approved already.

David Swenson: 24:32 Jesus. Hold on. Let me take my support stuff here and see if we've been approved yet. Let's see, my support cases. Oh, I must have gotten something back on it. Pending Amazon action. Yeah. This is what happened last time I had one of these, and it took a week after I got this email before they actually approved it.
Man, they must just not like me.

Jeffrey Wagner: 25:09 They said it may take up to 30 minutes to propagate.

David Swenson: 25:19 So I think that, in terms of the authentication stuff, that answers your questions. Is there anything else we had questions on in this?

Jeffrey Wagner: 25:29 What is the state of the runners, I guess? Is there alpha documentation that we can start working with, or do you plan to have that out at a later date?

David Swenson: 25:40 Ethan, can you pull that up? You'll.

Ethan Holz: 25:43 Yep, I can probably get to it.

David Swenson: 25:45 Faster than I can.

Ethan Holz: 25:46 Yes.

David Swenson: 25:51 There we go. I think that we still have a PR open for the most complete documentation. So if you can show that one.

Ethan Holz: 26:07 Bear with me. Can you see this? A little bit bigger?

Jeffrey Wagner: 26:12 It's a little small.

Jeffrey Wagner: 26:13 Okay.

David Swenson: 26:21 One more.

Jeffrey Wagner: 26:22 Maybe twice more.

Jeffrey Wagner: 26:23 Oh.

Ethan Holz: 26:26 Yeah. There we go.

Jeffrey Wagner: 26:28 And let's see, like one more. There you go.

Ethan Holz: 26:30 Is that okay?

Jeffrey Wagner: 26:31 Perfect.

Jeffrey Wagner: 26:32 Yep.

Ethan Holz: 26:33 So this is the fork of the docs here. All that's really here is the breakdown of the setup process. So this kind of shows what you can do. I have to make one change here, now that I look at it, but basically this is all going to look pretty much the same. The other example we have is the OpenMM GPU test. We have an actual usable example that we're also using internally. This is doing the OpenMM test. This is the bare minimum you need to get it running. This will all be updated; we're going to cut a new version this week, so that you can use that instead of using the hash for this. But this is what it looks like. We have this part here, kind of the setup parts, and there's a lot more here.
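Ethan's point about cutting a release so users can stop pinning the commit hash looks like this in a workflow `uses:` line (the action path and SHA below are placeholders, not the project's real coordinates):

```yaml
steps:
  # Before a release exists, the action must be pinned to a commit SHA:
  - uses: example-org/gha-runner@1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b  # placeholder SHA
  # Once a version is cut, a release tag can be used instead:
  - uses: example-org/gha-runner@v1
```

Pinning to a SHA is immutable but awkward to read and update; a version tag is what the new release is meant to enable.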
If you want to add more things, there's more for this. Ideally you shouldn't need a whole ton of it. This is kind of a simpler example. But the OpenMM GPU one, I will include docs on that as well to point to. That's what we use to test. It's something that we are actively developing on and making sure always works, so it's a good place to look. The other piece of docs that we have: in this docs repo, we're going to have the list of the different providers we support. So right now it's just AWS. So you'll be able to go to AWS, and it walks you through how to go through the AWS console and create all the tokens you need. So you'll need a GitHub access token and then an AWS token. Set everything up. Here we have some AMI stuff, so those are your machine images, on what to do there. Some GPU instance recommendations: whether you want to use P2 instances or other instance types, what we recommend, and which machine images are useful on those different instance types. Yeah, this is kind of just a list of resources. So if you're interested in setting up the account, this is exactly the docume