Compute Expansion - AWS Spot
- 1 Meeting notes from 2020.12.16
- 1.1 Batch
- 1.2 ECS
- 1.3 EKS
- 1.4 Spot instance advisor
- 1.5 Spot Blueprints
- 1.6 Questions
- 1.7 Other notes
Meeting notes from 2020.12.16
Instance flexibility is important.
POC - would be fundable.
Price: up to 90% discount.
Can scale up to significant amounts.
Want to build workloads to handle two-minute sudden termination.
Want time/region flexible.
spot pool = instance type/size * availability zone
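A small sketch of the pool arithmetic above: each (instance type, availability zone) pair counts as one Spot pool, so adding types or zones multiplies the number of pools available. The instance types and zones listed here are hypothetical examples, not choices made in the meeting.

```python
from itertools import product

# Hypothetical instance types and zones, chosen only for illustration.
instance_types = ["m5.xlarge", "m5a.xlarge", "m4.xlarge", "c5.2xlarge"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]


def spot_pools(types, zones):
    """Return every (instance type, availability zone) pair; each pair is one Spot pool."""
    return list(product(types, zones))


pools = spot_pools(instance_types, availability_zones)
print(len(pools))  # 4 types x 3 zones = 12 pools
```

More pools means more places for capacity-optimized allocation to find capacity, which is why instance flexibility matters.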
Auto Scaling group (ASG): supports mixed instance types and purchase options
ASG features lifecycle hooks, termination policies, etc.
Allocation strategies
N lowest priced
capacity optimized (preferred)
EC2 instance rebalance recommendation: new signal notification when spot instance is at elevated risk of interruption
allows for proactive rebalances of workloads to a deeper pool
capacity rebalancing on Spot: when a rebalance signal is received, the ASG launches replacement instances and eventually terminates the old ones
enabled on the Auto Scaling group with CapacityRebalance: true
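As a rough sketch of how the ASG settings above might fit together, the function below builds the keyword arguments for boto3's create_auto_scaling_group call: mixed instance types, the capacity-optimized allocation strategy, and capacity rebalancing. The function name, the default sizes, and the choice of zero on-demand capacity are assumptions for illustration; the group name, launch template name, and subnet IDs in the usage note are placeholders.

```python
def build_asg_request(name, launch_template, instance_types, subnet_ids,
                      min_size=0, max_size=10):
    """Build keyword arguments for boto3's autoscaling create_auto_scaling_group.

    This helper is hypothetical; only the dictionary keys come from the AWS API.
    """
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        # Comma-separated subnet IDs; the subnets determine the availability zones.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Launch replacements when a rebalance recommendation arrives.
        "CapacityRebalance": True,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # One override per instance type widens the set of Spot pools.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # Assumption: an all-Spot group with no on-demand base.
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": 0,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("autoscaling").create_auto_scaling_group(**build_asg_request("spot-workers", "worker-template", ["m5.xlarge", "m5a.xlarge"], ["subnet-aaa", "subnet-bbb"])).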
Batch
Recommend that you choose "Spot capacity optimized" for Batch
ECS
Fargate Spot is available.
Overprovisioning on spot with capacity providers.
capacity providers map to ASGs, which allows ECS to manage the ASGs
EKS
After creating the cluster with CloudFormation, create a nodegroup on EKS
cluster autoscaler requires a nodegroup
instance types within a nodegroup should be the same size (CPU/memory)
keep Spot and on-demand instances in separate nodegroups
Handling interruptions
identify
2-minute notification
taint
drain
replace
For self-managed node groups - need a daemonset for handling spot interruptions
managed nodegroups already have this, so those are recommended
Horizontal Pod Autoscaler (HPA)
should run on the on-demand nodegroup
Cluster Autoscaler (CA)
should run on the on-demand nodegroup
Taints applied at node level
a node won't accept any pods that do not tolerate its taints
Node affinity: allows you to constrain which nodes your pod is eligible to be scheduled on based on labels on the node
Can use both taints and node affinity to control whether containers go to Spot or on-demand nodes
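The taint and affinity rules above can be sketched in Python. The matching functions are a simplified model of the Kubernetes scheduler behavior, not its actual implementation, and the taint key example.com/spot is hypothetical. The affinity fragment assumes the eks.amazonaws.com/capacityType label that EKS managed nodegroups apply to their nodes.

```python
def tolerates(toleration, taint):
    """Simplified model of whether one toleration matches one taint."""
    # A toleration that names an effect only matches taints with that effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An Exists toleration with no key matches every taint key.
        return "key" not in toleration or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def node_accepts(pod_tolerations, node_taints):
    """A node accepts a pod only if every hard taint is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in hard)


# Hypothetical taint placed on every node in the Spot nodegroup.
SPOT_TAINT = {"key": "example.com/spot", "value": "true", "effect": "NoSchedule"}

# Tolerations for an interruption-tolerant workload.
batch_tolerations = [{"key": "example.com/spot", "operator": "Equal",
                      "value": "true", "effect": "NoSchedule"}]

# Pod spec fragment that additionally requires a Spot node via node affinity.
spot_affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "eks.amazonaws.com/capacityType",
                    "operator": "In",
                    "values": ["SPOT"],
                }]
            }]
        }
    }
}

print(node_accepts(batch_tolerations, [SPOT_TAINT]))  # tolerating pod is accepted
print(node_accepts([], [SPOT_TAINT]))                 # pod without tolerations is rejected
```

The taint keeps pods that have not opted in (such as the HPA and CA) off Spot nodes, while the affinity steers opted-in pods toward them; the two mechanisms work in opposite directions, which is why the notes suggest using both.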
Spot instance advisor
3-month trailing statistics
Aim for best practices, flexibility, diversification.
Spot Blueprints
Just released; gives you templates for deployment via CloudFormation, Terraform, etc.
Questions
ECS - would that require any on-demand base capacity?
They will get back to me, but believe the answer is no
Base on-demand capacity for EKS could likely be small; perhaps could try micro instances?
Next step for me is to evaluate the choices and decide on a deployment approach.
Other notes
Need to do some major work on manager code to better support signal handling, pre-emptible, retries, etc.
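A minimal sketch of what that signal handling could look like, assuming the manager runs a list of independent tasks that can be retried elsewhere. All function names here are hypothetical and not taken from the existing manager code. The metadata path is the one where EC2 publishes the two-minute Spot interruption notice (it returns 404 until a notice exists); the plain GET shown assumes IMDSv1 is reachable, and an instance that enforces IMDSv2 would first need a session token.

```python
import json
import signal
import threading
import urllib.error
import urllib.request

# Returns 404 until an interruption is scheduled, then a small JSON document.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Ask the work loop to stop at the next safe point."""
    stop_requested.set()


def install_handlers():
    """Register the SIGTERM handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def fetch_instance_action(url=INSTANCE_ACTION_URL, timeout=1.0):
    """Return the interruption notice as a dict, or None if none is pending."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # 404, timeout, not on EC2, or an unparsable body: treat as no notice.
        return None


def run_tasks(tasks, check_interruption=fetch_instance_action):
    """Run tasks in order; return the tasks that were not started.

    The caller is expected to resubmit the returned tasks on another instance.
    """
    remaining = list(tasks)
    while remaining:
        if stop_requested.is_set() or check_interruption() is not None:
            break
        task = remaining.pop(0)
        task()
    return remaining
```

Checking before each task keeps the sketch simple; for long-running tasks, a background thread that polls every few seconds and sets stop_requested would react faster within the two-minute window.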