AWS EKS (Managed) Setup Guide¶
Overview¶
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service: AWS operates the Kubernetes control plane, which simplifies cluster deployment and maintenance. EKS is a good fit for Orion deployments because of its scalability, reliability, and integration with other AWS services.
Base Node Configuration¶
Recommended Approach: EKS Managed Node Groups¶
We highly recommend using EKS Managed Node Groups for your Orion deployment, as they offer several key benefits:
- Simplified node provisioning and lifecycle management
- Scaling support: each node group is backed by an EC2 Auto Scaling group and works with the Kubernetes Cluster Autoscaler
- Pre-configured services required for Orion
- Regular security updates, published by AWS as new AMI releases that you roll out to the node group
- Pre-installed drivers (including NVIDIA drivers for GPU nodes)
Standard Node Configuration¶
For general workloads without GPU requirements:
Recommended AMI: AL2_x86_64
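This AMI provides:
- Amazon Linux 2 with EKS-specific optimizations
- A pre-configured containerd runtime (EKS AMIs for Kubernetes 1.24 and later use containerd as the only runtime, not Docker)
- Security patches delivered through regular AMI releases
- Stability for production workloads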
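As a rough sketch of what selecting this AMI type looks like with the AWS CLI, the snippet below assembles an `aws eks create-nodegroup` call. The cluster name, node group name, IAM role ARN, subnet IDs, instance type, and scaling sizes are all hypothetical placeholders, not values from this guide. The command is printed rather than executed unless `APPLY=1` is set.

```shell
# Hypothetical placeholders: replace with values from your own account.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODE_ROLE_ARN="${NODE_ROLE_ARN:-arn:aws:iam::111122223333:role/orion-node-role}"
SUBNET_IDS="${SUBNET_IDS:-subnet-0aaa1111 subnet-0bbb2222}"

# Assemble the command in the positional parameters so it can be reviewed first.
# $SUBNET_IDS is deliberately unquoted: each subnet ID becomes its own argument.
set -- aws eks create-nodegroup \
  --cluster-name "$CLUSTER_NAME" \
  --nodegroup-name orion-cpu \
  --ami-type AL2_x86_64 \
  --instance-types m5.xlarge \
  --scaling-config minSize=2,maxSize=6,desiredSize=2 \
  --node-role "$NODE_ROLE_ARN" \
  --subnets $SUBNET_IDS
CPU_NODEGROUP_CMD="$*"

# Print the command by default; export APPLY=1 to actually call AWS.
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
else
  echo "$CPU_NODEGROUP_CMD"
fi
```

Review the printed command, then rerun with `APPLY=1` once the placeholders point at real resources.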
GPU Configuration¶
GPU Node Setup¶
For workloads requiring GPU acceleration such as inference services and GPU-accelerated processing:
Recommended AMI: AL2_x86_64_GPU
This specialized AMI includes:
- Pre-configured NVIDIA drivers for immediate GPU utilization
- NVIDIA Container Toolkit pre-installed
- GPU runtime configuration for Kubernetes
- Optimized for GPU workloads
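A GPU node group differs from a standard one mainly in the AMI type and instance type. The sketch below shows one possible `aws eks create-nodegroup` call. The names, ARN, subnets, `g4dn.xlarge` instance type, `workload=gpu` label, and `nvidia.com/gpu` taint key are illustrative assumptions. The command is only printed unless `APPLY=1` is set.

```shell
# Hypothetical placeholders: replace with values from your own account.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODE_ROLE_ARN="${NODE_ROLE_ARN:-arn:aws:iam::111122223333:role/orion-node-role}"
SUBNET_IDS="${SUBNET_IDS:-subnet-0aaa1111 subnet-0bbb2222}"

# The GPU AMI type, a GPU instance type, and an (assumed) taint that keeps
# non-GPU pods off these nodes. $SUBNET_IDS is unquoted on purpose.
set -- aws eks create-nodegroup \
  --cluster-name "$CLUSTER_NAME" \
  --nodegroup-name orion-gpu \
  --ami-type AL2_x86_64_GPU \
  --instance-types g4dn.xlarge \
  --scaling-config minSize=0,maxSize=2,desiredSize=1 \
  --labels workload=gpu \
  --taints key=nvidia.com/gpu,value=true,effect=NO_SCHEDULE \
  --node-role "$NODE_ROLE_ARN" \
  --subnets $SUBNET_IDS
GPU_NODEGROUP_CMD="$*"

# Print the command by default; export APPLY=1 to actually call AWS.
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
else
  echo "$GPU_NODEGROUP_CMD"
fi
```

If you add a taint like this one, GPU workloads need a matching toleration in their pod spec.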
GPU Node Benefits¶
The GPU-enabled AMI provides several advantages:
- Zero Configuration: NVIDIA drivers are pre-installed and configured
- Container Ready: NVIDIA Container Toolkit is already set up
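- Kubernetes Integration: GPUs are exposed to the scheduler as the nvidia.com/gpu resource once the NVIDIA device plugin DaemonSet is running on the node
- Low Maintenance: AWS ships driver updates and security patches in new AMI releases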
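Driver and security updates reach a node group when it is moved to a newer AMI release. The helper functions below are a minimal sketch of how to compare the release a node group runs against the latest one AWS recommends, and how to start an update. The default Kubernetes version `1.29` is a hypothetical placeholder. The functions only call AWS when you invoke them.

```shell
K8S_VERSION="${K8S_VERSION:-1.29}"   # hypothetical: match your cluster version

# SSM parameter that holds the latest recommended AMI release version.
# $1 is the AMI family path segment: amazon-linux-2 or amazon-linux-2-gpu.
ami_release_param() {
  echo "/aws/service/eks/optimized-ami/${K8S_VERSION}/$1/recommended/release_version"
}

# Latest recommended release for the given AMI family.
latest_ami_release() {
  aws ssm get-parameter --name "$(ami_release_param "$1")" \
    --query 'Parameter.Value' --output text
}

# Release currently used by a node group. $1 = cluster name, $2 = node group name.
nodegroup_ami_release() {
  aws eks describe-nodegroup --cluster-name "$1" --nodegroup-name "$2" \
    --query 'nodegroup.releaseVersion' --output text
}

# Roll a node group to the latest AMI release for its Kubernetes version.
upgrade_nodegroup() {
  aws eks update-nodegroup-version --cluster-name "$1" --nodegroup-name "$2"
}
```

For example, compare `latest_ami_release amazon-linux-2-gpu` with `nodegroup_ami_release orion-cluster orion-gpu` (hypothetical names) and call `upgrade_nodegroup` when they differ.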
Verify GPU Configuration¶
Once your GPU nodes are deployed, you can verify the configuration:
# Check GPU availability on the node
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
# Verify GPU nodes are labeled correctly
kubectl get nodes --show-labels | grep gpu
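The jsonpath query above prints one GPU count per node on a single line, which is hard to read on larger clusters. The helpers below are a small sketch that totals those counts and shows a per-node table. The function names are made up for this example. An empty result usually means the NVIDIA device plugin is not running yet.

```shell
# Sum whitespace-separated integer GPU counts read from stdin.
# Non-numeric tokens are ignored; empty input yields 0.
sum_gpus() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) total += $i }
       END { print total + 0 }'
}

# Total allocatable GPUs across the cluster (calls kubectl; needs cluster access).
total_cluster_gpus() {
  kubectl get nodes \
    -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}' | sum_gpus
}

# Per-node view: nodes without GPUs show <none> in the GPU column.
gpus_per_node() {
  kubectl get nodes \
    "-o=custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
}
```

Run `gpus_per_node` to see which nodes advertise GPUs and `total_cluster_gpus` for the cluster-wide count.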
Additional Configuration¶
Node Group Strategy¶
Consider using multiple node groups to optimize resource allocation:
- CPU Node Group: For general Orion services and Orion's control plane components (the Kubernetes control plane itself is run by EKS)
- GPU Node Group: For GPU-accelerated workloads and inference services
- Spot Instance Node Group: For cost-optimized batch processing (optional)
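One way to declare all three groups together is an eksctl config file. The sketch below writes such a file. It assumes eksctl is installed and that a cluster with the hypothetical name `orion-cluster` already exists in the hypothetical region `us-east-1`. Instance types, sizes, the label, and the taint are illustrative only. With `amiFamily: AmazonLinux2`, eksctl selects the GPU variant of the AMI for GPU instance types.

```shell
# Write a hypothetical eksctl config describing the three node groups.
cat > orion-nodegroups.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: orion-cluster        # hypothetical cluster name
  region: us-east-1          # hypothetical region
managedNodeGroups:
  - name: orion-cpu          # general Orion services
    amiFamily: AmazonLinux2
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 6
    desiredCapacity: 2
  - name: orion-gpu          # GPU-accelerated workloads
    amiFamily: AmazonLinux2
    instanceType: g4dn.xlarge
    minSize: 0
    maxSize: 2
    desiredCapacity: 1
    labels:
      workload: gpu
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
  - name: orion-batch-spot   # optional cost-optimized batch capacity
    amiFamily: AmazonLinux2
    spot: true
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]
    minSize: 0
    maxSize: 10
    desiredCapacity: 0
EOF

# To create the groups against the existing cluster:
#   eksctl create nodegroup --config-file=orion-nodegroups.yaml
```

Listing several instance types for the Spot group gives EC2 more capacity pools to draw from, which tends to reduce interruptions.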
Best Practices¶
- Stay Updated: Regularly check the amazon-eks-ami releases page for the latest AMI versions to ensure you have the most recent security patches.
- Right-sizing: Start with appropriate instance types for your workload to optimize cost and performance.
- Taints and Tolerations: Use Kubernetes taints on GPU nodes to ensure only GPU workloads are scheduled on expensive GPU instances.
- Monitoring: Implement comprehensive monitoring for both CPU and GPU utilization to optimize resource allocation.
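To illustrate the taints and tolerations practice, the sketch below writes a small test pod that tolerates a GPU taint and requests one GPU. It assumes the GPU nodes carry a hypothetical taint `nvidia.com/gpu=true:NoSchedule`. The pod name and the CUDA image tag are also assumptions, so pick an image that is compatible with the driver on your nodes.

```shell
# Hypothetical image: substitute a CUDA base image that matches your driver.
CUDA_IMAGE="${CUDA_IMAGE:-nvidia/cuda:11.8.0-base-ubuntu22.04}"

# Pod that tolerates the (assumed) GPU taint and requests a single GPU.
cat > gpu-smoke-test.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: ${CUDA_IMAGE}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# To add the taint to an existing managed node group (names are placeholders):
#   aws eks update-nodegroup-config --cluster-name orion-cluster \
#     --nodegroup-name orion-gpu \
#     --taints 'addOrUpdateTaints=[{key=nvidia.com/gpu,value=true,effect=NO_SCHEDULE}]'
# To run the test and read its output:
#   kubectl apply -f gpu-smoke-test.yaml
#   kubectl logs gpu-smoke-test
```

Pods without the toleration stay off the tainted GPU nodes, while this pod can only be scheduled where an `nvidia.com/gpu` resource is available.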
Additional Resources¶
- Official EKS AMI Documentation - Details on AMI components and security
- EKS Best Practices Guide - Comprehensive recommendations from AWS
- Managed Node Groups Documentation - Complete reference for EKS Managed Node Groups
Next Steps¶
After completing this setup, proceed to Orion cluster deployment by following our Cloud Installation Guide.
For further assistance, contact Juno Support.