AWS EKS (Managed) Setup Guide¶

Overview¶

AWS Elastic Kubernetes Service (EKS) provides a managed Kubernetes environment that simplifies cluster deployment and maintenance. EKS is an excellent choice for Orion deployments due to its scalability, reliability, and comprehensive AWS integration.

Base Node Configuration¶

Recommended Approach: EKS Managed Node Groups¶

We highly recommend using EKS Managed Node Groups for your Orion deployment, as they offer several key benefits:

Simplified node provisioning and lifecycle management
Automatic scaling capabilities
Pre-configured services required for Orion
Regular security updates from AWS
Pre-installed drivers (including NVIDIA drivers when the GPU AMI type is used)

Standard Node Configuration¶

For general workloads without GPU requirements:

Recommended AMI: AL2_x86_64

This AMI provides:

Optimized performance with Amazon Linux 2
Pre-configured containerd runtime (Docker is included only in AMI releases for Kubernetes versions earlier than 1.24)
Security patches delivered through new AMI releases
Stability for production workloads
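As a minimal sketch of the standard setup, the AWS CLI can create a managed node group with this AMI type. The cluster name, node group name, IAM role ARN, subnet IDs, instance type, and group sizes below are hypothetical placeholders, not values prescribed by this guide. By default the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with values from your own AWS account.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODE_ROLE_ARN="${NODE_ROLE_ARN:-arn:aws:iam::111122223333:role/orion-node-role}"
SUBNET_IDS="${SUBNET_IDS:-subnet-aaaa1111 subnet-bbbb2222}"

# A non-empty DRY_RUN prints the command; an empty value executes it.
DRY_RUN="${DRY_RUN-1}"

create_cpu_nodegroup() {
  # $SUBNET_IDS is intentionally unquoted so each ID becomes its own argument.
  ${DRY_RUN:+echo} aws eks create-nodegroup \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name orion-cpu \
    --ami-type AL2_x86_64 \
    --instance-types m5.xlarge \
    --scaling-config minSize=1,maxSize=4,desiredSize=2 \
    --node-role "$NODE_ROLE_ARN" \
    --subnets $SUBNET_IDS
}

create_cpu_nodegroup
```

To create the node group for real, run the script with an empty `DRY_RUN`, for example `DRY_RUN= sh create-cpu-nodegroup.sh` (the file name is only an example).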

GPU Configuration¶

GPU Node Setup¶

For workloads requiring GPU acceleration such as inference services and GPU-accelerated processing:

Recommended AMI: AL2_x86_64_GPU

This specialized AMI includes:

Pre-configured NVIDIA drivers for immediate GPU utilization
Pre-installed NVIDIA Container Toolkit
GPU runtime configuration for Kubernetes
Settings optimized for GPU workloads
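A GPU node group can be created the same way by selecting the GPU AMI type. The sketch below assumes a hypothetical cluster name, role ARN, subnets, a `g4dn.xlarge` instance type, and a `workload=gpu` label chosen purely for illustration. It prints the command by default.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with values from your own AWS account.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODE_ROLE_ARN="${NODE_ROLE_ARN:-arn:aws:iam::111122223333:role/orion-node-role}"
SUBNET_IDS="${SUBNET_IDS:-subnet-aaaa1111 subnet-bbbb2222}"

# A non-empty DRY_RUN prints the command; an empty value executes it.
DRY_RUN="${DRY_RUN-1}"

create_gpu_nodegroup() {
  # The workload=gpu label is an illustrative choice, not an Orion requirement.
  # $SUBNET_IDS is intentionally unquoted so each ID becomes its own argument.
  ${DRY_RUN:+echo} aws eks create-nodegroup \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name orion-gpu \
    --ami-type AL2_x86_64_GPU \
    --instance-types g4dn.xlarge \
    --scaling-config minSize=0,maxSize=2,desiredSize=1 \
    --labels workload=gpu \
    --node-role "$NODE_ROLE_ARN" \
    --subnets $SUBNET_IDS
}

create_gpu_nodegroup
```

Setting a label at creation time gives GPU workloads a stable value to target with a node selector.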

GPU Node Benefits¶

The GPU-enabled AMI provides several advantages:

Zero Configuration: NVIDIA drivers are pre-installed and configured
Container Ready: NVIDIA Container Toolkit is already set up
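Kubernetes Integration: GPU resources are advertised to the scheduler as nvidia.com/gpu once the NVIDIA device plugin DaemonSet is running in the cluster
Low Maintenance: AWS publishes driver updates and security patches in new AMI releases, which you roll out by updating the node group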
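Driver and security fixes ship as new AMI releases, and a managed node group picks one up when its version is updated. The following sketch, which assumes hypothetical cluster and node group names, shows the current release and then requests an update. When `--release-version` is omitted, the AWS CLI targets the latest AMI release for the node group's current Kubernetes version.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own cluster and node group names.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODEGROUP_NAME="${NODEGROUP_NAME:-orion-gpu}"

# A non-empty DRY_RUN prints the commands; an empty value executes them.
DRY_RUN="${DRY_RUN-1}"

show_nodegroup_release() {
  # Print the AMI release version the node group currently runs.
  ${DRY_RUN:+echo} aws eks describe-nodegroup \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$NODEGROUP_NAME" \
    --query nodegroup.releaseVersion \
    --output text
}

update_nodegroup_ami() {
  # Start a rolling update to the latest AMI release for the current Kubernetes version.
  ${DRY_RUN:+echo} aws eks update-nodegroup-version \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$NODEGROUP_NAME"
}

show_nodegroup_release
update_nodegroup_ami
```

The update replaces nodes gradually, so it is worth scheduling it outside peak inference hours.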

Verify GPU Configuration¶

Once your GPU nodes are deployed, you can verify the configuration:

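# List each node with its allocatable GPU count (<none> means no GPUs are advertised)
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

# Show GPU-related node labels (matches only labels that contain "gpu")
kubectl get nodes --show-labels | grep -i gpu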
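For a scripted check, the per-node GPU counts can be summed into a single number. This is a sketch, not part of the Orion tooling: `sum_gpus` is a hypothetical helper, and it assumes GPUs are exposed under the `nvidia.com/gpu` resource name used above.

```shell
#!/bin/sh
# Sum the GPU column of "NODE GPU" rows, skipping the header line.
# Non-numeric entries such as <none> count as 0.
sum_gpus() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
}

# Only query the cluster when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  total="$(kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | sum_gpus)"
  echo "Allocatable GPUs in cluster: $total"
  [ "$total" -gt 0 ] || echo "No GPUs advertised; check that the NVIDIA device plugin is running." >&2
fi
```

A total of 0 on a cluster with GPU nodes usually points to the device plugin rather than the drivers, since the resource is only advertised once the plugin registers it.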

Additional Configuration¶

Node Group Strategy¶

Consider using multiple node groups to optimize resource allocation:

CPU Node Group: For general Orion services and control plane components
GPU Node Group: For GPU-accelerated workloads and inference services
Spot Instance Node Group: For cost-optimized batch processing (optional)
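The optional Spot group differs from the others mainly in its capacity type. A minimal sketch, assuming hypothetical names, subnets, and instance types, and a `workload=batch` label chosen for illustration:

```shell
#!/bin/sh
# Hypothetical placeholders: replace with values from your own AWS account.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
NODE_ROLE_ARN="${NODE_ROLE_ARN:-arn:aws:iam::111122223333:role/orion-node-role}"
SUBNET_IDS="${SUBNET_IDS:-subnet-aaaa1111 subnet-bbbb2222}"

# A non-empty DRY_RUN prints the command; an empty value executes it.
DRY_RUN="${DRY_RUN-1}"

create_spot_nodegroup() {
  # Several similar instance types are listed so Spot has more pools to draw from.
  # $SUBNET_IDS is intentionally unquoted so each ID becomes its own argument.
  ${DRY_RUN:+echo} aws eks create-nodegroup \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name orion-batch-spot \
    --capacity-type SPOT \
    --ami-type AL2_x86_64 \
    --instance-types m5.xlarge m5a.xlarge m5d.xlarge \
    --scaling-config minSize=0,maxSize=6,desiredSize=1 \
    --labels workload=batch \
    --node-role "$NODE_ROLE_ARN" \
    --subnets $SUBNET_IDS
}

create_spot_nodegroup
```

Because Spot capacity can be reclaimed at short notice, this group suits only batch jobs that tolerate interruption and retry.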

Best Practices¶

Stay Updated: Regularly check the amazon-eks-ami releases page for the latest AMI versions to ensure you have the most recent security patches.
Right-sizing: Start with appropriate instance types for your workload to optimize cost and performance.
Taints and Tolerations: Use Kubernetes taints on GPU nodes to ensure only GPU workloads are scheduled on expensive GPU instances.
Monitoring: Implement comprehensive monitoring for both CPU and GPU utilization to optimize resource allocation.
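For the taint recommendation, one option is to set the taint on the managed node group itself so that replacement nodes inherit it. The sketch below assumes a hypothetical node group name and a `dedicated=gpu` taint chosen for illustration. It prints the commands by default.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own cluster and node group names.
CLUSTER_NAME="${CLUSTER_NAME:-orion-cluster}"
GPU_NODEGROUP="${GPU_NODEGROUP:-orion-gpu}"

# A non-empty DRY_RUN prints the commands; an empty value executes them.
DRY_RUN="${DRY_RUN-1}"

taint_gpu_nodegroup() {
  # dedicated=gpu:NoSchedule keeps pods without a matching toleration off these nodes.
  ${DRY_RUN:+echo} aws eks update-nodegroup-config \
    --cluster-name "$CLUSTER_NAME" \
    --nodegroup-name "$GPU_NODEGROUP" \
    --taints 'addOrUpdateTaints=[{key=dedicated,value=gpu,effect=NO_SCHEDULE}]'
}

show_node_taints() {
  # List the taint keys present on every node.
  ${DRY_RUN:+echo} kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
}

taint_gpu_nodegroup
show_node_taints
```

GPU workloads then need a matching toleration (key `dedicated`, value `gpu`, effect `NoSchedule`) in their pod spec to be scheduled on these nodes.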

Additional Resources¶

Official EKS AMI Documentation - Details on AMI components and security
EKS Best Practices Guide - Comprehensive recommendations from AWS
Managed Node Groups Documentation - Complete reference for EKS Managed Node Groups

Next Steps¶

After completing this setup, proceed to Orion cluster deployment by following our Cloud Installation Guide.

For further assistance, contact Juno Support.