AWS EKS (Managed) Setup Guide

Overview

AWS Elastic Kubernetes Service (EKS) provides a managed Kubernetes environment that simplifies cluster deployment and maintenance. EKS is an excellent choice for Orion deployments due to its scalability, reliability, and comprehensive AWS integration.

Base Node Configuration

We highly recommend using EKS Managed Node Groups for your Orion deployment, as they offer several key benefits:

  • Simplified node provisioning and lifecycle management
  • Nodes backed by EC2 Auto Scaling groups, ready for use with the Kubernetes Cluster Autoscaler
  • Pre-configured services required for Orion
  • Regular security updates from AWS
  • Pre-installed drivers (including NVIDIA drivers for GPU nodes)

Standard Node Configuration

For general workloads without GPU requirements:

Recommended AMI: AL2_x86_64

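This AMI provides:

  • Optimized performance with Amazon Linux 2
  • Pre-configured containerd runtime (Docker is no longer bundled for Kubernetes 1.24 and later)
  • Security patches delivered through new AMI releases, which you roll out by updating the node group
  • Stability for production workloads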
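As an illustration, a managed node group using this AMI type could be created with the AWS CLI roughly as follows. This is a minimal sketch: the cluster name, node group name, IAM role ARN, subnet IDs, instance type, and scaling limits are all placeholder assumptions, not values prescribed by this guide. The script only prints the command unless EXECUTE=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values (assumptions); replace them with your own.
CLUSTER_NAME="orion-cluster"
NODE_ROLE_ARN="arn:aws:iam::111122223333:role/orion-eks-node-role"
SUBNET_IDS=("subnet-aaaa1111" "subnet-bbbb2222")

# Build the command as an array so every argument stays correctly quoted.
cmd=(aws eks create-nodegroup
  --cluster-name "$CLUSTER_NAME"
  --nodegroup-name "orion-cpu"
  --ami-type AL2_x86_64
  --instance-types m5.xlarge
  --scaling-config minSize=1,maxSize=4,desiredSize=2
  --node-role "$NODE_ROLE_ARN"
  --subnets "${SUBNET_IDS[@]}")

# Print the command for review; run it only when EXECUTE=1 is set.
printf '%q ' "${cmd[@]}"; echo
if [[ "${EXECUTE:-0}" == "1" ]]; then
  "${cmd[@]}"
fi
```

Building the command in an array keeps the dry run and the real invocation identical, so what you review is exactly what gets executed.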

GPU Configuration

GPU Node Setup

For workloads requiring GPU acceleration such as inference services and GPU-accelerated processing:

Recommended AMI: AL2_x86_64_GPU

This specialized AMI includes:

  • Pre-configured NVIDIA drivers for immediate GPU utilization
  • NVIDIA Container Toolkit pre-installed
  • GPU runtime configuration for Kubernetes
  • Optimized for GPU workloads
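To see which AMI release each of the two AMI types currently resolves to, you can query the public SSM parameters that AWS publishes for the EKS optimized Amazon Linux 2 AMIs. The sketch below assumes Kubernetes version 1.29, which is a placeholder; the helper function name is hypothetical. It prints the parameter names and only calls AWS when EXECUTE=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Kubernetes version of your cluster (assumption; adjust as needed).
K8S_VERSION="${K8S_VERSION:-1.29}"

# Hypothetical helper: map an EKS AMI type to its SSM parameter name.
ssm_param_for() {
  local ami_type="$1" family
  case "$ami_type" in
    AL2_x86_64)     family="amazon-linux-2" ;;
    AL2_x86_64_GPU) family="amazon-linux-2-gpu" ;;
    *) echo "unsupported AMI type: $ami_type" >&2; return 1 ;;
  esac
  echo "/aws/service/eks/optimized-ami/${K8S_VERSION}/${family}/recommended/release_version"
}

for ami_type in AL2_x86_64 AL2_x86_64_GPU; do
  param="$(ssm_param_for "$ami_type")"
  echo "$ami_type -> $param"
  # Query AWS only when explicitly requested.
  if [[ "${EXECUTE:-0}" == "1" ]]; then
    aws ssm get-parameter --name "$param" --query 'Parameter.Value' --output text
  fi
done
```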

GPU Node Benefits

The GPU-enabled AMI provides several advantages:

  • Zero Configuration: NVIDIA drivers are pre-installed and configured
  • Container Ready: NVIDIA Container Toolkit is already set up
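  • Kubernetes Integration: GPU resources are advertised to the scheduler as nvidia.com/gpu once the NVIDIA device plugin DaemonSet is running on the cluster
  • Low Maintenance: AWS ships driver updates and security patches in new AMI releases, which you roll out by updating the node group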

Verify GPU Configuration

Once your GPU nodes are deployed, you can verify the configuration:

# Check GPU availability on the node
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'

# Verify GPU nodes are labeled correctly
kubectl get nodes --show-labels | grep gpu
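Beyond inspecting node metadata, a short end-to-end check is to schedule a pod that requests one GPU and runs nvidia-smi. The sketch below is an assumption-laden example, not an official Orion test: the pod name and the container image tag are placeholders, the `kubectl wait --for=jsonpath` form needs kubectl 1.23 or later, and the pod will stay Pending if the NVIDIA device plugin is not running or if GPU nodes carry a taint the pod does not tolerate. The manifest is always written; cluster commands run only when EXECUTE=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a one-shot pod that requests a single GPU and prints nvidia-smi output.
# The pod name and image tag are placeholders (assumptions).
cat > gpu-smoke-test.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
echo "Wrote gpu-smoke-test.yaml"

# Apply the manifest and collect the result only when explicitly requested.
if [[ "${EXECUTE:-0}" == "1" ]]; then
  kubectl apply -f gpu-smoke-test.yaml
  # Wait until the container has exited successfully, then show its output.
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-smoke-test --timeout=180s
  kubectl logs gpu-smoke-test
  kubectl delete pod gpu-smoke-test
fi
```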

Additional Configuration

Node Group Strategy

Consider using multiple node groups to optimize resource allocation:

  1. CPU Node Group: For general Orion services and control plane components
  2. GPU Node Group: For GPU-accelerated workloads and inference services
  3. Spot Instance Node Group: For cost-optimized batch processing (optional)
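One way to express the three node groups above is an eksctl config file. The following is a sketch under stated assumptions: it presumes eksctl is installed and that a cluster named orion-cluster already exists in us-east-1; all names, instance types, sizes, and the `workload` label are illustrative placeholders. With the AmazonLinux2 AMI family, eksctl is expected to pick the GPU variant of the AMI for GPU instance types, but confirm this against the eksctl documentation for your version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# All names, regions, instance types, and sizes are placeholders (assumptions).
cat > orion-nodegroups.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: orion-cluster
  region: us-east-1
managedNodeGroups:
  - name: orion-cpu
    amiFamily: AmazonLinux2
    instanceType: m5.xlarge
    minSize: 1
    maxSize: 4
    desiredCapacity: 2
  - name: orion-gpu
    amiFamily: AmazonLinux2
    instanceType: g4dn.xlarge
    minSize: 0
    maxSize: 2
    desiredCapacity: 1
    labels:
      workload: gpu
  - name: orion-spot
    amiFamily: AmazonLinux2
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]
    spot: true
    minSize: 0
    maxSize: 6
    desiredCapacity: 0
EOF
echo "Wrote orion-nodegroups.yaml"

# Create the node groups only when explicitly requested.
if [[ "${EXECUTE:-0}" == "1" ]]; then
  eksctl create nodegroup --config-file=orion-nodegroups.yaml
fi
```

Listing several instance types for the Spot group gives EC2 more capacity pools to draw from, which tends to reduce interruptions.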

Best Practices

  1. Stay Updated: Regularly check the amazon-eks-ami releases page for the latest AMI versions to ensure you have the most recent security patches.

  2. Right-sizing: Start with appropriate instance types for your workload to optimize cost and performance.

  3. Taints and Tolerations: Use Kubernetes taints on GPU nodes to ensure only GPU workloads are scheduled on expensive GPU instances.

  4. Monitoring: Implement comprehensive monitoring for both CPU and GPU utilization to optimize resource allocation.
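The taints-and-tolerations practice above can be sketched as follows. Setting the taint on the managed node group itself, rather than on individual nodes, means replacement nodes inherit it. The cluster name, node group name, taint key and value, and output file name are assumptions chosen for illustration. The script prints the AWS CLI command and runs it only when EXECUTE=1 is set; it also writes the matching toleration fragment for a GPU workload's pod spec.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names (assumptions).
CLUSTER_NAME="orion-cluster"
GPU_NODEGROUP="orion-gpu"

# Taint definition in the JSON form accepted by the AWS CLI.
TAINTS_JSON='{"addOrUpdateTaints":[{"key":"nvidia.com/gpu","value":"true","effect":"NO_SCHEDULE"}]}'

taint_cmd=(aws eks update-nodegroup-config
  --cluster-name "$CLUSTER_NAME"
  --nodegroup-name "$GPU_NODEGROUP"
  --taints "$TAINTS_JSON")

# Print the command for review; run it only when EXECUTE=1 is set.
printf '%q ' "${taint_cmd[@]}"; echo
if [[ "${EXECUTE:-0}" == "1" ]]; then
  "${taint_cmd[@]}"
fi

# Matching toleration to place under spec in a GPU workload's pod template.
cat > gpu-toleration.yaml <<'EOF'
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
EOF
echo "Wrote gpu-toleration.yaml"
```

Note that the EKS API spells the effect NO_SCHEDULE, while the Kubernetes pod spec uses NoSchedule; both refer to the same taint effect.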

Next Steps

After completing this setup, proceed to Orion cluster deployment by following our Cloud Installation Guide.

For further assistance, contact Juno Support.