Troubleshooting Guide
Common Issues and Solutions
1. Viewing Pod Status and Logs
Check Pod Status:
# List all pods across namespaces
kubectl get pods -A
# List pods with more details
kubectl get pods -o wide
# Watch pods in real-time
kubectl get pods -w
# Get pod details
kubectl describe pod <pod-name> -n <namespace>
# Get pod events
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
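Events age out of the describe output quickly, so listing all recent events sorted by time often surfaces the relevant failure. This is a standard kubectl invocation; only the namespace is a placeholder:
# All recent events in a namespace, oldest first
kubectl get events -n <namespace> --sort-by=.lastTimestamp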
View Logs:
# Basic log viewing
kubectl logs <pod-name> -n <namespace>
# Logs from specific container in multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
# Follow logs in real-time (like tail -f)
kubectl logs -f <pod-name> -n <namespace>
# View previous container logs (after crash/restart)
kubectl logs <pod-name> --previous -n <namespace>
# Get last N lines
kubectl logs <pod-name> --tail=100 -n <namespace>
# Logs since specific time
kubectl logs <pod-name> --since=1h -n <namespace>
# Logs with timestamps
kubectl logs <pod-name> --timestamps=true -n <namespace>
# Export logs to file
kubectl logs <pod-name> -n <namespace> > pod-logs.txt
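When a workload runs several replicas, fetching logs pod by pod is tedious; kubectl can aggregate logs across pods by label selector. A minimal sketch, assuming your pods carry an app=<label> label (adjust to whatever labels your workload actually uses):
# Logs from all pods matching a label selector
kubectl logs -l app=<label> -n <namespace> --all-containers=true --tail=50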
2. Common Pod States and Solutions
stateDiagram-v2
    [*] --> Pending
    Pending --> Running: Resources Available
    Pending --> Failed: Scheduling Failed
    Running --> Succeeded: Completed
    Running --> Failed: Error
    Running --> CrashLoopBackOff: Repeated Crashes
    CrashLoopBackOff --> Running: Restart
    Failed --> [*]
    Succeeded --> [*]
Pending Pod:
# Diagnose pending pod
kubectl describe pod <pod-name> -n <namespace>
# Check node resources
kubectl top nodes
kubectl describe nodes
# Check for PVC issues
kubectl get pvc -n <namespace>
# Common causes and solutions:
# 1. Insufficient CPU/Memory: Scale down other apps or add nodes
# 2. Node selector mismatch: Check nodeSelector/affinity rules
# 3. Taint/toleration issues: Verify node taints (see the check below)
# 4. PVC not bound: Check storage class and PV availability
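For taint and toleration problems in particular, reading the taints straight off the nodes is the fastest check; this is standard kubectl, with only the node name as a placeholder:
# Show taints on all nodes
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# Or inspect a single node
kubectl describe node <node-name> | grep -A 5 Taints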
CrashLoopBackOff:
# Check recent logs
kubectl logs <pod-name> -n <namespace> --previous
# Check all container statuses
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*]}'
# Common debugging steps:
# 1. Check exit code
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Last State"
# 2. Run debug container
kubectl debug <pod-name> -it --image=busybox -n <namespace>
# 3. Check resource limits
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 5 resources:
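If the crashes turn out to be OOM kills or CPU throttling, raising the limits is often the quickest fix. A hedged sketch using kubectl set resources; the deployment name and values are placeholders to adapt, and the change triggers a new rollout:
# Adjust requests/limits on a deployment
kubectl set resources deployment <deployment-name> -n <namespace> \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=512Mi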
ImagePullBackOff:
# Check the exact error
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Failed"
# Verify image name
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].image}'
# Check if secret exists (for private registries)
kubectl get secrets -n <namespace>
# Create image pull secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n <namespace>
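The secret only takes effect once a pod references it, either via imagePullSecrets in the pod spec or via the service account the pod runs as. A minimal example of the latter, patching the namespace's default service account (a common pattern, not the only option):
# Attach the pull secret to the default service account
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'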
3. Node Troubleshooting
# Check node status
kubectl get nodes
kubectl describe node <node-name>
# Check node conditions
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
# Node resource usage
kubectl top nodes
# Check kubelet logs (on the node)
journalctl -u kubelet -f
# Cordon node (prevent new pods)
kubectl cordon <node-name>
# Drain node safely
kubectl drain <node-name> \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --force \
  --grace-period=60
# Uncordon node
kubectl uncordon <node-name>
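After a drain, it is worth confirming that nothing except DaemonSet pods is still scheduled on the node; a quick check with a field selector (standard kubectl, with only the node name as a placeholder):
# List any pods still scheduled on the node
kubectl get pods -A --field-selector spec.nodeName=<node-name>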
4. Service and Networking Issues
# Check service endpoints
kubectl get endpoints <service-name> -n <namespace>
# Verify service selectors match pod labels
kubectl get svc <service-name> -n <namespace> -o yaml | grep -A 5 selector:
kubectl get pods -n <namespace> --show-labels
# Test DNS resolution
kubectl run -it --rm debug --image=busybox --restart=Never -n <namespace> -- nslookup <service-name>
# Test service connectivity
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -n <namespace> -- curl <service-name>:<port>
# Check network policies
kubectl get networkpolicies -n <namespace>
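If the short-name DNS lookup above fails, testing the fully qualified service name separates search-path problems from the Service itself, and hitting a pod IP directly bypasses the Service entirely. A sketch assuming the default cluster.local cluster domain (adjust if your cluster uses a custom domain):
# Test the fully qualified service name
kubectl run -it --rm debug --image=busybox --restart=Never -n <namespace> -- \
  nslookup <service-name>.<namespace>.svc.cluster.local
# Bypass the Service and curl a pod IP directly
kubectl get pods -n <namespace> -o wide
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -n <namespace> -- curl <pod-ip>:<port>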
5. Port Forwarding Issues
Common Issues and Solutions:
# Verify pod is running
kubectl get pod <pod-name> -n <namespace>
# Check container port
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].ports[*]}'
# Port forward to pod
kubectl port-forward -n <namespace> pod/<pod-name> 8080:80
# Port forward to a service (kubectl picks a ready backing pod for you)
kubectl port-forward -n <namespace> svc/<service-name> 8080:80
# Port forward with specific address binding
kubectl port-forward --address 0.0.0.0 -n <namespace> svc/<service-name> 8080:80
Important Port Forwarding Notes:
- The command runs in the foreground and blocks the terminal; this is expected behavior
- Use a separate terminal for other commands
- "Handling connection" messages are normal and indicate traffic is flowing
- Idle connections may be reset; re-run the command if the forward drops (see the loop sketch below)
- For production traffic, use a proper ingress or load balancer instead of port forwarding
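Because the forward exits when the pod restarts or the connection drops, a simple restart loop is a common workaround for longer local sessions; a rough shell sketch, not something to rely on in production:
# Restart port-forward automatically if it exits
while true; do
  kubectl port-forward -n <namespace> svc/<service-name> 8080:80
  echo "port-forward exited, restarting in 2s..."
  sleep 2
done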