Debug issues in Kubernetes
Debugging in Kubernetes can be difficult, and it always begins with diagnosing the problem. The information you need to diagnose a failing Deployment or Pod isn't always easy to find.
If a YAML file applies without error then your app should work, right? Not necessarily: the YAML could be valid, but the configuration, the image name, or any of dozens of other things could be wrong.
Let's look at some common errors and techniques to diagnose issues with your Pods.
Cluster status checks
First, if your problems aren't isolated to a single Pod or Deployment, check the status of your cluster's control-plane components, such as the API server, scheduler, controller manager, and etcd.
An issue with any of those components can cause problems across your entire cluster. Cluster components can fail for many reasons, from a failed VM to network issues between nodes to corrupt etcd data. The Kubernetes documentation does a good job of covering these issues and possible mitigations.
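As a minimal sketch of such checks, assuming kubectl is already configured for the cluster in question, the hypothetical helper below looks at node readiness, the system Pods, and the API server's readiness endpoint. Older guides often use kubectl get componentstatuses for this, but that API has been deprecated since Kubernetes v1.19.

```shell
# Sketch only: cluster_health is a hypothetical helper name, not a kubectl feature.
cluster_health() {
  # A NotReady node often explains problems that span many Pods.
  kubectl get nodes
  # Control-plane add-ons and other system Pods usually run in kube-system.
  kubectl get pods --namespace kube-system
  # Ask the API server for its readiness, broken down per check.
  kubectl get --raw='/readyz?verbose'
}
```

Calling cluster_health prints the three reports one after another; anything other than Ready nodes, Running system Pods, and passing readiness checks is worth a closer look.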
Basic Deployment and Pod status checks
Check the status of your Deployments:
The test Deployment seems to have a problem, as its Pod is not ready. Check the Pod's status next:
The Pod has an ImagePullBackOff status. This particular error means Kubernetes could not retrieve the container image for some reason: the name was misspelled, the specified tag doesn't exist, or the repository is private and Kubernetes doesn't have access.
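The two status checks described here can be sketched as follows. Both helper names are hypothetical, and the filter assumes the default kubectl get pods column layout, where STATUS is the third column.

```shell
# List Deployments, then the Pods behind them, in the current namespace.
check_workloads() {
  kubectl get deployments
  kubectl get pods
}

# Print only the Pods whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
  kubectl get pods --no-headers | awk '$3 != "Running" && $3 != "Completed"'
}
```

On a busy namespace, unhealthy_pods narrows the list down to the Pods that need attention, such as one stuck in ImagePullBackOff.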
We can see more detail by getting a description of the Pod:
Here we see the ImagePullBackOff again and, looking at the image name, the obvious reason why it's failing.
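A minimal sketch of that step, with the Pod name passed as an argument. The helper names are hypothetical; the second one is an optional shortcut that prints only the image each container is configured to pull, which makes a misspelled name or tag easy to spot.

```shell
# Full description of one Pod; the Events section at the end is usually the most useful part.
describe_pod() {
  kubectl describe pod "$1"
}

# Print just the image reference(s) from the Pod spec.
pod_images() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}
```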
Another very common error you will see when a Pod won't run is CrashLoopBackOff. Kubernetes expects a Pod to start and keep running. This is by design: if the app running in a Pod crashes or can't start for any reason, Kubernetes picks up on the exit error and restarts the container (unless different behavior is specified with the restartPolicy on the Pod spec). If this happens repeatedly, Kubernetes assumes there is a problem with the Pod, waits progressively longer between restart attempts, and reports the CrashLoopBackOff status.
Start a Pod with this command:
Wait a moment, then check the status:
What happened? Describe the Pod to get its startup events:
This outputs a lot of information, but the events we need are at the bottom. Kubernetes pulled the image, started the container, then backed off after restarting the container multiple times. But why?
Kubernetes keeps the logs from the container's runtime environment. View them with kubectl logs <pod_name>:
Here we can see the error preventing MySQL from starting: it's expecting a password environment variable to be set upon initial startup.
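The MySQL walkthrough above can be reproduced with something like the following sketch. The Pod name mysql and the use of the public mysql image with kubectl run are assumptions, and inspect_pod is a hypothetical helper that simply chains the status, describe, and logs steps.

```shell
# Start a bare MySQL Pod with no configuration (Pod name "mysql" is an assumption).
start_mysql() {
  kubectl run mysql --image=mysql
}

# Walk through the diagnosis for one Pod: status, events, then container logs.
inspect_pod() {
  kubectl get pod "$1"
  kubectl describe pod "$1"
  kubectl logs "$1"
  # Logs from the previous container instance; this fails if the container has not restarted yet.
  kubectl logs "$1" --previous
}
```

The --previous flag is useful with CrashLoopBackOff in particular, since the container that produced the error has usually already been replaced by the time you look.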
For log streaming, kail is a handy tool for viewing logs in real time. After installing it, you can run kail -p <pod_name> to start a stream of that Pod's logs.
Delete the MySQL Pod, start kail in a new terminal window, and then rerun MySQL, this time setting the root password environment variable:
You should see kail stream the MySQL logs as it starts up (successfully this time).
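A sketch of that sequence, assuming the Pod is named mysql. MYSQL_ROOT_PASSWORD is the variable the official mysql image reads for the root password; the function name and the password passed to it are placeholders.

```shell
# In a second terminal, start the log stream first:  kail -p mysql

# Remove the crashing Pod, then start it again with the root password set.
run_mysql_with_password() {
  kubectl delete pod mysql
  kubectl run mysql --image=mysql --env="MYSQL_ROOT_PASSWORD=$1"
}
```

For example, run_mysql_with_password 'choose-a-password' recreates the Pod. A password passed on the command line ends up in shell history, so this is only suitable for throwaway test Pods.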
If a Pod has been running for a while and has accumulated a giant log, or you want to see the logs only from the time the Pod starts, you can restart a Deployment with kubectl rollout restart deployment <deployment_name>. With the default rolling update strategy, this starts new Pods before shutting down old ones, allowing a restart without interrupting your service.
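As a rough sketch, the restart can be paired with kubectl rollout status to wait for the new Pods. When the goal is only to cut down a huge log, the --since and --tail flags of kubectl logs are a lighter alternative that needs no restart. Both helper names and the 10m and 100 values are illustrative assumptions.

```shell
# Restart a Deployment and wait until the rollout finishes (or fails).
restart_deployment() {
  kubectl rollout restart deployment "$1"
  kubectl rollout status deployment "$1"
}

# Show only recent log lines from a Pod instead of its whole history.
recent_logs() {
  kubectl logs "$1" --since=10m --tail=100
}
```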
Exec into Pods
Sometimes a Pod will start OK but not behave as expected. If the logs aren't helpful, you can connect to a shell in the Pod's container by running kubectl exec -it <pod_name> -- /bin/bash. This should give you a terminal in the container as whatever user it runs as. From here you can curl other Pods by name, confirm that your ConfigMaps mounted correctly, or do any other diagnosis relevant to your app.
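For quick checks, an interactive shell is not always necessary. The hypothetical helper below runs a single command through kubectl exec; the Pod name and the /etc/config mount path in the examples are placeholders, not values from this walkthrough.

```shell
# Run one command inside a Pod's container without opening an interactive shell.
pod_run() {
  pod_name="$1"
  shift
  kubectl exec "$pod_name" -- "$@"
}

# Examples (Pod name and paths are placeholders):
#   pod_run my-pod env                    # environment variables the app sees
#   pod_run my-pod ls /etc/config         # files from a mounted ConfigMap
#   pod_run my-pod cat /etc/resolv.conf   # DNS settings used for name resolution
```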
Depending on the base image the container was built on, you might have to use a different shell such as /bin/ash. Also, given the lightweight nature of many Docker base images, you might find that many commands like vim need to be installed after an
apt-get update or