How to Start Troubleshooting CrashLoopBackoff Errors in Kubernetes Using a Blocking Command
Everyone who has worked with Kubernetes has seen that awful status before: CrashLoopBackoff (kubectl prints it as CrashLoopBackOff). A CrashLoopBackoff indicates that the process running in your container keeps failing shortly after it starts. Your container’s process could fail for a variety of reasons. Perhaps you are trying to run a server that fails to load a configuration file. Or maybe you are trying to deploy an application that fails because it cannot reach another service.
In an attempt to recover from CrashLoopBackoff errors, Kubernetes will keep restarting the container, waiting longer between attempts each time. Often, though, there is something fundamentally wrong with your process, and a simple restart will not work. Most of the time, you need to correct something in your image or in the application that you are trying to run.
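To make the "backoff" part concrete, here is a small sketch of the restart delays. It assumes the default schedule described in the Kubernetes documentation (a delay that starts around 10 seconds, doubles after each crash, and is capped at 300 seconds); the function name backoff_delays is made up for illustration, and real clusters may be tuned differently.

```shell
# Print the approximate wait before each of the first N restarts.
# Assumption: documented defaults of 10s initial delay, doubling, 300s cap.
backoff_delays() {
  count=$1
  delay=10       # initial back-off in seconds
  max_delay=300  # cap of five minutes
  i=1
  while [ "$i" -le "$count" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
    i=$((i + 1))
  done
}

backoff_delays 7
```

After a handful of crashes the pod spends most of its time waiting rather than running, which is part of why a crash-looping pod is so awkward to inspect directly.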
One quick way you can begin troubleshooting a CrashLoopBackoff error is to bypass it in a separate deployment using a blocking command. The new deployment will still use your image, but you’ll override the command with a blocking command such as sleep infinity. Doing this will allow the pod to run persistently and will enable you to access the pod’s terminal so you can troubleshoot.
Imagine you were trying to deploy a Wildfly instance but were getting a CrashLoopBackoff error. You could create a deployment similar to the following to create a persistently running Wildfly pod for troubleshooting purposes:
- image: docker.io/jboss/wildfly:20.0.0.Final
  command: ["sleep", "infinity"]
Notice the final line above. This is the blocking command that will allow your container to run persistently and bypass the CrashLoopBackoff. Once the pod is up and running, you can access the terminal using the kubectl exec command, as shown:
kubectl exec -it deploy/wildfly-test -n test-ns -- /bin/bash
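To put the two-line fragment in context, here is a minimal sketch of a complete manifest, written out from the shell. The names wildfly-test and test-ns come from the kubectl exec command above; the app label, the container name, and the file name wildfly-test.yaml are placeholders chosen for illustration, so adjust them to your own setup.

```shell
# Write a throwaway troubleshooting Deployment to a file.
# Assumptions: the label "app: wildfly-test" and the container name
# "wildfly" are illustrative and not taken from a real deployment.
cat > wildfly-test.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wildfly-test
  namespace: test-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wildfly-test
  template:
    metadata:
      labels:
        app: wildfly-test
    spec:
      containers:
        - name: wildfly
          image: docker.io/jboss/wildfly:20.0.0.Final
          # Override the image's entrypoint so the container just idles.
          command: ["sleep", "infinity"]
EOF
```

You could then apply it with kubectl apply -f wildfly-test.yaml, provided the test-ns namespace already exists. If you copy your real deployment instead of starting from this sketch, remove any liveness probe first: a probe aimed at the real server will fail against sleep and restart the container. This also assumes the image ships a sleep that accepts infinity, which may not hold for very minimal images.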
You’ve Accessed the Terminal. What Now?
Now it’s time to start troubleshooting your container. From here, the task becomes fairly open-ended, but here are some common issues that I have solved that might apply to your use case and help you understand where to start:
- Many errors I have experienced had to do with environment variables that were unset or incorrect. I often use the env command to inspect environment variables that my application or process expects and make sure that they are correct.
- Sometimes, an application may be unable to access other services. If I know that my application needs to reach another service or endpoint but suspect that the call is failing, I try to “curl” it manually from inside the pod. I usually use curl -v so that I get verbose output. When troubleshooting networking issues, I often get either a timeout or an X.509 certificate error, and that usually turns out to be the root cause.
- An application may fail to start because it is misconfigured or because a configuration file is missing. I troubleshoot this by inspecting the locations where I expect my application’s files to be, using tools like ls, find, cat, and less. Using “ls” and “find” helps confirm that a file exists. Using “cat” and “less” helps you inspect files and check that they are not misconfigured.
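The three checks above can be sketched as a few shell helpers to paste into the pod's terminal. This is only an illustration: the function names are made up, and it assumes the image provides printenv and curl, which minimal images may not.

```shell
# Report which of the given environment variables are unset or empty.
check_env() {
  missing=0
  for name in "$@"; do
    if [ -n "$(printenv "$name")" ]; then
      echo "ok: $name"
    else
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Report whether each expected file exists and is non-empty.
check_files() {
  rc=0
  for path in "$@"; do
    if [ -s "$path" ]; then
      echo "ok: $path"
    else
      echo "MISSING or empty: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Call an endpoint verbosely, giving up after 10 seconds so that a
# network timeout shows up quickly instead of hanging the terminal.
check_endpoint() {
  curl -v --max-time 10 "$1"
}
```

Usage might look like check_env DB_HOST DB_PASSWORD, check_files /opt/jboss/wildfly/standalone/configuration/standalone.xml, or check_endpoint https://my-service.test-ns.svc:8443/health. The variable names, the file path, and the URL here are all hypothetical stand-ins for whatever your application actually expects.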
Often when troubleshooting a CrashLoopBackoff error, the application logs are also revealing. Use this command to check the logs (from outside your pod’s terminal):
kubectl logs -f deploy/$APPLICATION -n $NAMESPACE
Watch out for any errors, warnings, or stack traces. Take note of these so that you can focus on these particular issues when you troubleshoot the CrashLoopBackoff error inside your pod’s terminal.
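With a crash-looping pod, the current container has often only just started, so the interesting output tends to be in the logs of the previous, crashed instance. Here is a hedged sketch of two helpers for that: previous_logs wraps kubectl logs with its --previous flag, and filter_problems keeps only lines that look like errors, warnings, or Java-style stack frames. Both function names and the filter pattern are illustrative choices, so tune the pattern to your application's log format.

```shell
# Keep only lines that look like problems: errors, warnings, exceptions,
# or indented "at ..." stack frames as printed by Java applications.
filter_problems() {
  grep -iE 'error|warn|exception|fatal|^[[:space:]]+at '
}

# Show the last 200 log lines of the previous (crashed) container instance.
# $1 is the deployment name, $2 is the namespace.
previous_logs() {
  kubectl logs "deploy/$1" -n "$2" --previous --tail=200
}
```

For example, previous_logs "$APPLICATION" "$NAMESPACE" | filter_problems would print just the suspicious lines from the last crash. The same filter also works on live output piped from kubectl logs -f.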
Thanks for Reading!
Try using this trick next time you encounter a CrashLoopBackoff error. Use a blocking command like “sleep infinity” to bypass the CrashLoopBackoff and gain entry to your pod. Once inside, you’ll be able to inspect your pod in greater detail to help determine the root cause of your CrashLoopBackoff issue.
Originally published at https://austindewey.com on June 30, 2020.