6 Best Practices for Effective Readiness and Liveness Probes


The biggest mistake Kubernetes admins make with health probes is configuring the probes the same way for all apps. A payment app does not need to meet the same standards as a log-collecting app, so why should their probes be configured the same way?

For this reason, we recently added to Datree a policy rule that verifies you are customizing the health probes’ parameters rather than using the default values. You can view the full list of policy rules here.

Yet, you might ask yourself what values you should give to those parameters. If this is the case, read on to learn 6 best practices for configuring your liveness and readiness probes.

1. Configure the probes based on how long it takes your app to load

If your app has a long startup sequence, give it some time before initiating the probe. Otherwise, Kubernetes might deem your application inoperable, even when this is not the case. You can do this by increasing the initialDelaySeconds parameter.
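As a minimal sketch, assuming an app that needs about 30 seconds to start, the delay could be set like this. The container name, image, path, port, and the 30-second figure are all placeholders for illustration, not recommendations:

```yaml
# Hypothetical Pod; names, image, endpoint, and port are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: slow-starter
spec:
  containers:
    - name: app
      image: example.com/slow-starter:1.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        # Wait 30 s after the container starts before the first probe,
        # so the startup sequence is not mistaken for a failure.
        initialDelaySeconds: 30
```

Measure your app's real startup time and pick a value with some margin on top of it.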


Reminder: Health probes’ parameters

The initialDelaySeconds parameter defines the number of seconds between starting the container and initiating the liveness or readiness probes. The default value is 0 seconds.

The timeoutSeconds parameter defines how many seconds pass until the probe times out. The default value is 1 second.

The periodSeconds parameter defines how often the probe should be performed, in seconds. The default value is 10 seconds.

The successThreshold parameter defines the minimum number of consecutive successes required for a probe to be considered successful after it has failed. The default value is 1.

The failureThreshold parameter defines the number of consecutive probe failures after which Kubernetes gives up on the check: for a liveness probe it restarts the container, and for a readiness probe it marks the pod as not ready. The default value is 3.
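For reference, here is a probe fragment with every parameter written out at its default value. The /healthz path and port 8080 are assumptions; the five numeric values are the defaults listed above:

```yaml
# Fragment of a container spec; endpoint and port are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0   # probe starts as soon as the container does
  timeoutSeconds: 1        # each attempt fails if no reply arrives within 1 s
  periodSeconds: 10        # one attempt every 10 s
  successThreshold: 1      # one success marks the probe as passing again
  failureThreshold: 3      # three consecutive failures trigger the restart
```

With these defaults, an unresponsive container is restarted after three failed attempts spaced 10 seconds apart, so on the order of 30 seconds after it stops responding.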

2. And also based on how long it takes your app to respond

Similar to #1, some applications take a while to respond. If this is the case with your application, you should give it some time to do so by increasing the timeoutSeconds parameter.
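A sketch of a more patient probe, assuming a health endpoint that can take a few seconds to answer under load. The endpoint, port, and the 5-second value are illustrative assumptions:

```yaml
# Fragment of a container spec; endpoint and port are placeholders.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  # Allow up to 5 s per attempt instead of the 1 s default.
  timeoutSeconds: 5
```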

3. Raise the bar for your critical apps

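Mission-critical apps, like a payment application, should be probed frequently. You can achieve this by decreasing the periodSeconds parameter. Additionally, you should increase the successThreshold and decrease the failureThreshold of the readiness probe to make absolutely sure no traffic goes to an unavailable pod. Note that Kubernetes requires successThreshold to be 1 for liveness probes, so this knob is only available on readiness probes.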
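A stricter readiness probe for a critical service might look like the sketch below. The endpoint, port, and the specific numbers are assumptions chosen only to show the direction of each change:

```yaml
# Fragment of a container spec; endpoint and port are placeholders.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5      # probe twice as often as the 10 s default
  successThreshold: 2   # require two consecutive successes before sending traffic again
  failureThreshold: 1   # a single failure removes the pod from service
```

The trade-off is sensitivity: with failureThreshold set to 1, a single slow response is enough to take the pod out of rotation.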

4. And lower it for your non-critical apps

The opposite is true as well. If your app is not critical, you should not probe it as frequently and should not be as strict with the success and failure definitions. Practically, this means increasing the periodSeconds, keeping the successThreshold at its minimum (and default) value of 1, and increasing the failureThreshold.
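For a non-critical workload such as a log collector, a relaxed probe could be sketched as follows. The endpoint, port, and values are illustrative assumptions:

```yaml
# Fragment of a container spec; endpoint and port are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 30     # probe every 30 s instead of every 10 s
  successThreshold: 1   # minimum allowed value
  failureThreshold: 5   # tolerate five consecutive failures before restarting
```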

5. Give your flaky apps some wiggle room

If you know your app has timeouts or networking issues, you should give it more time to actually respond before you decide to kill it. Do this by increasing the failureThreshold as well as the periodSeconds.
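As a rough sketch for an app with known intermittent timeouts, both parameters can be raised together. The endpoint, port, and numbers are assumptions:

```yaml
# Fragment of a container spec; endpoint and port are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 20     # space the attempts further apart
  failureThreshold: 6   # six consecutive failures are needed before a restart
```

With these values the app gets roughly two minutes (6 attempts, 20 seconds apart) to recover before Kubernetes restarts it.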

6. Check your entire application to determine its real state

This one is not really about the parameters, but it is important enough to be mentioned anyway:

Don’t set up a high-level HTTP check against an endpoint that simply returns a generic 200 response.

This won’t tell you the real state of your app. Instead, set the probe to check all the dependencies of the application. For example, if your application talks to your database and a cache, have the probe do the same.
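On the Kubernetes side, this only changes where the probe points. The sketch below assumes your application exposes a hypothetical /healthz/dependencies endpoint that queries the database and the cache and returns a non-2xx status if either is unreachable; that endpoint is not something Kubernetes provides, and you would have to implement it in your app:

```yaml
# Fragment of a container spec; the endpoint is hypothetical and must be
# implemented by the application itself.
readinessProbe:
  httpGet:
    path: /healthz/dependencies   # checks the database and cache, not just the web server
    port: 8080
  timeoutSeconds: 3               # dependency checks usually need more than the 1 s default
```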

