Istio’s default retry behavior
This article discusses Istio’s default retry behavior and how to check if it is happening.
Using Istio’s traffic rules one can easily control service-level properties like circuit breakers, timeouts, and retries to make the services more resilient or to setup A/B testing.
But there are certain default behaviors that may not suit your application behavior. One of them being the default retry behavior.
As per Istio’s documentation:
A retry setting specifies the maximum number of times an Envoy proxy attempts to connect to a service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don’t fail permanently because of transient problems such as a temporarily overloaded service or network. The interval between retries (25ms+) is variable and determined automatically by Istio, preventing the called service from being overwhelmed with requests. The default retry behavior for HTTP requests is to retry twice before returning the error.
So, if you have a service where retrying requests may be expensive (ex. db transactions) this default behavior will cause unexpected application behavior. You would be left wondering why are retries happening!!
I could not find any Istio documentation to check it and hence this article.
TLDR;
- Enable debug logs for istio-proxy:
istioctl pc log --level debug deploy/$DEPLOYMENT_NAME -n $NAMESPACE
2. Check the logs for retry behavior
kubectl logs deploy/$DEPLOYMENT_NAME -c istio-proxy -n $NAMESPACE | grep "x-envoy-attempt-count"
Prerequisites
- Set up Istio by following the instructions in the Installation guide.
- Deploy the Bookinfo sample application
- All the below commands assume that you have kubeconfig configured to the namespace in which bookinfo is installed
Set up Virtual Service and Destination Rules
Apply the following definitions in the namespace where bookinfo is installed
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: ratings
spec:
host: ratings
subsets:
- name: v1
labels:
version: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: ratings
spec:
hosts:
- ratings
http:
- route:
- destination:
host: ratings
subset: v1
We are not setting any retry behavior in VirtualService. So, this should cause Istio’s default behavior to kick in.
Enable debug logs for istio-proxy
istioctl pc log --level debug deploy/ratings-v1
Simulate App failure
Patch the ratings deployment to add a sleep
kubectl patch deploy ratings-v1 --patch '{"spec":{"template":{"spec":{"containers":[{"name":"ratings","command":["sleep","1h"]}]}}}}'
Send some traffic
Use a curl pod to send some traffic to ratings service
kubectl run -it curl --image=curlimages/curl:7.73.0 --rm -- sh
curl -v http://ratings:9080/ratings/1
Check istio-proxy logs
kubectl logs deploy/ratings -c istio-proxy | grep "x-envoy-attempt-count"
You should see such logs showing retry was attempted:
'x-envoy-attempt-count', '1'
'x-envoy-attempt-count', '1'
'x-envoy-attempt-count', '2'
'x-envoy-attempt-count', '2'
'x-envoy-attempt-count', '3'
'x-envoy-attempt-count', '3'
You can get more details to see routing
kubectl logs deploy/ratings-v1 -c istio-proxy | grep -B 10 "x-envoy-attempt-count"
2022-11-23T17:44:25.179776Z debug envoy router [C49][S10562819190590727274] cluster 'inbound|9080||' match for URL '/ratings/1'
2022-11-23T17:44:25.179821Z debug envoy router [C49][S10562819190590727274] router decoding headers:
':authority', 'ratings:9080'
':path', '/ratings/1'
':method', 'GET'
':scheme', 'http'
'user-agent', 'curl/7.73.0-DEV'
'accept', '*/*'
'x-forwarded-proto', 'http'
'x-request-id', '04dc377f-5052-92f8-9aa3-8a52069172df'
'x-envoy-attempt-count', '3'