Retries
The retries mechanism allows to retry a failing operation according to a given policy (e.g. retry 3 times with a 100ms delay between attempts).
API
The basic syntax for retries is:
import ox.resilience.retry
retry(operation)(policy)
Operation definition
The operation
can be provided directly using a by-name parameter, i.e. f: => T
.
There’s also a retryEither
variant which accepts a by-name Either[E, T]
, i.e. f: => Either[E, T]
, as well as one
which accepts arbitrary error modes, accepting the computation in an F
context: f: => F[T]
.
Policies
A retry policy consists of three parts:
a
Schedule
, which indicates how many times and with what delay should we retry theoperation
after an initial failure,a
ResultPolicy
, which indicates whether:a non-erroneous outcome of the
operation
should be considered a success (if not, theoperation
would be retried),an erroneous outcome of the
operation
should be retried or fail fast.
a
onRetry
, which is a callback function that is invoked after each attempt to execute the operation. It is used to perform any necessary actions or checks after each attempt, regardless of whether the attempt was successful or not.
The available schedules are defined in the Schedule
object. Each schedule has a finite and an infinite variant.
Finite schedules
Finite schedules have a common maxRetries: Int
parameter, which determines how many times the operation
would be
retried after an initial failure. This means that the operation could be executed at most maxRetries + 1
times.
Infinite schedules
Each finite schedule has an infinite variant, whose settings are similar to those of the respective finite schedule, but
without the maxRetries
setting. Using the infinite variant can lead to a possibly infinite number of retries (unless
the operation
starts to succeed again at some point). The infinite schedules are created by calling .forever
on the
companion object of the respective finite schedule (see examples below).
Schedule types
The supported schedules (specifically - their finite variants) are:
Immediate(maxRetries: Int)
- retries up tomaxRetries
times without any delay between subsequent attempts.Delay(maxRetries: Int, delay: FiniteDuration)
- retries up tomaxRetries
times , sleeping fordelay
between subsequent attempts.Backoff(maxRetries: Int, initialDelay: FiniteDuration, maxDelay: FiniteDuration, jitter: Jitter)
- retries up tomaxRetries
times , sleeping forinitialDelay
before the first retry, increasing the sleep between subsequent attempts exponentially (with base2
) up to an optionalmaxDelay
(default: 1 minute).Optionally, a random factor (jitter) can be used when calculating the delay before the next attempt. The purpose of jitter is to avoid clustering of subsequent retries, i.e. to reduce the number of clients calling a service exactly at the same time, which can result in subsequent failures, contrary to what you would expect from retrying. By introducing randomness to the delays, the retries become more evenly distributed over time.
See the AWS Architecture Blog article on backoff and jitter for a more in-depth explanation.
The following jitter strategies are available (defined in the
Jitter
enum):None
- the default one, when no randomness is added, i.e. a pure exponential backoff is used,Full
- picks a random value between0
and the exponential backoff calculated for the current attempt,Equal
- similar toFull
, but prevents very short delays by always using a half of the original backoff and adding a random value between0
and the other half,Decorrelated
- uses the delay from the previous attempt (lastDelay
) and picks a random value between theinitalAttempt
and3 * lastDelay
.
Result policies
A result policy allows to customize how the results of the operation
are treated. It consists of two predicates:
isSuccess: T => Boolean
(default:true
) - determines whether a non-erroneous result of theoperation
should be considered a success. When it evaluates totrue
- no further attempts would be made, otherwise - we’d keep retrying.With finite schedules (i.e. those with
maxRetries
defined), ifisSuccess
keeps returningfalse
whenmaxRetries
are reached, the result is returned as-is, even though it’s considered “unsuccessful”,isWorthRetrying: E => Boolean
(default:true
) - determines whether another attempt would be made if theoperation
results in an errorE
. When it evaluates totrue
- we’d keep retrying, otherwise - we’d fail fast with the error.
The ResultPolicy[E, T]
is generic both over the error (E
) and result (T
) type. Note, however, that for the direct
variant retry
, the error type E
is fixed to Throwable
, while for the Either
and error-mode variants, E
can ba
an arbitrary type.
On retry
The callback function has the following signature:
(Int, Either[E, T]) => Unit
Where:
The first parameter, an
Int
, represents the attempt number of the retry operation.The second parameter is an
Either[E, T]
type, representing the result of the retry operation. Left represents an error and Right represents a successful result.
API shorthands
When you don’t need to customize the result policy (i.e. use the default one), you can use one of the following
shorthands to define a retry policy with a given schedule (note that the parameters are the same as when manually
creating the respective Schedule
):
RetryPolicy.immediate(maxRetries: Int)
,RetryPolicy.immediateForever
,RetryPolicy.delay(maxRetries: Int, delay: FiniteDuration)
,RetryPolicy.delayForever(delay: FiniteDuration)
,RetryPolicy.backoff(maxRetries: Int, initialDelay: FiniteDuration, maxDelay: FiniteDuration, jitter: Jitter)
,RetryPolicy.backoffForever(initialDelay: FiniteDuration, maxDelay: FiniteDuration, jitter: Jitter)
.
If you want to customize a part of the result policy, you can use the following shorthands:
ResultPolicy.default[E, T]
- uses the default settings,ResultPolicy.successfulWhen[E, T](isSuccess: T => Boolean)
- uses the defaultisWorthRetrying
and the providedisSuccess
,ResultPolicy.retryWhen[E, T](isWorthRetrying: E => Boolean)
- uses the defaultisSuccess
and the providedisWorthRetrying
,ResultPolicy.neverRetry[E, T]
- uses the defaultisSuccess
and fails fast on any error.
Examples
import ox.UnionMode
import ox.resilience.{retry, retryEither, retryWithErrorMode}
import ox.resilience.{Jitter, ResultPolicy, RetryPolicy, Schedule}
import scala.concurrent.duration.*
def directOperation: Int = ???
def eitherOperation: Either[String, Int] = ???
def unionOperation: String | Int = ???
// various operation definitions - same syntax
retry(RetryPolicy.immediate(3))(directOperation)
retryEither(RetryPolicy.immediate(3))(eitherOperation)
// various policies with custom schedules and default ResultPolicy
retry(RetryPolicy.delay(3, 100.millis))(directOperation)
retry(RetryPolicy.backoff(3, 100.millis))(directOperation) // defaults: maxDelay = 1.minute, jitter = Jitter.None
retry(RetryPolicy.backoff(3, 100.millis, 5.minutes, Jitter.Equal))(directOperation)
// infinite retries with a default ResultPolicy
retry(RetryPolicy.delayForever(100.millis))(directOperation)
retry(RetryPolicy.backoffForever(100.millis, 5.minutes, Jitter.Full))(directOperation)
// result policies
// custom success
retry[Int](RetryPolicy(Schedule.Immediate(3), ResultPolicy.successfulWhen(_ > 0)))(directOperation)
// fail fast on certain errors
retry(RetryPolicy(Schedule.Immediate(3), ResultPolicy.retryWhen(_.getMessage != "fatal error")))(directOperation)
retryEither(RetryPolicy(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(eitherOperation)
// custom error mode
retryWithErrorMode(UnionMode[String])(RetryPolicy(Schedule.Immediate(3), ResultPolicy.retryWhen(_ != "fatal error")))(unionOperation)
See the tests in ox.resilience.*
for more.