Besides retries, circuit breakers1 are probably one of the most commonly employed resilience patterns in distributed systems. While writing a retry routine is pretty simple, implementing a circuit breaker needs a little bit of work.
I realized that I usually just go for off-the-shelf libraries for circuit breaking and haven’t written one from scratch before. So, this is an attempt to create a sloppy one in Go. I picked Go instead of Python because I didn’t want to deal with sync-async idiosyncrasies or abstract things away under a soup of decorators.
Circuit breakers
A circuit breaker acts like an automatic switch that prevents your application from repeatedly trying to execute an operation that’s likely to fail. In a distributed system, you don’t want to bombard a remote service when it’s already failing, and circuit breakers prevent that.
It has three states: Closed, Open, and Half-Open. Here’s a diagram that shows the state transitions:
Closed: This is the healthy operating state where all requests are allowed to pass through to the service. If a certain number of consecutive requests fail (reaching a failure threshold), the circuit breaker switches to the Open state.
Open: In this state, all requests are immediately blocked, and an error is returned to the caller without attempting to contact the failing service. This prevents overwhelming the service and gives it time to recover. After a predefined recovery period, the circuit breaker transitions to the Half-Open state.
Half-Open: The circuit breaker allows a limited number of test requests to see if the service has recovered. If these requests succeed, it transitions back to the Closed state. If any of them fail, it goes back to the Open state.
Building one in Go
Here’s a simple circuit breaker in Go.
Defining states
First, we’ll define the constants for our states and create the circuitBreaker
struct,
which holds all the configurable knobs.
// The three possible states of a circuit breaker
const (
Closed = "closed"
Open = "open"
HalfOpen = "half-open"
)
// circuitBreaker manages the state and behavior of the circuit breaker
type circuitBreaker struct {
mu sync.Mutex // Guards the circuit breaker state
state string // Current state of the circuit breaker
failureCount int // Number of consecutive failures
lastFailureTime time.Time // Time of the last failure
halfOpenSuccessCount int // Successful requests in half-open state
failureThreshold int // Failures to trigger open state
recoveryTime time.Duration // Wait time before half-open
halfOpenMaxRequests int // Requests allowed in half-open state
timeout time.Duration // Timeout for requests
}
This struct includes:
mu
: A mutex to ensure thread-safe access to the circuit breaker.state
: The current state of the circuit breaker (Closed
,Open
, orHalfOpen
).failureCount
: The current count of consecutive failures.lastFailureTime
: The timestamp of the last failure.halfOpenSuccessCount
: The number of successful requests in theHalfOpen
state.failureThreshold
: The number of consecutive failures allowed before opening the circuit.recoveryTime
: The cool-down period before the circuit breaker transitions fromOpen
toHalfOpen
.halfOpenMaxRequests
: The maximum number of successful requests needed to close the circuit.timeout
: The maximum duration to wait for a request to complete.
Initializing the breaker
Next, we provide a constructor function to initialize a new circuitBreaker
instance.
// NewCircuitBreaker initializes a new CircuitBreaker
func NewCircuitBreaker(
failureThreshold int,
recoveryTime time.Duration,
halfOpenMaxRequests int,
timeout time.Duration,
) *circuitBreaker {
return &circuitBreaker{
state: Closed,
failureThreshold: failureThreshold,
recoveryTime: recoveryTime,
halfOpenMaxRequests: halfOpenMaxRequests,
timeout: timeout,
}
}
This function sets the initial state to Closed
and initializes the thresholds and timeout.
Implementing the Call method
The Call
method is the primary interface for executing functions through the circuit
breaker. It dispatches the appropriate state handler based on the current state.
// Call attempts to execute the provided function, managing state transitions
func (cb *circuitBreaker) Call(fn func() (any, error)) (any, error) {
cb.mu.Lock()
defer cb.mu.Unlock()
slog.Info("Making a request", "state", cb.state)
switch cb.state {
case Closed:
return cb.handleClosedState(fn)
case Open:
return cb.handleOpenState()
case HalfOpen:
return cb.handleHalfOpenState(fn)
default:
return nil, errors.New("unknown circuit state")
}
}
We use a mutex to protect against concurrent access since the circuit breaker might be used
by multiple goroutines. The Call
method uses a switch statement to delegate the function
call to the appropriate handler based on the current state.
Handling closed states
In the Closed
state, all requests are allowed to pass through. We monitor the requests for
failures to decide when to trip the circuit breaker.
// handleClosedState executes the function and monitors failures
func (cb *circuitBreaker) handleClosedState(fn func() (any, error)) (any, error) {
result, err := cb.runWithTimeout(fn)
if err != nil {
slog.Warn(
"Request failed in closed state",
"failureCount", cb.failureCount+1,
)
cb.failureCount++
cb.lastFailureTime = time.Now()
if cb.failureCount >= cb.failureThreshold {
cb.state = Open
slog.Error("Failure threshold reached, transitioning to open")
}
return nil, err
}
slog.Info("Request succeeded in closed state")
cb.resetCircuit()
return result, nil
}
In this function:
- We attempt to execute the provided function
fn
usingrunWithTimeout
to handle possible timeouts. - If the function call fails, we increment the
failureCount
and updatelastFailureTime
. - If the
failureCount
reaches thefailureThreshold
, we transition the circuit to theOpen
state. - If the function call succeeds, we reset the circuit breaker to the
Closed
state by callingresetCircuit
.
Resetting the breaker
When a request succeeds, we reset the failure count and keep the circuit in the Closed
state.
// resetCircuit resets the circuit breaker to closed state
func (cb *circuitBreaker) resetCircuit() {
cb.failureCount = 0
cb.state = Closed
slog.Info("Circuit reset to closed state")
}
Handling open states
In the Open
state, all requests are blocked to prevent further strain on the failing
service. We check if the recovery period has expired before transitioning to the HalfOpen
state.
// handleOpenState blocks requests if recovery time hasn't passed
func (cb *circuitBreaker) handleOpenState() (any, error) {
if time.Since(cb.lastFailureTime) > cb.recoveryTime {
cb.state = HalfOpen
cb.halfOpenSuccessCount = 0
cb.failureCount = 0
slog.Info("Recovery period over, transitioning to half-open")
return nil, nil
}
slog.Warn("Circuit is still open, blocking request")
return nil, errors.New("circuit open, request blocked")
}
Here:
- We check if the recovery period (
recoveryTime
) has passed since the last failure. - If it has, we transition to the
HalfOpen
state and reset the counters. - If not, we block the request and return an error immediately.
Handling half-open states
In the HalfOpen
state, we allow a limited number of requests to test if the service has
recovered.
// handleHalfOpenState executes the function and checks for recovery
func (cb *circuitBreaker) handleHalfOpenState(
fn func() (any, error)) (any, error) {
result, err := cb.runWithTimeout(fn)
if err != nil {
slog.Error("Failed in half-open state, transitioning to open")
cb.state = Open
cb.lastFailureTime = time.Now()
return nil, err
}
cb.halfOpenSuccessCount++
slog.Info("Succeeded in half-open",
"successCount", cb.halfOpenSuccessCount)
if cb.halfOpenSuccessCount >= cb.halfOpenMaxRequests {
slog.Info("Max success, transitioning to closed")
cb.resetCircuit()
}
return result, nil
}
In this function:
- We attempt to execute the provided function
fn
. - If the function call fails, we transition back to the
Open
state. - If the function call succeeds, we increment
halfOpenSuccessCount
. - Once the success count reaches
halfOpenMaxRequests
, we reset the circuit breaker to theClosed
state.
Running functions with timeout
To prevent the circuit breaker from hanging on slow or unresponsive functions, we implement
a timeout mechanism. You probably noticed that inside each state handler we called the
wrapped functions with runWithTimeout
.
// runWithTimeout executes the provided function with a timeout
func (cb *circuitBreaker) runWithTimeout(fn func() (any, error)) (any, error) {
ctx, cancel := context.WithTimeout(context.Background(), cb.timeout)
defer cancel()
resultChan := make(chan struct {
result any
err error
}, 1)
go func() {
result, err := fn()
resultChan <- struct {
result any
err error
}{result, err}
}()
select {
case <-ctx.Done():
return nil, errors.New("request timed out")
case res := <-resultChan:
return res.result, res.err
}
}
This function:
- Creates a context with a timeout using
context.WithTimeout
. - Executes the provided function
fn
in a separate goroutine. - Waits for either the result or the timeout.
- Returns an error if the function takes longer than the specified timeout.
Taking it for a spin
Let’s test our circuit breaker with an unreliable service that sometimes fails.
func unreliableService() (any, error) {
if time.Now().Unix()%2 == 0 {
return nil, errors.New("service failed")
}
return "Success!", nil
}
In the main
function, we’ll create a circuit breaker and make several calls to the
unreliable service.
func main() {
cb := cb.NewCircuitBreaker(
2, // Failure threshold
2*time.Second, // Recovery time
2, // Half-open max requests
2*time.Second, // Half-open max time
)
for i := 0; i < 5; i++ {
result, err := cb.Call(unreliableService)
if err != nil {
slog.Error("Service request failed", "error", err)
} else {
slog.Info("Service request succeeded", "result", result)
}
time.Sleep(1 * time.Second)
log.Println("-------------------------------------------")
}
}
This loop simulates multiple service calls, using the circuit breaker to handle failures and transitions between states.
This prints:
2024/10/06 17:24:27 INFO Making a request state=closed
2024/10/06 17:24:27 INFO Request succeeded in closed state
2024/10/06 17:24:27 INFO Circuit reset to closed state
2024/10/06 17:24:27 INFO Service request succeeded result=42
2024/10/06 17:24:28 -----------------------------------------------
2024/10/06 17:24:28 INFO Making a request state=closed
2024/10/06 17:24:28 WARN Request failed in closed state failureCount=1
2024/10/06 17:24:28 ERROR Service request failed error="service failed"
2024/10/06 17:24:29 -----------------------------------------------
2024/10/06 17:24:29 INFO Making a request state=closed
2024/10/06 17:24:29 INFO Request succeeded in closed state
2024/10/06 17:24:29 INFO Circuit reset to closed state
2024/10/06 17:24:29 INFO Service request succeeded result=42
2024/10/06 17:24:30 -----------------------------------------------
2024/10/06 17:24:30 INFO Making a request state=closed
2024/10/06 17:24:30 WARN Request failed in closed state failureCount=1
2024/10/06 17:24:30 ERROR Service request failed error="service failed"
2024/10/06 17:24:31 -----------------------------------------------
2024/10/06 17:24:31 INFO Making a request state=closed
2024/10/06 17:24:31 INFO Request succeeded in closed state
2024/10/06 17:24:31 INFO Circuit reset to closed state
2024/10/06 17:24:31 INFO Service request succeeded result=42
2024/10/06 17:24:32 -----------------------------------------------
The log messages will give you a sense of what’s happening when we retry an intermittently failing function wrapped in a circuit breaker.
The API could be better
One limitation of Go generics is that you can’t use type parameters with methods that have a
receiver. This means you can’t define a method like
func (cb *CircuitBreaker[T]) Call(fn func() (T, error)) (T, error)
.
For this, we have to use workarounds such as using any
(an alias for interface{}
) as the
return type in our function signatures. While this sacrifices some type safety, it allows us
to create a flexible circuit breaker that can handle functions returning different types.
Handling incompatible function signatures
What if the function you want to wrap doesn’t match the func() (any, error)
signature? You
can easily adapt it by wrapping your function to fit the required signature.
Suppose you have a function like this:
func fetchData(id int) (Data, error) {
// ... implementation ...
}
You can wrap it like this:
wrappedFunc := func() (any, error) {
return fetchData(42) // Replace 42 with your desired argument
}
Now, wrappedFunc
matches the func() (any, error)
signature and can be used with our
circuit breaker.
Here’s the complete implementation2 with tests.
Recent posts
- SSH saga
- Injecting Pytest fixtures without cluttering test signatures
- Explicit method overriding with @typing.override
- Quicker startup with module-level __getattr__
- Docker mount revisited
- Topological sort
- Discovering direnv
- Notes on building event-driven systems
- Bash namerefs for dynamic variable referencing
- Behind the blog