Phase 8.1: Implement Polly Circuit Breaker Pattern #108

@artcava

📋 Task Description

Implement circuit breaker pattern using Polly to prevent cascading failures when external services are unavailable. Configure break duration, failure thresholds, and automatic recovery testing to protect the system from prolonged outages.

🎯 Objectives

  • Implement circuit breaker policies for HTTP clients
  • Implement circuit breaker policies for database operations
  • Implement circuit breaker policies for message broker
  • Configure failure thresholds and break duration
  • Implement half-open state for recovery testing
  • Add circuit state change notifications
  • Integrate with retry policies (wrap pattern)
  • Add comprehensive logging for circuit state changes
  • Expose circuit breaker metrics
  • Write unit tests for circuit breaker behavior
  • Write integration tests with failure simulation
  • Document circuit breaker configuration and monitoring

📦 Deliverables

1. Create Circuit Breaker Configuration

Create src/StarGate.Infrastructure/Resilience/CircuitBreakerConfiguration.cs:

namespace StarGate.Infrastructure.Resilience;

/// <summary>
/// Configuration for circuit breaker policies.
/// </summary>
public class CircuitBreakerConfiguration
{
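    /// <summary>
    /// Number of consecutive failures before breaking the circuit.
    /// Only relevant to a simple (count-based) breaker; the advanced
    /// (rate-based) breakers in CircuitBreakerFactory do not read it.
    /// </summary>
    public int FailureThreshold { get; set; } = 5;

    /// <summary>
    /// Fraction of failed calls (greater than 0.0, at most 1.0) within the
    /// sampling duration at or above which the circuit breaks.
    /// </summary>
    public double FailureRateThreshold { get; set; } = 0.5; // 50%

    /// <summary>
    /// Minimum number of calls within the sampling duration before the
    /// failure rate is evaluated.
    /// </summary>
    public int MinimumThroughput { get; set; } = 10;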

    /// <summary>
    /// Duration to keep circuit open before testing recovery (seconds).
    /// </summary>
    public double BreakDurationSeconds { get; set; } = 30.0;

    /// <summary>
    /// Duration to sample for failure rate calculation (seconds).
    /// </summary>
    public double SamplingDurationSeconds { get; set; } = 60.0;

    /// <summary>
    /// Gets the break duration as TimeSpan.
    /// </summary>
    public TimeSpan BreakDuration => TimeSpan.FromSeconds(BreakDurationSeconds);

    /// <summary>
    /// Gets the sampling duration as TimeSpan.
    /// </summary>
    public TimeSpan SamplingDuration => TimeSpan.FromSeconds(SamplingDurationSeconds);
}
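
Out-of-range values bound from appsettings.json would only surface when the policy is first built, so it may help to check them at startup. The following is a minimal sketch, assuming the argument constraints Polly v7 enforces for AdvancedCircuitBreakerAsync (failure rate in (0, 1], minimum throughput of at least 2, non-negative break duration, sampling duration of at least 20 ms). `CircuitBreakerConfigurationValidator` is a hypothetical helper, not part of the deliverables, and takes plain values so that it stands alone.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: checks only the settings the advanced breaker consumes.
public static class CircuitBreakerConfigurationValidator
{
    // Returns one message per violated constraint; an empty list means valid.
    public static IReadOnlyList<string> Validate(
        double failureRateThreshold,
        int minimumThroughput,
        double breakDurationSeconds,
        double samplingDurationSeconds)
    {
        var errors = new List<string>();

        if (failureRateThreshold <= 0.0 || failureRateThreshold > 1.0)
            errors.Add("FailureRateThreshold must be greater than 0 and at most 1.");

        if (minimumThroughput < 2)
            errors.Add("MinimumThroughput must be 2 or greater.");

        if (breakDurationSeconds < 0.0)
            errors.Add("BreakDurationSeconds must not be negative.");

        // Assumed timer resolution of the advanced breaker: 20 milliseconds.
        if (samplingDurationSeconds < 0.02)
            errors.Add("SamplingDurationSeconds must be at least 0.02.");

        return errors;
    }
}
```

With the defaults above (0.5, 10, 30.0, 60.0) the list is empty; a value such as `FailureRateThreshold = 50` (a percentage instead of a fraction) is reported.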

2. Create Circuit Breaker Factory

Create src/StarGate.Infrastructure/Resilience/CircuitBreakerFactory.cs:

namespace StarGate.Infrastructure.Resilience;

using Microsoft.Extensions.Logging;
using Polly;
using Polly.CircuitBreaker;

/// <summary>
/// Factory for creating Polly circuit breaker policies.
/// </summary>
public static class CircuitBreakerFactory
{
    /// <summary>
    /// Creates a circuit breaker policy for HTTP operations.
    /// </summary>
    public static AsyncCircuitBreakerPolicy<HttpResponseMessage> CreateHttpCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger)
    {
        return Policy
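            // Count only server-side failures (5xx) and 408; other 4xx responses are the caller's error.
            .HandleResult<HttpResponseMessage>(r =>
                (int)r.StatusCode >= 500 || r.StatusCode == System.Net.HttpStatusCode.RequestTimeout)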
            .Or<HttpRequestException>()
            .Or<TimeoutException>()
            .AdvancedCircuitBreakerAsync(
                failureThreshold: config.FailureRateThreshold,
                samplingDuration: config.SamplingDuration,
                minimumThroughput: config.MinimumThroughput,
                durationOfBreak: config.BreakDuration,
                onBreak: (outcome, breakDuration, context) =>
                {
                    var statusCode = outcome.Result?.StatusCode.ToString() ?? "N/A";
                    var exception = outcome.Exception?.GetType().Name ?? "None";

                    logger.LogError(
                        "HTTP circuit breaker opened: StatusCode={StatusCode}, Exception={Exception}, BreakDuration={BreakDuration}s",
                        statusCode,
                        exception,
                        breakDuration.TotalSeconds);
                },
                onReset: context =>
                {
                    logger.LogInformation("HTTP circuit breaker reset: Circuit closed");
                },
                onHalfOpen: () =>
                {
                    logger.LogWarning("HTTP circuit breaker half-open: Testing recovery");
                });
    }

    /// <summary>
    /// Creates a circuit breaker policy for database operations.
    /// </summary>
    public static AsyncCircuitBreakerPolicy CreateDatabaseCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger)
    {
        return Policy
            .Handle<TimeoutException>()
            .Or<IOException>()
            .Or<InvalidOperationException>(ex => ex.Message.Contains("connection"))
            .AdvancedCircuitBreakerAsync(
                failureThreshold: config.FailureRateThreshold,
                samplingDuration: config.SamplingDuration,
                minimumThroughput: config.MinimumThroughput,
                durationOfBreak: config.BreakDuration,
                onBreak: (exception, breakDuration, context) =>
                {
                    logger.LogError(
                        exception,
                        "Database circuit breaker opened: Exception={Exception}, BreakDuration={BreakDuration}s",
                        exception.GetType().Name,
                        breakDuration.TotalSeconds);
                },
                onReset: context =>
                {
                    logger.LogInformation("Database circuit breaker reset: Circuit closed");
                },
                onHalfOpen: () =>
                {
                    logger.LogWarning("Database circuit breaker half-open: Testing recovery");
                });
    }

    /// <summary>
    /// Creates a circuit breaker policy for message broker operations.
    /// </summary>
    public static AsyncCircuitBreakerPolicy CreateBrokerCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger)
    {
        return Policy
            .Handle<IOException>()
            .Or<TimeoutException>()
            .Or<InvalidOperationException>(ex => ex.Message.Contains("connection"))
            .AdvancedCircuitBreakerAsync(
                failureThreshold: config.FailureRateThreshold,
                samplingDuration: config.SamplingDuration,
                minimumThroughput: config.MinimumThroughput,
                durationOfBreak: config.BreakDuration,
                onBreak: (exception, breakDuration, context) =>
                {
                    logger.LogError(
                        exception,
                        "Broker circuit breaker opened: Exception={Exception}, BreakDuration={BreakDuration}s",
                        exception.GetType().Name,
                        breakDuration.TotalSeconds);
                },
                onReset: context =>
                {
                    logger.LogInformation("Broker circuit breaker reset: Circuit closed");
                },
                onHalfOpen: () =>
                {
                    logger.LogWarning("Broker circuit breaker half-open: Testing recovery");
                });
    }
}

3. Create Resilience Policy Wrapper

Create src/StarGate.Infrastructure/Resilience/ResiliencePolicyWrapper.cs:

namespace StarGate.Infrastructure.Resilience;

using Microsoft.Extensions.Logging;
using Polly;
using Polly.Wrap;

/// <summary>
/// Wraps retry and circuit breaker policies together.
/// </summary>
public static class ResiliencePolicyWrapper
{
    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for HTTP.
    /// </summary>
    public static AsyncPolicyWrap<HttpResponseMessage> CreateHttpResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateHttpRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateHttpCircuitBreaker(circuitConfig, logger);

        // Wrap: Circuit Breaker (outer) -> Retry (inner)
        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }

    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for database.
    /// </summary>
    public static AsyncPolicyWrap CreateDatabaseResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateDatabaseRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(circuitConfig, logger);

        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }

    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for broker.
    /// </summary>
    public static AsyncPolicyWrap CreateBrokerResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateBrokerRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateBrokerCircuitBreaker(circuitConfig, logger);

        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }
}

4. Update Resilience Extensions

Update src/StarGate.Infrastructure/Extensions/ResilienceServiceCollectionExtensions.cs:

public static IServiceCollection AddResiliencePolicies(
    this IServiceCollection services,
    IConfiguration configuration)
{
    // Register configurations
    services.Configure<RetryPolicyConfiguration>(
        configuration.GetSection("Resilience:Retry"));
    services.Configure<CircuitBreakerConfiguration>(
        configuration.GetSection("Resilience:CircuitBreaker"));

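    // Register wrapped resilience policies.
    // ResiliencePolicyWrapper is a static class and cannot be used as the type
    // argument of ILogger<T>, so named loggers are created instead.
    // Caveat: both registrations share the service type AsyncPolicyWrap, so a
    // constructor that asks for a single AsyncPolicyWrap receives the last one
    // registered (the broker policy). Consumers that need a specific policy must
    // disambiguate, for example via keyed services or a dedicated wrapper type.
    services.AddSingleton<AsyncPolicyWrap>(provider =>
    {
        var retryConfig = provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value;
        var circuitConfig = provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value;
        var logger = provider.GetRequiredService<ILoggerFactory>()
            .CreateLogger("StarGate.Infrastructure.Resilience.Database");
        return ResiliencePolicyWrapper.CreateDatabaseResiliencePolicy(retryConfig, circuitConfig, logger);
    });

    services.AddSingleton<AsyncPolicyWrap>(provider =>
    {
        var retryConfig = provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value;
        var circuitConfig = provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value;
        var logger = provider.GetRequiredService<ILoggerFactory>()
            .CreateLogger("StarGate.Infrastructure.Resilience.Broker");
        return ResiliencePolicyWrapper.CreateBrokerResiliencePolicy(retryConfig, circuitConfig, logger);
    });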

    return services;
}

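public static IHttpClientBuilder AddHttpClientWithResilience<TClient>(
    this IServiceCollection services,
    string name)
    where TClient : class
{
    // Build the policy once per client registration. The policy selector runs
    // for every request, and a circuit breaker only works if all requests share
    // the same policy instance (and therefore the same circuit state).
    IAsyncPolicy<HttpResponseMessage>? policy = null;

    return services
        .AddHttpClient<TClient>(name)
        .AddPolicyHandler((provider, request) =>
            LazyInitializer.EnsureInitialized(ref policy, () =>
            {
                var retryConfig = provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value;
                var circuitConfig = provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value;
                var logger = provider.GetRequiredService<ILogger<TClient>>();
                return ResiliencePolicyWrapper.CreateHttpResiliencePolicy(retryConfig, circuitConfig, logger);
            }));
}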

5. Update Configuration

Update src/StarGate.Server/appsettings.json:

{
  "Resilience": {
    "Retry": {
      "MaxRetryAttempts": 3,
      "InitialDelaySeconds": 1.0,
      "MaxDelaySeconds": 30.0,
      "BackoffMultiplier": 2.0,
      "UseJitter": true
    },
    "CircuitBreaker": {
      "FailureThreshold": 5,
      "FailureRateThreshold": 0.5,
      "MinimumThroughput": 10,
      "BreakDurationSeconds": 30.0,
      "SamplingDurationSeconds": 60.0
    }
  }
}

6. Create Circuit Breaker State Service

Create src/StarGate.Infrastructure/Resilience/CircuitBreakerStateService.cs:

namespace StarGate.Infrastructure.Resilience;

using Polly.CircuitBreaker;
using System.Collections.Concurrent;

/// <summary>
/// Service for tracking circuit breaker states.
/// </summary>
public class CircuitBreakerStateService
{
    private readonly ConcurrentDictionary<string, CircuitState> _states = new();

    /// <summary>
    /// Records circuit state change.
    /// </summary>
    public void RecordStateChange(string circuitName, CircuitState state)
    {
        _states.AddOrUpdate(circuitName, state, (_, __) => state);
    }

    /// <summary>
    /// Gets current state of a circuit.
    /// </summary>
    public CircuitState? GetState(string circuitName)
    {
        return _states.TryGetValue(circuitName, out var state) ? state : null;
    }

    /// <summary>
    /// Gets all circuit states.
    /// </summary>
    public Dictionary<string, CircuitState> GetAllStates()
    {
        return new Dictionary<string, CircuitState>(_states);
    }

    /// <summary>
    /// Checks if any circuit is open.
    /// </summary>
    public bool HasOpenCircuit()
    {
        return _states.Values.Any(state => state == CircuitState.Open);
    }
}
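
The state service only reflects reality if something calls `RecordStateChange`, and the factory callbacks in section 2 only log. One possible way to connect the two is sketched here, assuming Polly v7's `AdvancedCircuitBreakerAsync` overload with `onBreak`, `onReset`, and `onHalfOpen` callbacks. `TrackedCircuitBreaker` and its `record` parameter are hypothetical; in StarGate, `record` would typically be `CircuitBreakerStateService.RecordStateChange`.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical helper: builds a breaker whose state changes are reported
// through a callback, so it has no dependency on a concrete state service.
public static class TrackedCircuitBreaker
{
    public static AsyncCircuitBreakerPolicy Create(
        string circuitName,
        Action<string, CircuitState> record,
        double failureRateThreshold,
        TimeSpan samplingDuration,
        int minimumThroughput,
        TimeSpan breakDuration)
    {
        // Seed the initial state so the circuit is listed before any failure occurs.
        record(circuitName, CircuitState.Closed);

        return Policy
            .Handle<TimeoutException>()
            .AdvancedCircuitBreakerAsync(
                failureRateThreshold,
                samplingDuration,
                minimumThroughput,
                breakDuration,
                onBreak: (exception, duration) => record(circuitName, CircuitState.Open),
                onReset: () => record(circuitName, CircuitState.Closed),
                onHalfOpen: () => record(circuitName, CircuitState.HalfOpen));
    }
}
```

Seeding the Closed state at creation is a design choice: without it, the health check in the next step cannot distinguish "no breakers exist" from "no breaker has changed state yet".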

7. Create Health Check for Circuit Breakers

Create src/StarGate.Server/HealthChecks/CircuitBreakerHealthCheck.cs:

namespace StarGate.Server.HealthChecks;

using Microsoft.Extensions.Diagnostics.HealthChecks;
using Polly.CircuitBreaker;
using StarGate.Infrastructure.Resilience;

/// <summary>
/// Health check that monitors circuit breaker states.
/// </summary>
public class CircuitBreakerHealthCheck : IHealthCheck
{
    private readonly CircuitBreakerStateService _stateService;

    public CircuitBreakerHealthCheck(CircuitBreakerStateService stateService)
    {
        _stateService = stateService ?? throw new ArgumentNullException(nameof(stateService));
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var states = _stateService.GetAllStates();

        if (states.Count == 0)
        {
            return Task.FromResult(
                HealthCheckResult.Healthy(
                    "No circuit breakers configured"));
        }

        var openCircuits = states.Where(kvp => kvp.Value == CircuitState.Open).ToList();
        var halfOpenCircuits = states.Where(kvp => kvp.Value == CircuitState.HalfOpen).ToList();

        var data = new Dictionary<string, object>();
        foreach (var (name, state) in states)
        {
            data[name] = state.ToString();
        }

        if (openCircuits.Any())
        {
            var openNames = string.Join(", ", openCircuits.Select(kvp => kvp.Key));
            return Task.FromResult(
                HealthCheckResult.Unhealthy(
                    $"Circuit breakers open: {openNames}",
                    data: data));
        }

        if (halfOpenCircuits.Any())
        {
            var halfOpenNames = string.Join(", ", halfOpenCircuits.Select(kvp => kvp.Key));
            return Task.FromResult(
                HealthCheckResult.Degraded(
                    $"Circuit breakers half-open: {halfOpenNames}",
                    data: data));
        }

        return Task.FromResult(
            HealthCheckResult.Healthy(
                "All circuit breakers closed",
                data: data));
    }
}

8. Update Repositories with Wrapped Policies

Update src/StarGate.Infrastructure/Repositories/MongoProcessRepository.cs:

private readonly AsyncPolicyWrap _resiliencePolicy;

public MongoProcessRepository(
    IMongoDatabase database,
    AsyncPolicyWrap resiliencePolicy,
    ILogger<MongoProcessRepository> logger)
{
    _database = database ?? throw new ArgumentNullException(nameof(database));
    _resiliencePolicy = resiliencePolicy ?? throw new ArgumentNullException(nameof(resiliencePolicy));
    _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    _collection = _database.GetCollection<ProcessDocument>("processes");
}

public async Task CreateAsync(Process process, CancellationToken cancellationToken = default)
{
    // Pass the token to Polly so that cancellation also stops pending retries.
    await _resiliencePolicy.ExecuteAsync(async ct =>
    {
        var document = MapToDocument(process);
        await _collection.InsertOneAsync(document, cancellationToken: ct);
        _logger.LogDebug("Process created: ProcessId={ProcessId}", process.ProcessId);
    }, cancellationToken);
}

9. Register Health Check

Update src/StarGate.Server/Program.cs:

// Register circuit breaker state service
builder.Services.AddSingleton<CircuitBreakerStateService>();

// Add health checks
builder.Services.AddHealthChecks()
    .AddCheck<ProcessWorkerHealthCheck>("process-worker")
    .AddCheck<CircuitBreakerHealthCheck>("circuit-breakers");

10. Create Unit Tests

Create tests/StarGate.Infrastructure.Tests/Resilience/CircuitBreakerTests.cs:

namespace StarGate.Infrastructure.Tests.Resilience;

using FluentAssertions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Polly.CircuitBreaker;
using StarGate.Infrastructure.Resilience;
using Xunit;

public class CircuitBreakerTests
{
    private readonly CircuitBreakerConfiguration _config;
    // CircuitBreakerFactory is static, so it cannot be a type argument of NullLogger<T>.
    private readonly ILogger _logger;

    public CircuitBreakerTests()
    {
        _config = new CircuitBreakerConfiguration
        {
            FailureThreshold = 3,
            FailureRateThreshold = 0.5,
            MinimumThroughput = 5,
            BreakDurationSeconds = 1.0,
            SamplingDurationSeconds = 10.0
        };
        _logger = NullLogger.Instance;
    }

    [Fact]
    public async Task CircuitBreaker_Should_OpenAfterThresholdExceeded()
    {
        // Arrange
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(_config, _logger);
        var failures = 0;

        // Act - Execute until circuit opens
        for (int i = 0; i < 10; i++)
        {
            try
            {
                await circuitBreaker.ExecuteAsync(async () =>
                {
                    failures++;
                    await Task.CompletedTask;
                    throw new TimeoutException("Simulated failure");
                });
            }
            catch (TimeoutException)
            {
                // Expected
            }
            catch (BrokenCircuitException)
            {
                // Circuit opened
                break;
            }
        }

        // Assert - Circuit should be open after threshold reached
        var act = async () => await circuitBreaker.ExecuteAsync(async () =>
        {
            await Task.CompletedTask;
        });

        await act.Should().ThrowAsync<BrokenCircuitException>();
        failures.Should().BeGreaterThanOrEqualTo(5); // MinimumThroughput
    }

    [Fact]
    public async Task CircuitBreaker_Should_ResetAfterBreakDuration()
    {
        // Arrange
        var config = new CircuitBreakerConfiguration
        {
            FailureThreshold = 2,
            FailureRateThreshold = 0.5,
            MinimumThroughput = 3,
            BreakDurationSeconds = 0.5,
            SamplingDurationSeconds = 10.0
        };
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(config, _logger);

        // Act - Cause circuit to open
        for (int i = 0; i < 5; i++)
        {
            try
            {
                await circuitBreaker.ExecuteAsync(async () =>
                {
                    await Task.CompletedTask;
                    throw new TimeoutException();
                });
            }
            catch { }
        }

        // Verify circuit is open
        var actWhileOpen = async () => await circuitBreaker.ExecuteAsync(async () =>
        {
            await Task.CompletedTask;
        });
        await actWhileOpen.Should().ThrowAsync<BrokenCircuitException>();

        // Wait for break duration
        await Task.Delay(TimeSpan.FromSeconds(1));

        // Act - Execute successful operation (half-open -> closed)
        await circuitBreaker.ExecuteAsync(async () =>
        {
            await Task.CompletedTask;
        });

        // Assert - Circuit should be closed
        await circuitBreaker.ExecuteAsync(async () =>
        {
            await Task.CompletedTask;
        });
    }

    [Fact]
    public void CircuitBreakerStateService_Should_TrackStates()
    {
        // Arrange
        var service = new CircuitBreakerStateService();

        // Act
        service.RecordStateChange("database", CircuitState.Closed);
        service.RecordStateChange("broker", CircuitState.Open);

        // Assert
        service.GetState("database").Should().Be(CircuitState.Closed);
        service.GetState("broker").Should().Be(CircuitState.Open);
        service.HasOpenCircuit().Should().BeTrue();
    }
}

✅ Acceptance Criteria

  • CircuitBreakerConfiguration implemented
  • CircuitBreakerFactory created for HTTP, database, and broker
  • Advanced circuit breaker with failure rate threshold
  • ResiliencePolicyWrapper combines retry and circuit breaker
  • Circuit breaker state change callbacks (onBreak, onReset, onHalfOpen)
  • CircuitBreakerStateService tracks circuit states
  • CircuitBreakerHealthCheck monitors circuit states
  • Wrapped policies registered in DI container
  • Repositories updated to use wrapped policies
  • Configuration files updated
  • Comprehensive logging for circuit state changes
  • Unit tests for circuit breaker behavior
  • Integration tests with failure simulation
  • Health endpoint reflects circuit states
  • Code follows CODING-CONVENTIONS.md

📝 Testing Instructions

# Run unit tests
dotnet test tests/StarGate.Infrastructure.Tests --filter "FullyQualifiedName~CircuitBreaker"

# Test circuit breaker with MongoDB
# 1. Start services
docker-compose up -d

# 2. Monitor health endpoint
watch -n 1 'curl -s http://localhost:5000/health | jq'

# 3. Stop MongoDB
docker-compose stop mongodb

# 4. Create multiple processes (trigger failures)
for i in {1..20}; do
  curl -X POST http://localhost:5000/api/processes \
    -H "Content-Type: application/json" \
    -d '{
      "clientId": "test",
      "processType": "order",
      "clientProcessId": "test-'$i'"
    }'
  sleep 0.1
done

# 5. Check logs for circuit breaker opening
# Expected:
# "Database retry attempt 1/3..."
# "Database retry attempt 2/3..."
# "Database retry attempt 3/3..."
# (after ~10 failures)
# "Database circuit breaker opened: BreakDuration=30s"

# 6. Verify health check shows unhealthy
curl http://localhost:5000/health
# Expected: Status=Unhealthy, "Circuit breakers open: database"

# 7. Further requests fail immediately (no retry)
# "Circuit breaker is open"

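# 8. Restart MongoDB (do this before the trial request, otherwise the circuit re-opens)
docker-compose start mongodb

# 9. Wait 30 seconds for the break duration to elapse
sleep 30

# 10. Create process (first request after the break is the half-open trial; should succeed)
curl -X POST http://localhost:5000/api/processes \
  -H "Content-Type: application/json" \
  -d '{
    "clientId": "test",
    "processType": "order",
    "clientProcessId": "test-recovery"
  }'

# 11. Check logs (the half-open transition is logged when that request arrives, not when the timer expires)
# "Database circuit breaker half-open: Testing recovery"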

# 12. Check logs
# "Database circuit breaker reset: Circuit closed"

# 13. Verify health check is healthy
curl http://localhost:5000/health
# Expected: Status=Healthy, "All circuit breakers closed"

# Test broker circuit breaker
# Repeat steps 3-13 with RabbitMQ instead

🏷️ Labels

phase-8 resilience sprint-8.1 polly circuit-breaker

⏱️ Estimated Effort

8-10 hours

🔗 Related Issues

Part of Phase 8: Resilience - Sprint 8.1: Polly Integration

📌 Important Notes

Circuit Breaker States

Closed (Normal)
  ↓ (failures > threshold)
Open (Blocking all requests)
  ↓ (after break duration)
Half-Open (Testing recovery)
  ↓ (success)     ↓ (failure)
Closed          Open

Closed:

  • Normal operation
  • Requests pass through
  • Failures tracked

Open:

  • All requests fail immediately
  • No calls to downstream service
  • Prevents cascading failures

Half-Open:

  • Testing recovery
  • One request allowed
  • Success → Closed
  • Failure → Open
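
The transitions above can be sketched as a small state machine. This is an illustration only, not Polly's implementation: `TinyCircuitBreaker` and `BreakerState` are hypothetical names, it counts consecutive failures rather than a failure rate, it is not thread-safe, and the clock is injected so the behavior can be exercised without waiting. As in Polly, the move to Half-Open happens lazily when the next call arrives after the break duration.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Illustrative consecutive-failure breaker with an injectable clock.
public sealed class TinyCircuitBreaker
{
    private readonly int _failuresBeforeBreak;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _now;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public TinyCircuitBreaker(int failuresBeforeBreak, TimeSpan breakDuration, Func<DateTimeOffset> now)
    {
        _failuresBeforeBreak = failuresBeforeBreak;
        _breakDuration = breakDuration;
        _now = now;
    }

    public T Execute<T>(Func<T> action)
    {
        if (State == BreakerState.Open)
        {
            // Open: fail fast without calling the downstream service.
            if (_now() - _openedAt < _breakDuration)
                throw new InvalidOperationException("Circuit is open");

            // Break duration elapsed: let this call through as the trial.
            State = BreakerState.HalfOpen;
        }

        try
        {
            var result = action();
            _consecutiveFailures = 0;
            State = BreakerState.Closed; // success closes the circuit (also from Half-Open)
            return result;
        }
        catch
        {
            _consecutiveFailures++;
            // A failed trial re-opens immediately; otherwise open once the count is reached.
            if (State == BreakerState.HalfOpen || _consecutiveFailures >= _failuresBeforeBreak)
            {
                State = BreakerState.Open;
                _openedAt = _now();
            }
            throw;
        }
    }
}
```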

Advanced vs Simple Circuit Breaker

Simple Circuit Breaker:

.CircuitBreakerAsync(
    handledEventsAllowedBeforeBreaking: 5,
    durationOfBreak: TimeSpan.FromSeconds(30))
  • Counts consecutive failures
  • Opens after N failures

Advanced Circuit Breaker (Used):

.AdvancedCircuitBreakerAsync(
    failureThreshold: 0.5,                         // 50% failure rate
    samplingDuration: TimeSpan.FromSeconds(60),    // In last 60 seconds
    minimumThroughput: 10,                         // At least 10 requests
    durationOfBreak: TimeSpan.FromSeconds(30))
  • Calculates failure rate
  • More sophisticated
  • Better for varying load

Why Advanced?

  • Handles bursty traffic better
  • Requires minimum throughput
  • Percentage-based (not absolute count)
  • More production-ready
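
The rate-based decision reduces to two checks on the calls seen in the sampling window. A minimal sketch of that arithmetic follows; `BreakDecision.ShouldBreak` is a hypothetical name, and the real breaker additionally manages the rolling window itself.

```csharp
public static class BreakDecision
{
    // True when the calls sampled in the current window would trip a rate-based breaker.
    public static bool ShouldBreak(
        int failures,
        int total,
        double failureRateThreshold,
        int minimumThroughput)
    {
        // Too few calls: the failure rate is not considered statistically meaningful yet.
        if (total < minimumThroughput)
        {
            return false;
        }

        return (double)failures / total >= failureRateThreshold;
    }
}
```

With the defaults used here (threshold 0.5, minimum throughput 10), 4 failures out of 8 calls do not break the circuit because throughput is too low, 4 out of 10 do not because the rate is 40%, and 5 out of 10 do.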

Policy Wrapping Order

Circuit Breaker (outer)
  ↓
Retry (inner)
  ↓
Actual Operation

Why this order?

  1. Circuit breaker checks first
  2. If open → fail immediately (no retry)
  3. If closed → allow retry attempts
  4. If retries exhausted → circuit breaker counts failure

Wrong order (Retry outer):

  • Retry attempts are still made while the circuit is open whenever the retry policy handles BrokenCircuitException
  • Defeats purpose of circuit breaker
  • Wastes resources
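
The effect of the chosen order can be observed with a small experiment. The sketch below assumes Polly v7 and uses deliberately small values (3 immediate retries, minimum throughput of 2) that are not the StarGate configuration; `WrapOrderDemo` is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

public static class WrapOrderDemo
{
    // Returns the number of downstream attempts and whether a call was rejected by the open circuit.
    public static async Task<(int Attempts, bool Rejected)> RunAsync()
    {
        var attempts = 0;
        var rejected = false;

        var retry = Policy.Handle<TimeoutException>().RetryAsync(3);
        var breaker = Policy.Handle<TimeoutException>()
            .AdvancedCircuitBreakerAsync(0.5, TimeSpan.FromSeconds(60), 2, TimeSpan.FromSeconds(30));

        // First argument is the outermost policy: breaker (outer) -> retry (inner).
        var wrapped = Policy.WrapAsync(breaker, retry);

        for (var call = 0; call < 3; call++)
        {
            try
            {
                await wrapped.ExecuteAsync(() =>
                {
                    attempts++;
                    return Task.FromException(new TimeoutException());
                });
            }
            catch (TimeoutException)
            {
                // Retries exhausted: the breaker records a single failure for this call.
            }
            catch (BrokenCircuitException)
            {
                // Circuit open: the delegate (and therefore the retry) never runs.
                rejected = true;
            }
        }

        return (attempts, rejected);
    }
}
```

Under these assumptions the first two calls each make 4 downstream attempts (1 initial plus 3 retries) and count as one failure each, which opens the circuit; the third call is rejected immediately, for a total of 8 attempts.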

Configuration Recommendations

Conservative (Production):

{
  "FailureThreshold": 5,
  "FailureRateThreshold": 0.5,
  "MinimumThroughput": 10,
  "BreakDurationSeconds": 60.0
}
  • Higher thresholds
  • Longer break duration
  • Less sensitive to transients

Aggressive (Testing):

{
  "FailureThreshold": 3,
  "FailureRateThreshold": 0.3,
  "MinimumThroughput": 5,
  "BreakDurationSeconds": 10.0
}
  • Lower thresholds
  • Shorter break duration
  • Faster to trigger

Monitoring and Alerting

Key Metrics:

  • Circuit state (Closed/Open/Half-Open)
  • Number of open circuits
  • Circuit open duration
  • Circuit open frequency

Alerts:

  • Circuit opened → Page on-call
  • Circuit open > 5 minutes → Escalate
  • Multiple circuits open → Major incident

Health Check Integration:

Healthy: All circuits closed
Degraded: Some circuits half-open
Unhealthy: Any circuit open

Cascading Failure Prevention

Without Circuit Breaker:

Service A → Service B (slow/down)
  ↓
Service A threads blocked
  ↓
Service A becomes unresponsive
  ↓
Clients timeout
  ↓
Cascading failure

With Circuit Breaker:

Service A → Service B (slow/down)
  ↓
Circuit breaker opens
  ↓
Service A fails fast
  ↓
Service A remains responsive
  ↓
Other features still work

Testing Strategy

Unit Tests:

  • Test state transitions
  • Verify thresholds
  • Mock failures

Integration Tests:

  • Stop infrastructure
  • Trigger circuit opening
  • Verify health check
  • Test recovery

Load Tests:

  • Simulate high failure rate
  • Verify circuit protection
  • Measure fail-fast latency

Performance Impact

Circuit Closed:

  • Minimal overhead (<1ms)
  • Slight memory for state tracking

Circuit Open:

  • Fail immediately (<0.1ms)
  • No downstream calls
  • Protects resources

Circuit Half-Open:

  • One test request
  • Slightly slower
  • Worth the cost for recovery

Fallback Strategies

When circuit is open, consider:

1. Cached Response:

if (circuit is open)
    return cachedData;

2. Default Value:

if (circuit is open)
    return defaultValue;

3. Degraded Service:

if (circuit is open)
    return limitedFunctionality;

4. Error Response:

if (circuit is open)
    throw new ServiceUnavailableException();

For StarGate, we use error response approach with clear messaging.
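
One way the error-response approach might look in code is to translate Polly's `BrokenCircuitException` into a domain exception with a clear message. This is a sketch, not the StarGate implementation: the text above names `ServiceUnavailableException` without defining it, so the type below is a hypothetical stand-in, as is the `FailFast` helper.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical exception type standing in for the one named in the text.
public sealed class ServiceUnavailableException : Exception
{
    public ServiceUnavailableException(string message, Exception inner)
        : base(message, inner)
    {
    }
}

public static class FailFast
{
    // Runs the action through the policy and converts an open circuit into a clear error.
    public static async Task<T> ExecuteAsync<T>(
        IAsyncPolicy policy,
        string dependency,
        Func<Task<T>> action)
    {
        try
        {
            return await policy.ExecuteAsync<T>(action);
        }
        catch (BrokenCircuitException ex)
        {
            throw new ServiceUnavailableException(
                $"{dependency} is temporarily unavailable; please retry later.", ex);
        }
    }
}
```

An API layer could then map this exception to an HTTP 503 response, keeping Polly types out of the controllers.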
