📋 Task Description
Implement the circuit breaker pattern using Polly to prevent cascading failures when external services are unavailable. Configure break duration, failure thresholds, and automatic recovery testing (the half-open state) to protect the system from prolonged outages.
🎯 Objectives
- Implement circuit breaker policies for HTTP clients
- Implement circuit breaker policies for database operations
- Implement circuit breaker policies for message broker operations
- Configure failure thresholds and break duration
- Implement half-open state for recovery testing
- Add circuit state change notifications
- Integrate with retry policies (wrap pattern)
- Add comprehensive logging for circuit state changes
- Expose circuit breaker metrics
- Write unit tests for circuit breaker behavior
- Write integration tests with failure simulation
- Document circuit breaker configuration and monitoring
📦 Deliverables
1. Create Circuit Breaker Configuration
Create src/StarGate.Infrastructure/Resilience/CircuitBreakerConfiguration.cs:
```csharp
namespace StarGate.Infrastructure.Resilience;

/// <summary>
/// Configuration for circuit breaker policies.
/// </summary>
public class CircuitBreakerConfiguration
{
    /// <summary>
    /// Number of consecutive failures before breaking the circuit.
    /// Only relevant to a simple (consecutive-count) breaker: the advanced
    /// breakers built by CircuitBreakerFactory do not read this value.
    /// </summary>
    public int FailureThreshold { get; set; } = 5;

    /// <summary>
    /// Fraction of failed calls (0.0-1.0, not a percentage) within the sampling
    /// duration at or above which the circuit breaks.
    /// </summary>
    public double FailureRateThreshold { get; set; } = 0.5; // 50%

    /// <summary>
    /// Minimum number of calls within the sampling duration before the failure
    /// rate is considered (Polly requires a value greater than 1).
    /// </summary>
    public int MinimumThroughput { get; set; } = 10;

    /// <summary>
    /// Duration to keep the circuit open before allowing a trial call (seconds).
    /// </summary>
    public double BreakDurationSeconds { get; set; } = 30.0;

    /// <summary>
    /// Length of the rolling window used to calculate the failure rate (seconds).
    /// </summary>
    public double SamplingDurationSeconds { get; set; } = 60.0;

    /// <summary>
    /// Gets the break duration as a TimeSpan.
    /// </summary>
    public TimeSpan BreakDuration => TimeSpan.FromSeconds(BreakDurationSeconds);

    /// <summary>
    /// Gets the sampling duration as a TimeSpan.
    /// </summary>
    public TimeSpan SamplingDuration => TimeSpan.FromSeconds(SamplingDurationSeconds);
}
```
2. Create Circuit Breaker Factory
Create src/StarGate.Infrastructure/Resilience/CircuitBreakerFactory.cs:
```csharp
namespace StarGate.Infrastructure.Resilience;

using System.Net;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.CircuitBreaker;

/// <summary>
/// Factory for creating Polly circuit breaker policies.
/// Each method accepts an optional CircuitBreakerStateService; when supplied,
/// state transitions are recorded under "http", "database" or "broker".
/// </summary>
public static class CircuitBreakerFactory
{
    /// <summary>
    /// Creates a circuit breaker policy for HTTP operations. Only transient outcomes
    /// (5xx, 408, HttpRequestException, TimeoutException) count as failures, so
    /// 4xx client errors such as 404 do not open the circuit.
    /// </summary>
    public static AsyncCircuitBreakerPolicy<HttpResponseMessage> CreateHttpCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger,
        CircuitBreakerStateService? stateService = null)
    {
        stateService?.RecordStateChange("http", CircuitState.Closed);

        return Policy
            .HandleResult<HttpResponseMessage>(r =>
                (int)r.StatusCode >= 500 || r.StatusCode == HttpStatusCode.RequestTimeout)
            .Or<HttpRequestException>()
            .Or<TimeoutException>()
            .AdvancedCircuitBreakerAsync(
                failureThreshold: config.FailureRateThreshold,
                samplingDuration: config.SamplingDuration,
                minimumThroughput: config.MinimumThroughput,
                durationOfBreak: config.BreakDuration,
                onBreak: (outcome, breakDuration, context) =>
                {
                    stateService?.RecordStateChange("http", CircuitState.Open);
                    var statusCode = outcome.Result?.StatusCode.ToString() ?? "N/A";
                    var exception = outcome.Exception?.GetType().Name ?? "None";
                    logger.LogError(
                        outcome.Exception,
                        "HTTP circuit breaker opened: StatusCode={StatusCode}, Exception={Exception}, BreakDuration={BreakDuration}s",
                        statusCode,
                        exception,
                        breakDuration.TotalSeconds);
                },
                onReset: context =>
                {
                    stateService?.RecordStateChange("http", CircuitState.Closed);
                    logger.LogInformation("HTTP circuit breaker reset: Circuit closed");
                },
                onHalfOpen: () =>
                {
                    stateService?.RecordStateChange("http", CircuitState.HalfOpen);
                    logger.LogWarning("HTTP circuit breaker half-open: Testing recovery");
                });
    }

    /// <summary>
    /// Creates a circuit breaker policy for database operations.
    /// </summary>
    public static AsyncCircuitBreakerPolicy CreateDatabaseCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger,
        CircuitBreakerStateService? stateService = null) =>
        CreateExceptionCircuitBreaker(
            Policy
                .Handle<TimeoutException>()
                .Or<IOException>()
                .Or<InvalidOperationException>(IsConnectionError),
            "database", "Database", config, logger, stateService);

    /// <summary>
    /// Creates a circuit breaker policy for message broker operations.
    /// </summary>
    public static AsyncCircuitBreakerPolicy CreateBrokerCircuitBreaker(
        CircuitBreakerConfiguration config,
        ILogger logger,
        CircuitBreakerStateService? stateService = null) =>
        CreateExceptionCircuitBreaker(
            Policy
                .Handle<IOException>()
                .Or<TimeoutException>()
                .Or<InvalidOperationException>(IsConnectionError),
            "broker", "Broker", config, logger, stateService);

    // Case-insensitive so "Connection refused" and "connection reset" both match.
    private static bool IsConnectionError(InvalidOperationException ex) =>
        ex.Message.Contains("connection", StringComparison.OrdinalIgnoreCase);

    // Shared implementation for the exception-based (database and broker) breakers.
    private static AsyncCircuitBreakerPolicy CreateExceptionCircuitBreaker(
        PolicyBuilder policyBuilder,
        string circuitName,
        string displayName,
        CircuitBreakerConfiguration config,
        ILogger logger,
        CircuitBreakerStateService? stateService)
    {
        stateService?.RecordStateChange(circuitName, CircuitState.Closed);

        return policyBuilder.AdvancedCircuitBreakerAsync(
            failureThreshold: config.FailureRateThreshold,
            samplingDuration: config.SamplingDuration,
            minimumThroughput: config.MinimumThroughput,
            durationOfBreak: config.BreakDuration,
            onBreak: (exception, breakDuration, context) =>
            {
                stateService?.RecordStateChange(circuitName, CircuitState.Open);
                logger.LogError(
                    exception,
                    "{Circuit} circuit breaker opened: Exception={Exception}, BreakDuration={BreakDuration}s",
                    displayName,
                    exception.GetType().Name,
                    breakDuration.TotalSeconds);
            },
            onReset: context =>
            {
                stateService?.RecordStateChange(circuitName, CircuitState.Closed);
                logger.LogInformation("{Circuit} circuit breaker reset: Circuit closed", displayName);
            },
            onHalfOpen: () =>
            {
                stateService?.RecordStateChange(circuitName, CircuitState.HalfOpen);
                logger.LogWarning("{Circuit} circuit breaker half-open: Testing recovery", displayName);
            });
    }
}
```
3. Create Resilience Policy Wrapper
Create src/StarGate.Infrastructure/Resilience/ResiliencePolicyWrapper.cs:
```csharp
namespace StarGate.Infrastructure.Resilience;

using Microsoft.Extensions.Logging;
using Polly;
using Polly.Wrap;

/// <summary>
/// Wraps retry and circuit breaker policies together.
/// In every wrap the circuit breaker is outer and the retry is inner, so the
/// breaker sees one outcome per logical call, after retries are exhausted.
/// </summary>
public static class ResiliencePolicyWrapper
{
    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for HTTP.
    /// </summary>
    public static AsyncPolicyWrap<HttpResponseMessage> CreateHttpResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateHttpRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateHttpCircuitBreaker(circuitConfig, logger);

        // Wrap: Circuit Breaker (outer) -> Retry (inner)
        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }

    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for database.
    /// </summary>
    public static AsyncPolicyWrap CreateDatabaseResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateDatabaseRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(circuitConfig, logger);

        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }

    /// <summary>
    /// Creates a wrapped policy with retry inside circuit breaker for broker.
    /// </summary>
    public static AsyncPolicyWrap CreateBrokerResiliencePolicy(
        RetryPolicyConfiguration retryConfig,
        CircuitBreakerConfiguration circuitConfig,
        ILogger logger)
    {
        var retryPolicy = RetryPolicyFactory.CreateBrokerRetryPolicy(retryConfig, logger);
        var circuitBreaker = CircuitBreakerFactory.CreateBrokerCircuitBreaker(circuitConfig, logger);

        return Policy.WrapAsync(circuitBreaker, retryPolicy);
    }
}
```
4. Update Resilience Extensions
Update src/StarGate.Infrastructure/Extensions/ResilienceServiceCollectionExtensions.cs:
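```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Polly.Wrap;
using StarGate.Infrastructure.Resilience;

// Members of the static ResilienceServiceCollectionExtensions class.
// Keyed services (AddKeyedSingleton / GetRequiredKeyedService) require .NET 8 or later.

public static IServiceCollection AddResiliencePolicies(
    this IServiceCollection services,
    IConfiguration configuration)
{
    // Register configurations
    services.Configure<RetryPolicyConfiguration>(
        configuration.GetSection("Resilience:Retry"));
    services.Configure<CircuitBreakerConfiguration>(
        configuration.GetSection("Resilience:CircuitBreaker"));

    // Two unkeyed AsyncPolicyWrap singletons would make the last registration win,
    // silently handing MongoProcessRepository the broker policy. The database policy
    // is therefore the default (unkeyed) AsyncPolicyWrap. Broker consumers inject
    // [FromKeyedServices("broker")] AsyncPolicyWrap instead.
    services.AddSingleton<AsyncPolicyWrap>(provider =>
        ResiliencePolicyWrapper.CreateDatabaseResiliencePolicy(
            provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value,
            provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value,
            // ResiliencePolicyWrapper is a static class, so ILogger<ResiliencePolicyWrapper>
            // does not compile; use a named logger instead.
            provider.GetRequiredService<ILoggerFactory>().CreateLogger("StarGate.Resilience.Database")));

    services.AddKeyedSingleton<AsyncPolicyWrap>("broker", (provider, _) =>
        ResiliencePolicyWrapper.CreateBrokerResiliencePolicy(
            provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value,
            provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value,
            provider.GetRequiredService<ILoggerFactory>().CreateLogger("StarGate.Resilience.Broker")));

    return services;
}

public static IHttpClientBuilder AddHttpClientWithResilience<TClient>(
    this IServiceCollection services,
    string name)
    where TClient : class
{
    // The (provider, request) selector passed to AddPolicyHandler runs for every request.
    // Building the policy inside it would create a fresh circuit breaker per request,
    // so failures would never accumulate. Build it once per client name and reuse it.
    services.AddKeyedSingleton<AsyncPolicyWrap<HttpResponseMessage>>(name, (provider, _) =>
        ResiliencePolicyWrapper.CreateHttpResiliencePolicy(
            provider.GetRequiredService<IOptions<RetryPolicyConfiguration>>().Value,
            provider.GetRequiredService<IOptions<CircuitBreakerConfiguration>>().Value,
            provider.GetRequiredService<ILogger<TClient>>()));

    return services
        .AddHttpClient<TClient>(name)
        .AddPolicyHandler((provider, _) =>
            provider.GetRequiredKeyedService<AsyncPolicyWrap<HttpResponseMessage>>(name));
}
```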
5. Update Configuration
Update src/StarGate.Server/appsettings.json:
```json
{
  "Resilience": {
    "Retry": {
      "MaxRetryAttempts": 3,
      "InitialDelaySeconds": 1.0,
      "MaxDelaySeconds": 30.0,
      "BackoffMultiplier": 2.0,
      "UseJitter": true
    },
    "CircuitBreaker": {
      "FailureThreshold": 5,
      "FailureRateThreshold": 0.5,
      "MinimumThroughput": 10,
      "BreakDurationSeconds": 30.0,
      "SamplingDurationSeconds": 60.0
    }
  }
}
```
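Polly only checks these values when a policy is constructed. In this issue that happens lazily inside a DI factory, so a bad `appsettings.json` would not fail until the first database call. The following is a minimal, hypothetical validator that mirrors the argument checks we understand Polly v7 to apply in `AdvancedCircuitBreakerAsync`:

- failure threshold in (0, 1]
- minimum throughput of at least 2
- non-negative break duration
- sampling window of at least 20 ms

`CircuitBreakerSettings` is a stand-in for `CircuitBreakerConfiguration` so the block compiles on its own. Confirm the limits against the Polly version in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for CircuitBreakerConfiguration (same property names).
public sealed class CircuitBreakerSettings
{
    public double FailureRateThreshold { get; init; } = 0.5;
    public int MinimumThroughput { get; init; } = 10;
    public double BreakDurationSeconds { get; init; } = 30.0;
    public double SamplingDurationSeconds { get; init; } = 60.0;
}

public static class CircuitBreakerSettingsValidator
{
    // Returns human-readable problems; an empty list means the settings look valid.
    public static IReadOnlyList<string> Validate(CircuitBreakerSettings s)
    {
        var errors = new List<string>();

        if (s.FailureRateThreshold <= 0 || s.FailureRateThreshold > 1)
            errors.Add("FailureRateThreshold must be in (0, 1].");

        if (s.MinimumThroughput < 2)
            errors.Add("MinimumThroughput must be at least 2.");

        if (s.BreakDurationSeconds < 0)
            errors.Add("BreakDurationSeconds must not be negative.");

        if (TimeSpan.FromSeconds(s.SamplingDurationSeconds) < TimeSpan.FromMilliseconds(20))
            errors.Add("SamplingDurationSeconds must be at least 0.02 (20 ms).");

        return errors;
    }
}
```

Assuming .NET 6 or later, one way to apply such a check is `services.AddOptions<CircuitBreakerConfiguration>().Bind(...).Validate(...).ValidateOnStart()`, so the host refuses to start with invalid settings.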
6. Create Circuit Breaker State Service
Create src/StarGate.Infrastructure/Resilience/CircuitBreakerStateService.cs:
```csharp
namespace StarGate.Infrastructure.Resilience;

using System.Collections.Concurrent;
using Polly.CircuitBreaker;

/// <summary>
/// Tracks the last known state of each named circuit breaker.
/// Only circuits whose state-change callbacks call RecordStateChange appear here;
/// until then GetAllStates() is empty and the health check reports
/// "No circuit breakers configured".
/// </summary>
public class CircuitBreakerStateService
{
    private readonly ConcurrentDictionary<string, CircuitState> _states = new();

    /// <summary>
    /// Records a circuit state change.
    /// </summary>
    public void RecordStateChange(string circuitName, CircuitState state)
    {
        _states[circuitName] = state;
    }

    /// <summary>
    /// Gets the current state of a circuit, or null if it has never been recorded.
    /// </summary>
    public CircuitState? GetState(string circuitName)
    {
        return _states.TryGetValue(circuitName, out var state) ? state : null;
    }

    /// <summary>
    /// Gets a snapshot of all circuit states.
    /// </summary>
    public Dictionary<string, CircuitState> GetAllStates()
    {
        return new Dictionary<string, CircuitState>(_states);
    }

    /// <summary>
    /// Checks if any circuit is open.
    /// </summary>
    public bool HasOpenCircuit()
    {
        return _states.Values.Any(state => state == CircuitState.Open);
    }
}
```
7. Create Health Check for Circuit Breakers
Create src/StarGate.Server/HealthChecks/CircuitBreakerHealthCheck.cs:
```csharp
namespace StarGate.Server.HealthChecks;

using Microsoft.Extensions.Diagnostics.HealthChecks;
using Polly.CircuitBreaker;
using StarGate.Infrastructure.Resilience;

/// <summary>
/// Health check that monitors circuit breaker states.
/// </summary>
public class CircuitBreakerHealthCheck : IHealthCheck
{
    private readonly CircuitBreakerStateService _stateService;

    public CircuitBreakerHealthCheck(CircuitBreakerStateService stateService)
    {
        _stateService = stateService ?? throw new ArgumentNullException(nameof(stateService));
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var states = _stateService.GetAllStates();

        if (states.Count == 0)
        {
            return Task.FromResult(
                HealthCheckResult.Healthy("No circuit breakers configured"));
        }

        var openCircuits = states.Where(kvp => kvp.Value == CircuitState.Open).ToList();
        var halfOpenCircuits = states.Where(kvp => kvp.Value == CircuitState.HalfOpen).ToList();

        var data = new Dictionary<string, object>();
        foreach (var (name, state) in states)
        {
            data[name] = state.ToString();
        }

        if (openCircuits.Any())
        {
            var openNames = string.Join(", ", openCircuits.Select(kvp => kvp.Key));
            return Task.FromResult(
                HealthCheckResult.Unhealthy($"Circuit breakers open: {openNames}", data: data));
        }

        if (halfOpenCircuits.Any())
        {
            var halfOpenNames = string.Join(", ", halfOpenCircuits.Select(kvp => kvp.Key));
            return Task.FromResult(
                HealthCheckResult.Degraded($"Circuit breakers half-open: {halfOpenNames}", data: data));
        }

        return Task.FromResult(
            HealthCheckResult.Healthy("All circuit breakers closed", data: data));
    }
}
```
8. Update Repositories with Wrapped Policies
Update src/StarGate.Infrastructure/Repositories/MongoProcessRepository.cs:
```csharp
private readonly AsyncPolicyWrap _resiliencePolicy;

public MongoProcessRepository(
    IMongoDatabase database,
    AsyncPolicyWrap resiliencePolicy, // must resolve to the database resilience policy
    ILogger<MongoProcessRepository> logger)
{
    _database = database ?? throw new ArgumentNullException(nameof(database));
    _resiliencePolicy = resiliencePolicy ?? throw new ArgumentNullException(nameof(resiliencePolicy));
    _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    _collection = _database.GetCollection<ProcessDocument>("processes");
}

public async Task CreateAsync(Process process, CancellationToken cancellationToken = default)
{
    // Passing the token to ExecuteAsync lets Polly also stop waiting between
    // retries when the caller cancels, not just the MongoDB call itself.
    await _resiliencePolicy.ExecuteAsync(async ct =>
    {
        var document = MapToDocument(process);
        await _collection.InsertOneAsync(document, cancellationToken: ct);
        _logger.LogDebug("Process created: ProcessId={ProcessId}", process.ProcessId);
    }, cancellationToken);
}
```
9. Register Health Check
Update src/StarGate.Server/Program.cs:
```csharp
// Register circuit breaker state service
builder.Services.AddSingleton<CircuitBreakerStateService>();

// Add health checks
builder.Services.AddHealthChecks()
    .AddCheck<ProcessWorkerHealthCheck>("process-worker")
    .AddCheck<CircuitBreakerHealthCheck>("circuit-breakers");
```
10. Create Unit Tests
Create tests/StarGate.Infrastructure.Tests/Resilience/CircuitBreakerTests.cs:
```csharp
namespace StarGate.Infrastructure.Tests.Resilience;

using FluentAssertions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Polly.CircuitBreaker;
using StarGate.Infrastructure.Resilience;
using Xunit;

public class CircuitBreakerTests
{
    private readonly CircuitBreakerConfiguration _config;

    // CircuitBreakerFactory is a static class and cannot be the type argument of
    // NullLogger<T>, so the non-generic NullLogger instance is used instead.
    private readonly ILogger _logger = NullLogger.Instance;

    public CircuitBreakerTests()
    {
        _config = new CircuitBreakerConfiguration
        {
            FailureThreshold = 3,
            FailureRateThreshold = 0.5,
            MinimumThroughput = 5,
            BreakDurationSeconds = 1.0,
            SamplingDurationSeconds = 10.0
        };
    }

    [Fact]
    public async Task CircuitBreaker_Should_OpenAfterThresholdExceeded()
    {
        // Arrange
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(_config, _logger);
        var failures = 0;

        // Act - execute failing calls until the circuit opens
        for (var i = 0; i < 10; i++)
        {
            try
            {
                await circuitBreaker.ExecuteAsync(async () =>
                {
                    failures++;
                    await Task.CompletedTask;
                    throw new TimeoutException("Simulated failure");
                });
            }
            catch (TimeoutException)
            {
                // Expected while the circuit is still closed
            }
            catch (BrokenCircuitException)
            {
                // Circuit opened
                break;
            }
        }

        // Assert - the circuit is open and rejects calls without running them
        circuitBreaker.CircuitState.Should().Be(CircuitState.Open);
        var act = async () => await circuitBreaker.ExecuteAsync(() => Task.CompletedTask);
        await act.Should().ThrowAsync<BrokenCircuitException>();
        failures.Should().BeGreaterThanOrEqualTo(_config.MinimumThroughput);
    }

    [Fact]
    public async Task CircuitBreaker_Should_ResetAfterBreakDuration()
    {
        // Arrange
        var config = new CircuitBreakerConfiguration
        {
            FailureThreshold = 2,
            FailureRateThreshold = 0.5,
            MinimumThroughput = 3,
            BreakDurationSeconds = 0.5,
            SamplingDurationSeconds = 10.0
        };
        var circuitBreaker = CircuitBreakerFactory.CreateDatabaseCircuitBreaker(config, _logger);

        // Act - cause the circuit to open
        for (var i = 0; i < 5; i++)
        {
            try
            {
                await circuitBreaker.ExecuteAsync(async () =>
                {
                    await Task.CompletedTask;
                    throw new TimeoutException();
                });
            }
            catch (Exception ex) when (ex is TimeoutException or BrokenCircuitException)
            {
                // Timeouts until the circuit opens, then fast rejections
            }
        }

        // Verify circuit is open
        circuitBreaker.CircuitState.Should().Be(CircuitState.Open);
        var actWhileOpen = async () => await circuitBreaker.ExecuteAsync(() => Task.CompletedTask);
        await actWhileOpen.Should().ThrowAsync<BrokenCircuitException>();

        // Wait longer than the 0.5 s break duration
        await Task.Delay(TimeSpan.FromSeconds(1));
        circuitBreaker.CircuitState.Should().Be(CircuitState.HalfOpen);

        // Act - a successful trial call closes the circuit (half-open -> closed)
        await circuitBreaker.ExecuteAsync(() => Task.CompletedTask);

        // Assert - circuit is closed
        circuitBreaker.CircuitState.Should().Be(CircuitState.Closed);
    }

    [Fact]
    public void CircuitBreakerStateService_Should_TrackStates()
    {
        // Arrange
        var service = new CircuitBreakerStateService();

        // Act
        service.RecordStateChange("database", CircuitState.Closed);
        service.RecordStateChange("broker", CircuitState.Open);

        // Assert
        service.GetState("database").Should().Be(CircuitState.Closed);
        service.GetState("broker").Should().Be(CircuitState.Open);
        service.HasOpenCircuit().Should().BeTrue();
    }
}
```
✅ Acceptance Criteria
📝 Testing Instructions
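```bash
# Run unit tests
dotnet test tests/StarGate.Infrastructure.Tests --filter "FullyQualifiedName~CircuitBreaker"

# Test circuit breaker with MongoDB
# 1. Start services
docker-compose up -d

# 2. Monitor health endpoint (quote the pipeline so watch runs all of it).
#    jq only works if /health uses a JSON response writer; the default writer
#    returns just "Healthy", "Degraded" or "Unhealthy" as plain text.
watch -n 1 'curl -s http://localhost:5000/health | jq'

# 3. Stop MongoDB
docker-compose stop mongodb

# 4. Create multiple processes (trigger failures).
#    Requests run in parallel so that at least MinimumThroughput (10) failures land
#    inside SamplingDuration (60 s); sequential requests that each spend several
#    seconds retrying may never reach that throughput within the window.
for i in {1..20}; do
  curl -s -X POST http://localhost:5000/api/processes \
    -H "Content-Type: application/json" \
    -d '{
      "clientId": "test",
      "processType": "order",
      "clientProcessId": "test-'$i'"
    }' &
  sleep 0.1
done
wait

# 5. Check logs for circuit breaker opening
# Expected for each request:
# "Database retry attempt 1/3..."
# "Database retry attempt 2/3..."
# "Database retry attempt 3/3..."
# then, once 10 requests have exhausted their retries within the sampling window:
# "Database circuit breaker opened: ... BreakDuration=30s"

# 6. Verify health check shows unhealthy
curl http://localhost:5000/health
# Expected: Unhealthy, "Circuit breakers open: database" (the description needs a JSON response writer)

# 7. Further requests fail immediately with BrokenCircuitException (no retry log lines)

# 8. Wait at least the 30-second break duration
sleep 30

# 9. Restart MongoDB and wait until it accepts connections
docker-compose start mongodb

# 10. Create a process (should succeed). Polly moves to half-open lazily, on the
#     first call after the break duration, so this request is the trial call.
curl -s -X POST http://localhost:5000/api/processes \
  -H "Content-Type: application/json" \
  -d '{"clientId": "test", "processType": "order", "clientProcessId": "test-recovery"}'

# 11. Check logs
# "Database circuit breaker half-open: Testing recovery"
# "Database circuit breaker reset: Circuit closed"

# 12. Verify health check is healthy
curl http://localhost:5000/health
# Expected: Healthy, "All circuit breakers closed"

# Test broker circuit breaker
# Repeat steps 3-12 with RabbitMQ instead
```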
📚 References
🏷️ Labels
phase-8 resilience sprint-8.1 polly circuit-breaker
⏱️ Estimated Effort
8-10 hours
🔗 Dependencies
🔗 Related Issues
Part of Phase 8: Resilience - Sprint 8.1: Polly Integration
📌 Important Notes
Circuit Breaker States
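```text
Closed (Normal)
   ↓ (failure rate ≥ threshold, with at least MinimumThroughput calls in the sampling window)
Open (Blocking all requests)
   ↓ (after break duration, evaluated on the next call or state check)
Half-Open (Testing recovery)
   ↓ (trial succeeds) → Closed
   ↓ (trial fails)    → Open
```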
Closed:
- Normal operation
- Requests pass through
- Failures tracked
Open:
- All requests fail immediately (Polly throws BrokenCircuitException)
- No calls to downstream service
- Prevents cascading failures
Half-Open:
- Testing recovery
- One trial request allowed per break duration; other calls are still rejected
- Success → Closed
- Failure → Open
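To make the transitions above concrete, here is a deliberately small, single-threaded state machine. It is not Polly's implementation. It counts consecutive failures (like the simple breaker) rather than a failure rate, and the names `SimpleBreaker` and `BreakerState` are hypothetical.

It follows two Polly v7 behaviours:

- It moves from Open to Half-Open only when the state is next read.
- It admits a single trial call while half-open.

The clock is injected so tests can advance time without sleeping.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Illustrative consecutive-failure breaker; not thread-safe, not Polly.
public sealed class SimpleBreaker
{
    private readonly int _failuresBeforeBreaking;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _now;
    private BreakerState _state = BreakerState.Closed;
    private int _consecutiveFailures;
    private DateTime _openedAt;
    private bool _trialInFlight;

    public SimpleBreaker(int failuresBeforeBreaking, TimeSpan breakDuration, Func<DateTime> now)
    {
        _failuresBeforeBreaking = failuresBeforeBreaking;
        _breakDuration = breakDuration;
        _now = now;
    }

    public BreakerState State
    {
        get
        {
            // Open becomes HalfOpen lazily, once the break duration has elapsed.
            if (_state == BreakerState.Open && _now() - _openedAt >= _breakDuration)
                _state = BreakerState.HalfOpen;
            return _state;
        }
    }

    // Returns false when the call must fail fast without reaching the dependency.
    public bool TryAcquire()
    {
        switch (State)
        {
            case BreakerState.Closed:
                return true;
            case BreakerState.HalfOpen when !_trialInFlight:
                _trialInFlight = true; // admit exactly one trial call
                return true;
            default:
                return false; // Open, or a trial call is already running
        }
    }

    public void OnSuccess()
    {
        _consecutiveFailures = 0;
        _trialInFlight = false;
        _state = BreakerState.Closed;
    }

    public void OnFailure()
    {
        _trialInFlight = false;
        // A failed trial re-opens immediately; otherwise count towards the threshold.
        if (_state == BreakerState.HalfOpen || ++_consecutiveFailures >= _failuresBeforeBreaking)
        {
            _state = BreakerState.Open;
            _openedAt = _now();
            _consecutiveFailures = 0;
        }
    }
}
```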
Advanced vs Simple Circuit Breaker
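Simple Circuit Breaker:
```csharp
.CircuitBreakerAsync(
    handledEventsAllowedBeforeBreaking: 5, // named exceptionsAllowedBeforeBreaking on non-generic policies
    durationOfBreak: TimeSpan.FromSeconds(30))
```
- Counts consecutive failures
- Opens after N consecutive failures (the value FailureThreshold describes)
Advanced Circuit Breaker (Used):
```csharp
.AdvancedCircuitBreakerAsync(
    failureThreshold: 0.5,                      // 50% failure rate
    samplingDuration: TimeSpan.FromSeconds(60), // in the last 60 seconds
    minimumThroughput: 10,                      // at least 10 calls in that window
    durationOfBreak: TimeSpan.FromSeconds(30))
```
- Calculates the failure rate over a rolling window
- Opens when the failure rate reaches the threshold and throughput is at least the minimum
- Better for varying load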
Why Advanced?
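- Handles bursty traffic better: a few failures among many successes do not open it
- Ignores low-traffic periods (below MinimumThroughput), where a couple of errors would otherwise trip it
- Rate-based (not an absolute count), so one setting works across different traffic levels
- Better suited to production services whose load varies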
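The advanced breaker's break decision can be modelled as follows. Within the sampling window, once at least `minimumThroughput` calls have been seen, a failure that brings the failure fraction to the threshold or above opens the circuit.

This is a simplified, hypothetical model (`FailureRateWindow` is not a Polly type). Polly v7 approximates the window with time buckets rather than keeping every timestamp, so behaviour at the window edges differs.

With the StarGate defaults (0.5, 60 s, 10), six successes followed by six failures open the circuit on the twelfth call, when failures reach 6 of 12.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of the advanced circuit breaker's break decision.
public sealed class FailureRateWindow
{
    private readonly Queue<(DateTime At, bool Failed)> _outcomes = new();
    private readonly double _failureThreshold;
    private readonly TimeSpan _samplingDuration;
    private readonly int _minimumThroughput;

    public FailureRateWindow(double failureThreshold, TimeSpan samplingDuration, int minimumThroughput)
    {
        _failureThreshold = failureThreshold;
        _samplingDuration = samplingDuration;
        _minimumThroughput = minimumThroughput;
    }

    // Records one outcome (in time order) and returns true if the circuit should break.
    public bool Record(DateTime at, bool failed)
    {
        _outcomes.Enqueue((at, failed));

        // Drop outcomes that have fallen out of the sampling window.
        while (_outcomes.Count > 0 && at - _outcomes.Peek().At >= _samplingDuration)
            _outcomes.Dequeue();

        // Only failures can trip the breaker, and only with enough throughput.
        if (!failed || _outcomes.Count < _minimumThroughput)
            return false;

        var failures = 0;
        foreach (var outcome in _outcomes)
            if (outcome.Failed) failures++;

        return (double)failures / _outcomes.Count >= _failureThreshold;
    }
}
```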
Policy Wrapping Order
Circuit Breaker (outer)
↓
Retry (inner)
↓
Actual Operation
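Why this order?
- Circuit breaker checks first
- If open → fail immediately (no retry)
- If closed → the retry policy runs its attempts
- If retries are exhausted → the circuit breaker records one failure for the whole call, so MinimumThroughput counts logical calls, not individual attempts
Risk of the reverse order (Retry outer):
- If the retry policy also handles BrokenCircuitException, retry attempts are made even when the circuit is open
- Defeats purpose of circuit breaker
- Wastes resources
- Polly's PolicyWrap guidance commonly places retry outside the breaker; that works as long as the retry does not handle BrokenCircuitException, and the breaker then counts every attempt rather than every call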
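The following hypothetical model shows what the breaker observes with `Policy.WrapAsync(circuitBreaker, retryPolicy)`. It rests on three assumptions:

- `MaxRetryAttempts = 3` means one initial try plus three retries, as with Polly's `WaitAndRetryAsync`.
- Backoff is omitted.
- `TimeoutException` is the only handled failure.

`WrappedCall` is an illustrative name, not a Polly type.

```csharp
using System;

// Minimal synchronous model of "breaker (outer) -> retry (inner)".
public sealed class WrappedCall
{
    private readonly int _maxRetryAttempts;

    public WrappedCall(int maxRetryAttempts) => _maxRetryAttempts = maxRetryAttempts;

    public int DownstreamAttempts { get; private set; }
    public int FailuresSeenByBreaker { get; private set; }

    public void Execute(Action operation)
    {
        try
        {
            ExecuteWithRetry(operation); // inner policy
        }
        catch (TimeoutException)
        {
            FailuresSeenByBreaker++;      // outer policy sees one failure per call
            throw;
        }
    }

    private void ExecuteWithRetry(Action operation)
    {
        for (var attempt = 0; ; attempt++)
        {
            DownstreamAttempts++;
            try
            {
                operation();
                return;
            }
            catch (TimeoutException) when (attempt < _maxRetryAttempts)
            {
                // Swallow and retry; the last failure propagates to the breaker.
            }
        }
    }
}
```

With 10 failing requests, MongoDB sees 40 attempts while the breaker records 10 failures. Ten is exactly the MinimumThroughput needed before the failure rate is evaluated.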
Configuration Recommendations
Conservative (Production):
```json
{
  "FailureThreshold": 5,
  "FailureRateThreshold": 0.5,
  "MinimumThroughput": 10,
  "BreakDurationSeconds": 60.0
}
```
- Higher thresholds
- Longer break duration
- Less sensitive to transients
Aggressive (Testing):
```json
{
  "FailureThreshold": 3,
  "FailureRateThreshold": 0.3,
  "MinimumThroughput": 5,
  "BreakDurationSeconds": 10.0
}
```
- Lower thresholds
- Shorter break duration
- Faster to trigger
Monitoring and Alerting
Key Metrics:
- Circuit state (Closed/Open/Half-Open)
- Number of open circuits
- Circuit open duration
- Circuit open frequency
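Polly's callbacks report transitions but not durations. The following minimal sketch derives the open-frequency and open-duration metrics listed above, assuming the `onBreak` and `onReset` callbacks call into it. `CircuitMetrics` is a hypothetical name, and the clock is injected for testing.

When a half-open trial fails, Polly calls `onBreak` again. Here that counts as another opening but keeps the original open timestamp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-circuit metrics: how often a circuit opened and how long it stayed open.
public sealed class CircuitMetrics
{
    private readonly Func<DateTime> _now;
    private readonly object _gate = new();
    private readonly Dictionary<string, int> _openCount = new();
    private readonly Dictionary<string, DateTime> _openSince = new();
    private readonly Dictionary<string, TimeSpan> _totalOpen = new();

    public CircuitMetrics(Func<DateTime> now) => _now = now;

    public void OnBreak(string circuit)
    {
        lock (_gate)
        {
            _openCount[circuit] = _openCount.GetValueOrDefault(circuit) + 1;
            // Keep the first timestamp if the circuit re-opens from half-open.
            _openSince.TryAdd(circuit, _now());
        }
    }

    public void OnReset(string circuit)
    {
        lock (_gate)
        {
            if (_openSince.Remove(circuit, out var since))
                _totalOpen[circuit] = _totalOpen.GetValueOrDefault(circuit) + (_now() - since);
        }
    }

    public int OpenCount(string circuit)
    {
        lock (_gate) { return _openCount.GetValueOrDefault(circuit); }
    }

    // Includes the current open period if the circuit is still open.
    public TimeSpan TotalOpenDuration(string circuit)
    {
        lock (_gate)
        {
            var total = _totalOpen.GetValueOrDefault(circuit);
            return _openSince.TryGetValue(circuit, out var since) ? total + (_now() - since) : total;
        }
    }
}
```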
Alerts:
- Circuit opened → Page on-call
- Circuit open > 5 minutes → Escalate
- Multiple circuits open → Major incident
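The alert rules above can be encoded as a small pure function, which makes their precedence explicit: several open circuits outrank a single long-open one. `AlertLevel` and `CircuitAlertRules` are hypothetical names. The five-minute escalation window comes from the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum AlertLevel { None, Page, Escalate, MajorIncident }

// Hypothetical encoding of the alert rules: most severe matching rule wins.
public static class CircuitAlertRules
{
    public static readonly TimeSpan EscalateAfter = TimeSpan.FromMinutes(5);

    // openCircuits maps each currently open circuit to how long it has been open.
    public static AlertLevel Classify(IReadOnlyDictionary<string, TimeSpan> openCircuits)
    {
        if (openCircuits.Count > 1)
            return AlertLevel.MajorIncident;          // multiple circuits open

        if (openCircuits.Values.Any(d => d > EscalateAfter))
            return AlertLevel.Escalate;               // open for more than 5 minutes

        return openCircuits.Count == 1 ? AlertLevel.Page : AlertLevel.None;
    }
}
```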
Health Check Integration:
Healthy: All circuits closed
Degraded: Some circuits half-open
Unhealthy: Any circuit open
Cascading Failure Prevention
Without Circuit Breaker:
Service A → Service B (slow/down)
↓
Service A threads blocked
↓
Service A becomes unresponsive
↓
Clients timeout
↓
Cascading failure
With Circuit Breaker:
Service A → Service B (slow/down)
↓
Circuit breaker opens
↓
Service A fails fast
↓
Service A remains responsive
↓
Other features still work
Testing Strategy
Unit Tests:
- Test state transitions
- Verify thresholds
- Mock failures
Integration Tests:
- Stop infrastructure
- Trigger circuit opening
- Verify health check
- Test recovery
Load Tests:
- Simulate high failure rate
- Verify circuit protection
- Measure fail-fast latency
Performance Impact
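Circuit Closed:
- Low overhead: a lock and a counter update per call (expected to be well under 1 ms)
- Small, fixed amount of memory for state tracking
Circuit Open:
- Calls are rejected immediately with BrokenCircuitException (expected to take well under 0.1 ms)
- No downstream calls
- Protects resources
Circuit Half-Open:
- One trial request per break duration; other calls are still rejected
- The trial request costs as much as a normal downstream call, or more if the service is still recovering
- Worth the cost for recovery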
Fallback Strategies
When circuit is open, consider:
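1. Cached Response:
```csharp
catch (BrokenCircuitException)
{
    return cachedData;
}
```
2. Default Value:
```csharp
catch (BrokenCircuitException)
{
    return defaultValue;
}
```
3. Degraded Service:
```csharp
catch (BrokenCircuitException)
{
    return limitedFunctionality;
}
```
4. Error Response:
```csharp
catch (BrokenCircuitException ex)
{
    throw new ServiceUnavailableException("Dependency temporarily unavailable", ex);
}
```
StarGate uses the error-response approach (option 4) with a clear error message.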
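The following minimal sketch combines strategies 1 and 4. It serves the last good value when the circuit is open and falls back to an error response only when nothing is cached. So that the block compiles without Polly, two types are defined here:

- `CircuitOpenException` stands in for Polly's `BrokenCircuitException`.
- `ServiceUnavailableException` is a local definition.

Both, and `CachedFallback` itself, are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for Polly.CircuitBreaker.BrokenCircuitException.
public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException(string message) : base(message) { }
}

// Hypothetical error-response exception (strategy 4).
public sealed class ServiceUnavailableException : Exception
{
    public ServiceUnavailableException(string message, Exception inner) : base(message, inner) { }
}

public sealed class CachedFallback<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _cache = new();

    // Calls the protected operation; refreshes the cache on success and, when the
    // circuit is open, serves the last known value if one exists.
    public TValue Get(TKey key, Func<TKey, TValue> protectedCall)
    {
        try
        {
            var value = protectedCall(key);
            _cache[key] = value;
            return value;
        }
        catch (CircuitOpenException ex)
        {
            if (_cache.TryGetValue(key, out var cached))
                return cached; // strategy 1: cached (possibly stale) response

            throw new ServiceUnavailableException($"Dependency unavailable for '{key}'", ex);
        }
    }
}
```

Cached values can be stale, so whether strategy 1 is acceptable has to be decided per endpoint.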