Phase 4+: Implement Health Checks for Liveness and Readiness Probes #117

@artcava

📋 Task Description

Implement comprehensive health checks for Kubernetes liveness and readiness probes. Include checks for MongoDB, Redis, RabbitMQ, and application readiness. Create dedicated endpoints with appropriate response formats and configurable thresholds.

🎯 Objectives

  • Implement ASP.NET Core Health Checks
  • Create MongoDB health check
  • Create Redis health check
  • Create RabbitMQ health check
  • Add application startup health check
  • Create dedicated liveness endpoint (/health/live)
  • Create dedicated readiness endpoint (/health/ready)
  • Create detailed health endpoint (/health)
  • Add configurable timeout for health checks
  • Include dependency status in responses
  • Add health check UI for monitoring
  • Write unit tests for health checks
  • Document health check usage

📦 Deliverables

1. Install Health Checks NuGet Packages

Update src/StarGate.Server/StarGate.Server.csproj:

<ItemGroup>
  <PackageReference Include="AspNetCore.HealthChecks.MongoDb" Version="8.0.1" />
  <PackageReference Include="AspNetCore.HealthChecks.Redis" Version="8.0.1" />
  <PackageReference Include="AspNetCore.HealthChecks.RabbitMQ" Version="8.0.2" />
  <PackageReference Include="AspNetCore.HealthChecks.UI" Version="8.0.1" />
  <PackageReference Include="AspNetCore.HealthChecks.UI.Client" Version="8.0.1" />
  <PackageReference Include="AspNetCore.HealthChecks.UI.InMemory.Storage" Version="8.0.1" />
</ItemGroup>

2. Create Custom Health Checks

Create src/StarGate.Server/HealthChecks/StartupHealthCheck.cs:

namespace StarGate.Server.HealthChecks;

using Microsoft.Extensions.Diagnostics.HealthChecks;

public class StartupHealthCheck : IHealthCheck
{
    private volatile bool _startupCompleted;

    public bool StartupCompleted
    {
        get => _startupCompleted;
        set => _startupCompleted = value;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        if (_startupCompleted)
        {
            return Task.FromResult(
                HealthCheckResult.Healthy("Application startup completed"));
        }

        return Task.FromResult(
            HealthCheckResult.Unhealthy("Application startup in progress"));
    }
}
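The objectives list unit tests, but no deliverable shows one. A minimal sketch of what such tests might look like for the startup check follows. It assumes an xUnit test project that references StarGate.Server; the test class and method names are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using StarGate.Server.HealthChecks;
using Xunit;

// Hypothetical test class; assumes xUnit and a project reference to StarGate.Server.
public class StartupHealthCheckTests
{
    [Fact]
    public async Task ReportsUnhealthy_WhileStartupIsInProgress()
    {
        // A freshly constructed check has not been marked as started.
        var check = new StartupHealthCheck();

        HealthCheckResult result = await check.CheckHealthAsync(new HealthCheckContext());

        Assert.Equal(HealthStatus.Unhealthy, result.Status);
    }

    [Fact]
    public async Task ReportsHealthy_OnceStartupIsMarkedComplete()
    {
        // Setting the flag mirrors what the ApplicationStarted callback does in Program.cs.
        var check = new StartupHealthCheck { StartupCompleted = true };

        HealthCheckResult result = await check.CheckHealthAsync(new HealthCheckContext());

        Assert.Equal(HealthStatus.Healthy, result.Status);
        Assert.Equal("Application startup completed", result.Description);
    }
}
```

The MongoDB, Redis, and RabbitMQ checks come from third-party packages, so they are probably better covered by the manual dependency-down tests in the testing instructions than by unit tests.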

3. Configure Health Checks in Program.cs

Update src/StarGate.Server/Program.cs:

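using HealthChecks.UI.Client;                         // UIResponseWriter
using Microsoft.AspNetCore.Diagnostics.HealthChecks;  // HealthCheckOptions
using Microsoft.Extensions.Diagnostics.HealthChecks;  // HealthCheckResult
using StarGate.Server.HealthChecks;                   // StartupHealthCheck

var builder = WebApplication.CreateBuilder(args);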

// Add startup health check
var startupHealthCheck = new StartupHealthCheck();
builder.Services.AddSingleton(startupHealthCheck);

// Configure health checks
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "ready" })
    .AddCheck("startup", startupHealthCheck, tags: new[] { "ready" })
    .AddMongoDb(
        builder.Configuration["MongoDB:ConnectionString"]!,
        name: "mongodb",
        timeout: TimeSpan.FromSeconds(3),
        tags: new[] { "ready", "database" })
    .AddRedis(
        builder.Configuration["Redis:ConnectionString"]!,
        name: "redis",
        timeout: TimeSpan.FromSeconds(3),
        tags: new[] { "ready", "cache" })
    .AddRabbitMQ(
        rabbitConnectionString: builder.Configuration["RabbitMQ:ConnectionString"]!,
        name: "rabbitmq",
        timeout: TimeSpan.FromSeconds(3),
        tags: new[] { "ready", "messaging" });

// Add health checks UI
builder.Services
    .AddHealthChecksUI(setup =>
    {
        setup.SetEvaluationTimeInSeconds(60);
        setup.MaximumHistoryEntriesPerEndpoint(50);
        setup.AddHealthCheckEndpoint("StarGate API", "/health");
    })
    .AddInMemoryStorage();

var app = builder.Build();

// Map health check endpoints
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false, // No checks, just returns healthy
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

// Health checks UI
app.MapHealthChecksUI(setup =>
{
    setup.UIPath = "/healthchecks-ui";
    setup.ApiPath = "/healthchecks-api";
});

// Mark startup as completed after app starts
var lifetime = app.Services.GetRequiredService<IHostApplicationLifetime>();
lifetime.ApplicationStarted.Register(() =>
{
    startupHealthCheck.StartupCompleted = true;
});

app.Run();
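The objectives ask for configurable timeouts, while the registration above hard-codes 3 seconds for every dependency. One possible way to read the value from configuration is sketched below. The `HealthCheckTimeouts` helper and the `HealthChecks:TimeoutSeconds` configuration section are assumptions made for illustration; the issue does not define either.

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Hypothetical helper: the "HealthChecks:TimeoutSeconds:<check name>" keys are an
// assumed configuration layout, not something the issue specifies.
public static class HealthCheckTimeouts
{
    // Fallback that matches the hard-coded value used in Program.cs.
    private const int DefaultSeconds = 3;

    public static TimeSpan For(IConfiguration configuration, string checkName)
    {
        // GetValue returns DefaultSeconds when the key is missing.
        int seconds = configuration.GetValue(
            $"HealthChecks:TimeoutSeconds:{checkName}", DefaultSeconds);

        return TimeSpan.FromSeconds(seconds);
    }
}
```

With such a helper, a registration could pass `timeout: HealthCheckTimeouts.For(builder.Configuration, "mongodb")` instead of `TimeSpan.FromSeconds(3)`, and each dependency could be tuned in appsettings.json without a code change.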

4. Create Health Check Response Models

Create src/StarGate.Core/Models/HealthCheckResponse.cs:

namespace StarGate.Core.Models;

public class HealthCheckResponse
{
    public string Status { get; set; } = string.Empty;
    public Dictionary<string, HealthCheckEntry> Checks { get; set; } = new();
    public TimeSpan Duration { get; set; }
    public DateTime Timestamp { get; set; } = DateTime.UtcNow;
}

public class HealthCheckEntry
{
    public string Status { get; set; } = string.Empty;
    public string? Description { get; set; }
    public TimeSpan Duration { get; set; }
    public Dictionary<string, object>? Data { get; set; }
    public string? Exception { get; set; }
    public List<string>? Tags { get; set; }
}
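Nothing in Program.cs uses these models: the endpoints are mapped with UIResponseWriter.WriteHealthCheckUIResponse, which emits the Health Checks UI shape (`entries` and `totalDuration`) rather than the `checks`, `duration`, and `timestamp` fields documented under Response Format. If the documented shape is wanted on an endpoint, a custom writer along these lines could produce it. This is a sketch under assumptions: the `HealthCheckResponseWriter` class is hypothetical, and it is assumed to live in StarGate.Server with a reference to StarGate.Core.

```csharp
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using StarGate.Core.Models;

// Hypothetical writer that maps a HealthReport onto the models defined above.
public static class HealthCheckResponseWriter
{
    // camelCase property names; null members (exception, data) are omitted.
    private static readonly JsonSerializerOptions JsonOptions =
        new(JsonSerializerDefaults.Web)
        {
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };

    public static HealthCheckResponse ToResponse(HealthReport report) => new()
    {
        Status = report.Status.ToString(),
        Duration = report.TotalDuration,
        Checks = report.Entries.ToDictionary(
            pair => pair.Key,
            pair => new HealthCheckEntry
            {
                Status = pair.Value.Status.ToString(),
                Description = pair.Value.Description,
                Duration = pair.Value.Duration,
                Data = pair.Value.Data.Count > 0
                    ? pair.Value.Data.ToDictionary(d => d.Key, d => d.Value)
                    : null,
                Exception = pair.Value.Exception is { } ex
                    ? $"{ex.GetType().Name}: {ex.Message}"
                    : null,
                Tags = pair.Value.Tags.ToList()
            })
    };

    // Signature matches HealthCheckOptions.ResponseWriter.
    public static Task WriteAsync(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";
        return context.Response.WriteAsync(
            JsonSerializer.Serialize(ToResponse(report), JsonOptions));
    }
}
```

It could be assigned as `ResponseWriter = HealthCheckResponseWriter.WriteAsync` on `/health/live` and `/health/ready`. The endpoint polled by the Health Checks UI would still need UIResponseWriter, since the UI expects that format.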

5. Add Configuration

Update src/StarGate.Server/appsettings.json:

{
  "HealthChecksUI": {
    "HealthChecks": [
      {
        "Name": "StarGate API",
        "Uri": "http://localhost:5000/health"
      }
    ],
    "EvaluationTimeInSeconds": 60,
    "MinimumSecondsBetweenFailureNotifications": 60
  }
}

6. Create Documentation

Create docs/HEALTH-CHECKS.md:

# Health Checks - StarGate

## Endpoints

### Liveness Probe: `/health/live`
- **Purpose:** Check if application is alive
- **Use:** Kubernetes liveness probe
- **Response:** 200 OK whenever the process is able to answer HTTP requests
- **Checks:** None (just returns healthy)

### Readiness Probe: `/health/ready`
- **Purpose:** Check if application is ready to serve traffic
- **Use:** Kubernetes readiness probe
- **Response:** 200 OK if ready, 503 Service Unavailable if not
- **Checks:**
  - Startup completed
  - MongoDB connection
  - Redis connection
  - RabbitMQ connection

### Detailed Health: `/health`
- **Purpose:** Detailed health information
- **Use:** Monitoring and debugging
- **Response:** JSON with all health check details
- **Checks:** All registered health checks

## Response Format

### Healthy Response (200 OK)
```json
{
  "status": "Healthy",
  "checks": {
    "mongodb": {
      "status": "Healthy",
      "description": "MongoDB is healthy",
      "duration": "00:00:00.1234567",
      "tags": ["ready", "database"]
    },
    "redis": {
      "status": "Healthy",
      "description": "Redis is healthy",
      "duration": "00:00:00.0123456",
      "tags": ["ready", "cache"]
    },
    "rabbitmq": {
      "status": "Healthy",
      "description": "RabbitMQ is healthy",
      "duration": "00:00:00.0234567",
      "tags": ["ready", "messaging"]
    }
  },
  "duration": "00:00:00.2345678",
  "timestamp": "2026-02-18T14:30:00.000Z"
}
```

### Unhealthy Response (503 Service Unavailable)
```json
{
  "status": "Unhealthy",
  "checks": {
    "mongodb": {
      "status": "Unhealthy",
      "description": "MongoDB connection failed",
      "exception": "MongoConnectionException: Unable to connect",
      "duration": "00:00:03.0000000"
    }
  },
  "duration": "00:00:03.1234567",
  "timestamp": "2026-02-18T14:30:00.000Z"
}
```

## Kubernetes Configuration

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```

## Health Checks UI

Access at: http://localhost:5000/healthchecks-ui

## Troubleshooting

### MongoDB Unhealthy
- Check connection string
- Verify MongoDB is running
- Check network connectivity

### Redis Unhealthy
- Check Redis connection
- Verify Redis is accessible
- Check authentication

### RabbitMQ Unhealthy
- Check RabbitMQ connection
- Verify vhost exists
- Check credentials

✅ Acceptance Criteria

- [ ] Health check NuGet packages installed
- [ ] MongoDB health check configured
- [ ] Redis health check configured
- [ ] RabbitMQ health check configured
- [ ] Startup health check implemented
- [ ] `/health/live` endpoint created
- [ ] `/health/ready` endpoint created
- [ ] `/health` endpoint with detailed info created
- [ ] Health checks UI configured
- [ ] Configurable timeouts for checks
- [ ] Health checks return proper HTTP status codes
- [ ] JSON response format matches specification
- [ ] Health checks tested with all dependencies healthy
- [ ] Health checks tested with dependencies failing
- [ ] Documentation complete
- [ ] Code follows CODING-CONVENTIONS.md

📝 Testing Instructions

```bash
# Start infrastructure
docker-compose up -d mongodb redis rabbitmq

# Run application
dotnet run --project src/StarGate.Server

# Test liveness (should always be healthy)
curl http://localhost:5000/health/live

# Test readiness (healthy if all dependencies ready)
curl http://localhost:5000/health/ready

# Test detailed health
curl http://localhost:5000/health | jq

# View Health Checks UI (open is macOS-specific; use xdg-open on Linux)
open http://localhost:5000/healthchecks-ui

# Test with MongoDB down (-i prints the HTTP status line)
docker stop stargate-mongodb
curl -i http://localhost:5000/health/ready
# Should return 503

# Test with MongoDB back up
docker start stargate-mongodb
sleep 5
curl -i http://localhost:5000/health/ready
# Should return 200
```

🏷️ Labels

phase-4+ production-readiness health-checks kubernetes resilience

⏱️ Estimated Effort

4-6 hours

🔗 Dependencies

  • Phase 4: Configuration Management (completed)
  • MongoDB, Redis, RabbitMQ infrastructure

🔗 Related Issues

Part of "Production-Ready API" initiative - foundational for Kubernetes deployment

📌 Important Notes

Liveness vs Readiness

Liveness:

  • Checks if application is alive
  • Kubernetes restarts pod if unhealthy
  • Should NOT check dependencies
  • Fast response (<100ms)

Readiness:

  • Checks if ready to serve traffic
  • Kubernetes removes from load balancer if unhealthy
  • SHOULD check dependencies
  • Allows slower response (<3s)

Health Check Timeouts

  • MongoDB: 3 seconds
  • Redis: 3 seconds
  • RabbitMQ: 3 seconds
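  • Total readiness time: ASP.NET Core runs the registered checks concurrently by default, so the worst case is roughly the longest single timeout (~3 seconds) rather than the sum of all three; note that the readiness probe's timeoutSeconds of 3 leaves no headroom above that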

Kubernetes Integration

Liveness probe failure:

  • Pod restarted
  • Downtime during restart
  • Use conservative thresholds

Readiness probe failure:

  • Pod removed from service
  • No traffic sent
  • No pod restart
  • More aggressive thresholds OK
