lfelipediniz

🚑 Add Health Check Endpoint to Dash Core (#3446)

This PR introduces a built-in /health endpoint for Dash applications, designed to support load balancers, Docker health checks, Kubernetes probes, monitoring scripts, and CI/CD pipelines.

✨ Key Features

  • Adds /health endpoint to all Dash apps

  • Configurable path support (respects routes_pathname_prefix)

  • Toggle to disable endpoint if needed

  • Includes detailed metadata:

    • Dash version, Python version, platform
    • Server info (host, port, debug mode)
    • System metrics (CPU, memory, disk usage via psutil)
    • Callback counts (total and background)
  • Includes comprehensive test suite with 10 test cases

  • Endpoint is unauthenticated by default to support external probes

✅ Example Output

{
  "status": "healthy",
  "timestamp": "2025-09-30T00:35:03.982409Z",
  "dash_version": "3.2.0",
  "python_version": "3.12.3",
  "platform": "Linux-6.8.0-71-generic-x86_64",
  "server_name": "my_dash_app",
  "debug_mode": true,
  "host": "127.0.0.1",
  "port": "8050",
  "system": {
    "cpu_percent": 17.8,
    "memory_percent": 37.9,
    "disk_percent": 52.6
  },
  "callbacks": {
    "total_callbacks": 4,
    "background_callbacks": 0
  }
}
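
A monitoring script could consume a payload like the one above roughly as follows. This is only a sketch: the helper names `is_healthy_payload` and `probe` are hypothetical, and it assumes the endpoint keeps returning a JSON object with a `status` field equal to `"healthy"`, as in the example output.

```python
import json
import urllib.error
import urllib.request


def is_healthy_payload(body: bytes) -> bool:
    """Return True if the JSON body reports status 'healthy'."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def probe(url: str, timeout: float = 2.0) -> bool:
    """Fetch the health URL; True only on HTTP 200 with a healthy payload."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy_payload(resp.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False
```

A Docker or CI health check could call `probe("http://127.0.0.1:8050/health")` and exit non-zero when it returns `False`.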

🔧 Configuration Notes

  • The endpoint automatically adjusts to any routes_pathname_prefix
  • It is designed to be fast and lightweight (no callback triggering)
  • psutil is optional - system metrics are included only if available
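
One way to keep psutil optional is to guard the import at module level and return an empty result when it is missing. The sketch below is illustrative rather than the PR's actual code; `collect_system_metrics` is a hypothetical helper name.

```python
import os

try:
    import psutil  # optional dependency
except ImportError:
    psutil = None


def collect_system_metrics() -> dict:
    """Return CPU, memory, and disk percentages, or {} if unavailable."""
    if psutil is None:
        return {}
    try:
        # "/" on POSIX, the current drive root on Windows.
        root = os.path.abspath(os.sep)
        return {
            # interval=None does not block; the first call may report 0.0.
            "cpu_percent": psutil.cpu_percent(interval=None),
            "memory_percent": psutil.virtual_memory().percent,
            "disk_percent": psutil.disk_usage(root).percent,
        }
    except Exception:
        # Any failure while reading metrics should not break the probe.
        return {}
```

Because the import is guarded outside the handler, a missing psutil never raises inside the request path.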

🔬 Testing

A full suite of 10 tests has been added, covering:

  • Basic response
  • Disabled endpoint
  • Custom prefix support
  • Missing psutil
  • System info fallback
  • Edge cases with callback registration
  • Status code and JSON format

All tests pass locally :)

- Add configurable health endpoint to Dash core
- Support custom endpoint paths and disable option
- Include system metrics and callback information
- Add comprehensive test suite with 10 test cases
- Compatible with load balancers, Docker, Kubernetes

Resolves: Add health check endpoint to Dash framework
@gvwilson added the feature (something new), P2 (considered for next cycle), and community (community contribution) labels on Oct 1, 2025
@gvwilson
Contributor

gvwilson commented Oct 1, 2025

@T4rk1n please let us know what you think - thanks

@ndrezn
Member

ndrezn commented Oct 1, 2025

cc @selfeki

dash/dash.py Outdated
Comment on lines 986 to 998
def serve_health(self):
    """
    Health check endpoint for monitoring Dash server status.
    Returns a JSON response indicating the server is running and healthy.
    This endpoint can be used by load balancers, monitoring systems,
    and other platforms to check if the Dash server is operational.
    :return: JSON response with status information
    """
    import datetime
    import platform
    import psutil
Contributor

I don't think there should be any code in the default handler; it should simply return "OK" with a 200 response. The API should provide a way for the developer to add a function that handles the response and performs its health check properly by checking the needed services.

Agree with @T4rk1n, I would expect the /health response to be as simple as

{"status":"ok"}

The remaining metadata is still useful if we return it via a different endpoint, e.g. /info
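
The shape the reviewers describe (a trivial default plus a developer-supplied check) might look like the sketch below. `make_health_view` and the 503 fallback are assumptions for illustration, not an API proposed in this thread.

```python
from typing import Callable, Optional, Tuple

HealthCheck = Callable[[], bool]


def make_health_view(
    check: Optional[HealthCheck] = None,
) -> Callable[[], Tuple[str, int]]:
    """Build a view returning (body, status); default is always 'OK', 200."""

    def view() -> Tuple[str, int]:
        if check is None:
            return "OK", 200
        try:
            ok = bool(check())
        except Exception:
            # A crashing check is treated as unhealthy, not as a 500.
            ok = False
        return ("OK", 200) if ok else ("UNHEALTHY", 503)

    return view
```

Flask accepts a `(body, status)` tuple as a view return value, so a view built this way could be registered on the underlying Flask server, for example with `app.server.add_url_rule("/health", "health", make_health_view())`.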

dash/dash.py Outdated
Comment on lines 986 to 1048
def serve_health(self):
    """
    Health check endpoint for monitoring Dash server status.
    Returns a JSON response indicating the server is running and healthy.
    This endpoint can be used by load balancers, monitoring systems,
    and other platforms to check if the Dash server is operational.
    :return: JSON response with status information
    """
    import datetime
    import platform
    import psutil
    import sys

    # Basic health information
    health_data = {
        "status": "healthy",
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
        "dash_version": __version__,
        "python_version": sys.version,
        "platform": platform.platform(),
    }

    # Add server information if available
    try:
        health_data.update({
            "server_name": self.server.name,
            "debug_mode": self.server.debug,
            "host": getattr(self.server, 'host', 'unknown'),
            "port": getattr(self.server, 'port', 'unknown'),
        })
    except Exception:
        pass

    # Add system resource information if psutil is available
    try:
        health_data.update({
            "system": {
                "cpu_percent": psutil.cpu_percent(interval=0.1),
                "memory_percent": psutil.virtual_memory().percent,
                "disk_percent": psutil.disk_usage('/').percent if os.name != 'nt' else psutil.disk_usage('C:').percent,
            }
        })
    except ImportError:
        # psutil not available, skip system metrics
        pass
    except Exception:
        # Error getting system metrics, skip them
        pass

    # Add callback information
    try:
        health_data.update({
            "callbacks": {
                "total_callbacks": len(self.callback_map),
                "background_callbacks": len(getattr(self, '_background_callback_map', {})),
            }
        })
    except Exception:
        pass

    return flask.jsonify(health_data)
Contributor

Normal health check should return "OK", 200

@scjody scjody left a comment

This is a great idea! I'll let those more familiar with the codebase give a detailed view.

I'd like to see a /ready endpoint as well. Can one be added as part of this work? It should be possible to reuse a lot of this code. To summarize the difference:

  • Health checks ("is this container responding at all?") are relevant to the cluster scheduler.
  • Readiness checks ("is this container able to respond to this class of requests right now?") are relevant to the load balancer.

For example, if a service has an external database connection, one might want to have the health check just make sure the app can respond to traffic, but have the readiness check test that the DB is reachable and can perform a basic query.
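
To make the distinction concrete, here is a rough sketch of the two probes. All names (`liveness`, `readiness`, `sqlite_reachable`) are hypothetical, and an in-memory SQLite connection stands in for the external database.

```python
import sqlite3
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[str, int]:
    """Health check: the process can answer at all."""
    return "OK", 200


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[dict, int]:
    """Readiness check: every named dependency check must pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # An unreachable dependency counts as "not ready".
            results[name] = False
    ready = all(results.values())
    return {"ready": ready, "checks": results}, (200 if ready else 503)


def sqlite_reachable(path: str = ":memory:") -> bool:
    """Stand-in for a real DB check: connect and run a basic query."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT 1").fetchone() == (1,)
    finally:
        conn.close()
```

With this split, a failing database turns `readiness` into a 503 so the load balancer stops routing traffic, while `liveness` keeps returning 200 and the scheduler does not restart the container.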

@lfelipediniz
Author

Thanks for the analysis, guys!

So, after reading what you sent, I believe that for this issue focused on the /health route, I should make the following changes:

  • Return a simple OK with HTTP 200
  • Make it optional: health_endpoint=None by default
  • Keep system metadata optional and disabled by default
  • Guard the psutil import so the endpoint still works when psutil is not installed
  • Ensure it respects routes_pathname_prefix
  • Update the tests and documentation

Regarding /ready: great suggestion. It makes sense to open a follow-up issue and a separate PR to add a readiness endpoint with pluggable dependency checks.

Does this plan work for ?
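
For the "optional, respects routes_pathname_prefix" points in the plan above, the path handling could be sketched as below. `resolve_health_path` is a hypothetical helper, and the exact joining rules in the final implementation may differ.

```python
from typing import Optional


def resolve_health_path(
    routes_pathname_prefix: str, health_endpoint: Optional[str]
) -> Optional[str]:
    """Join the prefix and endpoint; None means the endpoint is disabled."""
    if health_endpoint is None:
        return None
    # Normalize so that "/" and "/app/" both join cleanly with "health".
    prefix = "/" + routes_pathname_prefix.strip("/")
    endpoint = health_endpoint.strip("/")
    return prefix.rstrip("/") + "/" + endpoint
```

With `health_endpoint=None` as the default, no route would be registered unless the developer opts in.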

@ndrezn
Member

ndrezn commented Oct 1, 2025

Looks 👍, but I would recommend removing the system metadata from this PR and creating a second PR for that feature. There's more discussion to be had on whether it belongs in a new endpoint, and I don't think that is worth blocking the initial implementation for.

- Add optional health_endpoint parameter to Dash constructor (default: None)
- Implement simple health check endpoint returning 'OK' with HTTP 200
- Health endpoint respects routes_pathname_prefix configuration
- Add comprehensive unit tests for health endpoint functionality
- Remove integration tests in favor of simpler unit tests
- Apply code formatting with black