Commit 3955cb5

feat(observability): implement distributed tracing with OpenTelemetry (#10)
* feat(observability): instrument web-client with Prometheus metrics
* feat(infra): add and configure Tempo tracing backend
* feat(app): instrument services with OpenTelemetry agent
* docs(readme): create detailed README guide for Phase 4
1 parent (a46c031), commit 3955cb5

15 files changed (+747 additions, -103 deletions)

README.md

Lines changed: 75 additions & 21 deletions
@@ -1,38 +1,92 @@
# Spring Boot Security & Observability Lab

```diff
-This repository is an advanced, hands-on lab demonstrating the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
```

This repository is a hands-on lab designed to demonstrate the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.

---

```diff
-## Workshop Guide: The Evolutionary Phases
```

## Lab Progress: Phase 4 - Tracing a Distributed System

```diff
-This lab is structured in distinct, self-contained phases. The `main` branch always represents the latest completed phase. To explore a previous phase's code and detailed documentation, use the links below.
```

The `main` branch currently represents the completed state of **Phase 4**.

```diff
-| Phase | Description & Key Concepts | Code & Docs (at tag) | Key Pull Requests |
-|:-----------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **1. The Secure Monolith** | A standalone service that issues and validates its own JWTs. Concepts: `AuthenticationManager`, custom `JwtAuthenticationFilter`, `jjwt` library, and a foundational CI pipeline. | [`v1.0-secure-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v1.0-secure-monolith/README.md) | [#2](https://github.com/apenlor/spring-boot-security-observability-lab/pull/2), [#3](https://github.com/apenlor/spring-boot-security-observability-lab/pull/3), [#4](https://github.com/apenlor/spring-boot-security-observability-lab/pull/4) |
-| **2. Observing the Monolith** | The service is containerized and orchestrated via `docker-compose`. Concepts: Micrometer, Prometheus, Grafana, custom metrics, and automated dashboard provisioning. | [`v2.0-observable-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v2.0-observable-monolith/README.md) | [#6](https://github.com/apenlor/spring-boot-security-observability-lab/pull/6) |
-| **3. Evolving to Federated Identity** | The system is refactored into a multi-service architecture with an external IdP. Concepts: Keycloak, OIDC, OAuth2 Client (`web-client`) vs. Resource Server, Traefik reverse proxy, service-to-service security. | [`v3.0-federated-identity`](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v3.0-federated-identity/README.md) | [#8](https://github.com/apenlor/spring-boot-security-observability-lab/pull/8) |
-| **4. Tracing a Distributed System** | _Upcoming..._ | - | - |
-| **5. Correlated Logs & Access Auditing** | _Upcoming..._ | - | - |
-| **6. Proactive Alerting** | _Upcoming..._ | - | - |
-| **7. Continuous Security Integration** | _Upcoming..._ | - | - |
-| **8. Advanced Secret Management** | _Upcoming..._ | - | - |
```

* **Git Tag for this Phase:** `v4.0-distributed-tracing`
* **Key Pull Request for this Phase:** [#10 - feat(observability): implement distributed tracing with OpenTelemetry](https://github.com/apenlor/spring-boot-security-observability-lab/pull/10)

### Objective

The goal of this phase was to add **distributed tracing** to the existing metrics, bringing the lab closer to the "three pillars of observability" (metrics, logs, and traces). We have instrumented our services to generate trace data, giving end-to-end visibility of a single request as it enters the `web-client` from the user's browser and continues into the backend `resource-server`. This extends our monitoring from service-level metrics to transaction-level insight.

### Key Concepts Demonstrated

* **Agent-Based Auto-Instrumentation:** Using the OpenTelemetry Java Agent to instrument our Spring Boot applications with zero code changes.
* **Distributed Context Propagation:** Understanding how the W3C Trace Context (`traceparent` header) is automatically propagated between services to link individual spans into a single, cohesive trace.
* **Trace Visualization:** Storing and visualizing traces in Grafana Tempo, including the parent-child relationships between spans in a waterfall diagram.
* **Service Graph Generation:** Configuring Tempo's `metrics_generator` to process trace data and automatically generate the metrics required to build a live service graph, visualizing the topology and dependencies of our system.
* **Hybrid PUSH/PULL Metrics Architecture:** Tempo **pushes** trace-derived metrics to Prometheus via `remote_write`, while Prometheus continues to **pull** (scrape) operational metrics from the application services.
* **Anatomy of a Trace:** Identifying and analyzing the core components of a trace (TraceID, Span, SpanID, Parent SpanID) within the Grafana UI.
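
The context propagation described above can be sketched in a few lines of Java. This is a minimal, hypothetical model of a W3C Trace Context `traceparent` header (version `00`). It is not the OpenTelemetry agent's implementation, which handles this internally through its propagators. The incoming header is parsed, and the header sent downstream keeps the same 32-hex-digit trace-id while replacing the 16-hex-digit parent-id with the id of the calling span. The `TraceParent` record and its methods are illustrative names. The sample header is the example value from the W3C specification.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of a W3C `traceparent` header; not the OpenTelemetry agent's code. */
record TraceParent(String traceId, String parentId, String flags) {

    // version "00" - 32 lowercase hex trace-id - 16 lowercase hex parent-id - 2 hex flags
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Parses a header value; all-zero ids are rejected because the spec treats them as invalid. */
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        String traceId = m.group(1);
        String parentId = m.group(2);
        if (traceId.chars().allMatch(c -> c == '0') || parentId.chars().allMatch(c -> c == '0')) {
            throw new IllegalArgumentException("All-zero ids are invalid: " + header);
        }
        return new TraceParent(traceId, parentId, m.group(3));
    }

    /** Header for an outgoing call: same trace-id, a fresh span id as the new parent-id. */
    TraceParent childOf() {
        byte[] spanId = new byte[8];
        do {
            RANDOM.nextBytes(spanId);   // stands in for the id of the caller's new span
        } while (isAllZero(spanId));
        return new TraceParent(traceId, HexFormat.of().formatHex(spanId), flags);
    }

    /** The lowest flag bit is the "sampled" flag. */
    boolean sampled() {
        return (Integer.parseInt(flags, 16) & 0x01) == 1;
    }

    String toHeader() {
        return "00-" + traceId + "-" + parentId + "-" + flags;
    }

    private static boolean isAllZero(byte[] bytes) {
        for (byte b : bytes) {
            if (b != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TraceParent incoming = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        TraceParent outgoing = incoming.childOf();
        System.out.println("incoming: " + incoming.toHeader());
        System.out.println("outgoing: " + outgoing.toHeader());
    }
}
```

Because every hop reuses the trace-id, Tempo can group the `web-client` and `resource-server` spans into one trace. The changing parent-id is what produces the parent-child nesting in the waterfall.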

### Architecture Overview

Phase 4 introduces Grafana Tempo as the tracing backend and integrates it with the existing Prometheus and Grafana stack. The OpenTelemetry agent is attached to each Java service to produce and export trace data.
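
To illustrate what "attached at startup" means, the hypothetical Java sketch below assembles the `java` command that a container might run. The `-javaagent` flag and the `otel.*` properties are standard OpenTelemetry Java agent settings. The following details are assumptions:

* the jar paths;
* the `http://tempo:4317` endpoint (4317 is the conventional OTLP/gRPC port);
* the choice to turn off the agent's own metrics and logs exporters.

The protocol is set explicitly because newer (2.x) agent releases default to OTLP over HTTP rather than gRPC. The lab itself may configure the agent differently, for example through `JAVA_TOOL_OPTIONS` or `OTEL_*` environment variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical launcher showing how the OpenTelemetry Java agent is attached at JVM startup. */
final class AgentLaunch {

    /** Builds a `java` command line; JVM options must come before `-jar`. */
    static List<String> command(String agentJar, String appJar, Map<String, String> otelProps) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-javaagent:" + agentJar);   // the agent instruments classes before main() runs
        otelProps.forEach((key, value) -> cmd.add("-D" + key + "=" + value));
        cmd.add("-jar");
        cmd.add(appJar);
        return cmd;
    }

    /** Environment-variable form of a property: dots become underscores, then uppercase. */
    static String envVar(String property) {
        return property.replace('.', '_').toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // TreeMap keeps the printed order deterministic.
        Map<String, String> props = new TreeMap<>(Map.of(
                "otel.service.name", "web-client",                  // name shown in Tempo/Grafana
                "otel.exporter.otlp.endpoint", "http://tempo:4317",  // assumed Tempo host and port
                "otel.exporter.otlp.protocol", "grpc",               // matches OTLP/gRPC in the diagram
                "otel.metrics.exporter", "none",                     // assumption: metrics stay with Prometheus
                "otel.logs.exporter", "none"));                      // assumption: logs not exported here
        // Both jar paths are hypothetical.
        System.out.println(String.join(" ",
                command("/otel/opentelemetry-javaagent.jar", "app.jar", props)));
        props.forEach((key, value) -> System.out.println(envVar(key) + "=" + value));
    }
}
```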

```mermaid
graph TD
    subgraph "Application Containers"
        A[Web Client + OTel Agent] -->|"Trace Data (OTLP/gRPC)"| T[Tempo]
        B[Resource Server + OTel Agent] -->|"Trace Data (OTLP/gRPC)"| T
    end

    subgraph "Observability Stack"
        T -- "1. Processes traces" --> TG[Metrics Generator]
        TG -- "2. PUSHES trace-derived metrics" --> P[Prometheus]
        B -- "3. PULLS operational metrics" --> P
        A -- "4. PULLS operational metrics" --> P

        G[Grafana] -- "Queries Traces" --> T
        G -- "Queries Metrics" --> P
    end

    U[Developer/User] -->|"Views Traces & Service Graph"| G
```

1. **[OpenTelemetry Agent](config/otel/opentelemetry-javaagent.jar):** A Java agent attached to both the `web-client` and `resource-server` at startup. It automatically instruments common frameworks (like Spring Web MVC and `WebClient`) to create spans and propagate the trace context.
2. **[Grafana Tempo](config/tempo/tempo.yml):** Our new tracing backend. It ingests trace data via OTLP, stores it, and makes it queryable by Grafana.
3. **Metrics Generator:** A component within Tempo that is now configured to process these traces in the background. It generates aggregate metrics (request counts, latency, errors) that are essential for building the service graph.
4. **Prometheus & Grafana:** Their roles are enhanced. Prometheus now receives metrics *pushed* from Tempo in addition to its regular scraping. Grafana's Tempo data source is linked to its Prometheus data source, allowing it to correlate traces with metrics and render the service graph.
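
The Metrics Generator's service graph rests on one idea. A `CLIENT` span from one service whose direct child is a `SERVER` span from another service marks one request along an edge of the graph. The following Java sketch is a simplified, hypothetical model of that pairing. Tempo's actual processor also records latency and failures, and it pairs spans as they stream in rather than from a complete trace. The sketch counts requests per `client->server` edge, mirroring the `client` and `server` labels on Tempo's `traces_service_graph_request_total` metric. The `Span` record and `edges` method are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Toy model of service-graph edge detection; not Tempo's implementation. */
final class ServiceGraph {

    enum Kind { CLIENT, SERVER, INTERNAL }

    record Span(String spanId, String parentSpanId, String service, Kind kind) {}

    /** Counts requests per "client->server" edge by pairing each SERVER span with its CLIENT parent. */
    static Map<String, Long> edges(List<Span> spans) {
        Map<String, Span> byId = spans.stream()
                .collect(Collectors.toMap(Span::spanId, Function.identity()));
        Map<String, Long> counts = new TreeMap<>();
        for (Span server : spans) {
            if (server.kind() != Kind.SERVER || server.parentSpanId() == null) {
                continue;   // root spans and non-server spans start no edge in this sketch
            }
            Span client = byId.get(server.parentSpanId());
            if (client == null || client.kind() != Kind.CLIENT) {
                continue;   // unpaired span: no edge recorded
            }
            counts.merge(client.service() + "->" + server.service(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented example: one request into web-client, which calls resource-server once.
        List<Span> trace = List.of(
                new Span("a1", null, "web-client", Kind.SERVER),
                new Span("b2", "a1", "web-client", Kind.CLIENT),
                new Span("c3", "b2", "resource-server", Kind.SERVER));
        System.out.println(edges(trace));   // prints {web-client->resource-server=1}
    }
}
```

Clicking **"Fetch Secure Data from API"** several times simply increments the same edge. Tempo turns those counts into Prometheus series, and Grafana draws the service graph from them.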

---

```diff
-## How to Follow This Lab
```

## Local Development & Quick Start

```diff
-1. **Start with the `main` branch** to see the latest state of the project.
-2. To go back in time, use the **"Code & Docs" link** for a specific phase. This will show you the `README.md` for that phase, which contains the specific instructions and examples for that version of the code.
-3. To understand the *"why"* behind the changes, review the **Key Pull Requests** for each phase.
```

The prerequisites and startup process are the same as in Phase 3.

1. **Configure Local Hostnames (if you haven't already):**
   Ensure your `/etc/hosts` file contains `127.0.0.1 keycloak.local`.

2. **Build and run the entire stack:**
   From the project root, run the Docker Compose `up` command:

   ```bash
   docker-compose up --build -d
   ```
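
To check the hostname mapping programmatically before starting the stack, you can use a small, hypothetical Java helper (`HostsCheck` is not part of the lab). It assumes the conventional hosts-file layout: an IP address followed by one or more whitespace-separated hostnames, with `#` starting a comment. It only reads the file and never modifies it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check for the keycloak.local hosts entry. */
final class HostsCheck {

    /** True if any non-comment line maps `ip` to `host` (hostnames compared case-insensitively). */
    static boolean maps(List<String> lines, String ip, String host) {
        for (String line : lines) {
            int hash = line.indexOf('#');   // everything after '#' is a comment
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] fields = content.split("\\s+");   // IP first, then hostnames/aliases
            if (fields[0].equals(ip)
                    && Arrays.stream(fields).skip(1).anyMatch(host::equalsIgnoreCase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of("/etc/hosts"));
        System.out.println(maps(lines, "127.0.0.1", "keycloak.local")
                ? "OK: keycloak.local -> 127.0.0.1"
                : "Missing: add '127.0.0.1 keycloak.local' to /etc/hosts");
    }
}
```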

---

```diff
-## Running the Project
```

## Usage Example: Viewing a Distributed Trace

1. **Generate a Trace:**
   * Navigate to the web client at `http://localhost:8082`.
   * Log in (e.g., `lab-user`/`lab-user`).
   * Click the **"Fetch Secure Data from API"** button a few times. This action creates a request that travels from the `web-client` to the `resource-server`.

```diff
-To run the application and see usage examples for the **current phase**, please refer to the detailed instructions in its tagged `README.md` file.
```

2. **Find the Trace in Grafana:**
   * Navigate to Grafana at `http://localhost:3000`.
   * Go to the **Explore** view (compass icon on the left).
   * Select the **Tempo** data source from the dropdown at the top.
   * In the "Search" panel, select `web-client` from the "Service Name" dropdown and click "Run query".
   * Find and click on a trace named **`GET /fetch-data`**.

```diff
-**[>> Go to instructions for the current phase: `v3.0-federated-identity` <<](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v3.0-federated-identity/README.md#local-development--quick-start)**
```

3. **Analyze the Trace Waterfall:**
   You will see a waterfall diagram showing the parent span from the `web-client` and, nested beneath it, the child span from the `resource-server`. This confirms that the end-to-end trace was captured successfully.

```diff
-As the lab progresses, this link will always be updated to point to the latest completed phase.
```

4. **View the Service Graph:**
   * While still in the Tempo Explore view, click the **"Service Graph"** tab.
   * After about a minute, once Tempo has generated the service-graph metrics and pushed them to Prometheus, the graph will appear, visually confirming the dependency between the `web-client` and the `resource-server`.
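
Conceptually, the waterfall in step 3 is the trace's spans arranged as a tree by parent span ID and ordered by start time. The hypothetical Java sketch below renders such a tree as indented text to make the parent-child relationship explicit. It is not how Grafana draws the view. The span IDs, the `resource-server` route name, and the timings in `main` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical text rendering of a trace waterfall from parent/child span links. */
final class Waterfall {

    record Span(String spanId, String parentSpanId, String service, String name,
                long startMs, long durationMs) {}

    /** Returns one line per span, indented by depth, with start offset and duration. */
    static List<String> render(List<Span> spans) {
        Map<String, List<Span>> children = new HashMap<>();
        List<Span> roots = new ArrayList<>();
        for (Span s : spans) {
            if (s.parentSpanId() == null) {
                roots.add(s);   // no parent: top of the waterfall
            } else {
                children.computeIfAbsent(s.parentSpanId(), k -> new ArrayList<>()).add(s);
            }
        }
        List<String> out = new ArrayList<>();
        roots.sort(Comparator.comparingLong(Span::startMs));
        for (Span root : roots) {
            walk(root, 0, root.startMs(), children, out);
        }
        return out;
    }

    /** Depth-first walk; siblings are ordered by start time, as in a waterfall view. */
    private static void walk(Span s, int depth, long traceStart,
                             Map<String, List<Span>> children, List<String> out) {
        out.add("  ".repeat(depth) + s.service() + ": " + s.name()
                + " [+" + (s.startMs() - traceStart) + "ms, " + s.durationMs() + "ms]");
        children.getOrDefault(s.spanId(), List.of()).stream()
                .sorted(Comparator.comparingLong(Span::startMs))
                .forEach(child -> walk(child, depth + 1, traceStart, children, out));
    }

    public static void main(String[] args) {
        // Invented example trace: browser request -> web-client -> resource-server.
        List<Span> spans = List.of(
                new Span("a1", null, "web-client", "GET /fetch-data", 0, 60),
                new Span("b2", "a1", "web-client", "GET", 10, 40),                  // outgoing WebClient call
                new Span("c3", "b2", "resource-server", "GET /api/data", 12, 35));  // hypothetical route
        render(spans).forEach(System.out::println);
        // Prints:
        // web-client: GET /fetch-data [+0ms, 60ms]
        //   web-client: GET [+10ms, 40ms]
        //     resource-server: GET /api/data [+12ms, 35ms]
    }
}
```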
