|
1 | 1 | # Spring Boot Security & Observability Lab |
2 | 2 |
|
3 | | -This repository is a hands-on lab designed to demonstrate the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices. |
| 3 | +This repository is an advanced, hands-on lab demonstrating the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices. |
4 | 4 |
|
5 | 5 | --- |
6 | 6 |
|
7 | | -## Lab Progress: Phase 5 - Correlated Logs & Access Auditing |
| 7 | +## Workshop Guide: The Evolutionary Phases |
8 | 8 |
|
9 | | -The `main` branch currently represents the completed state of **Phase 5**. |
| 9 | +This lab is structured in distinct, self-contained phases. The `main` branch always represents the latest completed phase. To explore a previous phase's code and detailed documentation, use the links below. |
10 | 10 |
|
11 | | -* **Git Tag for this Phase:** `v5.0-correlated-logs-auditing` |
12 | | - |
13 | | -### Objective |
14 | | - |
15 | | -The goal of this phase was to complete the "three pillars of observability" by introducing a centralized, structured logging pipeline. We have also added a critical security layer by implementing a non-invasive, AOP-based audit logging mechanism. The system is now not only fully observable (metrics, traces, and logs), but all three pillars are correlated, allowing for seamless navigation from a distributed trace directly to the logs generated during that specific transaction. |
16 | | - |
17 | | -### Key Concepts Demonstrated |
18 | | - |
19 | | -* **Centralized Logging:** Introducing Grafana Loki as a scalable, efficient log aggregation system. |
20 | | -* **Unified Telemetry Collection:** Adopting Grafana Alloy as the modern, state-of-the-art agent for collecting **both logs and traces**, replacing older, single-purpose agents. |
21 | | -* **Docker Service Discovery:** Configuring Alloy to use the Docker socket to automatically discover and scrape logs from all running containers, creating a "zero-touch" logging pipeline that scales automatically. |
22 | | -* **Trace-to-Log Correlation:** Configuring Grafana to provide one-click navigation from a trace span in Tempo to the exact logs in Loki that correspond to that trace ID. |
23 | | -* **Structured JSON Logging:** Ensuring all application logs are emitted as single-line, machine-readable JSON, a critical prerequisite for reliable parsing and querying. |
24 | | -* **Aspect-Oriented Programming (AOP):** Creating a shared `lab-aspects` module to implement cross-cutting concerns without modifying business logic. |
25 | | -* **Custom Audit Logs & Metrics:** Building an `@Auditable` aspect that generates rich, structured audit logs and corresponding Micrometer metrics (`Counter` and `Timer`) for security monitoring and alerting. |
26 | | -* **Multi-Module Maven Project:** Refactoring the build to support a shared library module and creating robust, multi-module-aware `Dockerfiles`. |
27 | | - |
28 | | -### Architecture Overview |
29 | | - |
30 | | -Phase 5 enriches our distributed system with a complete, correlated observability pipeline managed by Grafana Alloy. |
31 | | - |
32 | | -```mermaid |
33 | | -graph TD |
34 | | - subgraph "User's Machine" |
35 | | - B[Browser] |
36 | | - end |
37 | | -
|
38 | | - subgraph "Docker Compose Network (lab-net)" |
39 | | - P[Traefik Proxy] |
40 | | -
|
41 | | - subgraph "Application Services" |
42 | | - WC[Web Client] |
43 | | - RS[Resource Server] |
44 | | - end |
45 | | - |
46 | | - subgraph "Identity Services" |
47 | | - KC[Keycloak] |
48 | | - DB[(PostgreSQL)] |
49 | | - end |
50 | | -
|
51 | | - subgraph "Observability Stack" |
52 | | - A[Alloy Agent] |
53 | | - L[Loki] |
54 | | - T[Tempo] |
55 | | - Prom[Prometheus] |
56 | | - G[Grafana] |
57 | | - end |
58 | | - end |
59 | | -
|
60 | | - %% User and Service Flows (Unchanged) |
61 | | - B -- "User Interaction" --> P --> WC; |
62 | | - WC -- "Backend API Call" --> RS; |
63 | | - RS -- "Token Validation" --> KC; |
64 | | -
|
65 | | - %% NEW: Observability Data Flow |
66 | | - subgraph "Telemetry Collection" |
67 | | - RS -- "1a. Emits Traces (OTLP)" --> A; |
68 | | - WC -- "1b. Emits Traces (OTLP)" --> A; |
69 | | - RS -- "2a. Writes Logs (stdout)" --> Docker; |
70 | | - WC -- "2b. Writes Logs (stdout)" --> Docker; |
71 | | - Docker -- "3. Scraped by" --> A; |
72 | | - end |
73 | | - |
74 | | - subgraph "Telemetry Processing & Storage" |
75 | | - A -- "4a. Forwards Traces" --> T; |
76 | | - A -- "4b. Forwards Logs" --> L; |
77 | | - RS -- "5. Exposes Metrics" --> Prom; |
78 | | - WC -- "6. Exposes Metrics" --> Prom; |
79 | | - end |
80 | | -
|
81 | | - subgraph "Visualization" |
82 | | - G -- "Queries Traces" --> T; |
83 | | - G -- "Queries Logs" --> L; |
84 | | - G -- "Queries Metrics" --> Prom; |
85 | | - end |
86 | | -``` |
87 | | - |
88 | | -1. **[Grafana Loki](config/loki/loki-config.yml):** The new log storage backend. It is configured to run in a simple, single-tenant mode and stores its data in a persistent Docker volume. |
89 | | -2. **[Grafana Alloy](config/alloy/alloy-config.river):** The new heart of our collection pipeline. It performs two critical functions: |
90 | | - * **Log Collection:** It connects to the Docker socket to discover our running application containers, scrapes their `stdout` log streams, and forwards them to Loki. |
91 | | - * **Trace Collection:** It acts as an OTLP endpoint, receiving traces from our applications' Java agents and forwarding them to Tempo. |
| 11 | +| Phase | Description & Key Concepts | Code & Docs (at tag) | Key Pull Requests | |
| 12 | +|:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 13 | +| **1. The Secure Monolith** | A standalone service that issues and validates its own JWTs. Concepts: `AuthenticationManager`, custom `JwtAuthenticationFilter`, `jjwt` library, and a foundational CI pipeline. | [`v1.0-secure-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v1.0-secure-monolith) | [#2](https://github.com/apenlor/spring-boot-security-observability-lab/pull/2), [#3](https://github.com/apenlor/spring-boot-security-observability-lab/pull/3), [#4](https://github.com/apenlor/spring-boot-security-observability-lab/pull/4) | |
| 14 | +| **2. Observing the Monolith** | The service is containerized and orchestrated via `docker-compose`. Concepts: Micrometer, Prometheus, Grafana, custom metrics, and automated dashboard provisioning. | [`v2.0-observable-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v2.0-observable-monolith) | [#6](https://github.com/apenlor/spring-boot-security-observability-lab/pull/6) | |
| 15 | +| **3. Evolving to Federated Identity** | The system is refactored into a multi-service architecture with an external IdP. Concepts: Keycloak, OIDC, OAuth2 Client (`web-client`) vs. Resource Server, Traefik reverse proxy, service-to-service security. | [`v3.0-federated-identity`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v3.0-federated-identity) | [#8](https://github.com/apenlor/spring-boot-security-observability-lab/pull/8) | |
| 16 | +| **4. Tracing a Distributed System** | Services are instrumented with the OpenTelemetry agent to generate traces. Concepts: Tempo, agent-based instrumentation, W3C Trace Context, Service Graphs, and a hybrid PUSH/PULL metrics architecture. | [`v4.0-distributed-tracing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v4.0-distributed-tracing) | [#10](https://github.com/apenlor/spring-boot-security-observability-lab/pull/10) | |
| 17 | +| **5. Correlated Logs & Access Auditing** | The three pillars of observability are complete (metrics, traces, logs). Alloy is the unified collection agent. Concepts: Loki, Grafana Alloy, Docker service discovery, structured JSON logs, AOP-based auditing, trace-to-log correlation, and detailed audit metrics. | [`v5.0-correlated-logs-auditing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v5.0-correlated-logs-auditing) | [#12](https://github.com/apenlor/spring-boot-security-observability-lab/pull/12) | |
| 18 | +| **6. Proactive Alerting** | _Upcoming..._ | - | - | |
| 19 | +| **7. Continuous Security Integration** | _Upcoming..._ | - | - | |
| 20 | +| **8. Advanced Secret Management** | _Upcoming..._ | - | - | |
92 | 21 |
|
93 | 22 | --- |
94 | 23 |
|
95 | | -### Key Configuration Details |
| 24 | +## How to Follow This Lab |
96 | 25 |
|
97 | | -#### 1. Grafana Alloy & Docker Service Discovery |
98 | | - |
99 | | -To achieve a fully automated logging pipeline, the `alloy` service in our `docker-compose.yml` mounts the host's Docker socket (`/var/run/docker.sock`) in read-only mode. |
100 | | - |
101 | | -This is a privileged operation, and the decision to use it is a deliberate architectural trade-off, as documented in the `docker-compose.yml`'s security disclaimer. |
102 | | -* **Benefit:** Alloy can query the Docker API to automatically discover every container on our project's network. It gets rich metadata like the `container_name` for free, which it uses to create labels in Loki. This means we can add new services, and our logging pipeline will **automatically start collecting their logs** with zero configuration changes. |
103 | | -* **Mitigation:** The risk is managed by using the official, minimalist Grafana Alloy image and mounting the socket as **read-only**. Even so, this setup is not recommended for production environments. |
104 | | - |
105 | | -The [Alloy configuration](config/alloy/alloy-config.river) is written in the River (`.river`) language and defines a clear pipeline: discover Docker containers, filter them by our project's network, relabel them with a clean `container_name`, and forward their logs to Loki. |
106 | | - |
107 | | -#### 2. AOP-based Audit Logging |
108 | | - |
109 | | -To handle security auditing as a cross-cutting concern, we introduced a new, shared Maven module: [`lab-aspects`](lab-aspects). |
110 | | -* This module contains a custom `@Auditable` annotation and the `AuditLogAspect`. |
111 | | -* The aspect intercepts any method marked with `@Auditable` and performs two actions: |
112 | | - 1. **Logs a Structured Event:** It uses SLF4J's Fluent API to create a rich, nested JSON object containing detailed context about the event (principal, roles, outcome, duration, sanitized request details, and exception info). These are logged to a dedicated `AUDIT` logger. |
113 | | - 2. **Emits Metrics:** It records a `Counter` (`app_audit_events_total`) and a `Timer` (`app_audit_events_duration_seconds`) for every audit event. These metrics are tagged with low-cardinality labels (`method`, `outcome`), which keeps them cheap to query in dashboards and alerts. |
114 | | - |
115 | | -This implementation is fully tested with its own integration test suite, which validates every feature, including the metric emission and context handling. |
| 26 | +1. **Start with the `main` branch** to see the latest state of the project. |
| 27 | +2. To go back in time, use the **"Code & Docs" link** for a specific phase. This will show you the `README.md` for that phase, which contains the specific instructions and examples for that version of the code. |
| 28 | +3. To understand the *"why"* behind the changes, review the **Key Pull Requests** for each phase. |
116 | 29 |
|
117 | 30 | --- |
118 | 31 |
|
119 | | -## Local Development & Quick Start |
120 | | - |
121 | | -The prerequisites and setup are the same as in previous phases. |
122 | | - |
123 | | -1. **Configure Local Hostnames (One-Time Setup, if not already done):** |
124 | | - Edit your local `hosts` file to add: |
125 | | - ``` |
126 | | - 127.0.0.1 keycloak.local |
127 | | - ``` |
128 | | -2. **Create and Configure Your Environment File:** |
129 | | - ```bash |
130 | | - cp .env.example .env |
131 | | - # ...then edit .env to add your WEB_CLIENT_SECRET from Keycloak. |
132 | | - ``` |
133 | | -3. **Build and run the entire stack:** |
134 | | - ```bash |
135 | | - docker-compose up --build -d |
136 | | - ``` |
137 | | -4. **Access the Services:** |
138 | | - * **Web Client Application:** [http://localhost:8082](http://localhost:8082) (Login with `lab-user`/`lab-user` or |
139 | | - `lab-admin`/`lab-admin`) |
140 | | - * **Keycloak Admin Console:** [http://keycloak.local](http://keycloak.local) (Login with `admin`/`admin`) |
141 | | - * **Traefik Dashboard:** [http://localhost:8080](http://localhost:8080) |
142 | | - * **Prometheus UI:** [http://localhost:9090](http://localhost:9090) |
143 | | - * **Grafana UI:** [http://localhost:3000](http://localhost:3000) (Login with `admin`/`admin`) |
144 | | ---- |
145 | | -
|
146 | | -## Validating the New Observability Features |
147 | | -
|
148 | | -1. **Generate Traffic:** Log in to the web client as `lab-user`/`lab-user` and click the "Call Secure API" and "Call Admin API" buttons several times. |
149 | | -
|
150 | | -2. **Validate Audit Logs:** |
151 | | - * In Grafana, go to Explore -> Loki. |
152 | | - * Run the query: `{container_name="resource-server"} | json | logger_name="AUDIT"` |
153 | | - * Inspect the logs. You will see the structured `audit` object with `outcome="SUCCESS"` for successful calls and `outcome="FAILURE"` for the denied admin call. |
154 | | -
|
155 | | -3. **Validate Audit Metrics:** |
156 | | - * In Grafana, go to Explore -> Prometheus. |
157 | | - * Run the query: `rate(app_audit_events_total{outcome="FAILURE"}[1m])` |
158 | | - * You should see the rate of failed audit events for the `getAdminData` method. |
| 32 | +## Running the Project |
159 | 33 |
|
160 | | -4. **Validate Trace-to-Log Correlation:** |
161 | | - * Find a trace in Tempo for a `GET /fetch-data` operation. |
162 | | - * Click on the span for the `resource-server`. |
163 | | - * In the span details panel, a **blue "Logs for this span" button** will be visible. |
164 | | - * Clicking it will open Loki and show you the exact logs—including the audit log—for that specific trace. |
| 34 | +To run the application and see usage examples for the **current phase**, please refer to the detailed instructions in its tagged `README.md` file. |
165 | 35 |
|
166 | | -#### Stop the Environment |
| 36 | +**[>> Go to instructions for the current phase: `v5.0-correlated-logs-auditing` <<](https://github.com/apenlor/spring-boot-security-observability-lab/blob/v5.0-correlated-logs-auditing/docs/phase-5-readme.md#local-development--quick-start)** |
167 | 37 |
|
168 | | -```bash |
169 | | -docker-compose down -v |
170 | | -``` |
| 38 | +As the lab progresses, this link will always be updated to point to the latest completed phase. |