```diff
@@ -1,122 +1,38 @@
 # Spring Boot Security & Observability Lab
 
-This repository is a hands-on lab designed to demonstrate the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
+This repository is an advanced, hands-on lab demonstrating the architectural evolution of a modern Java application. We will build a system from the ground up, starting with a secure monolith and progressively refactoring it into a fully observable, distributed system using cloud-native best practices.
 
 ---
 
-## Lab Progress: Phase 6 - Proactive Alerting with Alertmanager
+## Workshop Guide: The Evolutionary Phases
 
-The `main` branch currently represents the completed state of **Phase 6**.
+This lab is structured in distinct, self-contained phases. The `main` branch always represents the latest completed phase. To explore a previous phase's code and detailed documentation, use the links below.
 
-* **Git Tag for this Phase:** `v6.0-proactive-alerting`
-
-### Objective
-
-The goal of this phase was to transition our monitoring strategy from passive (dashboards) to **proactive**. We have integrated the Prometheus Alertmanager into our stack to create a system that can automatically detect and route notifications about problems, without requiring a human to be watching a screen. This demonstrates the completion of a production-grade monitoring feedback loop.
-
-### Key Concepts Demonstrated
-
-* **Prometheus Alerting Pipeline:** Understanding the distinct roles of Prometheus (which evaluates rules and generates alerts) and Alertmanager (which receives, de-duplicates, groups, and routes alerts).
-* **Declarative Alerting Rules:** Defining alerting conditions as code using PromQL expressions in a version-controlled YAML file.
-* **Alerting on Technical & Security Metrics:** Creating two distinct types of alerts:
-  1. A **technical alert** (`ApiServerErrorRateHigh`) that fires on infrastructure-level signals like a spike in 5xx server errors.
-  2. A **security alert** (`UnauthorizedAdminAccessSpike`) that fires on application-level signals, such as an abnormal rate of `4xx` errors on a privileged endpoint.
-* **Alert Lifecycle:** Observing the full lifecycle of an alert: `Inactive` -> `Pending` -> `Firing` -> `Resolved`.
-* **UI-Driven Test Harness:** Building a dedicated "Alerting Test Panel" in our web application to reliably trigger alert conditions on demand, proving the entire pipeline works end-to-end.
-
-### Architecture Overview
-
-Phase 6 introduces Alertmanager and connects it to our existing Prometheus instance. The data flow for alerting is now a core part of our observability stack.
-
-```mermaid
-graph TD
-    subgraph "Application Services"
-        RS[Resource Server]
-        WC[Web Client]
-    end
-
-    subgraph "Observability Stack"
-        Prom[Prometheus] -->|1. Scrapes Metrics| RS
-        Prom -->|1. Scrapes Metrics| WC
-
-        subgraph "Alerting Pipeline"
-            Rules[alerts.yml] -->|2. Evaluates| Prom
-            Prom -->|3. Sends Firing Alerts| AM[Alertmanager]
-        end
-
-        G[Grafana]
-    end
-
-    subgraph "Operators / External Systems"
-        AM -->|4. Routes Notifications| Notif[Email, Slack, etc.]
-        Ops[Operator] -->|Views & Manages Alerts| AM
-        Ops -->|Views Dashboards| G
-    end
-```
-
-1. **[Prometheus](config/prometheus/prometheus.yml):** Its role is expanded. It is now configured to load a [rule file](config/prometheus/alerts.yml) and to send any alerts that become "Firing" to the Alertmanager service. The `--web.external-url` flag is set to ensure backlinks are generated with a browser-resolvable hostname.
-2. **[Alertmanager](config/alertmanager/alertmanager.yml):** The new central hub for all alerts. It receives alerts from Prometheus, groups them to reduce noise, and would (in a production setup) route them to configured receivers. For this lab, we use a "null" receiver.
+| Phase | Description & Key Concepts | Code & Docs (at tag) | Key Pull Requests |
+|:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **1. The Secure Monolith** | A standalone service that issues and validates its own JWTs. Concepts: `AuthenticationManager`, custom `JwtAuthenticationFilter`, `jjwt` library, and a foundational CI pipeline. | [`v1.0-secure-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v1.0-secure-monolith) | [#2](https://github.com/apenlor/spring-boot-security-observability-lab/pull/2), [#3](https://github.com/apenlor/spring-boot-security-observability-lab/pull/3), [#4](https://github.com/apenlor/spring-boot-security-observability-lab/pull/4) |
+| **2. Observing the Monolith** | The service is containerized and orchestrated via `docker-compose`. Concepts: Micrometer, Prometheus, Grafana, custom metrics, and automated dashboard provisioning. | [`v2.0-observable-monolith`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v2.0-observable-monolith) | [#6](https://github.com/apenlor/spring-boot-security-observability-lab/pull/6) |
+| **3. Evolving to Federated Identity** | The system is refactored into a multi-service architecture with an external IdP. Concepts: Keycloak, OIDC, OAuth2 Client (`web-client`) vs. Resource Server, Traefik reverse proxy, service-to-service security. | [`v3.0-federated-identity`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v3.0-federated-identity) | [#8](https://github.com/apenlor/spring-boot-security-observability-lab/pull/8) |
+| **4. Tracing a Distributed System** | Services are instrumented with the OpenTelemetry agent to generate traces. Concepts: Tempo, agent-based instrumentation, W3C Trace Context, Service Graphs, and a hybrid PUSH/PULL metrics architecture. | [`v4.0-distributed-tracing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v4.0-distributed-tracing) | [#10](https://github.com/apenlor/spring-boot-security-observability-lab/pull/10) |
+| **5. Correlated Logs & Access Auditing** | The three pillars of observability are complete (metrics, traces, logs). Alloy is the unified collection agent. Concepts: Loki, Grafana Alloy, Docker service discovery, structured JSON logs, AOP-based auditing, trace-to-log correlation, and detailed audit metrics. | [`v5.0-correlated-logs-auditing`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v5.0-correlated-logs-auditing) | [#12](https://github.com/apenlor/spring-boot-security-observability-lab/pull/12) |
+| **6. Proactive Alerting** | The system transitions from passive to proactive monitoring. Concepts: Alertmanager, declarative PromQL alert rules, alerting on technical vs. security metrics, and a UI-driven test harness. | [`v6.0-proactive-alerting`](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v6.0-proactive-alerting) | [#14](https://github.com/apenlor/spring-boot-security-observability-lab/pull/14) |
+| **7. Continuous Security Integration** | _Upcoming..._ | - | - |
+| **8. Advanced Secret Management** | _Upcoming..._ | - | - |
 
 ---
 
-### Key Configuration Details
+## How to Follow This Lab
 
-#### 1. Prometheus Alert Rules
-
-The core of this phase is the [alerts.yml](config/prometheus/alerts.yml) file. We have defined two rules that are specifically tailored for our application and optimized for a lab environment with short `for` durations for rapid testing.
-
-* **`ApiServerErrorRateHigh`:** This rule fires when the rate of `5xx` status codes from the `resource-server` exceeds 0 for a continuous period. It is designed to be triggered by our `ChaosController`.
-* **`UnauthorizedAdminAccessSpike`:** This security-focused rule fires when the rate of `4xx` status codes on the specific `/api/secure/admin` endpoint exceeds 0. This is more robust than checking for just `403` as it captures any client-side error on this privileged endpoint, signaling a potential issue.
-
-#### 2. UI-Driven Test Harness
-
-To validate the entire alerting pipeline, we implemented a dedicated "Alerting Test Panel" in the `web-client`.
-* The `ChaosController` in the `resource-server` was enhanced with a guaranteed-failure endpoint (`/api/chaos/error`).
-* The `WebController` in the `web-client` was updated with two new `POST` endpoints that call the backend to generate `5xx` and `4xx` errors.
-
----
-
-## Local Development & Quick Start
-
-The prerequisites and setup are the same as in previous phases.
-
-1. **Configure Local Hostnames (One-Time Setup, if not already done):**
-   Edit your local `hosts` file to add:
-   ```
-   127.0.0.1 keycloak.local
-   ```
-2. **Create and Configure Your Environment File:**
-   ```bash
-   cp .env.example .env
-   # ...then edit .env to add your WEB_CLIENT_SECRET from Keycloak.
-   ```
-3. **Build and run the entire stack:**
-   ```bash
-   docker-compose up --build -d
-   ```
-4. **Access the Services:**
-   * **Web Client Application:** [http://localhost:8082](http://localhost:8082) (Login with `lab-user`/`lab-user` or `lab-admin`/`lab-admin`)
-   * **Keycloak Admin Console:** [http://keycloak.local](http://keycloak.local) (Login with `admin`/`admin`)
-   * **Prometheus UI:** [http://localhost:9090](http://localhost:9090)
-   * **Alertmanager UI:** [http://localhost:9093](http://localhost:9093)
-   * **Grafana UI:** [http://localhost:3000](http://localhost:3000)
+1. **Start with the `main` branch** to see the latest state of the project.
+2. To go back in time, use the **"Code & Docs" link** for a specific phase. This will show you the `README.md` for that phase, which contains the specific instructions and examples for that version of the code.
+3. To understand the *"why"* behind the changes, review the **Key Pull Requests** for each phase.
 
 ---
 
-## Validating the New Alerting Features
-
-1. **Confirm Rules are Loaded:**
-   * Navigate to the Prometheus UI's "Alerts" tab ([http://localhost:9090/alerts](http://localhost:9090/alerts)).
-   * Verify that both new alerts are present and in the green "Inactive" state.
+## Running the Project
 
-2. **Trigger the Alerts via the UI:**
-   * Log in to the Web Client as **`lab-user` / `lab-user`**.
-   * In the "Alerting Test Panel", repeatedly click the buttons to generate `403` and `5xx` errors.
-   * Watch the Prometheus Alerts UI. The alerts will transition from `Inactive` to `Pending` (yellow) and then to `Firing` (red).
-   * Once firing, the alerts will appear in the Alertmanager UI.
+To run the application and see usage examples for the **current phase**, please refer to the detailed instructions in its tagged `README.md` file.
 
-#### Stop the Environment
+**[>> Go to instructions for the current phase: `v6.0-proactive-alerting` <<](https://github.com/apenlor/spring-boot-security-observability-lab/tree/v6.0-proactive-alerting?tab=readme-ov-file#local-development--quick-start)**
 
-```bash
-docker-compose down -v
-```
+As the lab progresses, this link will always be updated to point to the latest completed phase.
```
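The removed Phase 6 text describes an alert lifecycle of `Inactive` -> `Pending` -> `Firing` -> `Resolved`, gated by the short `for` durations in `alerts.yml`. As a rough mental model only, the Java sketch below replays that lifecycle for a single rule. It is not Prometheus's implementation: the `AlertLifecycle` class, the 30-second `for` duration and the 15-second evaluation interval are hypothetical values chosen for illustration, and the boolean condition stands in for the result of a PromQL expression such as the `5xx` rate check.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical, simplified model of one alert rule's lifecycle:
 * Inactive -> Pending -> Firing -> Resolved.
 * This is NOT how Prometheus is implemented; it only mimics how a `for` duration
 * decides when a pending alert starts firing.
 */
public class AlertLifecycle {

    public enum State { INACTIVE, PENDING, FIRING, RESOLVED }

    private final Duration forDuration; // plays the role of the rule's `for:` clause
    private State state = State.INACTIVE;
    private Instant activeSince;         // first evaluation at which the condition held

    public AlertLifecycle(Duration forDuration) {
        this.forDuration = forDuration;
    }

    /** One rule evaluation; conditionHolds stands in for the PromQL expression's result. */
    public State evaluate(boolean conditionHolds, Instant now) {
        if (conditionHolds) {
            if (activeSince == null) {
                activeSince = now; // condition just started holding: begin the `for` window
            }
            boolean heldLongEnough = !now.isBefore(activeSince.plus(forDuration));
            state = heldLongEnough ? State.FIRING : State.PENDING;
        } else {
            // A firing alert that clears is reported as resolved;
            // a pending (or already resolved) alert simply returns to inactive.
            state = (state == State.FIRING) ? State.RESOLVED : State.INACTIVE;
            activeSince = null;
        }
        return state;
    }

    public static void main(String[] args) {
        AlertLifecycle alert = new AlertLifecycle(Duration.ofSeconds(30));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        boolean[] condition = {false, true, true, true, false}; // one value per 15 s evaluation
        for (int i = 0; i < condition.length; i++) {
            Instant now = t0.plusSeconds(15L * i);
            System.out.println("t=" + (15 * i) + "s -> " + alert.evaluate(condition[i], now));
        }
    }
}
```

Running `main` prints `INACTIVE`, `PENDING`, `PENDING`, `FIRING`, `RESOLVED` for the five evaluations. The design point the model illustrates is that a pending alert whose condition clears before the `for` window elapses drops straight back to inactive; since only firing alerts are sent on to Alertmanager, short-lived blips never produce a notification.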
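The removed test-harness notes describe a `ChaosController` endpoint (`/api/chaos/error`) that is guaranteed to fail, plus a `web-client` that calls the backend to generate `5xx` errors. The lab implements this with Spring; the sketch below is a hypothetical, framework-free stand-in that uses only the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` to show the same pattern: an always-failing endpoint and a client that produces a predictable burst of server errors for an error-rate rule to catch. The `ChaosHarness` class and its method names are illustrative assumptions, and nothing here exports metrics to Prometheus.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Hypothetical, framework-free stand-in for the lab's chaos endpoint and test panel.
 * Not the project's Spring ChaosController/WebController.
 */
public class ChaosHarness {

    /** Starts a loopback server on a free port whose /api/chaos/error always answers 500. */
    public static HttpServer startChaosServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/api/chaos/error", exchange -> {
            exchange.sendResponseHeaders(500, -1); // -1 means "no response body"
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends `times` GET requests to `target` and counts the responses with a 5xx status. */
    public static int triggerServerErrors(URI target, int times) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1, matching the JDK server
                .build();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        int serverErrors = 0;
        for (int i = 0; i < times; i++) {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            int status = response.statusCode();
            if (status >= 500 && status < 600) {
                serverErrors++;
            }
        }
        return serverErrors;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startChaosServer();
        try {
            URI target = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/api/chaos/error");
            System.out.println("5xx responses: " + triggerServerErrors(target, 5));
        } finally {
            server.stop(0); // release the port immediately
        }
    }
}
```

Running `main` calls the failing endpoint five times and reports how many of those calls came back with a `5xx` status. Binding to port `0` lets the operating system choose a free port, so the sketch does not clash with the lab's own services on ports such as `8082` or `9090`.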