
**Docker build and deployment improvements** #22

Merged

zlovtnik merged 7 commits into main from dev on Nov 25, 2025

Conversation

@zlovtnik (Owner) commented Nov 23, 2025

This pull request introduces significant improvements to the project's Docker build and deployment workflow. The main changes are a new multi-stage Docker build that separates the backend and frontend builds, support for build-time configuration via arguments, and more secure, flexible SSL certificate handling. The documentation has also been updated to reflect these new capabilities and provide clear usage instructions.

Docker build and deployment improvements:

  • Refactored Dockerfile to use multi-stage builds for backend (eclipse-temurin:21-jdk-alpine) and frontend (node:20-alpine), improving build efficiency and separation of concerns.
  • Added build-time arguments FRONTEND_DIST_PATH and CERT_CN to allow customization of frontend output location and SSL certificate Common Name, with validation steps to prevent misconfiguration.
  • Integrated nginx into the final runtime image, with automatic generation of self-signed SSL certificates for local development and validation of nginx configuration at build time.
  • Enhanced file permissions and ownership setup for static assets and application files, improving security and compatibility with container best practices.

Documentation updates:

  • Updated README.md to document new Docker build arguments, usage examples, production warnings for SSL certificates, and Docker Compose integration, making deployment steps clearer and safer for developers.

Build context optimization:

  • Added a comprehensive .dockerignore file to exclude unnecessary files and directories from the Docker build context, reducing image size and build times.

Summary by CodeRabbit

  • New Features

    • Full Analytics area (Dashboard, Performance, Usage, Reports) with charts, CSV/JSON exports and lazy loading.
    • Backend observability: slow-query detection, recommendation persistence, index-effectiveness analysis, and latency analysis.
  • Infrastructure

    • Multi-stage Docker build, HTTPS/nginx startup script, non-root runtime, expanded ports, Logstash & Kibana additions and ELK/ILM templates.
  • Database

    • New migrations and tables for persisted queries, audits, user settings; data-consistency fixes and schema updates.
  • Bug Fixes

    • CORS/JWT preflight handling, startup robustness, account deactivation constraint fixes.
  • Documentation

    • README expanded for build-time options; several internal docs consolidated or removed.


…tage Dockerfile

Refactor Dockerfile to use multi-stage builds for backend and frontend, add Nginx for serving frontend with SSL support, expose ports 80/443/8443, and update start script. Expand observability roadmap section. Enhance BatchMetricsRecorder with cache update for batch operation consistency.
…and docs

Enhance Dockerfile with ARG FRONTEND_DIST_PATH for customizable frontend build output, add build-time validation to fail fast on missing assets, introduce CERT_CN for self-signed certs, validate nginx config, and update comments for better context. Refine SQL query normalization for consistent whitespace handling to improve matching accuracy. Update README with Docker build arguments guide for deployment flexibility across environments.

vercel bot commented Nov 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| spring-graphql | Ready | Preview | Comment | Nov 25, 2025 0:08am |


coderabbitai bot commented Nov 23, 2025

Walkthrough

Adds a full observability and analytics stack: multi-stage Docker build (frontend + backend), nginx/start scripts with certs, frontend Analytics module and pages/services, backend index/slow-query analyzers and persistence, many DB migrations/PL/SQL updates, tracing/latency tooling, logging/logstash, and large docs and build updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docker & runtime**<br>`.dockerignore`, `Dockerfile`, `docker-compose.yml`, `docker-start.sh`, `nginx.conf` | Multi-stage build split (backend/frontend), `FRONTEND_DIST_PATH` & `CERT_CN` args, self-signed certs, nginx SPA routing and caching, expose 80/443/8443, improved healthchecks, graceful start/stop script, Logstash/Kibana services added. |
| **Frontend analytics & UI**<br>`frontend/src/app/app.routes.ts`, `frontend/src/app/app.component.html`, `frontend/src/app/features/analytics/*`, `frontend/src/styles.css` | New AnalyticsModule, lazy routes and safe loaders, AnalyticsService + data types, AnalyticsComponent and four page components (dashboard, performance, usage, reports), menu reorganized, large style/theme additions. |
| **Backend analyzers & services**<br>`src/main/java/com/rcs/ssf/metrics/*`, `src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java` | Added IndexEffectivenessAnalyzer (scheduled), SlowQueryAnalyzer (+ ExecutionPlan), RecommendationPersistenceService, QueryNormalizationUtils, unit tests, metrics export and persistence into audit tables. |
| **Tracing & latency**<br>`src/main/java/com/rcs/ssf/tracing/*`, `src/main/java/com/rcs/ssf/SsfApplication.java`, `src/main/resources/application*.yml` | Added LatencyAnalyzer, new OtelConfig beans, enabled @EnableScheduling, CORS/MinIO/observability config updates and slow-query settings. |
| **Dynamic CRUD & service changes**<br>`src/main/java/com/rcs/ssf/dynamic/DynamicCrudGateway.java`, `src/main/java/com/rcs/ssf/service/DynamicCrudService.java` | Use toPlsqlLiteral() for operations, expand ALLOWED_TABLES, refactor getColumnMetadata to batch-fetch data_default to avoid LONG issues. |
| **Build & tooling**<br>`build.gradle`, `README.md`, `.dockerignore` | Modernized Gradle (Java 21 toolchain, plugins, BOMs), Flyway env validation, added dependencies, README docs for build-time args and examples. |
| **Database migrations & PL/SQL**<br>`db/migration/*`, `sql/packages/*` | New migrations (V601/V601-related audit tables), V700 PL/SQL recompile with RETURNING INTO support, V017 NOT NULL/FK change, spec constants updated, dynamic CRUD package spec/body changes. |
| **Schema, seeds & orchestration**<br>`sql/tables/*`, `sql/indexes/users_indexes.sql`, `sql/seed/*`, `sql/grants/data_dictionary_privileges.sql`, `sql/master.sql`, `setup-oracle-schema.sh`, `sql/README.md` | New tables (persisted_queries, audits, user_preferences, api_keys, account_deactivation_audit), extended users (avatar_key, account_status, account_deactivated_at) with data-correction block, seeds, grants, expanded master orchestration and setup script. |
| **PL/SQL package body changes**<br>`sql/packages/dynamic_crud_pkg_spec.sql`, `sql/packages/dynamic_crud_pkg_body.sql` | Normalize operation constants (CREATE → INSERT, READ → SELECT), add PK detection, support RETURNING INTO via DBMS_SQL for ID retrieval, refactor insert/update/delete flows. |
| **Logging, ELK & Logstash**<br>`logstash/*`, `src/main/resources/logback-spring.xml`, `monitoring/elasticsearch/*`, `kibana/*` | Add Logstash pipelines, certs, Dockerfile, Logstash appender and MDC fields in logback, ILM policy, templates, Kibana dashboards and setup scripts. |
| **Docs cleanup & removals**<br>`README.md`, `ROADMAP_OBSERVABILITY.md`, removed docs: `JAVA_21_UPGRADE.md`, `CODERABBIT_*`, `PROJECT_PLAN.md`, `EXPLAIN_PLAN_ANALYSIS.sql`, `OPTIMIZED_QUERY.sql`, etc. | README expanded for Docker args and examples; observability roadmap rewritten; many legacy/internal docs removed. |
| **Minor backend edits & tests**<br>`src/main/java/.../BatchMetricsRecorder.java`, `JwtAuthenticationFilter.java`, `TraceIdFilter.java`, `AuthMutation.java` | BatchMetrics API changed (snapshot-based, recordSuccess signature updated), JWT filter skip for OPTIONS, MDC/tracing additions, minor formatting/log edits. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant Analyzer as IndexEffectivenessAnalyzer
    participant DB as OracleDB
    participant Audit as audit_index_analysis
    participant Metrics as Prometheus

    Scheduler->>Analyzer: analyzeIndexEffectiveness()
    Analyzer->>DB: query index usage (V$SEGMENT_STATISTICS / V$OBJECT_USAGE)
    DB-->>Analyzer: usage rows
    Analyzer->>Analyzer: calculateUtilityScore() / determineAction()
    Analyzer->>Audit: INSERT analysis row
    Audit-->>Analyzer: persisted
    Analyzer->>Metrics: export gauges/counters
```

```mermaid
sequenceDiagram
    participant App as SpringApp
    participant QAnalyzer as QueryPlanAnalyzer
    participant Slow as SlowQueryAnalyzer
    participant DB as OracleDB
    participant Persist as RecommendationPersistenceService
    participant Audit as audit_query_recommendations
    participant Metrics as Prometheus

    App->>QAnalyzer: processQuery(sql)
    QAnalyzer->>Slow: analyzeQuery(sql, execTime)
    alt execTime > threshold
        Slow->>DB: extract execution plan (V$SQL_PLAN / v$sql)
        DB-->>Slow: plan data
        Slow->>Slow: analyzePlanForBottlenecks()
        alt recommendations enabled
            Slow->>Persist: persistRecommendation(...)
            Persist->>Audit: INSERT recommendation
            Audit-->>Persist: persisted
            Slow->>Metrics: increment recommendation counters
        end
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Areas needing extra attention:

  • IndexEffectivenessAnalyzer, SlowQueryAnalyzer, RecommendationPersistenceService (SQL correctness, resource/metric safety).
  • PL/SQL package/spec changes and new DB migrations/triggers (RETURNING INTO, cross-field constraints).
  • BatchMetricsRecorder API change — verify all callers updated.
  • Dockerfile/start script/nginx permissions and certificate generation.
  • Frontend Analytics module wiring (lazy-loading vs module/component providers) and large style additions.

Poem

🐇 I hopped through logs by lantern-light,
I chased slow plans into the night,
I seeded tables, made certs bright,
I routed SPA paths and charts in sight,
A little rabbit cheers each traced insight!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 61.29%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Docker build and deployment improvements' clearly summarizes the main objective of the changeset, which focuses on Docker build process enhancements and deployment workflow refactoring. |

@coderabbitai bot left a comment

Actionable comments posted: 12

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sql/tables/users.sql (1)

5-17: Add CHECK constraints to V500 migration to prevent data integrity divergence

Verification confirms that a Flyway migration does exist (V500__user_settings_profile.sql) that adds the new columns (avatar_key, account_status, account_deactivated_at) to existing databases. However, the migration is incomplete: it omits the two CHECK constraints that enforce account status validity and deactivation logic.

Result: New databases (via sql/tables/users.sql) will have constraints; existing databases updated via V500 will not. This creates a critical data integrity gap.

Add the following to db/migration/V500__user_settings_profile.sql after the column additions:

```sql
BEGIN
    EXECUTE IMMEDIATE 'ALTER TABLE users ADD CONSTRAINT chk_account_status_values CHECK (account_status IN (''ACTIVE'', ''INACTIVE'', ''DEACTIVATED''))';
EXCEPTION
    WHEN OTHERS THEN
        IF SQLCODE = -2261 THEN
            NULL; -- Constraint already exists
        ELSE
            RAISE;
        END IF;
END;
/

BEGIN
    EXECUTE IMMEDIATE 'ALTER TABLE users ADD CONSTRAINT chk_deactivated_timestamp CHECK ((account_status = ''DEACTIVATED'' AND account_deactivated_at IS NOT NULL) OR (account_status IN (''ACTIVE'', ''INACTIVE'') AND account_deactivated_at IS NULL))';
EXCEPTION
    WHEN OTHERS THEN
        IF SQLCODE = -2261 THEN
            NULL; -- Constraint already exists
        ELSE
            RAISE;
        END IF;
END;
/
```
🟡 Minor comments (12)
build.gradle-62-68 (1)

62-68: Add a BOM consistency comment for reactor-test.

Spring Boot 3.5.7 manages reactor-test through the Project Reactor BOM 2025.0.0, which also manages reactor-netty. Since Reactor Netty is part of the Project Reactor BOM, grouping artifacts meant to work well together with relevant versions, there are no version conflicts.

For consistency with the existing reactor-netty comment, add a similar note to line 127 documenting that reactor-test is BOM-managed:

```groovy
implementation 'io.projectreactor:reactor-test' // BOM-managed via reactor-bom 2025.0.0
```
src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java-213-228 (1)

213-228: Update Javadoc or implement missing layer handlers for "database" and "cache"

The mismatch between documented examples and actual implementation is confirmed. The Javadoc mentions "database" and "cache" as valid layer names, but the switch statement only handles http, graphql, service, dao, and jdbc. Passing "database" or "cache" would silently discard measurements.

Either align the Javadoc to document only the supported layers, or add proper mappings (e.g., "database" → dao/jdbc, "cache" → a dedicated field) and corresponding fields to LayerLatencyStats.

src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java-67-68 (1)

67-68: Fix SLF4J placeholder usage so debug logs actually show throughput and memory usage

The debug messages currently use format specifiers such as {:.2f} / {:.1f}, which SLF4J does not interpret as placeholders, so the numeric arguments are effectively ignored or mis-bound:

  • Line 67: "throughput={:.2f}/sec" ignores the throughput argument; the literal {:.2f} is logged.
  • Line 94: same issue as above.
  • Line 144: "Recorded memory pressure for operation '{}': {:.1f}%" logs the literal {:.1f} and drops usagePercent.

Consider simplifying to plain {} placeholders so the values are actually logged:

```diff
-            log.debug("Recorded batch metrics for operation '{}': processed={}, failed={}, throughput={:.2f}/sec",
-                operationName, processedItems, failedItems, throughput);
+            log.debug("Recorded batch metrics for operation '{}': processed={}, failed={}, throughput={}/sec",
+                operationName, processedItems, failedItems, throughput);

-            log.debug("Recorded individual operation metrics for '{}': processed={}, failed={}, throughput={:.2f}/sec",
-                operationName, processedItems, failedItems, throughput);
+            log.debug("Recorded individual operation metrics for '{}': processed={}, failed={}, throughput={}/sec",
+                operationName, processedItems, failedItems, throughput);

-            log.debug("Recorded memory pressure for operation '{}': {:.1f}%", operationName, usagePercent);
+            log.debug("Recorded memory pressure for operation '{}': {}%", operationName, usagePercent);
```

If you still want fixed decimal precision, you can wrap the numeric arguments with String.format, but the main fix is to revert to valid {} placeholders.

Also applies to: 94-95, 144-145

frontend/src/app/features/analytics/services/analytics.service.ts-74-81 (1)

74-81: Use environment-based config for API base URL, consistent with rest of codebase

baseUrl is hard-coded to https://localhost:8443, which breaks in other environments. The codebase already defines apiUrl in frontend/src/environments/environment.prod.ts (supporting runtime override via API_URL env var), and other services like DynamicFormService already follow this pattern via environment.apiUrl. Align AnalyticsService with the existing approach by injecting or importing environment and using environment.apiUrl instead of a hard-coded string.

frontend/src/app/features/analytics/services/analytics.service.ts-85-89 (1)

85-89: Update the cache TTL implementation and remove redundant error handling

The review is accurate. shareReplay(1) at lines 85-89 (and also at 102-105) caches indefinitely without a windowTime parameter, contradicting the "30 seconds" comment. Additionally, the catchError at line 88 duplicates the error handling already present in fetchDashboard() at line 193, making it redundant.

Fix by either:

  1. Updating the comment to reflect indefinite caching, or
  2. Implementing the intended 30-second TTL: shareReplay({ bufferSize: 1, windowTime: 30000, refCount: true }) and removing the outer catchError since fetchDashboard() already handles errors.
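To make the intended 30-second TTL semantics concrete, here is a framework-free TypeScript sketch of a single-value cache with an expiry window, mirroring what `shareReplay({ bufferSize: 1, windowTime: 30000 })` provides. The `TtlCache` name and the injectable clock are illustrative, not part of the codebase:

```typescript
// Illustrative single-value TTL cache; `TtlCache` and the injectable clock
// are assumptions for the sketch, not names from the PR.
class TtlCache<T> {
  private value: T | undefined = undefined;
  private storedAt = 0;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(produce: () => T): T {
    const t = this.now();
    // Re-produce the value once the TTL window has elapsed, mirroring
    // shareReplay({ bufferSize: 1, windowTime: ttlMs }).
    if (this.value === undefined || t - this.storedAt >= this.ttlMs) {
      this.value = produce();
      this.storedAt = t;
    }
    return this.value;
  }
}

// With a fake clock: a second call within 30s hits the cache,
// while a call after 30s triggers a refetch.
let clock = 0;
let fetches = 0;
const cache = new TtlCache<number>(30_000, () => clock);
cache.get(() => ++fetches); // fetch #1
clock = 10_000;
cache.get(() => ++fetches); // within the window: still fetch #1
clock = 40_000;
cache.get(() => ++fetches); // TTL expired: fetch #2
```

The same two-state decision (fresh vs. expired) is what `windowTime` adds on top of plain `shareReplay(1)`, which only ever has the "fresh" state.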
frontend/src/app/features/analytics/pages/performance.component.ts-149-156 (1)

149-156: Narrow status parameter type for stronger type-safety

PerformanceMetric.status is 'healthy' | 'warning' | 'critical', but getStatusColor(status: string) accepts any string. Tighten this to the union type to catch typos at compile time:

```diff
-  getStatusColor(status: string): string {
+  getStatusColor(status: PerformanceMetric['status']): string {
```

This keeps the implementation the same but makes template usage safer.
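For illustration, a self-contained version of the narrowed signature; the interface shape and the hex colors below are placeholders, not the component's actual definitions:

```typescript
// Placeholder metric shape and palette, for illustration only.
interface PerformanceMetric {
  name: string;
  status: 'healthy' | 'warning' | 'critical';
}

// The indexed-access type keeps the parameter in sync with the interface:
// if the union changes, this signature follows automatically.
function getStatusColor(status: PerformanceMetric['status']): string {
  switch (status) {
    case 'healthy': return '#52c41a';
    case 'warning': return '#faad14';
    case 'critical': return '#f5222d';
  }
}

// getStatusColor('helthy') would now fail to compile instead of silently
// hitting a fallback branch at runtime.
const color = getStatusColor('warning');
```

Because the switch is exhaustive over the union, TypeScript also flags any newly added status value that lacks a branch.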

docker-start.sh-11-56 (1)

11-56: Fix exit code capture in cleanup function.

The static analysis warning is valid. Line 12 captures $? after the function has already started executing, which means it captures the exit code of the previous command in the function (likely the echo on line 14), not the exit code of the process that triggered cleanup.

The exit code should be captured at the point where cleanup is called, not inside the cleanup function itself. However, since this is used in a trap, the current approach won't work correctly.

Recommended fix:
Remove the exit code capture from cleanup and rely on the exit code passed to cleanup when called explicitly:

```diff
 cleanup() {
-    local exit_code=$?
-    
     echo "Received termination signal, cleaning up..."
     
     # ... shutdown logic ...
     
     echo "Cleanup complete, exiting with code $exit_code"
-    exit $exit_code
+    exit ${1:-0}
 }
```

Then call cleanup with explicit exit codes:

```diff
-    cleanup
+    cleanup $JAVA_EXIT_CODE
```

For trap-based invocations, the exit code will default to 0, which is appropriate for graceful shutdown signals.

As per static analysis hints.

Committable suggestion skipped: line range outside the PR's diff.

src/main/resources/application.yml-126-126 (1)

126-126: Update README.md and HELP.md to align with MinIO URL default change.

The application now defaults to http://minio-server:9000, but README.md and HELP.md still document http://localhost:9000 in their environment setup examples. This creates contradictory guidance for developers.

Update both files:

  • README.md: Change export MINIO_URL=http://localhost:9000 to export MINIO_URL=http://minio-server:9000 (in Quick Start section)
  • HELP.md: Change export MINIO_URL=http://localhost:9000 to export MINIO_URL=http://minio-server:9000 (in Environment Checklist and environment template sections)

Alternatively, if localhost development is still expected, revert the application.yml default to http://localhost:9000 and update docker-compose to override via environment variable. Either way, documentation must match the code's actual default behavior.

ROADMAP_OBSERVABILITY.md-178-239 (1)

178-239: Add language specifiers to fenced code blocks.

Several fenced code blocks are missing language specifiers, which reduces readability in some Markdown renderers and affects accessibility.

As per coding guidelines from markdownlint, apply these changes:

For ASCII diagrams (line 178):

````diff
-    ```
+    ```text
     ┌──────────────────────────────────────────────────────────────┐
````

For pseudocode blocks (lines 245, 291, 413):

````diff
-    ```
+    ```text
         1. Query Oracle DBA_INDEX_USAGE for each user index:
````

Or if the pseudocode represents SQL or a specific language, use that language identifier instead.

Also applies to: 245-347, 291-357, 413-437

sql/master.sql-92-109 (1)

92-109: Handle potential LISTAGG overflow in validation block.

The validation block uses LISTAGG to concatenate invalid object names (line 96), but the target variable is limited to 4000 bytes. If many objects are invalid, this will raise ORA-01489.

Apply this diff to handle overflow gracefully:

```diff
 DECLARE
     v_invalid_count NUMBER := 0;
-    v_invalid_list VARCHAR2(4000) := '';
+    v_invalid_list CLOB;
 BEGIN
-    SELECT COUNT(*), LISTAGG(object_type || ' ' || object_name, ', ') 
+    SELECT COUNT(*), LISTAGG(object_type || ' ' || object_name, ', ') WITHIN GROUP (ORDER BY object_type, object_name)
     INTO v_invalid_count, v_invalid_list
     FROM user_objects
     WHERE status = 'INVALID';
     
     IF v_invalid_count > 0 THEN
         DBMS_OUTPUT.PUT_LINE('WARNING: Found ' || v_invalid_count || ' invalid objects:');
-        DBMS_OUTPUT.PUT_LINE(v_invalid_list);
+        DBMS_OUTPUT.PUT_LINE(SUBSTR(v_invalid_list, 1, 32000));
         RAISE_APPLICATION_ERROR(-20999, 'Schema setup failed due to invalid objects');
     ELSE
         DBMS_OUTPUT.PUT_LINE('SUCCESS: All objects compiled successfully!');
     END IF;
 END;
```

Alternatively, use a cursor to print each invalid object separately.

src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java-155-203 (1)

155-203: Inconsistent null vs. empty return values.

Line 158 returns null when sqlId is null, but line 202 returns an empty ExecutionPlan on exceptions. This inconsistency requires callers to handle both null checks and empty object checks.

Consider standardizing on one approach:

```diff
 private ExecutionPlan extractExecutionPlan(String sql) {
     String sqlId = getSqlId(sql);
     if (sqlId == null) {
-        return null;
+        return new ExecutionPlan("", null, null);
     }
```
src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java-425-437 (1)

425-437: Generic hint uses placeholder table names.

Line 431 returns a hint with generic placeholders "table1, table2" that won't work directly. Users must manually identify and substitute the correct table names.

Consider either:

  1. Extracting actual table names from the query/plan
  2. Adding a note in the rationale that table names need to be identified:
```diff
 result.put("rationale",
     "Nested loop join detected with potentially large inner table. " +
-            "Consider HASH JOIN hint or ensure join columns are indexed.");
+            "Consider HASH JOIN hint (substitute actual table names) or ensure join columns are indexed.");
```
🧹 Nitpick comments (35)
src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java (3)

12-54: Docs mention behaviors/metrics that aren't implemented or are only stubbed

The class-level Javadoc promises:

  • span collection from OpenTelemetry,
  • tracing.latency_attribution metrics,
  • real attribution-based analysis,

while the implementation currently:

  • only pre-registers tracing.latency_by_layer timers for a fixed resolver list,
  • never updates those timers or emits a tracing.latency_attribution metric,
  • relies solely on latencyStats populated externally.

This is fine as a scaffold, but the docs should either be tightened to describe the current behavior, or the missing pieces (span collection + attribution metric emission) implemented, to avoid confusion for future maintainers and operators.

Also applies to: 95-113


217-240: Eviction strategy is arbitrary; consider making intent explicit

Two minor points in recordLayerLatency:

  1. Per-resolver aggregation:

    • Each call overwrites the layer latency (stats.httpLatencyMs = latencyMs) rather than accumulating.
    • Combined with stats.totalLatencyMs = ..., what you store is effectively the latest observation per resolver, not a windowed aggregate.
    • If that’s intentional (e.g., “last seen latency per resolver”), a short comment would help clarify and avoid future “why not sum?” confusion.
  2. Eviction policy:

    • if (latencyStats.size() > MAX_TRACKED_SPANS) then latencyStats.remove(latencyStats.keySet().iterator().next()) removes an arbitrary resolver (CHM is not ordered).
    • This is OK as a simple cap, but it’s not truly “oldest”. If you care about which entries are dropped (e.g., least recently updated), consider tracking timestamps or using an LRU-like strategy.

Neither is a blocker, but documenting the intended semantics here will make the behavior much clearer to future readers.


71-93: Scheduled analysis logic is reasonable, but consider one-time metric registration

The scheduled analyzeLatencyByLayer + detectAndAlertBottlenecks flow is coherent, and the bottleneck alert counter looks good.

One small improvement: initializeMetricsForLayers() is invoked on every schedule tick, but in Micrometer registering the same Timer (same name+tags) is idempotent; you only need to do this once (e.g., constructor or @PostConstruct). Rebuilding them every 5 minutes just adds overhead.

If you keep this as a stub until you hook in actual span collection, a short comment noting that could prevent someone from trying to optimize the wrong part later.

Also applies to: 170-201

src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java (1)

90-93: Good that individual operations now feed the same aggregated metrics; consider symmetry for throughput gauges

Routing individual-operation stats through the same BatchMetrics instance as batch completions ensures totals, error rate, and averages reflect all traffic, not just batched runs. If you want observability parity with recordBatchCompletion, you might also expose a dedicated throughput gauge for this path (e.g., a separate metric name or tag for “individual” vs “batch”) to keep the external metrics story symmetrical.

frontend/src/app/features/analytics/analytics.component.ts (1)

35-40: Remove redundant manual navigation; route config already handles the redirect

The route configuration correctly handles /analytics/analytics/dashboard via the child empty route with pathMatch: 'full' and redirectTo: 'dashboard'. The ngOnInit manual navigation at lines 35-40 is redundant and can be safely removed. This simplifies the component and eliminates the risk of duplicate navigation events.

frontend/src/app/features/analytics/pages/dashboard.component.ts (1)

286-299: Error handling pattern misalignment between service and component

The error handler in fetchDashboardData's tap({ error }) won't execute because AnalyticsService.getDashboard() and refreshDashboard() both use catchError(() => of(getDefaultDashboard())) at the service level, converting errors into successful responses.

If you want the UI to surface real HTTP failures, consider either removing the catchError from the service (and handling defaults in the component), or exposing an explicit error signal from the service instead of silently returning defaults. This aligns error handling with Angular best practices: centralize transport-level concerns in services, keep UI-specific error reactions in components.

Regarding the echarts [options] with null: ngx-echarts is designed to accept null for its [options] input and handles it safely without rendering when null, so no guard is needed.

frontend/src/app/features/analytics/pages/reports.component.ts (4)

202-217: Consider using finalize to centralize isLoading reset

loadReports correctly guards the subscription with takeUntil, but isLoading is manually reset in both success and error handlers. Using RxJS finalize would reduce duplication and make it harder to forget a branch if this grows:

```typescript
this.analyticsService.getReports()
  .pipe(
    takeUntil(this.destroy$),
    finalize(() => { this.isLoading = false; })
  )
  .subscribe({
    next: (reports) => { this.reports = reports; },
    error: (err) => {
      console.error('Failed to load reports:', err);
      this.message.error(`Failed to load reports${err?.message ? ': ' + err.message : '. Please try again.'}`);
    }
  });
```

219-247: mimeType is computed but unused; either wire it in or drop it

In downloadReport, mimeType is computed via getMimeType but never used. Either:

  • Remove the variable and helper if the server-provided content-type is sufficient, or
  • Use it when constructing a Blob (if you ever wrap the response yourself) or for setting headers/analytics so the mapping actually has an effect.

Also, getFileExtension just lowercases the format; if only a fixed set of formats is expected (csv, json, pdf), you could validate/normalize against that enum to avoid surprising filenames from unexpected values.

Also applies to: 249-261
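The validate-against-a-fixed-set idea above can be sketched as a standalone TypeScript helper; the allowed set, the `normalizeFormat` name, and the `'csv'` fallback are assumptions for illustration:

```typescript
// Hypothetical helper: normalize a user/report-supplied format string
// against a closed set instead of blindly lowercasing it.
const ALLOWED_FORMATS = ['csv', 'json', 'pdf'] as const;
type ReportFormat = (typeof ALLOWED_FORMATS)[number];

function normalizeFormat(format: string): ReportFormat {
  const lowered = format.trim().toLowerCase();
  // Unknown values fall back to a safe default rather than producing
  // a surprising file extension like "report.xlsx " from bad input.
  return (ALLOWED_FORMATS as readonly string[]).includes(lowered)
    ? (lowered as ReportFormat)
    : 'csv';
}

const ext1 = normalizeFormat('CSV');  // 'csv'
const ext2 = normalizeFormat('xlsx'); // falls back to 'csv'
```

Returning the narrowed `ReportFormat` type also lets downstream code (MIME lookup, extension mapping) switch exhaustively over the same union.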


275-291: Revisit export methods’ sync/async behavior and isExporting semantics

generateDashboardReport, generatePerformanceReport, and exportAllAsCSV wrap service calls in try/catch/finally and toggle isExporting, but this only handles synchronous failures. If analyticsService.exportDashboardAsCSV() / exportPerformanceMetricsAsCSV() are (or later become) HttpClient-based observables or promises, errors won’t be caught here and isExporting will flip back to false immediately, not when the work finishes.

Consider:

  • Having these service methods return an Observable<void>/Observable<Blob> and handling them like your other async flows (pipe(finalize(...)).subscribe({...})), or
  • Making them clearly synchronous “local-only” helpers and possibly dropping isExporting around them if it doesn’t reflect a real in-flight operation.

This will keep the UI state and error handling aligned with the actual behavior.

Also applies to: 293-309, 357-373


311-340: Tighten types for usage CSV builder and consider basic CSV escaping

generateUsageReport + buildUsageMetricsCSV work, but a couple of small improvements would help:

  • buildUsageMetricsCSV(metrics: any[]) can use the existing UsageMetric type for stronger guarantees:
```typescript
private buildUsageMetricsCSV(metrics: UsageMetric[]): string {
  // ...
}
```
  • You already quote featureName, but any embedded quotes (") or newlines would still break the CSV. If these names can come from user-defined strings, consider minimal escaping (e.g., double quotes inside values).

Everything else in this flow (use of takeUntil, Blob creation, and download helper reuse) looks good.

Also applies to: 342-355
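The minimal escaping suggested above can be sketched as a standalone helper (RFC 4180 style: quote any value containing a comma, quote, or newline, and double embedded quotes); the `escapeCsvValue` name is hypothetical:

```typescript
// Hypothetical helper sketching RFC 4180-style CSV escaping.
function escapeCsvValue(value: string): string {
  // Only quote when needed; inside quotes, double any embedded quotes.
  if (/[",\n\r]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

const row = ['Search', 'used "quick find", often'].map(escapeCsvValue).join(',');
// row === 'Search,"used ""quick find"", often"'
```

Applying this to every cell (not just `featureName`) keeps the output safe even if other columns later become free-form strings.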

frontend/src/app/features/analytics/pages/usage.component.ts (3)

38-67: Verify percentage scale for nz-progress and consider showing the numeric value

[nzPercent]="metric.percentage" assumes percentage is already 0–100 as expected by nz-progress. If the service ever returns a 0–1 fraction, the bar will appear almost empty. It might be worth asserting/normalizing in the service or here.

Optionally, you could also render the numeric value (e.g. {{ metric.percentage | number:'1.0-2' }}%) alongside the bar for more precise visibility.
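The normalization suggested above could look like the following standalone sketch; note the "values at or below 1 are fractions" heuristic is an assumption for illustration and would misread a genuine 1% value, so normalizing at the service boundary with a known scale is preferable:

```typescript
// Illustrative guard: accept either a 0-1 fraction or a 0-100 value and
// clamp the result into the range nz-progress expects. The <=1 heuristic
// is an assumption; a true 1% input would be scaled to 100 here.
function normalizePercent(value: number): number {
  const scaled = value > 0 && value <= 1 ? value * 100 : value;
  return Math.min(100, Math.max(0, scaled));
}

const a = normalizePercent(0.75); // 75
const b = normalizePercent(87);   // 87
const c = normalizePercent(250);  // clamped to 100
```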


71-77: Guard ECharts rendering when usageChartOptions is null/empty

updateChartOptions explicitly sets usageChartOptions to null when there are no metrics, but the template always renders the <div echarts [options]="usageChartOptions">. Depending on ngx-echarts/ECharts behavior, passing null here may be tolerated or could throw at runtime.

To be safe and avoid mounting a chart with no data, consider:

<div class="chart-container" *ngIf="usageChartOptions">
  <div echarts [options]="usageChartOptions" class="chart"></div>
</div>

That keeps the chart out of the DOM when there’s nothing meaningful to show and avoids any surprises from null options.

Also applies to: 130-134, 172-204


140-164: Refresh pipeline with switchMap/takeUntil looks good; consider user feedback on errors

The refresh$ + switchMap + takeUntil(destroy$) setup is clean and avoids race conditions between rapid refreshes. One minor gap is that load failures only log to the console; unlike the reports component, the user gets no on-screen indication.

If you want parity with the reports page, you could inject NzMessageService here as well and surface a simple “Failed to load usage metrics” toast in the error handler.

Also applies to: 166-170
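The handler shape could be as small as this sketch, where the `Toaster` interface stands in for NzMessageService (which exposes an `error(...)` method) and the message text is illustrative:

```typescript
// Hedged sketch: surface load failures to the user instead of only logging them.
// `Toaster` is a minimal stand-in for NzMessageService; names here are hypothetical.
interface Toaster {
  error(text: string): void;
}

function onUsageLoadError(toaster: Toaster, err: unknown): void {
  console.error('Failed to load usage metrics', err);
  toaster.error('Failed to load usage metrics'); // user-visible parity with the reports page
}
```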

.dockerignore (1)

1-31: .dockerignore patterns look good; be aware of excluded docs and compose files

The ignore list is sane for keeping the build context lean (build artifacts, node_modules, logs, env files, IDE configs). Just note that docs/, *.md, and docker-compose.yml will never be available in the build context, so if you ever want to COPY documentation or compose samples into the image, you’ll need to relax these patterns.

src/main/java/com/rcs/ssf/service/DynamicCrudService.java (2)

47-72: Review expanded ALLOWED_TABLES whitelist for security and coverage

Expanding ALLOWED_TABLES to include users, roles, MFA, API key, and multiple audit tables significantly broadens what the dynamic CRUD UI can introspect/mutate. That’s fine as long as it’s intentional, but it’s worth double‑checking:

  • Confirm that all of these tables are expected to be operator‑visible via dynamic CRUD, especially users, api_keys, and MFA tables, even though sensitive columns are name‑filtered.
  • The SQL docs mention an audit_http_compression table; if you intend dynamic CRUD coverage for all audit tables, consider adding it here as well.

For example:

-            "audit_graphql_execution_plans",
-            "audit_circuit_breaker_events",
+            "audit_graphql_execution_plans",
+            "audit_circuit_breaker_events",
+            "audit_http_compression",

387-495: data_default batching logic looks correct; only minor refactor opportunities

The split between the main metadata query (no data_default) and a separate user_tab_columns batch to populate defaults via safeReadColumnDefault is sound and should avoid the LONG/ORA‑00932 issues while eliminating the N+1 pattern. The defensive catch on DataAccessException with a fallback to null defaults is also reasonable.

If you want to tighten this further (optional):

  • Make defaultsByColumn unmodifiable (Map.copyOf(defaultsMap)) before using it in the stream, to prevent accidental mutation later.
  • Consider normalizing the map key (e.g., col.getName().toUpperCase(Locale.ROOT)) on both insertion and lookup if you ever introduce quoted/mixed‑case column names, to avoid silent misses.

Functionally, though, this implementation is fine as‑is.

sql/README.md (1)

44-123: Minor Markdown and wording nits (table spacing, code fence language, hyphenation)

To satisfy markdownlint and improve readability, you can make a few small tweaks:

  1. Add a blank line before the migrations table:

-### ✅ ALL Flyway Migrations Unified
-| Migration | File | Status |
+### ✅ ALL Flyway Migrations Unified
+
+| Migration | File | Status |

  2. Specify a language for the execution‑order fenced block (it’s mostly plain text):

-```
+```text
 Step 1: Sequences          (audit_seq, user_id_seq)
 ...
 Step 9: Summary            (print object counts and confirmation)

  3. Hyphenate “performance‑critical”:

-**Indexes (30+):**
-- All performance critical indexes included
+**Indexes (30+):**
+- All performance-critical indexes included

These are cosmetic only and won’t affect behavior.

docker-compose.yml (1)

183-185: Port exposure matches nginx+SSL flow; consider whether 8443 needs to stay public

Exposing 80/443 alongside 8443 makes sense with nginx terminating HTTP/HTTPS in the container, and the existing healthcheck still targeting 8443. If 8443 is now only an internal app port behind nginx, you might optionally drop the host binding for 8443 to reduce the exposed surface, keeping just 80/443 for normal access.

sql/seed/default_roles.sql (1)

15-35: Use a NULL‑safe comparison in the MERGE UPDATE to backfill missing descriptions

In the MERGE, the UPDATE branch currently runs only when r.description != src.description. If an existing role has description IS NULL, that predicate is UNKNOWN and the row won’t be updated, so the seeded description won’t be backfilled.

If you want to ensure descriptions are populated or refreshed whenever they differ (including NULL → non‑NULL), consider a NULL‑safe condition, e.g.:

 WHEN MATCHED THEN
     UPDATE SET
         description = src.description,
         updated_at = SYSTIMESTAMP
-    WHERE r.description != src.description;
+    WHERE (r.description IS NULL AND src.description IS NOT NULL)
+       OR (r.description IS NOT NULL AND r.description != src.description);

The rest of the script (idempotent role creation and guarded ROLE_ADMIN grant) looks solid.

nginx.conf (1)

135-158: Consider extracting security headers to reduce duplication.

Security headers are repeated in three location blocks (server level, static assets, and SPA routing). While this is necessary because add_header in a child block doesn't inherit from parent blocks, it creates maintenance overhead.

Consider using an include file or nginx map to centralize security headers:

Create /etc/nginx/snippets/security-headers.conf:

add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;

Then include it in each location block:

 location ~ ^/(assets|styles)/ {
     root /var/www/html/browser;
     expires 30d;
     add_header Cache-Control "public, immutable";
-    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
-    add_header X-Content-Type-Options "nosniff" always;
-    add_header X-Frame-Options "SAMEORIGIN" always;
-    add_header X-XSS-Protection "1; mode=block" always;
-    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
+    include /etc/nginx/snippets/security-headers.conf;
 }

Note: This requires updating the Dockerfile to copy the snippet file.

Dockerfile (1)

40-55: Consider documenting certificate rotation strategy.

The self-signed certificate has a 365-day validity. For long-running containers, this could cause expiration issues. The warning comments correctly note this is for local/dev only, but documentation on certificate rotation would be helpful.

Add a comment block or reference to documentation:

 # Generate self-signed SSL certificates for nginx with configurable CN
+# NOTE: Certificate expires in 365 days. For production:
+#   1. Mount CA-issued certificates via volume: -v /path/to/certs:/etc/nginx/certs
+#   2. Or use cert-manager/Let's Encrypt for automatic rotation
+# For long-running dev containers, recreate container annually or mount certs.
 RUN mkdir -p /etc/nginx/certs && \
     openssl req -x509 -newkey rsa:2048 -keyout /etc/nginx/certs/private-key.pem -out /etc/nginx/certs/certificate.pem -days 365 -nodes \
     -subj "/CN=${CERT_CN}/O=Organization/C=US"
sql/tables/05_persisted_queries.sql (1)

19-35: Consider adding CHECK constraint for status field.

The status column (line 30) has a default value of 'ACTIVE' but no CHECK constraint to enforce valid values. This could allow invalid status values to be inserted.

Apply this diff to add a CHECK constraint:

     usage_count NUMBER(19) DEFAULT 0 NOT NULL,
     client_name VARCHAR2(255),
-    status VARCHAR2(20) DEFAULT 'ACTIVE' NOT NULL,
+    status VARCHAR2(20) DEFAULT 'ACTIVE' NOT NULL CHECK (status IN ('ACTIVE', 'DEPRECATED', 'DISABLED')),
     created_by VARCHAR2(255),
     updated_by VARCHAR2(255)

Adjust the valid status values based on your application's requirements.

sql/packages/dynamic_crud_pkg_body.sql (2)

239-265: Primary-key helper only supports single-column PKs

get_primary_key_column intentionally returns only the first PK column; composite primary keys will not be handled correctly by the ID‑returning logic that depends on this helper. If you ever allow dynamic CRUD over tables with composite PKs, you’ll need either to:

  • Extend this helper to return all PK columns, and
  • Adjust the INSERT/RETURNING path to bind multiple columns.

Otherwise, it would be good to document clearly (in spec/comments) that dynamic_crud_pkg only supports single-column PKs on the allowed tables.


323-327: INSERT/ID generation path looks correct but treats IDENTITY differently than SEQUENCE

The new INSERT branch correctly:

  • Builds a quoted column/value list with bind variables.
  • Uses user_tab_identity_cols to detect identity columns.
  • Uses RETURNING "<pk_column>" INTO :ret_id with DBMS_SQL when there is no identity column and a PK exists, and falls back to a plain INSERT otherwise.

Two things to consider tightening up:

  1. Comment vs logic – The comment “Check if table has a SEQUENCE-based PK (not IDENTITY)” is misleading: the code checks for an IDENTITY column and treats the “no identity” case as “possibly sequence-based”. You may want to reword the comment to avoid confusion.

  2. Capturing IDs for IDENTITY PKs – For IDENTITY columns you currently just run a normal INSERT and never populate p_generated_id. Oracle also supports RETURNING identity_column INTO ..., so if the caller expects IDs for both SEQUENCE‑based and IDENTITY‑based PKs, you could simplify by using the same RETURNING path for both and dropping the identity check altogether.

If the current behavior (no generated ID for identity PKs) is intentional and all callers know this, keeping it is fine, but it is worth making explicit.

Also applies to: 373-430

db/migration/V601__index_effectiveness_audit_tables.sql (1)

24-51: Optional: constrain improvement percentage and rationale length if needed

The audit_query_recommendations table looks solid, with good completeness checks on approved/implemented. If you want to prevent out‑of‑range values later, consider:

  • Adding CHECK (estimated_improvement_percent BETWEEN 0 AND 100) if the value is meant to be a percentage.
  • Verifying that 2000 chars is enough for rationale given your expected usage; if not, switching to CLOB keeps future changes easier.

Not urgent, but worth deciding up front to avoid later migrations.

sql/tables/06_user_settings.sql (2)

19-55: Consider tightening boolean flags and time representation

The three new tables look coherent overall, but a few minor schema hardening points:

  • The notification_* columns and API key lifecycle flags (revoked_at, etc.) are modeled as NUMBER(1) / NUMBER(19) without checks. If these are truly boolean flags (0/1), consider CHECK (notification_emails IN (0,1))‑style constraints to prevent accidental non‑boolean values.

  • You’re storing created_at / updated_at as Unix epoch millis (NUMBER(19)) here, while users.created_at/updated_at are TIMESTAMP. That’s fine if intentional, but it’s worth confirming that your Java/JPA layer expects and correctly converts these numeric timestamps, and that mixed conventions won’t cause confusion.

These are non‑blocking, but tightening them now will make downstream logic and reporting simpler.


62-77: Using timestamp as a column name

account_deactivation_audit.timestamp is a valid column name, but it does shadow a common SQL type/keyword and may make queries slightly harder to read (e.g., WHERE timestamp > ...).

If you’re still flexible on naming, something like occurred_at or event_at tends to be clearer. If this table is already referenced in code, keeping it is fine; just be consistent in always qualifying with the table name to avoid ambiguity.

src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java (1)

34-81: Tests are not exercising IndexEffectivenessAnalyzer behavior

The current tests (testUtilityScoreCalculation, testActionDetermination, etc.) only perform inline arithmetic and string comparisons; they never invoke any methods on IndexEffectivenessAnalyzer. This gives no real coverage of the production code.

If you want meaningful tests here, consider:

  • Extracting calculateUtilityScore, determineAction, and generateRationale to package‑private methods (or a small helper class) so they can be invoked directly, or
  • Adding an integration‑style test that stubs JdbcTemplate.queryForList(...) and then calls analyzeIndexEffectiveness() to verify that:
    • At least one row is processed.
    • persistAnalysis is called (via verifying an insert or inspecting audit_index_analysis in a test DB).
    • Gauges/counters in MeterRegistry are updated as expected.

As it stands, these tests can be removed or refactored; they don’t protect the analyzer’s logic.

db/migration/V700__recompile_plsql_packages_fix_id_generation.sql (1)

389-417: Unused local variable v_has_identity

Inside the CREATE/INSERT branch’s nested DECLARE, v_has_identity is declared but never used. It’s harmless but can be removed to reduce noise:

DECLARE
-    v_has_identity NUMBER := 0;
     v_generated_id_value NUMBER;
     v_cursor INTEGER;
BEGIN
    ...
END;
sql/tables/users.sql (1)

30-203: Data consistency correction block is robust but quite heavy for a DDL script

The correction block does a thorough job:

  • Counts inconsistent rows,
  • Logs sample IDs via DBMS_OUTPUT,
  • Fixes both “DEACTIVATED with NULL timestamp” and “ACTIVE/INACTIVE with non‑NULL timestamp” cases,
  • Handles constraint and permission errors explicitly.

Two minor considerations:

  • The pre‑flight privilege check just verifies that the queries against user_sys_privs / user_tab_privs succeed; it doesn’t actually assert that you have UPDATE privileges (zero rows still logs “Update privileges verified”). That’s mostly cosmetic, since the UPDATEs themselves will fail with ORA‑1031 if privileges are missing.

  • Running this block in regular app migrations may be overkill; if it’s intended strictly for DBA‑run initialization scripts, a comment in the header clarifying that (and ensuring it’s not wired into Flyway) would help avoid surprises.

Functionally it looks sound; these are just maintainability clarifications.

src/main/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzer.java (1)

60-82: Index stats collection logic is robust; verify V$ view access assumptions

getIndexUsageStats() is well structured: it tries V$SEGMENT_STATISTICS, then V$OBJECT_USAGE, then falls back to USER_INDEXES with zero usage counts, all wrapped in defensive exception handling.

Two small points to be aware of:

  • Access to V$SEGMENT_STATISTICS and V$OBJECT_USAGE typically requires additional dictionary privileges. In many app schemas these queries will fail with ORA‑1031, and you’ll always hit the fallback. That’s fine given your logging, but you may want to document the privilege requirements (and test the fallback behavior) so operators know what to expect.

  • You hard‑code an 8KB block size (COALESCE(i.leaf_blocks, 1) * 8192) to estimate index size. If your environments use non‑8KB blocks, consider deriving block size from a parameter/table (v$parameter or tablespace metadata) or at least documenting the assumption.

Functionally, the current approach is safe; these are mostly operational gotchas.

Also applies to: 98-181

src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java (4)

234-246: Consider extracting LIKE wildcard escaping to a utility method.

Lines 236-238 manually escape LIKE wildcards in a specific order. This logic is fragile and duplicated whenever needed.

Extract to a reusable utility method:

private String escapeLikePattern(String input) {
    return input
        .replace("\\", "\\\\")  // Escape backslash first
        .replace("%", "\\%")    // Escape percent
        .replace("_", "\\_");   // Escape underscore
}

Then use:

-String escapedSql = searchSql
-    .replace("\\", "\\\\")
-    .replace("%", "\\%")
-    .replace("_", "\\_");
+String escapedSql = escapeLikePattern(searchSql);

298-343: Consider refactoring verbose validation logic.

The 45 lines of manual field extraction and type checking (lines 298-343) could be simplified for better maintainability.

Consider introducing a helper method or validation utility:

private static class RecommendationExtractor {
    static RecommendationFields extract(Map<String, Object> recommendation) {
        String type = getStringOrNull(recommendation, "type");
        String confidence = getStringOrNull(recommendation, "confidence");
        String rationale = getStringOrNull(recommendation, "rationale");
        
        if (type == null || confidence == null || rationale == null) {
            return null; // Invalid
        }
        
        Double improvement = getDoubleOrNull(recommendation, "improvement_percent");
        String indexDef = getStringOrNull(recommendation, "index_definition");
        String hint = getStringOrNull(recommendation, "hint_suggestion");
        
        return new RecommendationFields(type, confidence, rationale, 
                                       improvement, indexDef, hint);
    }
    
    private static String getStringOrNull(Map<String, Object> map, String key) {
        Object val = map.get(key);
        return val instanceof String ? (String) val : null;
    }
    
    private static Double getDoubleOrNull(Map<String, Object> map, String key) {
        Object val = map.get(key);
        return val instanceof Number ? ((Number) val).doubleValue() : null;
    }
}

389-420: Extract hardcoded improvement percentage to a constant.

Line 403 uses a magic number 80.0 for the estimated improvement percentage. This should be a named constant for clarity and maintainability.

+private static final double MISSING_INDEX_IMPROVEMENT_PERCENT = 80.0;
+private static final double INEFFICIENT_JOIN_IMPROVEMENT_PERCENT = 50.0;
+private static final double CORRELATED_SUBQUERY_IMPROVEMENT_PERCENT = 60.0;
+
 private Map<String, Object> analyzeFullTableScan(String sql, ExecutionPlan plan) {
     Map<String, Object> result = new HashMap<>();
     // ...
-    result.put("improvement_percent", 80.0);
+    result.put("improvement_percent", MISSING_INDEX_IMPROVEMENT_PERCENT);

Apply similar changes to lines 430 and 447 for consistency.


458-470: Fragile table name extraction from execution plan.

Line 464 assumes parts[3] contains the table name after splitting on whitespace. This is brittle and may fail if Oracle changes the plan format or if operations have different structures.

Consider more robust parsing with validation:

 private String extractTableFromPlan(String planText) {
     String[] lines = planText.split("\n");
     for (String line : lines) {
         if (line.contains("TABLE ACCESS FULL")) {
             String[] parts = line.trim().split("\\s+");
-            if (parts.length >= 4) {
-                return parts[3];
+            // Look for table name: expect pattern like "TABLE ACCESS FULL table_name"
+            for (int i = 0; i < parts.length - 1; i++) {
+                if ("FULL".equalsIgnoreCase(parts[i]) && i + 1 < parts.length) {
+                    String tableName = parts[i + 1];
+                    // Validate: table names typically don't contain parentheses or operators
+                    if (!tableName.contains("(") && !tableName.matches("\\d+\\.?")) {
+                        return tableName;
+                    }
+                }
             }
         }
     }
     return null;
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cefe86 and d4dadba.

📒 Files selected for processing (51)
  • .dockerignore (1 hunks)
  • Dockerfile (2 hunks)
  • JAVA_21_UPGRADE.md (0 hunks)
  • README.md (1 hunks)
  • ROADMAP_OBSERVABILITY.md (1 hunks)
  • build.gradle (1 hunks)
  • db/migration/V017__account_deactivation_audit_reason_code_not_null.sql (1 hunks)
  • db/migration/V500__user_settings_profile.sql (1 hunks)
  • db/migration/V601__index_effectiveness_audit_tables.sql (1 hunks)
  • db/migration/V700__recompile_plsql_packages_fix_id_generation.sql (1 hunks)
  • docker-compose.yml (1 hunks)
  • docker-start.sh (1 hunks)
  • frontend/src/app/app.routes.ts (1 hunks)
  • frontend/src/app/features/analytics/analytics.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/dashboard.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/performance.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/reports.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/usage.component.ts (1 hunks)
  • frontend/src/app/features/analytics/services/analytics.service.ts (1 hunks)
  • nginx.conf (4 hunks)
  • setup-oracle-schema.sh (1 hunks)
  • sql/README.md (1 hunks)
  • sql/grants/data_dictionary_privileges.sql (1 hunks)
  • sql/indexes/users_indexes.sql (1 hunks)
  • sql/master.sql (2 hunks)
  • sql/packages/dynamic_crud_pkg_body.sql (5 hunks)
  • sql/packages/dynamic_crud_pkg_spec.sql (1 hunks)
  • sql/seed/assign_rcs_admin_role.sql (1 hunks)
  • sql/seed/default_roles.sql (1 hunks)
  • sql/tables/05_persisted_queries.sql (1 hunks)
  • sql/tables/06_user_settings.sql (1 hunks)
  • sql/tables/07_audit_index_analysis.sql (1 hunks)
  • sql/tables/08_audit_query_recommendations.sql (1 hunks)
  • sql/tables/user_roles.sql (1 hunks)
  • sql/tables/users.sql (2 hunks)
  • src/main/java/com/rcs/ssf/SsfApplication.java (1 hunks)
  • src/main/java/com/rcs/ssf/dynamic/DynamicCrudGateway.java (3 hunks)
  • src/main/java/com/rcs/ssf/graphql/AuthMutation.java (1 hunks)
  • src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java (3 hunks)
  • src/main/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzer.java (1 hunks)
  • src/main/java/com/rcs/ssf/metrics/QueryNormalizationUtils.java (1 hunks)
  • src/main/java/com/rcs/ssf/metrics/QueryPlanAnalyzer.java (4 hunks)
  • src/main/java/com/rcs/ssf/metrics/RecommendationPersistenceService.java (1 hunks)
  • src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java (1 hunks)
  • src/main/java/com/rcs/ssf/security/JwtAuthenticationFilter.java (1 hunks)
  • src/main/java/com/rcs/ssf/service/DynamicCrudService.java (6 hunks)
  • src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java (1 hunks)
  • src/main/java/com/rcs/ssf/tracing/OtelConfig.java (1 hunks)
  • src/main/resources/application-dev.yml (1 hunks)
  • src/main/resources/application.yml (2 hunks)
  • src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java (1 hunks)
💤 Files with no reviewable changes (1)
  • JAVA_21_UPGRADE.md
🧰 Additional context used
🧬 Code graph analysis (5)
frontend/src/app/features/analytics/pages/dashboard.component.ts (1)
frontend/src/app/features/analytics/services/analytics.service.ts (2)
  • AnalyticsDashboard (29-39)
  • LoginAttemptTrendPoint (41-45)
src/main/java/com/rcs/ssf/metrics/QueryNormalizationUtils.java (1)
src/main/java/com/rcs/ssf/service/DynamicCrudService.java (1)
  • Slf4j (37-616)
frontend/src/app/features/analytics/pages/performance.component.ts (1)
frontend/src/app/features/analytics/services/analytics.service.ts (1)
  • PerformanceMetric (6-12)
frontend/src/app/features/analytics/pages/usage.component.ts (1)
frontend/src/app/features/analytics/services/analytics.service.ts (1)
  • UsageMetric (14-19)
frontend/src/app/app.routes.ts (1)
frontend/src/app/core/guards/auth.guard.ts (1)
  • authGuard (21-40)
🪛 GitHub Actions: CI
src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java

[error] 31-31: IndexEffectivenessAnalyzer constructor mismatch. Passed (MeterRegistry, DataSource, JdbcTemplate) but constructor expects (MeterRegistry, JdbcTemplate). Remove DataSource argument or adjust constructor.

build.gradle

[error] 1-1: Gradle build failed: see console output for details. Command: /home/runner/work/spring-graphql/spring-graphql/gradlew clean test jacocoTestReport

🪛 LanguageTool
sql/README.md

[grammar] ~123-~123: Use a hyphen to join words.
Context: ...t` Indexes (30+): - All performance critical indexes included **Packages (2...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
sql/README.md

45-45: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


83-83: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

ROADMAP_OBSERVABILITY.md

178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


245-245: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


291-291: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


413-413: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
docker-start.sh

[warning] 12-12: This $? refers to echo/printf, not a previous command. Assign to variable to avoid it being overwritten.

(SC2320)

- Add su-exec to Dockerfile for privilege dropping, enabling non-root execution of the Java application
- Update docker-start.sh to run Java as appuser with pre-flight validations for su-exec and appgroup
- Fix healthcheck regex in docker-compose.yml for more robust Elasticsearch cluster status parsing
- Add reactor-test dependency in build.gradle for testing Spring reactive components
- Correct v_operation normalization in PL/SQL package for proper audit logging in both success and error cases
- Add validation for non-empty execution plans before generating recommendations to prevent errors

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (13)
frontend/src/styles.css (1)

254-277: Consider documenting component-to-style mappings.

The usage comments are helpful for discovering available styles. To improve developer ergonomics, consider adding a comment block that maps these styles to specific analytics components (e.g., "Use .card-glass for metric summary cards in AnalyticsDashboard"). This would reduce guesswork when implementing UI features.

docker-start.sh (1)

1-153: Consider failure recovery and process monitoring.

The script orchestrates nginx and Java processes, but if nginx crashes unexpectedly (lines 84–103 only detect startup failure, not runtime crashes), the Java process continues running without a reverse proxy. Similarly, if the Java process crashes, nginx continues serving requests that cannot be routed.

For production deployments, consider:

  • Continuous monitoring of both processes with automatic restart or container exit on critical process failure
  • A supervisor tool (e.g., supervisord) or init system (e.g., s6, runit) to manage multiple services with automatic restarts
  • Health check endpoints on both services with periodic verification in the main loop

This is acceptable for development/testing containers, but a production image may benefit from more robust orchestration.

src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java (1)

82-113: Consider more precise mock matching and value assertions.

This test correctly invokes the analyzer, but could be more robust:

  1. Line 98: The mock uses (Object[]) any() which is permissive and might miss signature mismatches.
  2. Lines 110-112: Assertions check for non-empty metrics but don't verify tag values match the test data (e.g., INDEX_NAME = "IDX_USERS_EMAIL").
  3. No verification of the actual persisted values or error handling.

Consider these refinements:

-        when(jdbcTemplate.update(any(String.class), (Object[]) any())).thenReturn(1);
+        // Capture the actual arguments to verify persisted values
+        ArgumentCaptor<Object[]> argsCaptor = ArgumentCaptor.forClass(Object[].class);
+        when(jdbcTemplate.update(any(String.class), argsCaptor.capture())).thenReturn(1);
-        assertFalse(meterRegistry.get("oracle.index.utility_score").gauges().isEmpty());
+        // Verify metrics are tagged with the correct index name
+        assertNotNull(meterRegistry.get("oracle.index.utility_score")
+            .tag("index_name", "IDX_USERS_EMAIL")
+            .gauge());

Additionally, consider adding a test case for error handling (e.g., when jdbcTemplate.queryForList() throws an exception).

src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java (2)

397-431: Hardcoded improvement percentages should be documented as estimates.

The improvement_percent value of 80% (line 414) is a fixed estimate rather than a measured value. While this heuristic is reasonable for full table scans, users should understand these are approximations.

Consider adding a comment or documentation clarifying that improvement percentages are heuristic estimates, not guarantees.


469-481: Fragile string parsing for table name extraction.

The extractTableFromPlan method relies on a specific Oracle plan format ("TABLE ACCESS FULL table_name") with whitespace splitting. This could break if Oracle changes the plan text format in future versions.

Consider using a more robust pattern-based approach:

 private String extractTableFromPlan(String planText) {
-    // Simple extraction - look for TABLE ACCESS FULL followed by table name
     String[] lines = planText.split("\n");
+    Pattern tablePattern = Pattern.compile("TABLE ACCESS FULL\\s+(\\w+)", Pattern.CASE_INSENSITIVE);
     for (String line : lines) {
-        if (line.contains("TABLE ACCESS FULL")) {
-            String[] parts = line.trim().split("\\s+");
-            if (parts.length >= 4) {
-                return parts[3]; // TABLE ACCESS FULL table_name
-            }
+        Matcher matcher = tablePattern.matcher(line);
+        if (matcher.find()) {
+            return matcher.group(1);
         }
     }
     return null;
 }
build.gradle (4)

42-44: Consolidate split configuration blocks for clarity.

The configurations block is split into two locations (lines 42-44 for Flyway, and lines 197-201 for Tomcat exclusion). Consolidate these into a single block early in the build script to improve readability and maintainability.

- configurations {
-     create('flyway')
- }
+ configurations {
+     create('flyway')
+     all {
+         exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
+     }
+ }

- configurations {
-     all {
-         exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
-     }
- }

Also applies to: 197-201


160-196: Simplify and harden the Flyway task graph validation logic.

Line 167 uses graph.allTasks.any { it.name.toLowerCase().startsWith('flyway') } to detect Flyway tasks, which relies on Gradle's internal task naming convention. This approach is fragile and could break if Flyway changes its task naming. Consider using Gradle's task type matching instead for robustness.

Additionally, the validation occurs within the flyway configuration block, meaning credentials are only checked when the flyway block is evaluated (not necessarily when other tasks run). Consider moving this to a separate task or using a different hook if early validation is critical.

- gradle.taskGraph.whenReady { graph ->
-     if (graph.allTasks.any { it.name.toLowerCase().startsWith('flyway') }) {
+ gradle.taskGraph.whenReady { graph ->
+     if (graph.allTasks.any { it instanceof org.flywaydb.gradle.task.FlywayTask }) {

71-76: Centralize resilience4j dependency versions in dependency-management block.

Lines 71-76 manually pin all resilience4j modules to 2.1.0. For consistency with the BOM-driven approach used for OpenTelemetry and Spring Boot managed dependencies, consider centralizing resilience4j versions in the dependencyManagement block rather than repeating the version across all modules.

Add to the dependencyManagement block (after line 31):

  dependencyManagement {
      imports {
          mavenBom "io.opentelemetry:opentelemetry-bom:1.40.0"
          mavenBom "io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom:2.6.0"
      }
      dependencies {
          dependency 'io.opentelemetry.semconv:opentelemetry-semconv:1.37.0'
+         // Resilience4j version centralization
+         dependency 'io.github.resilience4j:resilience4j-spring-boot3:2.1.0'
+         dependency 'io.github.resilience4j:resilience4j-circuitbreaker:2.1.0'
+         dependency 'io.github.resilience4j:resilience4j-retry:2.1.0'
+         dependency 'io.github.resilience4j:resilience4j-bulkhead:2.1.0'
+         dependency 'io.github.resilience4j:resilience4j-timelimiter:2.1.0'
+         dependency 'io.github.resilience4j:resilience4j-micrometer:2.1.0'
      }
  }

Then update lines 71-76 to remove versions:

  // Resilience & Observability
- implementation 'io.github.resilience4j:resilience4j-spring-boot3:2.1.0'
- implementation 'io.github.resilience4j:resilience4j-circuitbreaker:2.1.0'
- implementation 'io.github.resilience4j:resilience4j-retry:2.1.0'
- implementation 'io.github.resilience4j:resilience4j-bulkhead:2.1.0'
- implementation 'io.github.resilience4j:resilience4j-timelimiter:2.1.0'
- implementation 'io.github.resilience4j:resilience4j-micrometer:2.1.0'
+ implementation 'io.github.resilience4j:resilience4j-spring-boot3'
+ implementation 'io.github.resilience4j:resilience4j-circuitbreaker'
+ implementation 'io.github.resilience4j:resilience4j-retry'
+ implementation 'io.github.resilience4j:resilience4j-bulkhead'
+ implementation 'io.github.resilience4j:resilience4j-timelimiter'
+ implementation 'io.github.resilience4j:resilience4j-micrometer'

199-199: Add explanatory comment for Tomcat exclusion.

Line 199 excludes spring-boot-starter-tomcat without context. Given that line 54 explicitly adds spring-boot-starter-jetty, the exclusion is intentional but unclear to future readers. Add a brief comment explaining the rationale.

  configurations {
      all {
-         exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
+         // Use Jetty (line 54) as the embedded servlet container instead of the default Tomcat
+         exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
      }
  }
frontend/src/app/features/analytics/analytics.module.ts (1)

1-15: AnalyticsModule wiring is fine; some imports may be redundant

Given that AnalyticsComponent and the child pages are standalone and routing is handled via AnalyticsRoutingModule, CommonModule and importing AnalyticsComponent here are likely unnecessary unless you plan to use them in non-standalone declarations later. You could simplify to only import AnalyticsRoutingModule and keep AnalyticsService in providers, but the current setup is functionally correct.

frontend/src/app/features/analytics/pages/dashboard.component.ts (1)

272-368: Data-loading flow is robust; consider a simple loading state for UX

The init/refresh pipeline (loadData → fetchDashboardData with takeUntil, state updates, and guarded catchError that preserves the last good dashboard while clearing the chart) is solid. As an optional improvement, you might add an isLoading flag to disable the Refresh button and/or show a spinner while a request is in flight to avoid overlapping refresh calls and give clearer feedback on slow responses.

src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java (1)

30-38: Thread-safety design with immutable snapshots and ConcurrentHashMap.compute looks solid

The combination of:

  • Immutable LayerLatencyStats with final fields and a constructor-computed totalLatencyMs, and
  • Atomic updates via latencyStats.compute(resolver, (key, old) -> old.withLayerLatency(...))

addresses the earlier data-race concerns: readers in logLatencyAttributions/detectAndAlertBottlenecks now observe consistent per-resolver snapshots without torn long reads.

One minor doc nit: the Javadoc on recordLayerLatency still mentions a “compareAndSet pattern”, but the implementation now relies on ConcurrentHashMap.compute(...) for atomic read‑modify‑write. It might be worth rephrasing that comment to match the current approach.

Also applies to: 216-224, 269-323
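The immutable-value-plus-compute pattern described above can be sketched in isolation. The class and method names here are simplified stand-ins, not the project's actual LatencyAnalyzer API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ComputeDemo {
    // Immutable per-key stats; compute() swaps the whole value atomically,
    // so readers never observe a half-updated (torn) stats object.
    record LayerLatencyStats(long dbMs, long serviceMs) {
        LayerLatencyStats withDb(long ms) { return new LayerLatencyStats(dbMs + ms, serviceMs); }
        long totalMs() { return dbMs + serviceMs; }
    }

    public static void main(String[] args) {
        Map<String, LayerLatencyStats> stats = new ConcurrentHashMap<>();
        // Atomic read-modify-write per resolver key.
        stats.compute("getUser", (k, old) ->
            (old == null ? new LayerLatencyStats(0, 0) : old).withDb(12));
        stats.compute("getUser", (k, old) ->
            (old == null ? new LayerLatencyStats(0, 0) : old).withDb(8));
        System.out.println(stats.get("getUser").totalMs()); // 20
    }
}
```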

src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java (1)

189-251: Excellent thread-safe refactoring with atomic snapshots.

The refactoring from mutable fields to AtomicReference<MetricsSnapshot> significantly improves thread safety and consistency:

  1. Eliminates TOCTOU races: All getters (getErrorRate, getAverageThroughput, etc.) now read from a consistent snapshot, preventing time-of-check to time-of-use issues where multiple fields could be read at different points in time.

  2. Lock-free atomic updates: The CAS loop in recordSuccess (lines 197-210) correctly implements lock-free updates, avoiding synchronization overhead.

  3. On-demand derived metrics: getAverageThroughput() and getErrorRate() now compute from the atomic snapshot, ensuring they're always consistent with the underlying data.

  4. Proper zero-division protection: Lines 226, 238-240, and 246-248 correctly handle edge cases.

Minor consideration: Under very high contention, the CAS loop could spin and create multiple short-lived MetricsSnapshot objects. For metrics recording, this is an acceptable tradeoff for the benefits of lock-free operation.
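The snapshot-plus-CAS pattern praised above can be sketched in a self-contained form. The fields and method names below are simplified stand-ins for the actual BatchMetricsRecorder internals:

```java
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotCasDemo {
    // Immutable snapshot: a reader always sees a mutually consistent pair of fields.
    record MetricsSnapshot(long totalRecords, long totalErrors) {}

    private static final AtomicReference<MetricsSnapshot> STATE =
        new AtomicReference<>(new MetricsSnapshot(0, 0));

    static void recordSuccess(long records) {
        MetricsSnapshot prev, next;
        do { // CAS retry loop: recompute from the latest snapshot until the swap wins
            prev = STATE.get();
            next = new MetricsSnapshot(prev.totalRecords() + records, prev.totalErrors());
        } while (!STATE.compareAndSet(prev, next));
    }

    static double errorRate() {
        MetricsSnapshot s = STATE.get(); // single consistent read, no TOCTOU window
        return s.totalRecords() == 0 ? 0.0 : (double) s.totalErrors() / s.totalRecords();
    }

    public static void main(String[] args) {
        recordSuccess(10);
        recordSuccess(5);
        System.out.println(STATE.get().totalRecords()); // 15
        System.out.println(errorRate()); // 0.0
    }
}
```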

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4dadba and bad2328.

📒 Files selected for processing (22)
  • Dockerfile (2 hunks)
  • build.gradle (1 hunks)
  • db/migration/V700__recompile_plsql_packages_fix_id_generation.sql (1 hunks)
  • docker-compose.yml (2 hunks)
  • docker-start.sh (1 hunks)
  • frontend/src/app/app.component.html (1 hunks)
  • frontend/src/app/app.routes.ts (1 hunks)
  • frontend/src/app/features/analytics/analytics-routing.module.ts (1 hunks)
  • frontend/src/app/features/analytics/analytics.component.ts (1 hunks)
  • frontend/src/app/features/analytics/analytics.module.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/dashboard.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/reports.component.ts (1 hunks)
  • frontend/src/app/features/analytics/pages/usage.component.ts (1 hunks)
  • frontend/src/app/features/analytics/services/analytics.service.ts (1 hunks)
  • frontend/src/styles.css (1 hunks)
  • setup-oracle-schema.sh (1 hunks)
  • src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java (4 hunks)
  • src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java (1 hunks)
  • src/main/java/com/rcs/ssf/security/JwtAuthenticationFilter.java (1 hunks)
  • src/main/java/com/rcs/ssf/service/DynamicCrudService.java (6 hunks)
  • src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java (1 hunks)
  • src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • frontend/src/app/app.routes.ts
  • src/main/java/com/rcs/ssf/security/JwtAuthenticationFilter.java
  • Dockerfile
  • frontend/src/app/features/analytics/services/analytics.service.ts
  • frontend/src/app/features/analytics/pages/reports.component.ts
  • docker-compose.yml
  • frontend/src/app/features/analytics/pages/usage.component.ts
  • src/main/java/com/rcs/ssf/service/DynamicCrudService.java
🧰 Additional context used
🧬 Code graph analysis (4)
frontend/src/app/features/analytics/analytics.component.ts (1)
frontend/src/app/features/analytics/analytics-routing.module.ts (1)
  • Component (7-16)
frontend/src/app/features/analytics/analytics-routing.module.ts (4)
frontend/src/app/features/analytics/analytics.component.ts (1)
  • Component (12-32)
frontend/src/app/features/analytics/pages/dashboard.component.ts (1)
  • Component (19-434)
frontend/src/app/app.routes.ts (1)
  • routes (12-123)
frontend/src/app/features/analytics/analytics.module.ts (1)
  • NgModule (7-15)
frontend/src/app/features/analytics/analytics.module.ts (1)
frontend/src/app/features/analytics/analytics-routing.module.ts (1)
  • NgModule (77-81)
frontend/src/app/features/analytics/pages/dashboard.component.ts (1)
frontend/src/app/features/analytics/services/analytics.service.ts (2)
  • AnalyticsDashboard (30-40)
  • LoginAttemptTrendPoint (42-46)
🪛 Biome (2.1.2)
frontend/src/styles.css

[error] 190-190: Unexpected unknown pseudo-elements: ng-deep

See MDN web docs for more details. Use a known pseudo-element instead.

(lint/correctness/noUnknownPseudoElement)


The same noUnknownPseudoElement diagnostic repeats at lines 199, 206, 216, 223, 232, 238, and 246.


[error] 250-250: Unexpected shorthand property border after border-color

(lint/suspicious/noShorthandPropertyOverrides)

🪛 Shellcheck (0.11.0)
docker-start.sh

[warning] 12-12: This $? refers to echo/printf, not a previous command. Assign to variable to avoid it being overwritten.

(SC2320)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (23)
frontend/src/styles.css (2)

8-184: LGTM: Button and card styling are well-crafted.

The gradient, neon, glass, and 3D button styles, along with the glassmorphism, gradient-border, neumorphism, and holographic card styles, are visually cohesive, accessible (clear hover states), and performant (CSS-based animations). The consistent 0.3s transition timing and smooth transforms enhance UX without overhead.


190-251: The provided diff is incorrect and doesn't match the stated fix.

The review correctly identifies that border-color: transparent (line 249) is redundant before border: 2px solid transparent (line 251), but the provided diff removes and re-adds the border shorthand identically, making no actual change.

The correct fix should remove the redundant border-color line:

 .input-animated ::ng-deep .ant-input:focus {
-    border-color: transparent;
     background: linear-gradient(white, white) padding-box,
                             linear-gradient(135deg, #667eea 0%, #764ba2 100%) border-box;
     border: 2px solid transparent;
 }

Regarding ::ng-deep warnings: Biome's CSS linter doesn't recognize this Angular-specific pseudo-element. This is a known false positive when using third-party component libraries (Ant Design) with view encapsulation. Add a CSS rule to biome.json to suppress this warning, or use component-scoped styles to avoid piercing encapsulation entirely.

Likely an incorrect or invalid review comment.

docker-start.sh (2)

84-103: Verify HTTP-only health check aligns with SSL/HTTPS setup.

Line 92 performs the nginx readiness check on http://localhost:80 (HTTP), but the PR description mentions automatic generation of self-signed SSL certificates and HTTPS deployment. Confirm whether:

  • nginx is configured to serve both HTTP and HTTPS, and the HTTP check is sufficient to verify general readiness
  • The health check should instead probe HTTPS (:443), requiring curl to skip certificate validation (-k flag)
  • There's a downstream service (e.g., orchestrator) that performs HTTPS health checks

If HTTPS is the primary protocol, consider updating the health check to validate the secure endpoint:

-    if curl -sf http://localhost:80 > /dev/null 2>&1; then
+    if curl -sfk https://localhost:443 > /dev/null 2>&1; then

134-134: No action required — JAR filename matches correctly.

The Dockerfile (line 58) copies the built JAR oracle-graphql-1.0.0-RELEASE.jar and renames it to app.jar in the container. The docker-start.sh script (line 134) correctly references this renamed file. The filenames match as intended.

setup-oracle-schema.sh (1)

1-112: Good structure and error handling overall.

The script properly uses set -e for fail-fast semantics, validates non-empty inputs, handles sqlplus exit codes gracefully, and provides clear messaging. The issues flagged above are about credential exposure at lines 75 and 98; addressing those will resolve the security concerns.

db/migration/V700__recompile_plsql_packages_fix_id_generation.sql (2)

468-490: Operation normalization for record_audit is now correct.

The code properly normalizes v_norm_operation back to SQL verbs ('INSERT', 'SELECT', 'UPDATE', 'DELETE') before calling record_audit (lines 468–474 and 484–489). This addresses the critical operation enum mismatch flagged in the previous review—constants like c_op_create='CREATE' are now correctly mapped to expected SQL verbs before auditing.


398-398: The hard-coded id column is correct for the current implementation.

Verification confirms all three tables in g_allowed_tables use id as their primary key column name:

  • USERS: id NUMBER(19) DEFAULT user_id_seq.NEXTVAL PRIMARY KEY (sequence-based)
  • AUDIT_LOGIN_ATTEMPTS: id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY (identity-based)
  • AUDIT_SESSIONS: id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY (identity-based)

The code successfully achieves the stated PR objective of supporting both SEQUENCE-based and IDENTITY-based primary keys. The hard-coded id in the RETURNING clause works correctly for all allowed tables. No dynamic PK lookup helper exists in the codebase, and none is necessary given the consistent use of id across all allowed tables.

Likely an incorrect or invalid review comment.

src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java (1)

24-32: Constructor mismatch resolved.

The previous constructor issue has been fixed. The test now correctly instantiates IndexEffectivenessAnalyzer with (MeterRegistry, JdbcTemplate), matching the production signature.

src/main/java/com/rcs/ssf/metrics/SlowQueryAnalyzer.java (6)

63-100: Excellent constructor validation.

The defensive parameter validation is thorough and provides clear error messages. The unused planRegressionThresholdPercent field is acceptable given the TODO at lines 291-296 explaining the deferred baseline comparison feature.


115-150: LGTM!

The main analysis flow is well-structured with appropriate null checks, fast-path optimization for queries under the threshold, and proper error handling. The conditional execution based on feature flags (executionPlanCaptureEnabled, recommendationGenerationEnabled) is a good design choice.


483-603: Excellent dual-strategy approach for WHERE clause extraction.

The combination of JSqlParser for robust parsing (lines 483-533) with a regex-based fallback (lines 535-603) provides both accuracy and resilience. The visitor pattern, parameter filtering, and defensive error handling are well-implemented.
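A stripped-down sketch of the regex fallback alone can illustrate the idea. The pattern and names below are illustrative assumptions; the reviewed code's primary path uses JSqlParser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhereFallbackDemo {
    // Fallback only: capture everything between WHERE and the next clause keyword
    // (or end of statement) when full SQL parsing is unavailable.
    private static final Pattern WHERE = Pattern.compile(
        "\\bWHERE\\b(.*?)(\\bGROUP BY\\b|\\bORDER BY\\b|$)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String extractWhere(String sql) {
        Matcher m = WHERE.matcher(sql);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractWhere(
            "SELECT id FROM users WHERE status = 'ACTIVE' ORDER BY id"));
    }
}
```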


605-624: Correct use of Counter instead of Gauge for recommendation metrics.

Using Counter (line 614) instead of Gauge is the right choice to avoid memory leaks from repeated meter registration. The tagging strategy with query_hash, type, and confidence enables effective metric filtering.


628-670: LGTM!

The hash generation using SHA-256 is appropriate for query deduplication, and the fallback to hashCode() provides resilience. The ExecutionPlan inner class is cleanly designed as an immutable data holder.


206-258: The getSqlId method already includes performance optimizations; environmental validation is appropriate.

The code implements a two-stage lookup strategy that mitigates the LIKE performance concern:

  1. Exact match first (lines 222–228): Attempts canonicalized SQL comparison before falling back to prefix matching.
  2. Intentional 500-character prefix (line 235): The code explicitly uses 500 characters (not 100) to reduce false positives, as documented in the comments.
  3. Proper query ordering (lines 226, 245): Orders by last_active_time DESC and fetches only the first row, reducing unnecessary result processing.
  4. Correct ESCAPE handling (lines 236–239): LIKE wildcards are properly escaped to prevent unintended pattern matching.

The design shows thoughtful optimization. Performance impact will be environment-dependent (tied to v$sql size and query frequency in your system). Monitoring the query execution times for this lookup would be appropriate to validate behavior under production load, but no code changes are required at this time.
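The ESCAPE handling mentioned in point 4 can be shown in isolation. The helper name and choice of backslash as the escape character are assumptions, not the analyzer's actual code:

```java
public class LikeEscapeDemo {
    // Escape LIKE metacharacters so a SQL-text prefix matches literally.
    // The backslash itself must be escaped first.
    static String escapeLike(String s) {
        return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
    }

    public static void main(String[] args) {
        String prefix = "SELECT * FROM users WHERE name LIKE '%a_'";
        System.out.println(escapeLike(prefix));
        // Intended usage: WHERE sql_text LIKE ? || '%' ESCAPE '\'
    }
}
```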

build.gradle (3)

34-40: SQL directory structure verified and correctly mapped.

The sql/ directory exists at the project root and contains a well-organized structure with database resources (cleanup scripts, grants, indexes, packages, sequences, and tables). The srcDir 'sql' configuration is valid and properly maps these SQL files as build resources.


194-196: Directory is properly committed and contains migration files—no action needed.

Verification confirms the db/migration directory exists and is properly committed to the repository with multiple SQL migration files present. The failOnMissingLocations = true setting will not cause unexpected failures on initial project setup.


68-68: No action required—BOM-managed reactor-netty version is correctly configured.

Omitting a version to allow pickup from the BOM is the standard approach, and the inline comment accurately notes Spring Boot 3.5.7 provides reactor-netty 1.1.8. The web search confirms oracle-r2dbc is interoperable with r2dbc-pool, and no documented conflicts exist between reactor-netty 1.1.8, oracle-r2dbc 1.1.1, and r2dbc-pool 1.0.1 in production environments. The dependency declaration is correct as-is.

frontend/src/app/app.component.html (1)

36-64: Analytics submenu wiring looks consistent with feature routing

The new submenu cleanly maps to the /analytics/{dashboard,performance,usage,reports} routes and matches the existing menu structure; no functional issues spotted.

frontend/src/app/features/analytics/analytics-routing.module.ts (1)

6-75: Lazy-loaded analytics routing with safe fallback looks solid

The safeLoadComponent wrapper plus AnalyticsLoadErrorComponent gives a clear, resilient fallback for failed lazy-loads, and the child routes under AnalyticsComponent (dashboard/performance/usage/reports with redirects to dashboard) line up cleanly with the intended /analytics/* structure.

frontend/src/app/features/analytics/analytics.component.ts (1)

5-31: Minimal analytics container is appropriate

This standalone AnalyticsComponent is a straightforward routing host with scoped layout styling; it keeps the feature boundary clear without adding unnecessary logic.

frontend/src/app/features/analytics/pages/dashboard.component.ts (1)

375-433: Defensive helpers around login counts and dates look good

getSuccessfulLoginAttempts safely clamps negatives, the health color/icon helpers centralize status mapping, and formatDate gracefully falls back on invalid or unexpected date strings—these should make the dashboard resilient to mildly inconsistent backend data without blowing up the UI.

src/main/java/com/rcs/ssf/metrics/BatchMetricsRecorder.java (2)

173-187: LGTM! Clean immutable snapshot implementation.

The MetricsSnapshot class correctly implements an immutable data holder with final fields, supporting the thread-safe atomic update pattern used in BatchMetrics. The private visibility is appropriate since this is an internal implementation detail.


65-65: No verification needed—the concern is not supported by evidence in the codebase.

The review comment raises a hypothetical concern about external callers using the old recordSuccess signature with 4 parameters. However, codebase verification reveals:

  • Zero calls to recordSuccess with 4 parameters anywhere in the codebase
  • All internal calls (lines 65, 106) correctly use the new 3-parameter signature
  • Test file references BatchMetrics only through getMetrics(), never directly invoking recordSuccess
  • No evidence of external code calling recordSuccess with the legacy signature

While BatchMetrics is indeed public static (line 193) and theoretically accessible externally, the actual usage patterns show it is not called directly from outside BatchMetricsRecorder. The refactoring is safe.

Likely an incorrect or invalid review comment.

Comment thread build.gradle
Comment on lines +26 to +62
CREATE OR REPLACE PACKAGE dynamic_crud_pkg AS
-- Supported operation names
c_op_create CONSTANT VARCHAR2(10) := 'CREATE';
c_op_read CONSTANT VARCHAR2(10) := 'READ';
c_op_update CONSTANT VARCHAR2(10) := 'UPDATE';
c_op_delete CONSTANT VARCHAR2(10) := 'DELETE';

SUBTYPE t_operation IS VARCHAR2(10);

-- Metadata helpers
FUNCTION is_table_allowed(p_table_name IN VARCHAR2) RETURN BOOLEAN;
FUNCTION normalize_table_name(p_table_name IN VARCHAR2) RETURN VARCHAR2;

-- Execute a single-row CRUD operation
PROCEDURE execute_operation(
p_table_name IN VARCHAR2,
p_operation IN t_operation,
p_column_names IN dyn_column_name_nt,
p_column_values IN dyn_column_value_nt,
p_filters IN dyn_filter_nt DEFAULT NULL,
p_audit IN dyn_audit_ctx_rec DEFAULT NULL,
p_message OUT VARCHAR2,
p_generated_id OUT VARCHAR2,
p_affected_rows OUT NUMBER
);

-- Execute a bulk CRUD operation using collection payload
PROCEDURE execute_bulk(
p_table_name IN VARCHAR2,
p_operation IN t_operation,
p_rows IN dyn_row_op_nt,
p_filters IN dyn_filter_nt DEFAULT NULL,
p_audit IN dyn_audit_ctx_rec DEFAULT NULL,
p_message OUT VARCHAR2,
p_affected OUT NUMBER
);
END dynamic_crud_pkg;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Embedded package copy diverges from canonical standalone; risks deployment drift.

This migration recompiles the entire dynamic_crud_pkg spec and body inline (lines 26–531), duplicating sql/packages/dynamic_crud_pkg_body.sql. While the past review suggested either extracting a canonical source or ensuring byte-for-byte parity, the current approach remains an embedded copy.

Risk: Whichever file is applied last to the database "wins." If future changes are made to the standalone file, this migration becomes stale unless manually synchronized. The past review example shows operation constants and audit logic already diverging between the two versions.

Recommendation: Either:

  1. Replace the embedded package with a call to source the standalone file (if Flyway/SQL*Plus supports it), or
  2. Generate this migration from the canonical standalone file as part of the build/CI pipeline to ensure they stay synchronized.

Check whether the standalone package file and this migration are already divergent by comparing operation handling and PK lookup logic:


Also applies to: 66-531


🏁 Script executed:

#!/bin/bash
# Compare key sections between standalone and migration
echo "=== Operation constants in standalone ===" && \
  rg -n "c_op_create|c_op_read|c_op_update|c_op_delete" sql/packages/dynamic_crud_pkg_spec.sql -A 1

echo -e "\n=== Operation constants in migration ===" && \
  rg -n "c_op_create|c_op_read|c_op_update|c_op_delete" db/migration/V700__recompile_plsql_packages_fix_id_generation.sql -A 1 | head -20

echo -e "\n=== Checking if standalone has PK lookup helper ===" && \
  rg -n "FUNCTION.*get.*pk\|FUNCTION.*pk.*column\|find.*primary" -i sql/packages/dynamic_crud_pkg_body.sql

Length of output: 1161




Embedded package constants diverge critically from standalone version—migration will not execute correctly.

The verification confirms the operation constants differ fundamentally and create a broken normalization path:

  • Standalone file (sql/packages/dynamic_crud_pkg_spec.sql): c_op_create := 'INSERT', c_op_read := 'SELECT'
  • Migration file (lines 28–31): c_op_create := 'CREATE', c_op_read := 'READ'

The migration's normalization logic (lines 346–348) converts input strings 'INSERT' and 'SELECT' to the package constants c_op_create and c_op_read, but these constants do not match the input values. Subsequent conditional checks (lines 372, 421) that evaluate WHEN c_op_create will fail because the assignment never completes—the normalized variables remain unset, causing the ELSE clause to trigger and the operation to fail silently or raise an error.

Fix required: Align the migration's operation constants with the standalone file values ('INSERT', 'SELECT'), or rewrite the normalization logic to match the new constant definitions.

🤖 Prompt for AI Agents
In db/migration/V700__recompile_plsql_packages_fix_id_generation.sql around
lines 26 to 62, the package operation constants currently use 'CREATE' and
'READ' but the normalization and conditional logic expect 'INSERT' and 'SELECT'
(as in sql/packages/dynamic_crud_pkg_spec.sql); change c_op_create to 'INSERT'
and c_op_read to 'SELECT' (or alternatively update the normalization/conditional
checks to map to the new names) so the normalized operation values match the
constants used in the WHEN branches and downstream logic.

Comment thread docker-start.sh
Comment on lines +11 to +12
cleanup() {
local exit_code=$?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix the exit code capture in trap handler.

Line 12 reads $? inside cleanup, but by that point it captures the exit status of the last command executed before the trap fired (per ShellCheck SC2320, the preceding echo/printf), not the command that actually failed. When a signal-based trap (SIGTERM/SIGINT) is triggered, there is no failing command's exit code to preserve, so the cleanup function will typically exit with code 0, losing any error context.

Consider one of these approaches:

  • Remove $? and use a fixed exit code (e.g., exit 1 on error, exit 0 on success)
  • Accept the exit code as a parameter: cleanup() { local exit_code=${1:-0}; ... }
  • Use a global variable to track error states explicitly
-cleanup() {
-    local exit_code=$?
-    
+cleanup() {
+    local exit_code=${1:-0}
+    

Then update the trap invocation to pass exit codes explicitly when needed, or use a global variable to track error states from the main script.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cleanup() {
local exit_code=$?
cleanup() {
local exit_code=${1:-0}
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 12-12: This $? refers to echo/printf, not a previous command. Assign to variable to avoid it being overwritten.

(SC2320)

🤖 Prompt for AI Agents
In docker-start.sh around lines 11 to 12, the trap handler captures the exit
code incorrectly by reading $? inside the cleanup function immediately after
declaring a local variable (which yields 0); change cleanup to accept an
explicit exit code parameter or use a global status variable and update trap
invocations to pass the intended code. For example, redefine cleanup to take
exit_code=${1:-0} and call trap 'cleanup 1' for signals or call cleanup "$?" on
normal exit, or alternatively set a global error flag in the script and have
cleanup use that flag to decide exit 0 vs 1; ensure trap and normal-exit paths
pass the correct code so cleanup exits with meaningful status.

Comment thread setup-oracle-schema.sh
echo "=========================================================================="
echo ""
echo "Executing from: $SQL_DIR"
echo "Connection: $CONN_STRING"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Remove password from output—credentials printed to terminal and logs.

Command-line parameters can be read by anyone via /proc/<pid>/cmdline, and using sqlplus /nolog avoids exposing the password to process monitoring. Line 75 echoes the full CONN_STRING containing the password to the terminal and any captured output/logs.

Remove the connection string from output, or redact the password:

 echo ""
 echo "=========================================================================="
 echo "SETTING UP SCHEMA"
 echo "=========================================================================="
 echo ""
 echo "Executing from: $SQL_DIR"
-echo "Connection: $CONN_STRING"
+echo "Connection: ${ORACLE_USER}@${ORACLE_CONN}"
 echo ""

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In setup-oracle-schema.sh around line 75, the script currently echoes the full
CONN_STRING which exposes credentials; remove the echo or redact the password
before printing (e.g., replace password portion with ****) and prefer invoking
sqlplus with /nolog and then connecting with a secure CONNECT operation so
credentials are not visible in process listings or logs.

Comment thread setup-oracle-schema.sh
Comment on lines +98 to +102
sqlplus "$CONN_STRING" << 'EOF'
SET HEADING ON FEEDBACK ON TIMING ON ECHO ON
@master.sql
EXIT
EOF
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Avoid passing password on sqlplus command line—use /nolog with heredoc connection.

The recommended pattern for sqlplus is sqlplus /nolog with credentials passed via heredoc, which prevents the password from showing in process listings. Currently, the password is visible to any process monitor running ps.

Refactor to connect within the sqlplus session:

 # Run sqlplus script
 # Temporarily disable errexit to capture sqlplus exit code and handle failures gracefully
 set +e
-sqlplus "$CONN_STRING" << 'EOF'
+sqlplus /nolog << EOF
+CONNECT ${ORACLE_USER}/${ORACLE_PASSWORD}@${ORACLE_CONN}
 SET HEADING ON FEEDBACK ON TIMING ON ECHO ON
 @master.sql
 EXIT
 EOF

Note: the heredoc delimiter must be unquoted (EOF, not 'EOF') so the variables in the CONNECT line expand before sqlplus reads them.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In setup-oracle-schema.sh around lines 98-102, the script invokes sqlplus with
the full connection string on the command line exposing the password; change to
use sqlplus /nolog and perform CONNECT inside the heredoc so credentials are not
visible in process listings: replace the sqlplus "$CONN_STRING" invocation with
sqlplus /nolog <<EOF (note: remove the quoted 'EOF' so variables expand) and add
a first line inside the heredoc that runs CONNECT "$CONN_STRING" before the
SET/@master.sql/EXIT lines; keep the rest of the heredoc intact.
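The crucial detail in the prompt above is the heredoc delimiter: with << 'EOF' (quoted) nothing expands, so sqlplus would receive the literal text ${ORACLE_USER} instead of the username. A minimal sketch of the difference, using cat as a stand-in for sqlplus:

```shell
#!/bin/sh
ORACLE_USER="app_user"

# Quoted delimiter: no parameter expansion inside the heredoc.
quoted=$(cat << 'EOF'
CONNECT ${ORACLE_USER}
EOF
)

# Unquoted delimiter: the shell expands variables before cat reads them.
unquoted=$(cat << EOF
CONNECT ${ORACLE_USER}
EOF
)

echo "$quoted"
echo "$unquoted"
```

The first echo prints the literal `CONNECT ${ORACLE_USER}`; the second prints `CONNECT app_user`, which is the form the in-session CONNECT needs.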

Comment thread src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java
Comment thread src/main/java/com/rcs/ssf/tracing/LatencyAnalyzer.java
Comment on lines +66 to +80
@Test
@DisplayName("Should handle zero total indexes gracefully")
void testZeroTotalIndexes() {
// If no indexes exist, utility should be 0
double utility = (0.0 * 100.0) / (0.0 + 1.0);
assertEquals(0.0, utility, 0.01);
}

@Test
@DisplayName("Should handle high usage count")
void testHighUsageCount() {
// High usage should result in high utility score
double utility = (1000.0 * 100.0) / (50.0 + 1.0);
assertTrue(utility > 95.0);
}

⚠️ Potential issue | 🔴 Critical

Tests don't invoke analyzer methods.

Both testZeroTotalIndexes and testHighUsageCount perform inline arithmetic calculations without calling any methods on the analyzer. Like testUtilityScoreCalculation, these tests provide false coverage and should be rewritten to verify the analyzer's actual behavior.

Consider consolidating these edge cases into parameterized tests that actually invoke the analyzer with mocked data:

@ParameterizedTest
@CsvSource({
    "0, 0, 0.0",      // zero total indexes
    "1000, 50, 96.08" // high usage count
})
@DisplayName("Should handle edge cases in utility calculation")
void testUtilityEdgeCases(long accessCount, int totalIndexes, double expectedMinUtility) {
    when(jdbcTemplate.queryForList(any(String.class))).thenReturn(List.of(
        Map.of(
            "INDEX_NAME", "EDGE_CASE_IDX",
            "TABLE_NAME", "TEST",
            "TOTAL_SIZE_BYTES", new java.math.BigDecimal("1000"),
            "ACCESS_COUNT", new java.math.BigDecimal(accessCount),
            "READ_COUNT", new java.math.BigDecimal("0"),
            "WRITE_COUNT", new java.math.BigDecimal("0")
        )
    ));
    when(jdbcTemplate.update(any(String.class), (Object[]) any())).thenReturn(1);
    
    analyzer.analyzeIndexEffectiveness();
    
    // Verify utility score was calculated
    assertDoesNotThrow(() -> meterRegistry.get("oracle.index.utility_score")
        .tag("index_name", "EDGE_CASE_IDX")
        .gauge());
}
🤖 Prompt for AI Agents
In src/test/java/com/rcs/ssf/metrics/IndexEffectivenessAnalyzerTest.java around
lines 66-80, the two tests perform inline math instead of invoking the analyzer;
rewrite them to actually call analyzer.analyzeIndexEffectiveness() with mocked
JDBC results and metric updates. Replace the two inline tests with a
parameterized test (CsvSource for accessCount, totalIndexes, expectedMinUtility)
that stubs jdbcTemplate.queryForList(...) to return a Map for an index
(ACCESS_COUNT as BigDecimal, TOTAL_SIZE_BYTES, etc.), stubs
jdbcTemplate.update(...) as needed, calls analyzer.analyzeIndexEffectiveness(),
and then asserts the metric was recorded (e.g. retrieve the gauge from
meterRegistry by name and tags and assert it exists or meets the expected
value). Ensure mocks are reset/verified per case and keep display name and
assertions robust to floating-point variance.

- Deleted `EXPLAIN_PLAN_ANALYSIS.sql` and `OPTIMIZED_QUERY.sql` as they are no longer needed for query optimization analysis.
- Removed `PROJECT_PLAN.md` which contained outdated project roadmap and implementation details.
- Eliminated `generate_bcrypt.py` script for generating bcrypt passwords, as it is no longer in use.
- Introduced new TLS/SSL certificates and keys for secure communication.
- Configured Logstash to accept TCP input with mTLS for Spring Boot logs.
- Added Beats input for Nginx/Infrastructure logs with SSL enabled.
- Implemented JDBC input for fetching audit logs from Oracle database.
- Updated Logback configuration to include Logstash TCP appender for structured logging.
- Added necessary drivers and configuration files for JDBC connectivity.
@gitguardian

gitguardian bot commented Nov 24, 2025

⚠️ GitGuardian has uncovered 3 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
22718269 Triggered Generic Private Key 99d6ea3 logstash/certs/client.pkcs8.key View secret
22718270 Triggered Generic Private Key 99d6ea3 logstash/certs/server.key View secret
22718271 Triggered Generic Private Key 99d6ea3 logstash/certs/ca.key View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
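Steps 2–4 can be sketched for one of the flagged files. This uses a throwaway repo and an illustrative path; the real key must still be rotated, since earlier commits keep it in history until history is rewritten (e.g. with git-filter-repo):

```shell
#!/bin/sh
set -e
# Build a throwaway repo containing a fake committed key.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email "dev@example.com"
git config user.name "dev"
mkdir -p logstash/certs
echo "FAKE PRIVATE KEY" > logstash/certs/ca.key
git add . && git commit -qm "add key"

# Stop tracking the key (the local file is kept) and ignore future copies.
git rm -q --cached logstash/certs/ca.key
echo "logstash/certs/*.key" >> .gitignore
git add .gitignore
git commit -qm "untrack key and ignore certs"
```

After this, `git ls-files` no longer lists the key, but every earlier commit still contains it; hence the rotate and history-rewrite steps above.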

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
logstash/config/pipelines.conf (1)

23-35: Consider JDBC connection reliability improvements.

The JDBC input configuration could benefit from additional resilience and flexibility settings:

Recommendations:

  1. Parameterize the connection string for flexibility:

    -    jdbc_connection_string => "jdbc:oracle:thin:@oracle:1521/FREEPDB1"
    +    jdbc_connection_string => "${JDBC_CONNECTION_STRING:jdbc:oracle:thin:@oracle:1521/FREEPDB1}"

    (Logstash's environment-variable substitution uses ${VAR:default}, not the shell's ${VAR:-default}.)
  2. Add connection validation and pooling to prevent stale connections:

    jdbc {
      # ... existing config ...
      jdbc_validate_connection => true
      jdbc_validation_timeout => 5
      jdbc_pool_timeout => 30
      connection_retry_attempts => 3
      connection_retry_attempts_wait_time => 5
    }
  3. Evaluate the polling schedule (* * * * * = every minute). For audit logs, this may be aggressive. Consider reducing frequency if real-time ingestion isn't required:

    schedule => "*/5 * * * *"  # Every 5 minutes
  4. Add error handling to prevent pipeline failures on database issues:

    jdbc {
      # ... existing config ...
      jdbc_fetch_size => 100
      clean_run => false  # Preserve state across restarts
    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e851dd6 and 99d6ea3.

⛔ Files ignored due to path filters (1)
  • logstash/drivers/ojdbc11.jar is excluded by !**/*.jar
📒 Files selected for processing (16)
  • ROADMAP_OBSERVABILITY.md (1 hunks)
  • docker-compose.yml (3 hunks)
  • logstash/Dockerfile (1 hunks)
  • logstash/certs/ca.crt (1 hunks)
  • logstash/certs/ca.key (1 hunks)
  • logstash/certs/ca.srl (1 hunks)
  • logstash/certs/client.crt (1 hunks)
  • logstash/certs/client.csr (1 hunks)
  • logstash/certs/client.key (1 hunks)
  • logstash/certs/client.pkcs8.key (1 hunks)
  • logstash/certs/server.crt (1 hunks)
  • logstash/certs/server.csr (1 hunks)
  • logstash/certs/server.key (1 hunks)
  • logstash/certs/server.pkcs8.key (1 hunks)
  • logstash/config/pipelines.conf (1 hunks)
  • src/main/resources/logback-spring.xml (5 hunks)
✅ Files skipped from review due to trivial changes (9)
  • logstash/certs/client.crt
  • logstash/certs/server.crt
  • logstash/certs/client.csr
  • logstash/certs/ca.srl
  • logstash/certs/server.pkcs8.key
  • logstash/certs/client.pkcs8.key
  • logstash/certs/server.csr
  • logstash/certs/ca.crt
  • logstash/certs/server.key
🧰 Additional context used
🪛 Gitleaks (8.29.0)
logstash/certs/client.key

[high] 1-52: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

logstash/certs/ca.key

[high] 1-52: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

🪛 markdownlint-cli2 (0.18.1)
ROADMAP_OBSERVABILITY.md

178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


244-244: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


290-290: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


412-412: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (4)
logstash/Dockerfile (1)

1-6: LGTM: Secure privilege escalation pattern.

The Dockerfile correctly follows security best practices by temporarily elevating to root only for the necessary file operations, then returning to the unprivileged logstash user. The file permissions (644) are appropriately restrictive for a JAR file.

docker-compose.yml (2)

138-156: LGTM: Kibana service properly configured.

The Kibana service is correctly configured with proper health checks and dependency management on Elasticsearch.


85-85: Good improvement: More permissive Elasticsearch healthcheck.

The updated healthcheck regex accepts both green and yellow cluster status, which is appropriate for single-node development environments that cannot achieve green status (no replicas).

src/main/resources/logback-spring.xml (1)

24-27: Good addition: MDC trace correlation fields.

Flattening MDC keys for trace_id, user_id, and request_id improves log correlation and integration with distributed tracing systems.

Comment thread docker-compose.yml
Comment thread docker-compose.yml
Comment thread logstash/certs/ca.key
Comment thread logstash/certs/client.key
Comment thread logstash/config/pipelines.conf
Comment thread ROADMAP_OBSERVABILITY.md
Comment on lines +178 to 525
```
┌──────────────────────────────────────────────────────────────┐
│ Index Effectiveness & Optimization Pipeline │
└──────────────────────────────────────────────────────────────┘

┌─ COLLECTION PHASE (Scheduled, 24h cycle) ──────────────────┐
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Oracle DBA Metadata Query │ │
│ │ - DBA_INDEX_USAGE (index read/write stats) │ │
│ │ - DBA_INDEXES (size, status, compression) │ │
│ │ - DBA_TAB_COLUMNS (column stats, NDV) │ │
│ │ - DBA_SQL_PLAN_BASELINES (baseline plans, usage) │ │
│ │ - V$SQL + V$SQL_PLAN (runtime execution plans) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ SlowQueryAnalyzer Component (Java) │ │
│ │ - Intercept queries >1000ms via JDBC/R2DBC listeners │ │
│ │ - Extract execution plan from V$SQL_PLAN │ │
│ │ - Identify bottleneck: full table scans, missing indexes │ │
│ │ - Correlate with QueryPlanAnalyzer baseline │ │
│ │ - Generate recommendations: new index, hint rewrite │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ AUDIT_INDEX_ANALYSIS table (persist findings) │ │
│ │ - analysis_id, table_name, index_name, utility_score │ │
│ │ - action (REMOVE, ADD, REBUILD), rationale │ │
│ │ - analyzed_at, recommended_by (component) │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

┌─ ANALYTICS PHASE (Real-time query analysis) ────────────────┐
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ On Each Slow Query (>1s): │ │
│ │ 1. Extract plan from Oracle V$SQL_PLAN │ │
│ │ 2. Parse for: TABLE ACCESS FULL, INDEX RANGE SCAN │ │
│ │ 3. Compare to QueryPlanAnalyzer baseline │ │
│ │ 4. If regression (>5% slower than baseline): │ │
│ │ - Log alert: "Query plan regression detected" │ │
│ │ - Suggest index or query rewrite │ │
│ │ - Store in AUDIT_QUERY_RECOMMENDATIONS │ │
│ │ 5. Emit Prometheus metric: query_plan_regression │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

┌─ DASHBOARD PHASE (Visualization & Remediation) ────────────┐
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Grafana Dashboard: index-effectiveness-dashboard.json │ │
│ │ - Panel 1: Index Utility Score (bar chart) │ │
│ │ Top unused indexes (utility <10%) │ │
│ │ - Panel 2: Missing Indexes (table) │ │
│ │ Columns with high query count, no matching index │ │
│ │ - Panel 3: Slow Query Analysis (table) │ │
│ │ Query hash, plan changes, recommendations │ │
│ │ - Panel 4: Index Size vs Usage (scatter) │ │
│ │ Identify bloated indexes (large size, low utility) │ │
│ │ - Panel 5: Query Regression Alerts (timeline) │ │
│ │ Plan changes, execution time variance │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
- **Detailed Tasks**:
- [ ] **Index Usage Analysis (24h Scheduled Task)**:
- **Component**: `com.rcs.ssf.metrics.IndexEffectivenessAnalyzer`
- **Trigger**: `@Scheduled(fixedRate = 86400000)` (24 hours)
- **Implementation**:
```
1. Query Oracle DBA_INDEX_USAGE for each user index:
SELECT index_name, table_name, object_id,
leaf_blocks, distinct_keys,
btree_space, used_space,
access_count (total reads + writes)
FROM dba_index_usage
WHERE owner = 'APP_USER' AND index_type = 'NORMAL'

2. For each index, calculate utility metric:
- Total index size (bytes) = leaf_blocks × 8192
- Read weight = 1.0 (beneficial)
- Write weight = 2.0 (maintenance cost)
- utility = (access_count × read_weight - write_count × write_weight) / total_size

3. Flag candidates:
- REMOVE: utility < 10% AND access_count < 100 per day
(extremely low usage, maintenance cost not justified)
- MONITOR: utility 10-30%
(potential candidate if business case weak)
- KEEP: utility >= 30% (actively used, maintain)

4. For each table, cross-check DBA_TAB_COLUMNS against existing indexes:
- Identify high-cardinality columns (NDV > 100) without index
- Check query patterns in QueryPlanAnalyzer for those columns
- If query count > 1000/day AND no index: recommend new composite

5. Persist findings in AUDIT_INDEX_ANALYSIS:
INSERT INTO AUDIT_INDEX_ANALYSIS (
table_name, index_name, utility_score,
size_mb, access_count, action, rationale,
analyzed_at, created_at
)
```
- **Metrics Exported**:
- Gauge: `oracle.index.utility_score{table, index}` (0-100%)
- Gauge: `oracle.index.size_mb{table, index}`
- Counter: `oracle.index.action{table, index, action=REMOVE|ADD|MONITOR}`
- **Dashboard Integration**:
- Bar chart: Top 20 indexes by utility (descending)
- Table: Unused indexes (utility <10%, sortable by size, action recommended)
- Table: Missing indexes (table, column_name, estimated_impact_percent)
- [ ] **Slow Query Capture & Analysis (Real-time)**:
- **Component**: `com.rcs.ssf.metrics.SlowQueryAnalyzer` (NEW)
- **Trigger**: JDBC/R2DBC listener on every query execution (O(1) fast path for <1s queries)
- **Implementation**:
```
1. Intercept query at JDBC/R2DBC execution layer:
- Capture: SQL text, execution_time_ms, rows_affected
- Query threshold: 1000ms (configurable via application.yml)
- Fast path: Skip analysis for <1000ms (99% of queries)

2. For slow queries (>1000ms):
- Extract Oracle V$SQL_PLAN for query plan:
SELECT * FROM v$sql_plan
WHERE sql_id = ?
ORDER BY plan_line_id

- Parse plan for indicators:
* TABLE ACCESS FULL (missing index)
* FILTER operations (correlated subqueries)
* NESTED LOOP with large row count (join inefficiency)
* SORT (not optimal for order)

- Extract execution statistics:
SELECT * FROM v$sql_stat
WHERE sql_id = ?

3. Correlation Analysis (with QueryPlanAnalyzer baseline):
- Get baseline for query from queryBaselines map
- Calculate variance: |current - baseline| / baseline
- Detect regression: if variance > 5% threshold:
- Flag as potential problem
- Recommend investigation
- Store in AUDIT_QUERY_RECOMMENDATIONS

4. Generate Recommendations:
IF plan shows TABLE ACCESS FULL:
- Recommend composite index: (leading_where_col, sort_col)
- Estimate improvement: cardinality × index_selectivity

IF plan shows NESTED LOOP with large inner table:
- Recommend HASH JOIN hint or index on join column
- Estimate improvement: rows × inner_table_scans_avoided

IF plan shows correlation (multiple FILTER):
- Recommend denormalization or materialized view
- Estimate improvement: cascade effect on other queries

Confidence levels:
- HIGH (>80%): Missing simple index, obvious inefficiency
- MEDIUM (50-80%): Needs DBA verification
- LOW (<50%): Speculative, requires expert review

5. Persistence:
INSERT INTO AUDIT_QUERY_RECOMMENDATIONS (
query_hash, execution_plan, recommendation_type,
confidence, estimated_improvement_percent,
index_definition, hint_suggestion, rationale,
created_at
)

6. Alert:
IF regression detected:
- Log WARN: "Query plan regression for [query_hash]"
- Emit counter: query.plan.regression
- If critical (>50% slower): Emit ALERT to monitoring
```
- **Metrics Exported**:
- Counter: `query.slow_detected{query_hash}` (incremented for >1s queries)
- Counter: `query.plan.regression{query_hash}` (plan deviation >5%)
- Timer: `query.recommendation.generation_ms` (recommendation calc time)
- Gauge: `query.recommendation.confidence{query_hash, type}` (HIGH/MEDIUM/LOW)
- **Database Tables Required**:
```sql
CREATE TABLE AUDIT_INDEX_ANALYSIS (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
table_name VARCHAR2(255) NOT NULL,
index_name VARCHAR2(255),
utility_score NUMBER(5,2), -- 0-100%
size_mb NUMBER(10,2),
access_count NUMBER(19),
read_count NUMBER(19),
write_count NUMBER(19),
action VARCHAR2(20), -- REMOVE, ADD, MONITOR, KEEP
rationale VARCHAR2(1000),
analyzed_at TIMESTAMP NOT NULL,
created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX idx_analysis_action ON AUDIT_INDEX_ANALYSIS(action, analyzed_at DESC);
CREATE INDEX idx_analysis_table ON AUDIT_INDEX_ANALYSIS(table_name, analyzed_at DESC);

CREATE TABLE AUDIT_QUERY_RECOMMENDATIONS (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
query_hash VARCHAR2(64) NOT NULL,
execution_plan CLOB,
plan_cost NUMBER(10),
rows_estimated NUMBER(19),
rows_actual NUMBER(19),
recommendation_type VARCHAR2(50), -- MISSING_INDEX, HINT, REWRITE, DENORMALIZE
confidence VARCHAR2(20), -- HIGH, MEDIUM, LOW
estimated_improvement_percent NUMBER(5,2),
index_definition VARCHAR2(1000),
hint_suggestion VARCHAR2(500),
rationale VARCHAR2(2000),
approved CHAR(1) DEFAULT 'N' NOT NULL,
approved_by VARCHAR2(255),
approved_at TIMESTAMP,
implemented CHAR(1) DEFAULT 'N' NOT NULL,
implemented_at TIMESTAMP,
created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX idx_recommendations_hash ON AUDIT_QUERY_RECOMMENDATIONS(query_hash, approved);
CREATE INDEX idx_recommendations_type ON AUDIT_QUERY_RECOMMENDATIONS(recommendation_type, created_at DESC);
```
- **Configuration** (application.yml):
```yaml
observability:
slow_query:
threshold_ms: 1000 # Alert on queries >1s
plan_regression_threshold_percent: 5 # Variance >5% is regression
execution_plan_capture_enabled: true
recommendation_generation_enabled: true
index_analysis_schedule_hours: 24
```
- [ ] **Query Plan Regression Detection & Alerting**:
- **Integration Point**: SlowQueryAnalyzer + QueryPlanAnalyzer baseline
- **Logic**:
```
When slow query detected (>1s):
1. Extract normalized query from QueryPlanAnalyzer
2. Get baseline execution time from queryBaselines map
3. If query has no baseline yet:
- First 100 executions establish baseline via BASELINE_SAMPLES
- No alert during baseline period
4. Once baseline established:
- Calculate: variance_percent = |current - baseline| / baseline × 100
- If variance_percent > ANOMALY_THRESHOLD_PERCENT (5%):
* Plan likely changed (rewrite, stats change, environment)
* Log WARN with query, current time, baseline time, variance
* Emit counter: oracle.query.regression{query_type, variance_percent}
* Extract V$SQL_PLAN and compare to last known good plan
* Store comparison in AUDIT_PLAN_CHANGES table
* Suggest recompilation or index addition

Alert Severity:
- CRITICAL (>50% slower): Likely missing index or plan corruption
→ Immediate investigation, escalate to DBA
- WARNING (20-50%): Significant regression
→ Schedule analysis, update recommendation
- INFO (5-20%): Minor variance, possibly environment difference
→ Monitor trend, investigate if persists >24h
```
- **Dashboard**:
- Timeline: Query regression events (query_hash, variance%, timestamp)
- Heatmap: Query variance distribution (rows: query_type, columns: time, color: variance%)
- [ ] **Index Effectiveness Dashboard** (`monitoring/grafana/index-effectiveness-dashboard.json`):
- **Panel 1: Index Utility Score Distribution**
- Type: Bar chart (horizontal)
- Query: `SELECT index_name, utility_score FROM AUDIT_INDEX_ANALYSIS WHERE analyzed_at = latest`
- Top 20 indexes sorted by utility (descending)
- Color: Green (>50%), Yellow (20-50%), Red (<20%)
- Link to recommendation details

- **Panel 2: Unused Indexes Candidate List**
- Type: Table
- Query: `SELECT table_name, index_name, utility_score, size_mb, access_count FROM AUDIT_INDEX_ANALYSIS WHERE utility_score < 10 AND accessed_at IS NOT NULL`
- Columns: Table, Index, Utility %, Size (MB), 30d Usage Count, Action (Remove?)
- Sortable by size (largest first, potential space savings)
- Drilldown: Show full index definition, recent queries using it, drop statement

- **Panel 3: Missing Index Recommendations**
- Type: Table
- Query: `SELECT table_name, missing_column, query_count_7d, estimated_improvement FROM index_gap_analysis`
- Columns: Table, Column(s), Query Count (7d), Est. Improvement (%), Recommendation
- Sortable by estimated improvement
- Drilldown: Show queries that would benefit, create index script

- **Panel 4: Index Size vs Usage Scatter**
- Type: Scatter plot
- X-axis: Index size (MB, log scale)
- Y-axis: Utility score (0-100%)
- Bubble size: Access count
- Color: Green (efficient), Yellow (monitor), Red (bloated, low utility)
- Identify candidates: Large size + low utility = space savings opportunity

- **Panel 5: Query Plan Regression Timeline**
- Type: Graph (time series)
- Y-axis: Variance % from baseline
- X-axis: Time
- Series: Top 5 regressed queries
- Threshold lines: +5% (warning), +20% (alert)
- Annotation: When changes detected, show recommendation

- **Panel 6: Statistics Summary**
- Type: Stat cards
- Total indexes monitored: COUNT(*)
- Indexes recommended for removal: COUNT(*) WHERE action = REMOVE
- Potential space savings: SUM(size_mb) for removal candidates
- Query regressions detected (7d): COUNT(*) of variance >5%
- Avg query plan stability: STDDEV(variance_percent)
- [ ] **Extended Statistics & Column Correlation Analysis**:
- **Purpose**: Identify correlated columns that might benefit from composite indexes
- **Implementation**:
- Enable Oracle extended statistics via `DBMS_STATS.CREATE_EXTENDED_STATS` for hot tables
- Sample existing query patterns: WHERE col1 = ? AND col2 = ?
- Measure correlation: `CORR(col1, col2)` on sample data
- If correlation >0.8 and query count high: recommend composite index
- Store analysis in AUDIT_COLUMN_CORRELATION table
- **Example**:
```sql
-- For audit_login_attempts table:
-- Detect correlation between (ip_address, username) if frequently queried together
EXEC DBMS_STATS.CREATE_EXTENDED_STATS(
ownname => 'APP_USER',
tabname => 'AUDIT_LOGIN_ATTEMPTS',
extension => '(ip_address,username)'
);
```
- **Success Criteria**:
- ✅ Index utility scores calculated every 24h; unused indexes <10% utility catalogued
- ✅ Missing indexes identified (high query count on unindexed columns); recommendations with confidence levels
- ✅ Query plan regressions detected within 1 hour of occurrence; variance >5% triggers alert
- ✅ SlowQueryAnalyzer component intercepts >1s queries; execution plans captured and analyzed
- ✅ Grafana dashboard displays top unused indexes, missing index suggestions, plan regression timeline
- ✅ AUDIT_INDEX_ANALYSIS and AUDIT_QUERY_RECOMMENDATIONS tables populated; admin can review and approve recommendations
- ✅ At least 1 actionable recommendation surfaced per week (remove/add/rewrite)
- ✅ After implementing recommendations: query performance improves 10-30% on affected queries
- **Implementation Priority**:
- **Phase 1 (Week 1-2)**: SlowQueryAnalyzer + index utility analysis + AUDIT tables
- **Phase 2 (Week 3)**: Grafana dashboard + basic recommendations
- **Phase 3 (Week 4)**: Extended statistics + advanced correlation analysis + regression detection
- **Risk Mitigation**:
- **Risk**: False positives in recommendations (suggest index for rarely used query)
→ **Mitigation**: Confidence thresholds (HIGH >= 80%); require approval before implementation
- **Risk**: Regression detection noise from normal variance
→ **Mitigation**: Baseline period (100 samples before alerting); configurable threshold (default 5%)
- **Risk**: Index removal breaks production query
→ **Mitigation**: Test in shadow traffic first; maintain changelog; rollback procedure documented
- **Risk**: Extended statistics overhead on large tables
→ **Mitigation**: Run off-peak; sample 1M rows instead of full table; schedule monthly instead of daily


⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks for proper rendering.

Several fenced code blocks lack language identifiers, which prevents proper syntax highlighting and rendering. Based on the content, these appear to be pseudo-code, SQL, or YAML configurations.

Apply language identifiers to improve documentation readability:

  • Line 178: ASCII diagram → use text
  • Line 244: Algorithm/pseudo-code → use text or java
  • Line 290: Algorithm/pseudo-code → use text or java
  • Line 412: Algorithm/pseudo-code → use text or java

Example fix for line 178:

-    ```
+    ```text
     ┌──────────────────────────────────────────────────────────────┐
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


244-244: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


290-290: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


412-412: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In ROADMAP_OBSERVABILITY.md around lines 178-525, several fenced code blocks
lack language identifiers which prevents proper syntax highlighting; update the
backticks to include appropriate language tags: change the ASCII art block at
~line 178 to ```text, mark the algorithm/pseudo-code blocks (around ~244, ~290,
~412) with ```text or ```java as appropriate, mark the SQL DDL blocks with
```sql, and mark the application.yml/config block with ```yaml so each fenced
block has a correct language identifier for rendering.

Comment thread src/main/resources/logback-spring.xml
Comment thread src/main/resources/logback-spring.xml
…oard docs

- Updated roadmap checkboxes for Log Correlation, ILM, and Kibana Dashboards to completed ([x])
- Added JAEGER_BASE_URL env var to docker-compose for Elasticsearch Jaeger integration
- Created kibana/dashboards/README.md with setup prerequisites, field mappings, and import instructions for log analysis dashboards
- Ensures centralized logging infrastructure with proper correlation and lifecycle management for better observability and troubleshooting

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (4)
ROADMAP_OBSERVABILITY.md (1)

178-178: Add language identifiers to fenced code blocks (duplicate linting issue).

This is a recurrence of the same MD040 linting violation previously flagged. While the underlying issue is identical to the past review, these are NEW code blocks being added in this commit and must be fixed. Apply language identifiers to all four fenced code blocks:

  • Line 178 (ASCII architecture diagram): change the opening ``` to ```text
  • Line 244 (pseudo-code algorithm): change the opening ``` to ```text
  • Line 290 (pseudo-code algorithm): change the opening ``` to ```text
  • Line 412 (pseudo-code algorithm): change the opening ``` to ```text

Example fix for line 178:

-    ```
+    ```text
     ┌──────────────────────────────────────────────────────────────┐

Also applies to: 244-244, 290-290, 412-412

docker-compose.yml (2)

227-227: Volume mount path may not match nginx expectations.

Previous review flagged that nginx is configured to read certificates from /etc/nginx/certs but the volume mounts to /etc/certs. This path mismatch would cause nginx to fail on startup when attempting to load SSL certificates.

Verify the nginx certificate paths in the nginx configuration:

#!/bin/bash
# Check nginx.conf for certificate path references
echo "=== Nginx certificate paths ==="
rg -n "ssl_certificate|ssl_certificate_key" nginx.conf docker-start.sh Dockerfile

# Check if /etc/nginx/certs is used
echo ""
echo "=== References to /etc/nginx/certs ==="
rg -n "/etc/nginx/certs" nginx.conf docker-start.sh Dockerfile

175-175: CRITICAL: Certificate volume mount exposes committed private keys.

As flagged in previous reviews, this volume mount (./logstash/certs:/usr/share/logstash/certs) relies on private keys committed to the repository. This is a critical security vulnerability.

logstash/config/pipelines.conf (1)

7-10: CRITICAL: Private keys committed to repository.

As flagged in previous reviews, the certificate files referenced here are committed to the repository, which is a critical security vulnerability. Private keys must never be stored in version control.

🧹 Nitpick comments (12)
logstash/config/pipelines.conf (2)

33-87: Recommend explicit column selection in JDBC queries.

All four JDBC inputs use SELECT * which can cause issues:

  • Performance overhead fetching unused columns
  • Brittleness when table schemas change (new columns may break mappings)
  • Unnecessary data transfer from Oracle to Logstash

Apply this pattern to all four JDBC inputs (login_attempts, sessions, dynamic_crud, error_log):

-    statement => "SELECT * FROM AUDIT_LOGIN_ATTEMPTS WHERE created_at > :sql_last_value"
+    statement => "SELECT audit_id, created_at, user_id, username, ip_address, action, success, failure_reason, trace_id FROM AUDIT_LOGIN_ATTEMPTS WHERE created_at > :sql_last_value"

Adjust column lists based on actual table schemas and required fields for each audit table.


34-88: Consider less aggressive polling schedule for audit tables.

The cron schedule * * * * * polls Oracle every minute for all four audit tables. This may be excessive depending on audit log volume and could impact database performance.

Consider adjusting based on requirements:

  • High-priority audit events (login attempts, sessions): Keep at 1 minute
  • Lower-priority events (CRUD, errors): Reduce to */5 * * * * (every 5 minutes)
   # JDBC Input for AUDIT tables - Dynamic CRUD
   jdbc {
     # ... connection config ...
-    schedule => "* * * * *"
+    schedule => "*/5 * * * *"  # Poll every 5 minutes
     # ... rest of config ...
   }
docker-compose.yml (2)

157-181: Add healthcheck for Logstash service.

The Logstash service is missing a healthcheck, so Docker Compose cannot verify whether Logstash is ready to accept connections. This could lead to race conditions where the app tries to send logs before Logstash is up.

Add a healthcheck to the Logstash service:

   logstash:
     build:
       context: .
       dockerfile: logstash/Dockerfile
     container_name: logstash
     depends_on:
       elasticsearch:
         condition: service_healthy
       oracle:
         condition: service_healthy
     environment:
       XPACK_MONITORING_ENABLED: "false"
       ORACLE_USER: ${ORACLE_USER:-APP_USER}
       ORACLE_PASSWORD: ${ORACLE_PASSWORD:-ssfpassword}
       JAEGER_BASE_URL: ${JAEGER_BASE_URL:-http://jaeger:16686}
     volumes:
       - ./logstash/config/pipelines.conf:/usr/share/logstash/pipeline/logstash.conf
       - ./logstash/certs:/usr/share/logstash/certs
     ports:
       - "5001:5001"
       - "5044:5044"
     networks:
       - oracle-network
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:9600/_node/stats"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s

85-85: Consider simplifying Elasticsearch healthcheck.

The current regex grep -q '\"status\":\\s*\"\\(green\\|yellow\\)\"' is complex and fragile. A simpler pattern using grep -E would be more maintainable (Elasticsearch's cluster-health response is compact JSON, with no whitespace after the colon).

-      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -q '\"status\":\\s*\"\\(green\\|yellow\\)\"'"]
+      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -qE '\"status\":\"(green|yellow)\"'"]
monitoring/elasticsearch/README.md (1)

171-176: Add language specifiers to code blocks.

Two code blocks are missing language specifiers which reduces syntax highlighting and IDE support.

-```
+```plaintext
 ┌─────────────────┐     ┌────────────────┐     ┌──────────────────┐
 │  Spring Boot    │────▶│   Logstash     │────▶│  Elasticsearch   │
 │  (TCP 5001)     │     │  (pipelines)   │     │  (logs-app-*)    │
 └─────────────────┘     └────────────────┘     └──────────────────┘
 ...
 ```

And:

-```
+```text
 {JAEGER_BASE_URL}/trace/{trace_id}
 ```
monitoring/elasticsearch/setup-elasticsearch.sh (2)

42-42: Shellcheck warning: declare and assign separately.

Shellcheck flags SC2155 because combining local declaration with command substitution can mask the command's return value, potentially hiding errors.

-  local body=$(echo "$response" | sed '$d')
+  local body
+  body=$(echo "$response" | sed '$d')

However, given that echo and sed rarely fail, and the script has other error handling via HTTP status codes, this is a low-priority refinement.


118-127: Consider validating JSON files before applying.

The script applies JSON files without validating their syntax first. Invalid JSON will make Elasticsearch reject the request with a parse error, but pre-validation would provide clearer error messages.

 if [ -f "$ILM_DIR/ssf-logs-policy.json" ]; then
+  # Validate JSON syntax
+  if ! jq empty "$ILM_DIR/ssf-logs-policy.json" 2>/dev/null; then
+    echo "ERROR: Invalid JSON in $ILM_DIR/ssf-logs-policy.json"
+    exit 1
+  fi
+  
   curl -X PUT "$ELASTICSEARCH_URL/_ilm/policy/ssf-logs-policy" \
     -H "Content-Type: application/json" \
     -d @"$ILM_DIR/ssf-logs-policy.json"

Apply the same pattern to template files (lines 139-163).

monitoring/elasticsearch/templates/logs-app-template.json (3)

4-10: Document field limit and dynamic template interaction.

The combination of a 2,000-field limit (line 9) with ignore_dynamic_beyond_limit: true (line 10) and a dynamic string-to-keyword template (lines 13–23) could silently drop fields if the template generates too many keyword variants. This creates a latent risk of undetected data loss.

Consider documenting the expected field cardinality or adjusting the limit if you anticipate high-cardinality string fields (e.g., many unique logger names or tags).
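
To gauge how close a mapping sits to the limit, a rough jq-based count can help. A hedged sketch: the sample mapping below is illustrative, and the filter counts only typed leaves (including multi-field variants), so object containers that also count toward the Elasticsearch limit are omitted — treat the result as a lower bound.

```shell
# Count typed fields (leaves plus multi-field variants) in a mapping to
# estimate headroom against index.mapping.total_fields.limit.
cat > /tmp/sample-mapping.json <<'EOF'
{
  "properties": {
    "message": { "type": "text",
                 "fields": { "keyword": { "type": "keyword" } } },
    "status":  { "type": "integer" }
  }
}
EOF
# Recursively visit every object and keep those that declare a "type".
jq '[.. | objects | select(has("type"))] | length' /tmp/sample-mapping.json
```

Running the same filter against `logs-app-template.json` shows how much of the 2,000-field budget the static mappings already consume before any dynamic keyword variants are added.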


29-37: Inconsistent ignore_above values between similar text fields.

message.keyword uses ignore_above: 32766 (Lucene maximum), while exception_message.keyword uses ignore_above: 2048. This inconsistency means exception messages are truncated at 2,048 bytes whereas log messages are preserved up to the Lucene limit. Consider aligning these values or documenting the intentional difference.

         "exception_message": {
           "type": "text",
           "fields": {
             "keyword": {
               "type": "keyword",
-              "ignore_above": 2048
+              "ignore_above": 32766
             }
           }
         },

Also applies to: 74-82


29-37: Consider specifying an explicit analyzer for text fields.

The message, exception_message, and stacktrace fields are typed as text but do not specify an analyzer, so they will use Elasticsearch's default analyzer. For optimal search and analysis of log content, consider explicitly specifying an analyzer (e.g., standard, simple, or a custom analyzer tuned for your logs).

         "message": {
           "type": "text",
+          "analyzer": "standard",
           "fields": {
             "keyword": {
               "type": "keyword",
               "ignore_above": 32766
             }
           }
         },

Also applies to: 74-82, 83-85

src/main/java/com/rcs/ssf/http/filter/TraceIdFilter.java (2)

133-147: Consider propagating trace/span IDs into TraceContextHolder as well

You already extract traceId/spanId from the current span and set requestId/userId into TraceContextHolder, but TraceContextHolder in src/main/java/com/rcs/ssf/http/filter/TraceContextHolder.java also exposes setTraceId/setSpanId. For service-layer code that relies on TraceContextHolder instead of MDC, it may be useful to populate those too.

Example adjustment:

-        Span currentSpan = Span.current();
-        
+        Span currentSpan = Span.current();
+
         // Extract trace_id from OpenTelemetry span context
         String traceId = null;
         String spanId = null;
         if (currentSpan != null && currentSpan.getSpanContext().isValid()) {
             traceId = currentSpan.getSpanContext().getTraceId();
             spanId = currentSpan.getSpanContext().getSpanId();
         }
         
         try {
             // Store in ThreadLocal for propagation to services
             TraceContextHolder.setRequestId(requestId);
             TraceContextHolder.setUserId(userId);
+            if (traceId != null) {
+                TraceContextHolder.setTraceId(traceId);
+            }
+            if (spanId != null) {
+                TraceContextHolder.setSpanId(spanId);
+            }

This keeps logs and TraceContextHolder in sync for downstream consumers.


147-162: MDC population looks good; optionally avoid repeated hashing of userId

The MDC fields and PII handling are consistent with includePii, and the explicit cleanup in finally matches these keys. To reduce duplicate work and centralize the decision, you could compute the display form of userId once and reuse it for MDC and logging:

-            MDC.put("request_id", requestId);
-            if (traceId != null) {
-                MDC.put("trace_id", traceId);
-            }
-            if (spanId != null) {
-                MDC.put("span_id", spanId);
-            }
-            if (userId != null && !userId.isEmpty()) {
-                // Respect includePii setting for MDC logs
-                MDC.put("user_id", includePii ? userId : hashPii(userId));
-            } else {
-                MDC.put("user_id", "anonymous");
-            }
+            MDC.put("request_id", requestId);
+            if (traceId != null) {
+                MDC.put("trace_id", traceId);
+            }
+            if (spanId != null) {
+                MDC.put("span_id", spanId);
+            }
+            String mdcUserId;
+            if (userId != null && !userId.isEmpty()) {
+                mdcUserId = includePii ? userId : hashPii(userId);
+            } else {
+                mdcUserId = "anonymous";
+            }
+            MDC.put("user_id", mdcUserId);

Not required for correctness, but it simplifies the flow and ensures a single decision point for the user identifier.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99d6ea3 and d33ef62.

📒 Files selected for processing (12)
  • ROADMAP_OBSERVABILITY.md (1 hunks)
  • docker-compose.yml (3 hunks)
  • kibana/dashboards/README.md (1 hunks)
  • kibana/dashboards/log-analysis.ndjson (1 hunks)
  • logstash/config/pipelines.conf (1 hunks)
  • monitoring/elasticsearch/README.md (1 hunks)
  • monitoring/elasticsearch/ilm/ssf-logs-policy.json (1 hunks)
  • monitoring/elasticsearch/setup-elasticsearch.sh (1 hunks)
  • monitoring/elasticsearch/templates/logs-app-template.json (1 hunks)
  • monitoring/elasticsearch/templates/logs-audit-template.json (1 hunks)
  • monitoring/elasticsearch/templates/logs-nginx-template.json (1 hunks)
  • src/main/java/com/rcs/ssf/http/filter/TraceIdFilter.java (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • kibana/dashboards/log-analysis.ndjson
🧰 Additional context used
🧬 Code graph analysis (1)
src/main/java/com/rcs/ssf/http/filter/TraceIdFilter.java (1)
src/main/java/com/rcs/ssf/http/filter/TraceContextHolder.java (1)
  • TraceContextHolder (23-221)
🪛 LanguageTool
kibana/dashboards/README.md

[uncategorized] ~48-~48: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ion message | | stacktrace | text | Full stack trace | ### Index Template Snippet Ap...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

monitoring/elasticsearch/README.md

[style] ~98-~98: To elevate your writing, try using a synonym like ‘required’ here. Or, to avoid using the passive voice, try replacing the past participle ‘needed’ with an adjective.
Context: ...isting Data If shard count changes are needed for existing indices: ```bash # 1. Cre...

(IS_NEEDED_NECESSARY)

🪛 markdownlint-cli2 (0.18.1)
ROADMAP_OBSERVABILITY.md

178-178: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


244-244: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


290-290: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


412-412: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

monitoring/elasticsearch/README.md

171-171: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


176-176: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
monitoring/elasticsearch/setup-elasticsearch.sh

[warning] 42-42: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (10)
kibana/dashboards/README.md (1)

1-368: Excellent comprehensive documentation for Kibana dashboard setup.

This documentation thoroughly covers all aspects of the dashboard setup including prerequisites, field mappings, trace correlation with Jaeger, sample log documents, and troubleshooting. The inclusion of concrete examples (JSON snippets, curl commands, code references) makes it actionable for users.

monitoring/elasticsearch/templates/logs-nginx-template.json (1)

1-78: LGTM: Well-configured nginx log template.

The field type mappings are appropriate for nginx access/error logs:

  • client_ip as ip type enables CIDR queries
  • status as integer for HTTP status codes
  • bytes as long handles large response sizes
  • Timing fields (response_time, upstream_response_time, request_time) as float are standard for metrics
  • user_agent with dual text/keyword mapping enables both full-text search and aggregations

The template aligns well with the other log templates in this PR and uses the same ILM policy for consistent lifecycle management.

monitoring/elasticsearch/templates/logs-audit-template.json (1)

1-118: LGTM: Well-structured audit log template.

The template configuration is well-suited for audit logs with appropriate design choices:

  • 3 shards provide higher query parallelism for compliance queries (vs. 2 for app/nginx logs)
  • Dynamic template defaulting strings to keywords is appropriate for structured audit data
  • Field type selections are correct: ip for IP addresses, boolean for success/failure, text with keyword subfields for error messages enabling both search and aggregations
  • Higher ignore_above limit (2048) for error messages vs. user agents (512) is reasonable given error message verbosity

Aligns well with the ILM policy and other templates in this observability stack.

monitoring/elasticsearch/README.md (1)

1-241: Excellent comprehensive Elasticsearch setup documentation.

This documentation provides thorough coverage of the observability stack setup including ILM policy details, index templates, shard sizing rationale with concrete numbers, trace correlation with Jaeger, and comprehensive troubleshooting guidance. The inclusion of the log flow architecture diagram and concrete sizing guidelines makes this highly actionable.

monitoring/elasticsearch/ilm/ssf-logs-policy.json (1)

1-53: LGTM: Well-designed ILM policy following Elasticsearch best practices.

The four-phase lifecycle is appropriately configured:

  • Hot phase: 30-day/50GB rollover with high priority (100) ensures timely indexing
  • Warm phase: Shrink to 1 shard and forcemerge optimize for query performance while reducing cluster overhead
  • Cold phase: 0 replicas and low priority (0) minimize storage costs for rarely-accessed logs
  • Delete phase: 365-day retention aligns with typical compliance requirements

The policy correctly references MinIO for long-term archival beyond 365 days, establishing a clear offloading strategy.

monitoring/elasticsearch/setup-elasticsearch.sh (1)

1-219: Well-structured setup script with good error handling.

The script demonstrates several good practices:

  • Idempotent resource creation with proper error handling
  • Elasticsearch readiness polling with configurable retries
  • Clear status messaging and comprehensive verification
  • Graceful handling of existing resources (resource_already_exists_exception)

The script correctly provisions the ILM policy, index templates, and initial indices with proper aliases for the write indices.

src/main/java/com/rcs/ssf/http/filter/TraceIdFilter.java (4)

9-16: Imports for MDC and UTF‑8 hashing are appropriate

MDC and StandardCharsets imports align with the new logging and hashing logic and are correctly scoped.


166-168: Exposing X-Trace-ID header is a solid addition

Conditionally setting X-Trace-ID when a valid trace is present is a good way to let clients correlate logs/traces without leaking internals when tracing is disabled.


205-214: Status code attribution and MDC cleanup in finally are correct

Recording http.status_code in the span within finally ensures it’s captured even on errors, and the explicit MDC.remove(...) calls for each key you set (request_id, trace_id, span_id, user_id) correctly prevent cross-request leakage on reused threads.


228-237: UTF‑8 explicit encoding in hashPii is a good portability fix

Using combined.getBytes(StandardCharsets.UTF_8) makes the SHA‑256 hashing deterministic across environments and avoids platform-default charset pitfalls; the rest of the method’s behavior and error handling remain intact.

@zlovtnik zlovtnik merged commit 2351272 into main Nov 25, 2025
3 of 5 checks passed