
feat/ add log forwarding in HadesLogManager and ContinueOnError step support #360

Open
paoxin wants to merge 30 commits into main from feat/hades-artemis-adapter

Conversation

@paoxin
Contributor

@paoxin paoxin commented Jan 26, 2026

✨ What is the change?

  • Job UUID is now shared across containers and steps in Docker mode.
  • HadesLogManager now publishes the aggregated logs to the configured URL (i.e., to the adapter).
  • Added a new ContinueOnError field to the HadesAPI Step struct, which tells the Scheduler whether to continue with the next step if the current step fails.
  • Added a Dockerfile for HadesLogManager and updated the Docker Compose files accordingly.
  • Added a new Bruno request to test the build trigger, including a result-parse step (for Artemis).

These changes are made in accordance with the adapter: https://github.com/ls1intum/hades-artemis-adapter

📌 Reason for the change / Link to issue

Closes #269 #261 #270

🧪 Steps for Testing

  1. Start Hades
  2. Start Adapter from here https://github.com/ls1intum/hades-artemis-adapter
  3. Start Artemis (with the hades profile) from this branch
  4. Trigger a build by pushing to the repository or clicking the rebuild button in Artemis

✅ PR Checklist

  • I have tested these changes locally or on the dev environment
  • Code is clean, readable, and documented
  • Tests added or updated (if needed)
  • Documentation updated (if relevant)

Summary by CodeRabbit

  • New Features

    • Added HadesLogManager service with HTTP endpoints for retrieving job logs and status information.
    • Introduced step-level error handling with configurable continue-on-error capability for build workflows.
  • Documentation

    • Updated configuration documentation for environment variables and setup instructions.
  • Chores

    • Updated dependencies and container infrastructure configurations.

@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces HadesLogManager service for monitoring job logs and syncing them to an external API endpoint. Adds per-job watcher lifecycle synchronization via WaitGroup, HTTP endpoints for log/status retrieval, step-level error handling with optional continuation on failure, and deploys the service across Docker Compose environments with necessary configuration and CI/CD integration.

Changes

  • HadesLogManager Core Service (HadesLogManager/log_subscriber.go, HadesLogManager/main.go, HadesLogManager/processor.go): Implements per-job watcher synchronization, HTTP API routes for job logs/status, and SendJobLogs/GetJobStatus methods that marshal logs to JSON and POST them via HTTP with a 10s timeout to the configured API endpoint.
  • HadesLogManager Containerization (HadesLogManager/Dockerfile, HadesLogManager/go.mod): Multi-stage Docker build using a golang:1.25-alpine builder; module path updated to github.com/ls1intum/hades/hadesLogManager; dependency versions bumped (nats.go v1.48→v1.49, env v11.3.1→v11.4.0).
  • HadesScheduler Error Handling (HadesScheduler/docker/job.go, HadesScheduler/docker/step.go): Adds per-step UUID injection and step-level error accumulation via errors.Join; honors the ContinueOnError flag to continue the workflow on failure. Refactors container cleanup to a defer pattern.
  • Job Workflow Configuration (shared/payload/payload.go): Introduces a ContinueOnError bool field on the Step struct (JSON: continue_on_error), enabling step-level failure recovery policies.
  • Service Deployment (compose.yml, docker-compose.deploy.yml, docker-compose.dev.yml, docker-compose.k8s.yml): Adds the hadesLogManager service to all environments with appropriate port mappings, environment variables (HADESLOGMANAGER_API_PORT, API_ENDPOINT), network attachment, and Traefik routing labels for production.
  • Configuration & Documentation (.env.example, Readme.md): Renames API_PORT to HADESAPI_API_PORT; adds a HadesLogManager options block with HADESLOGMANAGER_API_PORT and API_ENDPOINT; updates the documentation reference.
  • API Test Specifications (docs/api/Create Build Job (Test Succeed) with Result Parse Step.bru, docs/api/Create Build Job (Test Fail) with Result Parse Step.bru): Documents multi-step build job workflows including Clone, Execute (with CONTINUE_ON_ERROR support), and Parse Results steps with shared volume mounts and metadata configuration.
  • CI/CD Pipeline (.github/workflows/ci.yml, HadesScheduler/HadesOperator/go.mod): Adds a build-log-manager job to CI using the shared build-and-push-docker-image v1.2.0; updates HadesOperator dependencies (nats.go, env, crypto, sys, etc.).

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as HadesScheduler
    participant LogMgr as HadesLogManager
    participant NATS as NATS Broker
    participant API as Artemis API

    Scheduler->>NATS: Publish job.running event
    NATS->>LogMgr: Receive job.running
    LogMgr->>LogMgr: Create watcher with WaitGroup
    LogMgr->>NATS: Subscribe to job logs
    
    loop Streaming Logs
        NATS->>LogMgr: Publish log entries
        LogMgr->>LogMgr: Accumulate logs in memory
    end
    
    Scheduler->>NATS: Publish job.completed event
    NATS->>LogMgr: Receive job.completed
    LogMgr->>LogMgr: Wait for watcher completion
    LogMgr->>LogMgr: Marshal logs to JSON
    LogMgr->>API: POST /adapter/logs (10s timeout)
    API->>LogMgr: 200 OK
    LogMgr->>LogMgr: Mark job completed
    LogMgr->>LogMgr: Cleanup watcher entry
sequenceDiagram
    participant Job as Job.execute()
    participant Step as Step Processing
    participant StepExec as Container Execution
    
    Job->>Job: Initialize stepErr = nil
    
    loop For each step
        Job->>Step: Get step with UUID
        Job->>StepExec: execute(ctx, step)
        StepExec->>StepExec: defer cleanup container
        alt Step Success
            StepExec-->>Job: nil
        else Step Fails
            StepExec-->>Job: error
            alt ContinueOnError = true
                Job->>Job: errors.Join(stepErr, err)
                Job->>Job: Log condition, continue
            else ContinueOnError = false
                Job->>Job: Return error immediately
            end
        end
    end
    
    Job->>Job: Return accumulated stepErr

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Logs hop through NATS to journeys new,
Where managers watch and endpoints true,
Steps can stumble, yet carry on—
Each job's tale reaches Artemis by dawn!
📜✨

🚥 Pre-merge checks | ✅ 1 | ❌ 4

❌ Failed checks (2 warnings, 2 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 63.64%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (⚠️ Warning): The PR description covers the main changes, testing steps, and issue links, but omits some critical template sections. Resolution: add a 'Screenshots' section, ensure the 'Tests added or updated' checkbox is addressed, and clarify testing requirements for different components.
  • Linked Issues check (❓ Inconclusive): Only issue #269's details are available; full requirements for #261 and #270 are not provided, making comprehensive compliance validation impossible. Resolution: provide details or objectives from all linked issues (#269, #261, #270).
  • Out of Scope Changes check (❓ Inconclusive): Without complete linked-issue requirements, it is difficult to definitively assess whether all changes align with scope; the go.mod dependency updates appear supportive but need clarification. Resolution: clarify whether the dependency version bumps in the go.mod files are intentional scope additions or side effects of other changes.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The PR title mentions 'add log forwarding in HadesLogManager and ContinueOnError step support', which accurately reflects the main changes across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

@paoxin paoxin self-assigned this Jan 26, 2026
@paoxin paoxin changed the title from HadesArtemisAdapter to feat/ add HadesArtemisAdapter on Jan 26, 2026
@paoxin paoxin marked this pull request as ready for review February 15, 2026 16:11
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
HadesLogManager/log_subscriber.go (1)

224-235: ⚠️ Potential issue | 🔴 Critical

Blocking HTTP call while holding mutex; error ignored.

SendJobLogs performs a synchronous HTTP POST (with no timeout on the client, as noted in processor.go). This call is made while dlm.mu is held, which blocks all concurrent startWatchingJobLogs and stopWatchingJobLogs calls for the duration of the network round-trip. If the adapter is down or slow, this effectively deadlocks the log manager.

Additionally, the returned error is silently discarded.

🐛 Proposed fix: move the send outside the lock and handle the error
 func (dlm *DynamicLogManager) stopWatchingJobLogs(jobID string) {
 	dlm.mu.Lock()
-	defer dlm.mu.Unlock()
-
+	var shouldSend bool
 	if watcher, exists := dlm.watchers[jobID]; exists {
 		delete(dlm.watchers, jobID)
 		slog.Info("Stopping log watch", "job_id", jobID)
 		watcher.cancel()
-
 		dlm.logAggregator.MarkJobCompleted(jobID)
-		dlm.logAggregator.SendJobLogs(jobID)
+		shouldSend = true
+	}
+	dlm.mu.Unlock()
+
+	if shouldSend {
+		if err := dlm.logAggregator.SendJobLogs(jobID); err != nil {
+			slog.Error("Failed to send job logs", "job_id", jobID, "error", err)
+		}
 	}
 }
🤖 Fix all issues with AI agents
In `@HadesArtemisAdapter/artemis_adapter.go`:
- Around line 93-122: checkAndSendIfReady can run concurrently from StoreLogs
and StoreResults and cause duplicate sends because aa.logs.Load and
aa.results.Load are not atomic with the subsequent deletes; protect the entire
check-and-send-and-delete sequence with mutual exclusion. Add a per-job mutex
map (or a single mutex) on ArtemisAdapter (e.g., a field like jobLocks
map[string]*sync.Mutex or a sync.Mutex lock) and acquire the appropriate lock at
the start of checkAndSendIfReady (or have StoreLogs/StoreResults lock per job
before calling it) so the load, conditional, sendToArtemis(results) and
aa.logs.Delete/aa.results.Delete execute under the same lock, then release the
lock after cleanup to ensure exactly-once delivery for a jobID.
- Around line 125-149: The sendToArtemis function currently logs errors but
continues, leading to nil-pointer dereferences and always returning nil; update
sendToArtemis (in artemis_adapter.go) to check and return errors immediately
after json.Marshal, http.NewRequest, and aa.httpClient.Do (wrap each error with
context, e.g., fmt.Errorf("marshal request: %w", err)), only call defer
resp.Body.Close() after confirming resp != nil, and return the error to callers
instead of nil so checkAndSendIfReady can detect failures; ensure success path
still returns nil.
- Around line 65-68: Replace the hardcoded Artemis constants with configuration
loaded from environment/config to avoid committing secrets and localhost URLs:
remove the literal artemisAuthToken and artemisBaseURL values and read them from
env/config (e.g., ARTEMIS_BASE_URL, ARTEMIS_AUTH_TOKEN) into the adapter's
config struct (same pattern as other files using `env` tags), keep
newResultEndpoint as a constant path if desired but compose full URL from the
configured base, and move requestTimeout into the config (or use a sensible
default) so no secrets or deploy-specific URLs remain in
artemisBaseURL/artemisAuthToken in artemis_adapter.go.
- Around line 78-84: StoreLogs currently assumes logs[1] exists and does
execution_logs := logs[1].Logs which panics if logs has fewer than 2 entries;
update StoreLogs to check slice length and nil/empty elements before accessing
index 1, return a clear error if execution logs are missing (or skip storing and
return nil depending on caller expectations), and rename execution_logs to
executionLogs to follow Go naming conventions; ensure you still call
aa.logs.Store(jobID, executionLogs) and return aa.checkAndSendIfReady(jobID)
only after a successful bounds-checked extraction.

In `@HadesArtemisAdapter/go.mod`:
- Line 36: The project currently pins the indirect dependency
"github.com/quic-go/quic-go v0.54.0" which has HIGH-severity CVEs; update the
dependency to v0.57.0 or later by replacing the version entry for
github.com/quic-go/quic-go in go.mod and then run go get
github.com/quic-go/quic-go@v0.57.0 (or a newer stable release) followed by go
mod tidy to refresh the module graph and vendor files; verify no other
dependency requires the older version and run your test suite to confirm
compatibility.
- Around line 1-3: Update the module declaration in go.mod to match the
repository naming convention by replacing the current module name
`Hades-ArtemisAdapter` with `github.com/ls1intum/hades/artemis-adapter`; ensure
imports across the module (packages referencing this module) are updated to the
new path if needed and keep the existing Go version (go 1.25.3) unchanged.

In `@HadesArtemisAdapter/main.go`:
- Around line 138-148: The POST /test-results handler currently calls
aa.StoreResults(newResults.UUID, newResults) and ignores its returned error, so
failures (including from sendToArtemis) still return 201; update the handler to
capture the error from StoreResults, log it (slog.Error) with context (uuid and
error), and respond with an appropriate HTTP status (e.g., 500) and error
message when non-nil; keep successful behavior unchanged (log debug and return
201 with the payload).
- Around line 100-102: The 500ms shutdown timeout in the context.WithTimeout
call (where shutdownCtx, shutdownCancel are created) is too short and will kill
in-flight requests like sendToArtemis; change the timeout to a more reasonable
value (e.g., 15–30 seconds) by replacing 500*time.Millisecond with a longer
duration (suggest 30*time.Second) or wire a configurable ShutdownTimeout
constant/flag so main.go uses the same 30s policy as HadesLogManager when
calling context.WithTimeout to create shutdownCtx and shutdownCancel.
- Around line 125-135: The handler reading newLogs via c.BindJSON can panic
because it accesses newLogs[0].JobID without checking the slice length and it
ignores the error returned by aa.StoreLogs; update the POST "/logs" handler to
validate that newLogs is non-empty after BindJSON (return a 400/422 with a
logged error if empty), use newLogs[0].JobID only after that check, call
aa.StoreLogs(...) and handle its returned error (log it and return a
500/appropriate status on failure), and ensure the response (c.IndentedJSON) is
only sent after successful storage; refer to the variables/funcs newLogs,
c.BindJSON, aa.StoreLogs, and c.IndentedJSON to locate the change.

In `@HadesLogManager/processor.go`:
- Line 17: The constant APIendpoint in processor.go is hardcoded to
"http://localhost:8082/adapter/logs" which breaks non-local deployments; change
it to be configurable via an environment-driven config (e.g., add a field to
AggregatorConfig such as LogsAPIEndpoint or create a dedicated config struct)
and replace usages of APIendpoint with that config field, initializing the value
from an environment variable (with a sensible default) during program startup.
- Around line 199-233: In SendJobLogs, stop continuing after failures: after
json.Marshal(logs) return the error (with a logged message) if err != nil; after
http.NewRequest("POST", APIendpoint, ...) return the error if err != nil instead
of proceeding to use req; after client.Do(req) check resp and err and if err !=
nil return it (after logging) and only then use resp; move defer
resp.Body.Close() immediately after the nil-check that confirms resp != nil; and
ensure SendJobLogs returns non-nil errors (propagate or wrap them) instead of
always returning nil so callers can handle failures.
🧹 Nitpick comments (4)
HadesScheduler/docker/job.go (1)

31-31: Inconsistent naming: env var "UUID" vs context key "job_id".

The environment variable is called "UUID" (line 31) while the context key is "job_id" (line 42). Consider aligning these — e.g., use "JOB_ID" (or "HADES_JOB_ID") for the env var to reduce ambiguity and avoid collisions with the very generic name "UUID".

Suggested diff
-		envs["UUID"] = d.ID.String()
+		envs["JOB_ID"] = d.ID.String()

Also applies to: 42-42

docs/api/Create Build Job (Test Succeed) with Result Parse Step.bru (1)

75-78: host.docker.internal may not resolve on Linux Docker hosts.

This hostname works out-of-the-box on Docker Desktop (macOS/Windows) but requires --add-host=host.docker.internal:host-gateway on Linux. Consider adding a comment or using a Bruno variable (e.g., {{adapter_host}}) so the endpoint is easily configurable across environments.

♻️ Suggested change
         "metadata": {
-          "API_ENDPOINT": "http://host.docker.internal:8082/adapter/test-results",
+          "API_ENDPOINT": "http://{{adapter_host}}/adapter/test-results",
           "INGEST_DIR": "/shared/example/build/test-results/test"
         }

Then add adapter_host to the vars:pre-request section:

 vars:pre-request {
   user: 
   password: 
   test_repo: https://github.com/Mtze/Artemis-Java-Test.git
   assignment_repo: https://github.com/Mtze/Artemis-Java-Solution.git
+  adapter_host: host.docker.internal:8082
 }
HadesLogManager/processor.go (1)

219-219: http.Client created per call with no timeout.

A new http.Client{} is instantiated on every SendJobLogs call with zero timeout (waits forever). Reuse a client stored on the struct and set an explicit timeout.

HadesArtemisAdapter/artemis_adapter.go (1)

70-76: ctx parameter is unused in NewAdapter.

The context.Context parameter is accepted but never used. Either remove it or wire it into the HTTP client (e.g., for cancellation support on in-flight requests).

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @.env.example:
- Line 28: The environment variable name in .env.example (ARTEMIS_ADAPTER_PORT)
doesn't match the struct tag read by AdapterConfig in
HadesArtemisAdapter/main.go (env:"API_PORT"); update .env.example to rename
ARTEMIS_ADAPTER_PORT=8082 to API_PORT=8082 so AdapterConfig picks it up, or
alternatively change the AdapterConfig struct tag from env:"API_PORT" to
env:"ARTEMIS_ADAPTER_PORT" to keep the current .env.example—choose one
consistent fix and apply it to ensure the env name and AdapterConfig match.

In `@HadesArtemisAdapter/artemis_adapter.go`:
- Around line 78-90: StoreLogs currently logs an Error but proceeds to store an
empty executionLogs and calls checkAndSendIfReady, causing results to be sent
without logs; update StoreLogs to not proceed when execution logs are missing:
inside StoreLogs (function StoreLogs) when len(logs) < 2, either (preferred)
return a descriptive error (e.g., fmt.Errorf("missing execution logs for job
%s", jobID)) and do not call aa.logs.Store or checkAndSendIfReady, keeping or
strengthening the Error log, or if degraded behavior is intentional change the
slog.Error to slog.Warn and return nil as before; ensure the
aa.logs.Store(jobID, ...) and call to checkAndSendIfReady(jobID) only happen in
the successful branch where executionLogs is populated.

In `@HadesArtemisAdapter/main.go`:
- Around line 131-136: The error message in the /logs handler is a copy-paste:
when binding into newLogs (type []buildlogs.Log) the slog.Error call uses
"Failed to bind test results JSON"; update that log to accurately reflect this
handler (e.g., "Failed to bind logs JSON") so the message matches the endpoint
and variable names (jobs.POST("/logs" handler, newLogs, c.BindJSON, slog.Error).
- Around line 34-35: LoadConfig's returned error is being ignored which can
leave AdapterConfig (cfg) zero-valued; change the call to capture and handle the
error from utils.LoadConfig(&cfg) in main: call err := utils.LoadConfig(&cfg)
and if err != nil log the error (including err) and exit (or return) so we don't
continue with empty fields like ArtemisBaseURL or ArtemisAuthToken; reference
symbols: AdapterConfig, utils.LoadConfig, cfg, ArtemisBaseURL, ArtemisAuthToken.

In `@HadesLogManager/processor.go`:
- Around line 202-227: SendJobLogs currently treats any HTTP response as success
because it never checks resp.StatusCode; update SendJobLogs to inspect
resp.StatusCode after client.Do(req) and treat non-2xx codes as errors by
reading resp.Body (to include adapter error details) and returning a formatted
error (e.g., include status code and body). Keep the existing defer
resp.Body.Close(), but add a small read of resp.Body (use ioutil.ReadAll or
io.ReadAll) for the error case and return fmt.Errorf with both resp.StatusCode
and the response body string; only return nil when resp.StatusCode is in the
200–299 range.
🧹 Nitpick comments (4)
.env.example (1)

34-34: Double quotes inside the value will be treated as literal characters by some dotenv loaders.

The godotenv library (used via shared/utils/config.go) strips surrounding quotes, so this specific case should be fine. However, the static analysis hint flags this as potentially confusing. For consistency with other unquoted entries (like line 29–31), consider removing the quotes.

Proposed fix
-API_ENDPOINT="http://localhost:8082/adapter/logs"
+API_ENDPOINT=http://localhost:8082/adapter/logs
HadesLogManager/processor.go (2)

209-209: Logging the full JSON payload at Debug level can be expensive for large log sets.

string(jsonData) forces allocation of the entire serialized payload into a log line. Consider logging only the byte length instead, or gating behind a more verbose level.

Proposed fix
-	slog.Debug("Parsed logs to JSON", "json", string(jsonData))
+	slog.Debug("Marshaled logs to JSON", "job_id", jobID, "bytes", len(jsonData))

50-56: Minor: APIendpoint field should follow Go naming conventions → APIEndpoint.

Go convention treats acronyms as single words (API), and Endpoint is a separate word, so the idiomatic name is APIEndpoint.

HadesArtemisAdapter/artemis_adapter.go (1)

69-76: ctx parameter is accepted but unused.

NewAdapter accepts a context.Context but neither stores it nor passes it onward. Either use it (e.g., for request contexts in sendToArtemis via http.NewRequestWithContext) or remove it to avoid a misleading API.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.env.example:
- Around line 27-28: The .env file defines API_PORT twice which will cause one
service to be overwritten; rename the second key to a unique name like
ARTEMIS_ADAPTER_PORT (or ARTEMIS_API_PORT) in .env.example and then update the
AdapterConfig struct tag in HadesArtemisAdapter/main.go to read that new env key
(e.g., `env:"ARTEMIS_ADAPTER_PORT"`), adjusting any references in
HadesArtemisAdapter code that read the old API_PORT env var to use the new
field/name.

In `@HadesArtemisAdapter/artemis_adapter.go`:
- Around line 124-159: The checkAndSendIfReady path leaves entries in aa.logs,
aa.results, and aa.jobLocks when only one side arrives; add TTL eviction similar
to LogManager.cleanupCompletedJobs: augment stored entries (or add parallel
maps) with a timestamp when first stored, add a configurable timeout to the
ArtemisAdapter config, and spawn a background goroutine (e.g., startCleanupLoop)
that periodically scans aa.logs/aa.results/aa.jobLocks under proper locking and
deletes entries older than the timeout; ensure deletions use the same job lock
semantics (delete jobLocks only after acquiring it or using a separate safe
removal method) and remove both log/result/lock entries together to avoid leaks.
🧹 Nitpick comments (3)
.env.example (1)

33-34: Same duplication issue: API_ENDPOINT is a generic name.

API_ENDPOINT is overly generic for a flat .env file and risks collisions as more services are added. Consider a scoped name like LOG_MANAGER_API_ENDPOINT (and update AggregatorConfig in HadesLogManager/processor.go accordingly).

HadesArtemisAdapter/artemis_adapter.go (2)

95-106: StoreLogs unconditionally proceeds with empty logs when len(logs) < 2.

When execution logs are missing (< 2 entries), the function still stores empty logs and triggers checkAndSendIfReady. This means results will be sent to Artemis with no build logs attached. If this degraded behavior is intentional, it's fine as-is (and the Warn level is appropriate). If missing execution logs should block sending, return an error instead.

Also, the hardcoded index logs[1] assumes a specific ordering convention for the log entries. Consider documenting this contract or using a more robust lookup (e.g., by container ID or step name).


19-25: Consider documenting the sync.Map value types more precisely.

The inline comments say -> results and -> []buildlogs.LogEntry but the actual stored types could benefit from being explicitly named (e.g., -> ResultDTO). Minor readability nit.

Contributor

@coderabbitai coderabbitai bot left a comment


🤖 Fix all issues with AI agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.

In @.env.example:
- Around line 29-31: Add clear inline comments next to the ARTEMIS_* entries to
explain each variable's purpose and expected format: annotate ARTEMIS_BASE_URL
as the Artemis server root URL (include scheme and port),
ARTEMIS_NEW_RESULT_ENDPOINT as the relative API path used by the adapter to post
new programming exercise results, and ARTEMIS_AUTH_TOKEN as the bearer/token
used to authenticate requests (mark it sensitive and suggest using a secure
secret store). Keep the existing example values but append brief guidance on
format and security for each variable.
- Line 35: Add an inline comment next to the API_ENDPOINT entry to clarify its
purpose: indicate that API_ENDPOINT is the URL where the Log Manager (or
adapter) sends job logs/telemetry (e.g., "Log Manager/adapter log ingestion
endpoint for job logs"). Update the .env.example line containing API_ENDPOINT to
include that brief descriptive comment so users understand its role when
configuring environments.

In `@HadesLogManager/processor.go`:
- Line 240: Remove the leftover commented-out duplicate function signature for
GetJobLogs; delete the line "// func (la *NATSLogAggregator) GetJobLogs(jobID
string) []buildlogs.Log {" (the commented signature that duplicates the actual
GetJobLogs method) so only the real implementation remains in the
NATSLogAggregator type.
- Around line 68-78: NewLogAggregator currently doesn't validate
AggregatorConfig.APIendpoint causing runtime failures; change NewLogAggregator
to return (LogAggregator, error), check that config.APIendpoint (and any other
required fields) is non-empty at the start, and return a descriptive error if
validation fails; update callers to handle the (LogAggregator, error) return and
keep starting the cleanup goroutine only after successful validation and
creation of the NATSLogAggregator instance (referencing NewLogAggregator,
NATSLogAggregator, AggregatorConfig.APIendpoint, and SendJobLogs).
- Line 55: Field APIendpoint on the config struct is missing a default and
startup validation and should be renamed to APIEndpoint (Go acronym convention);
update the struct tag to include an envDefault for API_ENDPOINT or validate in
NewLogAggregator (check la.cfg.APIendpoint / la.cfg.APIEndpoint) and return an
error or log a clear warning if empty, and ensure SendJobLogs checks for a
non-empty APIEndpoint before attempting requests to avoid silent failures.
🧹 Nitpick comments (5)
HadesLogManager/processor.go (3)

240-240: Remove commented-out duplicate function signature.

Line 240 is a leftover commented-out signature identical to line 241. Clean it up.

🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@HadesLogManager/processor.go` at line 240, Remove the leftover commented-out
duplicate function signature for GetJobLogs; delete the line "// func (la
*NATSLogAggregator) GetJobLogs(jobID string) []buildlogs.Log {" (the commented
signature that duplicates the actual GetJobLogs method) so only the real
implementation remains in the NATSLogAggregator type.

68-78: No validation that required config fields are set.

NewLogAggregator doesn't validate config.APIendpoint. If it's empty, SendJobLogs will fail at runtime with a potentially confusing error. An early check here would surface misconfiguration at startup.

Suggested direction
 func NewLogAggregator(ctx context.Context, hlc *buildlogs.HadesLogConsumer, config AggregatorConfig) LogAggregator {
+	if config.APIendpoint == "" {
+		slog.Warn("API_ENDPOINT is not configured; SendJobLogs will fail")
+	}
 	la := &NATSLogAggregator{
🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@HadesLogManager/processor.go` around lines 68 - 78, NewLogAggregator
currently doesn't validate AggregatorConfig.APIendpoint causing runtime
failures; change NewLogAggregator to return (LogAggregator, error), check that
config.APIendpoint (and any other required fields) is non-empty at the start,
and return a descriptive error if validation fails; update callers to handle the
(LogAggregator, error) return and keep starting the cleanup goroutine only after
successful validation and creation of the NATSLogAggregator instance
(referencing NewLogAggregator, NATSLogAggregator, AggregatorConfig.APIendpoint,
and SendJobLogs).

55-55: APIendpoint has no default and no startup validation.

If API_ENDPOINT is unset, la.cfg.APIendpoint will be an empty string, and SendJobLogs will silently fail with an HTTP request error. Consider either providing an envDefault or validating at startup in NewLogAggregator (e.g., log a warning or return an error).

Minor nit: Go convention for acronyms is fully capitalized — APIEndpoint rather than APIendpoint.

🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@HadesLogManager/processor.go` at line 55, Field APIendpoint on the config
struct is missing a default and startup validation and should be renamed to
APIEndpoint (Go acronym convention); update the struct tag to include an
envDefault for API_ENDPOINT or validate in NewLogAggregator (check
la.cfg.APIendpoint / la.cfg.APIEndpoint) and return an error or log a clear
warning if empty, and ensure SendJobLogs checks for a non-empty APIEndpoint
before attempting requests to avoid silent failures.
.env.example (2)

29-31: Add inline comments to clarify the purpose of these Artemis configuration variables.

Lines 29-31 lack inline comments explaining their purpose, which could confuse users configuring the adapter. Consider adding descriptive comments similar to line 28.

📝 Suggested inline comments
-ARTEMIS_BASE_URL=http://localhost:8080
-ARTEMIS_NEW_RESULT_ENDPOINT=api/programming/public/programming-exercises/new-result
-ARTEMIS_AUTH_TOKEN=superduperlongtestingsecrettoken
+ARTEMIS_BASE_URL=http://localhost:8080 # Base URL of the Artemis instance to send results to
+ARTEMIS_NEW_RESULT_ENDPOINT=api/programming/public/programming-exercises/new-result # Artemis API endpoint path for posting new results (default: api/programming/public/programming-exercises/new-result)
+ARTEMIS_AUTH_TOKEN=superduperlongtestingsecrettoken # Authentication token for Artemis API (replace with your actual token)
🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In @.env.example around lines 29 - 31, Add clear inline comments next to the
ARTEMIS_* entries to explain each variable's purpose and expected format:
annotate ARTEMIS_BASE_URL as the Artemis server root URL (include scheme and
port), ARTEMIS_NEW_RESULT_ENDPOINT as the relative API path used by the adapter
to post new programming exercise results, and ARTEMIS_AUTH_TOKEN as the
bearer/token used to authenticate requests (mark it sensitive and suggest using
a secure secret store). Keep the existing example values but append brief
guidance on format and security for each variable.

35-35: Add an inline comment to clarify the purpose of API_ENDPOINT.

Line 35 lacks an inline comment explaining what this endpoint is used for. Consider adding a descriptive comment to help users understand this is where the Log Manager sends job logs.

📝 Suggested inline comment
-API_ENDPOINT=http://localhost:8082/adapter/logs
+API_ENDPOINT=http://localhost:8082/adapter/logs # Endpoint where the Log Manager sends job logs (should point to HadesArtemisAdapter)
🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In @.env.example at line 35, Add an inline comment next to the API_ENDPOINT
entry to clarify its purpose: indicate that API_ENDPOINT is the URL where the
Log Manager (or adapter) sends job logs/telemetry (e.g., "Log Manager/adapter
log ingestion endpoint for job logs"). Update the .env.example line containing
API_ENDPOINT to include that brief descriptive comment so users understand its
role when configuring environments.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In @.env.example:
- Around line 27-35: API_ENDPOINT in the .env example conflicts with the
HADESLOGMANAGER_ prefix convention; rename it to HADESLOGMANAGER_API_ENDPOINT in
the .env file and update any references: change struct tag / env lookup in the
HadesLogManager config struct (e.g., the field that currently maps to
API_ENDPOINT) to use "HADESLOGMANAGER_API_ENDPOINT" and update compose files,
startup/env parsing, and any code reading API_ENDPOINT (e.g., in HadesLogManager
init/load functions) to use the new name so the service-specific prefix stays
consistent.
- Line 31: Replace the example secret value for ARTEMIS_AUTH_TOKEN with an
obviously-placeholder string or leave it empty so it cannot be mistaken for a
real token; update the line containing ARTEMIS_AUTH_TOKEN in .env.example to
something like ARTEMIS_AUTH_TOKEN=<your-artemis-auth-token> or
ARTEMIS_AUTH_TOKEN= so contributors must explicitly fill in a real token.

In `@compose.yml`:
- Around line 50-55: The hadesLogManager service currently only depends on
hadesAPI and may start before NATS is ready; update the compose service
definition for hadesLogManager to add a dependency on the nats service with
condition: service_healthy so DynamicLogManager/StartListening will find NATS up
before the log manager starts; keep the existing hadesAPI dependency and add the
nats entry under depends_on with condition: service_healthy.
- Around line 40-55: The hadesLogManager service block is incorrectly indented
under hadesScheduler; outdent the hadesLogManager: block so it is a sibling of
hadesScheduler under the top-level services: section (match the same 2-space
indentation used for other service keys), ensuring its image, build, ports,
networks, depends_on, and environment keys are nested under hadesLogManager with
consistent indentation so YAML parsers treat it as a separate service.
- Around line 66-68: Remove the unnecessary depends_on entry for hadesAPI under
the hadesArtemisAdapter service in compose.yml: delete the hadesAPI block (and
its condition: service_started) so hadesArtemisAdapter no longer waits for
hadesAPI to start; if hadesArtemisAdapter actually requires another service
(e.g., nats) replace hadesAPI with that real dependency or remove depends_on
entirely if there are no runtime service dependencies.

In `@docker-compose.deploy.yml`:
- Around line 38-62: The new docker-compose services hadesLogManager and
hadesArtemisAdapter are routed through Traefik without any authentication
middleware; update their labels to attach the same or appropriate Traefik auth
middleware used by hadesAPI (e.g., add labels referencing an existing basicauth
middleware like traefik.http.routers.hadeslogmanager.middlewares and
traefik.http.routers.hadesadapter.middlewares or create new middleware labels),
or alternatively implement/apply app-level Basic Auth using the same AUTH_KEY
mechanism; ensure you modify the labels block for hadesLogManager and
hadesArtemisAdapter (the traefik.* label entries) to include middleware
references consistent with secured routes.

In `@docker-compose.dev.yml`:
- Around line 27-34: The hadesArtemisAdapter service in docker-compose.dev.yml
omits required environment variables used at runtime; update the
hadesArtemisAdapter environment block to declare ARTEMIS_NEW_RESULT_ENDPOINT and
ARTEMIS_AUTH_TOKEN (using variable substitution like
${ARTEMIS_NEW_RESULT_ENDPOINT} and ${ARTEMIS_AUTH_TOKEN}) alongside the existing
ARTEMIS_BASE_URL so developers know to set them in .env or their environment
before running the service.

In `@HadesArtemisAdapter/Dockerfile`:
- Around line 21-30: Add a non-root user to the runtime stage of the Dockerfile
and switch to it before the CMD: create or use a non-root user/group (referenced
as USER nonroot:nonroot), ensure ownership of /app (WORKDIR) and the copied
binary (COPY --from=builder /app/HadesArtemisAdapter/hades-artemis-adapter .) is
set appropriately, then add the USER nonroot:nonroot instruction prior to CMD
["/app/hades-artemis-adapter"] so the container runs without root privileges.

In `@HadesLogManager/Dockerfile`:
- Around line 21-30: Switch the runtime image to run as a non-root user by
either using the distroless nonroot variant or adding an explicit USER and
making the binary owned/executable by that user: replace FROM
gcr.io/distroless/static-debian12 with the nonroot variant (e.g.,
gcr.io/distroless/static-debian12:nonroot) or add USER nonroot after WORKDIR
/app, and ensure the copied artifact (from COPY --from=builder
/app/HadesLogManager/hades-log-manager .) is owned by the non-root UID (use COPY
--from=builder --chown=65532:65532 ... or chown the file in the builder stage)
so CMD ["/app/hades-log-manager"] runs without root privileges.
🧹 Nitpick comments (5)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed.


In @.env.example:
- Around line 27-35: API_ENDPOINT in the .env example conflicts with the
HADESLOGMANAGER_ prefix convention; rename it to HADESLOGMANAGER_API_ENDPOINT in
the .env file and update any references: change struct tag / env lookup in the
HadesLogManager config struct (e.g., the field that currently maps to
API_ENDPOINT) to use "HADESLOGMANAGER_API_ENDPOINT" and update compose files,
startup/env parsing, and any code reading API_ENDPOINT (e.g., in HadesLogManager
init/load functions) to use the new name so the service-specific prefix stays
consistent.
- Line 31: Replace the example secret value for ARTEMIS_AUTH_TOKEN with an
obviously-placeholder string or leave it empty so it cannot be mistaken for a
real token; update the line containing ARTEMIS_AUTH_TOKEN in .env.example to
something like ARTEMIS_AUTH_TOKEN=<your-artemis-auth-token> or
ARTEMIS_AUTH_TOKEN= so contributors must explicitly fill in a real token.

In `@docker-compose.deploy.yml`:
- Around line 38-62: The new docker-compose services hadesLogManager and
hadesArtemisAdapter are routed through Traefik without any authentication
middleware; update their labels to attach the same or appropriate Traefik auth
middleware used by hadesAPI (e.g., add labels referencing an existing basicauth
middleware like traefik.http.routers.hadeslogmanager.middlewares and
traefik.http.routers.hadesadapter.middlewares or create new middleware labels),
or alternatively implement/apply app-level Basic Auth using the same AUTH_KEY
mechanism; ensure you modify the labels block for hadesLogManager and
hadesArtemisAdapter (the traefik.* label entries) to include middleware
references consistent with secured routes.

In `@HadesArtemisAdapter/Dockerfile`:
- Around line 21-30: Add a non-root user to the runtime stage of the Dockerfile
and switch to it before the CMD: create or use a non-root user/group (referenced
as USER nonroot:nonroot), ensure ownership of /app (WORKDIR) and the copied
binary (COPY --from=builder /app/HadesArtemisAdapter/hades-artemis-adapter .) is
set appropriately, then add the USER nonroot:nonroot instruction prior to CMD
["/app/hades-artemis-adapter"] so the container runs without root privileges.

In `@HadesLogManager/Dockerfile`:
- Around line 21-30: Switch the runtime image to run as a non-root user by
either using the distroless nonroot variant or adding an explicit USER and
making the binary owned/executable by that user: replace FROM
gcr.io/distroless/static-debian12 with the nonroot variant (e.g.,
gcr.io/distroless/static-debian12:nonroot) or add USER nonroot after WORKDIR
/app, and ensure the copied artifact (from COPY --from=builder
/app/HadesLogManager/hades-log-manager .) is owned by the non-root UID (use COPY
--from=builder --chown=65532:65532 ... or chown the file in the builder stage)
so CMD ["/app/hades-log-manager"] runs without root privileges.
HadesArtemisAdapter/Dockerfile (1)

21-30: Run as non-root user in the runtime stage.

Same recommendation as HadesLogManager/Dockerfile — add USER nonroot:nonroot to align with container security best practices.

Proposed fix
 FROM gcr.io/distroless/static-debian12
 
+USER nonroot:nonroot
+
 WORKDIR /app
 
 COPY --from=builder /app/HadesArtemisAdapter/hades-artemis-adapter .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesArtemisAdapter/Dockerfile` around lines 21 - 30, Add a non-root user to
the runtime stage of the Dockerfile and switch to it before the CMD: create or
use a non-root user/group (referenced as USER nonroot:nonroot), ensure ownership
of /app (WORKDIR) and the copied binary (COPY --from=builder
/app/HadesArtemisAdapter/hades-artemis-adapter .) is set appropriately, then add
the USER nonroot:nonroot instruction prior to CMD ["/app/hades-artemis-adapter"]
so the container runs without root privileges.
.env.example (2)

27-35: Previous API_PORT duplication issue is resolved — nicely done.

The env vars now use service-specific prefixes (HADESADAPTER_API_PORT, HADESLOGMANAGER_API_PORT), which addresses the earlier review feedback.

One inconsistency remains: API_ENDPOINT (Line 35) doesn't follow the HADESLOGMANAGER_ prefix convention used by HADESLOGMANAGER_API_PORT on Line 34. If another service ever needs an API_ENDPOINT, this will collide.

Suggested rename
-API_ENDPOINT=http://localhost:8082/adapter/logs
+HADESLOGMANAGER_API_ENDPOINT=http://localhost:8082/adapter/logs

Also update the corresponding struct tag in the HadesLogManager code and compose files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.example around lines 27 - 35, API_ENDPOINT in the .env example
conflicts with the HADESLOGMANAGER_ prefix convention; rename it to
HADESLOGMANAGER_API_ENDPOINT in the .env file and update any references: change
struct tag / env lookup in the HadesLogManager config struct (e.g., the field
that currently maps to API_ENDPOINT) to use "HADESLOGMANAGER_API_ENDPOINT" and
update compose files, startup/env parsing, and any code reading API_ENDPOINT
(e.g., in HadesLogManager init/load functions) to use the new name so the
service-specific prefix stays consistent.

31-31: Avoid committing example secrets that look like real tokens.

ARTEMIS_AUTH_TOKEN=superduperlongtestingsecrettoken is clearly a placeholder, but it's good practice to use an obviously-placeholder value like <your-artemis-auth-token> or leave it empty, to avoid any chance of copy-paste into production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.example at line 31, Replace the example secret value for
ARTEMIS_AUTH_TOKEN with an obviously-placeholder string or leave it empty so it
cannot be mistaken for a real token; update the line containing
ARTEMIS_AUTH_TOKEN in .env.example to something like
ARTEMIS_AUTH_TOKEN=<your-artemis-auth-token> or ARTEMIS_AUTH_TOKEN= so
contributors must explicitly fill in a real token.
docker-compose.deploy.yml (1)

38-62: Both new services are publicly exposed via Traefik without authentication middleware.

hadesAPI relies on AUTH_KEY for Basic Auth. The new hadesLogManager and hadesArtemisAdapter services are exposed to the internet without any Traefik auth middleware. If these endpoints accept or serve sensitive data (build logs, test results, adapter ingestion), consider adding authentication — either via Traefik middleware (e.g., basicauth) or at the application level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.deploy.yml` around lines 38 - 62, The new docker-compose
services hadesLogManager and hadesArtemisAdapter are routed through Traefik
without any authentication middleware; update their labels to attach the same or
appropriate Traefik auth middleware used by hadesAPI (e.g., add labels
referencing an existing basicauth middleware like
traefik.http.routers.hadeslogmanager.middlewares and
traefik.http.routers.hadesadapter.middlewares or create new middleware labels),
or alternatively implement/apply app-level Basic Auth using the same AUTH_KEY
mechanism; ensure you modify the labels block for hadesLogManager and
hadesArtemisAdapter (the traefik.* label entries) to include middleware
references consistent with secured routes.
HadesLogManager/Dockerfile (1)

21-30: Consider running as a non-root user in the runtime stage.

The distroless static-debian12 image defaults to root. Use the built-in nonroot user to follow the principle of least privilege, which also resolves the Trivy DS-0002 finding.

Proposed fix
 FROM gcr.io/distroless/static-debian12
 
+USER nonroot:nonroot
+
 WORKDIR /app
 
 COPY --from=builder /app/HadesLogManager/hades-log-manager .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/Dockerfile` around lines 21 - 30, Switch the runtime image to
run as a non-root user by either using the distroless nonroot variant or adding
an explicit USER and making the binary owned/executable by that user: replace
FROM gcr.io/distroless/static-debian12 with the nonroot variant (e.g.,
gcr.io/distroless/static-debian12:nonroot) or add USER nonroot after WORKDIR
/app, and ensure the copied artifact (from COPY --from=builder
/app/HadesLogManager/hades-log-manager .) is owned by the non-root UID (use COPY
--from=builder --chown=65532:65532 ... or chown the file in the builder stage)
so CMD ["/app/hades-log-manager"] runs without root privileges.

@paoxin paoxin marked this pull request as draft February 18, 2026 00:19
@paoxin paoxin marked this pull request as ready for review February 23, 2026 16:12
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

> [!CAUTION]
> Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
HadesLogManager/go.mod (2)

3-3: ⚠️ Potential issue | 🟡 Minor

go 1.24.4 is outdated — later patches include security fixes.

go1.24.4 was released on 2025-06-05 and includes security fixes to the crypto/x509, net/http, and os packages. Subsequent patch releases go1.24.5 (2025-07-08) and go1.24.6 (2025-08-06) include further security fixes to the go command, database/sql, and os/exec packages. Consider bumping the minimum version to the latest go1.24.x patch (or go1.25.x if the project is willing to adopt it) to capture these fixes in the toolchain minimum.

🛠️ Suggested update
-go 1.24.4
+go 1.24.6
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/go.mod` at line 3, The go.mod currently pins the toolchain as
"go 1.24.4" which is outdated; update the Go version declaration in go.mod
(replace the "go 1.24.4" line) to a secure patched release such as "go 1.24.6"
(or "go 1.25.0" if you're ready to adopt 1.25) so the module uses the newer
patch-level security fixes; ensure you run go mod tidy/build to verify
compatibility after changing the "go" directive.

1-1: ⚠️ Potential issue | 🟠 Major

module HadesAdapter is incorrect — should be HadesLogManager or follow the full VCS path pattern.

HadesLogManager/go.mod declares module HadesAdapter, but HadesArtemisAdapter actually declares module HadesArtemisAdapter, not HadesAdapter. The HadesLogManager module name is either a residual copy-paste error or a leftover from an earlier template.

Looking at other services in the repository:

  • HadesArtemisAdapter uses module HadesArtemisAdapter
  • HadesScheduler uses module github.com/ls1intum/hades/hadesScheduler
  • HadesAPI uses module github.com/ls1intum/hades/hadesAPI

Align HadesLogManager with one of these patterns (either module HadesLogManager or the full VCS path module github.com/ls1intum/hades/HadesLogManager) to avoid build and import resolution issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/go.mod` at line 1, The go.mod currently declares module
HadesAdapter which is incorrect; update the module line to the correct module
name (e.g., "HadesLogManager" or the canonical VCS path
"github.com/ls1intum/hades/HadesLogManager") so it matches the package and other
modules, then run go mod tidy to fix imports and module graph; look for and
update any imports referencing HadesAdapter to the new module name if present.
♻️ Duplicate comments (4)
HadesLogManager/processor.go (1)

221-230: ⚠️ Potential issue | 🟠 Major

Non-2xx HTTP responses are still silently treated as success.

After client.Do(req) succeeds, resp.StatusCode is never checked. A 4xx or 5xx from the adapter will be logged at INFO level as if it succeeded, and nil is returned. This was flagged in a prior review and appears unaddressed.

Proposed fix
 	defer resp.Body.Close()
+
+	if resp.StatusCode >= 400 {
+		return fmt.Errorf("adapter returned HTTP %d for job %s", resp.StatusCode, jobID)
+	}
+
 	slog.Info("Sent job logs to adapter", "job_id", jobID, "status", resp.Status)
 	return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/processor.go` around lines 221 - 230, The code treats any
HTTP response as success after client.Do(req) in the function that sends job
logs (see client.Do(req), resp, jobID and slog.Info), but never checks
resp.StatusCode; update the logic to treat non-2xx responses as failures by
inspecting resp.StatusCode, read the response body (if available) for diagnostic
details, log an error with status and body (instead of the current success
slog.Info), and return a non-nil error (e.g. fmt.Errorf with status and body)
when status is not in the 200–299 range so callers can handle adapter errors
correctly.
HadesArtemisAdapter/artemis_adapter.go (1)

124-159: Stale incomplete jobs can leak memory indefinitely.

If only logs or only results arrive for a job (e.g., the test runner crashes), entries in aa.logs, aa.results, and aa.jobLocks are never cleaned up. Consider adding a TTL-based cleanup goroutine (similar to cleanupCompletedJobs in the LogManager) to evict stale entries after a configurable timeout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesArtemisAdapter/artemis_adapter.go` around lines 124 - 159, Stale job
entries in aa.logs, aa.results and aa.jobLocks can leak memory; modify the code
so that when logs or results are stored you also record a timestamp (e.g., add a
jobTimestamps map or wrap stored values with a struct containing time.Now()),
introduce a configurable TTL field on ArtemisAdapter and start a cleanup
goroutine (similar to cleanupCompletedJobs in LogManager) that periodically
scans these maps and evicts entries older than the TTL (deleting from aa.logs,
aa.results and aa.jobLocks), and ensure the goroutine is cancellable via the
adapter's lifecycle/context to avoid leaking the cleanup routine itself; tie
this to the existing checkAndSendIfReady flow so successful sends still remove
entries immediately.
docker-compose.dev.yml (1)

19-36: New services configuration looks good.

The hadesLogManager and hadesArtemisAdapter services are properly defined with port mappings and environment overrides. The previously missing ARTEMIS_NEW_RESULT_ENDPOINT and ARTEMIS_AUTH_TOKEN variables are now included (lines 35–36).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose.dev.yml` around lines 19 - 36, The PR review comment includes
a duplicate marker ([duplicate_comment]) that should be removed to avoid
noisy/ambiguous review metadata; update the reviewer comment to keep only the
approval tag ([approve_code_changes]) and ensure the services hadesLogManager
and hadesArtemisAdapter remain as-is with the ARTEMIS_NEW_RESULT_ENDPOINT and
ARTEMIS_AUTH_TOKEN environment variable overrides intact.
compose.yml (1)

40-57: Past indentation and dependency issues appear resolved.

The hadesLogManager service is now properly indented as a sibling of hadesScheduler under services:, and the depends_on block correctly includes nats with condition: service_healthy. These address the critical indentation and dependency issues flagged in prior reviews.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compose.yml` around lines 40 - 57, The compose snippet for the
hadesLogManager service is mostly fixed but you should ensure the exposed port
mapping and environment variable match runtime expectations: verify the service
definition for hadesLogManager uses the intended internal port
(HADESLOGMANAGER_API_PORT) and that the port-forward "8083:8081" corresponds to
that variable; confirm API_ENDPOINT is set in the environment and not
accidentally empty by validating ${API_ENDPOINT} is defined in the environment
or .env before deployment; if the internal port needs to change, update
HADESLOGMANAGER_API_PORT and the ports mapping together and keep the depends_on
entries (nats with condition: service_healthy, hadesAPI with condition:
service_started) intact.
🧹 Nitpick comments (5)
HadesScheduler/docker/step.go (1)

105-110: Logs are not captured when errCh fires.

The processContainerLogs call only lives inside the statusCh branch. If ContainerWait itself errors (e.g., daemon disconnect, context deadline), the function returns at line 109 without attempting to publish any logs the container may have already produced.

If the goal is to always capture available logs, consider calling processContainerLogs in this path as well (on a best-effort basis).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesScheduler/docker/step.go` around lines 105 - 110, When errCh returns an
error from ContainerWait, we still need to attempt to collect any available
container output; update the errCh branch that currently logs and returns (the
select handling err := <-errCh) to call processContainerLogs(ctx, resp.ID, ...)
on a best-effort basis before returning. Invoke processContainerLogs with the
same parameters used in the statusCh branch, swallow or log any errors from
processContainerLogs (do not let log-collection failures mask the original
ContainerWait error), then return the original err; this ensures logs are always
attempted when ContainerWait fails.
HadesScheduler/docker/job.go (2)

27-27: error(nil) is non-idiomatic; prefer var stepErr error.

error(nil) is a valid but unusual type conversion of untyped nil to the error interface. The idiomatic Go zero-value declaration is var stepErr error.

♻️ Proposed fix
-	stepErr := error(nil)
+	var stepErr error
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesScheduler/docker/job.go` at line 27, The declaration stepErr :=
error(nil) in HadesScheduler/docker/job.go is non-idiomatic; replace it with the
zero-value declaration var stepErr error to declare an error variable named
stepErr. Locate the variable named stepErr (currently set via error(nil)) and
change that line to use var stepErr error so the code uses the idiomatic Go
zero-value for the error type.

27-27: error(nil) is non-idiomatic; prefer var stepErr error.

error(nil) is a type conversion of untyped nil to the error interface — valid but unusual. The canonical Go zero-value declaration is var stepErr error.

♻️ Proposed fix
-	stepErr := error(nil)
+	var stepErr error
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesScheduler/docker/job.go` at line 27, Replace the non-idiomatic
declaration "stepErr := error(nil)" with the canonical zero-value declaration
"var stepErr error" where the variable is introduced (the declaration of stepErr
in job.go); this keeps the variable type as error while using the conventional
Go idiom and avoids the unnecessary type conversion.
HadesLogManager/processor.go (1)

214-220: A new http.Client is created on every SendJobLogs call.

Creating a fresh http.Client per invocation prevents TCP connection reuse. Consider storing a single *http.Client on NATSLogAggregator (initialized in NewLogAggregator) and reusing it across calls, similar to how ArtemisAdapter does it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/processor.go` around lines 214 - 220, The SendJobLogs
implementation creates a new http.Client for every call which prevents TCP
connection reuse; add a http.Client field to NATSLogAggregator (e.g., client
*http.Client), initialize that single client in NewLogAggregator (with the
desired Timeout), and update SendJobLogs to use la.client instead of creating a
new client locally; mirror the pattern used by ArtemisAdapter to ensure
connection reuse and avoid per-call client allocation.
docs/api/Create Build Job (Test Fail) with Result Parse Step.bru (1)

78-82: Pre-request variables user and password are empty.

These variables are referenced as {{user}} and {{password}} in the request body but have no values. This is likely intentional (developers fill them in locally), but a brief comment noting this would help avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru` around
lines 78 - 82, The pre-request variable block (vars:pre-request) defines user
and password as empty but they are referenced as {{user}} and {{password}} in
the request body; add a short inline comment above or next to the user/password
entries indicating they are intentionally left blank for local developer
credentials (or provide clear placeholder values like <USERNAME> / <PASSWORD>)
so readers know to fill them in before running the request and to avoid
confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@compose.yml`:
- Around line 55-57: The hadesLogManager service in compose.yml is missing an
explicit NATS_URL env var so it falls back to an incorrect default; add an
environment entry for NATS_URL with the Docker Compose reachable value
(NATS_URL=nats://nats:4222) under the hadesLogManager service's environment
block to match hadesAPI/hadesScheduler and ensure
HadesLogManagerConfig/NatsConfig can connect to NATS.

In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru`:
- Around line 50-58: Step 2 ("Execute", id: 2) is missing the shared volume
mount used by its script (cd /shared/example); add a volumeMounts entry to the
"Execute" step that mounts the "shared" volume at /shared so the directory
exists in the container; update the step object for name "Execute" (id 2) to
include volumeMounts referencing the "shared" volume name and mountPath
"/shared".
- Around line 59-73: Update the API_ENDPOINT value used by the "Parse Results"
step so it points to the HadesArtemisAdapter port 8082 instead of 8083: change
the "API_ENDPOINT" field from
"http://host.docker.internal:8083/adapter/test-results" to
"http://host.docker.internal:8082/adapter/test-results" in the "Parse Results"
block, and apply the same change in the equivalent "Create Build Job (Test
Succeed) with Result Parse Step.bru" file to avoid 404/connection errors when
the junit-result-parser posts results.

In `@HadesArtemisAdapter/artemis_adapter.go`:
- Around line 95-106: StoreLogs currently assumes the execution logs are at
logs[1], which is fragile; update StoreLogs to locate the execution LogEntry by
a stable identifier (e.g., container ID or step name) instead of using the
hardcoded index: iterate the incoming []buildlogs.Log for the entry whose
metadata (container ID or step name) matches the execution step, assign its
.Logs to executionLogs, fall back to an empty slice and a warning if none match;
keep storing to aa.logs.Store(jobID, executionLogs) and then call
aa.checkAndSendIfReady(jobID). Ensure you reference the buildlogs.Log and
buildlogs.LogEntry structs and preserve behavior when no matching entry is
found.

In `@HadesLogManager/log_subscriber.go`:
- Around line 243-244: The call to dlm.logAggregator.SendJobLogs(jobID)
currently ignores its returned error; capture its error (e.g., err :=
dlm.logAggregator.SendJobLogs(jobID)) and at minimum log it (use the existing
logger on dlm, e.g., dlm.logger.Errorf or similar) with context including jobID
and the error; optionally add a simple retry/backoff loop here or ensure
SendJobLogs itself implements retries, but do not swallow the error silently.
Ensure you update the code around dlm.logAggregator.MarkJobCompleted(jobID) and
the subsequent SendJobLogs call so the SendJobLogs return value is checked and
logged.

In `@HadesScheduler/docker/job.go`:
- Around line 49-59: The loop currently overwrites stepErr on each failing step
with ContinueOnError, losing earlier errors; import the standard "errors"
package, replace the non-idiomatic initialization (stepErr := error(nil)) with
var stepErr error, and when a step fails but ContinueOnError == "true" replace
the unconditional assignment stepErr = fmt.Errorf(...) with stepErr =
errors.Join(stepErr, fmt.Errorf("Step %v failed with error but ContinueOnError
is set: %w", step.ID, err)) so all failing step errors are accumulated; keep the
existing return err when not ContinueOnError and return stepErr at the end.

In `@HadesScheduler/docker/step.go`:
- Around line 112-119: processContainerLogs is currently called before checking
the container exit code and its error is returned immediately, which can mask
the real container status; change the flow so you call processContainerLogs(ctx,
s.cli, s.publisher, resp.ID, jobId) but do NOT return on its error—capture the
error (e.g., logErr) and s.logger.Error the failure (including resp.ID and
jobId) and continue to evaluate status.StatusCode; only treat the step as failed
if status.StatusCode != 0 (return that error/exit code), while still recording
the log-publishing error separately so observability failures don’t override
actual container failures.
- Around line 90-94: The deferred cleanup unconditionally calls
removeContainer(ctx, s.cli, resp.ID) even when s.Options.containerAutoremove is
true, causing spurious "container not found" errors; modify the defer in the
function that creates resp to check s.Options.containerAutoremove (or a local
copy if necessary) and only call removeContainer when containerAutoremove is
false, leaving the existing s.logger.Error logging intact for real failures and
avoiding a redundant removal attempt for auto-removed containers.

---

Outside diff comments:
In `@HadesLogManager/go.mod`:
- Line 3: The go.mod currently pins the toolchain as "go 1.24.4" which is
outdated; update the Go version declaration in go.mod (replace the "go 1.24.4"
line) to a secure patched release such as "go 1.24.6" (or "go 1.25.0" if you're
ready to adopt 1.25) so the module uses the newer patch-level security fixes;
ensure you run go mod tidy/build to verify compatibility after changing the "go"
directive.
- Line 1: The go.mod currently declares module HadesAdapter which is incorrect;
update the module line to the correct module name (e.g., "HadesLogManager" or
the canonical VCS path "github.com/ls1intum/hades/HadesLogManager") so it
matches the package and other modules, then run go mod tidy to fix imports and
module graph; look for and update any imports referencing HadesAdapter to the
new module name if present.

---

Duplicate comments:
In `@compose.yml`:
- Around line 40-57: The compose snippet for the hadesLogManager service is
mostly fixed but you should ensure the exposed port mapping and environment
variable match runtime expectations: verify the service definition for
hadesLogManager uses the intended internal port (HADESLOGMANAGER_API_PORT) and
that the port-forward "8083:8081" corresponds to that variable; confirm
API_ENDPOINT is set in the environment and not accidentally empty by validating
${API_ENDPOINT} is defined in the environment or .env before deployment; if the
internal port needs to change, update HADESLOGMANAGER_API_PORT and the ports
mapping together and keep the depends_on entries (nats with condition:
service_healthy, hadesAPI with condition: service_started) intact.

In `@docker-compose.dev.yml`:
- Around line 19-36: The PR review comment includes a duplicate marker
([duplicate_comment]) that should be removed to avoid noisy/ambiguous review
metadata; update the reviewer comment to keep only the approval tag
([approve_code_changes]) and ensure the services hadesLogManager and
hadesArtemisAdapter remain as-is with the ARTEMIS_NEW_RESULT_ENDPOINT and
ARTEMIS_AUTH_TOKEN environment variable overrides intact.

In `@HadesArtemisAdapter/artemis_adapter.go`:
- Around line 124-159: Stale job entries in aa.logs, aa.results and aa.jobLocks
can leak memory; modify the code so that when logs or results are stored you
also record a timestamp (e.g., add a jobTimestamps map or wrap stored values
with a struct containing time.Now()), introduce a configurable TTL field on
ArtemisAdapter and start a cleanup goroutine (similar to cleanupCompletedJobs in
LogManager) that periodically scans these maps and evicts entries older than the
TTL (deleting from aa.logs, aa.results and aa.jobLocks), and ensure the
goroutine is cancellable via the adapter's lifecycle/context to avoid leaking
the cleanup routine itself; tie this to the existing checkAndSendIfReady flow so
successful sends still remove entries immediately.

In `@HadesLogManager/processor.go`:
- Around line 221-230: The code treats any HTTP response as success after
client.Do(req) in the function that sends job logs (see client.Do(req), resp,
jobID and slog.Info), but never checks resp.StatusCode; update the logic to
treat non-2xx responses as failures by inspecting resp.StatusCode, read the
response body (if available) for diagnostic details, log an error with status
and body (instead of the current success slog.Info), and return a non-nil error
(e.g. fmt.Errorf with status and body) when status is not in the 200–299 range
so callers can handle adapter errors correctly.

---

Nitpick comments:
In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru`:
- Around line 78-82: The pre-request variable block (vars:pre-request) defines
user and password as empty but they are referenced as {{user}} and {{password}}
in the request body; add a short inline comment above or next to the
user/password entries indicating they are intentionally left blank for local
developer credentials (or provide clear placeholder values like <USERNAME> /
<PASSWORD>) so readers know to fill them in before running the request and to
avoid confusion.

In `@HadesLogManager/processor.go`:
- Around line 214-220: The SendJobLogs implementation creates a new http.Client
for every call which prevents TCP connection reuse; add a http.Client field to
NATSLogAggregator (e.g., client *http.Client), initialize that single client in
NewLogAggregator (with the desired Timeout), and update SendJobLogs to use
la.client instead of creating a new client locally; mirror the pattern used by
ArtemisAdapter to ensure connection reuse and avoid per-call client allocation.

In `@HadesScheduler/docker/job.go`:
- Line 27: Replace the non-idiomatic declaration "stepErr := error(nil)" with
the canonical zero-value declaration "var stepErr error" where the variable is
introduced in job.go; this keeps the variable type as error while using the
conventional Go idiom and avoids the unnecessary type conversion.

In `@HadesScheduler/docker/step.go`:
- Around line 105-110: When errCh returns an error from ContainerWait, we still
need to attempt to collect any available container output; update the errCh
branch that currently logs and returns (the select handling err := <-errCh) to
call processContainerLogs(ctx, resp.ID, ...) on a best-effort basis before
returning. Invoke processContainerLogs with the same parameters used in the
statusCh branch, swallow or log any errors from processContainerLogs (do not let
log-collection failures mask the original ContainerWait error), then return the
original err; this ensures logs are always attempted when ContainerWait fails.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 105983a and 13ca7ec.

⛔ Files ignored due to path filters (1)
  • HadesLogManager/go.sum is excluded by !**/*.sum
📒 Files selected for processing (13)
  • .github/workflows/build.yml
  • HadesArtemisAdapter/artemis_adapter.go
  • HadesArtemisAdapter/main.go
  • HadesLogManager/go.mod
  • HadesLogManager/log_subscriber.go
  • HadesLogManager/main.go
  • HadesLogManager/processor.go
  • HadesScheduler/docker/job.go
  • HadesScheduler/docker/step.go
  • compose.yml
  • docker-compose.dev.yml
  • docs/api/Create Build Job (Test Fail) with Result Parse Step.bru
  • docs/api/Create Build Job (Test Succeed) with Result Parse Step.bru
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/api/Create Build Job (Test Succeed) with Result Parse Step.bru
  • HadesLogManager/main.go
  • HadesArtemisAdapter/main.go

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
docs/api/Create Build Job (Test Fail) with Result Parse Step.bru (2)

65-79: API_ENDPOINT port (8083) — acknowledged as deployment-dependent.

The port value in line 76 was previously flagged. Per the author's clarification, the endpoint is intentionally flexible and varies depending on whether Artemis is running in the target environment. No change required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru` around
lines 65 - 79, The flagged port in the "Parse Results" step (id: 3, name: "Parse
Results", metadata key "API_ENDPOINT") is intentionally deployment-dependent and
requires no change; leave the API_ENDPOINT value
("http://host.docker.internal:8083/adapter/test-results") as-is in the metadata
and proceed with the current configuration.

50-64: Step 2 volumeMounts fix is now correctly applied.

The previously flagged missing volumeMounts entry for the shared volume in the "Execute" step has been addressed — lines 54–59 now correctly declare the mount, matching the pattern used in Steps 1 and 3.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru` around
lines 50 - 64, The "Execute" step already includes the correct volumeMounts for
the shared volume, but the review metadata contains a duplicate/approval marker;
remove the extraneous duplicate comment/approval tokens (e.g.,
"[approve_code_changes]" and "[duplicate_comment]") so the review message isn't
duplicated, and keep the "Execute" step's volumeMounts, "script", and "metadata"
(CONTINUE_ON_ERROR) unchanged.
HadesScheduler/docker/job.go (1)

28-28: Non-idiomatic nil error initialization (previously flagged, still unaddressed).

stepErr := error(nil) is a typed nil assignment; the idiomatic Go form is var stepErr error.

♻️ Proposed fix
-	stepErr := error(nil)
+	var stepErr error
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesScheduler/docker/job.go` at line 28, Replace the typed-nil error
initialization "stepErr := error(nil)" with the idiomatic Go declaration "var
stepErr error"; locate the variable named stepErr in
HadesScheduler/docker/job.go (the place where stepErr is first declared/used)
and change its declaration to use var so the value is an untyped nil error,
leaving subsequent assignments and checks unchanged.
🧹 Nitpick comments (3)
compose.yml (2)

60-69: Consider adding a healthcheck and upstream depends_on for hadesArtemisAdapter.

hadesLogManager and hadesScheduler both start without waiting for hadesArtemisAdapter. If either service attempts to POST results to the adapter immediately on startup, there is a window where the adapter may not yet be accepting connections. Adding a healthcheck to the adapter and declaring hadesArtemisAdapter: condition: service_healthy in those upstream services closes this startup race without requiring application-level retry logic.

💡 Suggested addition
  hadesArtemisAdapter:
    image: ghcr.io/ls1intum/hades/hades-artemis-adapter
    build:
      context: .
      dockerfile: ./HadesArtemisAdapter/Dockerfile
    ports:
      - "8082:8082"
    networks:
      - hades
+   healthcheck:
+     test: ["CMD", "wget", "--spider", "-q", "http://localhost:8082/health"]
+     interval: 10s
+     timeout: 5s
+     retries: 3
    environment:
      - HADESADAPTER_API_PORT=8082
      ...

And in hadesLogManager / hadesScheduler:

    depends_on:
      nats:
        condition: service_healthy
      hadesAPI:
        condition: service_started
+     hadesArtemisAdapter:
+       condition: service_healthy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compose.yml` around lines 60 - 69, Add a Docker health check to the
hadesArtemisAdapter service and make hadesLogManager and hadesScheduler depend
on its healthy status: define a healthcheck block for hadesArtemisAdapter (e.g.,
using curl or nc against port 8082 and an appropriate
interval/retries/start-period) and update the depends_on in hadesLogManager and
hadesScheduler to include hadesArtemisAdapter with condition: service_healthy so
they wait until the adapter is ready before starting.

3-3: version field is obsolete in Docker Compose v2 — consider removing it.

The top-level version property is defined by the Compose Specification for backward compatibility — it is only informative, and you'll receive a warning message that it is obsolete if used. Compose always uses the most recent schema to validate the Compose file regardless of this field. As of Docker Compose v2.40+, the version declaration is unsupported and will produce an error.

♻️ Proposed change
-# docker compose file for the project with three services
-
-version: '3.7'
-services:
+# docker compose file for the project with three services
+services:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compose.yml` at line 3, Remove the obsolete top-level "version" declaration
(e.g. the line "version: '3.7'") from the Compose YAML; delete that key entirely
so the file relies on the Compose Specification default (Compose v2+) and then
validate the remaining services/networks/volumes sections with docker compose
config to ensure no other schema issues.
HadesLogManager/log_subscriber.go (1)

173-190: stopWatchingJobLogs can inherit the old goroutine's remaining wait time — worth a comment.

Between dlm.mu.Unlock() (line 184) and the go func() launch (line 193), the new watcher—already in the map with wg at count 1—is visible to concurrent callers. A stopWatchingJobLogs call in this window will find the new watcher, cancel jobCtx, and block on wg.Wait() for a goroutine that has not started yet. That goroutine won't start until startWatchingJobLogs finishes oldWatcher.wg.Wait(), so stopWatchingJobLogs effectively serializes through both the old goroutine's exit and the new goroutine's start+exit. The behaviour is correct (because wg.Add(1) precedes map insertion), but the coupling is non-obvious.

💡 Suggested comment to document the intentional ordering
 	dlm.mu.Lock()
 	oldWatcher, exists := dlm.watchers[jobID]
 	dlm.watchers[jobID] = watcherState{
 		ctx:    jobCtx,
 		cancel: cancel,
 		wg:     wg,
 	}
 	dlm.mu.Unlock()
+	// The new watcher (wg already at 1) is now visible in the map before the
+	// goroutine starts.  A concurrent stopWatchingJobLogs will correctly block
+	// on wg.Wait() until the goroutine is launched and exits, but may also
+	// absorb the tail of oldWatcher.wg.Wait() latency below.

 	// Cancel old watcher outside the lock to avoid potential deadlock
 	if exists {
 		oldWatcher.cancel()
 		oldWatcher.wg.Wait()
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/log_subscriber.go` around lines 173 - 190, The insertion of
the new watcherState (dlm.watchers[jobID]) occurs after wg.Add(1) so a
concurrent stopWatchingJobLogs may find and cancel the new watcher and block on
its wg.Wait() even before the goroutine has started; this is intentional to
avoid race conditions but non-obvious—add a concise comment next to the block
that creates watcherState (mentioning dlm.watchers, watcherState, wg,
oldWatcher, startWatchingJobLogs and stopWatchingJobLogs) explaining that wg is
incremented before map insertion so stopWatchingJobLogs may wait for the
newly-created goroutine to start/exit and that this ordering is deliberate to
prevent races and deadlocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/api/Create Build Job (Test Fail) with Result Parse Step.bru`:
- Around line 65-79: The flagged port in the "Parse Results" step (id: 3, name:
"Parse Results", metadata key "API_ENDPOINT") is intentionally
deployment-dependent and requires no change; leave the API_ENDPOINT value
("http://host.docker.internal:8083/adapter/test-results") as-is in the metadata
and proceed with the current configuration.
- Around line 50-64: The "Execute" step already includes the correct
volumeMounts for the shared volume, but the review metadata contains a
duplicate/approval marker; remove the extraneous duplicate comment/approval
tokens (e.g., "[approve_code_changes]" and "[duplicate_comment]") so the review
message isn't duplicated, and keep the "Execute" step's volumeMounts, "script",
and "metadata" (CONTINUE_ON_ERROR) unchanged.

In `@HadesScheduler/docker/job.go`:
- Line 28: Replace the typed-nil error initialization "stepErr := error(nil)"
with the idiomatic Go declaration "var stepErr error"; locate the variable named
stepErr in HadesScheduler/docker/job.go (the place where stepErr is first
declared/used) and change its declaration to use var so the value is an untyped
nil error, leaving subsequent assignments and checks unchanged.

---

Nitpick comments:
In `@compose.yml`:
- Around line 60-69: Add a Docker health check to the hadesArtemisAdapter
service and make hadesLogManager and hadesScheduler depend on its healthy
status: define a healthcheck block for hadesArtemisAdapter (e.g., using curl or
nc against port 8082 and an appropriate interval/retries/start-period) and
update the depends_on in hadesLogManager and hadesScheduler to include
hadesArtemisAdapter with condition: service_healthy so they wait until the
adapter is ready before starting.
- Line 3: Remove the obsolete top-level "version" declaration (e.g. the line
"version: '3.7'") from the Compose YAML; delete that key entirely so the file
relies on the Compose Specification default (Compose v2+) and then validate the
remaining services/networks/volumes sections with docker compose config to
ensure no other schema issues.

In `@HadesLogManager/log_subscriber.go`:
- Around line 173-190: The insertion of the new watcherState
(dlm.watchers[jobID]) occurs after wg.Add(1) so a concurrent stopWatchingJobLogs
may find and cancel the new watcher and block on its wg.Wait() even before the
goroutine has started; this is intentional to avoid race conditions but
non-obvious—add a concise comment next to the block that creates watcherState
(mentioning dlm.watchers, watcherState, wg, oldWatcher, startWatchingJobLogs and
stopWatchingJobLogs) explaining that wg is incremented before map insertion so
stopWatchingJobLogs may wait for the newly-created goroutine to start/exit and
that this ordering is deliberate to prevent races and deadlocks.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13ca7ec and 1724e50.

📒 Files selected for processing (5)
  • HadesLogManager/log_subscriber.go
  • HadesScheduler/docker/job.go
  • HadesScheduler/docker/step.go
  • compose.yml
  • docs/api/Create Build Job (Test Fail) with Result Parse Step.bru
🚧 Files skipped from review as they are similar to previous changes (1)
  • HadesScheduler/docker/step.go

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@HadesScheduler/docker/job.go`:
- Around line 32-36: The current merge logic builds envs from d.Metadata and
step.Metadata and then unconditionally sets envs["UUID"] = d.ID.String(), which
silently overwrites any caller-provided "UUID" in step.Metadata; change this to
avoid clobbering by either (a) using a reserved namespaced key (e.g.,
"HADES_UUID" or "hades.UUID") when assigning d.ID.String() instead of "UUID", or
(b) performing collision detection after merging and returning an error (or
logging and rejecting the payload) if step.Metadata already contains "UUID";
update the code paths around envs, step.Metadata, and the assignment of
d.ID.String() accordingly so the chosen approach is applied consistently.
- Around line 47-54: The handler in execute() currently treats all errors from
dockerStep.execute the same and allows ContinueOnError to suppress
log-publishing failures; update error propagation so publishing failures from
processContainerLogs (specifically failures from publisher.PublishJobLog) are
distinguished (e.g., wrap/return a sentinel or typed error from
processContainerLogs/dockerStep.execute) and in job.go's execute() check for
that error type and immediately return it (do not accumulate or continue on
ContinueOnError), while still accumulating container-exit errors into stepErr as
before; add a unit/integration test that forces publisher.PublishJobLog to fail
and asserts that execute() returns an error even when the step has
ContinueOnError == true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50aa513f-3157-4768-9fcc-6687959026a8

📥 Commits

Reviewing files that changed from the base of the PR and between 1724e50 and bdda573.

📒 Files selected for processing (2)
  • HadesScheduler/docker/job.go
  • shared/payload/payload.go

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 78-85: The workflow contains a duplicate job key "build-operator"
which causes YAML parsing to fail; remove the second "build-operator" job block
(the one using
ls1intum/.github/.github/workflows/build-and-push-docker-image.yml@v1.2.0 with
image-name ls1intum/hades/hades-operator and docker-file
./HadesScheduler/HadesOperator/Dockerfile) so only the original build-operator
job remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be45de5f-8080-4f00-8277-cf8ff6f692b4

📥 Commits

Reviewing files that changed from the base of the PR and between bdda573 and 045d70b.

📒 Files selected for processing (2)
  • .github/workflows/ci.yml
  • Readme.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • Readme.md

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
HadesLogManager/go.mod (1)

3-3: Align Go versions across all modules to ensure consistent builds and toolchain management.

This project uses three different Go versions across its modules: go 1.24.0 (HadesAPI, shared), go 1.24.4 (HadesLogManager), and go 1.25.0 (HadesScheduler, HadesScheduler/HadesOperator). If these modules are deployed together, standardize on a single Go version to simplify toolchain management and avoid potential compatibility issues.

Go 1.24.4 is a valid release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HadesLogManager/go.mod` at line 3, Update all module go.mod files to use the
same Go version (choose go 1.24.4 as suggested) so builds are consistent; locate
the go directive in each module's go.mod (e.g., the go 1.24.0 entries in
HadesAPI and shared, and go 1.25.0 entries in HadesScheduler and
HadesScheduler/HadesOperator) and change them to "go 1.24.4" so every module
(HadesLogManager, HadesAPI, shared, HadesScheduler, HadesOperator) declares the
identical Go version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.env.example:
- Line 8: The env var name in .env.example (HADESAPI_API_PORT) no longer matches
the key used by the application (see HadesAPI/main.go which binds
env:"API_PORT"); fix by restoring alignment: change the variable in .env.example
to API_PORT=8080 (or alternatively update the binding in main.go to
env:"HADESAPI_API_PORT" if you prefer the prefixed name) so that the example
file and the code (main.go's env binding) use the same environment variable
name.
- Around line 27-29: The API_ENDPOINT example in .env.example points to
localhost which is wrong for containerized Compose; update the API_ENDPOINT
value to a Docker-reachable host (e.g., the adapter service name used in
compose, such as http://artemis-adapter:8082/adapter/logs) so
HadesLogManager/processor.go will reach the adapter from inside the
hadesLogManager container, or alternatively add a separate "host-run" example
and clearly document which value to use when running outside Compose; ensure the
example matches the service name in compose.yml and update the comment
accordingly.

In `@docker-compose.dev.yml`:
- Around line 19-25: The docker-compose override sets API_ENDPOINT to
http://hadesArtemisAdapter:8082/adapter/logs but there is no hadesArtemisAdapter
service in this stack, so update the Compose config to either (A) expose a
configurable fallback endpoint for hadesLogManager by setting API_ENDPOINT from
an env var or to a local-dev reachable URL (e.g.
http://host.docker.internal:8082/adapter/logs) so HadesLogManager/processor.go
can post logs during development, or (B) add/wire a hadesArtemisAdapter service
into the same compose network with that service name so the hostname resolves;
pick one approach and apply it to the hadesLogManager service definition
(API_ENDPOINT env) or add the adapter service entry so local forwarding works
without manual network aliases.

In `@HadesLogManager/go.mod`:
- Line 7: Update the vulnerable module `github.com/klauspost/compress` in the
go.mod to version v1.18.3 or later to remediate CVE-2025-61728; locate the
`github.com/klauspost/compress` require/replace entry (alongside other modules
like `github.com/nats-io/nats.go`) and bump its version, then run module tidy
(e.g., `go get github.com/klauspost/compress@v1.18.3` and `go mod tidy`) to
update go.sum and verify the build.

---

Nitpick comments:
In `@HadesLogManager/go.mod`:
- Line 3: Update all module go.mod files to use the same Go version (choose go
1.24.4 as suggested) so builds are consistent; locate the go directive in each
module's go.mod (e.g., the go 1.24.0 entries in HadesAPI and shared, and go
1.25.0 entries in HadesScheduler and HadesScheduler/HadesOperator) and change
them to "go 1.24.4" so every module (HadesLogManager, HadesAPI, shared,
HadesScheduler, HadesOperator) declares the identical Go version.
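
The go directives can also be checked mechanically instead of by eye. A small stdlib sketch that scans go.mod contents for the `go` line and flags mismatches (module names and the target version are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// goDirective matches a top-level "go X.Y.Z" line in a go.mod file.
var goDirective = regexp.MustCompile(`(?m)^go (\S+)$`)

// checkGoVersions returns the modules whose go directive is missing or
// differs from want. Keys are module names, values are go.mod contents.
func checkGoVersions(mods map[string]string, want string) []string {
	var bad []string
	for name, content := range mods {
		m := goDirective.FindStringSubmatch(content)
		if m == nil || m[1] != want {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad) // deterministic output despite map iteration order
	return bad
}

func main() {
	mods := map[string]string{
		"HadesAPI":       "module hades/api\n\ngo 1.24.0\n",
		"HadesScheduler": "module hades/scheduler\n\ngo 1.25.0\n",
		"shared":         "module hades/shared\n\ngo 1.24.4\n",
	}
	fmt.Println("mismatched:", checkGoVersions(mods, "1.24.4"))
}
```

A check like this could run in CI next to the existing workflow so a future module added with a different go directive is caught at review time rather than at build time.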

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28a9830e-8151-4beb-9e70-50ab2a2a0e00

📥 Commits

Reviewing files that changed from the base of the PR and between 045d70b and d9029e1.

⛔ Files ignored due to path filters (3)
  • HadesLogManager/go.sum is excluded by !**/*.sum
  • HadesScheduler/HadesOperator/go.sum is excluded by !**/*.sum
  • go.work is excluded by !**/*.work
📒 Files selected for processing (7)
  • .env.example
  • .github/workflows/ci.yml
  • HadesLogManager/go.mod
  • HadesScheduler/HadesOperator/go.mod
  • compose.yml
  • docker-compose.deploy.yml
  • docker-compose.dev.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker-compose.deploy.yml

@paoxin paoxin requested review from Mtze and robertjndw March 9, 2026 14:57
@paoxin paoxin deployed to hades-test March 9, 2026 15:38 — with GitHub Actions Active

@Mtze Mtze left a comment

Can you please have a look at #388 - it changes some values for Traefik. Please also reflect those changes here.

@paoxin paoxin requested a review from Mtze March 13, 2026 15:17
@paoxin paoxin changed the title feat/ add HadesArtemisAdapter feat/ add log forwarding in HadesLogManager and ContinueOnError step support Mar 17, 2026

Development

Successfully merging this pull request may close these issues.

  • send build results to artemis
  • send build logs to artemis from adapter (Hades)
  • matching logs and results by job ID

3 participants