feat: Entra/Intune device authentication as a new peer-registration method#5977
feat: Entra/Intune device authentication as a new peer-registration method#5977thvevirtue wants to merge 19 commits intonetbirdio:mainfrom
Conversation
Add a new peer-registration method that authenticates devices against Entra/Intune,
analogous to setup keys. Admins map Entra security groups to NetBird auto-groups with
configurable resolution (strict_priority / union), revocation, expiry, ephemeral and
extra-DNS-labels flags per mapping.
The feature lives on the dedicated /join/entra path so it never mixes with the
existing gRPC Login/Sync flow.
This commit lays down the self-contained Phase 1 skeleton:
- management/server/types/entra_device_auth.go: domain model.
- management/server/integrations/entra_device:
- errors.go stable error codes (no_mapping_matched etc.)
- activity.go activity codes registered via RegisterActivityMap
- types.go challenge/enroll DTOs + internal value types
- nonce_store.go in-memory single-use nonce store
- resolution.go multi-group mapping resolution (heart of the feature)
- resolution_test.go unit tests covering priority, union, tie-breaking,
revoked/expired filtering, wildcard, fallback
- cert_validator.go Entra device cert chain + proof-of-possession check
- graph_client.go Microsoft Graph app-only client (device, groups, compliance)
- store.go package-local Store interface + MemoryStore
- manager.go orchestrator (challenge + enroll + Graph + resolve + enroll)
- management/server/http/handlers/entra_join/handler.go: /join/entra routes
- management/server/http/handlers/entra_device_auth/handler.go: admin CRUD
- management/server/permissions/modules/module.go: new EntraDeviceAuth module
No existing code paths are modified; the new packages can be wired into the
management server's router without touching the gRPC server, AccountManager
or SQL store in this commit.
Pending (later phases):
- SQL-backed Store impl + gorm auto-migrate registration
- AccountManager.EnrollEntraDevicePeer wiring + PeerEnroller closure
- Proto: optional enrollmentBootstrapToken field on LoginRequest
- Router wiring in management/server/http/handler.go
- OpenAPI schemas + codegen
- Client (Windows device-cert provider, URL-path routing)
- Dashboard admin pages
- Continuous re-validation scheduler
Link the Entra device enrolment flow with NetBird's peer registration + group
assignment plumbing so /join/entra/enroll actually creates peers.
* DefaultAccountManager.EnrollEntraDevicePeer (entra_device_enroll.go):
creates the NetBird peer from a pre-resolved Entra identity + mapping input.
Mirrors the essential data-writing portion of AddPeer (IP allocation with
retries on unique-constraint collisions, group assignment incl. All-group,
ephemeral tracking, network serial bump, integrated peer validator hook) but
drops the setup-key / user-JWT branches since the Entra cert already
authenticated the device. Emits a PeerAddedWithEntraDevice activity event
with full structured meta (device id, all matched mapping ids, resolution
mode, applied auto-groups) for audit trails.
* entraDevicePeerEnroller (entra_device_enroll.go): adapter that exposes
AsEntraDevicePeerEnroller() ed.PeerEnroller on the account manager so the
entra_device.Manager can call back without a circular import on the server
package.
* SQLStore (integrations/entra_device/sql_store.go): gorm-backed Store
implementation that auto-migrates the two new tables (entra_device_auth +
entra_device_auth_mappings) and keeps bootstrap tokens in a TTL-bound
in-memory cache.
* Install wiring (http/handlers/entra_device_auth/wiring.go): one-call
factory that builds the Store + Manager, attaches the PeerEnroller, mounts
the device-facing /join/entra routes on the rootRouter (unauthenticated,
device cert is the credential) and the admin CRUD routes on the /api
subrouter (permission-gated via modules.EntraDeviceAuth).
* handler.go: bypass paths for /join/entra/{challenge,enroll} so the shared
auth middleware doesn't reject anonymous device requests. Calls the new
installEntraDeviceAuth helper, which is a best-effort install (logs a
warning and skips the integration if the store doesn't expose *gorm.DB or
the manager doesn't implement AsEntraDevicePeerEnroller, so the binary
keeps building even if those assumptions change upstream).
Compile status: go vet of the new and touched packages is clean (empty
output after filtering the pre-existing upstream idp/dex CGO issue unrelated
to this work).
…ration
Adds comprehensive unit tests covering every layer of the Entra device
authentication feature, including a direct verification that the enrolment
flow invokes the account-manager-side peer enroller with the correctly
resolved mapping configuration.
New test files:
* nonce_store_test.go:
- Issue produces distinct, non-empty, non-expired nonces
- Consume succeeds exactly once per nonce (single-use)
- Unknown nonces are rejected
- Expired nonces are rejected
- Concurrent Issue/Consume is race-free
* cert_validator_test.go:
- RSA-PSS happy path (primary Windows CNG emission mode)
- RSA-PKCS1v15 happy path (alternate SCEP/CNG emission mode)
- ECDSA happy path (Intune-on-macOS style)
- Tampered signature rejected with invalid_signature
- Expired / not-yet-valid cert rejected with invalid_cert_chain
- Empty chain, garbage base64, garbage DER each rejected
- Chain verification against an empty trust pool correctly rejects
* manager_test.go (end-to-end Manager.Enroll):
- Happy path: cert + signed nonce + Graph device + group lookup -> the
PeerEnroller is called with the exact AutoGroups / Ephemeral /
AllowExtraDNSLabels / AccountID / EntraDeviceMapping that the mapping
resolution produced. This is the primary integration-with-peer-
registration assertion: whatever the integration decides, exactly
that flows into the NetBird AddPeer plumbing.
- Union-mode multi-mapping merge: confirms that when a device matches
two mappings, AutoGroups are unioned, Ephemeral ORs, AllowExtraDNSLabels
ANDs. Every matched mapping ID is recorded in the EnrollPeerInput.
- Rejects unknown tenant / disabled integration / bad nonce
- Rejects disabled Entra device / missing-in-Graph device
- Fails CLOSED on transient Graph errors (CodeGroupLookupFailed)
- RequireIntuneCompliant: compliant device enrolls; non-compliant is
rejected; compliance API failure also fails closed
- No mapping matched: PeerEnroller is NOT called
- Device-id mismatch between cert and request is rejected before Graph
is consulted
- Nonce single-use: replaying the same nonce fails the second time
* store_test.go:
- MemoryStore integration + mapping CRUD, with per-account isolation
- Bootstrap token single-use semantics for both MemoryStore and SQLStore
- SQLStore bootstrap-token TTL expiry
* entra_join/handler_test.go (HTTP-level round-trip):
- GET /join/entra/challenge returns base64 nonce + expiry
- POST /join/entra/enroll with a valid self-signed cert + correctly
signed nonce completes enrolment, returns 200 with bootstrap token
and resolved auto-groups, and invokes the fake PeerEnroller with
the expected input (verifying end-to-end wiring from HTTP to the
account-manager hook)
- Unknown tenant -> 404 with code integration_not_found
- Malformed JSON body -> 400
All test files vet clean.
Note: the tests cannot be executed locally on this Windows machine due to
the pre-existing upstream netbirdio/dex CGO-gating bug (requires gcc) that
blocks any build that transitively imports management/server/types. The
tests run fine on Linux CI or on Windows with CGO_ENABLED=1 + a C compiler.
…+ dex CGO shim
Adds the server-side test harness recommended for Path-A verification:
* idp/dex: split sqlite3 storage construction behind a CGO build tag.
sqlite_cgo.go (cgo) calls (&sql.SQLite3{File: ...}).Open; sqlite_nocgo.go
(!cgo) returns a clear error. Fixes the pre-existing upstream build break
that prevented 'go build ./...' and 'go test ./...' under CGO_ENABLED=0.
All Entra unit tests + HTTP handler tests now run green on Windows
without a C toolchain (confirmed locally).
* management/Dockerfile.entra-test: multi-stage Dockerfile that builds
netbird-mgmt from source with CGO_ENABLED=1, produces a slim Debian-slim
runtime image, exposes 33073 (gRPC) + 33081 (HTTP/join/entra), includes
a healthcheck that probes /join/entra/challenge.
* tools/entra-test/docker-compose.yml: brings up Postgres + management
from the feature branch. Auto-migrates the two new gorm tables on boot.
* tools/entra-test/enroll-tester/: standalone Go program that impersonates
a NetBird device. Generates a self-signed RSA cert (CN = device id),
fetches /challenge, signs the nonce with RSA-PSS SHA-256, POSTs /enroll,
prints the resolved auto-groups + bootstrap token. Lets you exercise the
full enrolment round-trip before the real Windows client is implemented.
Verified to build and run clean under CGO_ENABLED=0.
* tools/entra-test/TESTING.md: step-by-step walkthrough covering Docker
bring-up, Entra app registration, admin-API integration + mapping
creation, synthetic client invocation, expected error codes, and what
the harness does NOT yet cover (real Windows client = Phase 2).
With these pieces the feature is ready for end-to-end server-side testing
against a real Entra tenant. The missing Phase-2 work (client URL routing,
Windows CNG signing, tray UI) is independent of this harness.
Spins up the real entra_join handler in-process, seeds a wildcard mapping, and enrols against itself using a fake Graph client. Lets you verify the full HTTP -> handler -> validator -> resolver -> PeerEnroller pipeline with a single command, no Docker or Entra tenant required: enroll-tester --demo -v Exercised end-to-end: cert chain parsing, RSA-PSS signature verification against a freshly generated RSA key, single-use nonce lifecycle, wildcard mapping resolution, PeerEnroller invocation with correctly populated input, and bootstrap-token issuance. Confirmed locally: returns 200 OK with the expected peer_id, matched mapping ids, resolved auto-groups, and a 64-char bootstrap token. The only piece this does NOT exercise is the real HTTPGraphClient against a live Entra tenant; use the full docker-compose walkthrough in tools/entra-test/TESTING.md for that.
Add docs/ENTRA_DEVICE_AUTH.md with: * Feature overview + when to use it * Conceptual model: integration, mappings, resolution modes, wildcards, tenant-only fallback, stable error codes with HTTP-status mapping * Azure portal walkthrough for the app registration + required Graph application permissions * Full REST admin API reference (CRUD for integration + mappings) * /join/entra device-enrolment protocol reference including signature formats (RSA-PSS / PKCS1v15 / ECDSA) and the exact nonce-signing rule * Architecture diagram + package layout * Security notes (fail-closed Graph, bypass paths, bootstrap token TTL, client-secret encryption caveat) * Current implementation status table so readers know what's done vs. deferred to Phase 2/4/5
Fixes uncovered while actually running the stack: * Use a single --port 33073 (NetBird multiplexes gRPC + HTTP via cmux); --mgmt-api-port doesn't exist. * Supply NB_STORE_ENGINE + NB_STORE_ENGINE_POSTGRES_DSN env vars so the server can reach Postgres (NetBird reads these directly). * Replace placeholder DataStoreEncryptionKey with a real 32-byte base64 value so the field encryptor initialises cleanly. * Align Dockerfile EXPOSE + HEALTHCHECK to the single port. Verified: docker compose up -d brings the stack healthy, /join/entra/challenge responds with fresh nonces, and both new gorm tables (entra_device_auth + entra_device_auth_mappings) appear in Postgres.
Added during the first real-tenant run-through: * tools/entra-test/seed-account/: small Go utility that inserts a minimal but valid NetBird Account row + required groups via gorm. Uses netbird's own types.NewNetwork() + types.Account so JSON-encoded fields (network_net, settings.Extra, etc.) round-trip correctly. Needed because the /join/entra flow creates peers under an existing account, and the normal AccountManager signup requires SSO/IdP which isn't wired in the test harness. * docker-compose.yml: expose postgres on host:5432 so seed-account (and psql) can connect from the host. Flagged in-line as test-harness-only. Verified end-to-end against a real Entra tenant: 1. /join/entra/challenge returned a nonce 2. POST /enroll validated the synthetic cert + signature 3. Server authenticated to the tenant via Graph OAuth2 client-credentials 4. Graph found the device and enumerated groups 5. Wildcard mapping resolved -> auto_groups=[test-group-1] 6. Peer created in Postgres with IP + DNS label + All-group membership 7. Bootstrap token issued
Document the six end-to-end scenarios exercised against a real Entra tenant using the enroll-tester harness. All accept + reject paths (happy path with wildcard and specific-group mappings, no_mapping_matched, device_disabled, compliance-on with compliant device, compliance-on with non-compliant device) behaved as specified with zero database side effects on any reject.
First half of Phase 2 (the real NetBird client). Lands the pluggable cert provider + enrolment orchestration + unit tests. Still pending: profile persistence, URL-path routing in auth.go, CLI flags, end-to-end run. New package client/internal/enroll/entradevice: * provider.go - CertProvider interface with two required methods (CertChainDER, SignNonce, DeviceID). PFXProvider implementation loads a .pfx file (such as Intune deploys via PKCS Certificate profile), signs nonces with RSA-PSS SHA-256 or ECDSA. Interface shaped so a future CNG provider (TPM-backed non-exportable key) can drop in without changes. * enroller.go - drives GET /join/entra/challenge + POST /join/entra/enroll. Returns a structured *Error when the server responds with a machine- readable error body so clients can surface specific UI for codes like no_mapping_matched or device_not_compliant. * state.go - EntraEnrollState DTO for per-profile persistence (tenant id, device id, peer id, enrolled_at, resolved auto-groups, matched mapping ids). * enroller_test.go - 7 tests covering PFX load (happy path, wrong password, missing file), enrolment round-trip against httptest server, structured server error decode, input validation, and trailing /join/entra in BaseURL tolerance. All green locally. New go.mod dependency: software.sslmate.com/src/go-pkcs12 v0.7.1 (the stdlib golang.org/x/crypto/pkcs12 is frozen and lacks DecodeChain/Encode).
Add a first-class subcommand that drives the Entra enrolment flow against
a /join/entra endpoint and persists the result into the active profile.
* client/cmd/entra_enroll.go: new cobra command. Loads the active profile,
reads PFX (path + password env var), derives the WG pubkey from the
profile's WG private key, runs the Enroller, persists the returned state
into profilemanager.Config.EntraEnroll, and strips /join/entra from the
saved ManagementURL so subsequent 'netbird up' goes straight to gRPC.
Includes a dev-path fallback for Windows boxes without admin rights
(util.WriteJsonWithRestrictedPermission -> util.WriteJson).
* client/cmd/root.go: register the new subcommand.
* client/internal/profilemanager/config.go: add EntraEnroll *EntraEnrollState
field to Config (JSON, optional). Duplicated the state type here instead
of importing entradevice to avoid a circular dep.
* tools/entra-test/make-pfx: standalone helper that produces a self-signed
PFX with a supplied Entra device id as Subject CN. Used for local E2E
verification when Intune-issued PFXes aren't available.
* tools/entra-test/.gitignore: exclude built binaries and generated PFXes.
End-to-end verification run against the live Docker stack + real Entra
tenant:
netbird entra-enroll \\
--management-url http://localhost:33073/join/entra \\
--entra-tenant 5a7a81b2-99cc-45fc-b6d1-cd01ba176c26 \\
--entra-pfx tools/entra-test/device.pfx \\
--entra-pfx-password-env NB_ENTRA_PFX_PASSWORD
Returned ENROLMENT SUCCESS, persisted EntraEnroll state into the profile,
and the peer was created in Postgres with the correct mapping-resolved
auto-groups + All-group membership.
Tried landing a Windows cert-store provider via github/smimesign/certstore.
It builds cleanly but requires CGO (links against crypt32.dll + ncrypt.dll
via #cgo LDFLAGS). Our build pipeline is CGO_ENABLED=0 and the dev box has
no C toolchain; switching to CGO is a substantial infrastructure change
(Dockerfiles, CI, goreleaser, cross-compile).
Rather than land half-done or force a CGO migration, roll back cleanly:
* Remove the smimesign/certstore dep from go.mod / go.sum.
* Delete the provider_certstore_{windows,stub}.go stubs.
* Revert entra_enroll.go to PFX-only CLI (as was working and live-tested).
* Add a 'Future work' section to docs/ENTRA_DEVICE_AUTH.md spelling out
the two viable implementation routes (CGO/smimesign vs pure-Go ncrypt)
so whoever picks this up next doesn't have to re-do the research.
The existing CertProvider interface is unchanged, so either future
implementation drops in next to PFXProvider without touching the enroller.
The PFX path we landed + live-tested still works:
- Intune PKCS Certificate profile can deploy PFX files to both Windows
and macOS devices (not just Windows cert store).
- The server validates any RSA/ECDSA cert the client presents.
- Only tradeoff vs cert store: private key is in a password-protected
file on disk rather than TPM-bound. Acceptable pattern for many
enterprise deployments.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds end-to-end Entra/Intune device enrollment: client CLI and PFX handling, a one-shot enrollment flow against management Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as Client CLI
participant MgmtHTTP as Management HTTP (/join/entra)
participant Validator as CertValidator
participant Graph as Microsoft Graph
participant Enroller as PeerEnroller/Store
CLI->>MgmtHTTP: GET /join/entra/challenge
MgmtHTTP->>MgmtHTTP: Issue nonce (NonceStore)
MgmtHTTP-->>CLI: {nonce, expires_at}
CLI->>CLI: load PFX, sign nonce, derive WG pubkey
CLI->>MgmtHTTP: POST /join/entra/enroll (cert_chain, nonce, signature, wg_pubkey,...)
MgmtHTTP->>Validator: Validate cert_chain + signature
Validator-->>MgmtHTTP: DeviceIdentity
MgmtHTTP->>Graph: Device lookup / TransitiveMemberOf / IsCompliant
Graph-->>MgmtHTTP: group IDs, compliance
MgmtHTTP->>MgmtHTTP: Resolve mappings -> resolved mapping
MgmtHTTP->>Enroller: EnrollEntraDevicePeer(EnrollPeerInput)
Enroller-->>MgmtHTTP: Enroll result (peerID, config)
MgmtHTTP-->>CLI: EnrollResponse (peerID, bootstrap_token, resolved groups, mapping IDs)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
First-pass response to PR netbirdio#5977 review feedback from SonarCloud. Complexity refactors (extract helpers to bring cognitive-complexity and function-length under SonarCloud thresholds): * management/server/integrations/entra_device/resolution.go: split ResolveMapping into filterCandidates + handleNoCandidates helpers, and factor the sort + per-mapping merge logic out of resolveUnion into sortByPriorityThenID + appendNewAutoGroups + mergeMinExpiry. Same semantics; all 11 resolution_test.go cases still pass. * management/server/integrations/entra_device/manager.go: split Enroll into 7 focused helpers (loadEnabledIntegration, consumeNonce + decodeNonceBytes, validateCertAndDeviceID, verifyWithGraph, resolveMappingForAccount, enrollPeer, issueBootstrapToken). Same externally-visible behaviour; the manager_test.go suite still passes end-to-end including the HTTP handler integration tests. Hard-coded-credential findings (dev-test-harness only, flagged BLOCKER + MAJOR by SonarCloud): * tools/entra-test/docker-compose.yml: parametrise Postgres password via \$\{NB_TEST_PG_PASSWORD:-netbird\} so operators can override via a .env file. // NOSONAR annotations on the fixtures that legitimately have a default for local-dev use. * tools/entra-test/seed-account/main.go: move the DSN default away from a single hard-coded string into an env-var-override helper (envOrDefault). Honours NB_TEST_PG_{USER,PASSWORD,DB} consistently with the compose file. Pre-existing issues intentionally NOT addressed (not introduced by this PR): * client/internal/profilemanager/config.go file-length (751/750) \u2014 config.go was ~730 lines before this PR; adding the 20-line EntraEnrollState struct crossed the threshold. Splitting config.go into multiple files is out of scope for this feature PR. * management/server/http/handler.go NewAPIHandler complexity + 23 parameters \u2014 function existed before, this PR adds one call (installEntraDeviceAuth) that doesn't change the signature.
First-pass response to PR netbirdio#5977 review feedback from SonarCloud. Complexity refactors (extract helpers to bring cognitive-complexity and function-length under SonarCloud thresholds): * management/server/integrations/entra_device/resolution.go: split ResolveMapping into filterCandidates + handleNoCandidates helpers, and factor the sort + per-mapping merge logic out of resolveUnion into sortByPriorityThenID + appendNewAutoGroups + mergeMinExpiry. Same semantics; all 11 resolution_test.go cases still pass. * management/server/integrations/entra_device/manager.go: split Enroll into 7 focused helpers (loadEnabledIntegration, consumeNonce + decodeNonceBytes, validateCertAndDeviceID, verifyWithGraph, resolveMappingForAccount, enrollPeer, issueBootstrapToken). Same externally-visible behaviour; the manager_test.go suite still passes end-to-end including the HTTP handler integration tests. Hard-coded-credential findings (dev-test-harness only, flagged BLOCKER + MAJOR by SonarCloud): * tools/entra-test/docker-compose.yml: parametrise Postgres password via \$\{NB_TEST_PG_PASSWORD:-netbird\} so operators can override via a .env file. // NOSONAR annotations on the fixtures that legitimately have a default for local-dev use. * tools/entra-test/seed-account/main.go: move the DSN default away from a single hard-coded string into an env-var-override helper (envOrDefault). Honours NB_TEST_PG_{USER,PASSWORD,DB} consistently with the compose file. Pre-existing issues intentionally NOT addressed (not introduced by this PR): * client/internal/profilemanager/config.go file-length (751/750) \u2014 config.go was ~730 lines before this PR; adding the 20-line EntraEnrollState struct crossed the threshold. Splitting config.go into multiple files is out of scope for this feature PR. * management/server/http/handler.go NewAPIHandler complexity + 23 parameters \u2014 function existed before, this PR adds one call (installEntraDeviceAuth) that doesn't change the signature. Co-Authored-By: Oz <oz-agent@warp.dev>
There was a problem hiding this comment.
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (11)
management/server/integrations/entra_device/activity.go-9-20 (1)
9-20:⚠️ Potential issue | 🟡 MinorAdd safeguards to prevent silent activity ID collisions from external dependencies.
activity.RegisterActivityMapusesmaps.Copy, which silently overwrites duplicate keys. The IDs 200–209 are currently collision-free in this repo, but with the external dependency ongithub.com/netbirdio/management-integrationsand no central allocation mechanism, future code could introduce collisions without any warning. This would silently corrupt activity logging for affected entries.Recommend either:
- Centrally reserve the 200–209 range in
management/server/activity/codes.goalongside existing constants (0–112), or- Add a test in this package that asserts none of these IDs exist in the registry before
init()runs, preventing runtime surprises.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/activity.go` around lines 9 - 20, The new Entra device activity constants (PeerAddedWithEntraDevice, EntraDeviceAuthCreated, EntraDeviceAuthUpdated, EntraDeviceAuthDeleted, EntraDeviceAuthMappingCreated, EntraDeviceAuthMappingUpdated, EntraDeviceAuthMappingDeleted, EntraDeviceAuthMappingRevoked, GroupAddedToEntraDeviceMapping, GroupRemovedFromEntraDeviceMapping) risk silent ID collisions because activity.RegisterActivityMap uses maps.Copy; either reserve the 200–209 range in the central activity codes (add them to the central constants in management/server/activity/codes.go alongside 0–112) or add a pre-init test in this package that queries the activity registry (via the same registry API used by activity.RegisterActivityMap) to assert none of these numeric IDs already exist before init() runs, failing the test if any collision is found so duplicates cannot be introduced silently.tools/entra-test/docker-compose.yml-9-11 (1)
9-11:⚠️ Potential issue | 🟡 MinorHeader comment contradicts the actual port layout.
Lines 9-11 claim
:33073is gRPC and:33081is HTTP (admin API +/join/entra/*), but the service only exposes:33073and the comment at lines 49-51 correctly notes that cmux multiplexes both protocols on that single port. The top-of-file banner will mislead anyone reading it before scrolling down.📝 Proposed fix
# The management server exposes: -# - :33073 gRPC (normal NetBird agent login flow) -# - :33081 HTTP, including the admin API and /join/entra/* +# - :33073 gRPC + HTTP (cmux-multiplexed): agent Login/Sync, +# admin API, and /join/entra/*🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/docker-compose.yml` around lines 9 - 11, Update the top-of-file header comment in the docker-compose.yml to match the actual port layout: remove the separate :33081 HTTP entry and state that both gRPC and HTTP (including the admin API and /join/entra/*) are multiplexed on :33073 via cmux, so the header aligns with the later cmux comment.management/server/integrations/entra_device/README.md-57-69 (1)
57-69:⚠️ Potential issue | 🟡 MinorPhase 1 status checklist is stale relative to this PR.
Per the PR objectives, several boxes that are still unchecked here actually ship in this same PR:
- "Wired into the main router" — done via
installEntraDeviceAuthinmanagement/server/http/handler.go.- "Full integration with AccountManager.AddPeer" — done via
AccountManager.EnrollEntraDevicePeer/AsEntraDevicePeerEnroller.- "Admin CRUD handlers" — done under
management/server/http/handlers/entra_device_auth.The proto field
enrollmentBootstrapTokenand dashboard/OpenAPI items are legitimately deferred follow-ups and should stay unchecked. Please refresh the list so first-time readers aren't misled about the current state.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/README.md` around lines 57 - 69, Update the Phase 1 checklist in the README to reflect work shipped in this PR: mark "Wired into the main router" as done and reference installEntraDeviceAuth (management/server/http/handler.go), mark "Full integration with AccountManager.AddPeer" as done and reference AccountManager.EnrollEntraDevicePeer / AsEntraDevicePeerEnroller, and mark "Admin CRUD handlers" as done and reference management/server/http/handlers/entra_device_auth; leave "Proto field enrollmentBootstrapToken" and dashboard/OpenAPI items unchecked as noted.management/server/integrations/entra_device/asn1.go-5-9 (1)
5-9:⚠️ Potential issue | 🟡 MinorComment claims "function variable" but implementation is a plain function — comment is misleading.
The doc says this is "wrapped behind a package-level function variable so unit tests can substitute it," but
realASN1Unmarshalis a regular function declaration, not a variable. No tests actually override this function orasn1Unmarshalin the codebase.Since the architecture already uses a two-layer wrapper (stdlib →
realASN1Unmarshal→asn1Unmarshalvar in cert_validator.go) and tests do not mock either, the "so unit tests can substitute it" claim in the comment is inaccurate and should be removed to keep the comment honest.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/asn1.go` around lines 5 - 9, The comment above realASN1Unmarshal is misleading: it claims the function is "wrapped behind a package-level function variable so unit tests can substitute it" but realASN1Unmarshal is a plain function and tests do not override it or asn1Unmarshal; update the comment to accurately describe the function (e.g., "simple helper that directly calls encoding/asn1.Unmarshal") and remove the incorrect clause about being a package-level function variable or substitutable in tests; leave the realASN1Unmarshal function and the asn1Unmarshal variable (in cert_validator.go) unchanged.tools/entra-test/TESTING.md-187-199 (1)
187-199:⚠️ Potential issue | 🟡 Minor"What this does NOT test" section is stale re: the client CLI.
Per the PR objectives, client Phase 2 — the
netbird entra-enrollCLI with PFXCertProviderand enroller logic — is included in this PR (client/cmd/entra_enroll.go). Only Windows cert-store / TPM-backed CNG signing is explicitly deferred. The current wording ("The real NetBird Windows client is not yet wired to the/join/entra/*path. That's Phase 2 … and has not been implemented.") reads as if no client exists yet.Consider narrowing the scope to what is actually deferred (Windows cert-store integration) rather than the whole client path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/TESTING.md` around lines 187 - 199, Update the "What this does NOT test" paragraph to avoid implying the entire client is unimplemented: state that the client-side CLI and enroller (client/cmd/entra_enroll.go) and the PFX CertProvider are present in this PR, but Windows cert-store / TPM-backed CNG signing (integration with Cert:\LocalMachine\My) is deferred to Phase 2; keep instructions about server-side verification and the current test harness but narrow the scope of the deferred work to "Windows cert-store / TPM-backed signing" only.docs/ENTRA_DEVICE_AUTH.md-272-272 (1)
272-272:⚠️ Potential issue | 🟡 MinorFix markdownlint hits (MD040, MD058, MD055/MD056).
Per static-analysis hints:
- Line 272: the ASCII architecture diagram is inside a fenced block without a language. Use
text(orasciidoc) to satisfy MD040.- Line 327 and the second table around Line 350: tables must be surrounded by blank lines (MD058). The MD055/MD056 warnings around Line 335 are a downstream effect of the missing blank line between the last table row and the "Observations from the runs:" paragraph — the linter folds the paragraph into the table.
📝 Proposed doc fixes
## Architecture -``` +```text ┌─────────────────────────────────────┐ │ Device (Entra-joined) │Docker test harness + the synthetic `enroll-tester` tool. The following scenarios were all executed end-to-end through Microsoft Graph: + | Scenario | Configuration | Input | Expected result | Actual | |----------------------------------------------|---------------------------------------------|-----------------------------|--------------------------|----------|| Compliance on, non-compliant device | `require_intune_compliant: true` | non-compliant device id | `403 device_not_compliant` | ✅ | + Observations from the runs:Apply the same "blank line before the table" fix to the "Current implementation status" table that starts at Line 350.
Also applies to: 323-348
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/ENTRA_DEVICE_AUTH.md` at line 272, The fenced ASCII diagram block needs a language label and the tables need surrounding blank lines: change the diagram fence from ``` to ```text (or ```asciidoc) where the ASCII architecture diagram appears and insert a blank line before and after each markdown table (including the table that summarizes runs and the "Current implementation status" table) and ensure there is a blank line between the last table row and the following paragraph ("Observations from the runs:") so the linter (MD040, MD058, MD055/MD056) recognizes the code block and tables correctly.management/server/integrations/entra_device/store.go-70-76 (1)
70-76:⚠️ Potential issue | 🟡 Minor
SaveEntraDeviceAuthleaks stalebyTenantentries on tenant change.If an integration is saved with tenant
A, then re-saved with tenantB,m.byTenant["A"]still points at the (now tenant-B) auth. A laterGetEntraDeviceAuthByTenant("A")returns a record whoseTenantIDisB. Clear the old index entry before writing the new one (readauths[auth.AccountID]first, remove its previousbyTenantkey if it differs). Low-severity since this store is dev-only, but worth a one-liner fix.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/store.go` around lines 70 - 76, SaveEntraDeviceAuth currently leaves stale entries in m.byTenant when an auth's TenantID changes; update SaveEntraDeviceAuth to first load the previous auth from m.auths using auth.AccountID, and if that previous auth exists and its TenantID is non-empty and different from auth.TenantID, delete the old key from m.byTenant; then store/update m.auths[auth.AccountID] and, if auth.TenantID != "", set m.byTenant[auth.TenantID] = auth. Use the existing function SaveEntraDeviceAuth and the maps m.auths and m.byTenant when implementing this fix.management/server/integrations/entra_device/cert_validator.go-147-154 (1)
147-154:⚠️ Potential issue | 🟡 MinorECDSA curve fallback to P-256 silently coerces the key.
x509.ParseCertificatepopulatesCurvefor all curves Go supports. Ifpub.Curve == nilhere, the cert either parsed incorrectly or carries a curve the Go stdlib doesn't recognize — mutating the caller's*ecdsa.PublicKeyand assuming P-256 can (a) mutate a sharedleaf.PublicKeyvisible to later callers, and (b) produce a "successful" verify against the wrong curve if the key happens to share coordinates. Safer to reject:🛡️ Proposed fix
- if pub.Curve == nil { - // Fall back to P-256, which is what Windows CNG + most Intune - // SCEP profiles emit. - pub.Curve = elliptic.P256() - } + if pub.Curve == nil { + return fmt.Errorf("ecdsa public key has no curve set") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/cert_validator.go` around lines 147 - 154, Don't silently coerce pub.Curve to P-256: instead of mutating pub.Curve = elliptic.P256() when pub.Curve == nil, treat a nil curve as an error condition and return a descriptive error (e.g., "unsupported or unknown ECDSA curve") from the verification code path that uses ecdsa.Verify; update the block around pub.Curve, elliptic.P256(), and ecdsa.Verify to bail out on nil curve and only call ecdsa.Verify when pub.Curve is non-nil so you don't mutate the caller's *ecdsa.PublicKey or risk verifying with the wrong curve.management/server/http/handlers/entra_device_auth/handler.go-192-197 (1)
192-197:⚠️ Potential issue | 🟡 MinorConflates storage error with "no integration" and returns 409 for 5xx.
If
GetEntraDeviceAuthreturns a non-nil error (DB outage, etc.), the client gets409 Conflict — configure the Entra device auth integration before adding mappings, which is both misleading to the caller and confusing for on-call. Separate the branches:🛡️ Proposed fix
- integ, err := h.Store.GetEntraDeviceAuth(r.Context(), accountID) - if err != nil || integ == nil { - httpJSON(w, http.StatusConflict, apiError{"no_integration", - "configure the Entra device auth integration before adding mappings"}) - return - } + integ, err := h.Store.GetEntraDeviceAuth(r.Context(), accountID) + if err != nil { + httpJSON(w, http.StatusInternalServerError, apiError{"internal_error", err.Error()}) + return + } + if integ == nil { + httpJSON(w, http.StatusConflict, apiError{"no_integration", + "configure the Entra device auth integration before adding mappings"}) + return + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/http/handlers/entra_device_auth/handler.go` around lines 192 - 197, The current handler conflates storage errors and a missing integration by treating any non-nil err the same as integ == nil; update the GetEntraDeviceAuth branch in handler.go (the call to h.Store.GetEntraDeviceAuth) to first check for err and, if err != nil, return an http.StatusInternalServerError via httpJSON with an appropriate apiError indicating a storage/backend error, and only if err == nil && integ == nil return the existing http.StatusConflict response telling the client to configure the Entra device auth integration; ensure you reference the same apiError shape and use the same httpJSON helper so calling code/clients remain consistent.management/server/http/handlers/entra_device_auth/wiring.go-101-110 (1)
101-110:⚠️ Potential issue | 🟡 MinorPermissive fallback when
Permissions == nilshould at least log loudly.Install() has no guard against
w.Permissions == nil, and the fallback inbuildPermissionCheckersilently returns(true, nil)for any operation. This is called out in the doc comment but there's no runtime breadcrumb — a misconfigured deployment would expose admin CRUD to every authenticated user with no trace. Consider either rejecting nil inInstall(treat it the same as missingPeerEnroller) or emitting a loud warning so operators notice.🛡️ Proposed fix
func buildPermissionChecker(pm permissions.Manager) PermissionChecker { if pm == nil { + log.Warn("entra_device_auth: no permissions.Manager wired; admin CRUD is permissive. Do not use in production.") return func(context.Context, string, string, string) (bool, error) { return true, nil } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/http/handlers/entra_device_auth/wiring.go` around lines 101 - 110, buildPermissionChecker currently returns a silent permissive checker when permissions.Manager is nil which can expose admin actions; change Install and wiring so nil permissions are either rejected or loudly logged: in Install (where w.Permissions is used) validate that w.Permissions != nil and return an error (treat like missing PeerEnroller) or if you must allow nil, modify buildPermissionChecker to wrap the permissive fallback with a processLogger.warn/error (include traceable context) so any deployment missing permissions.Manager produces a clear runtime breadcrumb; refer to buildPermissionChecker, PermissionChecker, permissions.Manager, Install and w.Permissions when making the change.management/server/integrations/entra_device/manager.go-109-126 (1)
109-126:⚠️ Potential issue | 🟡 MinorInconsistent nonce handling: trim applied to
Consume, not to base64 decode.At Line 111 you pass
strings.TrimSpace(req.Nonce)toConsume, but at Line 118 you callbase64.RawURLEncoding.DecodeString(req.Nonce)on the untrimmed input. A client that sends a nonce with stray whitespace (or a trailing newline) will have the nonce successfully consumed (single-use) and then fail base64 decoding, burning an otherwise valid nonce with no way to retry.Additionally, consuming the nonce before decoding means any malformed base64 permanently burns the nonce — consider decoding first, then consuming only after the payload is confirmed parseable.
🛠️ Proposed fix
- // 2. Consume nonce (single-use; this also ensures the nonce was issued by - // this server instance and has not yet expired). - ok, err := m.NonceStore.Consume(strings.TrimSpace(req.Nonce)) - if err != nil { - return nil, NewError(CodeInternal, "nonce store error", err) - } - if !ok { - return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) - } - nonceBytes, err := base64.RawURLEncoding.DecodeString(req.Nonce) - if err != nil { - // Clients sometimes use the padded URL or Std alphabets. Fall back. - if b, e2 := base64.StdEncoding.DecodeString(req.Nonce); e2 == nil { - nonceBytes = b - } else { - return nil, NewError(CodeInvalidNonce, "nonce is not base64", err) - } - } + // 2. Decode first, then consume (single-use). This avoids burning a + // nonce on malformed input. + nonce := strings.TrimSpace(req.Nonce) + nonceBytes, err := base64.RawURLEncoding.DecodeString(nonce) + if err != nil { + // Clients sometimes use the padded URL or Std alphabets. Fall back. + if b, e2 := base64.StdEncoding.DecodeString(nonce); e2 == nil { + nonceBytes = b + } else { + return nil, NewError(CodeInvalidNonce, "nonce is not base64", err) + } + } + ok, err := m.NonceStore.Consume(nonce) + if err != nil { + return nil, NewError(CodeInternal, "nonce store error", err) + } + if !ok { + return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 109 - 126, The code currently calls m.NonceStore.Consume(strings.TrimSpace(req.Nonce)) before attempting base64.RawURLEncoding.DecodeString(req.Nonce), which can burn a valid nonce if the raw value contains whitespace or is non-base64; change the flow to first trim the nonce into a local variable (e.g., nonce := strings.TrimSpace(req.Nonce)), attempt base64 decoding using nonce (try RawURLEncoding then fall back to StdEncoding), and only call m.NonceStore.Consume(nonce) after successful decode/parsing; update all error returns to reference the trimmed nonce decode errors and ensure Consume is passed the same trimmed nonce value.
🧹 Nitpick comments (22)
tools/entra-test/.gitignore (1)
2-2: Consider ignoring all PFX artifacts, not onlydevice.pfx.This reduces the chance of accidentally committing sensitive test certs when filenames vary.
Suggested update
netbird.exe -device.pfx +*.pfx🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/.gitignore` at line 2, Replace the specific device.pfx ignore with a broader pattern to catch all PFX certificate artifacts (e.g., add a "*.pfx" entry) so any test cert filenames are ignored; update the .gitignore entry that currently lists "device.pfx" to the wildcard pattern to ensure all .pfx files are excluded from commits.idp/dex/sqlite_nocgo.go (1)
16-20: Minor:errors.Newwould be slightly more idiomatic here.The error string has no format verbs, so
errors.New(...)(or a package-levelvar errSQLiteNoCGO = errors.New(...)) would avoid thefmt.Errorfoverhead/lint noise. Non-blocking.♻️ Optional refactor
import ( - "fmt" + "errors" "log/slog" "github.com/dexidp/dex/storage" ) // openSQLite is a no-CGO stub. Dex's sqlite3 backend requires CGO; when this // binary is built with CGO_ENABLED=0 we reject sqlite storage with a clear // message pointing operators at an alternative (Postgres) or a CGO build. func openSQLite(_ string, _ *slog.Logger) (storage.Storage, error) { - return nil, fmt.Errorf( - "sqlite3 storage is not available: this binary was built with CGO_ENABLED=0; " + - "rebuild with CGO_ENABLED=1 or switch to a postgres storage backend") + return nil, errors.New( + "sqlite3 storage is not available: this binary was built with CGO_ENABLED=0; " + + "rebuild with CGO_ENABLED=1 or switch to a postgres storage backend") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@idp/dex/sqlite_nocgo.go` around lines 16 - 20, The openSQLite function returns a static error via fmt.Errorf though there are no format verbs; replace fmt.Errorf(...) with errors.New(...) to be more idiomatic (or define a package-level var like errSQLiteNoCGO = errors.New("...") and return that) and update openSQLite to return that errors.New/var instead of fmt.Errorf to remove unnecessary formatting overhead and lint noise.tools/entra-test/enroll-tester/tls.go (1)
8-10: PinMinVersionon the insecure TLS config.Even in a dev-only helper, setting
MinVersion: tls.VersionTLS12(ortls.VersionTLS13) is essentially free and addresses themissing-ssl-minversion-golint without changing behavior.🛡️ Proposed tweak
func insecureTLSConfig() *tls.Config { - return &tls.Config{InsecureSkipVerify: true} //nolint:gosec // opt-in dev flag + return &tls.Config{ + InsecureSkipVerify: true, //nolint:gosec // opt-in dev flag (--insecure) + MinVersion: tls.VersionTLS12, + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/enroll-tester/tls.go` around lines 8 - 10, The insecureTLSConfig function currently returns &tls.Config{InsecureSkipVerify: true} and should also pin the minimum TLS version to avoid the missing-ssl-minversion-go lint; update insecureTLSConfig to set MinVersion: tls.VersionTLS12 (or tls.VersionTLS13) on the tls.Config while keeping InsecureSkipVerify: true so behavior is unchanged but the lint is satisfied.client/internal/enroll/entradevice/provider.go (2)
118-123: Preferecdsa.SignASN1over the manualasn1.Marshal({R,S}).
ecdsa.SignASN1(Go 1.15+) produces exactly the ASN.1-DER{R, S}shape you're constructing here and avoids the anonymous struct +math/bigimport. Functionally equivalent, just the idiomatic form for Go 1.25.♻️ Proposed simplification
case *ecdsa.PrivateKey: - r, s, err := ecdsa.Sign(rand.Reader, k, digest[:]) - if err != nil { - return nil, fmt.Errorf("ecdsa sign: %w", err) - } - return asn1.Marshal(struct{ R, S *big.Int }{r, s}) + return ecdsa.SignASN1(rand.Reader, k, digest[:])You can then drop the
encoding/asn1andmath/bigimports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/provider.go` around lines 118 - 123, Replace the manual two-step ECDSA signing + ASN.1 marshaling in the switch branch that handles *ecdsa.PrivateKey: instead of calling ecdsa.Sign to get r,s and then asn1.Marshal(struct{R,S *big.Int}{r,s}), call ecdsa.SignASN1(rand.Reader, k, digest[:]) to produce the DER-encoded {R,S} directly; remove the now-unused encoding/asn1 and math/big imports after this change. Ensure you keep the existing error handling around the sign call and return the byte slice from ecdsa.SignASN1.
124-129: Fallback comment overstates RSA-PSS guarantee.Passing
crypto.SHA256(acrypto.Hash) asoptstocrypto.Signer.Signmeans:
- For an RSA opaque signer, the default is PKCS1v15, not PSS (PSS requires
*rsa.PSSOptions).- For ECDSA opaque signers, output is the standard ASN.1-DER encoding.
So the comment "they MUST handle RSA-PSS" is misleading — this fallback will in practice produce PKCS1v15 for RSA keys. Since line 116 states the server accepts PKCS1v15 too, the behavior is fine, but please update the comment so a future contributor doesn't assume PSS here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/provider.go` around lines 124 - 129, The fallback comment incorrectly claims opaque signers "MUST handle RSA-PSS"; update the comment near the call to p.signer.Sign(rand.Reader, digest[:], crypto.SHA256) to state that passing crypto.SHA256 yields PKCS1v15 for RSA opaque signers (and ASN.1-DER for ECDSA), not RSA-PSS, and note that the server also accepts PKCS1v15 per the earlier comment so behavior is intentional; keep reference to p.signer.Sign and the use of crypto.SHA256 so maintainers understand the actual signing defaults.tools/entra-test/config/management.json (1)
37-37: Nit: label the test-only encryption key.This is obviously a dev fixture, but since it looks like a real base64 key (and scanners will flag it), consider adding a comment in the surrounding docs/README or renaming the value (e.g.,
"<test-only-not-a-secret>"placeholder regenerated on first run) to avoid anyone grepping this out of git history and re-using it. Not a blocker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/config/management.json` at line 37, Update the DataStoreEncryptionKey entry so it’s clearly marked as a test-only placeholder: replace the existing base64-looking value for the "DataStoreEncryptionKey" field with an explicit sentinel like "<test-only-not-a-secret>" (or a value generated at first-run) and add a short note in the surrounding docs/README explaining it is a dev fixture and must not be used in production; ensure the string "DataStoreEncryptionKey" is what you change so scanners/grep will match the placeholder instead of the real-looking key.management/Dockerfile.entra-test (2)
28-35: Install build deps beforeCOPY . .to keep the layer cache warm.
apt-get install gcc libc6-devcurrently lives in the sameRUNasgo build, which is downstream ofCOPY . .. Any source edit invalidates that whole layer and re-runsapt-get update && apt-get install. Moving the OS package install into its ownRUNbeforeCOPY . .(right after theCOPY go.mod go.sumstep) makes it cache alongside the module download.♻️ Proposed ordering
WORKDIR /src +# Build deps for CGO (Dex SQLite). Kept in a stable layer to preserve cache +# across source edits. +RUN apt-get update && \ + apt-get install -y --no-install-recommends gcc libc6-dev && \ + rm -rf /var/lib/apt/lists/* + # Prime the module cache separately so source edits don't invalidate it. COPY go.mod go.sum ./ RUN --mount=type=cache,target=/go/pkg/mod \ --mount=type=cache,target=/root/.cache/go-build \ go mod download COPY . . RUN --mount=type=cache,target=/go/pkg/mod \ --mount=type=cache,target=/root/.cache/go-build \ - apt-get update && \ - apt-get install -y --no-install-recommends gcc libc6-dev && \ - rm -rf /var/lib/apt/lists/* && \ go build -trimpath -ldflags "-s -w" -o /out/netbird-mgmt ./management🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/Dockerfile.entra-test` around lines 28 - 35, Move the OS package installation (apt-get update && apt-get install -y --no-install-recommends gcc libc6-dev) into its own RUN layer placed before the full source COPY . . (ideally immediately after the existing COPY go.mod go.sum step) so the package install is cached independently of source changes; then keep the go build step (the RUN using --mount=type=cache targets and go build -trimpath -ldflags "-s -w" -o /out/netbird-mgmt ./management) in its original RUN after COPY . . to ensure only the build runs when sources change.
44-55: Run the runtime container as non-root.Trivy flags the image as root-user. Even for a test image, adding a non-root
USERis a one-liner and avoids downstream surprises when the same Dockerfile gets adapted/copied.🔒 Proposed change
COPY --from=builder /out/netbird-mgmt /usr/local/bin/netbird-mgmt + +RUN useradd --system --uid 10001 --user-group --no-create-home netbird +USER 10001:10001🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/Dockerfile.entra-test` around lines 44 - 55, The image is running as root; add a non-root user and switch to it before ENTRYPOINT to avoid running /usr/local/bin/netbird-mgmt as root. Create or use a dedicated unprivileged user (e.g., "netbird"), ensure the binary and any needed directories are owned/readable by that user (adjust ownership after COPY with chown in the Dockerfile), then add a USER netbird line just before ENTRYPOINT so HEALTHCHECK and ENTRYPOINT run unprivileged; keep EXPOSE and HEALTHCHECK as-is.management/server/integrations/entra_device/nonce_store_test.go (1)
60-91: Concurrent test can report a misleading nil error.At lines 79-82 the worker sends on
errswhenevererr != nil || !ok, but whenok == falsewitherr == nilit still enqueues a nil. The reader then printsunexpected error ... %v→<nil>, obscuring the real failure (first-consume returningok=false). Report the two conditions distinctly:♻️ Proposed fix
- ok, err := s.Consume(n) - if err != nil || !ok { - errs <- err - return - } + ok, err := s.Consume(n) + if err != nil { + errs <- err + return + } + if !ok { + errs <- fmt.Errorf("consume returned ok=false for freshly issued nonce") + return + }(Requires adding
fmtto the imports.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/nonce_store_test.go` around lines 60 - 91, The concurrent test TestNonceStore_ConcurrentIssueAndConsume can enqueue a nil into errs when s.Consume returns ok==false but err==nil; change the worker goroutine so it reports the two failure modes distinctly: when err != nil send that error (as now), and when ok == false send a non-nil error describing the unexpected consume result (e.g., using fmt.Errorf with context including the nonce); reference the goroutine block that calls s.Issue() and s.Consume(), ensure the errs channel only receives non-nil errors, and add the fmt import to the test file.management/server/http/handler.go (2)
218-229: "Not configured" path will log a WARN on every non-Entra deployment.Both type assertions fail in the common case where this integration isn't used, producing
Entra device auth: ... skipping installwarnings on every server start. Since the design is explicitly data-gated (return 404 if no integration), a missing interface is the expected shape for most deployments, not a warning condition. Consider demoting toDebugf, or only warning when at least oneEntraDeviceAuthrow exists in the DB.♻️ Proposed tweak
dbProvider, ok := accountManager.GetStore().(entra_device_auth.DBProvider) if !ok { - log.WithContext(ctx).Warnf("Entra device auth: store does not expose *gorm.DB; skipping install") + log.WithContext(ctx).Debugf("Entra device auth: store does not expose *gorm.DB; skipping install") return } enrollerProvider, ok := accountManager.(interface { AsEntraDevicePeerEnroller() entra_device.PeerEnroller }) if !ok { - log.WithContext(ctx).Warnf("Entra device auth: account manager does not implement AsEntraDevicePeerEnroller; skipping install") + log.WithContext(ctx).Debugf("Entra device auth: account manager does not implement AsEntraDevicePeerEnroller; skipping install") return }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/http/handler.go` around lines 218 - 229, The current WARN logs in the non-configured path are noisy because the type assertions on accountManager.GetStore() to entra_device_auth.DBProvider and on accountManager to the AsEntraDevicePeerEnroller interface are expected to fail for most deployments; change these Warnf calls to debug-level logs (log.WithContext(ctx).Debugf) OR gate the Warnf behind a check for existing EntraDeviceAuth rows (query via the store/DBProvider for any EntraDeviceAuth entries) so you only warn when the integration is actually present but unusable; update the two locations referencing accountManager.GetStore(), entra_device_auth.DBProvider, accountManager.(AsEntraDevicePeerEnroller), and the EntraDeviceAuth existence check accordingly.
94-102: Remove the redundant/join/entra/*bypass entries.The
bypass.ShouldBypass()function is called exclusively byauthMiddleware.Handler(line 76), which is attached only to the/apisubrouter (line 155). Sinceentra_device_auth.Install()registers the Entra device-side endpoints onrootRouterrather than the/apisubrouter, requests to/join/entra/challengeand/join/entra/enrollnever pass throughauthMiddlewareand therefore never consult the bypass list. These two entries are unreachable and should be removed to avoid misleading future readers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/http/handler.go` around lines 94 - 102, Remove the two unreachable bypass registrations: delete the calls to bypass.AddBypassPath("/join/entra/challenge") and bypass.AddBypassPath("/join/entra/enroll") (and their accompanying comment) because bypass.ShouldBypass() is only used by authMiddleware.Handler which is mounted on the /api subrouter, while entra_device_auth.Install() registers those endpoints on rootRouter so they never consult the bypass list; update any nearby comment to avoid implying those paths need bypass entries.client/internal/profilemanager/config.go (1)
183-195: Flag for drift risk between the twoEntraEnrollStatedefinitions.The struct here is a verbatim copy of
client/internal/enroll/entradevice/state.go. The import-cycle justification is reasonable, but any future field addition/rename must be applied in both files or JSON round-trips through the profile will silently drop fields. A small shared types-only subpackage (e.g.client/internal/enroll/entradevice/enrollstate, no cross-deps) would collapse the duplication, or at minimum add a compile-time sanity check/test that asserts field-name parity.Not a blocker given the comment is explicit; leaving as an optional cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/profilemanager/config.go` around lines 183 - 195, The duplicate EntraEnrollState risks drifting from client/internal/enroll/entradevice's definition; add a compile-time/test-time parity check rather than changing package layout: create a unit test (e.g. profilemanager_parity_test) that imports client/internal/enroll/entradevice and compares reflect.TypeOf(profilemanager.EntraEnrollState) against reflect.TypeOf(entradevice.EntraEnrollState), iterating fields and asserting identical JSON tag names and field counts, failing the test if any mismatch is found so future edits must keep both structs in sync (or optionally replace the duplicate by moving the struct into a small shared types-only package and updating references to that type).client/cmd/entra_enroll.go (3)
245-261: Usepath/filepath.Dirinstead of the hand-rolledfilepathDir.The custom helper misses Windows volume/UNC handling and trailing-separator cleanup that
filepath.Diralready gets right. SinceconfigPathis a filesystem path here, the stdlib version is a drop-in replacement.♻️ Proposed fix
- if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil { + if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil { return nil, fmt.Errorf("mkdir %s: %w", configPath, err) }And delete the helper:
-func filepathDir(p string) string { - for i := len(p) - 1; i >= 0; i-- { - if p[i] == '\\' || p[i] == '/' { - return p[:i] - } - } - return "." -}Add
"path/filepath"to the import block.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 245 - 261, Replace the custom filepathDir helper with the standard library implementation: call filepath.Dir(configPath) where filepathDir(configPath) is used (e.g., the os.MkdirAll call before writing config) and remove the filepathDir function entirely; also add "path/filepath" to the import block so the code compiles. Ensure the rest of the function (returning cfg after util.WriteJson) remains unchanged.
134-176: Minor: the 60s enrolment timeout also boundsutil.WriteJson.
ctxis threaded intoutil.WriteJsonat line 174 afteren.Enrolhas already consumed some of the 60-second budget. If enrolment is slow, config persistence can fail under ctx cancellation even though enrolment succeeded — and we'd lose theEntraEnrollStatebut the server would consider the device enrolled. Consider a freshcontext.Background()(or a new short timeout) for the persistence step so a successful enrolment is never thrown away by the timer.- if err := util.WriteJson(ctx, configPath, cfg); err != nil { + if err := util.WriteJson(context.Background(), configPath, cfg); err != nil { return fmt.Errorf("persist profile config: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 134 - 176, The current ctx created by context.WithTimeout(…, 60*time.Second) is passed to en.Enrol and later to util.WriteJson, risking WriteJson failing if the enrolment consumed the timeout; create a fresh context for persistence (e.g. context.Background() or a short separate timeout via context.WithTimeout) and use that new context when calling util.WriteJson (and any persistence-related work) so successful en.Enrol results (profilemanager.EntraEnrollState) are always saved even if the original enrolment timeout expires.
147-155: Useerrors.Asto survive error wrapping.The type assertion silently falls through if the enroller ever wraps the server error (e.g.
fmt.Errorf("enrol: %w", err)), so the user stops seeing the stablecode/ HTTP status that is the whole point of this branch. Defensive against future refactors of the enroller.♻️ Proposed fix
state, err := en.Enrol(ctx) if err != nil { - // Surface structured server errors with their stable code. - if structured, ok := err.(*entradevice.Error); ok { + // Surface structured server errors with their stable code. + var structured *entradevice.Error + if errors.As(err, &structured) { cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n %s\n", structured.Code, structured.HTTPStatus, structured.Message) return fmt.Errorf("enrolment failed: %s", structured.Code) } return fmt.Errorf("enrolment failed: %w", err) }(Also add
"errors"to the import block.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 147 - 155, The error handling block currently uses a concrete type assertion against *entradevice.Error which will fail if the enroller wraps the server error; change it to use errors.As to unwrap and detect entradevice.Error (ensure "errors" is added to imports), then log the structured fields via cmd.PrintErrf (as done now) and return the stable code as before; specifically replace the type assertion on err with an errors.As check that assigns to a variable of type *entradevice.Error to preserve behavior when errors are wrapped.management/server/integrations/entra_device/store_test.go (1)
65-70: Guard the slice length before indexing.If a regression causes
ListEntraDeviceMappingsto return a different count, this test panics with an opaque "index out of range" rather than failing with a useful assertion.♻️ Proposed fix
all, err := s.ListEntraDeviceMappings(ctx, "acct-1") require.NoError(t, err) + require.Len(t, all, 2) assert.ElementsMatch(t, []string{m1.ID, m2.ID}, []string{all[0].ID, all[1].ID}, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/store_test.go` around lines 65 - 70, The test indexes into the returned slice "all" without checking its length, which can cause an index-out-of-range panic; update the test around the ListEntraDeviceMappings call to first assert/require the expected length (e.g., require.Len/ assert.Len on "all" for 2) or otherwise guard that all has at least two elements, then compare the IDs (m1.ID, m2.ID) against the collected IDs from "all" (or use assert.ElementsMatch on the extracted IDs) so the failure is a clear test assertion rather than a panic.client/internal/enroll/entradevice/enroller_test.go (1)
11-11: Drop the strayfmtimport andvar _ = fmt.Sprintfsink.
fmtisn't referenced anywhere else in this test; the comment about "Go versions without error.As shortcut" is inaccurate (errors.Ashas no relation tofmt). Just remove the import and the sink.♻️ Proposed cleanup
- "fmt" "math/big"// Compile-time assertions. var _ CertProvider = (*PFXProvider)(nil) - -// Silence unused-import warning on Go versions without error.As shortcut. -var _ = fmt.SprintfAlso applies to: 250-250
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/enroller_test.go` at line 11, Remove the unused fmt import and the dummy sink var _ = fmt.Sprintf from enroller_test.go (and the duplicate occurrence at the other location), delete or adjust the misleading comment about "Go versions without error.As shortcut" since errors.As is unrelated to fmt, and run `go vet`/`go test` to ensure there are no other unused-import artifacts; specifically update/remove the import line referencing "fmt" and the variable named using fmt.Sprintf so the file no longer imports or references fmt.management/server/integrations/entra_device/nonce_store.go (1)
74-98: Consider replacing the O(n) constant-time scan with a direct map lookup.Given nonces are 32 random bytes (~256 bits of entropy) URL-safe base64 encoded, timing attacks to recover a nonce are not a practical concern. The current loop is O(n) per
Consume, and thebreakon match at Line 86 partially defeats the constant-time property anyway (iteration order of Go maps is randomized, so the early exit leaks little, but the "constant-time across entries" claim in the comment is overstated).A plain map lookup is O(1), simpler to reason about, and secure for this input distribution.
♻️ Suggested simplification
// Consume implements NonceStore using a constant-time equality check. func (s *InMemoryNonceStore) Consume(nonce string) (bool, error) { s.mu.Lock() defer s.mu.Unlock() - - now := time.Now().UTC() - // Scan with constant-time comparison to avoid leaking which nonces exist - // via timing. The overhead is negligible given the store's small size. - var found string - for key := range s.entries { - if subtle.ConstantTimeCompare([]byte(key), []byte(nonce)) == 1 { - found = key - break - } - } - if found == "" { - return false, nil - } - e := s.entries[found] - delete(s.entries, found) - if now.After(e.expiresAt) { - return false, nil - } - return true, nil + e, ok := s.entries[nonce] + if !ok { + return false, nil + } + delete(s.entries, nonce) + if time.Now().UTC().After(e.expiresAt) { + return false, nil + } + return true, nil }The
crypto/subtleimport would also become unused after this change.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/nonce_store.go` around lines 74 - 98, Replace the O(n) constant-time scan in InMemoryNonceStore.Consume with a direct map lookup: lock s.mu, check s.entries[nonce] via normal map access to get the entry and presence flag, delete the key if present, then check expiresAt (e.expiresAt) and return accordingly; remove the crypto/subtle usage/import since the constant-time loop is no longer needed. Ensure behavior (lock/unlock via s.mu, deletion of the found entry, and returning false for missing or expired nonces) remains unchanged.management/server/integrations/entra_device/graph_client.go (1)
78-123:bearer()holds the mutex across the HTTP token call.
c.mu.Lock()at Line 79 is held through the fullHTTPClient.Do(up to 15s default timeout). All concurrent enrollments serialize behind a single token fetch on startup and on every renewal. For the expected load this is probably fine, but consider a double-checked-lock pattern (check under lock → release → fetch → re-acquire to store) if enrollment concurrency is ever a concern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/graph_client.go` around lines 78 - 123, The bearer() method currently holds c.mu for the entire HTTP call, blocking concurrent callers; change it to a double-checked-lock: inside bearer() check token/tokenExp under c.mu and if valid return, otherwise release the lock, perform the HTTP request using c.HTTPClient.Do(ctx, ...) to fetch a new token, then re-acquire c.mu and check token/tokenExp again (another goroutine may have refreshed it) before writing c.token and c.tokenExp; reference the bearer function, c.mu, c.token, c.tokenExp, and c.HTTPClient.Do when making the changes and ensure errors and response parsing behave the same.management/server/http/handlers/entra_device_auth/handler.go (1)
111-132:applyDTOToAuthmutates the stored pointer before validation.When
existing != nil,authis the exact pointer returned by the store.applyDTOToAuthwrites all fields (TenantID/ClientID/FallbackAutoGroups/…) before thetenant_id && client_idcheck at Line 116. If validation fails, the mutations aren't persisted via SQLStore (which re-reads), but theMemoryStorereturns the live map entry — subsequent reads see a half-applied DTO even thoughSaveEntraDeviceAuthwas never called. Apply changes to a local copy and only commit when all validation passes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/http/handlers/entra_device_auth/handler.go` around lines 111 - 132, The handler currently calls applyDTOToAuth on the live pointer returned from the store (auth == existing), mutating the stored object before validation; change the logic to make a shallow copy of the existing EntraDeviceAuth (or construct a new types.NewEntraDeviceAuth(accountID) when nil), apply applyDTOToAuth to that copy, run the tenant_id/client_id validation, and only if validation passes assign/replace the original pointer or call SaveEntraDeviceAuth with the modified copy; ensure this prevents in-place writes from affecting MemoryStore entries when SaveEntraDeviceAuth is not invoked and keep references to functions/types applyDTOToAuth, existing, auth, SaveEntraDeviceAuth, and types.NewEntraDeviceAuth to locate the relevant code.management/server/integrations/entra_device/resolution.go (1)
177-181: Remove the no-op "defensive fallback" or make it meaningful.
if i == len(candidates)-1 && len(out.AutoGroups) == 0 { out.AutoGroups = []string{} }only changes a nil slice to an empty slice on the last iteration, which is almost never observable (and isn't inside union semantics). If you actually want stable JSON rendering, initializeout.AutoGroups = []string{}once up-front instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/resolution.go` around lines 177 - 181, The fallback block that sets out.AutoGroups = []string{} only on the last loop iteration is a no-op; either remove that conditional entirely or initialize out.AutoGroups once before iterating candidates so it's reliably an empty slice rather than nil; locate the loop using candidates and the index i in resolution.go and remove the if i == len(candidates)-1 && len(out.AutoGroups) == 0 { ... } block or add out.AutoGroups = []string{} at the point where out is constructed prior to the loop.management/server/entra_device_enroll.go (1)
176-180: Drop the unusedvalidatedPeerbinding.
_ = validatedPeer // peer object returned to caller is not needed by Manageris just a discarded return; use_in the destructuring directly.♻️ Proposed refactor
- validatedPeer, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer) + _, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer) if err != nil { return nil, fmt.Errorf("failed to build network map for entra peer: %w", err) } - _ = validatedPeer // peer object returned to caller is not needed by Manager🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/entra_device_enroll.go` around lines 176 - 180, The destructuring assigns validatedPeer but then discards it; change the call to use the blank identifier directly (e.g., _, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer)) and remove the subsequent discard line `_ = validatedPeer` to eliminate the unused binding in entra_device_enroll.go (reference GetValidatedPeerWithMap and the validatedPeer variable).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d94437e4-028c-4571-8bd6-08beae90f2fa
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (48)
client/cmd/entra_enroll.goclient/cmd/root.goclient/internal/enroll/entradevice/enroller.goclient/internal/enroll/entradevice/enroller_test.goclient/internal/enroll/entradevice/provider.goclient/internal/enroll/entradevice/state.goclient/internal/profilemanager/config.godocs/ENTRA_DEVICE_AUTH.mdgo.modidp/dex/config.goidp/dex/provider.goidp/dex/sqlite_cgo.goidp/dex/sqlite_nocgo.gomanagement/Dockerfile.entra-testmanagement/server/entra_device_enroll.gomanagement/server/http/handler.gomanagement/server/http/handlers/entra_device_auth/handler.gomanagement/server/http/handlers/entra_device_auth/wiring.gomanagement/server/http/handlers/entra_join/handler.gomanagement/server/http/handlers/entra_join/handler_test.gomanagement/server/integrations/entra_device/README.mdmanagement/server/integrations/entra_device/activity.gomanagement/server/integrations/entra_device/asn1.gomanagement/server/integrations/entra_device/cert_validator.gomanagement/server/integrations/entra_device/cert_validator_test.gomanagement/server/integrations/entra_device/errors.gomanagement/server/integrations/entra_device/graph_client.gomanagement/server/integrations/entra_device/manager.gomanagement/server/integrations/entra_device/manager_test.gomanagement/server/integrations/entra_device/nonce_store.gomanagement/server/integrations/entra_device/nonce_store_test.gomanagement/server/integrations/entra_device/resolution.gomanagement/server/integrations/entra_device/resolution_test.gomanagement/server/integrations/entra_device/sql_store.gomanagement/server/integrations/entra_device/store.gomanagement/server/integrations/entra_device/store_test.gomanagement/server/integrations/entra_device/types.gomanagement/server/permissions/modules/module.gomanagement/server/types/entra_device_auth.gotools/entra-test/.gitignoretools/entra-test/TESTING.mdtools/entra-test/config/management.jsontools/entra-test/docker-compose.ymltools/entra-test/enroll-tester/demo.gotools/entra-test/enroll-tester/main.gotools/entra-test/enroll-tester/tls.gotools/entra-test/make-pfx/main.gotools/entra-test/seed-account/main.go
| type SQLStore struct { | ||
| DB *gorm.DB | ||
|
|
||
| // BootstrapTTL controls how long a bootstrap token remains valid. | ||
| BootstrapTTL time.Duration | ||
|
|
||
| mu sync.Mutex | ||
| tokens map[string]bootstrapEntry | ||
| tokenOps int | ||
| } |
There was a problem hiding this comment.
In-memory bootstrap tokens do not survive restart or scale to multiple instances.
The tokens map is process-local. In a multi-instance (HA/load-balanced) management deployment, a client's POST /join/entra/enroll might land on node A while the subsequent gRPC Login that echoes the bootstrap token lands on node B — node B has no knowledge of the token and will reject it. Similarly, a management restart between enroll and Login will invalidate all pending enrollments.
Given enrollment is a public-facing feature, either:
- Persist bootstrap tokens in the DB (with TTL column + periodic GC), which also fits naturally with the existing
*gorm.DB, or - Document prominently that this integration is currently single-instance only, and gate multi-instance deployments accordingly.
A hashed form (e.g., SHA-256 of the random token) is preferable in either store so that a DB/memory dump does not leak still-valid tokens.
Also applies to: 133-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/integrations/entra_device/sql_store.go` around lines 18 -
27, The in-memory tokens map on SQLStore (tokens, bootstrapEntry, tokenOps) is
not resilient across restarts or multiple instances; persist bootstrap tokens in
the existing *gorm.DB instead: add a BootstrapToken model/table with fields for
id/hash (store SHA-256 of the random token), plain_token should never be stored,
created_at and expires_at (TTL) and an index on expires_at, run an automatic
migration, and replace reads/writes that reference SQLStore.tokens (and methods
handling enroll/Login/validate like the bootstrap handling code between lines
~133-181) to use DB lookups against the hashed token; also implement a periodic
GC job (or DB-native TTL cleanup) to remove expired rows and ensure token
comparison uses constant-time/hashing semantics so leaked DB dumps don’t reveal
live tokens.
There was a problem hiding this comment.
Acknowledged as a known production gap for HA / multi-instance deployments. The SQLStore in-memory token map is explicitly documented in the package README and docs/ENTRA_DEVICE_AUTH.md (Known production gaps) as a follow-up: persist hashed tokens in the main DB with expires_at + periodic GC. Not done in this round because it is a schema/migration change that deserves its own PR; wanted to keep the review-round commit scoped to correctness fixes.
There was a problem hiding this comment.
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.
| func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) { | ||
| s.mu.Lock() | ||
| defer s.mu.Unlock() | ||
| entry, ok := s.tokens[peerID] | ||
| if !ok { | ||
| return false, nil | ||
| } | ||
| delete(s.tokens, peerID) | ||
| if entry.token != token { | ||
| return false, nil | ||
| } | ||
| if time.Now().UTC().After(entry.expiresAt) { | ||
| return false, nil | ||
| } | ||
| return true, nil | ||
| } |
There was a problem hiding this comment.
Bootstrap token can be burned by anyone who knows/guesses the peerID.
At Line 165 the cached entry is deleted unconditionally, before the token or expiry is verified. Any caller who can invoke ConsumeBootstrapToken with a matching peerID and any wrong token will evict the legitimate entry, causing the real client's subsequent gRPC Login to fail. Since gRPC Login is the public path that feeds this check, this is a DoS surface against in-flight enrollments.
Delete only after confirming both match and non-expiry (or only on success):
🔒️ Proposed fix
func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) {
s.mu.Lock()
defer s.mu.Unlock()
entry, ok := s.tokens[peerID]
if !ok {
return false, nil
}
- delete(s.tokens, peerID)
if entry.token != token {
return false, nil
}
+ // Token matched — consume it exactly once regardless of expiry.
+ delete(s.tokens, peerID)
if time.Now().UTC().After(entry.expiresAt) {
return false, nil
}
return true, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) { | |
| s.mu.Lock() | |
| defer s.mu.Unlock() | |
| entry, ok := s.tokens[peerID] | |
| if !ok { | |
| return false, nil | |
| } | |
| delete(s.tokens, peerID) | |
| if entry.token != token { | |
| return false, nil | |
| } | |
| if time.Now().UTC().After(entry.expiresAt) { | |
| return false, nil | |
| } | |
| return true, nil | |
| } | |
| func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) { | |
| s.mu.Lock() | |
| defer s.mu.Unlock() | |
| entry, ok := s.tokens[peerID] | |
| if !ok { | |
| return false, nil | |
| } | |
| if entry.token != token { | |
| return false, nil | |
| } | |
| // Token matched — consume it exactly once regardless of expiry. | |
| delete(s.tokens, peerID) | |
| if time.Now().UTC().After(entry.expiresAt) { | |
| return false, nil | |
| } | |
| return true, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/integrations/entra_device/sql_store.go` around lines 158 -
173, The ConsumeBootstrapToken method currently deletes the cached entry
unconditionally before validating the token and expiry, allowing anyone who
knows a peerID to burn the token; change the logic in ConsumeBootstrapToken so
you first look up entry := s.tokens[peerID], then verify entry.token == token
and that time.Now().UTC().Before(entry.expiresAt), and only after both checks
pass call delete(s.tokens, peerID) and return true; keep the existing mutex
(s.mu.Lock()/defer s.mu.Unlock()) and only remove the map entry on successful
validation.
There was a problem hiding this comment.
Fixed in 07bf8cf: SQLStore.ConsumeBootstrapToken now validates prior existence + token match BEFORE calling delete, closing the DoS surface where any caller with a guessed peerID could evict the real entry by supplying a wrong token.
There was a problem hiding this comment.
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.
| type EntraDeviceAuth struct { | ||
| ID string `gorm:"primaryKey"` | ||
| AccountID string `gorm:"uniqueIndex"` | ||
|
|
||
| // Entra configuration | ||
| TenantID string | ||
| ClientID string | ||
| ClientSecret string `gorm:"column:client_secret"` // encrypt at rest via existing secret storage | ||
| Issuer string | ||
| Audience string | ||
|
|
||
| // Behaviour flags | ||
| Enabled bool | ||
| RequireIntuneCompliant bool | ||
| AllowTenantOnlyFallback bool | ||
| FallbackAutoGroups []string `gorm:"serializer:json"` | ||
| MappingResolution MappingResolution | ||
|
|
||
| // Optional continuous revalidation interval. 0 = join-only validation. | ||
| RevalidationInterval time.Duration | ||
|
|
||
| CreatedAt time.Time | ||
| UpdatedAt time.Time `gorm:"autoUpdateTime:false"` | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for an existing pattern for encrypting secret columns at rest in the
# management DB (gorm serializer, wrapper types, or a secret-store helper).
rg -nP --type=go -C3 '(?i)(encrypt|seal|secret).{0,40}(serializer|Scanner|Valuer|gorm)' management/
rg -nP --type=go -C2 'gorm:"[^"]*serializer:' management/
rg -nP --type=go -C3 'ClientSecret\s+string' management/server/types/ management/server/account.go management/server/idp/Repository: netbirdio/netbird
Length of output: 23441
🏁 Script executed:
# Check EventMeta() implementation to ensure ClientSecret is not exposed
rg -n 'func.*EventMeta' management/server/types/entra_device_auth.go -A 20
# Look for documentation on secret management
find management/ -type f \( -name "*.md" -o -name "*.go" \) | xargs rg -l "secret.*management|vault|KMS" | head -5
# Check how other sensitive fields (KeySecret, etc.) are handled in migrations or at-rest
rg -n 'KeySecret|ClientSecret' management/server/migration/ -B 2 -A 2Repository: netbirdio/netbird
Length of output: 2881
ClientSecret is stored in plaintext despite the comment claiming encrypt-at-rest.
The field is declared as a plain string with only gorm:"column:client_secret" for column naming; no encryption serializer, wrapper type, or secret-store integration is applied. The comment "encrypt at rest via existing secret storage" does not reflect implemented functionality. An Entra app's client secret can mint app-only Graph tokens, making DB leakage a material security risk.
Before release, either:
- Wrap the field in a type using gorm's
Serializermechanism (matching the project's pattern for sensitive fields), or - Store only a reference and keep the secret in a dedicated secret store (KMS/Vault/environment).
EventMeta() correctly excludes ClientSecret from event payloads, so no regression risk there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/types/entra_device_auth.go` around lines 40 - 63, The
ClientSecret field on EntraDeviceAuth is currently a plaintext string despite
the comment; change its storage to an encrypted serializer or secret reference:
either replace ClientSecret string with a wrapped type (e.g., EncryptedString or
SecretRef) and add the gorm tag `serializer:<YourSerializer>` to the
EntraDeviceAuth struct so GORM uses the project’s existing encryption
serializer, or alter the field to store only a secret identifier (e.g.,
ClientSecretRef string) and integrate retrieval from the project's secret store
when needed; update any usages of EntraDeviceAuth.ClientSecret (creation, load,
persist) to use the new type or secret lookup and ensure EventMeta() continues
to exclude the secret.
There was a problem hiding this comment.
Acknowledged as a known production gap. client_secret is called out in both the package README and docs/ENTRA_DEVICE_AUTH.md under Known production gaps as a must-do before the integration is exposed to a real tenant; rotating the column to the existing encrypted-column/serialiser pattern is deliberately deferred to a follow-up PR so the review round stays focused on correctness.
There was a problem hiding this comment.
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.
19db31a to
9f04a0c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
management/server/integrations/entra_device/manager.go (2)
60-76:Clockfield is set but never consumed.
Manager.Clockis initialized inNewManagerbut no method in this file reads it — the onlytime.Now()usage in the file is literally inside this initializer. Either plumbm.Clock()into places that need a timestamp (e.g. wherever expiry or issuance time gets stamped) or drop the field to avoid giving the false impression that time is injectable for tests.♻️ Drop the unused field (if no caller needs it yet)
type Manager struct { Store Store NonceStore NonceStore Cert *CertValidator NewGraph func(tenantID, clientID, clientSecret string) GraphClient PeerEnroller PeerEnroller - - // Clock overridable for tests. - Clock func() time.Time } // NewManager constructs a manager with sensible defaults. Callers are expected // to set PeerEnroller before handling enrolments. func NewManager(store Store) *Manager { return &Manager{ Store: store, NonceStore: NewInMemoryNonceStore(0), Cert: NewCertValidator(nil, nil), NewGraph: func(tenantID, clientID, clientSecret string) GraphClient { return NewHTTPGraphClient(tenantID, clientID, clientSecret) }, - Clock: func() time.Time { return time.Now().UTC() }, } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 60 - 76, Remove the unused Clock injection to avoid misleading testability: delete the Clock field from the Manager struct and remove the Clock initializer in NewManager (the func() time.Time { return time.Now().UTC() } entry), and update any callers/constructors that set Manager.Clock so they no longer reference it; if a timestamp is later needed, reintroduce and wire m.Clock() into the specific methods that stamp issuance/expiry (refer to Manager and NewManager for where to edit).
195-230: Graph flow reads cleanly.Nice separation: existence → enabled → transitive groups → optional compliance, with identity mutated in-place so downstream stages (mapping resolution, peer enrol, activity emission) have authoritative Entra state rather than trusting the client echo. One minor nit you may want to tidy later:
CodeGroupLookupFailedis used as the bucket for device lookup and compliance lookup failures too — a more granular code set (CodeDeviceLookupFailed/CodeComplianceLookupFailed) would make operator dashboards easier to slice, but it's not worth blocking on.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 195 - 230, Replace the generic CodeGroupLookupFailed usage in verifyWithGraph with more specific error codes: use a new CodeDeviceLookupFailed for the graph.Device(...) error path (the NewError call that wraps "graph device lookup failed") and use a new CodeComplianceLookupFailed for the graph.IsCompliant(...) error path (the NewError call that wraps "graph Intune compliance lookup failed"); update the error code constants (and any switch/metrics/dashboards that depend on them) so NewError can be called with CodeDeviceLookupFailed and CodeComplianceLookupFailed, leaving the transitiveMemberOf path as CodeGroupLookupFailed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 159-179: The code trims the nonce only for NonceStore.Consume but
passes the raw encoded string to decodeNonceBytes, causing
valid-but-whitespace-padded nonces to be consumed then fail decoding; update
Manager.consumeNonce to normalize (e.g., call strings.TrimSpace) the encoded
nonce once into a local variable and use that normalized value for both
m.NonceStore.Consume(...) and decodeNonceBytes(...), ensuring you still return
the same NewError codes from consumeNonce on failures.
- Around line 119-136: The current flow creates the peer via m.enrollPeer(...)
and only afterwards calls m.issueBootstrapToken(...), which can fail and leave
an orphaned peer; change the flow to generate the bootstrap token before
creating the peer and pass it into the transactional peer-creation call (e.g.,
add a token field to EnrollPeerInput and have
enrollPeer/DefaultAccountManager.EnrollEntraDevicePeer persist the token
together with the peer and via StoreBootstrapToken inside the same DB
transaction), so peer creation + token persistence is atomic; if you cannot make
that change immediately, implement best-effort compensation inside this
function: on issueBootstrapToken error attempt to delete the created peer (call
the peer delete method) and log/report any failure to delete before returning
the error.
---
Nitpick comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 60-76: Remove the unused Clock injection to avoid misleading
testability: delete the Clock field from the Manager struct and remove the Clock
initializer in NewManager (the func() time.Time { return time.Now().UTC() }
entry), and update any callers/constructors that set Manager.Clock so they no
longer reference it; if a timestamp is later needed, reintroduce and wire
m.Clock() into the specific methods that stamp issuance/expiry (refer to Manager
and NewManager for where to edit).
- Around line 195-230: Replace the generic CodeGroupLookupFailed usage in
verifyWithGraph with more specific error codes: use a new CodeDeviceLookupFailed
for the graph.Device(...) error path (the NewError call that wraps "graph device
lookup failed") and use a new CodeComplianceLookupFailed for the
graph.IsCompliant(...) error path (the NewError call that wraps "graph Intune
compliance lookup failed"); update the error code constants (and any
switch/metrics/dashboards that depend on them) so NewError can be called with
CodeDeviceLookupFailed and CodeComplianceLookupFailed, leaving the
transitiveMemberOf path as CodeGroupLookupFailed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e0505143-307d-4315-8b15-d647554f47de
📒 Files selected for processing (4)
management/server/integrations/entra_device/manager.gomanagement/server/integrations/entra_device/resolution.gotools/entra-test/docker-compose.ymltools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (1)
- tools/entra-test/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/entra-test/seed-account/main.go
| func (m *Manager) consumeNonce(encoded string) ([]byte, error) { | ||
| ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded)) | ||
| if err != nil { | ||
| return nil, NewError(CodeInternal, "nonce store error", err) | ||
| } | ||
| if !ok { | ||
| return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) | ||
| } | ||
| return decodeNonceBytes(encoded) | ||
| } | ||
|
|
||
| // decodeNonceBytes tolerates both RawURL and Std base64 alphabets. | ||
| func decodeNonceBytes(encoded string) ([]byte, error) { | ||
| if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil { | ||
| return b, nil | ||
| } | ||
| if b, err := base64.StdEncoding.DecodeString(encoded); err == nil { | ||
| return b, nil | ||
| } | ||
| return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil) | ||
| } |
There was a problem hiding this comment.
Normalize the nonce once, before both Consume and decode.
Consume sees a trimmed string but decodeNonceBytes gets the raw encoded. If a client sends a nonce with trailing whitespace/newline (curl-from-file is a common offender), Consume succeeds against the stored trimmed value, then decodeNonceBytes fails base64 decode and returns CodeInvalidNonce. The nonce is now burned and the client has no way to retry without a fresh /challenge.
🛠️ Normalize once
func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
- ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded))
+ encoded = strings.TrimSpace(encoded)
+ ok, err := m.NonceStore.Consume(encoded)
if err != nil {
return nil, NewError(CodeInternal, "nonce store error", err)
}
if !ok {
return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
}
return decodeNonceBytes(encoded)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (m *Manager) consumeNonce(encoded string) ([]byte, error) { | |
| ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded)) | |
| if err != nil { | |
| return nil, NewError(CodeInternal, "nonce store error", err) | |
| } | |
| if !ok { | |
| return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) | |
| } | |
| return decodeNonceBytes(encoded) | |
| } | |
| // decodeNonceBytes tolerates both RawURL and Std base64 alphabets. | |
| func decodeNonceBytes(encoded string) ([]byte, error) { | |
| if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil { | |
| return b, nil | |
| } | |
| if b, err := base64.StdEncoding.DecodeString(encoded); err == nil { | |
| return b, nil | |
| } | |
| return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil) | |
| } | |
| func (m *Manager) consumeNonce(encoded string) ([]byte, error) { | |
| encoded = strings.TrimSpace(encoded) | |
| ok, err := m.NonceStore.Consume(encoded) | |
| if err != nil { | |
| return nil, NewError(CodeInternal, "nonce store error", err) | |
| } | |
| if !ok { | |
| return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) | |
| } | |
| return decodeNonceBytes(encoded) | |
| } | |
| // decodeNonceBytes tolerates both RawURL and Std base64 alphabets. | |
| func decodeNonceBytes(encoded string) ([]byte, error) { | |
| if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil { | |
| return b, nil | |
| } | |
| if b, err := base64.StdEncoding.DecodeString(encoded); err == nil { | |
| return b, nil | |
| } | |
| return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/integrations/entra_device/manager.go` around lines 159 -
179, The code trims the nonce only for NonceStore.Consume but passes the raw
encoded string to decodeNonceBytes, causing valid-but-whitespace-padded nonces
to be consumed then fail decoding; update Manager.consumeNonce to normalize
(e.g., call strings.TrimSpace) the encoded nonce once into a local variable and
use that normalized value for both m.NonceStore.Consume(...) and
decodeNonceBytes(...), ensuring you still return the same NewError codes from
consumeNonce on failures.
There was a problem hiding this comment.
Fixed in 07bf8cf: consumeNonce trims the input once and passes the trimmed value to both NonceStore.Consume and decodeNonceBytes.
There was a problem hiding this comment.
Rate Limit Exceeded
@thvevirtue have exceeded the limit for the number of chat messages per hour. Please wait 59 minutes and 24 seconds before sending another message.
There was a problem hiding this comment.
Actionable comments posted: 16
♻️ Duplicate comments (4)
management/server/integrations/entra_device/store.go (1)
31-138:⚠️ Potential issue | 🔴 CriticalMemoryStore doc still claims goroutine-safety while all receivers access plain maps without synchronization.
Lines 31–33 state "It is goroutine-safe" but the struct has no mutex and every receiver (
GetEntraDeviceAuth,SaveEntraDeviceAuth,DeleteEntraDeviceAuth, the mapping CRUD, and the token ops) touchesm.auths/m.byTenant/m.mappings/m.tokensunguarded. Concurrent access from/join/entra/enrollplus an admin mutation on the same account will race and can panic at runtime.
ConsumeBootstrapToken(lines 131–138) additionally violates its "single-use" contract: the read-compare (got, ok := m.tokens[peerID]) and thedelete(m.tokens, peerID)are separate steps, so two concurrent consumers can both observe the match and both returntruebefore either deletes.Either add a
sync.RWMutextoMemoryStoreand wrap reads withRLock/ writes (including the full read-compare-delete inConsumeBootstrapToken) withLock, or downgrade the doc comment to "not goroutine-safe; callers must serialize access" and rely onSQLStorefor production.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/store.go` around lines 31 - 138, The MemoryStore claims goroutine-safety but uses plain maps without synchronization; add a sync.RWMutex field (e.g., mu) to MemoryStore and protect all accessors: use mu.RLock()/RUnlock() in read-only methods (GetEntraDeviceAuth, GetEntraDeviceAuthByTenant, ListEntraDeviceMappings, GetEntraDeviceMapping) and mu.Lock()/Unlock() in mutating methods (SaveEntraDeviceAuth, DeleteEntraDeviceAuth, SaveEntraDeviceMapping, DeleteEntraDeviceMapping, StoreBootstrapToken) and perform the read-compare-delete in ConsumeBootstrapToken entirely under mu.Lock() to ensure single-use semantics; update the constructor NewMemoryStore to initialize the mutex (no other behavioral changes).management/server/types/entra_device_auth.go (1)
40-63:⚠️ Potential issue | 🟠 Major
ClientSecretis persisted in plaintext despite the "encrypt at rest" comment.The field is a plain
stringwith onlygorm:"column:client_secret"for column naming — noserializer:-based encrypt-at-rest wrapper, no secret-store indirection. An Entra app's client secret can mint app-only Graph tokens against the tenant, so a DB leak (backup, replica snapshot, log line) is a direct privilege-escalation vector.Before release, either wrap the column in an encrypted serializer matching the project's pattern for other sensitive fields, or store a reference and resolve it from a dedicated secret store (KMS/Vault/env).
EventMeta()already excludes the secret from activity payloads, so that side is fine.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/types/entra_device_auth.go` around lines 40 - 63, The ClientSecret field on the EntraDeviceAuth struct is stored as a plain string; change it to either use the project's encrypted serializer or to store a secret-reference resolved from the secret store: replace the ClientSecret string field (in EntraDeviceAuth) with the project's encrypted/serialized secret wrapper (matching other sensitive fields' serializer tag) or with a reference type (e.g., SecretRef) and update persistence/loader logic to resolve it from KMS/Vault/env; ensure GORM tags reflect the serializer or reference column and adjust any create/update/load code paths that read/write EntraDeviceAuth.ClientSecret accordingly.management/server/integrations/entra_device/manager.go (2)
119-136:⚠️ Potential issue | 🟠 MajorOrphan-peer risk: bootstrap-token failure after peer creation.
If
issueBootstrapTokenfails (DB blip onStoreBootstrapToken, or the vanishingly rarerand.Readerror) the peer has already been created with its WG pubkey. The client surfaces the error, and any retry collides with the duplicate-pubkey guard downstream, leaving the device stuck until an admin manually deletes the orphan row.Prefer generating the token up-front and persisting it inside the same transaction as peer creation (extend
EnrollPeerInput/EnrollEntraDevicePeer). If atomic persistence is too invasive short-term, at least best-effort compensate by deleting the just-created peer onissueBootstrapTokenerror before returning.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 119 - 136, The current flow calls enrollPeer(...) first and then issueBootstrapToken(...), which can leave an orphan peer if token issuance fails; you should either (preferred) generate the token before creating the peer and extend EnrollPeerInput / EnrollEntraDevicePeer to accept and persist the bootstrap token within the same DB transaction inside enrollPeer, or (minimal change) implement best-effort compensation: on error from issueBootstrapToken(ctx, result.PeerID) call a peer-deletion routine (e.g., deletePeer/deleteEntraDevicePeer by PeerID) to remove the just-created peer, log any deletion error, and then return the original token issuance error; update the call sites and types referenced (enrollPeer, issueBootstrapToken, EnrollPeerInput/EnrollEntraDevicePeer, and any deletePeer method) accordingly so retries won’t hit duplicate-pubkey collisions.
159-168:⚠️ Potential issue | 🟡 MinorNormalize the nonce once, then use the same value for both
Consumeand decode.
Consumeis called withstrings.TrimSpace(encoded)butdecodeNonceBytesgets the rawencodedstring. A client that sends the nonce with trailing whitespace/newline will succeed on the store (which was presumably issued with the trimmed value match) and then fail base64 decode — burning the nonce and forcing a fresh/challengeround-trip.🛠️ Normalize once
func (m *Manager) consumeNonce(encoded string) ([]byte, error) { - ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded)) + encoded = strings.TrimSpace(encoded) + ok, err := m.NonceStore.Consume(encoded) if err != nil { return nil, NewError(CodeInternal, "nonce store error", err) } if !ok { return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil) } return decodeNonceBytes(encoded) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 159 - 168, In consumeNonce, normalize the incoming nonce once (e.g., n := strings.TrimSpace(encoded)) and use that normalized value for both m.NonceStore.Consume and decodeNonceBytes so you don't consume a trimmed nonce but attempt to decode the raw string; update calls to m.NonceStore.Consume(n) and return decodeNonceBytes(n) while keeping existing error handling in NewError unchanged.
🧹 Nitpick comments (13)
client/internal/enroll/entradevice/state.go (1)
11-36: Duplicated struct withprofilemanager.EntraEnrollState— consider a shared location or conversion helper.This struct is intentionally duplicated in
client/internal/profilemanager/config.goto avoid an import cycle (documented there). Without a conversion helper or shared schema test, future changes to one copy (e.g. adding a field) can silently drift out of the other, breaking round-tripping between the enroller result and the persisted profile. Consider moving the struct to a small leaf package (e.g.client/internal/enroll/entradevice/stateor a newshared/entradevice/state) that both packages can import, or add a unit test that marshals one and unmarshals into the other to guard against drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/state.go` around lines 11 - 36, The EntraEnrollState struct is duplicated between EntraEnrollState in this file and profilemanager.EntraEnrollState; extract the canonical type into a small shared package (e.g., enroll/entradevice/state) and update both usages to import that package (replace the local struct declarations with the shared type), or alternatively add a unit test that marshals the enroller's EntraEnrollState and unmarshals into profilemanager.EntraEnrollState (and vice versa) to detect drift; locate occurrences by the type name EntraEnrollState and the profilemanager.EntraEnrollState reference and update imports/usages accordingly.tools/entra-test/seed-account/main.go (1)
110-120: Local-only default DSN with baked-in password.Acknowledged this is a dev fixture (hence the
NOSONARsuppression and comment). Consider also gating the default credentials behind an explicit-i-know-this-is-insecureflag, or pulling them from the compose file's.env— so a contributor can't accidentally runseed-accountagainst a misconfigured prod database with the weak fallback taking effect. Non-blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/seed-account/main.go` around lines 110 - 120, The defaultDSN function currently returns hard-coded local dev credentials (using envOrDefault) which can silently be used against a non-dev DB; change it so the insecure fallback is only used when an explicit opt-in is provided (e.g., add and check a boolean flag like allowInsecureDefaults or a CLI flag `-i-know-this-is-insecure` before returning the baked-in user/pass/db), or alternatively load the credentials from the docker-compose/.env file instead of hard-coding; update defaultDSN to consult that flag (or the parsed .env) and fail fast with a clear error if neither a DSN nor explicit opt-in is present.client/internal/enroll/entradevice/enroller_test.go (3)
249-250: Drop thefmt.Sprintfplaceholder.
fmtisn't referenced anywhere else in this test. Remove the import and thevar _ = fmt.Sprintfline; they don't do anything useful and are confusing (error.Ashas always been available in theerrorspackage and is already used viarequire.ErrorAs).♻️ Suggested change
"encoding/json" - "fmt" "math/big"-// Silence unused-import warning on Go versions without error.As shortcut. -var _ = fmt.Sprintf🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/enroller_test.go` around lines 249 - 250, Remove the unused fmt import and the placeholder variable var _ = fmt.Sprintf from the test; locate the import block in enroller_test.go and delete the "fmt" import along with the var _ = fmt.Sprintf line so there are no leftover unused symbols (fmt.Sprintf) in the file.
209-210: Assert onLoadPFXerrors in these setup lines.Lines 210 and 228 drop the error from
LoadPFX. If PFX generation regresses, later assertions will trip on a nilcertwith an opaque NPE. Same fix as elsewhere in the file — userequire.NoError.♻️ Suggested change
- cert, _ := LoadPFX(pfxPath, pfxPass) + cert, err := LoadPFX(pfxPath, pfxPass) + require.NoError(t, err)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/enroller_test.go` around lines 209 - 210, The test setup drops the error from LoadPFX which can lead to nil cert NPEs; update the setup where makeSelfSignedPFX and LoadPFX are called (symbols: makeSelfSignedPFX, LoadPFX, cert) to assert the LoadPFX call succeeded by using require.NoError(t, err) (or equivalent) immediately after calling LoadPFX and before using cert so failures surface clearly during test setup.
114-117: Don't silently swallow malformed enroll request bodies in the fake server.If JSON decoding fails,
fs.gotEnrollbecomes a zero-value pointer and downstream assertions (tenant-1,device-001, cert chain length, etc.) will fail with confusing "expected X got empty" messages that hide the real cause. Fail the test loudly instead.♻️ Suggested change
- mux.HandleFunc("/join/entra/enroll", func(w http.ResponseWriter, r *http.Request) { - var req enrollReq - _ = json.NewDecoder(r.Body).Decode(&req) - fs.gotEnroll = &req + mux.HandleFunc("/join/entra/enroll", func(w http.ResponseWriter, r *http.Request) { + var req enrollReq + if err := json.NewDecoder(r.Body).Decode(&req); err != nil { + t.Errorf("fake /enroll: decode body: %v", err) + w.WriteHeader(http.StatusBadRequest) + return + } + fs.gotEnroll = &req🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/enroller_test.go` around lines 114 - 117, The fake server handler registered with mux.HandleFunc("/join/entra/enroll", ...) currently ignores JSON decode errors for enrollReq and sets fs.gotEnroll to a zero-value pointer; change the handler to check the error returned by json.NewDecoder(r.Body).Decode(&req) and, if non-nil, fail the test loudly (e.g., call t.Fatalf or equivalent) with the decode error included, and only assign fs.gotEnroll = &req when decode succeeds so downstream assertions see the real cause instead of empty values.management/server/integrations/entra_device/nonce_store.go (2)
54-72: Expired-nonce GC only runs onIssue— expired entries can linger indefinitely on idle stores.If enrollments are bursty followed by a long quiet period, expired nonces sit in
entriesuntil the nextIssue. Not a security issue (they'll be rejected on consume), but a minor memory footprint concern on long-lived single-node deployments.Either run a periodic goroutine GC or also GC on
Consume. Given the existinggcLockedis already mutex-aware, calling it inConsumeevery N operations is a one-liner.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/nonce_store.go` around lines 54 - 72, Expired nonces are only purged in Issue, so entries can accumulate when the store is idle; update Consume (method on InMemoryNonceStore) to increment ops and call gcLocked(time.Now().UTC()) under the same gcEvery condition as Issue (use s.mu lock already used in Consume) so GC runs every N operations even on consume paths; reference the existing fields/methods: s.ops, s.gcEvery, s.mu, s.entries, Consume, Issue, and gcLocked to mirror the one-line GC invocation pattern used in Issue.
74-98: The "constant-time" nonce scan doesn't buy anything and is strictly slower than a direct map lookup.Two issues with the current
Consume:
- The comment claims constant-time to avoid leaking which nonces exist, but (a) Go map iteration order is randomized so the first-match
breakstill leaks the match's iteration position, and (b)subtle.ConstantTimeCompareshort-circuits on unequal byte-length inputs. Net: it's not constant-time in a meaningful sense.- The threat model doesn't require timing resistance here — nonces are 32 random bytes from
crypto/randand base64url-encoded (~256 bits of entropy). An attacker cannot guess a value to probe for. A directmap[string]entrylookup is O(1), correct, and easier to reason about.♻️ Suggested replacement
-// Consume implements NonceStore using a constant-time equality check. func (s *InMemoryNonceStore) Consume(nonce string) (bool, error) { s.mu.Lock() defer s.mu.Unlock() - now := time.Now().UTC() - // Scan with constant-time comparison to avoid leaking which nonces exist - // via timing. The overhead is negligible given the store's small size. - var found string - for key := range s.entries { - if subtle.ConstantTimeCompare([]byte(key), []byte(nonce)) == 1 { - found = key - break - } - } - if found == "" { - return false, nil - } - e := s.entries[found] - delete(s.entries, found) - if now.After(e.expiresAt) { - return false, nil - } - return true, nil + e, ok := s.entries[nonce] + if !ok { + return false, nil + } + delete(s.entries, nonce) + if time.Now().UTC().After(e.expiresAt) { + return false, nil + } + return true, nil }Drop the
crypto/subtleimport once this is applied.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/nonce_store.go` around lines 74 - 98, The Consume method in InMemoryNonceStore should use a direct map lookup instead of the constant-time iteration: replace the loop that uses subtle.ConstantTimeCompare with an existence check like e, ok := s.entries[nonce]; if !ok return false, nil; then delete(s.entries, nonce) and check e.expiresAt as before; update the function Consume accordingly and remove the unused crypto/subtle import. This keeps the existing mutex usage (s.mu.Lock()/defer s.mu.Unlock()) and uses the entries map and e.expiresAt identifiers to locate the changes.management/Dockerfile.entra-test (1)
38-44: Runtime image runs as root (Trivy DS-0002).For a local test image this is low-risk, but adding a non-root
USERkeeps you aligned with the productionmanagement/Dockerfileand silences the scanner.♻️ Suggested minimal hardening
RUN apt-get update && \ apt-get install -y --no-install-recommends ca-certificates curl && \ - rm -rf /var/lib/apt/lists/* + rm -rf /var/lib/apt/lists/* && \ + useradd --system --uid 10001 --no-create-home --shell /usr/sbin/nologin netbird COPY --from=builder /out/netbird-mgmt /usr/local/bin/netbird-mgmt + +USER 10001:10001🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/Dockerfile.entra-test` around lines 38 - 44, The image currently runs as root (base FROM debian:bookworm-slim) — create a non-root user and switch to it: add steps after copying the binary (COPY --from=builder /out/netbird-mgmt /usr/local/bin/netbird-mgmt) to create a dedicated user/group, chown the /usr/local/bin/netbird-mgmt binary (and any runtime dirs) to that user, and set USER to that non-root account so the container runs unprivileged; keep the changes minimal and ensure permissions allow execution by the new user.client/internal/enroll/entradevice/provider.go (1)
110-130:SignNoncefallback for opaque signers silently uses PSS defaults and won't interop with RSA PKCS1v15 TPM signers.The
default:branch callsp.signer.Sign(rand.Reader, digest, crypto.SHA256). For opaque RSA signers this produces a PKCS1v15 signature (the standard library'scrypto.Signercontract for RSA), which the server accepts — good. But the comment says "they MUST handle RSA-PSS" which is the opposite of what the fallback actually requests (a nil opts /crypto.SHA256yields PKCS1v15, not PSS). Either fix the comment or pass&rsa.PSSOptions{Hash: crypto.SHA256}for consistency with the explicit RSA branch.Non-blocking until the CNG/TPM provider lands, but worth aligning before that PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/enroll/entradevice/provider.go` around lines 110 - 130, The fallback in PFXProvider.SignNonce currently calls p.signer.Sign(rand.Reader, digest[:], crypto.SHA256) which requests PKCS1v15 for RSA signers but the comment says signers must handle RSA-PSS; update the implementation and comment to be consistent: call p.signer.Sign(rand.Reader, digest[:], &rsa.PSSOptions{Hash: crypto.SHA256}) for the default branch (so opaque signers are requested to produce PSS like rsa.SignPSS) and adjust the comment to reflect that RSA-PSS is required (or alternatively, if you intend to allow PKCS1v15, change the comment to state PKCS1v15 fallback); reference SignNonce, PFXProvider, p.signer.Sign and rsa.SignPSS when making the change.tools/entra-test/enroll-tester/main.go (1)
143-147: Optional: mirror the enroller's dual-alphabet tolerance for nonce decoding.
client/internal/enroll/entradevice/enroller.gotolerates bothbase64.RawURLEncodingandbase64.StdEncodingwhen decoding the challenge nonce, while this tester only acceptsRawURLEncoding. The server contract is URL-safe base64, so this works today, but keeping the tester as tolerant as the real client would prevent it from being the first thing that breaks if the encoding is ever tweaked on either side.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/enroll-tester/main.go` around lines 143 - 147, The tester currently decodes the challenge nonce using only base64.RawURLEncoding (in the block that decodes challenge.Nonce into rawNonce); update that logic to mirror enroller.go by trying RawURLEncoding first and if it fails, retry with base64.StdEncoding before calling die. Ensure you reference the same variables and error path (challenge.Nonce, rawNonce, err) and preserve the existing die("decode nonce: %v", err) behavior when both decodes fail.client/cmd/entra_enroll.go (1)
147-155: Useerrors.Asandfilepath.Dirinstead of the hand-rolled equivalents.Two small refactors:
err.(*entradevice.Error)at line 149 misses wrapped errors. The enroller returns the structured*Errordirectly in some paths, but if any future path wraps it withfmt.Errorf("...: %w", err)(and some existing paths inenroller.goalready do), the structured branch silently skips and you lose the stableCode/HTTPStatusin the user-facing output.filepathDirreimplementspath/filepath.Dirbut doesn't handle Windows drive letters, trailing separators, or"."/"/"edge cases the same way the stdlib does. This CLI already targets Windows (see doc comment aboutC:\ProgramData\NetBird\device.pfx).♻️ Proposed fix
@@ import ( "context" + "errors" "fmt" "net/url" "os" + "path/filepath" "strings" "time" @@ - if structured, ok := err.(*entradevice.Error); ok { + var structured *entradevice.Error + if errors.As(err, &structured) { cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n %s\n", structured.Code, structured.HTTPStatus, structured.Message) return fmt.Errorf("enrolment failed: %s", structured.Code) } @@ - if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil { + if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil { @@ -func filepathDir(p string) string { - for i := len(p) - 1; i >= 0; i-- { - if p[i] == '\\' || p[i] == '/' { - return p[:i] - } - } - return "." -}Also applies to: 254-261
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 147 - 155, The code currently uses a direct type assertion err.(*entradevice.Error) which misses wrapped errors and a custom filepathDir helper that reimplements stdlib behavior; update the error handling in the Entra enrolment command to use errors.As to detect a wrapped *entradevice.Error (capture into a variable, then print structured.Code/structured.HTTPStatus/structured.Message and return the stable Code) and replace usages of filepathDir with the standard filepath.Dir to ensure correct path semantics on Windows and edge cases; apply the same changes to the other occurrence noted (the block around lines 254-261) so both places handle wrapped errors and paths correctly.management/server/integrations/entra_device/manager.go (1)
271-279: Defensively guard against a nilresultwith nilerr.
PeerEnrolleris an injected interface; a misbehaving implementation that returns(nil, nil)will crash the manager at the caller (result.PeerIDon line 123 / response struct access). Cheap belt-and-braces:🛡️ Suggested fix
result, err := m.PeerEnroller.EnrollEntraDevicePeer(ctx, in) if err != nil { if e, ok := AsError(err); ok { return nil, e } return nil, NewError(CodeInternal, "peer enrolment failed", err) } + if result == nil { + return nil, NewError(CodeInternal, "peer enroller returned nil result", nil) + } return result, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/manager.go` around lines 271 - 279, The handler calling PeerEnroller.EnrollEntraDevicePeer (result, err := m.PeerEnroller.EnrollEntraDevicePeer(ctx, in)) must defensively check for a nil result even when err == nil to avoid panics when accessing result.PeerID; after the call, if result == nil { return nil, NewError(CodeInternal, "peer enrolment returned nil result", nil) } (or include context) so the manager method fails gracefully; update the method containing that call to perform this nil-check before returning result.management/server/types/entra_device_auth.go (1)
160-170: Consider defaultingEnabled: falseand enabling only after credentials are validated.
NewEntraDeviceAuthcurrently constructs withEnabled: truedespite having emptyTenantID,ClientID, andClientSecret. While the handler validates these fields before callingSaveEntraDeviceAuth(preventing the unconfigured state from persisting in production), the default is semantically misleading—a new integration should not be marked enabled until it is actually configured and validated.The challenge endpoint (
/join/entra/challenge) does not check integration status and will issue nonces regardless, which is intentional for the protocol design, but settingEnabled: trueupfront creates ambiguity about what enabled means and could lead to bugs if future code paths assume the enabled state implies valid credentials. Defaulting toEnabled: falsemakes the intent clearer: the integration is ready only after the admin handler sets credentials and flips the flag to true.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/types/entra_device_auth.go` around lines 160 - 170, NewEntraDeviceAuth currently sets Enabled: true even when TenantID/ClientID/ClientSecret are empty; change the constructor to default Enabled: false so new Entra integrations are created disabled until credentials are validated and explicitly enabled (e.g., by the admin handler that calls SaveEntraDeviceAuth). Update NewEntraDeviceAuth to set Enabled to false and ensure any code paths that expect an enabled integration (such as the admin enable flow) explicitly flip Enabled to true after validation; keep /join/entra/challenge behavior unchanged as noted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/enroll/entradevice/provider.go`:
- Around line 101-108: DeviceID currently returns p.leaf.Subject.CommonName
verbatim which can mismatch server-side extractDeviceID (which trims whitespace
and strips a leading "CN="); update PFXProvider.DeviceID to normalize the CN the
same way as the server by trimming whitespace and removing a leading "CN="
prefix before returning, and add "strings" to the import block; reference the
PFXProvider.DeviceID function and ensure the normalized value is used for the
return and error check.
In `@docs/ENTRA_DEVICE_AUTH.md`:
- Around line 323-347: Fix the markdown lint issues in the "Live-tenant
verification results" section by adding a blank line immediately before and
after the verification-results table and by making every table row use leading
and trailing pipes consistently (ensure rows such as the one currently missing
pipes are updated); also update the fenced architecture diagram (the code block
at the earlier fenced block around line 272) to include a language hint like
```text so it renders as a plain preformatted block. Ensure changes touch the
"Live-tenant verification results" table and the fenced architecture diagram in
this document.
- Around line 319-322: The client_secret column is stored in plain-text and must
be moved to the existing encrypted-column pattern before shipping; either
implement a migration that renames or replaces client_secret with the
encrypted-column variant and updates all code paths that read/write it to use
the encryption helpers (use the same encryption utilities and column naming
convention used elsewhere in the codebase), update schema/migrations and any
tests to use the new encrypted field, and update this doc to remove the warning;
if you don't want to implement it in this PR, open a tracked issue and add its
link to this document so the rotation of client_secret to the encrypted-column
pattern is scheduled and cannot be dropped.
In `@management/server/entra_device_enroll.go`:
- Around line 207-219: The parseIP function mishandles IPv6 addresses with
brackets because it strips the port by last-colon and leaves brackets, causing
net.ParseIP to fail; update parseIP to try net.ParseIP(s) first, then if that
fails call net.SplitHostPort(s) (or attempt to SplitHostPort and on success
parse the host portion) so that inputs like "[::1]:8080" and "host:port" are
handled correctly, and return nil only if parsing still fails; reference the
parseIP function and use net.SplitHostPort to extract the host for parsing.
In `@management/server/http/handler.go`:
- Around line 94-102: Remove the redundant bypass registrations for the Entra
device enrolment endpoints: delete the calls to
bypass.AddBypassPath("/join/entra/challenge") and
bypass.AddBypassPath("/join/entra/enroll") since these routes are mounted on
rootRouter (unauthenticated) and authMiddleware is only applied to the /api
subrouter; leave other bypass.AddBypassPath calls (e.g., for /api/instance,
/api/setup) intact. Ensure no other code paths expect those bypass entries and
run tests to confirm behavior.
In `@management/server/http/handlers/entra_device_auth/wiring.go`:
- Around line 49-110: Install currently permits a nil Permissions silently via
buildPermissionChecker returning an always-true checker, which creates a
fail-open auth bypass; update Install to require w.Permissions != nil (return an
error like "entra_device_auth.Install: Permissions is nil") or introduce an
explicit opt-in flag on Wiring (e.g. InsecureAllowAllForTests bool) and only
allow the permit-all branch in buildPermissionChecker when that flag is true;
adjust buildPermissionChecker to return a deny-by-default checker when pm is nil
and ensure Install enforces the chosen policy (check Wiring.Permissions or
Wiring.InsecureAllowAllForTests) before constructing the Handler.
In `@management/server/integrations/entra_device/graph_client.go`:
- Around line 205-213: In the pagination loop handling wrap.NextLink in
graph_client.go, don't silently break when strings.HasPrefix(wrap.NextLink,
c.baseURL()) is false; instead return an error (or at least log the unexpected
host and return an error) so callers know pagination was truncated; modify the
block that currently does "break" to construct a descriptive error that includes
wrap.NextLink and c.baseURL(), log it via the existing logger and return it up
the call path (the change should touch the pagination loop around wrap.NextLink,
the code that currently sets path/q, and the function's error return flow).
In `@management/server/integrations/entra_device/manager.go`:
- Around line 268-270: The code currently sets EntraDeviceMapping =
MatchedMappingIDs[0] which is misleading for union resolution; change the
assignment so that when resolved.ResolutionMode == "union" you leave
EntraDeviceMapping empty/cleared (and rely on resolved.MatchedMappingIDs for the
full set), and only set EntraDeviceMapping = resolved.MatchedMappingIDs[0] for
strict_priority (or equivalent single-winner modes); update the branch around
the existing assignment (referencing EntraDeviceMapping, MatchedMappingIDs, and
ResolutionMode) and ensure downstream audit/event usage (e.g.,
PeerAddedWithEntraDevice / EnrollPeerInput.EntraDeviceMapping) will instead
reference MatchedMappingIDs for union cases.
- Around line 183-212: The cert validation currently allows a missing
EntraDeviceID to pass through; after calling m.Cert.Validate in
validateCertAndDeviceID, explicitly check if identity.EntraDeviceID == "" and
return a NewError(CodeInvalidCertChain, "certificate missing Entra device ID",
nil) so certs that lack the device ID extracted by extractDeviceID (from
Cert.Validate) are rejected early; do this before any comparison with
req.EntraDeviceID or calling graph.Device.
- Around line 66-76: NewManager currently calls NewCertValidator(nil, nil)
leaving Cert.TrustRoots nil so chain verification is skipped; update
NewManager/Manager to fail-fast when m.Cert.TrustRoots == nil (or require an
explicit opt-in flag like Manager.AllowUntrustedCertChains) by returning an
error or panicking at construction or by rejecting in Enroll if TrustRoots is
nil, or alternatively initialize Cert.TrustRoots with the Microsoft Device Auth
CA set by default; also update the NewManager godoc and README to state that
TrustRoots must be configured for production and document the
AllowUntrustedCertChains override if you add it.
In `@management/server/integrations/entra_device/README.md`:
- Around line 57-69: The "Phase 1" checklist in
management/server/integrations/entra_device/README.md is out of date and marks
items as incomplete even though they are implemented (e.g., wiring into the main
router via the entra-join handler, integration in
DefaultAccountManager.EnrollEntraDevicePeer, and admin CRUD handlers under
management/server/http/handlers/entra_device_auth/). Update the checklist to
reflect current status by toggling the relevant items to complete or remove the
Phase 1 section and instead add a single pointer to docs/ENTRA_DEVICE_AUTH.md as
the canonical source of truth so readers won't be misled.
In `@tools/entra-test/config/management.json`:
- Line 37: The file contains a hardcoded DataStoreEncryptionKey value
("DataStoreEncryptionKey") in the JSON; add a short explicit comment near that
field (or an entry in tools/entra-test/TESTING.md) stating this key is for the
ephemeral test harness only and MUST NOT be used in production or copied as a
template, and mention to rotate/remove it if repurposed; ensure the comment
references the DataStoreEncryptionKey name so reviewers and future readers see
the warning clearly.
- Around line 2-24: The test fixture currently points at production hosts in the
"Stuns", "TURNConfig.Turns", and "Signal" entries (entries using keys "Proto"
and "URI"); update these to target the local docker-compose test services (or
remove them if in-harness peer connectivity isn’t required): replace
"stun:stun.wiretrustee.com:3468" and any "turn:turn.wiretrustee.com:..." URIs
under "TURNConfig.Turns" with the local STUN/TURN service hostnames/ports used
by your docker-compose, and change the "Signal" entry to the local signal
service and correct protocol for gRPC (set "Proto" to the gRPC-appropriate value
and point "URI" at the docker-compose signal endpoint) so the harness uses local
test services rather than production hosts.
In `@tools/entra-test/seed-account/main.go`:
- Around line 46-67: The seed currently always calls types.NewNetwork() and
assigns it to Account.Network before upserting with db.Save(acct), which causes
re-seeding to re-randomize the subnet; change the flow to first attempt to load
an existing types.Account by the provided account ID from the DB, and if an
account exists keep its Network (do not call types.NewNetwork()), otherwise
create the new network and assign it; if you still want to permit overwrites, at
minimum emit a warning via the logger when you are about to overwrite an
existing Account.Network (reference types.NewNetwork, types.Account.Network and
the db.Save(acct) upsert path to locate where to add the load-check and
conditional assignment).
---
Duplicate comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 119-136: The current flow calls enrollPeer(...) first and then
issueBootstrapToken(...), which can leave an orphan peer if token issuance
fails; you should either (preferred) generate the token before creating the peer
and extend EnrollPeerInput / EnrollEntraDevicePeer to accept and persist the
bootstrap token within the same DB transaction inside enrollPeer, or (minimal
change) implement best-effort compensation: on error from
issueBootstrapToken(ctx, result.PeerID) call a peer-deletion routine (e.g.,
deletePeer/deleteEntraDevicePeer by PeerID) to remove the just-created peer, log
any deletion error, and then return the original token issuance error; update
the call sites and types referenced (enrollPeer, issueBootstrapToken,
EnrollPeerInput/EnrollEntraDevicePeer, and any deletePeer method) accordingly so
retries won’t hit duplicate-pubkey collisions.
- Around line 159-168: In consumeNonce, normalize the incoming nonce once (e.g.,
n := strings.TrimSpace(encoded)) and use that normalized value for both
m.NonceStore.Consume and decodeNonceBytes so you don't consume a trimmed nonce
but attempt to decode the raw string; update calls to m.NonceStore.Consume(n)
and return decodeNonceBytes(n) while keeping existing error handling in NewError
unchanged.
In `@management/server/integrations/entra_device/store.go`:
- Around line 31-138: The MemoryStore claims goroutine-safety but uses plain
maps without synchronization; add a sync.RWMutex field (e.g., mu) to MemoryStore
and protect all accessors: use mu.RLock()/RUnlock() in read-only methods
(GetEntraDeviceAuth, GetEntraDeviceAuthByTenant, ListEntraDeviceMappings,
GetEntraDeviceMapping) and mu.Lock()/Unlock() in mutating methods
(SaveEntraDeviceAuth, DeleteEntraDeviceAuth, SaveEntraDeviceMapping,
DeleteEntraDeviceMapping, StoreBootstrapToken) and perform the
read-compare-delete in ConsumeBootstrapToken entirely under mu.Lock() to ensure
single-use semantics; update the constructor NewMemoryStore to initialize the
mutex (no other behavioral changes).
In `@management/server/types/entra_device_auth.go`:
- Around line 40-63: The ClientSecret field on the EntraDeviceAuth struct is
stored as a plain string; change it to either use the project's encrypted
serializer or to store a secret-reference resolved from the secret store:
replace the ClientSecret string field (in EntraDeviceAuth) with the project's
encrypted/serialized secret wrapper (matching other sensitive fields' serializer
tag) or with a reference type (e.g., SecretRef) and update persistence/loader
logic to resolve it from KMS/Vault/env; ensure GORM tags reflect the serializer
or reference column and adjust any create/update/load code paths that read/write
EntraDeviceAuth.ClientSecret accordingly.
---
Nitpick comments:
In `@client/cmd/entra_enroll.go`:
- Around line 147-155: The code currently uses a direct type assertion
err.(*entradevice.Error) which misses wrapped errors and a custom filepathDir
helper that reimplements stdlib behavior; update the error handling in the Entra
enrolment command to use errors.As to detect a wrapped *entradevice.Error
(capture into a variable, then print
structured.Code/structured.HTTPStatus/structured.Message and return the stable
Code) and replace usages of filepathDir with the standard filepath.Dir to ensure
correct path semantics on Windows and edge cases; apply the same changes to the
other occurrence noted (the block around lines 254-261) so both places handle
wrapped errors and paths correctly.
In `@client/internal/enroll/entradevice/enroller_test.go`:
- Around line 249-250: Remove the unused fmt import and the placeholder variable
var _ = fmt.Sprintf from the test; locate the import block in enroller_test.go
and delete the "fmt" import along with the var _ = fmt.Sprintf line so there are
no leftover unused symbols (fmt.Sprintf) in the file.
- Around line 209-210: The test setup drops the error from LoadPFX which can
lead to nil cert NPEs; update the setup where makeSelfSignedPFX and LoadPFX are
called (symbols: makeSelfSignedPFX, LoadPFX, cert) to assert the LoadPFX call
succeeded by using require.NoError(t, err) (or equivalent) immediately after
calling LoadPFX and before using cert so failures surface clearly during test
setup.
- Around line 114-117: The fake server handler registered with
mux.HandleFunc("/join/entra/enroll", ...) currently ignores JSON decode errors
for enrollReq and sets fs.gotEnroll to a zero-value pointer; change the handler
to check the error returned by json.NewDecoder(r.Body).Decode(&req) and, if
non-nil, fail the test loudly (e.g., call t.Fatalf or equivalent) with the
decode error included, and only assign fs.gotEnroll = &req when decode succeeds
so downstream assertions see the real cause instead of empty values.
In `@client/internal/enroll/entradevice/provider.go`:
- Around line 110-130: The fallback in PFXProvider.SignNonce currently calls
p.signer.Sign(rand.Reader, digest[:], crypto.SHA256) which requests PKCS1v15 for
RSA signers but the comment says signers must handle RSA-PSS; update the
implementation and comment to be consistent: call p.signer.Sign(rand.Reader,
digest[:], &rsa.PSSOptions{Hash: crypto.SHA256}) for the default branch (so
opaque signers are requested to produce PSS like rsa.SignPSS) and adjust the
comment to reflect that RSA-PSS is required (or alternatively, if you intend to
allow PKCS1v15, change the comment to state PKCS1v15 fallback); reference
SignNonce, PFXProvider, p.signer.Sign and rsa.SignPSS when making the change.
In `@client/internal/enroll/entradevice/state.go`:
- Around line 11-36: The EntraEnrollState struct is duplicated between
EntraEnrollState in this file and profilemanager.EntraEnrollState; extract the
canonical type into a small shared package (e.g., enroll/entradevice/state) and
update both usages to import that package (replace the local struct declarations
with the shared type), or alternatively add a unit test that marshals the
enroller's EntraEnrollState and unmarshals into profilemanager.EntraEnrollState
(and vice versa) to detect drift; locate occurrences by the type name
EntraEnrollState and the profilemanager.EntraEnrollState reference and update
imports/usages accordingly.
In `@management/Dockerfile.entra-test`:
- Around line 38-44: The image currently runs as root (base FROM
debian:bookworm-slim) — create a non-root user and switch to it: add steps after
copying the binary (COPY --from=builder /out/netbird-mgmt
/usr/local/bin/netbird-mgmt) to create a dedicated user/group, chown the
/usr/local/bin/netbird-mgmt binary (and any runtime dirs) to that user, and set
USER to that non-root account so the container runs unprivileged; keep the
changes minimal and ensure permissions allow execution by the new user.
In `@management/server/integrations/entra_device/manager.go`:
- Around line 271-279: The handler calling PeerEnroller.EnrollEntraDevicePeer
(result, err := m.PeerEnroller.EnrollEntraDevicePeer(ctx, in)) must defensively
check for a nil result even when err == nil to avoid panics when accessing
result.PeerID; after the call, if result == nil { return nil,
NewError(CodeInternal, "peer enrolment returned nil result", nil) } (or include
context) so the manager method fails gracefully; update the method containing
that call to perform this nil-check before returning result.
In `@management/server/integrations/entra_device/nonce_store.go`:
- Around line 54-72: Expired nonces are only purged in Issue, so entries can
accumulate when the store is idle; update Consume (method on InMemoryNonceStore)
to increment ops and call gcLocked(time.Now().UTC()) under the same gcEvery
condition as Issue (use s.mu lock already used in Consume) so GC runs every N
operations even on consume paths; reference the existing fields/methods: s.ops,
s.gcEvery, s.mu, s.entries, Consume, Issue, and gcLocked to mirror the one-line
GC invocation pattern used in Issue.
- Around line 74-98: The Consume method in InMemoryNonceStore should use a
direct map lookup instead of the constant-time iteration: replace the loop that
uses subtle.ConstantTimeCompare with an existence check like e, ok :=
s.entries[nonce]; if !ok return false, nil; then delete(s.entries, nonce) and
check e.expiresAt as before; update the function Consume accordingly and remove
the unused crypto/subtle import. This keeps the existing mutex usage
(s.mu.Lock()/defer s.mu.Unlock()) and uses the entries map and e.expiresAt
identifiers to locate the changes.
In `@management/server/types/entra_device_auth.go`:
- Around line 160-170: NewEntraDeviceAuth currently sets Enabled: true even when
TenantID/ClientID/ClientSecret are empty; change the constructor to default
Enabled: false so new Entra integrations are created disabled until credentials
are validated and explicitly enabled (e.g., by the admin handler that calls
SaveEntraDeviceAuth). Update NewEntraDeviceAuth to set Enabled to false and
ensure any code paths that expect an enabled integration (such as the admin
enable flow) explicitly flip Enabled to true after validation; keep
/join/entra/challenge behavior unchanged as noted.
In `@tools/entra-test/enroll-tester/main.go`:
- Around line 143-147: The tester currently decodes the challenge nonce using
only base64.RawURLEncoding (in the block that decodes challenge.Nonce into
rawNonce); update that logic to mirror enroller.go by trying RawURLEncoding
first and if it fails, retry with base64.StdEncoding before calling die. Ensure
you reference the same variables and error path (challenge.Nonce, rawNonce, err)
and preserve the existing die("decode nonce: %v", err) behavior when both
decodes fail.
In `@tools/entra-test/seed-account/main.go`:
- Around line 110-120: The defaultDSN function currently returns hard-coded
local dev credentials (using envOrDefault) which can silently be used against a
non-dev DB; change it so the insecure fallback is only used when an explicit
opt-in is provided (e.g., add and check a boolean flag like
allowInsecureDefaults or a CLI flag `-i-know-this-is-insecure` before returning
the baked-in user/pass/db), or alternatively load the credentials from the
docker-compose/.env file instead of hard-coding; update defaultDSN to consult
that flag (or the parsed .env) and fail fast with a clear error if neither a DSN
nor explicit opt-in is present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 41c325b7-7d63-40f4-b533-638100a0091a
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (48)
client/cmd/entra_enroll.goclient/cmd/root.goclient/internal/enroll/entradevice/enroller.goclient/internal/enroll/entradevice/enroller_test.goclient/internal/enroll/entradevice/provider.goclient/internal/enroll/entradevice/state.goclient/internal/profilemanager/config.godocs/ENTRA_DEVICE_AUTH.mdgo.modidp/dex/config.goidp/dex/provider.goidp/dex/sqlite_cgo.goidp/dex/sqlite_nocgo.gomanagement/Dockerfile.entra-testmanagement/server/entra_device_enroll.gomanagement/server/http/handler.gomanagement/server/http/handlers/entra_device_auth/handler.gomanagement/server/http/handlers/entra_device_auth/wiring.gomanagement/server/http/handlers/entra_join/handler.gomanagement/server/http/handlers/entra_join/handler_test.gomanagement/server/integrations/entra_device/README.mdmanagement/server/integrations/entra_device/activity.gomanagement/server/integrations/entra_device/asn1.gomanagement/server/integrations/entra_device/cert_validator.gomanagement/server/integrations/entra_device/cert_validator_test.gomanagement/server/integrations/entra_device/errors.gomanagement/server/integrations/entra_device/graph_client.gomanagement/server/integrations/entra_device/manager.gomanagement/server/integrations/entra_device/manager_test.gomanagement/server/integrations/entra_device/nonce_store.gomanagement/server/integrations/entra_device/nonce_store_test.gomanagement/server/integrations/entra_device/resolution.gomanagement/server/integrations/entra_device/resolution_test.gomanagement/server/integrations/entra_device/sql_store.gomanagement/server/integrations/entra_device/store.gomanagement/server/integrations/entra_device/store_test.gomanagement/server/integrations/entra_device/types.gomanagement/server/permissions/modules/module.gomanagement/server/types/entra_device_auth.gotools/entra-test/.gitignoretools/entra-test/TESTING.mdtools/entra-test/config/management.jsontools/entra-test/docker-compose.ymltools/entra-test/enroll-tester/demo.gotools/entra-test/enroll-tester/main.gotools/entra-test/enroll-tester/tls.gotools/entra-test/make-pfx/main.gotools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (6)
- tools/entra-test/.gitignore
- go.mod
- management/server/integrations/entra_device/nonce_store_test.go
- tools/entra-test/docker-compose.yml
- management/server/integrations/entra_device/types.go
- tools/entra-test/TESTING.md
🚧 Files skipped from review as they are similar to previous changes (13)
- management/server/integrations/entra_device/asn1.go
- tools/entra-test/enroll-tester/demo.go
- management/server/integrations/entra_device/activity.go
- management/server/integrations/entra_device/resolution_test.go
- management/server/permissions/modules/module.go
- management/server/http/handlers/entra_device_auth/handler.go
- tools/entra-test/make-pfx/main.go
- management/server/http/handlers/entra_join/handler_test.go
- client/internal/enroll/entradevice/enroller.go
- management/server/integrations/entra_device/store_test.go
- management/server/integrations/entra_device/sql_store.go
- idp/dex/sqlite_nocgo.go
- management/server/integrations/entra_device/manager_test.go
Security / correctness fixes surfaced by CodeRabbit on PR netbirdio#5977: * cert_validator: reject leaves with empty Subject CN instead of returning an empty EntraDeviceID; manager.validateCertAndDeviceID fails closed on the same condition as a second line of defence. * graph_client: escape deviceID for OData single-quoted string literals in Device() and IsCompliant() to close an injection vector (pair with the dev-only TrustRoots=nil path). Pagination now fails closed when @odata.nextLink host differs from the configured base URL instead of silently truncating group enumeration. * manager.consumeNonce: normalise the encoded nonce once so Consume and decodeNonceBytes see the same value (avoids burning the nonce on trailing whitespace). * manager.enrollPeer -> issueBootstrapToken: on token-issue failure after peer creation, best-effort compensate by deleting the orphan peer. Required a new DeletePeer hook on the PeerEnroller interface, wired through DefaultAccountManager via the existing deletePeers() helper. Test doubles get a no-op stub. * manager.enrollPeer: under union resolution, leave EntraDeviceMapping empty instead of misleading audit metadata with an arbitrary MatchedMappingIDs[0]. * MemoryStore: add sync.Mutex and lock every receiver so the doc-claimed goroutine-safety actually holds; the ConsumeBootstrapToken read+compare+delete is now atomic. * SQLStore.ConsumeBootstrapToken: validate the supplied token (and prior existence) BEFORE deleting so a caller with a wrong token cannot evict the real entry. * entra_join handler: switch to http.MaxBytesReader so oversized bodies produce a real 413 Payload Too Large instead of a silent 400 invalid_json. realIP now only consults X-Forwarded-For / X-Real-IP when Handler.TrustForwardedHeaders is explicitly set (opt-in trusted-proxy gate for an unauthenticated endpoint); XFF left-most entry is taken, RemoteAddr has its port stripped. * entra_device_enroll.parseIP: handle [host]:port IPv6 form via net.SplitHostPort. * entra_device_auth.wiring: fail-closed in Install when Permissions is nil; the permit-all checker is now only reachable via an explicit InsecureAllowAllForTests opt-in. * http.handler: drop the redundant /join/entra/* bypass registrations (those routes are on rootRouter and never cross authMiddleware); upgrade the best-effort install logs from Warnf to Errorf and include concrete types so a wiring mismatch is actually visible. * client/entradevice.PFXProvider.DeviceID: normalise Subject CN the same way the server does (trim whitespace, strip a leading CN=) so the client-supplied device id and the server-extracted one cannot mismatch over whitespace. Test harness + docs: * management/Dockerfile.entra-test: healthcheck switched from GET /join/entra/challenge (stateful, allocates a single-use nonce every 30s and 404s on fresh containers) to a stateless TCP probe against the cmux-multiplexed port. * tools/entra-test/TESTING.md: replace stale 33081 references with the actual multiplexed port 33073 everywhere (instructions, curl examples, enroll-tester --url, troubleshooting). * tools/entra-test/config/management.json: null out STUN/TURN/Signal entries that pointed at production wiretrustee.com hosts; add an explicit _comment noting the DataStoreEncryptionKey is publicly visible and must NOT be reused in production. * tools/entra-test/seed-account: reuse the existing Account.Network on re-seed so re-running the harness mid-debug does not randomise the subnet out from under already-allocated peers. * entra_device/README.md + docs/ENTRA_DEVICE_AUTH.md: refresh the stale Phase 1 checklist; add a "Known production gaps" section tracking plaintext ClientSecret, in-memory bootstrap tokens (HA/restart) and the TrustRoots=nil default as explicit follow-ups. Fix markdown lint (table pipe style / blank-line separators / fenced-code language identifier for the architecture diagram).
|
@mlsmaycon @braginini @pascal-fischer @lixmal — when one of you has a moment, could I get a maintainer look on this? Status since the PR was opened:
The net of
Three known production gaps deliberately deferred to follow-ups so this PR stays reviewable:
All three are explicitly documented in Full re-test matrix (server unit tests, handler integration tests, client tests, and the live-tenant verification matrix in |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
tools/entra-test/seed-account/main.go (1)
63-84: Re-seed resets untracked scalar fields on an existing account row.
db.Save(acct)with a freshly constructedtypes.Accountwill overwriteCreatedAt,DomainCategory,IsDomainPrimaryAccount, embeddedDNSSettings, etc. to zero/now on every run. For a harness seeder this is usually acceptable, but if testers rely on stableCreatedAtor tweak other fields between runs they'll lose them silently. Consider usingclause.OnConflict{DoNothing: true}(or aWhere(...).Updates(...)on just the fields you own) when the account already exists, and/or preservingexisting.CreatedAtwhen reusingexisting.Network.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/entra-test/seed-account/main.go` around lines 63 - 84, The current seeder constructs a new types.Account (acct) and calls db.Save(acct), which overwrites fields like CreatedAt, DomainCategory, IsDomainPrimaryAccount and embedded DNSSettings on re-seed; change the save logic to avoid clobbering existing rows by either using a CREATE-ON-CONFLICT do-nothing/merge strategy (use gorm's clause.OnConflict{DoNothing:true} or a targeted OnConflict.Update with only the fields you intend to update) or by first loading the existing account (e.g., query by Id), preserving fields such as existing.CreatedAt and existing.Network when present, and then performing a selective Update/Save of owned fields only instead of blindly db.Save(acct).management/server/integrations/entra_device/graph_client.go (1)
162-223: OData escape + fail-closed pagination LGTM.
odataEscapecorrectly handles the single-quote case per OData v4, and the pagination branch now returns an error instead of silently truncating the group list — exactly what was needed to avoid under-enrolling a device.One small optional nit: at lines 104/147
body, _ := io.ReadAll(resp.Body)drops the read error. If the body is truncated mid-stream you'd log/return a misleading "unexpected EOF during JSON decode" rather than the underlying network error. Low priority.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/graph_client.go` around lines 162 - 223, In graphGET, the code currently ignores the error from io.ReadAll(resp.Body) (body, _ := io.ReadAll(resp.Body)); change this to capture and handle the error (body, err := io.ReadAll(resp.Body)) and if err != nil return or log a wrapped error (including response status and URL) instead of proceeding to json decode — this ensures network/truncate errors surface (refer to function graphGET and the resp.Body read) and avoids misleading "unexpected EOF during JSON decode" messages.management/server/integrations/entra_device/sql_store.go (1)
161-177: Validate-before-delete fix looks correct; consider constant-time comparison.The DoS concern from the prior round (any caller with a known peerID burning the legitimate entry with a wrong token) is cleanly resolved by deleting only after the token matches. One follow-up worth considering:
entry.token != tokenat Line 168 is a non-constant-time comparison. The practical risk is small because bootstrap tokens are 256 bits of entropy hex-encoded, butcrypto/subtle.ConstantTimeCompareis the idiomatic default for secret equality and costs nothing.🛡️ Proposed tightening
import ( "context" + "crypto/subtle" "errors" "sync" "time" @@ - if entry.token != token { - return false, nil - } + if subtle.ConstantTimeCompare([]byte(entry.token), []byte(token)) != 1 { + return false, nil + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/integrations/entra_device/sql_store.go` around lines 161 - 177, Replace the plain string comparison in ConsumeBootstrapToken with a constant-time comparison: convert both entry.token and the incoming token to byte slices and use crypto/subtle.ConstantTimeCompare to decide equality, keeping the current logic of deleting the map entry only after a successful match (delete(s.tokens, peerID) remains where it is) and preserving the expiry check that follows; update imports to include crypto/subtle.management/server/entra_device_enroll.go (1)
108-152: Consider a warn-level log when the retry loop exhausts.The IP/DNS-label retry loop (max 10 attempts) is sound, but a persistent exhaustion is operationally interesting — it almost always signals either a very narrow
network.Netpool or a DNS-label hot-spot worth alerting on. The per-attemptTracefat Line 144 is too quiet for that.♻️ Optional refactor
if err != nil { + log.WithContext(ctx).Warnf("entra enrolment IP/label allocation exhausted after %d attempts for account %s: %v", + maxAttempts, input.AccountID, err) return nil, fmt.Errorf("failed to add entra-enrolled peer after %d attempts: %w", maxAttempts, err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/entra_device_enroll.go` around lines 108 - 152, The retry loop allocating IP/DNS (for example variables maxAttempts, freeIP, freeLabel and function isUniqueConstraintError) currently only traces per-attempt collisions but does not warn when all attempts fail; modify the post-loop error path (the block that checks if err != nil after the for-loop) to emit a warn-level log via log.WithContext(ctx).WithFields(...) including dns_label, ip and the number of attempts (maxAttempts) and the final error before returning the formatted error, so exhausted-retry conditions are visible at warn level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/ENTRA_DEVICE_AUTH.md`:
- Around line 35-37: The zero-touch onboarding example uses the wrong command:
replace the scheduled task invocation of "netbird up --management-url
https://<your-mgmt>/join/entra" with the one-shot Entra enrollment command
"netbird entra-enroll" (which reads the PFX and tenant flags and performs
enrollment against /join/entra); if the daemon should also run afterwards,
append a separate "netbird up" step to start the long-running agent. Ensure the
doc mentions the enrollment is one-shot (entra-enroll) and that "netbird up" is
used only to start the daemon.
In `@management/server/http/handlers/entra_join/handler.go`:
- Around line 20-33: The Handler's TrustForwardedHeaders field is never set by
the wiring layer; add a TrustForwardedHeaders bool to the Wiring struct used by
entra_device_auth.Install() and propagate it into the Handler after creating it
with NewHandler. Update the Wiring type to include TrustForwardedHeaders, change
the call site in wiring.go so that after handler := NewHandler(m) you set
handler.TrustForwardedHeaders = wiring.TrustForwardedHeaders (or pass it into
Install if that pattern is used), and ensure the Install function
accepts/forwards the Wiring so realIP() will honor configured
X-Forwarded-For/X-Real-IP behavior.
In `@management/server/integrations/entra_device/cert_validator.go`:
- Around line 95-105: The VerifyOptions currently sets KeyUsages to include
x509.ExtKeyUsageAny which disables EKU checks; update the opts construction in
cert_validator.go (the VerifyOptions passed to leaf.Verify) to remove
x509.ExtKeyUsageAny and use only x509.ExtKeyUsageClientAuth (i.e., KeyUsages:
[]x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}) so certificate EKU is enforced
when calling leaf.Verify(opts); adjust any related tests or callers expecting
lax EKU behavior.
---
Nitpick comments:
In `@management/server/entra_device_enroll.go`:
- Around line 108-152: The retry loop allocating IP/DNS (for example variables
maxAttempts, freeIP, freeLabel and function isUniqueConstraintError) currently
only traces per-attempt collisions but does not warn when all attempts fail;
modify the post-loop error path (the block that checks if err != nil after the
for-loop) to emit a warn-level log via log.WithContext(ctx).WithFields(...)
including dns_label, ip and the number of attempts (maxAttempts) and the final
error before returning the formatted error, so exhausted-retry conditions are
visible at warn level.
In `@management/server/integrations/entra_device/graph_client.go`:
- Around line 162-223: In graphGET, the code currently ignores the error from
io.ReadAll(resp.Body) (body, _ := io.ReadAll(resp.Body)); change this to capture
and handle the error (body, err := io.ReadAll(resp.Body)) and if err != nil
return or log a wrapped error (including response status and URL) instead of
proceeding to json decode — this ensures network/truncate errors surface (refer
to function graphGET and the resp.Body read) and avoids misleading "unexpected
EOF during JSON decode" messages.
In `@management/server/integrations/entra_device/sql_store.go`:
- Around line 161-177: Replace the plain string comparison in
ConsumeBootstrapToken with a constant-time comparison: convert both entry.token
and the incoming token to byte slices and use crypto/subtle.ConstantTimeCompare
to decide equality, keeping the current logic of deleting the map entry only
after a successful match (delete(s.tokens, peerID) remains where it is) and
preserving the expiry check that follows; update imports to include
crypto/subtle.
In `@tools/entra-test/seed-account/main.go`:
- Around line 63-84: The current seeder constructs a new types.Account (acct)
and calls db.Save(acct), which overwrites fields like CreatedAt, DomainCategory,
IsDomainPrimaryAccount and embedded DNSSettings on re-seed; change the save
logic to avoid clobbering existing rows by either using a CREATE-ON-CONFLICT
do-nothing/merge strategy (use gorm's clause.OnConflict{DoNothing:true} or a
targeted OnConflict.Update with only the fields you intend to update) or by
first loading the existing account (e.g., query by Id), preserving fields such
as existing.CreatedAt and existing.Network when present, and then performing a
selective Update/Save of owned fields only instead of blindly db.Save(acct).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 557eb50c-28ba-4ee1-b589-2050bfa49b1d
📒 Files selected for processing (19)
client/internal/enroll/entradevice/provider.godocs/ENTRA_DEVICE_AUTH.mdmanagement/Dockerfile.entra-testmanagement/server/entra_device_enroll.gomanagement/server/http/handler.gomanagement/server/http/handlers/entra_device_auth/wiring.gomanagement/server/http/handlers/entra_join/handler.gomanagement/server/http/handlers/entra_join/handler_test.gomanagement/server/integrations/entra_device/README.mdmanagement/server/integrations/entra_device/cert_validator.gomanagement/server/integrations/entra_device/graph_client.gomanagement/server/integrations/entra_device/manager.gomanagement/server/integrations/entra_device/manager_test.gomanagement/server/integrations/entra_device/sql_store.gomanagement/server/integrations/entra_device/store.gotools/entra-test/TESTING.mdtools/entra-test/config/management.jsontools/entra-test/enroll-tester/demo.gotools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (2)
- management/server/integrations/entra_device/README.md
- tools/entra-test/TESTING.md
🚧 Files skipped from review as they are similar to previous changes (2)
- client/internal/enroll/entradevice/provider.go
- tools/entra-test/enroll-tester/demo.go
| // Handler serves the /join/entra/* routes. | ||
| type Handler struct { | ||
| Manager *ed.Manager | ||
|
|
||
| // TrustForwardedHeaders enables reading X-Forwarded-For / X-Real-IP from | ||
| // inbound requests. Should ONLY be enabled when the management server | ||
| // sits behind a known-good reverse proxy that strips client-supplied | ||
| // instances of these headers; otherwise callers can dictate the source | ||
| // IP persisted on the peer (the /join/entra path is unauthenticated). | ||
| TrustForwardedHeaders bool | ||
| } | ||
|
|
||
| // NewHandler constructs a handler using the given manager. | ||
| func NewHandler(m *ed.Manager) *Handler { return &Handler{Manager: m} } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm the wiring layer sets Handler.TrustForwardedHeaders from config.
rg -nP -C3 'NewHandler\s*\(|TrustForwardedHeaders' --type=goRepository: netbirdio/netbird
Length of output: 9852
🏁 Script executed:
cat -n management/server/http/handlers/entra_device_auth/wiring.go | sed -n '75,90p'Repository: netbirdio/netbird
Length of output: 753
🏁 Script executed:
rg -nP 'joinHandler.*TrustForwardedHeaders|TrustForwardedHeaders.*=' --type=go -A2 -B2Repository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
head -n 100 management/server/http/handlers/entra_device_auth/wiring.go | tail -n 50Repository: netbirdio/netbird
Length of output: 2050
🏁 Script executed:
rg -n 'TrustForwardedHeaders' --type=goRepository: netbirdio/netbird
Length of output: 503
🏁 Script executed:
rg -n 'type Wiring struct' -A 20 management/server/http/handlers/entra_device_auth/Repository: netbirdio/netbird
Length of output: 2278
🏁 Script executed:
rg -n 'entra_device_auth.Install|entra_device_auth\.Install' --type=go -B 3 -A 3Repository: netbirdio/netbird
Length of output: 2687
🏁 Script executed:
rg -n 'entrajoin\.NewHandler|&entrajoin\.Handler' --type=goRepository: netbirdio/netbird
Length of output: 256
🏁 Script executed:
sed -n '220,240p' management/server/http/handler.goRepository: netbirdio/netbird
Length of output: 619
TrustForwardedHeaders is not wired from configuration—always defaults to false.
The TrustForwardedHeaders field exists and is correctly used in realIP(), but the wiring layer never sets it. Neither entra_device_auth.Install() (the production path) nor NewHandler() accept or configure this value, so deployments behind a reverse proxy will default to ignoring X-Forwarded-For / X-Real-IP headers. The Wiring struct passed to Install needs a TrustForwardedHeaders field, and the handler instantiation in wiring.go should assign it after calling NewHandler.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/http/handlers/entra_join/handler.go` around lines 20 - 33,
The Handler's TrustForwardedHeaders field is never set by the wiring layer; add
a TrustForwardedHeaders bool to the Wiring struct used by
entra_device_auth.Install() and propagate it into the Handler after creating it
with NewHandler. Update the Wiring type to include TrustForwardedHeaders, change
the call site in wiring.go so that after handler := NewHandler(m) you set
handler.TrustForwardedHeaders = wiring.TrustForwardedHeaders (or pass it into
Install if that pattern is used), and ensure the Install function
accepts/forwards the Wiring so realIP() will honor configured
X-Forwarded-For/X-Real-IP behavior.
| opts := x509.VerifyOptions{ | ||
| Roots: v.TrustRoots, | ||
| Intermediates: intermediates, | ||
| CurrentTime: now, | ||
| KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageAny}, | ||
| } | ||
| if _, err := leaf.Verify(opts); err != nil { | ||
| return nil, NewError(CodeInvalidCertChain, | ||
| "certificate chain did not verify against configured trust roots", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Go's crypto/x509 VerifyOptions.KeyUsages, what is the behavior of including ExtKeyUsageAny alongside ExtKeyUsageClientAuth? Does it match leaves with any EKU or only those with the anyExtendedKeyUsage OID?
💡 Result:
Including ExtKeyUsageAny alongside ExtKeyUsageClientAuth in VerifyOptions.KeyUsages sets anyKeyUsage to true because the loop checks for ExtKeyUsageAny and breaks upon finding it. When anyKeyUsage is true, the chain is accepted without calling checkChainForKeyUsage on the chain certificates. This means it matches leaves (and chains) with any EKU, as no specific EKU matching is performed on the chain. It does not restrict to only those with the anyExtendedKeyUsage OID (2.5.29.37); instead, ExtKeyUsageAny in KeyUsages overrides other specified usages like ExtKeyUsageClientAuth, accepting any key usage.
Citations:
- 1: https://go.dev/src/crypto/x509/verify.go?s=8414:8499
- 2: https://pkg.go.dev/crypto/x509@go1.19.4
- 3: https://pkg.go.dev/crypto/x509
- 4: https://go.dev/src/crypto/x509/verify.go
Remove ExtKeyUsageAny from KeyUsages — it disables EKU validation entirely.
Including ExtKeyUsageAny in VerifyOptions.KeyUsages causes Go's Certificate.Verify to skip EKU checking altogether. The presence of ExtKeyUsageAny in the list sets an internal flag that bypasses the checkChainForKeyUsage check, accepting leaves with any EKU—not just those with the anyExtendedKeyUsage OID or ClientAuth. This means leaves signed for CodeSigning, EmailProtection, or any other purpose will pass verification. Use KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth} to enforce strict device-client-auth validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/integrations/entra_device/cert_validator.go` around lines
95 - 105, The VerifyOptions currently sets KeyUsages to include
x509.ExtKeyUsageAny which disables EKU checks; update the opts construction in
cert_validator.go (the VerifyOptions passed to leaf.Verify) to remove
x509.ExtKeyUsageAny and use only x509.ExtKeyUsageClientAuth (i.e., KeyUsages:
[]x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}) so certificate EKU is enforced
when calling leaf.Verify(opts); adjust any related tests or callers expecting
lax EKU behavior.
…spots) SonarCloud's rescan on the previous review-round commit flagged 10 new code smells + 2 medium security hotspots on the Dockerfile. Refactored to bring each touched function back under the cognitive-complexity and length thresholds (all in scope for this PR): * management/server/integrations/entra_device/cert_validator.go: split Validate into decodeCertChain / checkTimeWindow / verifyChain / verifyProofOfPossession helpers; extract the 16-line ECDSA case in verifySignature into verifyECDSA (+ mirror RSA into verifyRSA). * management/server/entra_device_enroll.go: split EnrollEntraDevicePeer into validateEnrollInput, buildEntraPeer, geoEnrichPeer, allocateEntraPeer (with pickDNSLabel + persistEntraPeerTx inner helpers), and emitEntraPeerAddedEvent. Same external behaviour; the orchestrator is now short enough to read top-to-bottom. * client/cmd/entra_enroll.go: extracted the cobra RunE body into runEntraEnroll + seven single- responsibility helpers (preflightEntraEnroll, loadOrCreateProfileConfig, maybeSkipAlreadyEnrolled, derivedWGPubKey, performEntraEnrolment, persistEnrolmentState, printEnrolmentSuccess). * tools/entra-test/enroll-tester/main.go: split the 121-line main into parseFlags, buildHTTPClient, mustMakeCert, mustMakeWGPubKey, fetchChallenge, signChallenge, postEnroll, printEnrolmentSuccess. Introduced a small testerOpts struct so helpers don't take a forest of parameters. * client/internal/profilemanager/config.go / config_entra.go: moved EntraEnrollState (the ~20-line struct + godoc) out to config_entra.go so config.go drops under the 750-line threshold. No API change for callers. Dockerfile hotspots (test harness only): * management/Dockerfile.entra-test: add a system netbird:netbird user and a USER directive so the runtime stage doesn't run as root. Annotate the source-tree COPY with a // NOSONAR explaining that this Dockerfile is a dev-only harness (no secrets expected in-tree, not used by goreleaser / production image path). Intentionally NOT addressed (pre-existing, not introduced by this PR, documented here for the record): * management/server/http/handler.go NewAPIHandler cognitive complexity + 23 parameters. Function existed before this PR; the only edit in this branch adds one call (installEntraDeviceAuth) that does not affect the signature. All affected unit tests still pass (entra_device, entra_join, client/entradevice). Production go build is clean under CGO_ENABLED=0 across management + client + tools.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
management/server/entra_device_enroll.go (1)
63-70:⚠️ Potential issue | 🟠 MajorOrphan peer leak when
GetValidatedPeerWithMap(or any post-persist step) fails.By the time you reach lines 63–70 the peer is already in the DB (line 165–167), the activity event has been emitted (line 219), and the caller has no way to know the peer was created —
EnrollEntraDevicePeerreturns(nil, err), so theentra_device.Managercan't invokeDeletePeerfor compensation (it only does so on bootstrap-token failure, and only with aPeerIDfrom a successfulEnrollPeerResult). A subsequent retry from the device will hit the "peer has already been registered" guard invalidateEnrollInput(line 89–91) and fail permanently until an operator intervenes.Since the comment at lines 297–302 already asserts that "the client will resync these on first Sync", the simplest fix is to make
GetValidatedPeerWithMapbest-effort — log and return the result with an emptyPeerConfig/Checks. Alternatively, defer a compensation call on named error.🛡️ Proposed fix — make network-map lookup best-effort so enrolment doesn't leak an orphan peer
- _, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer) - if err != nil { - return nil, fmt.Errorf("failed to build network map for entra peer: %w", err) - } - - return &ed.EnrollPeerResult{ - PeerID: newPeer.ID, - NetbirdConfig: netbirdConfigToMap(am, settings), - PeerConfig: peerConfigToMap(newPeer, netMap), - Checks: checksToMaps(checks), - }, nil + var netMap *types.NetworkMap + var checks any + if _, nm, cks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer); err != nil { + // Peer is already persisted; failing here would leak an orphan with no compensation. + // The client will resync on first gRPC Sync, so fall through with a minimal response. + log.WithContext(ctx).Warnf("failed to build network map for entra peer %s (client will resync): %v", newPeer.ID, err) + } else { + netMap, checks = nm, cks + } + + return &ed.EnrollPeerResult{ + PeerID: newPeer.ID, + NetbirdConfig: netbirdConfigToMap(am, settings), + PeerConfig: peerConfigToMap(newPeer, netMap), + Checks: checksToMaps(checks), + }, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/entra_device_enroll.go` around lines 63 - 70, The call to GetValidatedPeerWithMap inside EnrollEntraDevicePeer can fail after the peer was already persisted/announced, leaking an orphan peer; change the post-persist behavior to be best-effort: instead of returning an error when GetValidatedPeerWithMap(..., newPeer) fails, log the error (include context/newPeer.ID) and continue by using an empty PeerConfig/empty checks and nil/empty netMap so the successful peer creation is still returned; update the error branch that currently does "return nil, fmt.Errorf(...)" to log and set defaults so EnrollEntraDevicePeer returns the created peer result rather than an error.
🧹 Nitpick comments (4)
client/cmd/entra_enroll.go (3)
297-313: Usepath/filepath.Dirinstead of a hand-rolledfilepathDir.The helper duplicates stdlib behavior and silently drops the portability guarantees
filepath.Dirprovides (trailing-separator handling, cleaning, platform-aware separator). It also triggers the stdlib-reuse lint rule. Swap to the standard library and delete the helper.♻️ Suggested refactor
import ( "context" "fmt" "net/url" "os" + "path/filepath" "strings" "time"- if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil { + if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil { return nil, fmt.Errorf("mkdir %s: %w", configPath, err) }-func filepathDir(p string) string { - for i := len(p) - 1; i >= 0; i-- { - if p[i] == '\\' || p[i] == '/' { - return p[:i] - } - } - return "." -}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 297 - 313, Replace the custom filepathDir helper with the standard library path/filepath.Dir: call filepath.Dir(configPath) where filepathDir(configPath) is used (in the os.MkdirAll call) and remove the filepathDir function entirely; ensure you import "path/filepath" (or add it to existing imports) so the platform-aware, cleaned directory behavior replaces the hand-rolled logic.
196-204: Prefererrors.Asover a direct type assertion for structured server errors.
err.(*entradevice.Error)only succeeds whenEnrolreturns the sentinel unwrapped. Ifentradevice.Enroller.Enrolever wraps its structured error withfmt.Errorf("...: %w", err)(a common refactor), this branch silently falls through to the generic message and the stableCodestops being surfaced.errors.Asis the idiomatic guard that survives wrapping.♻️ Suggested refactor
- state, err := en.Enrol(ctx) - if err != nil { - if structured, ok := err.(*entradevice.Error); ok { - cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n %s\n", - structured.Code, structured.HTTPStatus, structured.Message) - return nil, fmt.Errorf("enrolment failed: %s", structured.Code) - } - return nil, fmt.Errorf("enrolment failed: %w", err) - } + state, err := en.Enrol(ctx) + if err != nil { + var structured *entradevice.Error + if errors.As(err, &structured) { + cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n %s\n", + structured.Code, structured.HTTPStatus, structured.Message) + return nil, fmt.Errorf("enrolment failed: %s", structured.Code) + } + return nil, fmt.Errorf("enrolment failed: %w", err) + }Don't forget to add
"errors"to the import block.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` around lines 196 - 204, Replace the direct type assertion err.(*entradevice.Error) with errors.As to reliably detect a structured entradevice.Error even if it’s wrapped; in the Enrol error handling after calling en.Enrol(ctx) use errors.As(&structured) to extract the structured error, then call cmd.PrintErrf with structured.Code, structured.HTTPStatus and structured.Message and return fmt.Errorf("enrolment failed: %s", structured.Code) as before; ensure the "errors" package is added to the import block so errors.As is available.
279-279: Nit: parametermanagementURLshadows the package-level CLI flag variable of the same name.Both
managementURL(root flag) and this parameter refer to the same value here, so behavior is unaffected, but the shadow makes the function's provenance ambiguous for later readers/refactors. Consider renaming the parameter (e.g.,mgmtURL) for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/cmd/entra_enroll.go` at line 279, The parameter name managementURL in function directLoadOrCreateProfileConfig shadows the package-level CLI flag of the same name; rename the parameter to a non-shadowing name (e.g., mgmtURL) in directLoadOrCreateProfileConfig and update all callers to pass the same argument name, keeping behavior unchanged but removing the ambiguous shadowing; ensure references inside the function use the new parameter name and run tests/build to confirm no symbol breakage.management/server/entra_device_enroll.go (1)
323-328:checksToMapsalways returns nil — consider simplifying the call site.The function is a documented stub; the
checks anyparameter is deliberately discarded. Since it's guaranteed to returnniltoday, you could drop thecheckscapture inEnrollEntraDevicePeer(line 67) and setChecks: nildirectly until posture checks actually get surfaced here. This removes a small amount of dead plumbing without changing behavior.Non-blocking; safe to defer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@management/server/entra_device_enroll.go` around lines 323 - 328, The helper checksToMaps always returns nil and discards its input; update the call site in EnrollEntraDevicePeer to stop passing an unused value and set the Checks field explicitly to nil instead of calling checksToMaps, removing the dead plumbing (leave checksToMaps as a stub if you want to keep it for future use). Locate the usage of checksToMaps in EnrollEntraDevicePeer and replace that expression with nil for the Checks field, and remove any now-unused local variable that was only used for that call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/cmd/entra_enroll.go`:
- Around line 156-163: The code prints "Pass --force to re-enrol." even when the
user provided --force; move that hint inside the non-force branch. In the block
that checks an existing enrollment (using profileName and cfg.EntraEnroll.*),
only call cmd.Println("Pass --force to re-enrol.") when entraForce is false (if
!entraForce { cmd.Println(...); return true, nil }), otherwise skip the hint and
proceed to the existing return (return false, nil).
- Around line 114-116: The error message in preflightEntraEnroll incorrectly
claims managementURL must end with "/join/entra" but no suffix check exists;
update preflightEntraEnroll to validate that managementURL is non-empty and
strings.HasSuffix(managementURL, "/join/entra") (using the managementURL
variable and importing strings if needed) and return
fmt.Errorf("--management-url is required and must end with /join/entra") when
the suffix check fails; alternatively, if you prefer not to enforce the suffix,
remove the suffix text from the error message so it only requires non-empty
managementURL.
In `@management/Dockerfile.entra-test`:
- Around line 66-67: The Dockerfile currently hardcodes ENTRYPOINT
["/usr/local/bin/netbird-mgmt", "management"] while the compose override in
tools/entra-test/docker-compose.yml also supplies "management", causing the
subcommand to be duplicated; to fix, choose one contract—preferred: remove the
"management" token from the Dockerfile ENTRYPOINT so ENTRYPOINT becomes just
["/usr/local/bin/netbird-mgmt"] and let CMD/compose (the command: list) supply
the full argv, or alternatively remove the leading "management" from the compose
command if you prefer ENTRYPOINT to pin the subcommand; update the Dockerfile
ENTRYPOINT or the compose command accordingly and verify startup args passed to
the mgmt binary are not duplicated.
- Line 32: The Dockerfile has an inline comment on the COPY instruction and a
duplicated positional argument in ENTRYPOINT: remove the inline "# NOSONAR" from
the COPY . . line and place it on its own preceding comment line so the COPY has
only its two arguments; then resolve the duplicated "management" argument by
either removing the positional "management" from the Dockerfile ENTRYPOINT (so
compose/command supplies it) or removing it from the compose command (so
ENTRYPOINT provides it) — target the COPY line and the ENTRYPOINT declaration to
implement one consistent source of the positional argument.
In `@management/server/entra_device_enroll.go`:
- Around line 250-255: The compensation hook currently swallows every error from
tx.GetPeerByID in ExecuteInTransaction; change it to only ignore the specific
"not found" case and return all other errors so real DB failures surface. Locate
the call to tx.GetPeerByID (inside ExecuteInTransaction in
entra_device_enroll.go) and replace the unconditional return nil with a
conditional that checks the NotFound sentinel/utility used elsewhere in the repo
(the same pattern as management/internals/modules/peers/manager.go:113 — e.g.,
store.IsNotFound(err) or errors.Is(err, store.ErrNotFound)); if the error is
NotFound, return nil, otherwise return the original err (and remove the nolint
comment).
---
Duplicate comments:
In `@management/server/entra_device_enroll.go`:
- Around line 63-70: The call to GetValidatedPeerWithMap inside
EnrollEntraDevicePeer can fail after the peer was already persisted/announced,
leaking an orphan peer; change the post-persist behavior to be best-effort:
instead of returning an error when GetValidatedPeerWithMap(..., newPeer) fails,
log the error (include context/newPeer.ID) and continue by using an empty
PeerConfig/empty checks and nil/empty netMap so the successful peer creation is
still returned; update the error branch that currently does "return nil,
fmt.Errorf(...)" to log and set defaults so EnrollEntraDevicePeer returns the
created peer result rather than an error.
---
Nitpick comments:
In `@client/cmd/entra_enroll.go`:
- Around line 297-313: Replace the custom filepathDir helper with the standard
library path/filepath.Dir: call filepath.Dir(configPath) where
filepathDir(configPath) is used (in the os.MkdirAll call) and remove the
filepathDir function entirely; ensure you import "path/filepath" (or add it to
existing imports) so the platform-aware, cleaned directory behavior replaces the
hand-rolled logic.
- Around line 196-204: Replace the direct type assertion
err.(*entradevice.Error) with errors.As to reliably detect a structured
entradevice.Error even if it’s wrapped; in the Enrol error handling after
calling en.Enrol(ctx) use errors.As(&structured) to extract the structured
error, then call cmd.PrintErrf with structured.Code, structured.HTTPStatus and
structured.Message and return fmt.Errorf("enrolment failed: %s",
structured.Code) as before; ensure the "errors" package is added to the import
block so errors.As is available.
- Line 279: The parameter name managementURL in function
directLoadOrCreateProfileConfig shadows the package-level CLI flag of the same
name; rename the parameter to a non-shadowing name (e.g., mgmtURL) in
directLoadOrCreateProfileConfig and update all callers to pass the same argument
name, keeping behavior unchanged but removing the ambiguous shadowing; ensure
references inside the function use the new parameter name and run tests/build to
confirm no symbol breakage.
In `@management/server/entra_device_enroll.go`:
- Around line 323-328: The helper checksToMaps always returns nil and discards
its input; update the call site in EnrollEntraDevicePeer to stop passing an
unused value and set the Checks field explicitly to nil instead of calling
checksToMaps, removing the dead plumbing (leave checksToMaps as a stub if you
want to keep it for future use). Locate the usage of checksToMaps in
EnrollEntraDevicePeer and replace that expression with nil for the Checks field,
and remove any now-unused local variable that was only used for that call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cfdc177d-e899-4584-8451-647219381fba
📒 Files selected for processing (7)
client/cmd/entra_enroll.goclient/internal/profilemanager/config.goclient/internal/profilemanager/config_entra.gomanagement/Dockerfile.entra-testmanagement/server/entra_device_enroll.gomanagement/server/integrations/entra_device/cert_validator.gotools/entra-test/enroll-tester/main.go
✅ Files skipped from review due to trivial changes (3)
- client/internal/profilemanager/config_entra.go
- tools/entra-test/enroll-tester/main.go
- management/server/integrations/entra_device/cert_validator.go
| if managementURL == "" { | ||
| return "", fmt.Errorf("--management-url is required (and must end with /join/entra)") | ||
| } |
There was a problem hiding this comment.
Message claims a suffix check that isn't performed.
The error says --management-url ... must end with /join/entra, but preflightEntraEnroll never validates the suffix. A user passing --management-url https://mgmt.example.com will bypass this check entirely and only hit a later opaque HTTP error. Either actually enforce the suffix or soften the message.
🛠️ Suggested fix
if managementURL == "" {
return "", fmt.Errorf("--management-url is required (and must end with /join/entra)")
}
+ if !strings.HasSuffix(strings.TrimRight(managementURL, "/"), entradevice.EnrolmentPathSuffix) {
+ return "", fmt.Errorf("--management-url %q must end with %s", managementURL, entradevice.EnrolmentPathSuffix)
+ }
return resolvePFXPassword()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if managementURL == "" { | |
| return "", fmt.Errorf("--management-url is required (and must end with /join/entra)") | |
| } | |
| if managementURL == "" { | |
| return "", fmt.Errorf("--management-url is required (and must end with /join/entra)") | |
| } | |
| if !strings.HasSuffix(strings.TrimRight(managementURL, "/"), entradevice.EnrolmentPathSuffix) { | |
| return "", fmt.Errorf("--management-url %q must end with %s", managementURL, entradevice.EnrolmentPathSuffix) | |
| } | |
| return resolvePFXPassword() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@client/cmd/entra_enroll.go` around lines 114 - 116, The error message in
preflightEntraEnroll incorrectly claims managementURL must end with
"/join/entra" but no suffix check exists; update preflightEntraEnroll to
validate that managementURL is non-empty and strings.HasSuffix(managementURL,
"/join/entra") (using the managementURL variable and importing strings if
needed) and return fmt.Errorf("--management-url is required and must end with
/join/entra") when the suffix check fails; alternatively, if you prefer not to
enforce the suffix, remove the suffix text from the error message so it only
requires non-empty managementURL.
| cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n", | ||
| profileName, cfg.EntraEnroll.PeerID, | ||
| cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339)) | ||
| cmd.Println("Pass --force to re-enrol.") | ||
| if !entraForce { | ||
| return true, nil | ||
| } | ||
| return false, nil |
There was a problem hiding this comment.
"Pass --force to re-enrol." is printed even when --force is supplied.
When the user already passed --force, the hint is misleading — we tell them to pass a flag they just passed, then proceed anyway. Gate the hint behind the non-force branch.
🛠️ Suggested fix
cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n",
profileName, cfg.EntraEnroll.PeerID,
cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339))
- cmd.Println("Pass --force to re-enrol.")
if !entraForce {
+ cmd.Println("Pass --force to re-enrol.")
return true, nil
}
+ cmd.Println("--force supplied; re-enrolling.")
return false, nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n", | |
| profileName, cfg.EntraEnroll.PeerID, | |
| cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339)) | |
| cmd.Println("Pass --force to re-enrol.") | |
| if !entraForce { | |
| return true, nil | |
| } | |
| return false, nil | |
| cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n", | |
| profileName, cfg.EntraEnroll.PeerID, | |
| cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339)) | |
| if !entraForce { | |
| cmd.Println("Pass --force to re-enrol.") | |
| return true, nil | |
| } | |
| cmd.Println("--force supplied; re-enrolling.") | |
| return false, nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@client/cmd/entra_enroll.go` around lines 156 - 163, The code prints "Pass
--force to re-enrol." even when the user provided --force; move that hint inside
the non-force branch. In the block that checks an existing enrollment (using
profileName and cfg.EntraEnroll.*), only call cmd.Println("Pass --force to
re-enrol.") when entraForce is false (if !entraForce { cmd.Println(...); return
true, nil }), otherwise skip the hint and proceed to the existing return (return
false, nil).
| # image; no secrets are expected to live in the source tree and there is no | ||
| # production layer that ships this COPY — see management/Dockerfile for the | ||
| # release build path used by goreleaser. | ||
| COPY . . # NOSONAR |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does the Docker BuildKit dockerfile:1.7 frontend support inline # comments on instruction lines, or only full-line comments?
💡 Result:
No, the Docker BuildKit dockerfile:1.7 frontend supports only full-line comments (lines starting with # after whitespace), not inline comments on instruction lines. Inline # markers are treated as part of the instruction arguments, causing parse errors or incorrect behavior, such as in JSON array instructions like COPY or RUN where they invalidate JSON or become invalid patterns. This aligns with the longstanding Dockerfile syntax specification: "# must be the first non-whitespace character on the line to be a comment." Older Docker versions sometimes tolerated inline comments due to bugs (e.g., ignoring trailing bytes after JSON), but recent BuildKit fixes enforce strict parsing, breaking previously working invalid Dockerfiles. The dockerfile:1.7 release notes and parser source confirm no changes to comment handling; new features focus on variable expansions and context handling, not comments.
Citations:
- 1: Older Docker accepted inline comments in some places in spite of documentation otherwise, leading to broken builds now that it's fixed, and opaque error messaging interferes with troubleshooting and repair. moby/buildkit#5625
- 2: https://thelinuxcode.com/how-to-comment-in-a-dockerfile-syntax-rules-gotchas-and-practical-patterns/
- 3: https://github.com/moby/buildkit/releases/tag/dockerfile%2F1.7.0
- 4: Parser in docker 1.13 breaks backwards compatibility for comments in middle of RUN statement moby/moby#29005
- 5: Comments in Dockerfile doesn't quite work. moby/moby#2209
- 6: inline comments end up in ENV in Dockerfile moby/moby#9533
- 7: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/parser/parser.go
🏁 Script executed:
# Locate the Dockerfile
find . -name "Dockerfile.entra-test" -type fRepository: netbirdio/netbird
Length of output: 96
🏁 Script executed:
# Read the problematic file
if [ -f management/Dockerfile.entra-test ]; then
echo "=== File found, examining key lines ===";
echo "--- Line 32 context (lines 28-36) ---";
sed -n '28,36p' management/Dockerfile.entra-test | cat -n;
echo "--- Line 66-67 context (lines 62-70) ---";
sed -n '62,70p' management/Dockerfile.entra-test | cat -n;
else
echo "File not found";
fiRepository: netbirdio/netbird
Length of output: 972
🏁 Script executed:
# Check if there's a docker-compose file that references this Dockerfile
find . -name "docker-compose*" -type f | head -10Repository: netbirdio/netbird
Length of output: 252
🏁 Script executed:
# Check the docker-compose that might reference the Dockerfile.entra-test
cat ./tools/entra-test/docker-compose.ymlRepository: netbirdio/netbird
Length of output: 2588
Inline # NOSONAR on the COPY line violates Dockerfile syntax — remove it.
Docker only recognizes # as a comment when it appears at the start of a line (after optional whitespace). Inline comments are not supported; COPY . . # NOSONAR is parsed as a 4-argument command with sources ., ., #, and destination NOSONAR, which is invalid. Move the suppression to its own comment line above the instruction.
Additionally, the ENTRYPOINT already includes "management" as a positional argument (line 66). When using the docker-compose override (tools/entra-test/docker-compose.yml), the command also starts with "management", causing the process to receive management management --log-file .... Either the ENTRYPOINT should omit the positional argument and rely on CMD/compose override, or the compose command should not repeat it.
🛠️ Suggested fix for line 32
# Copy the entire repo into the build stage. This is a dev-only harness
# image; no secrets are expected to live in the source tree and there is no
# production layer that ships this COPY — see management/Dockerfile for the
# release build path used by goreleaser.
+# NOSONAR
-COPY . . # NOSONAR
+COPY . .🧰 Tools
🪛 Hadolint (2.14.0)
[error] 32-32: COPY with more than 2 arguments requires the last argument to end with /
(DL3021)
🪛 Trivy (0.69.3)
[error] 32-32: COPY with more than two arguments not ending with slash
Slash is expected at the end of COPY command argument 'NOSONAR'
Rule: DS-0011
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/Dockerfile.entra-test` at line 32, The Dockerfile has an inline
comment on the COPY instruction and a duplicated positional argument in
ENTRYPOINT: remove the inline "# NOSONAR" from the COPY . . line and place it on
its own preceding comment line so the COPY has only its two arguments; then
resolve the duplicated "management" argument by either removing the positional
"management" from the Dockerfile ENTRYPOINT (so compose/command supplies it) or
removing it from the compose command (so ENTRYPOINT provides it) — target the
COPY line and the ENTRYPOINT declaration to implement one consistent source of
the positional argument.
| ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"] | ||
| CMD ["--log-file", "console", "--log-level", "info"] |
There was a problem hiding this comment.
ENTRYPOINT + compose command double-specifies the management subcommand.
ENTRYPOINT already appends management (line 66), and tools/entra-test/docker-compose.yml overrides CMD with a list that also starts with "management". The process therefore gets invoked as:
/usr/local/bin/netbird-mgmt management management --log-file console --log-level debug --port 33073 --config ...
Cobra today may silently accept the extra positional arg (which is likely why e2e "works"), but this is fragile — the moment mgmtCmd gets an Args: cobra.NoArgs/ExactArgs validator, or someone copies the compose command expecting it to be self-contained, things break.
Pick one side of the contract:
- Option A (preferred — keep compose self-describing): drop
managementfrom theENTRYPOINTsoCMD/composecommandowns the full argv.-ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"] -CMD ["--log-file", "console", "--log-level", "info"] +ENTRYPOINT ["/usr/local/bin/netbird-mgmt"] +CMD ["management", "--log-file", "console", "--log-level", "info"]
- Option B: keep the
ENTRYPOINTas-is and remove the leading"management"fromtools/entra-test/docker-compose.yml'scommand:list.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"] | |
| CMD ["--log-file", "console", "--log-level", "info"] | |
| ENTRYPOINT ["/usr/local/bin/netbird-mgmt"] | |
| CMD ["management", "--log-file", "console", "--log-level", "info"] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/Dockerfile.entra-test` around lines 66 - 67, The Dockerfile
currently hardcodes ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"]
while the compose override in tools/entra-test/docker-compose.yml also supplies
"management", causing the subcommand to be duplicated; to fix, choose one
contract—preferred: remove the "management" token from the Dockerfile ENTRYPOINT
so ENTRYPOINT becomes just ["/usr/local/bin/netbird-mgmt"] and let CMD/compose
(the command: list) supply the full argv, or alternatively remove the leading
"management" from the compose command if you prefer ENTRYPOINT to pin the
subcommand; update the Dockerfile ENTRYPOINT or the compose command accordingly
and verify startup args passed to the mgmt binary are not duplicated.
| return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error { | ||
| peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID) | ||
| if err != nil { | ||
| // Peer already gone is not an error; we're compensating. | ||
| return nil //nolint:nilerr // quiet on "already gone" | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how other Store methods classify not-found errors for GetPeerByID specifically,
# to pick the right predicate (status.NotFound type, errors.Is(gorm.ErrRecordNotFound), etc.)
rg -nP --type=go -C3 'GetPeerByID\s*\(' -g '!**/*_test.go' | head -80
echo "---"
# How other compensation / cleanup sites in the repo distinguish "already gone":
rg -nP --type=go -C2 'status\.NotFound|ErrRecordNotFound' -g '!**/*_test.go' | head -40Repository: netbirdio/netbird
Length of output: 8755
🏁 Script executed:
#!/bin/bash
# Get the full GetPeerByID implementation in sql_store.go
sed -n '3263,3280p' management/server/store/sql_store.go
echo "---"
# Get the full DeletePeer function in entra_device_enroll.go
sed -n '242,261p' management/server/entra_device_enroll.goRepository: netbirdio/netbird
Length of output: 1552
🏁 Script executed:
#!/bin/bash
# Search for other compensation/cleanup patterns that check for NotFound errors
rg -nP --type=go 'ExecuteInTransaction.*error' -A5 | grep -A5 'GetPeerByID\|GetAccountByID\|GetDomain' | head -60
echo "---"
# See what error predicates are used with GetPeerByID specifically
rg -nP --type=go 'GetPeerByID.*if err' -A2 | head -40Repository: netbirdio/netbird
Length of output: 1967
🏁 Script executed:
#!/bin/bash
# Verify the status package import and NotFound type usage in entra_device_enroll.go
head -30 management/server/entra_device_enroll.go | grep -E '^import|status'
echo "---"
# Double-check that status.NotFound is the right constant
rg -nP --type=go 'status\.NotFound' -B2 -A2 | head -20Repository: netbirdio/netbird
Length of output: 1536
🏁 Script executed:
#!/bin/bash
# Verify fmt package is imported in entra_device_enroll.go (needed for the fix)
grep -E '^import|"fmt"' management/server/entra_device_enroll.go | head -20Repository: netbirdio/netbird
Length of output: 77
Narrow error suppression to NotFound only; propagate real DB failures.
The compensation hook swallows all GetPeerByID errors, including transient DB failures, timeouts, and context cancellation. This masks real failures and leaves orphaned peers in the database. The comment correctly identifies the intent ("peer already gone"), but the implementation doesn't verify it.
Use the pattern already established elsewhere in the codebase (e.g., management/internals/modules/peers/manager.go:113): check if the error is specifically a NotFound type, and propagate anything else.
🛡️ Proposed fix
peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID)
if err != nil {
- // Peer already gone is not an error; we're compensating.
- return nil //nolint:nilerr // quiet on "already gone"
+ if e, ok := status.FromError(err); ok && e.Type() == status.NotFound {
+ return nil // peer already gone — compensation is a no-op
+ }
+ return fmt.Errorf("lookup peer for entra compensation: %w", err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error { | |
| peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID) | |
| if err != nil { | |
| // Peer already gone is not an error; we're compensating. | |
| return nil //nolint:nilerr // quiet on "already gone" | |
| } | |
| return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error { | |
| peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID) | |
| if err != nil { | |
| if e, ok := status.FromError(err); ok && e.Type() == status.NotFound { | |
| return nil // peer already gone — compensation is a no-op | |
| } | |
| return fmt.Errorf("lookup peer for entra compensation: %w", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@management/server/entra_device_enroll.go` around lines 250 - 255, The
compensation hook currently swallows every error from tx.GetPeerByID in
ExecuteInTransaction; change it to only ignore the specific "not found" case and
return all other errors so real DB failures surface. Locate the call to
tx.GetPeerByID (inside ExecuteInTransaction in entra_device_enroll.go) and
replace the unconditional return nil with a conditional that checks the NotFound
sentinel/utility used elsewhere in the repo (the same pattern as
management/internals/modules/peers/manager.go:113 — e.g., store.IsNotFound(err)
or errors.Is(err, store.ErrNotFound)); if the error is NotFound, return nil,
otherwise return the original err (and remove the nolint comment).
Round out docs/ENTRA_DEVICE_AUTH.md with the operator-facing content
that was missing, refresh stale status text, and drop a lingering
reference to an internal design artifact that should never have
survived scrubbing.
* Refreshed TL;DR header to reflect final state: server + client
(PFX provider) complete, live-tested against a real Entra tenant,
with explicit cross-links to Known production gaps and the new
Production readiness checklist.
* Added a "Deploying device certificates via Intune (PKCS Certificate
profile)" section so the mechanism for getting the cert onto the
device is documented end-to-end (subject name CN={{AAD_Device_ID}},
TPM-preferred KSP, key usage, extended key usage, CA trust path to
plumb into CertValidator.TrustRoots, assignment).
* Added a "Troubleshooting" section with a per-error-code matrix
(most likely cause + where to look) + client-side and server-side
diagnostic tips (--log-level debug, enroll-tester, activity events).
* Added a "Production readiness checklist" so operators can sign off
on Entra app perms, client_secret handling, Intune PKCS profile,
TrustRoots wiring, mapping_resolution choice, trusted-proxy policy,
HA single-node caveat, activity-log sink, and alerting.
* Expanded "Security notes" to list the concrete mitigations the
code implements today (OData escape, fail-closed pagination, empty
CN rejection, opt-in XFF, 512 KiB body cap with 413, bootstrap-token
validate-before-delete, atomic rejection paths).
* Refreshed "Current implementation status" table: Graph client is no
longer "not yet run against a live tenant"; Client PFX provider +
CLI row added; three explicit follow-up rows for the known gaps.
* Dropped the stale "See the Open design decisions in the plan
document" sentence (superseded by the Known production gaps
section) and the "see the original design doc artifact
(create_plan)" reference in Further reading.
No code changes. All affected unit tests still green; build clean.
Drives a real httptest server with both the admin CRUD handler and the device-facing /join/entra handler wired onto the same router, then walks through: 1. Admin PUTs the integration config (tenant/client/secret, mapping_resolution, revalidation_interval). 2. Admin POSTs a mapping (Entra group -> NetBird auto-groups). 3. Device GETs /join/entra/challenge to obtain a one-shot nonce. 4. Device signs the nonce with a self-signed RSA leaf cert (CN = device GUID, RSA-PSS / SHA-256 to match the production CertValidator) and POSTs /join/entra/enroll. 5. Server resolves the mapping, the fake PeerEnroller records the call, the device receives a peer config + bootstrap token. 6. Replaying the same nonce produces a 4xx (single-use semantics). 7. Admin reads back integration (secret masked as ********), lists mappings, updates and deletes the mapping (404 on subsequent GET). A second test confirms that flipping Enabled=false on the integration causes /join/entra/enroll to fail with CodeIntegrationDisabled while leaving the row in place. Microsoft Graph and the AccountManager are stubbed (external dependencies that can't be exercised without a live tenant); every other layer runs the production code path.
Ready for final review 🙏This PR has now been through three substantive rounds:
Microsoft Graph and Test resultsCompanion dashboard PR
PR is mergeable; |
Two issues blocked the gate on PR netbirdio#5977; both are fixed by this commit. 1. management/server/http/handler.go: NewAPIHandler had grown to 23 parameters (go:S107) and had cognitive complexity 24 (go:S3776). Refactored to take a single APIHandlerOptions struct; extracted addBypassPaths(), buildRateLimiterFromEnv(), parseIntEnv(), and registerCoreEndpoints() so the function body is now mostly straight-line wiring. Updated the two existing call sites (boot.go + the channel test harness) to pass APIHandlerOptions. No behavioural change. 2. management/Dockerfile.entra-test: replaced the recursive 'COPY . .' (docker:S6470 hotspot) with explicit per-directory COPYs for each top-level Go source dir that 'go build ./management' actually needs. The previous NOSONAR comment did not suppress the hotspot because Sonar reads each '.' token in 'COPY . .' as a separate finding.
|



Entra / Intune Device Authentication — a third peer-registration method
What this PR adds
A new way for devices to join a NetBird network, alongside setup keys and SSO: the device itself authenticates using an Entra-issued certificate, with zero user interaction. Admins map Entra security groups to NetBird auto-groups; the device ends up in the right NetBird groups based on its Entra membership.
The feature is served on a dedicated endpoint path (
/join/entra) so it never mixes with the existing gRPCLogin/Syncsurface or the admin/apisurface.Deployment story
Design in one page
EntraDeviceAuthintegration (tenant id, app credentials, compliance requirements) andEntraDeviceAuthMappingrows that pair Entra security group IDs with NetBird auto-groups.GET /join/entra/challenge→ server issues a single-use 32-byte nonce (60s TTL)./join/entra/enrollwith cert chain + signed nonce + WG pubkey.accountEnabled == true; optionallycomplianceState == compliantvia Intune.strict_priorityorunionmode, with wildcard + tenant-only fallback options).PeerAddedWithEntraDeviceactivity event with full audit meta (device id, matched mapping ids, resolution mode, applied auto-groups).LoginResponse-shaped JSON + a one-shot bootstrap token. The client persists state and proceeds with normal gRPC Login using the WG pubkey the server already knows about.What's landed
Server
management/server/types/entra_device_auth.go— domain model (EntraDeviceAuth,EntraDeviceAuthMapping,MappingResolution).management/server/integrations/entra_device/— cert validator (RSA/ECDSA), Graph client (OAuth2 client-credentials, device lookup, transitiveMemberOf, compliance), single-use nonce store,strict_priority+unionmapping resolver, gorm-backed Store, manager orchestrator, activity codes.management/server/http/handlers/entra_join/— public/join/entra/{challenge,enroll}routes.management/server/http/handlers/entra_device_auth/— admin CRUD on/api/integrations/entra-device-auth{,/mappings}.management/server/entra_device_enroll.go—DefaultAccountManager.EnrollEntraDevicePeer(reuses IP allocation, retries, group assignment, integrated peer validator, activity events).management/server/permissions/modules/module.go— newEntraDeviceAuthpermission module.idp/dex/{config,provider}.go+sqlite_{cgo,nocgo}.go— bonus fix: split Dex SQLite3 construction behind a CGO build tag so builds work underCGO_ENABLED=0(was a pre-existing upstream break).Client
client/internal/enroll/entradevice/— pluggableCertProviderinterface + PFX-backed implementation + challenge/enroll orchestrator + structured server-error decoding.client/cmd/entra_enroll.go—netbird entra-enrollsubcommand with full CLI.client/internal/profilemanager/config.go—EntraEnroll *EntraEnrollStatefield persisted per profile so re-enrolment is skipped on subsequent runs.Test harness
tools/entra-test/docker-compose.yml+Dockerfile.entra-test— full local deployment (Postgres + management, auto-migrated tables).tools/entra-test/enroll-tester/— synthetic client with--demomode (in-process handler + fake Graph) for zero-dependency E2E verification.tools/entra-test/seed-account/— creates a minimal NetBird account row for testing without an IdP.tools/entra-test/make-pfx/— generates test PFX files.tools/entra-test/TESTING.md— step-by-step walkthrough.Docs
docs/ENTRA_DEVICE_AUTH.md— user/admin-facing documentation: concepts, resolution semantics, error codes, REST API reference, enrolment protocol reference, security notes, live-tenant verification matrix, future work.Testing
Unit tests (all green)
management/server/integrations/entra_device/— 20+ tests covering mapping resolution (priority, union, tie-break, revoked/expired filtering, wildcard, fallback), nonce store (single-use, TTL, concurrent), cert validator (RSA-PSS, PKCS1v15, ECDSA, tampered sig, expired, garbage), manager (happy path, union merge, unknown tenant, disabled integration, bad nonce, disabled device, missing device, fail-closed Graph errors, compliance pass/fail/API-failure, no mapping matched, device-id mismatch, nonce single-use), store (CRUD + bootstrap token lifecycle).management/server/http/handlers/entra_join/— 4 tests with full HTTP round-trip (real cert + signed nonce, error code mapping, malformed JSON).client/internal/enroll/entradevice/— 7 tests covering PFX load (happy, wrong password, missing file), enroller round-trip, structured server error decode, input validation, trailing/join/entratolerance.Live-tenant verification
Run end-to-end against a real Entra tenant using the Docker harness:
403 no_mapping_matched403 device_disabled403 device_not_compliantEvery reject path is atomic — zero rows written on any 4xx/5xx outcome. See
docs/ENTRA_DEVICE_AUTH.md#live-tenant-verification-resultsfor the full matrix.End-to-end Phase 2 client run
Built
netbird.exefrom this branch; ran:→
ENROLMENT SUCCESS. Peer appeared in Postgres (peers+group_peersrows),/join/entrastripped from the savedManagementURL,EntraEnrollstate persisted to the profile config file.Explicit non-goals / follow-ups
CertProviderinterface is shaped so it drops in next toPFXProviderwithout enroller changes. Two routes documented: CGO+smimesign vs pure-Goncrypt.dllsyscalls. Seedocs/ENTRA_DEVICE_AUTH.md#future-work.enrollmentBootstrapTokenproto field onLoginRequest.Manager.ValidateBootstrapTokenis implemented + tested but not yet invoked from the gRPC Login path (requires a proto regen). Post-enrolment Login currently works because the WG pubkey already identifies the peer; the bootstrap token just adds belt-and-braces for a narrow race window.revalidation_intervalexists on the integration; no background scheduler yet.netbird up. Todayentra-enrollis a separate subcommand. A follow-up can detect/join/entrain the URL passed toupand run the enrolment automatically.netbirdio/dashboardis a separate PR.Compatibility + safety
management/server/http/handler.go— adds two bypass paths (/join/entra/*) + oneinstallEntraDeviceAuthcall (best-effort — logs a warning and skips if interfaces don't match).management/server/permissions/modules/module.go— adds oneModuleconstant.client/internal/profilemanager/config.go— adds one nullableEntraEnrollfield.idp/dex/{config,provider}.go— route SQLite3 construction through a build-tagged shim.EntraDeviceAuthrow exists for an account, the/join/entraendpoints simply return404 integration_not_found. No observable behaviour change for existing deployments.Branch commits
~6,300 insertions, 43 files. Every line committed has been either unit-tested or live-tested.
Summary by CodeRabbit
New Features
Documentation
Tests & Tools