Skip to content

feat: Entra/Intune device authentication as a new peer-registration method#5977

Open
thvevirtue wants to merge 19 commits intonetbirdio:mainfrom
thvevirtue:feature/entra-device-auth
Open

feat: Entra/Intune device authentication as a new peer-registration method#5977
thvevirtue wants to merge 19 commits intonetbirdio:mainfrom
thvevirtue:feature/entra-device-auth

Conversation

@thvevirtue
Copy link
Copy Markdown

@thvevirtue thvevirtue commented Apr 24, 2026

Entra / Intune Device Authentication — a third peer-registration method

What this PR adds

A new way for devices to join a NetBird network, alongside setup keys and SSO: the device itself authenticates using an Entra-issued certificate, with zero user interaction. Admins map Entra security groups to NetBird auto-groups; the device ends up in the right NetBird groups based on its Entra membership.

The feature is served on a dedicated endpoint path (/join/entra) so it never mixes with the existing gRPC Login/Sync surface or the admin /api surface.

Deployment story

Intune PKCS Certificate profile → PFX on device
                ↓
netbird entra-enroll --management-url https://<mgmt>/join/entra \
                     --entra-tenant YOUR_TENANT \
                     --entra-pfx    device.pfx \
                     --entra-pfx-password-env NB_ENTRA_PFX_PASSWORD
                ↓
netbird up                  ← normal gRPC login, peer already registered

Design in one page

  1. Admin configures an EntraDeviceAuth integration (tenant id, app credentials, compliance requirements) and EntraDeviceAuthMapping rows that pair Entra security group IDs with NetBird auto-groups.
  2. Device calls GET /join/entra/challenge → server issues a single-use 32-byte nonce (60s TTL).
  3. Device signs the nonce with its device cert private key (RSA-PSS / PKCS1v15 / ECDSA all accepted).
  4. Device POSTs /join/entra/enroll with cert chain + signed nonce + WG pubkey.
  5. Server validates:
    • Cert chain + nonce signature (cryptographic proof of private-key possession).
    • Microsoft Graph lookup: device exists + accountEnabled == true; optionally complianceState == compliant via Intune.
    • Transitive group membership enumeration via Graph.
    • Mapping resolution (strict_priority or union mode, with wildcard + tenant-only fallback options).
  6. Peer is created in NetBird's DB, assigned to the resolved auto-groups + the All group, with proper IP allocation + DNS label + integrated peer validator hooks (approval workflow still applies).
  7. Server emits a PeerAddedWithEntraDevice activity event with full audit meta (device id, matched mapping ids, resolution mode, applied auto-groups).
  8. Server returns a LoginResponse-shaped JSON + a one-shot bootstrap token. The client persists state and proceeds with normal gRPC Login using the WG pubkey the server already knows about.

What's landed

Server

  • management/server/types/entra_device_auth.go — domain model (EntraDeviceAuth, EntraDeviceAuthMapping, MappingResolution).
  • management/server/integrations/entra_device/ — cert validator (RSA/ECDSA), Graph client (OAuth2 client-credentials, device lookup, transitiveMemberOf, compliance), single-use nonce store, strict_priority + union mapping resolver, gorm-backed Store, manager orchestrator, activity codes.
  • management/server/http/handlers/entra_join/ — public /join/entra/{challenge,enroll} routes.
  • management/server/http/handlers/entra_device_auth/ — admin CRUD on /api/integrations/entra-device-auth{,/mappings}.
  • management/server/entra_device_enroll.goDefaultAccountManager.EnrollEntraDevicePeer (reuses IP allocation, retries, group assignment, integrated peer validator, activity events).
  • management/server/permissions/modules/module.go — new EntraDeviceAuth permission module.
  • idp/dex/{config,provider}.go + sqlite_{cgo,nocgo}.gobonus fix: split Dex SQLite3 construction behind a CGO build tag so builds work under CGO_ENABLED=0 (was a pre-existing upstream break).

Client

  • client/internal/enroll/entradevice/ — pluggable CertProvider interface + PFX-backed implementation + challenge/enroll orchestrator + structured server-error decoding.
  • client/cmd/entra_enroll.gonetbird entra-enroll subcommand with full CLI.
  • client/internal/profilemanager/config.goEntraEnroll *EntraEnrollState field persisted per profile so re-enrolment is skipped on subsequent runs.

Test harness

  • tools/entra-test/docker-compose.yml + Dockerfile.entra-test — full local deployment (Postgres + management, auto-migrated tables).
  • tools/entra-test/enroll-tester/ — synthetic client with --demo mode (in-process handler + fake Graph) for zero-dependency E2E verification.
  • tools/entra-test/seed-account/ — creates a minimal NetBird account row for testing without an IdP.
  • tools/entra-test/make-pfx/ — generates test PFX files.
  • tools/entra-test/TESTING.md — step-by-step walkthrough.

Docs

  • docs/ENTRA_DEVICE_AUTH.md — user/admin-facing documentation: concepts, resolution semantics, error codes, REST API reference, enrolment protocol reference, security notes, live-tenant verification matrix, future work.

Testing

Unit tests (all green)

  • management/server/integrations/entra_device/20+ tests covering mapping resolution (priority, union, tie-break, revoked/expired filtering, wildcard, fallback), nonce store (single-use, TTL, concurrent), cert validator (RSA-PSS, PKCS1v15, ECDSA, tampered sig, expired, garbage), manager (happy path, union merge, unknown tenant, disabled integration, bad nonce, disabled device, missing device, fail-closed Graph errors, compliance pass/fail/API-failure, no mapping matched, device-id mismatch, nonce single-use), store (CRUD + bootstrap token lifecycle).
  • management/server/http/handlers/entra_join/4 tests with full HTTP round-trip (real cert + signed nonce, error code mapping, malformed JSON).
  • client/internal/enroll/entradevice/7 tests covering PFX load (happy, wrong password, missing file), enroller round-trip, structured server error decode, input validation, trailing /join/entra tolerance.

Live-tenant verification

Run end-to-end against a real Entra tenant using the Docker harness:

Scenario Input Expected Actual
Happy path, wildcard mapping real device, compliance off success, peer created
Happy path, specific-group mapping same real device success, peer created
Device not in mapped Entra group real device, non-matching mapping 403 no_mapping_matched
Device absent from Entra bogus device GUID 403 device_disabled
Compliance on, compliant device compliant device id success, peer created
Compliance on, non-compliant device non-compliant device id 403 device_not_compliant

Every reject path is atomic — zero rows written on any 4xx/5xx outcome. See docs/ENTRA_DEVICE_AUTH.md#live-tenant-verification-results for the full matrix.

End-to-end Phase 2 client run

Built netbird.exe from this branch; ran:

netbird entra-enroll --management-url http://localhost:33073/join/entra \
  --entra-tenant 5a7a81b2-99cc-45fc-b6d1-cd01ba176c26 \
  --entra-pfx device.pfx \
  --entra-pfx-password-env NB_ENTRA_PFX_PASSWORD

ENROLMENT SUCCESS. Peer appeared in Postgres (peers + group_peers rows), /join/entra stripped from the saved ManagementURL, EntraEnroll state persisted to the profile config file.

Explicit non-goals / follow-ups

  1. Windows cert store + TPM-backed CNG signing. Scoped, researched, deliberately deferred. CertProvider interface is shaped so it drops in next to PFXProvider without enroller changes. Two routes documented: CGO+smimesign vs pure-Go ncrypt.dll syscalls. See docs/ENTRA_DEVICE_AUTH.md#future-work.
  2. enrollmentBootstrapToken proto field on LoginRequest. Manager.ValidateBootstrapToken is implemented + tested but not yet invoked from the gRPC Login path (requires a proto regen). Post-enrolment Login currently works because the WG pubkey already identifies the peer; the bootstrap token just adds belt-and-braces for a narrow race window.
  3. Continuous revalidation (Phase 5). Reserved field revalidation_interval exists on the integration; no background scheduler yet.
  4. Auto-routing in netbird up. Today entra-enroll is a separate subcommand. A follow-up can detect /join/entra in the URL passed to up and run the enrolment automatically.
  5. Dashboard UI (Phase 4). Admin configuration happens via the REST API today. UI work in netbirdio/dashboard is a separate PR.

Compatibility + safety

  • Zero breaking changes. All new types live in new packages; the only existing file modifications are:
    • management/server/http/handler.go — adds two bypass paths (/join/entra/*) + one installEntraDeviceAuth call (best-effort — logs a warning and skips if interfaces don't match).
    • management/server/permissions/modules/module.go — adds one Module constant.
    • client/internal/profilemanager/config.go — adds one nullable EntraEnroll field.
    • idp/dex/{config,provider}.go — route SQLite3 construction through a build-tagged shim.
  • Feature-flagged by data. If no EntraDeviceAuth row exists for an account, the /join/entra endpoints simply return 404 integration_not_found. No observable behaviour change for existing deployments.
  • Server accepts CGO_ENABLED=0 builds. Everything new compiles on the standard pipeline without a C toolchain.

Branch commits

9f04a0c entra_device: address SonarCloud feedback (complexity + creds)
688239f docs: bump Entra device auth status header (Phase 2 shipped)
cb3c676 entra_device: defer Windows cert-store provider to a follow-up
d4228d8 client: netbird entra-enroll subcommand (Phase 2 complete)
d38984e client/entradevice: PFX CertProvider + Enroller + tests (Phase 2 core)
50c249d docs: record live-tenant verification matrix
9b27c0f entra-test: seed-account helper + expose Postgres port
17e718d entra-test: fix docker-compose flags + config for real bring-up
b47a05a docs: user-facing documentation for Entra device auth
783117e enroll-tester: add --demo mode for in-process end-to-end verification
e7667d5 entra_device: test harness (Dockerfile + compose + synthetic client) + dex CGO shim
8c921a7 entra_device: unit tests for enrolment flow + peer-registration integration
09c4bc2 entra_device: wire into AccountManager + HTTP router
bd3a7e4 entra_device: Phase 1 server skeleton

~6,300 insertions, 43 files. Every line committed has been either unit-tested or live-tested.

Summary by CodeRabbit

  • New Features

    • Entra/Intune device authentication: challenge/enroll flow, tenant/device enrollment, mapping resolution (strict/union), one-shot bootstrap tokens, and admin APIs for integrations and mappings.
    • CLI enrollment: new command to perform device enrollment, persist enrollment state locally, and display enrollment result (peer ID, URL, resolved groups).
  • Documentation

    • Full Entra device auth guide and end-to-end testing docs.
  • Tests & Tools

    • Extensive tests and local tooling (enroll-tester, demo server, make-pfx, seed-account, Docker Compose).

Add a new peer-registration method that authenticates devices against Entra/Intune,
analogous to setup keys. Admins map Entra security groups to NetBird auto-groups with
configurable resolution (strict_priority / union), revocation, expiry, ephemeral and
extra-DNS-labels flags per mapping.

The feature lives on the dedicated /join/entra path so it never mixes with the
existing gRPC Login/Sync flow.

This commit lays down the self-contained Phase 1 skeleton:

- management/server/types/entra_device_auth.go: domain model.
- management/server/integrations/entra_device:
  - errors.go           stable error codes (no_mapping_matched etc.)
  - activity.go          activity codes registered via RegisterActivityMap
  - types.go             challenge/enroll DTOs + internal value types
  - nonce_store.go       in-memory single-use nonce store
  - resolution.go        multi-group mapping resolution (heart of the feature)
  - resolution_test.go   unit tests covering priority, union, tie-breaking,
                         revoked/expired filtering, wildcard, fallback
  - cert_validator.go    Entra device cert chain + proof-of-possession check
  - graph_client.go      Microsoft Graph app-only client (device, groups, compliance)
  - store.go             package-local Store interface + MemoryStore
  - manager.go           orchestrator (challenge + enroll + Graph + resolve + enroll)
- management/server/http/handlers/entra_join/handler.go: /join/entra routes
- management/server/http/handlers/entra_device_auth/handler.go: admin CRUD
- management/server/permissions/modules/module.go: new EntraDeviceAuth module

No existing code paths are modified; the new packages can be wired into the
management server's router without touching the gRPC server, AccountManager
or SQL store in this commit.

Pending (later phases):
- SQL-backed Store impl + gorm auto-migrate registration
- AccountManager.EnrollEntraDevicePeer wiring + PeerEnroller closure
- Proto: optional enrollmentBootstrapToken field on LoginRequest
- Router wiring in management/server/http/handler.go
- OpenAPI schemas + codegen
- Client (Windows device-cert provider, URL-path routing)
- Dashboard admin pages
- Continuous re-validation scheduler
Link the Entra device enrolment flow with NetBird's peer registration + group
assignment plumbing so /join/entra/enroll actually creates peers.

* DefaultAccountManager.EnrollEntraDevicePeer (entra_device_enroll.go):
  creates the NetBird peer from a pre-resolved Entra identity + mapping input.
  Mirrors the essential data-writing portion of AddPeer (IP allocation with
  retries on unique-constraint collisions, group assignment incl. All-group,
  ephemeral tracking, network serial bump, integrated peer validator hook) but
  drops the setup-key / user-JWT branches since the Entra cert already
  authenticated the device. Emits a PeerAddedWithEntraDevice activity event
  with full structured meta (device id, all matched mapping ids, resolution
  mode, applied auto-groups) for audit trails.

* entraDevicePeerEnroller (entra_device_enroll.go): adapter that exposes
  AsEntraDevicePeerEnroller() ed.PeerEnroller on the account manager so the
  entra_device.Manager can call back without a circular import on the server
  package.

* SQLStore (integrations/entra_device/sql_store.go): gorm-backed Store
  implementation that auto-migrates the two new tables (entra_device_auth +
  entra_device_auth_mappings) and keeps bootstrap tokens in a TTL-bound
  in-memory cache.

* Install wiring (http/handlers/entra_device_auth/wiring.go): one-call
  factory that builds the Store + Manager, attaches the PeerEnroller, mounts
  the device-facing /join/entra routes on the rootRouter (unauthenticated,
  device cert is the credential) and the admin CRUD routes on the /api
  subrouter (permission-gated via modules.EntraDeviceAuth).

* handler.go: bypass paths for /join/entra/{challenge,enroll} so the shared
  auth middleware doesn't reject anonymous device requests. Calls the new
  installEntraDeviceAuth helper, which is a best-effort install (logs a
  warning and skips the integration if the store doesn't expose *gorm.DB or
  the manager doesn't implement AsEntraDevicePeerEnroller, so the binary
  keeps building even if those assumptions change upstream).

Compile status: go vet of the new and touched packages is clean (empty
output after filtering the pre-existing upstream idp/dex CGO issue unrelated
to this work).
…ration

Adds comprehensive unit tests covering every layer of the Entra device
authentication feature, including a direct verification that the enrolment
flow invokes the account-manager-side peer enroller with the correctly
resolved mapping configuration.

New test files:

* nonce_store_test.go:
  - Issue produces distinct, non-empty, non-expired nonces
  - Consume succeeds exactly once per nonce (single-use)
  - Unknown nonces are rejected
  - Expired nonces are rejected
  - Concurrent Issue/Consume is race-free

* cert_validator_test.go:
  - RSA-PSS happy path (primary Windows CNG emission mode)
  - RSA-PKCS1v15 happy path (alternate SCEP/CNG emission mode)
  - ECDSA happy path (Intune-on-macOS style)
  - Tampered signature rejected with invalid_signature
  - Expired / not-yet-valid cert rejected with invalid_cert_chain
  - Empty chain, garbage base64, garbage DER each rejected
  - Chain verification against an empty trust pool correctly rejects

* manager_test.go (end-to-end Manager.Enroll):
  - Happy path: cert + signed nonce + Graph device + group lookup -> the
    PeerEnroller is called with the exact AutoGroups / Ephemeral /
    AllowExtraDNSLabels / AccountID / EntraDeviceMapping that the mapping
    resolution produced. This is the primary integration-with-peer-
    registration assertion: whatever the integration decides, exactly
    that flows into the NetBird AddPeer plumbing.
  - Union-mode multi-mapping merge: confirms that when a device matches
    two mappings, AutoGroups are unioned, Ephemeral ORs, AllowExtraDNSLabels
    ANDs. Every matched mapping ID is recorded in the EnrollPeerInput.
  - Rejects unknown tenant / disabled integration / bad nonce
  - Rejects disabled Entra device / missing-in-Graph device
  - Fails CLOSED on transient Graph errors (CodeGroupLookupFailed)
  - RequireIntuneCompliant: compliant device enrolls; non-compliant is
    rejected; compliance API failure also fails closed
  - No mapping matched: PeerEnroller is NOT called
  - Device-id mismatch between cert and request is rejected before Graph
    is consulted
  - Nonce single-use: replaying the same nonce fails the second time

* store_test.go:
  - MemoryStore integration + mapping CRUD, with per-account isolation
  - Bootstrap token single-use semantics for both MemoryStore and SQLStore
  - SQLStore bootstrap-token TTL expiry

* entra_join/handler_test.go (HTTP-level round-trip):
  - GET /join/entra/challenge returns base64 nonce + expiry
  - POST /join/entra/enroll with a valid self-signed cert + correctly
    signed nonce completes enrolment, returns 200 with bootstrap token
    and resolved auto-groups, and invokes the fake PeerEnroller with
    the expected input (verifying end-to-end wiring from HTTP to the
    account-manager hook)
  - Unknown tenant -> 404 with code integration_not_found
  - Malformed JSON body -> 400

All test files vet clean.

Note: the tests cannot be executed locally on this Windows machine due to
the pre-existing upstream netbirdio/dex CGO-gating bug (requires gcc) that
blocks any build that transitively imports management/server/types. The
tests run fine on Linux CI or on Windows with CGO_ENABLED=1 + a C compiler.
…+ dex CGO shim

Adds the server-side test harness recommended for Path-A verification:

* idp/dex: split sqlite3 storage construction behind a CGO build tag.
  sqlite_cgo.go (cgo) calls (&sql.SQLite3{File: ...}).Open; sqlite_nocgo.go
  (!cgo) returns a clear error. Fixes the pre-existing upstream build break
  that prevented 'go build ./...' and 'go test ./...' under CGO_ENABLED=0.
  All Entra unit tests + HTTP handler tests now run green on Windows
  without a C toolchain (confirmed locally).

* management/Dockerfile.entra-test: multi-stage Dockerfile that builds
  netbird-mgmt from source with CGO_ENABLED=1, produces a slim Debian-slim
  runtime image, exposes 33073 (gRPC) + 33081 (HTTP/join/entra), includes
  a healthcheck that probes /join/entra/challenge.

* tools/entra-test/docker-compose.yml: brings up Postgres + management
  from the feature branch. Auto-migrates the two new gorm tables on boot.

* tools/entra-test/enroll-tester/: standalone Go program that impersonates
  a NetBird device. Generates a self-signed RSA cert (CN = device id),
  fetches /challenge, signs the nonce with RSA-PSS SHA-256, POSTs /enroll,
  prints the resolved auto-groups + bootstrap token. Lets you exercise the
  full enrolment round-trip before the real Windows client is implemented.
  Verified to build and run clean under CGO_ENABLED=0.

* tools/entra-test/TESTING.md: step-by-step walkthrough covering Docker
  bring-up, Entra app registration, admin-API integration + mapping
  creation, synthetic client invocation, expected error codes, and what
  the harness does NOT yet cover (real Windows client = Phase 2).

With these pieces the feature is ready for end-to-end server-side testing
against a real Entra tenant. The missing Phase-2 work (client URL routing,
Windows CNG signing, tray UI) is independent of this harness.
Spins up the real entra_join handler in-process, seeds a wildcard mapping,
and enrols against itself using a fake Graph client. Lets you verify the
full HTTP -> handler -> validator -> resolver -> PeerEnroller pipeline
with a single command, no Docker or Entra tenant required:

  enroll-tester --demo -v

Exercised end-to-end: cert chain parsing, RSA-PSS signature verification
against a freshly generated RSA key, single-use nonce lifecycle, wildcard
mapping resolution, PeerEnroller invocation with correctly populated input,
and bootstrap-token issuance. Confirmed locally: returns 200 OK with the
expected peer_id, matched mapping ids, resolved auto-groups, and a
64-char bootstrap token.

The only piece this does NOT exercise is the real HTTPGraphClient against
a live Entra tenant; use the full docker-compose walkthrough in
tools/entra-test/TESTING.md for that.
Add docs/ENTRA_DEVICE_AUTH.md with:

* Feature overview + when to use it
* Conceptual model: integration, mappings, resolution modes, wildcards,
  tenant-only fallback, stable error codes with HTTP-status mapping
* Azure portal walkthrough for the app registration + required Graph
  application permissions
* Full REST admin API reference (CRUD for integration + mappings)
* /join/entra device-enrolment protocol reference including signature
  formats (RSA-PSS / PKCS1v15 / ECDSA) and the exact nonce-signing rule
* Architecture diagram + package layout
* Security notes (fail-closed Graph, bypass paths, bootstrap token TTL,
  client-secret encryption caveat)
* Current implementation status table so readers know what's done vs.
  deferred to Phase 2/4/5
Fixes uncovered while actually running the stack:

* Use a single --port 33073 (NetBird multiplexes gRPC + HTTP via cmux);
  --mgmt-api-port doesn't exist.
* Supply NB_STORE_ENGINE + NB_STORE_ENGINE_POSTGRES_DSN env vars so the
  server can reach Postgres (NetBird reads these directly).
* Replace placeholder DataStoreEncryptionKey with a real 32-byte base64
  value so the field encryptor initialises cleanly.
* Align Dockerfile EXPOSE + HEALTHCHECK to the single port.

Verified: docker compose up -d brings the stack healthy, /join/entra/challenge
responds with fresh nonces, and both new gorm tables (entra_device_auth +
entra_device_auth_mappings) appear in Postgres.
Added during the first real-tenant run-through:

* tools/entra-test/seed-account/: small Go utility that inserts a minimal
  but valid NetBird Account row + required groups via gorm. Uses netbird's
  own types.NewNetwork() + types.Account so JSON-encoded fields (network_net,
  settings.Extra, etc.) round-trip correctly. Needed because the /join/entra
  flow creates peers under an existing account, and the normal AccountManager
  signup requires SSO/IdP which isn't wired in the test harness.

* docker-compose.yml: expose postgres on host:5432 so seed-account (and
  psql) can connect from the host. Flagged in-line as test-harness-only.

Verified end-to-end against a real Entra tenant:
  1. /join/entra/challenge returned a nonce
  2. POST /enroll validated the synthetic cert + signature
  3. Server authenticated to the tenant via Graph OAuth2 client-credentials
  4. Graph found the device and enumerated groups
  5. Wildcard mapping resolved -> auto_groups=[test-group-1]
  6. Peer created in Postgres with IP + DNS label + All-group membership
  7. Bootstrap token issued
Document the six end-to-end scenarios exercised against a real Entra tenant
using the enroll-tester harness. All accept + reject paths (happy path with
wildcard and specific-group mappings, no_mapping_matched, device_disabled,
compliance-on with compliant device, compliance-on with non-compliant device)
behaved as specified with zero database side effects on any reject.
First half of Phase 2 (the real NetBird client). Lands the pluggable cert
provider + enrolment orchestration + unit tests. Still pending: profile
persistence, URL-path routing in auth.go, CLI flags, end-to-end run.

New package client/internal/enroll/entradevice:

* provider.go - CertProvider interface with two required methods
  (CertChainDER, SignNonce, DeviceID). PFXProvider implementation loads a
  .pfx file (such as Intune deploys via PKCS Certificate profile), signs
  nonces with RSA-PSS SHA-256 or ECDSA. Interface shaped so a future CNG
  provider (TPM-backed non-exportable key) can drop in without changes.

* enroller.go - drives GET /join/entra/challenge + POST /join/entra/enroll.
  Returns a structured *Error when the server responds with a machine-
  readable error body so clients can surface specific UI for codes like
  no_mapping_matched or device_not_compliant.

* state.go - EntraEnrollState DTO for per-profile persistence (tenant id,
  device id, peer id, enrolled_at, resolved auto-groups, matched mapping
  ids).

* enroller_test.go - 7 tests covering PFX load (happy path, wrong password,
  missing file), enrolment round-trip against httptest server, structured
  server error decode, input validation, and trailing /join/entra in
  BaseURL tolerance. All green locally.

New go.mod dependency: software.sslmate.com/src/go-pkcs12 v0.7.1 (the
stdlib golang.org/x/crypto/pkcs12 is frozen and lacks DecodeChain/Encode).
Add a first-class subcommand that drives the Entra enrolment flow against
a /join/entra endpoint and persists the result into the active profile.

* client/cmd/entra_enroll.go: new cobra command. Loads the active profile,
  reads PFX (path + password env var), derives the WG pubkey from the
  profile's WG private key, runs the Enroller, persists the returned state
  into profilemanager.Config.EntraEnroll, and strips /join/entra from the
  saved ManagementURL so subsequent 'netbird up' goes straight to gRPC.
  Includes a dev-path fallback for Windows boxes without admin rights
  (util.WriteJsonWithRestrictedPermission -> util.WriteJson).

* client/cmd/root.go: register the new subcommand.

* client/internal/profilemanager/config.go: add EntraEnroll *EntraEnrollState
  field to Config (JSON, optional). Duplicated the state type here instead
  of importing entradevice to avoid a circular dep.

* tools/entra-test/make-pfx: standalone helper that produces a self-signed
  PFX with a supplied Entra device id as Subject CN. Used for local E2E
  verification when Intune-issued PFXes aren't available.

* tools/entra-test/.gitignore: exclude built binaries and generated PFXes.

End-to-end verification run against the live Docker stack + real Entra
tenant:

  netbird entra-enroll \\
    --management-url http://localhost:33073/join/entra \\
    --entra-tenant   5a7a81b2-99cc-45fc-b6d1-cd01ba176c26 \\
    --entra-pfx      tools/entra-test/device.pfx \\
    --entra-pfx-password-env NB_ENTRA_PFX_PASSWORD

Returned ENROLMENT SUCCESS, persisted EntraEnroll state into the profile,
and the peer was created in Postgres with the correct mapping-resolved
auto-groups + All-group membership.
Tried landing a Windows cert-store provider via github/smimesign/certstore.
It builds cleanly but requires CGO (links against crypt32.dll + ncrypt.dll
via #cgo LDFLAGS). Our build pipeline is CGO_ENABLED=0 and the dev box has
no C toolchain; switching to CGO is a substantial infrastructure change
(Dockerfiles, CI, goreleaser, cross-compile).

Rather than land half-done or force a CGO migration, roll back cleanly:

* Remove the smimesign/certstore dep from go.mod / go.sum.
* Delete the provider_certstore_{windows,stub}.go stubs.
* Revert entra_enroll.go to PFX-only CLI (as was working and live-tested).
* Add a 'Future work' section to docs/ENTRA_DEVICE_AUTH.md spelling out
  the two viable implementation routes (CGO/smimesign vs pure-Go ncrypt)
  so whoever picks this up next doesn't have to re-do the research.

The existing CertProvider interface is unchanged, so either future
implementation drops in next to PFXProvider without touching the enroller.

The PFX path we landed + live-tested still works:
  - Intune PKCS Certificate profile can deploy PFX files to both Windows
    and macOS devices (not just Windows cert store).
  - The server validates any RSA/ECDSA cert the client presents.
  - Only tradeoff vs cert store: private key is in a password-protected
    file on disk rather than TPM-bound. Acceptable pattern for many
    enterprise deployments.
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end Entra/Intune device enrollment: client CLI and PFX handling, a one-shot enrollment flow against management /join/entra, server nonce/certificate validation, Microsoft Graph queries, mapping resolution, persistence (memory/SQL), admin APIs, wiring, tests, and local dev tooling for E2E testing.

Changes

Cohort / File(s) Summary
Client CLI & profile
client/cmd/entra_enroll.go, client/cmd/root.go, client/internal/profilemanager/config.go, client/internal/profilemanager/config_entra.go
New entra-enroll Cobra command; resolves PFX/password, loads/updates profile, derives WG pubkey, persists EntraEnrollState into profile config and honors --force. Adds EntraEnroll field and state struct.
Client enrolment implementation & tests
client/internal/enroll/entradevice/enroller.go, .../enroller_test.go, .../provider.go, .../state.go
Adds Enroller performing GET challenge → POST enroll, base64 nonce handling, nonce signing via CertProvider, PFX loader (LoadPFX), DER chain encoding, structured error type, default timeout, and comprehensive unit tests.
Management HTTP wiring & handlers
management/server/http/handler.go, management/server/http/handlers/entra_join/handler.go, .../handler_test.go, management/server/http/handlers/entra_device_auth/handler.go, .../wiring.go
Installs unauthenticated /join/entra endpoints and admin CRUD wiring for Entra integration/mappings; adds installation helper that conditionally mounts handlers and permission/account resolver hooks; admin handlers implement CRUD, validation, and JSON error payloads.
Integration core: manager, resolver, errors, types & tests
management/server/integrations/entra_device/manager.go, resolution.go, errors.go, types.go, manager_test.go, resolution_test.go
Implements Manager: nonce issuance/consumption, cert+signature validation flow, Graph queries, optional Intune compliance, mapping resolution (strict/union/tenant-fallback), bootstrap token issuance/validation, typed errors, DTOs, and extensive tests.
Certificate validation & graphs
management/server/integrations/entra_device/cert_validator.go, ..._test.go, graph_client.go
Adds CertValidator (chain parsing, optional trust roots, RSA/ECDSA signature verification with fallbacks) and an HTTP GraphClient (token acquisition, pagination, device/compliance queries). Includes tests.
Persistence: store, SQL store, nonce & token caches
management/server/integrations/entra_device/store.go, .../sql_store.go, nonce_store.go, ..._test.go, store_test.go
Defines Store interface with MemoryStore and GORM-backed SQLStore, migrations, mapping CRUD, bootstrap token caching with TTL/consumption, and an in-memory nonce store (TTL, GC, constant-time consume). Tests added.
AccountManager adapter (peer creation)
management/server/entra_device_enroll.go
Adds peer-enrollment adapter to create peers from Entra enrollments: validate input, avoid WG-key duplicates, allocate IP/DNS with retry, persist peer, attach groups, emit activity, and handle compensation/deletes.
Activity codes, types, models
management/server/integrations/entra_device/activity.go, management/server/types/entra_device_auth.go
Registers Entra-specific activity codes; adds domain models for EntraDeviceAuth and EntraDeviceAuthMapping, mapping resolution constants and helpers.
Dex SQLite build-tag refactor
idp/dex/config.go, idp/dex/provider.go, idp/dex/sqlite_cgo.go, idp/dex/sqlite_nocgo.go
Extracts SQLite opening into openSQLite with cgo and !cgo variants to surface clear errors when SQLite (CGO) is unavailable.
Dev tooling & docs
tools/entra-test/*, docs/ENTRA_DEVICE_AUTH.md, management/server/integrations/entra_device/README.md, management/Dockerfile.entra-test
Adds enroll-tester, make-pfx, seed-account tools, Docker Compose and Dockerfile for local E2E, testing docs, and comprehensive integration documentation.
Go module
go.mod
Adds software.sslmate.com/src/go-pkcs12 v0.7.1 for PKCS#12/PFX handling.
Misc
tools/entra-test/.gitignore
Ignores local test artifacts (netbird.exe, device.pfx).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Client CLI
    participant MgmtHTTP as Management HTTP (/join/entra)
    participant Validator as CertValidator
    participant Graph as Microsoft Graph
    participant Enroller as PeerEnroller/Store

    CLI->>MgmtHTTP: GET /join/entra/challenge
    MgmtHTTP->>MgmtHTTP: Issue nonce (NonceStore)
    MgmtHTTP-->>CLI: {nonce, expires_at}

    CLI->>CLI: load PFX, sign nonce, derive WG pubkey
    CLI->>MgmtHTTP: POST /join/entra/enroll (cert_chain, nonce, signature, wg_pubkey,...)

    MgmtHTTP->>Validator: Validate cert_chain + signature
    Validator-->>MgmtHTTP: DeviceIdentity

    MgmtHTTP->>Graph: Device lookup / TransitiveMemberOf / IsCompliant
    Graph-->>MgmtHTTP: group IDs, compliance

    MgmtHTTP->>MgmtHTTP: Resolve mappings -> resolved mapping
    MgmtHTTP->>Enroller: EnrollEntraDevicePeer(EnrollPeerInput)
    Enroller-->>MgmtHTTP: Enroll result (peerID, config)
    MgmtHTTP-->>CLI: EnrollResponse (peerID, bootstrap_token, resolved groups, mapping IDs)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • mlsmaycon
  • bcmmbaga

Poem

🐰 I hopped with my PFX snug under fur,
I signed the nonce, my whiskers a blur.
Graph counted my groups, mappings set the way,
A bootstrap token glowed — now off I play! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: introduction of Entra/Intune device authentication as a new peer-registration method. It is concise and specific.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering what is added, deployment story, design overview, code organization, testing results, non-goals, compatibility, and commit history. All required template sections are addressed in substance, though some formal checkboxes were not explicitly ticked.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread tools/entra-test/docker-compose.yml Fixed
Comment thread tools/entra-test/docker-compose.yml Fixed
Comment thread tools/entra-test/seed-account/main.go Fixed
First-pass response to PR netbirdio#5977 review feedback from SonarCloud.

Complexity refactors (extract helpers to bring cognitive-complexity and
function-length under SonarCloud thresholds):

* management/server/integrations/entra_device/resolution.go: split
  ResolveMapping into filterCandidates + handleNoCandidates helpers, and
  factor the sort + per-mapping merge logic out of resolveUnion into
  sortByPriorityThenID + appendNewAutoGroups + mergeMinExpiry. Same
  semantics; all 11 resolution_test.go cases still pass.

* management/server/integrations/entra_device/manager.go: split Enroll
  into 7 focused helpers (loadEnabledIntegration, consumeNonce +
  decodeNonceBytes, validateCertAndDeviceID, verifyWithGraph,
  resolveMappingForAccount, enrollPeer, issueBootstrapToken). Same
  externally-visible behaviour; the manager_test.go suite still passes
  end-to-end including the HTTP handler integration tests.

Hard-coded-credential findings (dev-test-harness only, flagged BLOCKER +
MAJOR by SonarCloud):

* tools/entra-test/docker-compose.yml: parametrise Postgres password via
  \$\{NB_TEST_PG_PASSWORD:-netbird\} so operators can override via a .env
  file. // NOSONAR annotations on the fixtures that legitimately have a
  default for local-dev use.

* tools/entra-test/seed-account/main.go: move the DSN default away from
  a single hard-coded string into an env-var-override helper
  (envOrDefault). Honours NB_TEST_PG_{USER,PASSWORD,DB} consistently with
  the compose file.

Pre-existing issues intentionally NOT addressed (not introduced by this
PR):

* client/internal/profilemanager/config.go file-length (751/750) \u2014
  config.go was ~730 lines before this PR; adding the 20-line
  EntraEnrollState struct crossed the threshold. Splitting config.go
  into multiple files is out of scope for this feature PR.

* management/server/http/handler.go NewAPIHandler complexity + 23
  parameters \u2014 function existed before, this PR adds one call
  (installEntraDeviceAuth) that doesn't change the signature.
thvevirtue pushed a commit to thvevirtue/netbird that referenced this pull request Apr 24, 2026
First-pass response to PR netbirdio#5977 review feedback from SonarCloud.

Complexity refactors (extract helpers to bring cognitive-complexity and
function-length under SonarCloud thresholds):

* management/server/integrations/entra_device/resolution.go: split
  ResolveMapping into filterCandidates + handleNoCandidates helpers, and
  factor the sort + per-mapping merge logic out of resolveUnion into
  sortByPriorityThenID + appendNewAutoGroups + mergeMinExpiry. Same
  semantics; all 11 resolution_test.go cases still pass.

* management/server/integrations/entra_device/manager.go: split Enroll
  into 7 focused helpers (loadEnabledIntegration, consumeNonce +
  decodeNonceBytes, validateCertAndDeviceID, verifyWithGraph,
  resolveMappingForAccount, enrollPeer, issueBootstrapToken). Same
  externally-visible behaviour; the manager_test.go suite still passes
  end-to-end including the HTTP handler integration tests.

Hard-coded-credential findings (dev-test-harness only, flagged BLOCKER +
MAJOR by SonarCloud):

* tools/entra-test/docker-compose.yml: parametrise Postgres password via
  \$\{NB_TEST_PG_PASSWORD:-netbird\} so operators can override via a .env
  file. // NOSONAR annotations on the fixtures that legitimately have a
  default for local-dev use.

* tools/entra-test/seed-account/main.go: move the DSN default away from
  a single hard-coded string into an env-var-override helper
  (envOrDefault). Honours NB_TEST_PG_{USER,PASSWORD,DB} consistently with
  the compose file.

Pre-existing issues intentionally NOT addressed (not introduced by this
PR):

* client/internal/profilemanager/config.go file-length (751/750) \u2014
  config.go was ~730 lines before this PR; adding the 20-line
  EntraEnrollState struct crossed the threshold. Splitting config.go
  into multiple files is out of scope for this feature PR.

* management/server/http/handler.go NewAPIHandler complexity + 23
  parameters \u2014 function existed before, this PR adds one call
  (installEntraDeviceAuth) that doesn't change the signature.

Co-Authored-By: Oz <oz-agent@warp.dev>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)
management/server/integrations/entra_device/activity.go-9-20 (1)

9-20: ⚠️ Potential issue | 🟡 Minor

Add safeguards to prevent silent activity ID collisions from external dependencies.

activity.RegisterActivityMap uses maps.Copy, which silently overwrites duplicate keys. The IDs 200–209 are currently collision-free in this repo, but with the external dependency on github.com/netbirdio/management-integrations and no central allocation mechanism, future code could introduce collisions without any warning. This would silently corrupt activity logging for affected entries.

Recommend either:

  • Centrally reserve the 200–209 range in management/server/activity/codes.go alongside existing constants (0–112), or
  • Add a test in this package that asserts none of these IDs exist in the registry before init() runs, preventing runtime surprises.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/activity.go` around lines 9 - 20,
The new Entra device activity constants (PeerAddedWithEntraDevice,
EntraDeviceAuthCreated, EntraDeviceAuthUpdated, EntraDeviceAuthDeleted,
EntraDeviceAuthMappingCreated, EntraDeviceAuthMappingUpdated,
EntraDeviceAuthMappingDeleted, EntraDeviceAuthMappingRevoked,
GroupAddedToEntraDeviceMapping, GroupRemovedFromEntraDeviceMapping) risk silent
ID collisions because activity.RegisterActivityMap uses maps.Copy; either
reserve the 200–209 range in the central activity codes (add them to the central
constants in management/server/activity/codes.go alongside 0–112) or add a
pre-init test in this package that queries the activity registry (via the same
registry API used by activity.RegisterActivityMap) to assert none of these
numeric IDs already exist before init() runs, failing the test if any collision
is found so duplicates cannot be introduced silently.
tools/entra-test/docker-compose.yml-9-11 (1)

9-11: ⚠️ Potential issue | 🟡 Minor

Header comment contradicts the actual port layout.

Lines 9-11 claim :33073 is gRPC and :33081 is HTTP (admin API + /join/entra/*), but the service only exposes :33073 and the comment at lines 49-51 correctly notes that cmux multiplexes both protocols on that single port. The top-of-file banner will mislead anyone reading it before scrolling down.

📝 Proposed fix
 # The management server exposes:
-#   - :33073  gRPC (normal NetBird agent login flow)
-#   - :33081  HTTP, including the admin API and /join/entra/*
+#   - :33073  gRPC + HTTP (cmux-multiplexed): agent Login/Sync,
+#             admin API, and /join/entra/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/docker-compose.yml` around lines 9 - 11, Update the
top-of-file header comment in the docker-compose.yml to match the actual port
layout: remove the separate :33081 HTTP entry and state that both gRPC and HTTP
(including the admin API and /join/entra/*) are multiplexed on :33073 via cmux,
so the header aligns with the later cmux comment.
management/server/integrations/entra_device/README.md-57-69 (1)

57-69: ⚠️ Potential issue | 🟡 Minor

Phase 1 status checklist is stale relative to this PR.

Per the PR objectives, several boxes that are still unchecked here actually ship in this same PR:

  • "Wired into the main router" — done via installEntraDeviceAuth in management/server/http/handler.go.
  • "Full integration with AccountManager.AddPeer" — done via AccountManager.EnrollEntraDevicePeer / AsEntraDevicePeerEnroller.
  • "Admin CRUD handlers" — done under management/server/http/handlers/entra_device_auth.

The proto field enrollmentBootstrapToken and dashboard/OpenAPI items are legitimately deferred follow-ups and should stay unchecked. Please refresh the list so first-time readers aren't misled about the current state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/README.md` around lines 57 - 69,
Update the Phase 1 checklist in the README to reflect work shipped in this PR:
mark "Wired into the main router" as done and reference installEntraDeviceAuth
(management/server/http/handler.go), mark "Full integration with
AccountManager.AddPeer" as done and reference
AccountManager.EnrollEntraDevicePeer / AsEntraDevicePeerEnroller, and mark
"Admin CRUD handlers" as done and reference
management/server/http/handlers/entra_device_auth; leave "Proto field
enrollmentBootstrapToken" and dashboard/OpenAPI items unchecked as noted.
management/server/integrations/entra_device/asn1.go-5-9 (1)

5-9: ⚠️ Potential issue | 🟡 Minor

Comment claims "function variable" but implementation is a plain function — comment is misleading.

The doc says this is "wrapped behind a package-level function variable so unit tests can substitute it," but realASN1Unmarshal is a regular function declaration, not a variable. No tests actually override this function or asn1Unmarshal in the codebase.

Since the architecture already uses a two-layer wrapper (stdlib → realASN1Unmarshalasn1Unmarshal var in cert_validator.go) and tests do not mock either, the "so unit tests can substitute it" claim in the comment is inaccurate and should be removed to keep the comment honest.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/asn1.go` around lines 5 - 9, The
comment above realASN1Unmarshal is misleading: it claims the function is
"wrapped behind a package-level function variable so unit tests can substitute
it" but realASN1Unmarshal is a plain function and tests do not override it or
asn1Unmarshal; update the comment to accurately describe the function (e.g.,
"simple helper that directly calls encoding/asn1.Unmarshal") and remove the
incorrect clause about being a package-level function variable or substitutable
in tests; leave the realASN1Unmarshal function and the asn1Unmarshal variable
(in cert_validator.go) unchanged.
tools/entra-test/TESTING.md-187-199 (1)

187-199: ⚠️ Potential issue | 🟡 Minor

"What this does NOT test" section is stale re: the client CLI.

Per the PR objectives, client Phase 2 — the netbird entra-enroll CLI with PFX CertProvider and enroller logic — is included in this PR (client/cmd/entra_enroll.go). Only Windows cert-store / TPM-backed CNG signing is explicitly deferred. The current wording ("The real NetBird Windows client is not yet wired to the /join/entra/* path. That's Phase 2 … and has not been implemented.") reads as if no client exists yet.

Consider narrowing the scope to what is actually deferred (Windows cert-store integration) rather than the whole client path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/TESTING.md` around lines 187 - 199, Update the "What this
does NOT test" paragraph to avoid implying the entire client is unimplemented:
state that the client-side CLI and enroller (client/cmd/entra_enroll.go) and the
PFX CertProvider are present in this PR, but Windows cert-store / TPM-backed CNG
signing (integration with Cert:\LocalMachine\My) is deferred to Phase 2; keep
instructions about server-side verification and the current test harness but
narrow the scope of the deferred work to "Windows cert-store / TPM-backed
signing" only.
docs/ENTRA_DEVICE_AUTH.md-272-272 (1)

272-272: ⚠️ Potential issue | 🟡 Minor

Fix markdownlint hits (MD040, MD058, MD055/MD056).

Per static-analysis hints:

  • Line 272: the ASCII architecture diagram is inside a fenced block without a language. Use text (or asciidoc) to satisfy MD040.
  • Line 327 and the second table around Line 350: tables must be surrounded by blank lines (MD058). The MD055/MD056 warnings around Line 335 are a downstream effect of the missing blank line between the last table row and the "Observations from the runs:" paragraph — the linter folds the paragraph into the table.
📝 Proposed doc fixes
 ## Architecture
 
-```
+```text
                       ┌─────────────────────────────────────┐
                       │  Device (Entra-joined)              │
 Docker test harness + the synthetic `enroll-tester` tool. The following
 scenarios were all executed end-to-end through Microsoft Graph:
+
 | Scenario                                     | Configuration                               | Input                       | Expected result          | Actual   |
 |----------------------------------------------|---------------------------------------------|-----------------------------|--------------------------|----------|
 | Compliance on, non-compliant device          | `require_intune_compliant: true`            | non-compliant device id     | `403 device_not_compliant` | ✅     |
+
 Observations from the runs:

Apply the same "blank line before the table" fix to the "Current implementation status" table that starts at Line 350.

Also applies to: 323-348

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/ENTRA_DEVICE_AUTH.md` at line 272, The fenced ASCII diagram block needs
a language label and the tables need surrounding blank lines: change the diagram
fence from ``` to ```text (or ```asciidoc) where the ASCII architecture diagram
appears and insert a blank line before and after each markdown table (including
the table that summarizes runs and the "Current implementation status" table)
and ensure there is a blank line between the last table row and the following
paragraph ("Observations from the runs:") so the linter (MD040, MD058,
MD055/MD056) recognizes the code block and tables correctly.
management/server/integrations/entra_device/store.go-70-76 (1)

70-76: ⚠️ Potential issue | 🟡 Minor

SaveEntraDeviceAuth leaks stale byTenant entries on tenant change.

If an integration is saved with tenant A, then re-saved with tenant B, m.byTenant["A"] still points at the (now tenant-B) auth. A later GetEntraDeviceAuthByTenant("A") returns a record whose TenantID is B. Clear the old index entry before writing the new one (read auths[auth.AccountID] first, remove its previous byTenant key if it differs). Low-severity since this store is dev-only, but worth a one-liner fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/store.go` around lines 70 - 76,
SaveEntraDeviceAuth currently leaves stale entries in m.byTenant when an auth's
TenantID changes; update SaveEntraDeviceAuth to first load the previous auth
from m.auths using auth.AccountID, and if that previous auth exists and its
TenantID is non-empty and different from auth.TenantID, delete the old key from
m.byTenant; then store/update m.auths[auth.AccountID] and, if auth.TenantID !=
"", set m.byTenant[auth.TenantID] = auth. Use the existing function
SaveEntraDeviceAuth and the maps m.auths and m.byTenant when implementing this
fix.
management/server/integrations/entra_device/cert_validator.go-147-154 (1)

147-154: ⚠️ Potential issue | 🟡 Minor

ECDSA curve fallback to P-256 silently coerces the key.

x509.ParseCertificate populates Curve for all curves Go supports. If pub.Curve == nil here, the cert either parsed incorrectly or carries a curve the Go stdlib doesn't recognize — mutating the caller's *ecdsa.PublicKey and assuming P-256 can (a) mutate a shared leaf.PublicKey visible to later callers, and (b) produce a "successful" verify against the wrong curve if the key happens to share coordinates. Safer to reject:

🛡️ Proposed fix
-		if pub.Curve == nil {
-			// Fall back to P-256, which is what Windows CNG + most Intune
-			// SCEP profiles emit.
-			pub.Curve = elliptic.P256()
-		}
+		if pub.Curve == nil {
+			return fmt.Errorf("ecdsa public key has no curve set")
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/cert_validator.go` around lines
147 - 154, Don't silently coerce pub.Curve to P-256: instead of mutating
pub.Curve = elliptic.P256() when pub.Curve == nil, treat a nil curve as an error
condition and return a descriptive error (e.g., "unsupported or unknown ECDSA
curve") from the verification code path that uses ecdsa.Verify; update the block
around pub.Curve, elliptic.P256(), and ecdsa.Verify to bail out on nil curve and
only call ecdsa.Verify when pub.Curve is non-nil so you don't mutate the
caller's *ecdsa.PublicKey or risk verifying with the wrong curve.
management/server/http/handlers/entra_device_auth/handler.go-192-197 (1)

192-197: ⚠️ Potential issue | 🟡 Minor

Conflates storage error with "no integration" and returns 409 for 5xx.

If GetEntraDeviceAuth returns a non-nil error (DB outage, etc.), the client gets 409 Conflict — configure the Entra device auth integration before adding mappings, which is both misleading to the caller and confusing for on-call. Separate the branches:

🛡️ Proposed fix
-	integ, err := h.Store.GetEntraDeviceAuth(r.Context(), accountID)
-	if err != nil || integ == nil {
-		httpJSON(w, http.StatusConflict, apiError{"no_integration",
-			"configure the Entra device auth integration before adding mappings"})
-		return
-	}
+	integ, err := h.Store.GetEntraDeviceAuth(r.Context(), accountID)
+	if err != nil {
+		httpJSON(w, http.StatusInternalServerError, apiError{"internal_error", err.Error()})
+		return
+	}
+	if integ == nil {
+		httpJSON(w, http.StatusConflict, apiError{"no_integration",
+			"configure the Entra device auth integration before adding mappings"})
+		return
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/entra_device_auth/handler.go` around lines
192 - 197, The current handler conflates storage errors and a missing
integration by treating any non-nil err the same as integ == nil; update the
GetEntraDeviceAuth branch in handler.go (the call to h.Store.GetEntraDeviceAuth)
to first check for err and, if err != nil, return an
http.StatusInternalServerError via httpJSON with an appropriate apiError
indicating a storage/backend error, and only if err == nil && integ == nil
return the existing http.StatusConflict response telling the client to configure
the Entra device auth integration; ensure you reference the same apiError shape
and use the same httpJSON helper so calling code/clients remain consistent.
management/server/http/handlers/entra_device_auth/wiring.go-101-110 (1)

101-110: ⚠️ Potential issue | 🟡 Minor

Permissive fallback when Permissions == nil should at least log loudly.

Install() has no guard against w.Permissions == nil, and the fallback in buildPermissionChecker silently returns (true, nil) for any operation. This is called out in the doc comment but there's no runtime breadcrumb — a misconfigured deployment would expose admin CRUD to every authenticated user with no trace. Consider either rejecting nil in Install (treat it the same as missing PeerEnroller) or emitting a loud warning so operators notice.

🛡️ Proposed fix
 func buildPermissionChecker(pm permissions.Manager) PermissionChecker {
 	if pm == nil {
+		log.Warn("entra_device_auth: no permissions.Manager wired; admin CRUD is permissive. Do not use in production.")
 		return func(context.Context, string, string, string) (bool, error) {
 			return true, nil
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/entra_device_auth/wiring.go` around lines 101
- 110, buildPermissionChecker currently returns a silent permissive checker when
permissions.Manager is nil which can expose admin actions; change Install and
wiring so nil permissions are either rejected or loudly logged: in Install
(where w.Permissions is used) validate that w.Permissions != nil and return an
error (treat like missing PeerEnroller) or if you must allow nil, modify
buildPermissionChecker to wrap the permissive fallback with a
processLogger.warn/error (include traceable context) so any deployment missing
permissions.Manager produces a clear runtime breadcrumb; refer to
buildPermissionChecker, PermissionChecker, permissions.Manager, Install and
w.Permissions when making the change.
management/server/integrations/entra_device/manager.go-109-126 (1)

109-126: ⚠️ Potential issue | 🟡 Minor

Inconsistent nonce handling: trim applied to Consume, not to base64 decode.

At Line 111 you pass strings.TrimSpace(req.Nonce) to Consume, but at Line 118 you call base64.RawURLEncoding.DecodeString(req.Nonce) on the untrimmed input. A client that sends a nonce with stray whitespace (or a trailing newline) will have the nonce successfully consumed (single-use) and then fail base64 decoding, burning an otherwise valid nonce with no way to retry.

Additionally, consuming the nonce before decoding means any malformed base64 permanently burns the nonce — consider decoding first, then consuming only after the payload is confirmed parseable.

🛠️ Proposed fix
-	// 2. Consume nonce (single-use; this also ensures the nonce was issued by
-	//    this server instance and has not yet expired).
-	ok, err := m.NonceStore.Consume(strings.TrimSpace(req.Nonce))
-	if err != nil {
-		return nil, NewError(CodeInternal, "nonce store error", err)
-	}
-	if !ok {
-		return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
-	}
-	nonceBytes, err := base64.RawURLEncoding.DecodeString(req.Nonce)
-	if err != nil {
-		// Clients sometimes use the padded URL or Std alphabets. Fall back.
-		if b, e2 := base64.StdEncoding.DecodeString(req.Nonce); e2 == nil {
-			nonceBytes = b
-		} else {
-			return nil, NewError(CodeInvalidNonce, "nonce is not base64", err)
-		}
-	}
+	// 2. Decode first, then consume (single-use). This avoids burning a
+	//    nonce on malformed input.
+	nonce := strings.TrimSpace(req.Nonce)
+	nonceBytes, err := base64.RawURLEncoding.DecodeString(nonce)
+	if err != nil {
+		// Clients sometimes use the padded URL or Std alphabets. Fall back.
+		if b, e2 := base64.StdEncoding.DecodeString(nonce); e2 == nil {
+			nonceBytes = b
+		} else {
+			return nil, NewError(CodeInvalidNonce, "nonce is not base64", err)
+		}
+	}
+	ok, err := m.NonceStore.Consume(nonce)
+	if err != nil {
+		return nil, NewError(CodeInternal, "nonce store error", err)
+	}
+	if !ok {
+		return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 109 -
126, The code currently calls m.NonceStore.Consume(strings.TrimSpace(req.Nonce))
before attempting base64.RawURLEncoding.DecodeString(req.Nonce), which can burn
a valid nonce if the raw value contains whitespace or is non-base64; change the
flow to first trim the nonce into a local variable (e.g., nonce :=
strings.TrimSpace(req.Nonce)), attempt base64 decoding using nonce (try
RawURLEncoding then fall back to StdEncoding), and only call
m.NonceStore.Consume(nonce) after successful decode/parsing; update all error
returns to reference the trimmed nonce decode errors and ensure Consume is
passed the same trimmed nonce value.
🧹 Nitpick comments (22)
tools/entra-test/.gitignore (1)

2-2: Consider ignoring all PFX artifacts, not only device.pfx.

This reduces the chance of accidentally committing sensitive test certs when filenames vary.

Suggested update
 netbird.exe
-device.pfx
+*.pfx
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/.gitignore` at line 2, Replace the specific device.pfx
ignore with a broader pattern to catch all PFX certificate artifacts (e.g., add
a "*.pfx" entry) so any test cert filenames are ignored; update the .gitignore
entry that currently lists "device.pfx" to the wildcard pattern to ensure all
.pfx files are excluded from commits.
idp/dex/sqlite_nocgo.go (1)

16-20: Minor: errors.New would be slightly more idiomatic here.

The error string has no format verbs, so errors.New(...) (or a package-level var errSQLiteNoCGO = errors.New(...)) would avoid the fmt.Errorf overhead/lint noise. Non-blocking.

♻️ Optional refactor
 import (
-	"fmt"
+	"errors"
 	"log/slog"

 	"github.com/dexidp/dex/storage"
 )

 // openSQLite is a no-CGO stub. Dex's sqlite3 backend requires CGO; when this
 // binary is built with CGO_ENABLED=0 we reject sqlite storage with a clear
 // message pointing operators at an alternative (Postgres) or a CGO build.
 func openSQLite(_ string, _ *slog.Logger) (storage.Storage, error) {
-	return nil, fmt.Errorf(
-		"sqlite3 storage is not available: this binary was built with CGO_ENABLED=0; " +
-			"rebuild with CGO_ENABLED=1 or switch to a postgres storage backend")
+	return nil, errors.New(
+		"sqlite3 storage is not available: this binary was built with CGO_ENABLED=0; " +
+			"rebuild with CGO_ENABLED=1 or switch to a postgres storage backend")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@idp/dex/sqlite_nocgo.go` around lines 16 - 20, The openSQLite function
returns a static error via fmt.Errorf though there are no format verbs; replace
fmt.Errorf(...) with errors.New(...) to be more idiomatic (or define a
package-level var like errSQLiteNoCGO = errors.New("...") and return that) and
update openSQLite to return that errors.New/var instead of fmt.Errorf to remove
unnecessary formatting overhead and lint noise.
tools/entra-test/enroll-tester/tls.go (1)

8-10: Pin MinVersion on the insecure TLS config.

Even in a dev-only helper, setting MinVersion: tls.VersionTLS12 (or tls.VersionTLS13) is essentially free and addresses the missing-ssl-minversion-go lint without changing behavior.

🛡️ Proposed tweak
 func insecureTLSConfig() *tls.Config {
-	return &tls.Config{InsecureSkipVerify: true} //nolint:gosec // opt-in dev flag
+	return &tls.Config{
+		InsecureSkipVerify: true, //nolint:gosec // opt-in dev flag (--insecure)
+		MinVersion:         tls.VersionTLS12,
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/enroll-tester/tls.go` around lines 8 - 10, The
insecureTLSConfig function currently returns &tls.Config{InsecureSkipVerify:
true} and should also pin the minimum TLS version to avoid the
missing-ssl-minversion-go lint; update insecureTLSConfig to set MinVersion:
tls.VersionTLS12 (or tls.VersionTLS13) on the tls.Config while keeping
InsecureSkipVerify: true so behavior is unchanged but the lint is satisfied.
client/internal/enroll/entradevice/provider.go (2)

118-123: Prefer ecdsa.SignASN1 over the manual asn1.Marshal({R,S}).

ecdsa.SignASN1 (Go 1.15+) produces exactly the ASN.1-DER {R, S} shape you're constructing here and avoids the anonymous struct + math/big import. Functionally equivalent, just the idiomatic form for Go 1.25.

♻️ Proposed simplification
 	case *ecdsa.PrivateKey:
-		r, s, err := ecdsa.Sign(rand.Reader, k, digest[:])
-		if err != nil {
-			return nil, fmt.Errorf("ecdsa sign: %w", err)
-		}
-		return asn1.Marshal(struct{ R, S *big.Int }{r, s})
+		return ecdsa.SignASN1(rand.Reader, k, digest[:])

You can then drop the encoding/asn1 and math/big imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/provider.go` around lines 118 - 123,
Replace the manual two-step ECDSA signing + ASN.1 marshaling in the switch
branch that handles *ecdsa.PrivateKey: instead of calling ecdsa.Sign to get r,s
and then asn1.Marshal(struct{R,S *big.Int}{r,s}), call
ecdsa.SignASN1(rand.Reader, k, digest[:]) to produce the DER-encoded {R,S}
directly; remove the now-unused encoding/asn1 and math/big imports after this
change. Ensure you keep the existing error handling around the sign call and
return the byte slice from ecdsa.SignASN1.

124-129: Fallback comment overstates RSA-PSS guarantee.

Passing crypto.SHA256 (a crypto.Hash) as opts to crypto.Signer.Sign means:

  • For an RSA opaque signer, the default is PKCS1v15, not PSS (PSS requires *rsa.PSSOptions).
  • For ECDSA opaque signers, output is the standard ASN.1-DER encoding.

So the comment "they MUST handle RSA-PSS" is misleading — this fallback will in practice produce PKCS1v15 for RSA keys. Since line 116 states the server accepts PKCS1v15 too, the behavior is fine, but please update the comment so a future contributor doesn't assume PSS here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/provider.go` around lines 124 - 129, The
fallback comment incorrectly claims opaque signers "MUST handle RSA-PSS"; update
the comment near the call to p.signer.Sign(rand.Reader, digest[:],
crypto.SHA256) to state that passing crypto.SHA256 yields PKCS1v15 for RSA
opaque signers (and ASN.1-DER for ECDSA), not RSA-PSS, and note that the server
also accepts PKCS1v15 per the earlier comment so behavior is intentional; keep
reference to p.signer.Sign and the use of crypto.SHA256 so maintainers
understand the actual signing defaults.
tools/entra-test/config/management.json (1)

37-37: Nit: label the test-only encryption key.

This is obviously a dev fixture, but since it looks like a real base64 key (and scanners will flag it), consider adding a comment in the surrounding docs/README or renaming the value (e.g., "<test-only-not-a-secret>" placeholder regenerated on first run) to avoid anyone grepping this out of git history and re-using it. Not a blocker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/config/management.json` at line 37, Update the
DataStoreEncryptionKey entry so it’s clearly marked as a test-only placeholder:
replace the existing base64-looking value for the "DataStoreEncryptionKey" field
with an explicit sentinel like "<test-only-not-a-secret>" (or a value generated
at first-run) and add a short note in the surrounding docs/README explaining it
is a dev fixture and must not be used in production; ensure the string
"DataStoreEncryptionKey" is what you change so scanners/grep will match the
placeholder instead of the real-looking key.
management/Dockerfile.entra-test (2)

28-35: Install build deps before COPY . . to keep the layer cache warm.

apt-get install gcc libc6-dev currently lives in the same RUN as go build, which is downstream of COPY . .. Any source edit invalidates that whole layer and re-runs apt-get update && apt-get install. Moving the OS package install into its own RUN before COPY . . (right after the COPY go.mod go.sum step) makes it cache alongside the module download.

♻️ Proposed ordering
 WORKDIR /src

+# Build deps for CGO (Dex SQLite). Kept in a stable layer to preserve cache
+# across source edits.
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends gcc libc6-dev && \
+    rm -rf /var/lib/apt/lists/*
+
 # Prime the module cache separately so source edits don't invalidate it.
 COPY go.mod go.sum ./
 RUN --mount=type=cache,target=/go/pkg/mod \
     --mount=type=cache,target=/root/.cache/go-build \
     go mod download

 COPY . .

 RUN --mount=type=cache,target=/go/pkg/mod \
     --mount=type=cache,target=/root/.cache/go-build \
-    apt-get update && \
-    apt-get install -y --no-install-recommends gcc libc6-dev && \
-    rm -rf /var/lib/apt/lists/* && \
     go build -trimpath -ldflags "-s -w" -o /out/netbird-mgmt ./management
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/Dockerfile.entra-test` around lines 28 - 35, Move the OS package
installation (apt-get update && apt-get install -y --no-install-recommends gcc
libc6-dev) into its own RUN layer placed before the full source COPY . .
(ideally immediately after the existing COPY go.mod go.sum step) so the package
install is cached independently of source changes; then keep the go build step
(the RUN using --mount=type=cache targets and go build -trimpath -ldflags "-s
-w" -o /out/netbird-mgmt ./management) in its original RUN after COPY . . to
ensure only the build runs when sources change.

44-55: Run the runtime container as non-root.

Trivy flags the image as root-user. Even for a test image, adding a non-root USER is a one-liner and avoids downstream surprises when the same Dockerfile gets adapted/copied.

🔒 Proposed change
 COPY --from=builder /out/netbird-mgmt /usr/local/bin/netbird-mgmt
+
+RUN useradd --system --uid 10001 --user-group --no-create-home netbird
+USER 10001:10001
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/Dockerfile.entra-test` around lines 44 - 55, The image is running
as root; add a non-root user and switch to it before ENTRYPOINT to avoid running
/usr/local/bin/netbird-mgmt as root. Create or use a dedicated unprivileged user
(e.g., "netbird"), ensure the binary and any needed directories are
owned/readable by that user (adjust ownership after COPY with chown in the
Dockerfile), then add a USER netbird line just before ENTRYPOINT so HEALTHCHECK
and ENTRYPOINT run unprivileged; keep EXPOSE and HEALTHCHECK as-is.
management/server/integrations/entra_device/nonce_store_test.go (1)

60-91: Concurrent test can report a misleading nil error.

At lines 79-82 the worker sends on errs whenever err != nil || !ok, but when ok == false with err == nil it still enqueues a nil. The reader then prints unexpected error ... %v<nil>, obscuring the real failure (first-consume returning ok=false). Report the two conditions distinctly:

♻️ Proposed fix
-				ok, err := s.Consume(n)
-				if err != nil || !ok {
-					errs <- err
-					return
-				}
+				ok, err := s.Consume(n)
+				if err != nil {
+					errs <- err
+					return
+				}
+				if !ok {
+					errs <- fmt.Errorf("consume returned ok=false for freshly issued nonce")
+					return
+				}

(Requires adding fmt to the imports.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/nonce_store_test.go` around lines
60 - 91, The concurrent test TestNonceStore_ConcurrentIssueAndConsume can
enqueue a nil into errs when s.Consume returns ok==false but err==nil; change
the worker goroutine so it reports the two failure modes distinctly: when err !=
nil send that error (as now), and when ok == false send a non-nil error
describing the unexpected consume result (e.g., using fmt.Errorf with context
including the nonce); reference the goroutine block that calls s.Issue() and
s.Consume(), ensure the errs channel only receives non-nil errors, and add the
fmt import to the test file.
management/server/http/handler.go (2)

218-229: "Not configured" path will log a WARN on every non-Entra deployment.

Both type assertions fail in the common case where this integration isn't used, producing Entra device auth: ... skipping install warnings on every server start. Since the design is explicitly data-gated (return 404 if no integration), a missing interface is the expected shape for most deployments, not a warning condition. Consider demoting to Debugf, or only warning when at least one EntraDeviceAuth row exists in the DB.

♻️ Proposed tweak
 	dbProvider, ok := accountManager.GetStore().(entra_device_auth.DBProvider)
 	if !ok {
-		log.WithContext(ctx).Warnf("Entra device auth: store does not expose *gorm.DB; skipping install")
+		log.WithContext(ctx).Debugf("Entra device auth: store does not expose *gorm.DB; skipping install")
 		return
 	}
 	enrollerProvider, ok := accountManager.(interface {
 		AsEntraDevicePeerEnroller() entra_device.PeerEnroller
 	})
 	if !ok {
-		log.WithContext(ctx).Warnf("Entra device auth: account manager does not implement AsEntraDevicePeerEnroller; skipping install")
+		log.WithContext(ctx).Debugf("Entra device auth: account manager does not implement AsEntraDevicePeerEnroller; skipping install")
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handler.go` around lines 218 - 229, The current WARN
logs in the non-configured path are noisy because the type assertions on
accountManager.GetStore() to entra_device_auth.DBProvider and on accountManager
to the AsEntraDevicePeerEnroller interface are expected to fail for most
deployments; change these Warnf calls to debug-level logs
(log.WithContext(ctx).Debugf) OR gate the Warnf behind a check for existing
EntraDeviceAuth rows (query via the store/DBProvider for any EntraDeviceAuth
entries) so you only warn when the integration is actually present but unusable;
update the two locations referencing accountManager.GetStore(),
entra_device_auth.DBProvider, accountManager.(AsEntraDevicePeerEnroller), and
the EntraDeviceAuth existence check accordingly.

94-102: Remove the redundant /join/entra/* bypass entries.

The bypass.ShouldBypass() function is called exclusively by authMiddleware.Handler (line 76), which is attached only to the /api subrouter (line 155). Since entra_device_auth.Install() registers the Entra device-side endpoints on rootRouter rather than the /api subrouter, requests to /join/entra/challenge and /join/entra/enroll never pass through authMiddleware and therefore never consult the bypass list. These two entries are unreachable and should be removed to avoid misleading future readers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handler.go` around lines 94 - 102, Remove the two
unreachable bypass registrations: delete the calls to
bypass.AddBypassPath("/join/entra/challenge") and
bypass.AddBypassPath("/join/entra/enroll") (and their accompanying comment)
because bypass.ShouldBypass() is only used by authMiddleware.Handler which is
mounted on the /api subrouter, while entra_device_auth.Install() registers those
endpoints on rootRouter so they never consult the bypass list; update any nearby
comment to avoid implying those paths need bypass entries.
client/internal/profilemanager/config.go (1)

183-195: Flag for drift risk between the two EntraEnrollState definitions.

The struct here is a verbatim copy of client/internal/enroll/entradevice/state.go. The import-cycle justification is reasonable, but any future field addition/rename must be applied in both files or JSON round-trips through the profile will silently drop fields. A small shared types-only subpackage (e.g. client/internal/enroll/entradevice/enrollstate, no cross-deps) would collapse the duplication, or at minimum add a compile-time sanity check/test that asserts field-name parity.

Not a blocker given the comment is explicit; leaving as an optional cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/profilemanager/config.go` around lines 183 - 195, The
duplicate EntraEnrollState risks drifting from
client/internal/enroll/entradevice's definition; add a compile-time/test-time
parity check rather than changing package layout: create a unit test (e.g.
profilemanager_parity_test) that imports client/internal/enroll/entradevice and
compares reflect.TypeOf(profilemanager.EntraEnrollState) against
reflect.TypeOf(entradevice.EntraEnrollState), iterating fields and asserting
identical JSON tag names and field counts, failing the test if any mismatch is
found so future edits must keep both structs in sync (or optionally replace the
duplicate by moving the struct into a small shared types-only package and
updating references to that type).
client/cmd/entra_enroll.go (3)

245-261: Use path/filepath.Dir instead of the hand-rolled filepathDir.

The custom helper misses Windows volume/UNC handling and trailing-separator cleanup that filepath.Dir already gets right. Since configPath is a filesystem path here, the stdlib version is a drop-in replacement.

♻️ Proposed fix
-	if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil {
+	if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil {
 		return nil, fmt.Errorf("mkdir %s: %w", configPath, err)
 	}

And delete the helper:

-func filepathDir(p string) string {
-	for i := len(p) - 1; i >= 0; i-- {
-		if p[i] == '\\' || p[i] == '/' {
-			return p[:i]
-		}
-	}
-	return "."
-}

Add "path/filepath" to the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 245 - 261, Replace the custom
filepathDir helper with the standard library implementation: call
filepath.Dir(configPath) where filepathDir(configPath) is used (e.g., the
os.MkdirAll call before writing config) and remove the filepathDir function
entirely; also add "path/filepath" to the import block so the code compiles.
Ensure the rest of the function (returning cfg after util.WriteJson) remains
unchanged.

134-176: Minor: the 60s enrolment timeout also bounds util.WriteJson.

ctx is threaded into util.WriteJson at line 174 after en.Enrol has already consumed some of the 60-second budget. If enrolment is slow, config persistence can fail under ctx cancellation even though enrolment succeeded — and we'd lose the EntraEnrollState but the server would consider the device enrolled. Consider a fresh context.Background() (or a new short timeout) for the persistence step so a successful enrolment is never thrown away by the timer.

-		if err := util.WriteJson(ctx, configPath, cfg); err != nil {
+		if err := util.WriteJson(context.Background(), configPath, cfg); err != nil {
 			return fmt.Errorf("persist profile config: %w", err)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 134 - 176, The current ctx created
by context.WithTimeout(…, 60*time.Second) is passed to en.Enrol and later to
util.WriteJson, risking WriteJson failing if the enrolment consumed the timeout;
create a fresh context for persistence (e.g. context.Background() or a short
separate timeout via context.WithTimeout) and use that new context when calling
util.WriteJson (and any persistence-related work) so successful en.Enrol results
(profilemanager.EntraEnrollState) are always saved even if the original
enrolment timeout expires.

147-155: Use errors.As to survive error wrapping.

The type assertion silently falls through if the enroller ever wraps the server error (e.g. fmt.Errorf("enrol: %w", err)), so the user stops seeing the stable code / HTTP status that is the whole point of this branch. Defensive against future refactors of the enroller.

♻️ Proposed fix
 		state, err := en.Enrol(ctx)
 		if err != nil {
-			// Surface structured server errors with their stable code.
-			if structured, ok := err.(*entradevice.Error); ok {
+			// Surface structured server errors with their stable code.
+			var structured *entradevice.Error
+			if errors.As(err, &structured) {
 				cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n  %s\n",
 					structured.Code, structured.HTTPStatus, structured.Message)
 				return fmt.Errorf("enrolment failed: %s", structured.Code)
 			}
 			return fmt.Errorf("enrolment failed: %w", err)
 		}

(Also add "errors" to the import block.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 147 - 155, The error handling block
currently uses a concrete type assertion against *entradevice.Error which will
fail if the enroller wraps the server error; change it to use errors.As to
unwrap and detect entradevice.Error (ensure "errors" is added to imports), then
log the structured fields via cmd.PrintErrf (as done now) and return the stable
code as before; specifically replace the type assertion on err with an errors.As
check that assigns to a variable of type *entradevice.Error to preserve behavior
when errors are wrapped.
management/server/integrations/entra_device/store_test.go (1)

65-70: Guard the slice length before indexing.

If a regression causes ListEntraDeviceMappings to return a different count, this test panics with an opaque "index out of range" rather than failing with a useful assertion.

♻️ Proposed fix
 	all, err := s.ListEntraDeviceMappings(ctx, "acct-1")
 	require.NoError(t, err)
+	require.Len(t, all, 2)
 	assert.ElementsMatch(t,
 		[]string{m1.ID, m2.ID},
 		[]string{all[0].ID, all[1].ID},
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/store_test.go` around lines 65 -
70, The test indexes into the returned slice "all" without checking its length,
which can cause an index-out-of-range panic; update the test around the
ListEntraDeviceMappings call to first assert/require the expected length (e.g.,
require.Len/ assert.Len on "all" for 2) or otherwise guard that all has at least
two elements, then compare the IDs (m1.ID, m2.ID) against the collected IDs from
"all" (or use assert.ElementsMatch on the extracted IDs) so the failure is a
clear test assertion rather than a panic.
client/internal/enroll/entradevice/enroller_test.go (1)

11-11: Drop the stray fmt import and var _ = fmt.Sprintf sink.

fmt isn't referenced anywhere else in this test; the comment about "Go versions without error.As shortcut" is inaccurate (errors.As has no relation to fmt). Just remove the import and the sink.

♻️ Proposed cleanup
-	"fmt"
 	"math/big"
 // Compile-time assertions.
 var _ CertProvider = (*PFXProvider)(nil)
-
-// Silence unused-import warning on Go versions without error.As shortcut.
-var _ = fmt.Sprintf

Also applies to: 250-250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/enroller_test.go` at line 11, Remove the
unused fmt import and the dummy sink var _ = fmt.Sprintf from enroller_test.go
(and the duplicate occurrence at the other location), delete or adjust the
misleading comment about "Go versions without error.As shortcut" since errors.As
is unrelated to fmt, and run `go vet`/`go test` to ensure there are no other
unused-import artifacts; specifically update/remove the import line referencing
"fmt" and the variable named using fmt.Sprintf so the file no longer imports or
references fmt.
management/server/integrations/entra_device/nonce_store.go (1)

74-98: Consider replacing the O(n) constant-time scan with a direct map lookup.

Given nonces are 32 random bytes (~256 bits of entropy) URL-safe base64 encoded, timing attacks to recover a nonce are not a practical concern. The current loop is O(n) per Consume, and the break on match at Line 86 partially defeats the constant-time property anyway (iteration order of Go maps is randomized, so the early exit leaks little, but the "constant-time across entries" claim in the comment is overstated).

A plain map lookup is O(1), simpler to reason about, and secure for this input distribution.

♻️ Suggested simplification
 // Consume implements NonceStore using a constant-time equality check.
 func (s *InMemoryNonceStore) Consume(nonce string) (bool, error) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
-
-	now := time.Now().UTC()
-	// Scan with constant-time comparison to avoid leaking which nonces exist
-	// via timing. The overhead is negligible given the store's small size.
-	var found string
-	for key := range s.entries {
-		if subtle.ConstantTimeCompare([]byte(key), []byte(nonce)) == 1 {
-			found = key
-			break
-		}
-	}
-	if found == "" {
-		return false, nil
-	}
-	e := s.entries[found]
-	delete(s.entries, found)
-	if now.After(e.expiresAt) {
-		return false, nil
-	}
-	return true, nil
+	e, ok := s.entries[nonce]
+	if !ok {
+		return false, nil
+	}
+	delete(s.entries, nonce)
+	if time.Now().UTC().After(e.expiresAt) {
+		return false, nil
+	}
+	return true, nil
 }

The crypto/subtle import would also become unused after this change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/nonce_store.go` around lines 74 -
98, Replace the O(n) constant-time scan in InMemoryNonceStore.Consume with a
direct map lookup: lock s.mu, check s.entries[nonce] via normal map access to
get the entry and presence flag, delete the key if present, then check expiresAt
(e.expiresAt) and return accordingly; remove the crypto/subtle usage/import
since the constant-time loop is no longer needed. Ensure behavior (lock/unlock
via s.mu, deletion of the found entry, and returning false for missing or
expired nonces) remains unchanged.
management/server/integrations/entra_device/graph_client.go (1)

78-123: bearer() holds the mutex across the HTTP token call.

c.mu.Lock() at Line 79 is held through the full HTTPClient.Do (up to 15s default timeout). All concurrent enrollments serialize behind a single token fetch on startup and on every renewal. For the expected load this is probably fine, but consider a double-checked-lock pattern (check under lock → release → fetch → re-acquire to store) if enrollment concurrency is ever a concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/graph_client.go` around lines 78
- 123, The bearer() method currently holds c.mu for the entire HTTP call,
blocking concurrent callers; change it to a double-checked-lock: inside bearer()
check token/tokenExp under c.mu and if valid return, otherwise release the lock,
perform the HTTP request using c.HTTPClient.Do(ctx, ...) to fetch a new token,
then re-acquire c.mu and check token/tokenExp again (another goroutine may have
refreshed it) before writing c.token and c.tokenExp; reference the bearer
function, c.mu, c.token, c.tokenExp, and c.HTTPClient.Do when making the changes
and ensure errors and response parsing behave the same.
management/server/http/handlers/entra_device_auth/handler.go (1)

111-132: applyDTOToAuth mutates the stored pointer before validation.

When existing != nil, auth is the exact pointer returned by the store. applyDTOToAuth writes all fields (TenantID/ClientID/FallbackAutoGroups/…) before the tenant_id && client_id check at Line 116. If validation fails, the mutations aren't persisted via SQLStore (which re-reads), but the MemoryStore returns the live map entry — subsequent reads see a half-applied DTO even though SaveEntraDeviceAuth was never called. Apply changes to a local copy and only commit when all validation passes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/entra_device_auth/handler.go` around lines
111 - 132, The handler currently calls applyDTOToAuth on the live pointer
returned from the store (auth == existing), mutating the stored object before
validation; change the logic to make a shallow copy of the existing
EntraDeviceAuth (or construct a new types.NewEntraDeviceAuth(accountID) when
nil), apply applyDTOToAuth to that copy, run the tenant_id/client_id validation,
and only if validation passes assign/replace the original pointer or call
SaveEntraDeviceAuth with the modified copy; ensure this prevents in-place writes
from affecting MemoryStore entries when SaveEntraDeviceAuth is not invoked and
keep references to functions/types applyDTOToAuth, existing, auth,
SaveEntraDeviceAuth, and types.NewEntraDeviceAuth to locate the relevant code.
management/server/integrations/entra_device/resolution.go (1)

177-181: Remove the no-op "defensive fallback" or make it meaningful.

if i == len(candidates)-1 && len(out.AutoGroups) == 0 { out.AutoGroups = []string{} } only changes a nil slice to an empty slice on the last iteration, which is almost never observable (and isn't inside union semantics). If you actually want stable JSON rendering, initialize out.AutoGroups = []string{} once up-front instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/resolution.go` around lines 177 -
181, The fallback block that sets out.AutoGroups = []string{} only on the last
loop iteration is a no-op; either remove that conditional entirely or initialize
out.AutoGroups once before iterating candidates so it's reliably an empty slice
rather than nil; locate the loop using candidates and the index i in
resolution.go and remove the if i == len(candidates)-1 && len(out.AutoGroups) ==
0 { ... } block or add out.AutoGroups = []string{} at the point where out is
constructed prior to the loop.
management/server/entra_device_enroll.go (1)

176-180: Drop the unused validatedPeer binding.

_ = validatedPeer // peer object returned to caller is not needed by Manager is just a discarded return; use _ in the destructuring directly.

♻️ Proposed refactor
-	validatedPeer, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer)
+	_, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer)
 	if err != nil {
 		return nil, fmt.Errorf("failed to build network map for entra peer: %w", err)
 	}
-	_ = validatedPeer // peer object returned to caller is not needed by Manager
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/entra_device_enroll.go` around lines 176 - 180, The
destructuring assigns validatedPeer but then discards it; change the call to use
the blank identifier directly (e.g., _, netMap, checks, _, err :=
am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID,
newPeer)) and remove the subsequent discard line `_ = validatedPeer` to
eliminate the unused binding in entra_device_enroll.go (reference
GetValidatedPeerWithMap and the validatedPeer variable).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d94437e4-028c-4571-8bd6-08beae90f2fa

📥 Commits

Reviewing files that changed from the base of the PR and between f732b01 and e986187.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (48)
  • client/cmd/entra_enroll.go
  • client/cmd/root.go
  • client/internal/enroll/entradevice/enroller.go
  • client/internal/enroll/entradevice/enroller_test.go
  • client/internal/enroll/entradevice/provider.go
  • client/internal/enroll/entradevice/state.go
  • client/internal/profilemanager/config.go
  • docs/ENTRA_DEVICE_AUTH.md
  • go.mod
  • idp/dex/config.go
  • idp/dex/provider.go
  • idp/dex/sqlite_cgo.go
  • idp/dex/sqlite_nocgo.go
  • management/Dockerfile.entra-test
  • management/server/entra_device_enroll.go
  • management/server/http/handler.go
  • management/server/http/handlers/entra_device_auth/handler.go
  • management/server/http/handlers/entra_device_auth/wiring.go
  • management/server/http/handlers/entra_join/handler.go
  • management/server/http/handlers/entra_join/handler_test.go
  • management/server/integrations/entra_device/README.md
  • management/server/integrations/entra_device/activity.go
  • management/server/integrations/entra_device/asn1.go
  • management/server/integrations/entra_device/cert_validator.go
  • management/server/integrations/entra_device/cert_validator_test.go
  • management/server/integrations/entra_device/errors.go
  • management/server/integrations/entra_device/graph_client.go
  • management/server/integrations/entra_device/manager.go
  • management/server/integrations/entra_device/manager_test.go
  • management/server/integrations/entra_device/nonce_store.go
  • management/server/integrations/entra_device/nonce_store_test.go
  • management/server/integrations/entra_device/resolution.go
  • management/server/integrations/entra_device/resolution_test.go
  • management/server/integrations/entra_device/sql_store.go
  • management/server/integrations/entra_device/store.go
  • management/server/integrations/entra_device/store_test.go
  • management/server/integrations/entra_device/types.go
  • management/server/permissions/modules/module.go
  • management/server/types/entra_device_auth.go
  • tools/entra-test/.gitignore
  • tools/entra-test/TESTING.md
  • tools/entra-test/config/management.json
  • tools/entra-test/docker-compose.yml
  • tools/entra-test/enroll-tester/demo.go
  • tools/entra-test/enroll-tester/main.go
  • tools/entra-test/enroll-tester/tls.go
  • tools/entra-test/make-pfx/main.go
  • tools/entra-test/seed-account/main.go

Comment thread management/Dockerfile.entra-test Outdated
Comment thread management/server/http/handlers/entra_join/handler.go Outdated
Comment thread management/server/integrations/entra_device/cert_validator.go Outdated
Comment thread management/server/integrations/entra_device/graph_client.go
Comment on lines +18 to +27
type SQLStore struct {
DB *gorm.DB

// BootstrapTTL controls how long a bootstrap token remains valid.
BootstrapTTL time.Duration

mu sync.Mutex
tokens map[string]bootstrapEntry
tokenOps int
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

In-memory bootstrap tokens do not survive restart or scale to multiple instances.

The tokens map is process-local. In a multi-instance (HA/load-balanced) management deployment, a client's POST /join/entra/enroll might land on node A while the subsequent gRPC Login that echoes the bootstrap token lands on node B — node B has no knowledge of the token and will reject it. Similarly, a management restart between enroll and Login will invalidate all pending enrollments.

Given enrollment is a public-facing feature, either:

  • Persist bootstrap tokens in the DB (with TTL column + periodic GC), which also fits naturally with the existing *gorm.DB, or
  • Document prominently that this integration is currently single-instance only, and gate multi-instance deployments accordingly.

A hashed form (e.g., SHA-256 of the random token) is preferable in either store so that a DB/memory dump does not leak still-valid tokens.

Also applies to: 133-181

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/sql_store.go` around lines 18 -
27, The in-memory tokens map on SQLStore (tokens, bootstrapEntry, tokenOps) is
not resilient across restarts or multiple instances; persist bootstrap tokens in
the existing *gorm.DB instead: add a BootstrapToken model/table with fields for
id/hash (store SHA-256 of the random token), plain_token should never be stored,
created_at and expires_at (TTL) and an index on expires_at, run an automatic
migration, and replace reads/writes that reference SQLStore.tokens (and methods
handling enroll/Login/validate like the bootstrap handling code between lines
~133-181) to use DB lookups against the hashed token; also implement a periodic
GC job (or DB-native TTL cleanup) to remove expired rows and ensure token
comparison uses constant-time/hashing semantics so leaked DB dumps don’t reveal
live tokens.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as a known production gap for HA / multi-instance deployments. The SQLStore in-memory token map is explicitly documented in the package README and docs/ENTRA_DEVICE_AUTH.md (Known production gaps) as a follow-up: persist hashed tokens in the main DB with expires_at + periodic GC. Not done in this round because it is a schema/migration change that deserves its own PR; wanted to keep the review-round commit scoped to correctness fixes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.

Comment on lines +158 to +173
func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) {
s.mu.Lock()
defer s.mu.Unlock()
entry, ok := s.tokens[peerID]
if !ok {
return false, nil
}
delete(s.tokens, peerID)
if entry.token != token {
return false, nil
}
if time.Now().UTC().After(entry.expiresAt) {
return false, nil
}
return true, nil
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bootstrap token can be burned by anyone who knows/guesses the peerID.

At Line 165 the cached entry is deleted unconditionally, before the token or expiry is verified. Any caller who can invoke ConsumeBootstrapToken with a matching peerID and any wrong token will evict the legitimate entry, causing the real client's subsequent gRPC Login to fail. Since gRPC Login is the public path that feeds this check, this is a DoS surface against in-flight enrollments.

Delete only after confirming both match and non-expiry (or only on success):

🔒️ Proposed fix
 func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	entry, ok := s.tokens[peerID]
 	if !ok {
 		return false, nil
 	}
-	delete(s.tokens, peerID)
 	if entry.token != token {
 		return false, nil
 	}
+	// Token matched — consume it exactly once regardless of expiry.
+	delete(s.tokens, peerID)
 	if time.Now().UTC().After(entry.expiresAt) {
 		return false, nil
 	}
 	return true, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) {
s.mu.Lock()
defer s.mu.Unlock()
entry, ok := s.tokens[peerID]
if !ok {
return false, nil
}
delete(s.tokens, peerID)
if entry.token != token {
return false, nil
}
if time.Now().UTC().After(entry.expiresAt) {
return false, nil
}
return true, nil
}
func (s *SQLStore) ConsumeBootstrapToken(_ context.Context, peerID, token string) (bool, error) {
s.mu.Lock()
defer s.mu.Unlock()
entry, ok := s.tokens[peerID]
if !ok {
return false, nil
}
if entry.token != token {
return false, nil
}
// Token matched — consume it exactly once regardless of expiry.
delete(s.tokens, peerID)
if time.Now().UTC().After(entry.expiresAt) {
return false, nil
}
return true, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/sql_store.go` around lines 158 -
173, The ConsumeBootstrapToken method currently deletes the cached entry
unconditionally before validating the token and expiry, allowing anyone who
knows a peerID to burn the token; change the logic in ConsumeBootstrapToken so
you first look up entry := s.tokens[peerID], then verify entry.token == token
and that time.Now().UTC().Before(entry.expiresAt), and only after both checks
pass call delete(s.tokens, peerID) and return true; keep the existing mutex
(s.mu.Lock()/defer s.mu.Unlock()) and only remove the map entry on successful
validation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 07bf8cf: SQLStore.ConsumeBootstrapToken now validates prior existence + token match BEFORE calling delete, closing the DoS surface where any caller with a guessed peerID could evict the real entry by supplying a wrong token.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.

Comment thread management/server/integrations/entra_device/store.go
Comment thread management/server/integrations/entra_device/store.go
Comment on lines +40 to +63
type EntraDeviceAuth struct {
ID string `gorm:"primaryKey"`
AccountID string `gorm:"uniqueIndex"`

// Entra configuration
TenantID string
ClientID string
ClientSecret string `gorm:"column:client_secret"` // encrypt at rest via existing secret storage
Issuer string
Audience string

// Behaviour flags
Enabled bool
RequireIntuneCompliant bool
AllowTenantOnlyFallback bool
FallbackAutoGroups []string `gorm:"serializer:json"`
MappingResolution MappingResolution

// Optional continuous revalidation interval. 0 = join-only validation.
RevalidationInterval time.Duration

CreatedAt time.Time
UpdatedAt time.Time `gorm:"autoUpdateTime:false"`
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look for an existing pattern for encrypting secret columns at rest in the
# management DB (gorm serializer, wrapper types, or a secret-store helper).
rg -nP --type=go -C3 '(?i)(encrypt|seal|secret).{0,40}(serializer|Scanner|Valuer|gorm)' management/
rg -nP --type=go -C2 'gorm:"[^"]*serializer:' management/
rg -nP --type=go -C3 'ClientSecret\s+string' management/server/types/ management/server/account.go management/server/idp/

Repository: netbirdio/netbird

Length of output: 23441


🏁 Script executed:

# Check EventMeta() implementation to ensure ClientSecret is not exposed
rg -n 'func.*EventMeta' management/server/types/entra_device_auth.go -A 20

# Look for documentation on secret management
find management/ -type f \( -name "*.md" -o -name "*.go" \) | xargs rg -l "secret.*management|vault|KMS" | head -5

# Check how other sensitive fields (KeySecret, etc.) are handled in migrations or at-rest
rg -n 'KeySecret|ClientSecret' management/server/migration/ -B 2 -A 2

Repository: netbirdio/netbird

Length of output: 2881


ClientSecret is stored in plaintext despite the comment claiming encrypt-at-rest.

The field is declared as a plain string with only gorm:"column:client_secret" for column naming; no encryption serializer, wrapper type, or secret-store integration is applied. The comment "encrypt at rest via existing secret storage" does not reflect implemented functionality. An Entra app's client secret can mint app-only Graph tokens, making DB leakage a material security risk.

Before release, either:

  • Wrap the field in a type using gorm's Serializer mechanism (matching the project's pattern for sensitive fields), or
  • Store only a reference and keep the secret in a dedicated secret store (KMS/Vault/environment).

EventMeta() correctly excludes ClientSecret from event payloads, so no regression risk there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/types/entra_device_auth.go` around lines 40 - 63, The
ClientSecret field on EntraDeviceAuth is currently a plaintext string despite
the comment; change its storage to an encrypted serializer or secret reference:
either replace ClientSecret string with a wrapped type (e.g., EncryptedString or
SecretRef) and add the gorm tag `serializer:<YourSerializer>` to the
EntraDeviceAuth struct so GORM uses the project’s existing encryption
serializer, or alter the field to store only a secret identifier (e.g.,
ClientSecretRef string) and integrate retrieval from the project's secret store
when needed; update any usages of EntraDeviceAuth.ClientSecret (creation, load,
persist) to use the new type or secret lookup and ensure EventMeta() continues
to exclude the secret.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged as a known production gap. client_secret is called out in both the package README and docs/ENTRA_DEVICE_AUTH.md under Known production gaps as a must-do before the integration is exposed to a real tenant; rotating the column to the existing encrypted-column/serialiser pattern is deliberately deferred to a follow-up PR so the review round stays focused on correctness.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit.

Comment thread tools/entra-test/TESTING.md Outdated
@thvevirtue thvevirtue force-pushed the feature/entra-device-auth branch from 19db31a to 9f04a0c Compare April 24, 2026 06:31
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
management/server/integrations/entra_device/manager.go (2)

60-76: Clock field is set but never consumed.

Manager.Clock is initialized in NewManager but no method in this file reads it — the only time.Now() usage in the file is literally inside this initializer. Either plumb m.Clock() into places that need a timestamp (e.g. wherever expiry or issuance time gets stamped) or drop the field to avoid giving the false impression that time is injectable for tests.

♻️ Drop the unused field (if no caller needs it yet)
 type Manager struct {
 	Store        Store
 	NonceStore   NonceStore
 	Cert         *CertValidator
 	NewGraph     func(tenantID, clientID, clientSecret string) GraphClient
 	PeerEnroller PeerEnroller
-
-	// Clock overridable for tests.
-	Clock func() time.Time
 }

 // NewManager constructs a manager with sensible defaults. Callers are expected
 // to set PeerEnroller before handling enrolments.
 func NewManager(store Store) *Manager {
 	return &Manager{
 		Store:      store,
 		NonceStore: NewInMemoryNonceStore(0),
 		Cert:       NewCertValidator(nil, nil),
 		NewGraph: func(tenantID, clientID, clientSecret string) GraphClient {
 			return NewHTTPGraphClient(tenantID, clientID, clientSecret)
 		},
-		Clock: func() time.Time { return time.Now().UTC() },
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 60 - 76,
Remove the unused Clock injection to avoid misleading testability: delete the
Clock field from the Manager struct and remove the Clock initializer in
NewManager (the func() time.Time { return time.Now().UTC() } entry), and update
any callers/constructors that set Manager.Clock so they no longer reference it;
if a timestamp is later needed, reintroduce and wire m.Clock() into the specific
methods that stamp issuance/expiry (refer to Manager and NewManager for where to
edit).

195-230: Graph flow reads cleanly.

Nice separation: existence → enabled → transitive groups → optional compliance, with identity mutated in-place so downstream stages (mapping resolution, peer enrol, activity emission) have authoritative Entra state rather than trusting the client echo. One minor nit you may want to tidy later: CodeGroupLookupFailed is used as the bucket for device lookup and compliance lookup failures too — a more granular code set (CodeDeviceLookupFailed / CodeComplianceLookupFailed) would make operator dashboards easier to slice, but it's not worth blocking on.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 195 -
230, Replace the generic CodeGroupLookupFailed usage in verifyWithGraph with
more specific error codes: use a new CodeDeviceLookupFailed for the
graph.Device(...) error path (the NewError call that wraps "graph device lookup
failed") and use a new CodeComplianceLookupFailed for the graph.IsCompliant(...)
error path (the NewError call that wraps "graph Intune compliance lookup
failed"); update the error code constants (and any switch/metrics/dashboards
that depend on them) so NewError can be called with CodeDeviceLookupFailed and
CodeComplianceLookupFailed, leaving the transitiveMemberOf path as
CodeGroupLookupFailed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 159-179: The code trims the nonce only for NonceStore.Consume but
passes the raw encoded string to decodeNonceBytes, causing
valid-but-whitespace-padded nonces to be consumed then fail decoding; update
Manager.consumeNonce to normalize (e.g., call strings.TrimSpace) the encoded
nonce once into a local variable and use that normalized value for both
m.NonceStore.Consume(...) and decodeNonceBytes(...), ensuring you still return
the same NewError codes from consumeNonce on failures.
- Around line 119-136: The current flow creates the peer via m.enrollPeer(...)
and only afterwards calls m.issueBootstrapToken(...), which can fail and leave
an orphaned peer; change the flow to generate the bootstrap token before
creating the peer and pass it into the transactional peer-creation call (e.g.,
add a token field to EnrollPeerInput and have
enrollPeer/DefaultAccountManager.EnrollEntraDevicePeer persist the token
together with the peer and via StoreBootstrapToken inside the same DB
transaction), so peer creation + token persistence is atomic; if you cannot make
that change immediately, implement best-effort compensation inside this
function: on issueBootstrapToken error attempt to delete the created peer (call
the peer delete method) and log/report any failure to delete before returning
the error.

---

Nitpick comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 60-76: Remove the unused Clock injection to avoid misleading
testability: delete the Clock field from the Manager struct and remove the Clock
initializer in NewManager (the func() time.Time { return time.Now().UTC() }
entry), and update any callers/constructors that set Manager.Clock so they no
longer reference it; if a timestamp is later needed, reintroduce and wire
m.Clock() into the specific methods that stamp issuance/expiry (refer to Manager
and NewManager for where to edit).
- Around line 195-230: Replace the generic CodeGroupLookupFailed usage in
verifyWithGraph with more specific error codes: use a new CodeDeviceLookupFailed
for the graph.Device(...) error path (the NewError call that wraps "graph device
lookup failed") and use a new CodeComplianceLookupFailed for the
graph.IsCompliant(...) error path (the NewError call that wraps "graph Intune
compliance lookup failed"); update the error code constants (and any
switch/metrics/dashboards that depend on them) so NewError can be called with
CodeDeviceLookupFailed and CodeComplianceLookupFailed, leaving the
transitiveMemberOf path as CodeGroupLookupFailed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e0505143-307d-4315-8b15-d647554f47de

📥 Commits

Reviewing files that changed from the base of the PR and between e986187 and 19db31a.

📒 Files selected for processing (4)
  • management/server/integrations/entra_device/manager.go
  • management/server/integrations/entra_device/resolution.go
  • tools/entra-test/docker-compose.yml
  • tools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (1)
  • tools/entra-test/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/entra-test/seed-account/main.go

Comment thread management/server/integrations/entra_device/manager.go
Comment on lines +159 to +179
func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded))
if err != nil {
return nil, NewError(CodeInternal, "nonce store error", err)
}
if !ok {
return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
}
return decodeNonceBytes(encoded)
}

// decodeNonceBytes tolerates both RawURL and Std base64 alphabets.
func decodeNonceBytes(encoded string) ([]byte, error) {
if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil {
return b, nil
}
if b, err := base64.StdEncoding.DecodeString(encoded); err == nil {
return b, nil
}
return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil)
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Normalize the nonce once, before both Consume and decode.

Consume sees a trimmed string but decodeNonceBytes gets the raw encoded. If a client sends a nonce with trailing whitespace/newline (curl-from-file is a common offender), Consume succeeds against the stored trimmed value, then decodeNonceBytes fails base64 decode and returns CodeInvalidNonce. The nonce is now burned and the client has no way to retry without a fresh /challenge.

🛠️ Normalize once
 func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
-	ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded))
+	encoded = strings.TrimSpace(encoded)
+	ok, err := m.NonceStore.Consume(encoded)
 	if err != nil {
 		return nil, NewError(CodeInternal, "nonce store error", err)
 	}
 	if !ok {
 		return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
 	}
 	return decodeNonceBytes(encoded)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded))
if err != nil {
return nil, NewError(CodeInternal, "nonce store error", err)
}
if !ok {
return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
}
return decodeNonceBytes(encoded)
}
// decodeNonceBytes tolerates both RawURL and Std base64 alphabets.
func decodeNonceBytes(encoded string) ([]byte, error) {
if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil {
return b, nil
}
if b, err := base64.StdEncoding.DecodeString(encoded); err == nil {
return b, nil
}
return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil)
}
func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
encoded = strings.TrimSpace(encoded)
ok, err := m.NonceStore.Consume(encoded)
if err != nil {
return nil, NewError(CodeInternal, "nonce store error", err)
}
if !ok {
return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
}
return decodeNonceBytes(encoded)
}
// decodeNonceBytes tolerates both RawURL and Std base64 alphabets.
func decodeNonceBytes(encoded string) ([]byte, error) {
if b, err := base64.RawURLEncoding.DecodeString(encoded); err == nil {
return b, nil
}
if b, err := base64.StdEncoding.DecodeString(encoded); err == nil {
return b, nil
}
return nil, NewError(CodeInvalidNonce, "nonce is not base64", nil)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 159 -
179, The code trims the nonce only for NonceStore.Consume but passes the raw
encoded string to decodeNonceBytes, causing valid-but-whitespace-padded nonces
to be consumed then fail decoding; update Manager.consumeNonce to normalize
(e.g., call strings.TrimSpace) the encoded nonce once into a local variable and
use that normalized value for both m.NonceStore.Consume(...) and
decodeNonceBytes(...), ensuring you still return the same NewError codes from
consumeNonce on failures.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 07bf8cf: consumeNonce trims the input once and passes the trimmed value to both NonceStore.Consume and decodeNonceBytes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rate Limit Exceeded

@thvevirtue have exceeded the limit for the number of chat messages per hour. Please wait 59 minutes and 24 seconds before sending another message.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

♻️ Duplicate comments (4)
management/server/integrations/entra_device/store.go (1)

31-138: ⚠️ Potential issue | 🔴 Critical

MemoryStore doc still claims goroutine-safety while all receivers access plain maps without synchronization.

Lines 31–33 state "It is goroutine-safe" but the struct has no mutex and every receiver (GetEntraDeviceAuth, SaveEntraDeviceAuth, DeleteEntraDeviceAuth, the mapping CRUD, and the token ops) touches m.auths/m.byTenant/m.mappings/m.tokens unguarded. Concurrent access from /join/entra/enroll plus an admin mutation on the same account will race and can panic at runtime.

ConsumeBootstrapToken (lines 131–138) additionally violates its "single-use" contract: the read-compare (got, ok := m.tokens[peerID]) and the delete(m.tokens, peerID) are separate steps, so two concurrent consumers can both observe the match and both return true before either deletes.

Either add a sync.RWMutex to MemoryStore and wrap reads with RLock / writes (including the full read-compare-delete in ConsumeBootstrapToken) with Lock, or downgrade the doc comment to "not goroutine-safe; callers must serialize access" and rely on SQLStore for production.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/store.go` around lines 31 - 138,
The MemoryStore claims goroutine-safety but uses plain maps without
synchronization; add a sync.RWMutex field (e.g., mu) to MemoryStore and protect
all accessors: use mu.RLock()/RUnlock() in read-only methods
(GetEntraDeviceAuth, GetEntraDeviceAuthByTenant, ListEntraDeviceMappings,
GetEntraDeviceMapping) and mu.Lock()/Unlock() in mutating methods
(SaveEntraDeviceAuth, DeleteEntraDeviceAuth, SaveEntraDeviceMapping,
DeleteEntraDeviceMapping, StoreBootstrapToken) and perform the
read-compare-delete in ConsumeBootstrapToken entirely under mu.Lock() to ensure
single-use semantics; update the constructor NewMemoryStore to initialize the
mutex (no other behavioral changes).
management/server/types/entra_device_auth.go (1)

40-63: ⚠️ Potential issue | 🟠 Major

ClientSecret is persisted in plaintext despite the "encrypt at rest" comment.

The field is a plain string with only gorm:"column:client_secret" for column naming — no serializer:-based encrypt-at-rest wrapper, no secret-store indirection. An Entra app's client secret can mint app-only Graph tokens against the tenant, so a DB leak (backup, replica snapshot, log line) is a direct privilege-escalation vector.

Before release, either wrap the column in an encrypted serializer matching the project's pattern for other sensitive fields, or store a reference and resolve it from a dedicated secret store (KMS/Vault/env). EventMeta() already excludes the secret from activity payloads, so that side is fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/types/entra_device_auth.go` around lines 40 - 63, The
ClientSecret field on the EntraDeviceAuth struct is stored as a plain string;
change it to either use the project's encrypted serializer or to store a
secret-reference resolved from the secret store: replace the ClientSecret string
field (in EntraDeviceAuth) with the project's encrypted/serialized secret
wrapper (matching other sensitive fields' serializer tag) or with a reference
type (e.g., SecretRef) and update persistence/loader logic to resolve it from
KMS/Vault/env; ensure GORM tags reflect the serializer or reference column and
adjust any create/update/load code paths that read/write
EntraDeviceAuth.ClientSecret accordingly.
management/server/integrations/entra_device/manager.go (2)

119-136: ⚠️ Potential issue | 🟠 Major

Orphan-peer risk: bootstrap-token failure after peer creation.

If issueBootstrapToken fails (DB blip on StoreBootstrapToken, or the vanishingly rare rand.Read error) the peer has already been created with its WG pubkey. The client surfaces the error, and any retry collides with the duplicate-pubkey guard downstream, leaving the device stuck until an admin manually deletes the orphan row.

Prefer generating the token up-front and persisting it inside the same transaction as peer creation (extend EnrollPeerInput/EnrollEntraDevicePeer). If atomic persistence is too invasive short-term, at least best-effort compensate by deleting the just-created peer on issueBootstrapToken error before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 119 -
136, The current flow calls enrollPeer(...) first and then
issueBootstrapToken(...), which can leave an orphan peer if token issuance
fails; you should either (preferred) generate the token before creating the peer
and extend EnrollPeerInput / EnrollEntraDevicePeer to accept and persist the
bootstrap token within the same DB transaction inside enrollPeer, or (minimal
change) implement best-effort compensation: on error from
issueBootstrapToken(ctx, result.PeerID) call a peer-deletion routine (e.g.,
deletePeer/deleteEntraDevicePeer by PeerID) to remove the just-created peer, log
any deletion error, and then return the original token issuance error; update
the call sites and types referenced (enrollPeer, issueBootstrapToken,
EnrollPeerInput/EnrollEntraDevicePeer, and any deletePeer method) accordingly so
retries won’t hit duplicate-pubkey collisions.

159-168: ⚠️ Potential issue | 🟡 Minor

Normalize the nonce once, then use the same value for both Consume and decode.

Consume is called with strings.TrimSpace(encoded) but decodeNonceBytes gets the raw encoded string. A client that sends the nonce with trailing whitespace/newline will succeed on the store (which was presumably issued with the trimmed value match) and then fail base64 decode — burning the nonce and forcing a fresh /challenge round-trip.

🛠️ Normalize once
 func (m *Manager) consumeNonce(encoded string) ([]byte, error) {
-	ok, err := m.NonceStore.Consume(strings.TrimSpace(encoded))
+	encoded = strings.TrimSpace(encoded)
+	ok, err := m.NonceStore.Consume(encoded)
 	if err != nil {
 		return nil, NewError(CodeInternal, "nonce store error", err)
 	}
 	if !ok {
 		return nil, NewError(CodeInvalidNonce, "nonce is unknown, already consumed, or expired", nil)
 	}
 	return decodeNonceBytes(encoded)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 159 -
168, In consumeNonce, normalize the incoming nonce once (e.g., n :=
strings.TrimSpace(encoded)) and use that normalized value for both
m.NonceStore.Consume and decodeNonceBytes so you don't consume a trimmed nonce
but attempt to decode the raw string; update calls to m.NonceStore.Consume(n)
and return decodeNonceBytes(n) while keeping existing error handling in NewError
unchanged.
🧹 Nitpick comments (13)
client/internal/enroll/entradevice/state.go (1)

11-36: Duplicated struct with profilemanager.EntraEnrollState — consider a shared location or conversion helper.

This struct is intentionally duplicated in client/internal/profilemanager/config.go to avoid an import cycle (documented there). Without a conversion helper or shared schema test, future changes to one copy (e.g. adding a field) can silently drift out of the other, breaking round-tripping between the enroller result and the persisted profile. Consider moving the struct to a small leaf package (e.g. client/internal/enroll/entradevice/state or a new shared/entradevice/state) that both packages can import, or add a unit test that marshals one and unmarshals into the other to guard against drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/state.go` around lines 11 - 36, The
EntraEnrollState struct is duplicated between EntraEnrollState in this file and
profilemanager.EntraEnrollState; extract the canonical type into a small shared
package (e.g., enroll/entradevice/state) and update both usages to import that
package (replace the local struct declarations with the shared type), or
alternatively add a unit test that marshals the enroller's EntraEnrollState and
unmarshals into profilemanager.EntraEnrollState (and vice versa) to detect
drift; locate occurrences by the type name EntraEnrollState and the
profilemanager.EntraEnrollState reference and update imports/usages accordingly.
tools/entra-test/seed-account/main.go (1)

110-120: Local-only default DSN with baked-in password.

Acknowledged this is a dev fixture (hence the NOSONAR suppression and comment). Consider also gating the default credentials behind an explicit -i-know-this-is-insecure flag, or pulling them from the compose file's .env — so a contributor can't accidentally run seed-account against a misconfigured prod database with the weak fallback taking effect. Non-blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/seed-account/main.go` around lines 110 - 120, The defaultDSN
function currently returns hard-coded local dev credentials (using envOrDefault)
which can silently be used against a non-dev DB; change it so the insecure
fallback is only used when an explicit opt-in is provided (e.g., add and check a
boolean flag like allowInsecureDefaults or a CLI flag `-i-know-this-is-insecure`
before returning the baked-in user/pass/db), or alternatively load the
credentials from the docker-compose/.env file instead of hard-coding; update
defaultDSN to consult that flag (or the parsed .env) and fail fast with a clear
error if neither a DSN nor explicit opt-in is present.
client/internal/enroll/entradevice/enroller_test.go (3)

249-250: Drop the fmt.Sprintf placeholder.

fmt isn't referenced anywhere else in this test. Remove the import and the var _ = fmt.Sprintf line; they don't do anything useful and are confusing (error.As has always been available in the errors package and is already used via require.ErrorAs).

♻️ Suggested change
 	"encoding/json"
-	"fmt"
 	"math/big"
-// Silence unused-import warning on Go versions without error.As shortcut.
-var _ = fmt.Sprintf
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/enroller_test.go` around lines 249 - 250,
Remove the unused fmt import and the placeholder variable var _ = fmt.Sprintf
from the test; locate the import block in enroller_test.go and delete the "fmt"
import along with the var _ = fmt.Sprintf line so there are no leftover unused
symbols (fmt.Sprintf) in the file.

209-210: Assert on LoadPFX errors in these setup lines.

Lines 210 and 228 drop the error from LoadPFX. If PFX generation regresses, later assertions will trip on a nil cert with an opaque NPE. Same fix as elsewhere in the file — use require.NoError.

♻️ Suggested change
-	cert, _ := LoadPFX(pfxPath, pfxPass)
+	cert, err := LoadPFX(pfxPath, pfxPass)
+	require.NoError(t, err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/enroller_test.go` around lines 209 - 210,
The test setup drops the error from LoadPFX which can lead to nil cert NPEs;
update the setup where makeSelfSignedPFX and LoadPFX are called (symbols:
makeSelfSignedPFX, LoadPFX, cert) to assert the LoadPFX call succeeded by using
require.NoError(t, err) (or equivalent) immediately after calling LoadPFX and
before using cert so failures surface clearly during test setup.

114-117: Don't silently swallow malformed enroll request bodies in the fake server.

If JSON decoding fails, fs.gotEnroll becomes a zero-value pointer and downstream assertions (tenant-1, device-001, cert chain length, etc.) will fail with confusing "expected X got empty" messages that hide the real cause. Fail the test loudly instead.

♻️ Suggested change
-	mux.HandleFunc("/join/entra/enroll", func(w http.ResponseWriter, r *http.Request) {
-		var req enrollReq
-		_ = json.NewDecoder(r.Body).Decode(&req)
-		fs.gotEnroll = &req
+	mux.HandleFunc("/join/entra/enroll", func(w http.ResponseWriter, r *http.Request) {
+		var req enrollReq
+		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+			t.Errorf("fake /enroll: decode body: %v", err)
+			w.WriteHeader(http.StatusBadRequest)
+			return
+		}
+		fs.gotEnroll = &req
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/enroller_test.go` around lines 114 - 117,
The fake server handler registered with mux.HandleFunc("/join/entra/enroll",
...) currently ignores JSON decode errors for enrollReq and sets fs.gotEnroll to
a zero-value pointer; change the handler to check the error returned by
json.NewDecoder(r.Body).Decode(&req) and, if non-nil, fail the test loudly
(e.g., call t.Fatalf or equivalent) with the decode error included, and only
assign fs.gotEnroll = &req when decode succeeds so downstream assertions see the
real cause instead of empty values.
management/server/integrations/entra_device/nonce_store.go (2)

54-72: Expired-nonce GC only runs on Issue — expired entries can linger indefinitely on idle stores.

If enrollments are bursty followed by a long quiet period, expired nonces sit in entries until the next Issue. Not a security issue (they'll be rejected on consume), but a minor memory footprint concern on long-lived single-node deployments.

Either run a periodic goroutine GC or also GC on Consume. Given the existing gcLocked is already mutex-aware, calling it in Consume every N operations is a one-liner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/nonce_store.go` around lines 54 -
72, Expired nonces are only purged in Issue, so entries can accumulate when the
store is idle; update Consume (method on InMemoryNonceStore) to increment ops
and call gcLocked(time.Now().UTC()) under the same gcEvery condition as Issue
(use s.mu lock already used in Consume) so GC runs every N operations even on
consume paths; reference the existing fields/methods: s.ops, s.gcEvery, s.mu,
s.entries, Consume, Issue, and gcLocked to mirror the one-line GC invocation
pattern used in Issue.

74-98: The "constant-time" nonce scan doesn't buy anything and is strictly slower than a direct map lookup.

Two issues with the current Consume:

  1. The comment claims constant-time to avoid leaking which nonces exist, but (a) Go map iteration order is randomized so the first-match break still leaks the match's iteration position, and (b) subtle.ConstantTimeCompare short-circuits on unequal byte-length inputs. Net: it's not constant-time in a meaningful sense.
  2. The threat model doesn't require timing resistance here — nonces are 32 random bytes from crypto/rand and base64url-encoded (~256 bits of entropy). An attacker cannot guess a value to probe for. A direct map[string]entry lookup is O(1), correct, and easier to reason about.
♻️ Suggested replacement
-// Consume implements NonceStore using a constant-time equality check.
 func (s *InMemoryNonceStore) Consume(nonce string) (bool, error) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 
-	now := time.Now().UTC()
-	// Scan with constant-time comparison to avoid leaking which nonces exist
-	// via timing. The overhead is negligible given the store's small size.
-	var found string
-	for key := range s.entries {
-		if subtle.ConstantTimeCompare([]byte(key), []byte(nonce)) == 1 {
-			found = key
-			break
-		}
-	}
-	if found == "" {
-		return false, nil
-	}
-	e := s.entries[found]
-	delete(s.entries, found)
-	if now.After(e.expiresAt) {
-		return false, nil
-	}
-	return true, nil
+	e, ok := s.entries[nonce]
+	if !ok {
+		return false, nil
+	}
+	delete(s.entries, nonce)
+	if time.Now().UTC().After(e.expiresAt) {
+		return false, nil
+	}
+	return true, nil
 }

Drop the crypto/subtle import once this is applied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/nonce_store.go` around lines 74 -
98, The Consume method in InMemoryNonceStore should use a direct map lookup
instead of the constant-time iteration: replace the loop that uses
subtle.ConstantTimeCompare with an existence check like e, ok :=
s.entries[nonce]; if !ok return false, nil; then delete(s.entries, nonce) and
check e.expiresAt as before; update the function Consume accordingly and remove
the unused crypto/subtle import. This keeps the existing mutex usage
(s.mu.Lock()/defer s.mu.Unlock()) and uses the entries map and e.expiresAt
identifiers to locate the changes.
management/Dockerfile.entra-test (1)

38-44: Runtime image runs as root (Trivy DS-0002).

For a local test image this is low-risk, but adding a non-root USER keeps you aligned with the production management/Dockerfile and silences the scanner.

♻️ Suggested minimal hardening
 RUN apt-get update && \
     apt-get install -y --no-install-recommends ca-certificates curl && \
-    rm -rf /var/lib/apt/lists/*
+    rm -rf /var/lib/apt/lists/* && \
+    useradd --system --uid 10001 --no-create-home --shell /usr/sbin/nologin netbird

 COPY --from=builder /out/netbird-mgmt /usr/local/bin/netbird-mgmt
+
+USER 10001:10001
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/Dockerfile.entra-test` around lines 38 - 44, The image currently
runs as root (base FROM debian:bookworm-slim) — create a non-root user and
switch to it: add steps after copying the binary (COPY --from=builder
/out/netbird-mgmt /usr/local/bin/netbird-mgmt) to create a dedicated user/group,
chown the /usr/local/bin/netbird-mgmt binary (and any runtime dirs) to that
user, and set USER to that non-root account so the container runs unprivileged;
keep the changes minimal and ensure permissions allow execution by the new user.
client/internal/enroll/entradevice/provider.go (1)

110-130: SignNonce fallback for opaque signers silently uses PSS defaults and won't interop with RSA PKCS1v15 TPM signers.

The default: branch calls p.signer.Sign(rand.Reader, digest, crypto.SHA256). For opaque RSA signers this produces a PKCS1v15 signature (the standard library's crypto.Signer contract for RSA), which the server accepts — good. But the comment says "they MUST handle RSA-PSS" which is the opposite of what the fallback actually requests (a nil opts / crypto.SHA256 yields PKCS1v15, not PSS). Either fix the comment or pass &rsa.PSSOptions{Hash: crypto.SHA256} for consistency with the explicit RSA branch.

Non-blocking until the CNG/TPM provider lands, but worth aligning before that PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/enroll/entradevice/provider.go` around lines 110 - 130, The
fallback in PFXProvider.SignNonce currently calls p.signer.Sign(rand.Reader,
digest[:], crypto.SHA256) which requests PKCS1v15 for RSA signers but the
comment says signers must handle RSA-PSS; update the implementation and comment
to be consistent: call p.signer.Sign(rand.Reader, digest[:],
&rsa.PSSOptions{Hash: crypto.SHA256}) for the default branch (so opaque signers
are requested to produce PSS like rsa.SignPSS) and adjust the comment to reflect
that RSA-PSS is required (or alternatively, if you intend to allow PKCS1v15,
change the comment to state PKCS1v15 fallback); reference SignNonce,
PFXProvider, p.signer.Sign and rsa.SignPSS when making the change.
tools/entra-test/enroll-tester/main.go (1)

143-147: Optional: mirror the enroller's dual-alphabet tolerance for nonce decoding.

client/internal/enroll/entradevice/enroller.go tolerates both base64.RawURLEncoding and base64.StdEncoding when decoding the challenge nonce, while this tester only accepts RawURLEncoding. The server contract is URL-safe base64, so this works today, but keeping the tester as tolerant as the real client would prevent it from being the first thing that breaks if the encoding is ever tweaked on either side.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/enroll-tester/main.go` around lines 143 - 147, The tester
currently decodes the challenge nonce using only base64.RawURLEncoding (in the
block that decodes challenge.Nonce into rawNonce); update that logic to mirror
enroller.go by trying RawURLEncoding first and if it fails, retry with
base64.StdEncoding before calling die. Ensure you reference the same variables
and error path (challenge.Nonce, rawNonce, err) and preserve the existing
die("decode nonce: %v", err) behavior when both decodes fail.
client/cmd/entra_enroll.go (1)

147-155: Use errors.As and filepath.Dir instead of the hand-rolled equivalents.

Two small refactors:

  1. err.(*entradevice.Error) at line 149 misses wrapped errors. The enroller returns the structured *Error directly in some paths, but if any future path wraps it with fmt.Errorf("...: %w", err) (and some existing paths in enroller.go already do), the structured branch silently skips and you lose the stable Code/HTTPStatus in the user-facing output.
  2. filepathDir reimplements path/filepath.Dir but doesn't handle Windows drive letters, trailing separators, or "."/"/" edge cases the same way the stdlib does. This CLI already targets Windows (see doc comment about C:\ProgramData\NetBird\device.pfx).
♻️ Proposed fix
@@
 import (
 	"context"
+	"errors"
 	"fmt"
 	"net/url"
 	"os"
+	"path/filepath"
 	"strings"
 	"time"
@@
-			if structured, ok := err.(*entradevice.Error); ok {
+			var structured *entradevice.Error
+			if errors.As(err, &structured) {
 				cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n  %s\n",
 					structured.Code, structured.HTTPStatus, structured.Message)
 				return fmt.Errorf("enrolment failed: %s", structured.Code)
 			}
@@
-	if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil {
+	if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil {
@@
-func filepathDir(p string) string {
-	for i := len(p) - 1; i >= 0; i-- {
-		if p[i] == '\\' || p[i] == '/' {
-			return p[:i]
-		}
-	}
-	return "."
-}

Also applies to: 254-261

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 147 - 155, The code currently uses a
direct type assertion err.(*entradevice.Error) which misses wrapped errors and a
custom filepathDir helper that reimplements stdlib behavior; update the error
handling in the Entra enrolment command to use errors.As to detect a wrapped
*entradevice.Error (capture into a variable, then print
structured.Code/structured.HTTPStatus/structured.Message and return the stable
Code) and replace usages of filepathDir with the standard filepath.Dir to ensure
correct path semantics on Windows and edge cases; apply the same changes to the
other occurrence noted (the block around lines 254-261) so both places handle
wrapped errors and paths correctly.
management/server/integrations/entra_device/manager.go (1)

271-279: Defensively guard against a nil result with nil err.

PeerEnroller is an injected interface; a misbehaving implementation that returns (nil, nil) will crash the manager at the caller (result.PeerID on line 123 / response struct access). Cheap belt-and-braces:

🛡️ Suggested fix
 	result, err := m.PeerEnroller.EnrollEntraDevicePeer(ctx, in)
 	if err != nil {
 		if e, ok := AsError(err); ok {
 			return nil, e
 		}
 		return nil, NewError(CodeInternal, "peer enrolment failed", err)
 	}
+	if result == nil {
+		return nil, NewError(CodeInternal, "peer enroller returned nil result", nil)
+	}
 	return result, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/manager.go` around lines 271 -
279, The handler calling PeerEnroller.EnrollEntraDevicePeer (result, err :=
m.PeerEnroller.EnrollEntraDevicePeer(ctx, in)) must defensively check for a nil
result even when err == nil to avoid panics when accessing result.PeerID; after
the call, if result == nil { return nil, NewError(CodeInternal, "peer enrolment
returned nil result", nil) } (or include context) so the manager method fails
gracefully; update the method containing that call to perform this nil-check
before returning result.
management/server/types/entra_device_auth.go (1)

160-170: Consider defaulting Enabled: false and enabling only after credentials are validated.

NewEntraDeviceAuth currently constructs with Enabled: true despite having empty TenantID, ClientID, and ClientSecret. While the handler validates these fields before calling SaveEntraDeviceAuth (preventing the unconfigured state from persisting in production), the default is semantically misleading—a new integration should not be marked enabled until it is actually configured and validated.

The challenge endpoint (/join/entra/challenge) does not check integration status and will issue nonces regardless, which is intentional for the protocol design, but setting Enabled: true upfront creates ambiguity about what enabled means and could lead to bugs if future code paths assume the enabled state implies valid credentials. Defaulting to Enabled: false makes the intent clearer: the integration is ready only after the admin handler sets credentials and flips the flag to true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/types/entra_device_auth.go` around lines 160 - 170,
NewEntraDeviceAuth currently sets Enabled: true even when
TenantID/ClientID/ClientSecret are empty; change the constructor to default
Enabled: false so new Entra integrations are created disabled until credentials
are validated and explicitly enabled (e.g., by the admin handler that calls
SaveEntraDeviceAuth). Update NewEntraDeviceAuth to set Enabled to false and
ensure any code paths that expect an enabled integration (such as the admin
enable flow) explicitly flip Enabled to true after validation; keep
/join/entra/challenge behavior unchanged as noted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/enroll/entradevice/provider.go`:
- Around line 101-108: DeviceID currently returns p.leaf.Subject.CommonName
verbatim which can mismatch server-side extractDeviceID (which trims whitespace
and strips a leading "CN="); update PFXProvider.DeviceID to normalize the CN the
same way as the server by trimming whitespace and removing a leading "CN="
prefix before returning, and add "strings" to the import block; reference the
PFXProvider.DeviceID function and ensure the normalized value is used for the
return and error check.

In `@docs/ENTRA_DEVICE_AUTH.md`:
- Around line 323-347: Fix the markdown lint issues in the "Live-tenant
verification results" section by adding a blank line immediately before and
after the verification-results table and by making every table row use leading
and trailing pipes consistently (ensure rows such as the one currently missing
pipes are updated); also update the fenced architecture diagram (the code block
at the earlier fenced block around line 272) to include a language hint like
```text so it renders as a plain preformatted block. Ensure changes touch the
"Live-tenant verification results" table and the fenced architecture diagram in
this document.
- Around line 319-322: The client_secret column is stored in plain-text and must
be moved to the existing encrypted-column pattern before shipping; either
implement a migration that renames or replaces client_secret with the
encrypted-column variant and updates all code paths that read/write it to use
the encryption helpers (use the same encryption utilities and column naming
convention used elsewhere in the codebase), update schema/migrations and any
tests to use the new encrypted field, and update this doc to remove the warning;
if you don't want to implement it in this PR, open a tracked issue and add its
link to this document so the rotation of client_secret to the encrypted-column
pattern is scheduled and cannot be dropped.

In `@management/server/entra_device_enroll.go`:
- Around line 207-219: The parseIP function mishandles IPv6 addresses with
brackets because it strips the port by last-colon and leaves brackets, causing
net.ParseIP to fail; update parseIP to try net.ParseIP(s) first, then if that
fails call net.SplitHostPort(s) (or attempt to SplitHostPort and on success
parse the host portion) so that inputs like "[::1]:8080" and "host:port" are
handled correctly, and return nil only if parsing still fails; reference the
parseIP function and use net.SplitHostPort to extract the host for parsing.

In `@management/server/http/handler.go`:
- Around line 94-102: Remove the redundant bypass registrations for the Entra
device enrolment endpoints: delete the calls to
bypass.AddBypassPath("/join/entra/challenge") and
bypass.AddBypassPath("/join/entra/enroll") since these routes are mounted on
rootRouter (unauthenticated) and authMiddleware is only applied to the /api
subrouter; leave other bypass.AddBypassPath calls (e.g., for /api/instance,
/api/setup) intact. Ensure no other code paths expect those bypass entries and
run tests to confirm behavior.

In `@management/server/http/handlers/entra_device_auth/wiring.go`:
- Around line 49-110: Install currently permits a nil Permissions silently via
buildPermissionChecker returning an always-true checker, which creates a
fail-open auth bypass; update Install to require w.Permissions != nil (return an
error like "entra_device_auth.Install: Permissions is nil") or introduce an
explicit opt-in flag on Wiring (e.g. InsecureAllowAllForTests bool) and only
allow the permit-all branch in buildPermissionChecker when that flag is true;
adjust buildPermissionChecker to return a deny-by-default checker when pm is nil
and ensure Install enforces the chosen policy (check Wiring.Permissions or
Wiring.InsecureAllowAllForTests) before constructing the Handler.

In `@management/server/integrations/entra_device/graph_client.go`:
- Around line 205-213: In the pagination loop handling wrap.NextLink in
graph_client.go, don't silently break when strings.HasPrefix(wrap.NextLink,
c.baseURL()) is false; instead return an error (or at least log the unexpected
host and return an error) so callers know pagination was truncated; modify the
block that currently does "break" to construct a descriptive error that includes
wrap.NextLink and c.baseURL(), log it via the existing logger and return it up
the call path (the change should touch the pagination loop around wrap.NextLink,
the code that currently sets path/q, and the function's error return flow).

In `@management/server/integrations/entra_device/manager.go`:
- Around line 268-270: The code currently sets EntraDeviceMapping =
MatchedMappingIDs[0] which is misleading for union resolution; change the
assignment so that when resolved.ResolutionMode == "union" you leave
EntraDeviceMapping empty/cleared (and rely on resolved.MatchedMappingIDs for the
full set), and only set EntraDeviceMapping = resolved.MatchedMappingIDs[0] for
strict_priority (or equivalent single-winner modes); update the branch around
the existing assignment (referencing EntraDeviceMapping, MatchedMappingIDs, and
ResolutionMode) and ensure downstream audit/event usage (e.g.,
PeerAddedWithEntraDevice / EnrollPeerInput.EntraDeviceMapping) will instead
reference MatchedMappingIDs for union cases.
- Around line 183-212: The cert validation currently allows a missing
EntraDeviceID to pass through; after calling m.Cert.Validate in
validateCertAndDeviceID, explicitly check if identity.EntraDeviceID == "" and
return a NewError(CodeInvalidCertChain, "certificate missing Entra device ID",
nil) so certs that lack the device ID extracted by extractDeviceID (from
Cert.Validate) are rejected early; do this before any comparison with
req.EntraDeviceID or calling graph.Device.
- Around line 66-76: NewManager currently calls NewCertValidator(nil, nil)
leaving Cert.TrustRoots nil so chain verification is skipped; update
NewManager/Manager to fail-fast when m.Cert.TrustRoots == nil (or require an
explicit opt-in flag like Manager.AllowUntrustedCertChains) by returning an
error or panicking at construction or by rejecting in Enroll if TrustRoots is
nil, or alternatively initialize Cert.TrustRoots with the Microsoft Device Auth
CA set by default; also update the NewManager godoc and README to state that
TrustRoots must be configured for production and document the
AllowUntrustedCertChains override if you add it.

In `@management/server/integrations/entra_device/README.md`:
- Around line 57-69: The "Phase 1" checklist in
management/server/integrations/entra_device/README.md is out of date and marks
items as incomplete even though they are implemented (e.g., wiring into the main
router via the entra-join handler, integration in
DefaultAccountManager.EnrollEntraDevicePeer, and admin CRUD handlers under
management/server/http/handlers/entra_device_auth/). Update the checklist to
reflect current status by toggling the relevant items to complete or remove the
Phase 1 section and instead add a single pointer to docs/ENTRA_DEVICE_AUTH.md as
the canonical source of truth so readers won't be misled.

In `@tools/entra-test/config/management.json`:
- Line 37: The file contains a hardcoded DataStoreEncryptionKey value
("DataStoreEncryptionKey") in the JSON; add a short explicit comment near that
field (or an entry in tools/entra-test/TESTING.md) stating this key is for the
ephemeral test harness only and MUST NOT be used in production or copied as a
template, and mention to rotate/remove it if repurposed; ensure the comment
references the DataStoreEncryptionKey name so reviewers and future readers see
the warning clearly.
- Around line 2-24: The test fixture currently points at production hosts in the
"Stuns", "TURNConfig.Turns", and "Signal" entries (entries using keys "Proto"
and "URI"); update these to target the local docker-compose test services (or
remove them if in-harness peer connectivity isn’t required): replace
"stun:stun.wiretrustee.com:3468" and any "turn:turn.wiretrustee.com:..." URIs
under "TURNConfig.Turns" with the local STUN/TURN service hostnames/ports used
by your docker-compose, and change the "Signal" entry to the local signal
service and correct protocol for gRPC (set "Proto" to the gRPC-appropriate value
and point "URI" at the docker-compose signal endpoint) so the harness uses local
test services rather than production hosts.

In `@tools/entra-test/seed-account/main.go`:
- Around line 46-67: The seed currently always calls types.NewNetwork() and
assigns it to Account.Network before upserting with db.Save(acct), which causes
re-seeding to re-randomize the subnet; change the flow to first attempt to load
an existing types.Account by the provided account ID from the DB, and if an
account exists keep its Network (do not call types.NewNetwork()), otherwise
create the new network and assign it; if you still want to permit overwrites, at
minimum emit a warning via the logger when you are about to overwrite an
existing Account.Network (reference types.NewNetwork, types.Account.Network and
the db.Save(acct) upsert path to locate where to add the load-check and
conditional assignment).

---

Duplicate comments:
In `@management/server/integrations/entra_device/manager.go`:
- Around line 119-136: The current flow calls enrollPeer(...) first and then
issueBootstrapToken(...), which can leave an orphan peer if token issuance
fails; you should either (preferred) generate the token before creating the peer
and extend EnrollPeerInput / EnrollEntraDevicePeer to accept and persist the
bootstrap token within the same DB transaction inside enrollPeer, or (minimal
change) implement best-effort compensation: on error from
issueBootstrapToken(ctx, result.PeerID) call a peer-deletion routine (e.g.,
deletePeer/deleteEntraDevicePeer by PeerID) to remove the just-created peer, log
any deletion error, and then return the original token issuance error; update
the call sites and types referenced (enrollPeer, issueBootstrapToken,
EnrollPeerInput/EnrollEntraDevicePeer, and any deletePeer method) accordingly so
retries won’t hit duplicate-pubkey collisions.
- Around line 159-168: In consumeNonce, normalize the incoming nonce once (e.g.,
n := strings.TrimSpace(encoded)) and use that normalized value for both
m.NonceStore.Consume and decodeNonceBytes so you don't consume a trimmed nonce
but attempt to decode the raw string; update calls to m.NonceStore.Consume(n)
and return decodeNonceBytes(n) while keeping existing error handling in NewError
unchanged.

In `@management/server/integrations/entra_device/store.go`:
- Around line 31-138: The MemoryStore claims goroutine-safety but uses plain
maps without synchronization; add a sync.RWMutex field (e.g., mu) to MemoryStore
and protect all accessors: use mu.RLock()/RUnlock() in read-only methods
(GetEntraDeviceAuth, GetEntraDeviceAuthByTenant, ListEntraDeviceMappings,
GetEntraDeviceMapping) and mu.Lock()/Unlock() in mutating methods
(SaveEntraDeviceAuth, DeleteEntraDeviceAuth, SaveEntraDeviceMapping,
DeleteEntraDeviceMapping, StoreBootstrapToken) and perform the
read-compare-delete in ConsumeBootstrapToken entirely under mu.Lock() to ensure
single-use semantics; update the constructor NewMemoryStore to initialize the
mutex (no other behavioral changes).

In `@management/server/types/entra_device_auth.go`:
- Around line 40-63: The ClientSecret field on the EntraDeviceAuth struct is
stored as a plain string; change it to either use the project's encrypted
serializer or to store a secret-reference resolved from the secret store:
replace the ClientSecret string field (in EntraDeviceAuth) with the project's
encrypted/serialized secret wrapper (matching other sensitive fields' serializer
tag) or with a reference type (e.g., SecretRef) and update persistence/loader
logic to resolve it from KMS/Vault/env; ensure GORM tags reflect the serializer
or reference column and adjust any create/update/load code paths that read/write
EntraDeviceAuth.ClientSecret accordingly.

---

Nitpick comments:
In `@client/cmd/entra_enroll.go`:
- Around line 147-155: The code currently uses a direct type assertion
err.(*entradevice.Error) which misses wrapped errors and a custom filepathDir
helper that reimplements stdlib behavior; update the error handling in the Entra
enrolment command to use errors.As to detect a wrapped *entradevice.Error
(capture into a variable, then print
structured.Code/structured.HTTPStatus/structured.Message and return the stable
Code) and replace usages of filepathDir with the standard filepath.Dir to ensure
correct path semantics on Windows and edge cases; apply the same changes to the
other occurrence noted (the block around lines 254-261) so both places handle
wrapped errors and paths correctly.

In `@client/internal/enroll/entradevice/enroller_test.go`:
- Around line 249-250: Remove the unused fmt import and the placeholder variable
var _ = fmt.Sprintf from the test; locate the import block in enroller_test.go
and delete the "fmt" import along with the var _ = fmt.Sprintf line so there are
no leftover unused symbols (fmt.Sprintf) in the file.
- Around line 209-210: The test setup drops the error from LoadPFX which can
lead to nil cert NPEs; update the setup where makeSelfSignedPFX and LoadPFX are
called (symbols: makeSelfSignedPFX, LoadPFX, cert) to assert the LoadPFX call
succeeded by using require.NoError(t, err) (or equivalent) immediately after
calling LoadPFX and before using cert so failures surface clearly during test
setup.
- Around line 114-117: The fake server handler registered with
mux.HandleFunc("/join/entra/enroll", ...) currently ignores JSON decode errors
for enrollReq and sets fs.gotEnroll to a zero-value pointer; change the handler
to check the error returned by json.NewDecoder(r.Body).Decode(&req) and, if
non-nil, fail the test loudly (e.g., call t.Fatalf or equivalent) with the
decode error included, and only assign fs.gotEnroll = &req when decode succeeds
so downstream assertions see the real cause instead of empty values.

In `@client/internal/enroll/entradevice/provider.go`:
- Around line 110-130: The fallback in PFXProvider.SignNonce currently calls
p.signer.Sign(rand.Reader, digest[:], crypto.SHA256) which requests PKCS1v15 for
RSA signers but the comment says signers must handle RSA-PSS; update the
implementation and comment to be consistent: call p.signer.Sign(rand.Reader,
digest[:], &rsa.PSSOptions{Hash: crypto.SHA256}) for the default branch (so
opaque signers are requested to produce PSS like rsa.SignPSS) and adjust the
comment to reflect that RSA-PSS is required (or alternatively, if you intend to
allow PKCS1v15, change the comment to state PKCS1v15 fallback); reference
SignNonce, PFXProvider, p.signer.Sign and rsa.SignPSS when making the change.

In `@client/internal/enroll/entradevice/state.go`:
- Around line 11-36: The EntraEnrollState struct is duplicated between
EntraEnrollState in this file and profilemanager.EntraEnrollState; extract the
canonical type into a small shared package (e.g., enroll/entradevice/state) and
update both usages to import that package (replace the local struct declarations
with the shared type), or alternatively add a unit test that marshals the
enroller's EntraEnrollState and unmarshals into profilemanager.EntraEnrollState
(and vice versa) to detect drift; locate occurrences by the type name
EntraEnrollState and the profilemanager.EntraEnrollState reference and update
imports/usages accordingly.

In `@management/Dockerfile.entra-test`:
- Around line 38-44: The image currently runs as root (base FROM
debian:bookworm-slim) — create a non-root user and switch to it: add steps after
copying the binary (COPY --from=builder /out/netbird-mgmt
/usr/local/bin/netbird-mgmt) to create a dedicated user/group, chown the
/usr/local/bin/netbird-mgmt binary (and any runtime dirs) to that user, and set
USER to that non-root account so the container runs unprivileged; keep the
changes minimal and ensure permissions allow execution by the new user.

In `@management/server/integrations/entra_device/manager.go`:
- Around line 271-279: The handler calling PeerEnroller.EnrollEntraDevicePeer
(result, err := m.PeerEnroller.EnrollEntraDevicePeer(ctx, in)) must defensively
check for a nil result even when err == nil to avoid panics when accessing
result.PeerID; after the call, if result == nil { return nil,
NewError(CodeInternal, "peer enrolment returned nil result", nil) } (or include
context) so the manager method fails gracefully; update the method containing
that call to perform this nil-check before returning result.

In `@management/server/integrations/entra_device/nonce_store.go`:
- Around line 54-72: Expired nonces are only purged in Issue, so entries can
accumulate when the store is idle; update Consume (method on InMemoryNonceStore)
to increment ops and call gcLocked(time.Now().UTC()) under the same gcEvery
condition as Issue (use s.mu lock already used in Consume) so GC runs every N
operations even on consume paths; reference the existing fields/methods: s.ops,
s.gcEvery, s.mu, s.entries, Consume, Issue, and gcLocked to mirror the one-line
GC invocation pattern used in Issue.
- Around line 74-98: The Consume method in InMemoryNonceStore should use a
direct map lookup instead of the constant-time iteration: replace the loop that
uses subtle.ConstantTimeCompare with an existence check like e, ok :=
s.entries[nonce]; if !ok return false, nil; then delete(s.entries, nonce) and
check e.expiresAt as before; update the function Consume accordingly and remove
the unused crypto/subtle import. This keeps the existing mutex usage
(s.mu.Lock()/defer s.mu.Unlock()) and uses the entries map and e.expiresAt
identifiers to locate the changes.

In `@management/server/types/entra_device_auth.go`:
- Around line 160-170: NewEntraDeviceAuth currently sets Enabled: true even when
TenantID/ClientID/ClientSecret are empty; change the constructor to default
Enabled: false so new Entra integrations are created disabled until credentials
are validated and explicitly enabled (e.g., by the admin handler that calls
SaveEntraDeviceAuth). Update NewEntraDeviceAuth to set Enabled to false and
ensure any code paths that expect an enabled integration (such as the admin
enable flow) explicitly flip Enabled to true after validation; keep
/join/entra/challenge behavior unchanged as noted.

In `@tools/entra-test/enroll-tester/main.go`:
- Around line 143-147: The tester currently decodes the challenge nonce using
only base64.RawURLEncoding (in the block that decodes challenge.Nonce into
rawNonce); update that logic to mirror enroller.go by trying RawURLEncoding
first and if it fails, retry with base64.StdEncoding before calling die. Ensure
you reference the same variables and error path (challenge.Nonce, rawNonce, err)
and preserve the existing die("decode nonce: %v", err) behavior when both
decodes fail.

In `@tools/entra-test/seed-account/main.go`:
- Around line 110-120: The defaultDSN function currently returns hard-coded
local dev credentials (using envOrDefault) which can silently be used against a
non-dev DB; change it so the insecure fallback is only used when an explicit
opt-in is provided (e.g., add and check a boolean flag like
allowInsecureDefaults or a CLI flag `-i-know-this-is-insecure` before returning
the baked-in user/pass/db), or alternatively load the credentials from the
docker-compose/.env file instead of hard-coding; update defaultDSN to consult
that flag (or the parsed .env) and fail fast with a clear error if neither a DSN
nor explicit opt-in is present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 41c325b7-7d63-40f4-b533-638100a0091a

📥 Commits

Reviewing files that changed from the base of the PR and between 19db31a and 9f04a0c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (48)
  • client/cmd/entra_enroll.go
  • client/cmd/root.go
  • client/internal/enroll/entradevice/enroller.go
  • client/internal/enroll/entradevice/enroller_test.go
  • client/internal/enroll/entradevice/provider.go
  • client/internal/enroll/entradevice/state.go
  • client/internal/profilemanager/config.go
  • docs/ENTRA_DEVICE_AUTH.md
  • go.mod
  • idp/dex/config.go
  • idp/dex/provider.go
  • idp/dex/sqlite_cgo.go
  • idp/dex/sqlite_nocgo.go
  • management/Dockerfile.entra-test
  • management/server/entra_device_enroll.go
  • management/server/http/handler.go
  • management/server/http/handlers/entra_device_auth/handler.go
  • management/server/http/handlers/entra_device_auth/wiring.go
  • management/server/http/handlers/entra_join/handler.go
  • management/server/http/handlers/entra_join/handler_test.go
  • management/server/integrations/entra_device/README.md
  • management/server/integrations/entra_device/activity.go
  • management/server/integrations/entra_device/asn1.go
  • management/server/integrations/entra_device/cert_validator.go
  • management/server/integrations/entra_device/cert_validator_test.go
  • management/server/integrations/entra_device/errors.go
  • management/server/integrations/entra_device/graph_client.go
  • management/server/integrations/entra_device/manager.go
  • management/server/integrations/entra_device/manager_test.go
  • management/server/integrations/entra_device/nonce_store.go
  • management/server/integrations/entra_device/nonce_store_test.go
  • management/server/integrations/entra_device/resolution.go
  • management/server/integrations/entra_device/resolution_test.go
  • management/server/integrations/entra_device/sql_store.go
  • management/server/integrations/entra_device/store.go
  • management/server/integrations/entra_device/store_test.go
  • management/server/integrations/entra_device/types.go
  • management/server/permissions/modules/module.go
  • management/server/types/entra_device_auth.go
  • tools/entra-test/.gitignore
  • tools/entra-test/TESTING.md
  • tools/entra-test/config/management.json
  • tools/entra-test/docker-compose.yml
  • tools/entra-test/enroll-tester/demo.go
  • tools/entra-test/enroll-tester/main.go
  • tools/entra-test/enroll-tester/tls.go
  • tools/entra-test/make-pfx/main.go
  • tools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (6)
  • tools/entra-test/.gitignore
  • go.mod
  • management/server/integrations/entra_device/nonce_store_test.go
  • tools/entra-test/docker-compose.yml
  • management/server/integrations/entra_device/types.go
  • tools/entra-test/TESTING.md
🚧 Files skipped from review as they are similar to previous changes (13)
  • management/server/integrations/entra_device/asn1.go
  • tools/entra-test/enroll-tester/demo.go
  • management/server/integrations/entra_device/activity.go
  • management/server/integrations/entra_device/resolution_test.go
  • management/server/permissions/modules/module.go
  • management/server/http/handlers/entra_device_auth/handler.go
  • tools/entra-test/make-pfx/main.go
  • management/server/http/handlers/entra_join/handler_test.go
  • client/internal/enroll/entradevice/enroller.go
  • management/server/integrations/entra_device/store_test.go
  • management/server/integrations/entra_device/sql_store.go
  • idp/dex/sqlite_nocgo.go
  • management/server/integrations/entra_device/manager_test.go

Comment thread client/internal/enroll/entradevice/provider.go Outdated
Comment thread docs/ENTRA_DEVICE_AUTH.md Outdated
Comment thread docs/ENTRA_DEVICE_AUTH.md Outdated
Comment thread management/server/entra_device_enroll.go
Comment thread management/server/http/handler.go Outdated
Comment thread management/server/integrations/entra_device/manager.go Outdated
Comment thread management/server/integrations/entra_device/README.md Outdated
Comment thread tools/entra-test/config/management.json Outdated
Comment thread tools/entra-test/config/management.json
Comment thread tools/entra-test/seed-account/main.go Outdated
Security / correctness fixes surfaced by CodeRabbit on PR netbirdio#5977:

* cert_validator: reject leaves with empty Subject CN instead of returning
  an empty EntraDeviceID; manager.validateCertAndDeviceID fails closed on
  the same condition as a second line of defence.
* graph_client: escape deviceID for OData single-quoted string literals in
  Device() and IsCompliant() to close an injection vector (pair with the
  dev-only TrustRoots=nil path). Pagination now fails closed when
  @odata.nextLink host differs from the configured base URL instead of
  silently truncating group enumeration.
* manager.consumeNonce: normalise the encoded nonce once so Consume and
  decodeNonceBytes see the same value (avoids burning the nonce on
  trailing whitespace).
* manager.enrollPeer -> issueBootstrapToken: on token-issue failure after
  peer creation, best-effort compensate by deleting the orphan peer.
  Required a new DeletePeer hook on the PeerEnroller interface, wired
  through DefaultAccountManager via the existing deletePeers() helper.
  Test doubles get a no-op stub.
* manager.enrollPeer: under union resolution, leave EntraDeviceMapping
  empty instead of misleading audit metadata with an arbitrary
  MatchedMappingIDs[0].
* MemoryStore: add sync.Mutex and lock every receiver so the doc-claimed
  goroutine-safety actually holds; the ConsumeBootstrapToken
  read+compare+delete is now atomic.
* SQLStore.ConsumeBootstrapToken: validate the supplied token (and prior
  existence) BEFORE deleting so a caller with a wrong token cannot evict
  the real entry.
* entra_join handler: switch to http.MaxBytesReader so oversized bodies
  produce a real 413 Payload Too Large instead of a silent 400
  invalid_json. realIP now only consults X-Forwarded-For / X-Real-IP when
  Handler.TrustForwardedHeaders is explicitly set (opt-in trusted-proxy
  gate for an unauthenticated endpoint); XFF left-most entry is taken,
  RemoteAddr has its port stripped.
* entra_device_enroll.parseIP: handle [host]:port IPv6 form via
  net.SplitHostPort.
* entra_device_auth.wiring: fail-closed in Install when Permissions is
  nil; the permit-all checker is now only reachable via an explicit
  InsecureAllowAllForTests opt-in.
* http.handler: drop the redundant /join/entra/* bypass registrations
  (those routes are on rootRouter and never cross authMiddleware); upgrade
  the best-effort install logs from Warnf to Errorf and include concrete
  types so a wiring mismatch is actually visible.
* client/entradevice.PFXProvider.DeviceID: normalise Subject CN the same
  way the server does (trim whitespace, strip a leading CN=) so the
  client-supplied device id and the server-extracted one cannot mismatch
  over whitespace.

Test harness + docs:

* management/Dockerfile.entra-test: healthcheck switched from
  GET /join/entra/challenge (stateful, allocates a single-use nonce every
  30s and 404s on fresh containers) to a stateless TCP probe against the
  cmux-multiplexed port.
* tools/entra-test/TESTING.md: replace stale 33081 references with the
  actual multiplexed port 33073 everywhere (instructions, curl examples,
  enroll-tester --url, troubleshooting).
* tools/entra-test/config/management.json: null out STUN/TURN/Signal
  entries that pointed at production wiretrustee.com hosts; add an
  explicit _comment noting the DataStoreEncryptionKey is publicly visible
  and must NOT be reused in production.
* tools/entra-test/seed-account: reuse the existing Account.Network on
  re-seed so re-running the harness mid-debug does not randomise the
  subnet out from under already-allocated peers.
* entra_device/README.md + docs/ENTRA_DEVICE_AUTH.md: refresh the stale
  Phase 1 checklist; add a "Known production gaps" section tracking
  plaintext ClientSecret, in-memory bootstrap tokens (HA/restart) and the
  TrustRoots=nil default as explicit follow-ups. Fix markdown lint
  (table pipe style / blank-line separators / fenced-code language
  identifier for the architecture diagram).
@thvevirtue
Copy link
Copy Markdown
Author

@mlsmaycon @braginini @pascal-fischer @lixmal — when one of you has a moment, could I get a maintainer look on this?

Status since the PR was opened:

  • CLA signed (green).
  • Snyk green.
  • SonarCloud was red on 18 findings; addressed in 9f04a0c (complexity refactors in resolution.go + manager.go; hard-coded-creds findings parametrised via NB_TEST_PG_* env vars). SonarCloud re-scan is currently queued against the new HEAD and should flip to green.
  • CodeRabbit posted 28 line-level comments; all addressed (or explicitly deferred with an inline acknowledgement) in 07bf8cf. Replies are on each thread.

The net of 07bf8cf is:

  • Correctness / security: reject cert leaves with empty CN (two layers of defence), OData-escape deviceID in Graph $filter, fail-closed on unknown Graph pagination hosts, validate-before-delete in ConsumeBootstrapToken (DoS guard), add sync.Mutex to MemoryStore for real goroutine-safety, normalise nonce before consume+decode, fail-closed wiring when Permissions == nil (opt-in InsecureAllowAllForTests for unit tests), drop redundant /join/entra bypass paths, upgrade the best-effort install log from Warnf to Errorf with the concrete %T.
  • Resilience: best-effort orphan-peer compensation when issueBootstrapToken fails after peer creation (new PeerEnroller.DeletePeer hook + deletePeers on the account manager).
  • HTTP hardening: http.MaxBytesReader for a real 413 instead of 400 invalid_json; realIP only consults XFF / X-Real-IP when TrustForwardedHeaders is explicitly set; RemoteAddr port stripped; parseIP handles [host]:port IPv6 form.
  • Client: PFXProvider.DeviceID normalises CN the same way the server does.
  • Test harness / docs: Dockerfile healthcheck switched to a TCP probe (no more nonce churn / 404-on-boot), stale 33081 references in TESTING.md replaced with the actual multiplexed 33073, management.json stripped of production wiretrustee.com references, seed-account preserves an existing Account.Network on re-seed. Docs now carry a Known production gaps section tracking the three items I consciously deferred (plaintext client_secret, in-memory bootstrap tokens in HA, TrustRoots=nil default).

Three known production gaps deliberately deferred to follow-ups so this PR stays reviewable:

  1. Encrypting EntraDeviceAuth.client_secret at rest (needs a gorm serializer or secret-ref column; schema migration).
  2. Persisting bootstrap tokens in the DB (needs a new table + hashed storage + GC job, for HA / restart resilience).
  3. Flipping the CertValidator.TrustRoots == nil default to fail-closed (needs the Entra device auth CA set plumbed through wiring; should land alongside the CA delivery, not as a one-line default flip).

All three are explicitly documented in docs/ENTRA_DEVICE_AUTH.md and the package README as follow-ups so nothing ships silently. Happy to do them as separate PRs — would you prefer them landed before merge, or tracked as issues after?

Full re-test matrix (server unit tests, handler integration tests, client tests, and the live-tenant verification matrix in docs/ENTRA_DEVICE_AUTH.md#live-tenant-verification-results) is still green.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
tools/entra-test/seed-account/main.go (1)

63-84: Re-seed resets untracked scalar fields on an existing account row.

db.Save(acct) with a freshly constructed types.Account will overwrite CreatedAt, DomainCategory, IsDomainPrimaryAccount, embedded DNSSettings, etc. to zero/now on every run. For a harness seeder this is usually acceptable, but if testers rely on stable CreatedAt or tweak other fields between runs they'll lose them silently. Consider using clause.OnConflict{DoNothing: true} (or a Where(...).Updates(...) on just the fields you own) when the account already exists, and/or preserving existing.CreatedAt when reusing existing.Network.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/entra-test/seed-account/main.go` around lines 63 - 84, The current
seeder constructs a new types.Account (acct) and calls db.Save(acct), which
overwrites fields like CreatedAt, DomainCategory, IsDomainPrimaryAccount and
embedded DNSSettings on re-seed; change the save logic to avoid clobbering
existing rows by either using a CREATE-ON-CONFLICT do-nothing/merge strategy
(use gorm's clause.OnConflict{DoNothing:true} or a targeted OnConflict.Update
with only the fields you intend to update) or by first loading the existing
account (e.g., query by Id), preserving fields such as existing.CreatedAt and
existing.Network when present, and then performing a selective Update/Save of
owned fields only instead of blindly db.Save(acct).
management/server/integrations/entra_device/graph_client.go (1)

162-223: OData escape + fail-closed pagination LGTM.

odataEscape correctly handles the single-quote case per OData v4, and the pagination branch now returns an error instead of silently truncating the group list — exactly what was needed to avoid under-enrolling a device.

One small optional nit: at lines 104/147 body, _ := io.ReadAll(resp.Body) drops the read error. If the body is truncated mid-stream you'd log/return a misleading "unexpected EOF during JSON decode" rather than the underlying network error. Low priority.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/graph_client.go` around lines 162
- 223, In graphGET, the code currently ignores the error from
io.ReadAll(resp.Body) (body, _ := io.ReadAll(resp.Body)); change this to capture
and handle the error (body, err := io.ReadAll(resp.Body)) and if err != nil
return or log a wrapped error (including response status and URL) instead of
proceeding to json decode — this ensures network/truncate errors surface (refer
to function graphGET and the resp.Body read) and avoids misleading "unexpected
EOF during JSON decode" messages.
management/server/integrations/entra_device/sql_store.go (1)

161-177: Validate-before-delete fix looks correct; consider constant-time comparison.

The DoS concern from the prior round (any caller with a known peerID burning the legitimate entry with a wrong token) is cleanly resolved by deleting only after the token matches. One follow-up worth considering: entry.token != token at Line 168 is a non-constant-time comparison. The practical risk is small because bootstrap tokens are 256 bits of entropy hex-encoded, but crypto/subtle.ConstantTimeCompare is the idiomatic default for secret equality and costs nothing.

🛡️ Proposed tightening
 import (
 	"context"
+	"crypto/subtle"
 	"errors"
 	"sync"
 	"time"
@@
-	if entry.token != token {
-		return false, nil
-	}
+	if subtle.ConstantTimeCompare([]byte(entry.token), []byte(token)) != 1 {
+		return false, nil
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/sql_store.go` around lines 161 -
177, Replace the plain string comparison in ConsumeBootstrapToken with a
constant-time comparison: convert both entry.token and the incoming token to
byte slices and use crypto/subtle.ConstantTimeCompare to decide equality,
keeping the current logic of deleting the map entry only after a successful
match (delete(s.tokens, peerID) remains where it is) and preserving the expiry
check that follows; update imports to include crypto/subtle.
management/server/entra_device_enroll.go (1)

108-152: Consider a warn-level log when the retry loop exhausts.

The IP/DNS-label retry loop (max 10 attempts) is sound, but a persistent exhaustion is operationally interesting — it almost always signals either a very narrow network.Net pool or a DNS-label hot-spot worth alerting on. The per-attempt Tracef at Line 144 is too quiet for that.

♻️ Optional refactor
 	if err != nil {
+		log.WithContext(ctx).Warnf("entra enrolment IP/label allocation exhausted after %d attempts for account %s: %v",
+			maxAttempts, input.AccountID, err)
 		return nil, fmt.Errorf("failed to add entra-enrolled peer after %d attempts: %w", maxAttempts, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/entra_device_enroll.go` around lines 108 - 152, The retry
loop allocating IP/DNS (for example variables maxAttempts, freeIP, freeLabel and
function isUniqueConstraintError) currently only traces per-attempt collisions
but does not warn when all attempts fail; modify the post-loop error path (the
block that checks if err != nil after the for-loop) to emit a warn-level log via
log.WithContext(ctx).WithFields(...) including dns_label, ip and the number of
attempts (maxAttempts) and the final error before returning the formatted error,
so exhausted-retry conditions are visible at warn level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/ENTRA_DEVICE_AUTH.md`:
- Around line 35-37: The zero-touch onboarding example uses the wrong command:
replace the scheduled task invocation of "netbird up --management-url
https://<your-mgmt>/join/entra" with the one-shot Entra enrollment command
"netbird entra-enroll" (which reads the PFX and tenant flags and performs
enrollment against /join/entra); if the daemon should also run afterwards,
append a separate "netbird up" step to start the long-running agent. Ensure the
doc mentions the enrollment is one-shot (entra-enroll) and that "netbird up" is
used only to start the daemon.

In `@management/server/http/handlers/entra_join/handler.go`:
- Around line 20-33: The Handler's TrustForwardedHeaders field is never set by
the wiring layer; add a TrustForwardedHeaders bool to the Wiring struct used by
entra_device_auth.Install() and propagate it into the Handler after creating it
with NewHandler. Update the Wiring type to include TrustForwardedHeaders, change
the call site in wiring.go so that after handler := NewHandler(m) you set
handler.TrustForwardedHeaders = wiring.TrustForwardedHeaders (or pass it into
Install if that pattern is used), and ensure the Install function
accepts/forwards the Wiring so realIP() will honor configured
X-Forwarded-For/X-Real-IP behavior.

In `@management/server/integrations/entra_device/cert_validator.go`:
- Around line 95-105: The VerifyOptions currently sets KeyUsages to include
x509.ExtKeyUsageAny which disables EKU checks; update the opts construction in
cert_validator.go (the VerifyOptions passed to leaf.Verify) to remove
x509.ExtKeyUsageAny and use only x509.ExtKeyUsageClientAuth (i.e., KeyUsages:
[]x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}) so certificate EKU is enforced
when calling leaf.Verify(opts); adjust any related tests or callers expecting
lax EKU behavior.

---

Nitpick comments:
In `@management/server/entra_device_enroll.go`:
- Around line 108-152: The retry loop allocating IP/DNS (for example variables
maxAttempts, freeIP, freeLabel and function isUniqueConstraintError) currently
only traces per-attempt collisions but does not warn when all attempts fail;
modify the post-loop error path (the block that checks if err != nil after the
for-loop) to emit a warn-level log via log.WithContext(ctx).WithFields(...)
including dns_label, ip and the number of attempts (maxAttempts) and the final
error before returning the formatted error, so exhausted-retry conditions are
visible at warn level.

In `@management/server/integrations/entra_device/graph_client.go`:
- Around line 162-223: In graphGET, the code currently ignores the error from
io.ReadAll(resp.Body) (body, _ := io.ReadAll(resp.Body)); change this to capture
and handle the error (body, err := io.ReadAll(resp.Body)) and if err != nil
return or log a wrapped error (including response status and URL) instead of
proceeding to json decode — this ensures network/truncate errors surface (refer
to function graphGET and the resp.Body read) and avoids misleading "unexpected
EOF during JSON decode" messages.

In `@management/server/integrations/entra_device/sql_store.go`:
- Around line 161-177: Replace the plain string comparison in
ConsumeBootstrapToken with a constant-time comparison: convert both entry.token
and the incoming token to byte slices and use crypto/subtle.ConstantTimeCompare
to decide equality, keeping the current logic of deleting the map entry only
after a successful match (delete(s.tokens, peerID) remains where it is) and
preserving the expiry check that follows; update imports to include
crypto/subtle.

In `@tools/entra-test/seed-account/main.go`:
- Around line 63-84: The current seeder constructs a new types.Account (acct)
and calls db.Save(acct), which overwrites fields like CreatedAt, DomainCategory,
IsDomainPrimaryAccount and embedded DNSSettings on re-seed; change the save
logic to avoid clobbering existing rows by either using a CREATE-ON-CONFLICT
do-nothing/merge strategy (use gorm's clause.OnConflict{DoNothing:true} or a
targeted OnConflict.Update with only the fields you intend to update) or by
first loading the existing account (e.g., query by Id), preserving fields such
as existing.CreatedAt and existing.Network when present, and then performing a
selective Update/Save of owned fields only instead of blindly db.Save(acct).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 557eb50c-28ba-4ee1-b589-2050bfa49b1d

📥 Commits

Reviewing files that changed from the base of the PR and between 9f04a0c and 07bf8cf.

📒 Files selected for processing (19)
  • client/internal/enroll/entradevice/provider.go
  • docs/ENTRA_DEVICE_AUTH.md
  • management/Dockerfile.entra-test
  • management/server/entra_device_enroll.go
  • management/server/http/handler.go
  • management/server/http/handlers/entra_device_auth/wiring.go
  • management/server/http/handlers/entra_join/handler.go
  • management/server/http/handlers/entra_join/handler_test.go
  • management/server/integrations/entra_device/README.md
  • management/server/integrations/entra_device/cert_validator.go
  • management/server/integrations/entra_device/graph_client.go
  • management/server/integrations/entra_device/manager.go
  • management/server/integrations/entra_device/manager_test.go
  • management/server/integrations/entra_device/sql_store.go
  • management/server/integrations/entra_device/store.go
  • tools/entra-test/TESTING.md
  • tools/entra-test/config/management.json
  • tools/entra-test/enroll-tester/demo.go
  • tools/entra-test/seed-account/main.go
✅ Files skipped from review due to trivial changes (2)
  • management/server/integrations/entra_device/README.md
  • tools/entra-test/TESTING.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • client/internal/enroll/entradevice/provider.go
  • tools/entra-test/enroll-tester/demo.go

Comment thread docs/ENTRA_DEVICE_AUTH.md
Comment on lines +20 to +33
// Handler serves the /join/entra/* routes.
type Handler struct {
Manager *ed.Manager

// TrustForwardedHeaders enables reading X-Forwarded-For / X-Real-IP from
// inbound requests. Should ONLY be enabled when the management server
// sits behind a known-good reverse proxy that strips client-supplied
// instances of these headers; otherwise callers can dictate the source
// IP persisted on the peer (the /join/entra path is unauthenticated).
TrustForwardedHeaders bool
}

// NewHandler constructs a handler using the given manager.
func NewHandler(m *ed.Manager) *Handler { return &Handler{Manager: m} }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the wiring layer sets Handler.TrustForwardedHeaders from config.
rg -nP -C3 'NewHandler\s*\(|TrustForwardedHeaders' --type=go

Repository: netbirdio/netbird

Length of output: 9852


🏁 Script executed:

cat -n management/server/http/handlers/entra_device_auth/wiring.go | sed -n '75,90p'

Repository: netbirdio/netbird

Length of output: 753


🏁 Script executed:

rg -nP 'joinHandler.*TrustForwardedHeaders|TrustForwardedHeaders.*=' --type=go -A2 -B2

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

head -n 100 management/server/http/handlers/entra_device_auth/wiring.go | tail -n 50

Repository: netbirdio/netbird

Length of output: 2050


🏁 Script executed:

rg -n 'TrustForwardedHeaders' --type=go

Repository: netbirdio/netbird

Length of output: 503


🏁 Script executed:

rg -n 'type Wiring struct' -A 20 management/server/http/handlers/entra_device_auth/

Repository: netbirdio/netbird

Length of output: 2278


🏁 Script executed:

rg -n 'entra_device_auth.Install|entra_device_auth\.Install' --type=go -B 3 -A 3

Repository: netbirdio/netbird

Length of output: 2687


🏁 Script executed:

rg -n 'entrajoin\.NewHandler|&entrajoin\.Handler' --type=go

Repository: netbirdio/netbird

Length of output: 256


🏁 Script executed:

sed -n '220,240p' management/server/http/handler.go

Repository: netbirdio/netbird

Length of output: 619


TrustForwardedHeaders is not wired from configuration—always defaults to false.

The TrustForwardedHeaders field exists and is correctly used in realIP(), but the wiring layer never sets it. Neither entra_device_auth.Install() (the production path) nor NewHandler() accept or configure this value, so deployments behind a reverse proxy will default to ignoring X-Forwarded-For / X-Real-IP headers. The Wiring struct passed to Install needs a TrustForwardedHeaders field, and the handler instantiation in wiring.go should assign it after calling NewHandler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/entra_join/handler.go` around lines 20 - 33,
The Handler's TrustForwardedHeaders field is never set by the wiring layer; add
a TrustForwardedHeaders bool to the Wiring struct used by
entra_device_auth.Install() and propagate it into the Handler after creating it
with NewHandler. Update the Wiring type to include TrustForwardedHeaders, change
the call site in wiring.go so that after handler := NewHandler(m) you set
handler.TrustForwardedHeaders = wiring.TrustForwardedHeaders (or pass it into
Install if that pattern is used), and ensure the Install function
accepts/forwards the Wiring so realIP() will honor configured
X-Forwarded-For/X-Real-IP behavior.

Comment on lines +95 to +105
opts := x509.VerifyOptions{
Roots: v.TrustRoots,
Intermediates: intermediates,
CurrentTime: now,
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageAny},
}
if _, err := leaf.Verify(opts); err != nil {
return nil, NewError(CodeInvalidCertChain,
"certificate chain did not verify against configured trust roots", err)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Go's crypto/x509 VerifyOptions.KeyUsages, what is the behavior of including ExtKeyUsageAny alongside ExtKeyUsageClientAuth? Does it match leaves with any EKU or only those with the anyExtendedKeyUsage OID?

💡 Result:

Including ExtKeyUsageAny alongside ExtKeyUsageClientAuth in VerifyOptions.KeyUsages sets anyKeyUsage to true because the loop checks for ExtKeyUsageAny and breaks upon finding it. When anyKeyUsage is true, the chain is accepted without calling checkChainForKeyUsage on the chain certificates. This means it matches leaves (and chains) with any EKU, as no specific EKU matching is performed on the chain. It does not restrict to only those with the anyExtendedKeyUsage OID (2.5.29.37); instead, ExtKeyUsageAny in KeyUsages overrides other specified usages like ExtKeyUsageClientAuth, accepting any key usage.

Citations:


Remove ExtKeyUsageAny from KeyUsages — it disables EKU validation entirely.

Including ExtKeyUsageAny in VerifyOptions.KeyUsages causes Go's Certificate.Verify to skip EKU checking altogether. The presence of ExtKeyUsageAny in the list sets an internal flag that bypasses the checkChainForKeyUsage check, accepting leaves with any EKU—not just those with the anyExtendedKeyUsage OID or ClientAuth. This means leaves signed for CodeSigning, EmailProtection, or any other purpose will pass verification. Use KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth} to enforce strict device-client-auth validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/integrations/entra_device/cert_validator.go` around lines
95 - 105, The VerifyOptions currently sets KeyUsages to include
x509.ExtKeyUsageAny which disables EKU checks; update the opts construction in
cert_validator.go (the VerifyOptions passed to leaf.Verify) to remove
x509.ExtKeyUsageAny and use only x509.ExtKeyUsageClientAuth (i.e., KeyUsages:
[]x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}) so certificate EKU is enforced
when calling leaf.Verify(opts); adjust any related tests or callers expecting
lax EKU behavior.

…spots)

SonarCloud's rescan on the previous review-round commit flagged 10 new
code smells + 2 medium security hotspots on the Dockerfile. Refactored
to bring each touched function back under the cognitive-complexity and
length thresholds (all in scope for this PR):

* management/server/integrations/entra_device/cert_validator.go:
  split Validate into decodeCertChain / checkTimeWindow / verifyChain /
  verifyProofOfPossession helpers; extract the 16-line ECDSA case in
  verifySignature into verifyECDSA (+ mirror RSA into verifyRSA).

* management/server/entra_device_enroll.go:
  split EnrollEntraDevicePeer into validateEnrollInput, buildEntraPeer,
  geoEnrichPeer, allocateEntraPeer (with pickDNSLabel + persistEntraPeerTx
  inner helpers), and emitEntraPeerAddedEvent. Same external behaviour;
  the orchestrator is now short enough to read top-to-bottom.

* client/cmd/entra_enroll.go:
  extracted the cobra RunE body into runEntraEnroll + seven single-
  responsibility helpers (preflightEntraEnroll, loadOrCreateProfileConfig,
  maybeSkipAlreadyEnrolled, derivedWGPubKey, performEntraEnrolment,
  persistEnrolmentState, printEnrolmentSuccess).

* tools/entra-test/enroll-tester/main.go:
  split the 121-line main into parseFlags, buildHTTPClient, mustMakeCert,
  mustMakeWGPubKey, fetchChallenge, signChallenge, postEnroll,
  printEnrolmentSuccess. Introduced a small testerOpts struct so
  helpers don't take a forest of parameters.

* client/internal/profilemanager/config.go / config_entra.go:
  moved EntraEnrollState (the ~20-line struct + godoc) out to
  config_entra.go so config.go drops under the 750-line threshold. No
  API change for callers.

Dockerfile hotspots (test harness only):

* management/Dockerfile.entra-test: add a system netbird:netbird user
  and a USER directive so the runtime stage doesn't run as root.
  Annotate the source-tree COPY with a // NOSONAR explaining that this
  Dockerfile is a dev-only harness (no secrets expected in-tree, not
  used by goreleaser / production image path).

Intentionally NOT addressed (pre-existing, not introduced by this PR,
documented here for the record):

* management/server/http/handler.go NewAPIHandler cognitive complexity
  + 23 parameters. Function existed before this PR; the only edit in
  this branch adds one call (installEntraDeviceAuth) that does not
  affect the signature.

All affected unit tests still pass (entra_device, entra_join,
client/entradevice). Production go build is clean under CGO_ENABLED=0
across management + client + tools.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
management/server/entra_device_enroll.go (1)

63-70: ⚠️ Potential issue | 🟠 Major

Orphan peer leak when GetValidatedPeerWithMap (or any post-persist step) fails.

By the time you reach lines 63–70 the peer is already in the DB (line 165–167), the activity event has been emitted (line 219), and the caller has no way to know the peer was created — EnrollEntraDevicePeer returns (nil, err), so the entra_device.Manager can't invoke DeletePeer for compensation (it only does so on bootstrap-token failure, and only with a PeerID from a successful EnrollPeerResult). A subsequent retry from the device will hit the "peer has already been registered" guard in validateEnrollInput (line 89–91) and fail permanently until an operator intervenes.

Since the comment at lines 297–302 already asserts that "the client will resync these on first Sync", the simplest fix is to make GetValidatedPeerWithMap best-effort — log and return the result with an empty PeerConfig/Checks. Alternatively, defer a compensation call on named error.

🛡️ Proposed fix — make network-map lookup best-effort so enrolment doesn't leak an orphan peer
-	_, netMap, checks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer)
-	if err != nil {
-		return nil, fmt.Errorf("failed to build network map for entra peer: %w", err)
-	}
-
-	return &ed.EnrollPeerResult{
-		PeerID:        newPeer.ID,
-		NetbirdConfig: netbirdConfigToMap(am, settings),
-		PeerConfig:    peerConfigToMap(newPeer, netMap),
-		Checks:        checksToMaps(checks),
-	}, nil
+	var netMap *types.NetworkMap
+	var checks any
+	if _, nm, cks, _, err := am.networkMapController.GetValidatedPeerWithMap(ctx, false, input.AccountID, newPeer); err != nil {
+		// Peer is already persisted; failing here would leak an orphan with no compensation.
+		// The client will resync on first gRPC Sync, so fall through with a minimal response.
+		log.WithContext(ctx).Warnf("failed to build network map for entra peer %s (client will resync): %v", newPeer.ID, err)
+	} else {
+		netMap, checks = nm, cks
+	}
+
+	return &ed.EnrollPeerResult{
+		PeerID:        newPeer.ID,
+		NetbirdConfig: netbirdConfigToMap(am, settings),
+		PeerConfig:    peerConfigToMap(newPeer, netMap),
+		Checks:        checksToMaps(checks),
+	}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/entra_device_enroll.go` around lines 63 - 70, The call to
GetValidatedPeerWithMap inside EnrollEntraDevicePeer can fail after the peer was
already persisted/announced, leaking an orphan peer; change the post-persist
behavior to be best-effort: instead of returning an error when
GetValidatedPeerWithMap(..., newPeer) fails, log the error (include
context/newPeer.ID) and continue by using an empty PeerConfig/empty checks and
nil/empty netMap so the successful peer creation is still returned; update the
error branch that currently does "return nil, fmt.Errorf(...)" to log and set
defaults so EnrollEntraDevicePeer returns the created peer result rather than an
error.
🧹 Nitpick comments (4)
client/cmd/entra_enroll.go (3)

297-313: Use path/filepath.Dir instead of a hand-rolled filepathDir.

The helper duplicates stdlib behavior and silently drops the portability guarantees filepath.Dir provides (trailing-separator handling, cleaning, platform-aware separator). It also triggers the stdlib-reuse lint rule. Swap to the standard library and delete the helper.

♻️ Suggested refactor
 import (
 	"context"
 	"fmt"
 	"net/url"
 	"os"
+	"path/filepath"
 	"strings"
 	"time"
-	if err := os.MkdirAll(filepathDir(configPath), 0o755); err != nil {
+	if err := os.MkdirAll(filepath.Dir(configPath), 0o755); err != nil {
 		return nil, fmt.Errorf("mkdir %s: %w", configPath, err)
 	}
-func filepathDir(p string) string {
-	for i := len(p) - 1; i >= 0; i-- {
-		if p[i] == '\\' || p[i] == '/' {
-			return p[:i]
-		}
-	}
-	return "."
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 297 - 313, Replace the custom
filepathDir helper with the standard library path/filepath.Dir: call
filepath.Dir(configPath) where filepathDir(configPath) is used (in the
os.MkdirAll call) and remove the filepathDir function entirely; ensure you
import "path/filepath" (or add it to existing imports) so the platform-aware,
cleaned directory behavior replaces the hand-rolled logic.

196-204: Prefer errors.As over a direct type assertion for structured server errors.

err.(*entradevice.Error) only succeeds when Enrol returns the sentinel unwrapped. If entradevice.Enroller.Enrol ever wraps its structured error with fmt.Errorf("...: %w", err) (a common refactor), this branch silently falls through to the generic message and the stable Code stops being surfaced. errors.As is the idiomatic guard that survives wrapping.

♻️ Suggested refactor
-	state, err := en.Enrol(ctx)
-	if err != nil {
-		if structured, ok := err.(*entradevice.Error); ok {
-			cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n  %s\n",
-				structured.Code, structured.HTTPStatus, structured.Message)
-			return nil, fmt.Errorf("enrolment failed: %s", structured.Code)
-		}
-		return nil, fmt.Errorf("enrolment failed: %w", err)
-	}
+	state, err := en.Enrol(ctx)
+	if err != nil {
+		var structured *entradevice.Error
+		if errors.As(err, &structured) {
+			cmd.PrintErrf("Enrolment rejected: %s (HTTP %d)\n  %s\n",
+				structured.Code, structured.HTTPStatus, structured.Message)
+			return nil, fmt.Errorf("enrolment failed: %s", structured.Code)
+		}
+		return nil, fmt.Errorf("enrolment failed: %w", err)
+	}

Don't forget to add "errors" to the import block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 196 - 204, Replace the direct type
assertion err.(*entradevice.Error) with errors.As to reliably detect a
structured entradevice.Error even if it’s wrapped; in the Enrol error handling
after calling en.Enrol(ctx) use errors.As(&structured) to extract the structured
error, then call cmd.PrintErrf with structured.Code, structured.HTTPStatus and
structured.Message and return fmt.Errorf("enrolment failed: %s",
structured.Code) as before; ensure the "errors" package is added to the import
block so errors.As is available.

279-279: Nit: parameter managementURL shadows the package-level CLI flag variable of the same name.

Both managementURL (root flag) and this parameter refer to the same value here, so behavior is unaffected, but the shadow makes the function's provenance ambiguous for later readers/refactors. Consider renaming the parameter (e.g., mgmtURL) for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` at line 279, The parameter name managementURL in
function directLoadOrCreateProfileConfig shadows the package-level CLI flag of
the same name; rename the parameter to a non-shadowing name (e.g., mgmtURL) in
directLoadOrCreateProfileConfig and update all callers to pass the same argument
name, keeping behavior unchanged but removing the ambiguous shadowing; ensure
references inside the function use the new parameter name and run tests/build to
confirm no symbol breakage.
management/server/entra_device_enroll.go (1)

323-328: checksToMaps always returns nil — consider simplifying the call site.

The function is a documented stub; the checks any parameter is deliberately discarded. Since it's guaranteed to return nil today, you could drop the checks capture in EnrollEntraDevicePeer (line 67) and set Checks: nil directly until posture checks actually get surfaced here. This removes a small amount of dead plumbing without changing behavior.

Non-blocking; safe to defer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/entra_device_enroll.go` around lines 323 - 328, The helper
checksToMaps always returns nil and discards its input; update the call site in
EnrollEntraDevicePeer to stop passing an unused value and set the Checks field
explicitly to nil instead of calling checksToMaps, removing the dead plumbing
(leave checksToMaps as a stub if you want to keep it for future use). Locate the
usage of checksToMaps in EnrollEntraDevicePeer and replace that expression with
nil for the Checks field, and remove any now-unused local variable that was only
used for that call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/entra_enroll.go`:
- Around line 156-163: The code prints "Pass --force to re-enrol." even when the
user provided --force; move that hint inside the non-force branch. In the block
that checks an existing enrollment (using profileName and cfg.EntraEnroll.*),
only call cmd.Println("Pass --force to re-enrol.") when entraForce is false (if
!entraForce { cmd.Println(...); return true, nil }), otherwise skip the hint and
proceed to the existing return (return false, nil).
- Around line 114-116: The error message in preflightEntraEnroll incorrectly
claims managementURL must end with "/join/entra" but no suffix check exists;
update preflightEntraEnroll to validate that managementURL is non-empty and
strings.HasSuffix(managementURL, "/join/entra") (using the managementURL
variable and importing strings if needed) and return
fmt.Errorf("--management-url is required and must end with /join/entra") when
the suffix check fails; alternatively, if you prefer not to enforce the suffix,
remove the suffix text from the error message so it only requires non-empty
managementURL.

In `@management/Dockerfile.entra-test`:
- Around line 66-67: The Dockerfile currently hardcodes ENTRYPOINT
["/usr/local/bin/netbird-mgmt", "management"] while the compose override in
tools/entra-test/docker-compose.yml also supplies "management", causing the
subcommand to be duplicated; to fix, choose one contract—preferred: remove the
"management" token from the Dockerfile ENTRYPOINT so ENTRYPOINT becomes just
["/usr/local/bin/netbird-mgmt"] and let CMD/compose (the command: list) supply
the full argv, or alternatively remove the leading "management" from the compose
command if you prefer ENTRYPOINT to pin the subcommand; update the Dockerfile
ENTRYPOINT or the compose command accordingly and verify startup args passed to
the mgmt binary are not duplicated.
- Line 32: The Dockerfile has an inline comment on the COPY instruction and a
duplicated positional argument in ENTRYPOINT: remove the inline "# NOSONAR" from
the COPY . . line and place it on its own preceding comment line so the COPY has
only its two arguments; then resolve the duplicated "management" argument by
either removing the positional "management" from the Dockerfile ENTRYPOINT (so
compose/command supplies it) or removing it from the compose command (so
ENTRYPOINT provides it) — target the COPY line and the ENTRYPOINT declaration to
implement one consistent source of the positional argument.

In `@management/server/entra_device_enroll.go`:
- Around line 250-255: The compensation hook currently swallows every error from
tx.GetPeerByID in ExecuteInTransaction; change it to only ignore the specific
"not found" case and return all other errors so real DB failures surface. Locate
the call to tx.GetPeerByID (inside ExecuteInTransaction in
entra_device_enroll.go) and replace the unconditional return nil with a
conditional that checks the NotFound sentinel/utility used elsewhere in the repo
(the same pattern as management/internals/modules/peers/manager.go:113 — e.g.,
store.IsNotFound(err) or errors.Is(err, store.ErrNotFound)); if the error is
NotFound, return nil, otherwise return the original err (and remove the nolint
comment).

---

Duplicate comments:
In `@management/server/entra_device_enroll.go`:
- Around line 63-70: The call to GetValidatedPeerWithMap inside
EnrollEntraDevicePeer can fail after the peer was already persisted/announced,
leaking an orphan peer; change the post-persist behavior to be best-effort:
instead of returning an error when GetValidatedPeerWithMap(..., newPeer) fails,
log the error (include context/newPeer.ID) and continue by using an empty
PeerConfig/empty checks and nil/empty netMap so the successful peer creation is
still returned; update the error branch that currently does "return nil,
fmt.Errorf(...)" to log and set defaults so EnrollEntraDevicePeer returns the
created peer result rather than an error.

---

Nitpick comments:
In `@client/cmd/entra_enroll.go`:
- Around line 297-313: Replace the custom filepathDir helper with the standard
library path/filepath.Dir: call filepath.Dir(configPath) where
filepathDir(configPath) is used (in the os.MkdirAll call) and remove the
filepathDir function entirely; ensure you import "path/filepath" (or add it to
existing imports) so the platform-aware, cleaned directory behavior replaces the
hand-rolled logic.
- Around line 196-204: Replace the direct type assertion
err.(*entradevice.Error) with errors.As to reliably detect a structured
entradevice.Error even if it’s wrapped; in the Enrol error handling after
calling en.Enrol(ctx) use errors.As(&structured) to extract the structured
error, then call cmd.PrintErrf with structured.Code, structured.HTTPStatus and
structured.Message and return fmt.Errorf("enrolment failed: %s",
structured.Code) as before; ensure the "errors" package is added to the import
block so errors.As is available.
- Line 279: The parameter name managementURL in function
directLoadOrCreateProfileConfig shadows the package-level CLI flag of the same
name; rename the parameter to a non-shadowing name (e.g., mgmtURL) in
directLoadOrCreateProfileConfig and update all callers to pass the same argument
name, keeping behavior unchanged but removing the ambiguous shadowing; ensure
references inside the function use the new parameter name and run tests/build to
confirm no symbol breakage.

In `@management/server/entra_device_enroll.go`:
- Around line 323-328: The helper checksToMaps always returns nil and discards
its input; update the call site in EnrollEntraDevicePeer to stop passing an
unused value and set the Checks field explicitly to nil instead of calling
checksToMaps, removing the dead plumbing (leave checksToMaps as a stub if you
want to keep it for future use). Locate the usage of checksToMaps in
EnrollEntraDevicePeer and replace that expression with nil for the Checks field,
and remove any now-unused local variable that was only used for that call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cfdc177d-e899-4584-8451-647219381fba

📥 Commits

Reviewing files that changed from the base of the PR and between 07bf8cf and d8c14e8.

📒 Files selected for processing (7)
  • client/cmd/entra_enroll.go
  • client/internal/profilemanager/config.go
  • client/internal/profilemanager/config_entra.go
  • management/Dockerfile.entra-test
  • management/server/entra_device_enroll.go
  • management/server/integrations/entra_device/cert_validator.go
  • tools/entra-test/enroll-tester/main.go
✅ Files skipped from review due to trivial changes (3)
  • client/internal/profilemanager/config_entra.go
  • tools/entra-test/enroll-tester/main.go
  • management/server/integrations/entra_device/cert_validator.go

Comment on lines +114 to +116
if managementURL == "" {
return "", fmt.Errorf("--management-url is required (and must end with /join/entra)")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Message claims a suffix check that isn't performed.

The error says --management-url ... must end with /join/entra, but preflightEntraEnroll never validates the suffix. A user passing --management-url https://mgmt.example.com will bypass this check entirely and only hit a later opaque HTTP error. Either actually enforce the suffix or soften the message.

🛠️ Suggested fix
 	if managementURL == "" {
 		return "", fmt.Errorf("--management-url is required (and must end with /join/entra)")
 	}
+	if !strings.HasSuffix(strings.TrimRight(managementURL, "/"), entradevice.EnrolmentPathSuffix) {
+		return "", fmt.Errorf("--management-url %q must end with %s", managementURL, entradevice.EnrolmentPathSuffix)
+	}
 	return resolvePFXPassword()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if managementURL == "" {
return "", fmt.Errorf("--management-url is required (and must end with /join/entra)")
}
if managementURL == "" {
return "", fmt.Errorf("--management-url is required (and must end with /join/entra)")
}
if !strings.HasSuffix(strings.TrimRight(managementURL, "/"), entradevice.EnrolmentPathSuffix) {
return "", fmt.Errorf("--management-url %q must end with %s", managementURL, entradevice.EnrolmentPathSuffix)
}
return resolvePFXPassword()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 114 - 116, The error message in
preflightEntraEnroll incorrectly claims managementURL must end with
"/join/entra" but no suffix check exists; update preflightEntraEnroll to
validate that managementURL is non-empty and strings.HasSuffix(managementURL,
"/join/entra") (using the managementURL variable and importing strings if
needed) and return fmt.Errorf("--management-url is required and must end with
/join/entra") when the suffix check fails; alternatively, if you prefer not to
enforce the suffix, remove the suffix text from the error message so it only
requires non-empty managementURL.

Comment on lines +156 to +163
cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n",
profileName, cfg.EntraEnroll.PeerID,
cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339))
cmd.Println("Pass --force to re-enrol.")
if !entraForce {
return true, nil
}
return false, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

"Pass --force to re-enrol." is printed even when --force is supplied.

When the user already passed --force, the hint is misleading — we tell them to pass a flag they just passed, then proceed anyway. Gate the hint behind the non-force branch.

🛠️ Suggested fix
 	cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n",
 		profileName, cfg.EntraEnroll.PeerID,
 		cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339))
-	cmd.Println("Pass --force to re-enrol.")
 	if !entraForce {
+		cmd.Println("Pass --force to re-enrol.")
 		return true, nil
 	}
+	cmd.Println("--force supplied; re-enrolling.")
 	return false, nil
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n",
profileName, cfg.EntraEnroll.PeerID,
cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339))
cmd.Println("Pass --force to re-enrol.")
if !entraForce {
return true, nil
}
return false, nil
cmd.Printf("Profile %q is already Entra-enrolled (peer %s, enrolled %s).\n",
profileName, cfg.EntraEnroll.PeerID,
cfg.EntraEnroll.EnrolledAt.Format(time.RFC3339))
if !entraForce {
cmd.Println("Pass --force to re-enrol.")
return true, nil
}
cmd.Println("--force supplied; re-enrolling.")
return false, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/entra_enroll.go` around lines 156 - 163, The code prints "Pass
--force to re-enrol." even when the user provided --force; move that hint inside
the non-force branch. In the block that checks an existing enrollment (using
profileName and cfg.EntraEnroll.*), only call cmd.Println("Pass --force to
re-enrol.") when entraForce is false (if !entraForce { cmd.Println(...); return
true, nil }), otherwise skip the hint and proceed to the existing return (return
false, nil).

Comment thread management/Dockerfile.entra-test Outdated
# image; no secrets are expected to live in the source tree and there is no
# production layer that ships this COPY — see management/Dockerfile for the
# release build path used by goreleaser.
COPY . . # NOSONAR
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does the Docker BuildKit dockerfile:1.7 frontend support inline # comments on instruction lines, or only full-line comments?

💡 Result:

No, the Docker BuildKit dockerfile:1.7 frontend supports only full-line comments (lines starting with # after whitespace), not inline comments on instruction lines. Inline # markers are treated as part of the instruction arguments, causing parse errors or incorrect behavior, such as in JSON array instructions like COPY or RUN where they invalidate JSON or become invalid patterns. This aligns with the longstanding Dockerfile syntax specification: "# must be the first non-whitespace character on the line to be a comment." Older Docker versions sometimes tolerated inline comments due to bugs (e.g., ignoring trailing bytes after JSON), but recent BuildKit fixes enforce strict parsing, breaking previously working invalid Dockerfiles. The dockerfile:1.7 release notes and parser source confirm no changes to comment handling; new features focus on variable expansions and context handling, not comments.

Citations:


🏁 Script executed:

# Locate the Dockerfile
find . -name "Dockerfile.entra-test" -type f

Repository: netbirdio/netbird

Length of output: 96


🏁 Script executed:

# Read the problematic file
if [ -f management/Dockerfile.entra-test ]; then
  echo "=== File found, examining key lines ===";
  echo "--- Line 32 context (lines 28-36) ---";
  sed -n '28,36p' management/Dockerfile.entra-test | cat -n;
  echo "--- Line 66-67 context (lines 62-70) ---";
  sed -n '62,70p' management/Dockerfile.entra-test | cat -n;
else
  echo "File not found";
fi

Repository: netbirdio/netbird

Length of output: 972


🏁 Script executed:

# Check if there's a docker-compose file that references this Dockerfile
find . -name "docker-compose*" -type f | head -10

Repository: netbirdio/netbird

Length of output: 252


🏁 Script executed:

# Check the docker-compose that might reference the Dockerfile.entra-test
cat ./tools/entra-test/docker-compose.yml

Repository: netbirdio/netbird

Length of output: 2588


Inline # NOSONAR on the COPY line violates Dockerfile syntax — remove it.

Docker only recognizes # as a comment when it appears at the start of a line (after optional whitespace). Inline comments are not supported; COPY . . # NOSONAR is parsed as a 4-argument command with sources ., ., #, and destination NOSONAR, which is invalid. Move the suppression to its own comment line above the instruction.

Additionally, the ENTRYPOINT already includes "management" as a positional argument (line 66). When using the docker-compose override (tools/entra-test/docker-compose.yml), the command also starts with "management", causing the process to receive management management --log-file .... Either the ENTRYPOINT should omit the positional argument and rely on CMD/compose override, or the compose command should not repeat it.

🛠️ Suggested fix for line 32
# Copy the entire repo into the build stage. This is a dev-only harness
# image; no secrets are expected to live in the source tree and there is no
# production layer that ships this COPY — see management/Dockerfile for the
# release build path used by goreleaser.
+# NOSONAR
-COPY . . # NOSONAR
+COPY . .
🧰 Tools
🪛 Hadolint (2.14.0)

[error] 32-32: COPY with more than 2 arguments requires the last argument to end with /

(DL3021)

🪛 Trivy (0.69.3)

[error] 32-32: COPY with more than two arguments not ending with slash

Slash is expected at the end of COPY command argument 'NOSONAR'

Rule: DS-0011

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/Dockerfile.entra-test` at line 32, The Dockerfile has an inline
comment on the COPY instruction and a duplicated positional argument in
ENTRYPOINT: remove the inline "# NOSONAR" from the COPY . . line and place it on
its own preceding comment line so the COPY has only its two arguments; then
resolve the duplicated "management" argument by either removing the positional
"management" from the Dockerfile ENTRYPOINT (so compose/command supplies it) or
removing it from the compose command (so ENTRYPOINT provides it) — target the
COPY line and the ENTRYPOINT declaration to implement one consistent source of
the positional argument.

Comment on lines +66 to +67
ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"]
CMD ["--log-file", "console", "--log-level", "info"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

ENTRYPOINT + compose command double-specifies the management subcommand.

ENTRYPOINT already appends management (line 66), and tools/entra-test/docker-compose.yml overrides CMD with a list that also starts with "management". The process therefore gets invoked as:

/usr/local/bin/netbird-mgmt management management --log-file console --log-level debug --port 33073 --config ...

Cobra today may silently accept the extra positional arg (which is likely why e2e "works"), but this is fragile — the moment mgmtCmd gets an Args: cobra.NoArgs/ExactArgs validator, or someone copies the compose command expecting it to be self-contained, things break.

Pick one side of the contract:

  • Option A (preferred — keep compose self-describing): drop management from the ENTRYPOINT so CMD/compose command owns the full argv.
    -ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"]
    -CMD ["--log-file", "console", "--log-level", "info"]
    +ENTRYPOINT ["/usr/local/bin/netbird-mgmt"]
    +CMD ["management", "--log-file", "console", "--log-level", "info"]
  • Option B: keep the ENTRYPOINT as-is and remove the leading "management" from tools/entra-test/docker-compose.yml's command: list.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"]
CMD ["--log-file", "console", "--log-level", "info"]
ENTRYPOINT ["/usr/local/bin/netbird-mgmt"]
CMD ["management", "--log-file", "console", "--log-level", "info"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/Dockerfile.entra-test` around lines 66 - 67, The Dockerfile
currently hardcodes ENTRYPOINT ["/usr/local/bin/netbird-mgmt", "management"]
while the compose override in tools/entra-test/docker-compose.yml also supplies
"management", causing the subcommand to be duplicated; to fix, choose one
contract—preferred: remove the "management" token from the Dockerfile ENTRYPOINT
so ENTRYPOINT becomes just ["/usr/local/bin/netbird-mgmt"] and let CMD/compose
(the command: list) supply the full argv, or alternatively remove the leading
"management" from the compose command if you prefer ENTRYPOINT to pin the
subcommand; update the Dockerfile ENTRYPOINT or the compose command accordingly
and verify startup args passed to the mgmt binary are not duplicated.

Comment on lines +250 to +255
return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error {
peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID)
if err != nil {
// Peer already gone is not an error; we're compensating.
return nil //nolint:nilerr // quiet on "already gone"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect how other Store methods classify not-found errors for GetPeerByID specifically,
# to pick the right predicate (status.NotFound type, errors.Is(gorm.ErrRecordNotFound), etc.)
rg -nP --type=go -C3 'GetPeerByID\s*\(' -g '!**/*_test.go' | head -80
echo "---"
# How other compensation / cleanup sites in the repo distinguish "already gone":
rg -nP --type=go -C2 'status\.NotFound|ErrRecordNotFound' -g '!**/*_test.go' | head -40

Repository: netbirdio/netbird

Length of output: 8755


🏁 Script executed:

#!/bin/bash
# Get the full GetPeerByID implementation in sql_store.go
sed -n '3263,3280p' management/server/store/sql_store.go
echo "---"
# Get the full DeletePeer function in entra_device_enroll.go
sed -n '242,261p' management/server/entra_device_enroll.go

Repository: netbirdio/netbird

Length of output: 1552


🏁 Script executed:

#!/bin/bash
# Search for other compensation/cleanup patterns that check for NotFound errors
rg -nP --type=go 'ExecuteInTransaction.*error' -A5 | grep -A5 'GetPeerByID\|GetAccountByID\|GetDomain' | head -60
echo "---"
# See what error predicates are used with GetPeerByID specifically
rg -nP --type=go 'GetPeerByID.*if err' -A2 | head -40

Repository: netbirdio/netbird

Length of output: 1967


🏁 Script executed:

#!/bin/bash
# Verify the status package import and NotFound type usage in entra_device_enroll.go
head -30 management/server/entra_device_enroll.go | grep -E '^import|status'
echo "---"
# Double-check that status.NotFound is the right constant
rg -nP --type=go 'status\.NotFound' -B2 -A2 | head -20

Repository: netbirdio/netbird

Length of output: 1536


🏁 Script executed:

#!/bin/bash
# Verify fmt package is imported in entra_device_enroll.go (needed for the fix)
grep -E '^import|"fmt"' management/server/entra_device_enroll.go | head -20

Repository: netbirdio/netbird

Length of output: 77


Narrow error suppression to NotFound only; propagate real DB failures.

The compensation hook swallows all GetPeerByID errors, including transient DB failures, timeouts, and context cancellation. This masks real failures and leaves orphaned peers in the database. The comment correctly identifies the intent ("peer already gone"), but the implementation doesn't verify it.

Use the pattern already established elsewhere in the codebase (e.g., management/internals/modules/peers/manager.go:113): check if the error is specifically a NotFound type, and propagate anything else.

🛡️ Proposed fix
 		peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID)
 		if err != nil {
-			// Peer already gone is not an error; we're compensating.
-			return nil //nolint:nilerr // quiet on "already gone"
+			if e, ok := status.FromError(err); ok && e.Type() == status.NotFound {
+				return nil // peer already gone — compensation is a no-op
+			}
+			return fmt.Errorf("lookup peer for entra compensation: %w", err)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error {
peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID)
if err != nil {
// Peer already gone is not an error; we're compensating.
return nil //nolint:nilerr // quiet on "already gone"
}
return e.am.Store.ExecuteInTransaction(ctx, func(tx store.Store) error {
peer, err := tx.GetPeerByID(ctx, store.LockingStrengthNone, accountID, peerID)
if err != nil {
if e, ok := status.FromError(err); ok && e.Type() == status.NotFound {
return nil // peer already gone — compensation is a no-op
}
return fmt.Errorf("lookup peer for entra compensation: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/entra_device_enroll.go` around lines 250 - 255, The
compensation hook currently swallows every error from tx.GetPeerByID in
ExecuteInTransaction; change it to only ignore the specific "not found" case and
return all other errors so real DB failures surface. Locate the call to
tx.GetPeerByID (inside ExecuteInTransaction in entra_device_enroll.go) and
replace the unconditional return nil with a conditional that checks the NotFound
sentinel/utility used elsewhere in the repo (the same pattern as
management/internals/modules/peers/manager.go:113 — e.g., store.IsNotFound(err)
or errors.Is(err, store.ErrNotFound)); if the error is NotFound, return nil,
otherwise return the original err (and remove the nolint comment).

Round out docs/ENTRA_DEVICE_AUTH.md with the operator-facing content
that was missing, refresh stale status text, and drop a lingering
reference to an internal design artifact that should never have
survived scrubbing.

* Refreshed TL;DR header to reflect final state: server + client
  (PFX provider) complete, live-tested against a real Entra tenant,
  with explicit cross-links to Known production gaps and the new
  Production readiness checklist.

* Added a "Deploying device certificates via Intune (PKCS Certificate
  profile)" section so the mechanism for getting the cert onto the
  device is documented end-to-end (subject name CN={{AAD_Device_ID}},
  TPM-preferred KSP, key usage, extended key usage, CA trust path to
  plumb into CertValidator.TrustRoots, assignment).

* Added a "Troubleshooting" section with a per-error-code matrix
  (most likely cause + where to look) + client-side and server-side
  diagnostic tips (--log-level debug, enroll-tester, activity events).

* Added a "Production readiness checklist" so operators can sign off
  on Entra app perms, client_secret handling, Intune PKCS profile,
  TrustRoots wiring, mapping_resolution choice, trusted-proxy policy,
  HA single-node caveat, activity-log sink, and alerting.

* Expanded "Security notes" to list the concrete mitigations the
  code implements today (OData escape, fail-closed pagination, empty
  CN rejection, opt-in XFF, 512 KiB body cap with 413, bootstrap-token
  validate-before-delete, atomic rejection paths).

* Refreshed "Current implementation status" table: Graph client is no
  longer "not yet run against a live tenant"; Client PFX provider +
  CLI row added; three explicit follow-up rows for the known gaps.

* Dropped the stale "See the Open design decisions in the plan
  document" sentence (superseded by the Known production gaps
  section) and the "see the original design doc artifact
  (create_plan)" reference in Further reading.

No code changes. All affected unit tests still green; build clean.
Drives a real httptest server with both the admin CRUD handler and the device-facing /join/entra handler wired onto the same router, then walks through:

  1. Admin PUTs the integration config (tenant/client/secret, mapping_resolution, revalidation_interval).
  2. Admin POSTs a mapping (Entra group -> NetBird auto-groups).
  3. Device GETs /join/entra/challenge to obtain a one-shot nonce.
  4. Device signs the nonce with a self-signed RSA leaf cert (CN = device GUID, RSA-PSS / SHA-256 to match the production CertValidator) and POSTs /join/entra/enroll.
  5. Server resolves the mapping, the fake PeerEnroller records the call, the device receives a peer config + bootstrap token.
  6. Replaying the same nonce produces a 4xx (single-use semantics).
  7. Admin reads back integration (secret masked as ********), lists mappings, updates and deletes the mapping (404 on subsequent GET).

A second test confirms that flipping Enabled=false on the integration causes /join/entra/enroll to fail with CodeIntegrationDisabled while leaving the row in place.

Microsoft Graph and the AccountManager are stubbed (external dependencies that can't be exercised without a live tenant); every other layer runs the production code path.
@thvevirtue
Copy link
Copy Markdown
Author

Ready for final review 🙏

This PR has now been through three substantive rounds:

  1. Initial implementation — server-side Entra device authentication: cert validator, Graph client, nonce store, mapping resolver, manager, persistence, admin CRUD, device-facing /join/entra/{challenge,enroll} endpoints, client-side enroller, docs, wiring, permissions module.
  2. CodeRabbit / Sonar pass — addressed all 23 actionable comments from the prior review (security fixes in cert_validator and Graph OData escaping, concurrency fixes in the in-memory + SQL stores, HTTP handler hardening with MaxBytesReader + XFF parsing notes, fail-closed wiring on missing permissions, client-side CN normalisation, lint, docs, healthcheck and seed-account fixes).
  3. End-to-end verification (latest commit e22f666) — added management/server/http/handlers/entra_device_auth/e2e_test.go. Spins up a real httptest server with the admin and device routers wired together and walks the entire flow:
    • Admin PUT /api/integrations/entra-device-auth (config) → POST /mappings
    • Device GET /join/entra/challenge → signs the nonce with a self-signed RSA leaf cert (CN=device-guid, RSA-PSS / SHA-256 to match the production CertValidator) → POST /join/entra/enroll
    • Asserts: peer config + bootstrap token returned, mapping resolution applied the right auto_groups, the same nonce can't be replayed, the admin readback masks the client_secret as ********, full CRUD round-trip on mappings.
    • Second test confirms Enabled=false causes enrolment to fail with CodeIntegrationDisabled while the row is preserved.

Microsoft Graph and AccountManager.AddPeer are stubbed (they require a live Entra tenant); every other layer runs production code.

Test results

ok  github.com/netbirdio/netbird/management/server/integrations/entra_device       1.305s
ok  github.com/netbirdio/netbird/management/server/http/handlers/entra_join        0.340s
ok  github.com/netbirdio/netbird/management/server/http/handlers/entra_device_auth 0.247s
ok  github.com/netbirdio/netbird/client/internal/enroll/entradevice                1.258s

Companion dashboard PR

PR is mergeable; REVIEW_REQUIRED is the only outstanding gate. Tagging a couple of recent maintainers for visibility — apologies for the ping.

cc @mlsmaycon @pascal-fischer

Two issues blocked the gate on PR netbirdio#5977; both are fixed by this commit.

1. management/server/http/handler.go: NewAPIHandler had grown to 23 parameters (go:S107) and had cognitive complexity 24 (go:S3776). Refactored to take a single APIHandlerOptions struct; extracted addBypassPaths(), buildRateLimiterFromEnv(), parseIntEnv(), and registerCoreEndpoints() so the function body is now mostly straight-line wiring. Updated the two existing call sites (boot.go + the channel test harness) to pass APIHandlerOptions. No behavioural change.

2. management/Dockerfile.entra-test: replaced the recursive 'COPY . .' (docker:S6470 hotspot) with explicit per-directory COPYs for each top-level Go source dir that 'go build ./management' actually needs. The previous NOSONAR comment did not suppress the hotspot because Sonar reads each '.' token in 'COPY . .' as a separate finding.
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants