Skip to content

feat: enhance telemetry module with container runtime detection and p…#1320

Open
ytallo wants to merge 8 commits intomainfrom
feat/amplitude-prd-telemetry
Open

feat: enhance telemetry module with container runtime detection and p…#1320
ytallo wants to merge 8 commits intomainfrom
feat/amplitude-prd-telemetry

Conversation

@ytallo
Copy link
Contributor

@ytallo ytallo commented Mar 17, 2026

…roject context management

  • Added container_runtime field to EnvironmentInfo struct to capture the container runtime type.
  • Implemented detect_container_runtime function to identify the runtime based on environment variables and system checks.
  • Introduced project context management with functions to find project root and read project configuration from project.ini.
  • Updated telemetry data collection to include project ID and name.
  • Refactored telemetry snapshot to provide a flat JSON structure for easier access to metrics.
  • Enhanced CLI update functions to send telemetry events for update start, success, and failure.

Summary by CodeRabbit

  • New Features

    • CLI, engine, and installers now emit richer telemetry: install/update lifecycle (start/success/failure), first-run, heartbeats, and background events with project, worker/function, and host-user context.
    • Installers export a host-user identifier to shell profiles and report install/upgrade lifecycle and failure telemetry.
    • Environment detection improved (container runtime, install method, env, timezone); telemetry identity persisted and migrated to a TOML-backed store.
  • Chores

    • Added telemetry-related runtime dependencies, config helpers, and container/host environment variable wiring to enable telemetry.

@vercel
Copy link
Contributor

vercel bot commented Mar 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
iii-website Ready Ready Preview, Comment Mar 19, 2026 6:52pm
motia-docs Ready Ready Preview, Comment Mar 19, 2026 6:52pm

Request Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end Amplitude telemetry: new CLI telemetry module and deps, CLI update instrumentation, expanded engine telemetry (env/config/project detection, flattened snapshots, typed collectors, first-run tracking), installer telemetry, and compose/Docker env wiring.

Changes

Cohort / File(s) Summary
CLI Telemetry & Wiring
cli/Cargo.toml, cli/src/main.rs, cli/src/telemetry.rs
Adds uuid/toml deps and serial_test dev-dep; introduces telemetry module that builds Amplitude payloads, persists telemetry id in ~/.iii/telemetry.toml (with legacy migration), and exposes fire-and-forget send APIs for update lifecycle events.
CLI Update Hooks
cli/src/update.rs
Instruments update and self-update flows to emit start/success/failure telemetry, refactors download/install error handling to emit failure events and return explicit errors.
Engine Telemetry: Environment & Config
engine/src/modules/telemetry/environment.rs, engine/Cargo.toml
Adds TOML config helpers (read/write), container runtime detection, install-method/env/host-user-id detection, container_runtime field on EnvironmentInfo, and opt-out via config; adds toml dep.
Engine Telemetry: Collector & Module
engine/src/modules/telemetry/collector.rs, engine/src/modules/telemetry/mod.rs
Collector snapshot shape changed to flat top-level counters; telemetry module refactored to typed collectors (FunctionTriggerData, WorkerData), project context resolution (.iii/project.ini), first-run detection/marking, flattened user_properties, new heartbeat/started event schemas, and many test updates.
Installer Scripts
cli/install.sh, engine/install.sh
Adds Amplitude config and telemetry helpers (UUID, TOML persistence, send-event), migrates legacy telemetry id file to TOML, emits install/upgrade lifecycle events (start/success/failure), exports host-user-id for compose correlation, and hooks telemetry into error paths.
Container / Compose
engine/Dockerfile, engine/docker-compose.yml, engine/docker-compose.prod.yml
Sets III_CONTAINER and III_ENV in Dockerfile and adds III_HOST_USER_ID env passthrough in compose files for host-user propagation.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Process
    participant Telemetry as CLI Telemetry Module
    participant FS as File System (~/.iii)
    participant Amplitude as Amplitude API

    CLI->>Telemetry: send_cli_update_started(target, from_version)
    Telemetry->>FS: read `telemetry.toml` / env vars (III_*)
    alt no telemetry_id
        Telemetry->>FS: write generated telemetry_id (UUID v4)
    end
    Telemetry->>Telemetry: build Amplitude payload (device_id, user_props, event_props)
    Telemetry->>Amplitude: POST /2/httpapi (async, 5s timeout)
    Amplitude-->>Telemetry: response (ignored)
Loading
sequenceDiagram
    participant Engine as Engine
    participant Project as Project Discovery (.iii/project.ini)
    participant Env as EnvironmentInfo
    participant Collector as Telemetry Collector
    participant Amplitude as Amplitude API

    Engine->>Project: find_project_root & read_project_ini()
    Project-->>Engine: ProjectContext (id, name)
    Engine->>Env: EnvironmentInfo::collect() -> includes container_runtime, host_user_id
    Engine->>Collector: collect_functions_and_triggers(), collect_worker_data()
    Collector-->>Engine: FunctionTriggerData, WorkerData
    Engine->>Amplitude: POST engine_started / heartbeat / first_run (async)
    Amplitude-->>Engine: response (ignored)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • sergiofilhowz
  • guibeira
  • andersonleal

Poem

🐰 I hop with bytes and tiny beats,
I tuck a UUID beneath your feets,
From installer hum to engine song,
Events hop off, quick and strong,
Project, host, and OS — all neat! 🥕📡

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is truncated and incomplete (ends with 'and p…'), making it unclear what the full feature is. However, it is clearly related to the main changes (telemetry enhancement, container runtime detection, and project context management) but lacks specificity due to truncation. Complete the title to clearly state the full feature scope. For example: 'feat: enhance telemetry module with container runtime detection and project context management'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 86.23% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/amplitude-prd-telemetry
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (1)
cli/src/telemetry.rs (1)

182-202: Make these tests hermetic.

clear_opt_out_vars() removes real process env vars without restoring them, and the stability test uses the user's actual telemetry ID file under HOME. That couples the suite to machine state and can interfere with unrelated tests; please route env/path lookup through temp-scoped test fixtures here.

Also applies to: 304-309

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 182 - 202, Tests are mutating real process
state: clear_opt_out_vars() currently removes environment variables unsafely and
tests read/write the user's telemetry ID under HOME; make tests hermetic by (1)
changing test code to set up a temp directory (TempDir) and point any
HOME/telemetry path resolution to that directory rather than the real HOME, (2)
avoid unscoped env removal by saving and restoring all affected env vars (the
ones removed by clear_opt_out_vars) around the test or better yet refactor
clear_opt_out_vars() into a helper that accepts an env map so tests can pass a
sandboxed map, and (3) apply the same pattern to the other affected test helpers
referenced in the diff (the stability test and the code touching the telemetry
ID file) so all env/path lookups use the temp-scoped fixture and original env is
restored after the test finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/telemetry.rs`:
- Around line 42-49: Generate or load the stable telemetry ID only after it is
actually persisted: first try to read an existing ID from path and return it if
present; if not, create the parent dir (std::fs::create_dir_all), generate id
(uuid::Uuid::new_v4()), write it to a temp file (path.with_extension("tmp") and
std::fs::write) and then atomically rename the tmp to path (std::fs::rename);
only return the new id if the rename succeeded, otherwise attempt to read any
newly-created path file and return that (or propagate an error/None) so you
never return a generated id that failed to be persisted. Ensure you reference
the variables id, path, tmp and the calls create_dir_all, write and rename when
making the changes.
- Around line 122-135: The send_fire_and_forget function currently spawns a
detached task so telemetry is lost on process::exit; change send_fire_and_forget
into an async function (e.g., async fn send_telemetry(api_key: String, event:
AmplitudeEvent) -> Result<(), reqwest::Error>) that performs the same payload
construction and client.post(...).send().await and returns Result indicating
success/failure, remove tokio::spawn usage, and update callers (e.g., the update
flow that called send_fire_and_forget) to await this future with a tight timeout
using tokio::time::timeout(Duration::from_secs(5),
send_telemetry(...)).map_err/handle the timeout so the CLI doesn’t block
indefinitely but ensures the event is sent before exit.

In `@cli/src/update.rs`:
- Around line 211-213: The telemetry call is currently sending human-facing
error text via e.to_string(); instead, change telemetry::send_cli_update_failed
to receive a stable error category/code (not a Display string) and pass a
concise enum/key (e.g., an Update error kind or Download error category) so
analytics uses stable dimensions; keep the full e.to_string() for the local
Err(UpdateError::Download(e)) return or local logging. Update the same pattern
at the other occurrence mentioned (the block around lines 293-295) so both
telemetry calls send a stable error code/category rather than the verbatim error
message, referencing telemetry::send_cli_update_failed, UpdateError::Download,
spec.name, and from_version_str to locate the sites.
- Around line 188-193: Move the telemetry start event earlier so it's emitted
before any release lookup/version parsing/asset resolution; compute the
from_version_str from previous_version and call
telemetry::send_cli_update_started(spec.name, &from_version_str) at the very
beginning of the update flow (before the code that resolves releases/assets and
before functions like resolve_release or parse_version are invoked), and mirror
the same change where another send_cli_update_started is currently at lines
271-273 so both start events occur prior to any early-returning failure paths;
ensure existing send_cli_update_failed calls remain unchanged so failures are
still reported.

In `@engine/Dockerfile`:
- Around line 6-7: The Dockerfile currently hardcodes runtime telemetry defaults
via ENV III_CONTAINER=docker and ENV III_ENV=development which causes
engine/src/modules/telemetry/environment.rs to read and report incorrect runtime
values; remove these ENV lines from the Dockerfile (or leave them unset/empty)
so the runtime picks up authoritative values from the deployment environment
(k8s/pod or compose overrides) rather than baking "docker" and "development"
into the image; ensure any CI/deploy manifests set III_CONTAINER and III_ENV
explicitly where needed instead.

In `@engine/src/modules/telemetry/mod.rs`:
- Around line 951-989: Tests calling resolve_project_context() currently unset
III_PROJECT_ROOT which makes them depend on the repo CWD; instead, in the tests
(e.g., test_resolve_project_context_env_fallback,
test_resolve_project_context_sdk_telemetry_project_name,
test_resolve_project_context_none_when_unset and the similar block around lines
1270-1304) set III_PROJECT_ROOT to a fresh temporary directory (create a temp
dir per test and set env::set_var("III_PROJECT_ROOT", temp_dir.path())) so
resolve_project_context() cannot find a real .iii/project.ini; revert or remove
the temp dir afterwards and keep any existing III_PROJECT_ID handling as-is.
- Around line 379-381: The code currently takes
environment::detect_host_user_id() and sets props["host_user_id"] =
serde_json::Value::String(host_user_id) directly; instead, pseudonymize this
value before sending by hashing or HMACing it (e.g., compute a SHA-256 or
HMAC-SHA256 of host_user_id and encode as hex/base64) and assign the hashed
string to props["host_user_id"] (or rename to props["host_user_id_hash"] if
preferred); update the logic around environment::detect_host_user_id(), the
assignment to props["host_user_id"], and any tests/consumers to expect the
hashed value rather than the raw host_user_id.
- Around line 221-223: The check in telemetry decision logic currently treats an
empty config.api_key as missing even when config.sdk_api_key is present; update
the guard that returns Some(DisableReason::NoApiKey) to only do so when both
config.api_key and config.sdk_api_key are empty (i.e., treat sdk_api_key as a
valid configured client/fallback). Locate the condition using config.api_key and
config.sdk_api_key in the same block (the branch that returns
DisableReason::NoApiKey) and modify it so active_client() behavior is mirrored:
only disable telemetry when both keys are blank.
- Around line 649-679: The is_active flag currently only checks
delta_invocations_total; update the is_active computation in mod.rs so it also
considers other traffic deltas (delta_api_requests, delta_queue_emits,
delta_queue_consumes, delta_pubsub_publishes, delta_pubsub_subscribes,
delta_cron_executions) — e.g., compute a total_activity from those deltas (or OR
them) and set is_active = total_activity > 0 (replace the existing is_active =
delta_invocations_total > 0). Reference the is_active variable near where
uptime_secs, ft (collect_functions_and_triggers), and wd (collect_worker_data)
are used.
- Around line 175-191: The function check_and_mark_first_run currently
overwrites ~/.iii/state.ini with just the telemetry key and ignores write
errors; change it to preserve existing state and only set
telemetry.first_run_sent: read the file at state_path (or start with an empty
buffer), update or insert the [telemetry] section and first_run_sent=true
without removing other keys, then write the updated contents atomically (e.g.,
write to a temp file and rename) using std::fs::OpenOptions/create_dir_all for
the parent; ensure the write result is checked and return false on write failure
so the function only returns true when the state was successfully persisted.

---

Nitpick comments:
In `@cli/src/telemetry.rs`:
- Around line 182-202: Tests are mutating real process state:
clear_opt_out_vars() currently removes environment variables unsafely and tests
read/write the user's telemetry ID under HOME; make tests hermetic by (1)
changing test code to set up a temp directory (TempDir) and point any
HOME/telemetry path resolution to that directory rather than the real HOME, (2)
avoid unscoped env removal by saving and restoring all affected env vars (the
ones removed by clear_opt_out_vars) around the test or better yet refactor
clear_opt_out_vars() into a helper that accepts an env map so tests can pass a
sandboxed map, and (3) apply the same pattern to the other affected test helpers
referenced in the diff (the stability test and the code touching the telemetry
ID file) so all env/path lookups use the temp-scoped fixture and original env is
restored after the test finishes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4d8d8e3-b5f2-4969-944a-356790ddedf6

📥 Commits

Reviewing files that changed from the base of the PR and between 61f7877 and 3d2b813.

📒 Files selected for processing (10)
  • cli/Cargo.toml
  • cli/src/main.rs
  • cli/src/telemetry.rs
  • cli/src/update.rs
  • engine/Dockerfile
  • engine/docker-compose.prod.yml
  • engine/docker-compose.yml
  • engine/src/modules/telemetry/collector.rs
  • engine/src/modules/telemetry/environment.rs
  • engine/src/modules/telemetry/mod.rs

Comment on lines +42 to +49
let id = uuid::Uuid::new_v4().to_string();
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent).ok();
}
let tmp = path.with_extension("tmp");
if std::fs::write(&tmp, &id).is_ok() {
std::fs::rename(&tmp, &path).ok();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Only return a new telemetry ID after it is actually persisted.

This function returns the freshly generated UUID even if create_dir_all, write, or rename fails. That means the current process can emit events with a device ID that the next invocation will never reuse, which breaks the "stable telemetry ID" guarantee on the error path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 42 - 49, Generate or load the stable
telemetry ID only after it is actually persisted: first try to read an existing
ID from path and return it if present; if not, create the parent dir
(std::fs::create_dir_all), generate id (uuid::Uuid::new_v4()), write it to a
temp file (path.with_extension("tmp") and std::fs::write) and then atomically
rename the tmp to path (std::fs::rename); only return the new id if the rename
succeeded, otherwise attempt to read any newly-created path file and return that
(or propagate an error/None) so you never return a generated id that failed to
be persisted. Ensure you reference the variables id, path, tmp and the calls
create_dir_all, write and rename when making the changes.

Comment on lines +122 to +135
fn send_fire_and_forget(api_key: String, event: AmplitudeEvent) {
tokio::spawn(async move {
let payload = AmplitudePayload {
api_key: &api_key,
events: vec![event],
};
let client = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(5))
.build();

if let Ok(client) = client {
let _ = client.post(AMPLITUDE_ENDPOINT).json(&payload).send().await;
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the telemetry.rs file and examine the context around lines 122-135
find . -name "telemetry.rs" -type f

Repository: iii-hq/iii

Length of output: 102


🏁 Script executed:

# Once located, read the file to see full context
cat -n cli/src/telemetry.rs

Repository: iii-hq/iii

Length of output: 11306


🏁 Script executed:

# Search for any telemetry shutdown or flush mechanisms
rg "shutdown|flush|telemetry" --type rust -B 2 -A 2

Repository: iii-hq/iii

Length of output: 50369


🏁 Script executed:

# Find the CLI's main entry point
find cli -name "main.rs" -o -name "lib.rs" | head -5

Repository: iii-hq/iii

Length of output: 69


🏁 Script executed:

# Look for Cargo.toml to understand the binary structure
cat cli/Cargo.toml | head -50

Repository: iii-hq/iii

Length of output: 954


🏁 Script executed:

# Search for the main function in CLI
rg "fn main" cli/src --type rust -B 2 -A 10

Repository: iii-hq/iii

Length of output: 559


🏁 Script executed:

# Check the full main.rs file and see if there's any shutdown handling
cat -n cli/src/main.rs

Repository: iii-hq/iii

Length of output: 12467


🏁 Script executed:

# Look for any telemetry shutdown calls
rg "shutdown|flush|telemetry" cli/src/main.rs

Repository: iii-hq/iii

Length of output: 68


🏁 Script executed:

# Check if there are any update-related functions that call telemetry
rg "send_cli_update" cli/src --type rust -B 2 -A 5

Repository: iii-hq/iii

Length of output: 3882


🏁 Script executed:

# Check if there's any await or wait mechanism between telemetry call and process::exit
rg "update_binary|self_update|update_all" cli/src/main.rs -A 5 -B 2

Repository: iii-hq/iii

Length of output: 1275


🏁 Script executed:

# Look at the update functions to see if they await the telemetry
rg "send_cli_update" cli/src/update.rs -A 2 -B 2

Repository: iii-hq/iii

Length of output: 1195


🏁 Script executed:

# Let me see the exact timeline: when telemetry is called vs when process::exit happens
rg "process::exit|return" cli/src/main.rs | head -20

Repository: iii-hq/iii

Length of output: 373


🏁 Script executed:

# Check what happens after telemetry calls - is there any waiting?
sed -n '150,280p' cli/src/update.rs | cat -n

Repository: iii-hq/iii

Length of output: 5745


Await the send path instead of detaching it.

In a short-lived CLI, spawned telemetry tasks are aborted when process::exit() is called, so the cli_update_succeeded and cli_update_failed events are lost. Make the send path async and await it with a tight timeout from the update flow instead of spawning and forgetting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 122 - 135, The send_fire_and_forget
function currently spawns a detached task so telemetry is lost on process::exit;
change send_fire_and_forget into an async function (e.g., async fn
send_telemetry(api_key: String, event: AmplitudeEvent) -> Result<(),
reqwest::Error>) that performs the same payload construction and
client.post(...).send().await and returns Result indicating success/failure,
remove tokio::spawn usage, and update callers (e.g., the update flow that called
send_fire_and_forget) to await this future with a tight timeout using
tokio::time::timeout(Duration::from_secs(5), send_telemetry(...)).map_err/handle
the timeout so the CLI doesn’t block indefinitely but ensures the event is sent
before exit.

Comment on lines +211 to +213
Err(e) => {
telemetry::send_cli_update_failed(spec.name, &from_version_str, &e.to_string());
Err(UpdateError::Download(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't make telemetry depend on Display strings.

Forwarding e.to_string() verbatim turns human-facing error text into an analytics dimension. That creates high cardinality and makes wording/path changes look like new failure types; emit a stable error code/category instead and keep the full message local.

Also applies to: 293-295

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/update.rs` around lines 211 - 213, The telemetry call is currently
sending human-facing error text via e.to_string(); instead, change
telemetry::send_cli_update_failed to receive a stable error category/code (not a
Display string) and pass a concise enum/key (e.g., an Update error kind or
Download error category) so analytics uses stable dimensions; keep the full
e.to_string() for the local Err(UpdateError::Download(e)) return or local
logging. Update the same pattern at the other occurrence mentioned (the block
around lines 293-295) so both telemetry calls send a stable error code/category
rather than the verbatim error message, referencing
telemetry::send_cli_update_failed, UpdateError::Download, spec.name, and
from_version_str to locate the sites.

Comment on lines +6 to +7
ENV III_CONTAINER=docker
ENV III_ENV=development
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't bake runtime-specific telemetry defaults into the image.

engine/src/modules/telemetry/environment.rs treats III_CONTAINER as authoritative, so a pod built from this image will still report "docker". III_ENV=development has the same problem: engine/docker-compose.prod.yml does not override it, so prod deployments will be tagged as development unless every runtime remembers to replace it.

Suggested adjustment
-ENV III_CONTAINER=docker
-ENV III_ENV=development
+# Inject telemetry-specific environment labels at deploy time
+# (docker-compose, Helm, etc.) so they reflect the actual runtime.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ENV III_CONTAINER=docker
ENV III_ENV=development
# Inject telemetry-specific environment labels at deploy time
# (docker-compose, Helm, etc.) so they reflect the actual runtime.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/Dockerfile` around lines 6 - 7, The Dockerfile currently hardcodes
runtime telemetry defaults via ENV III_CONTAINER=docker and ENV
III_ENV=development which causes engine/src/modules/telemetry/environment.rs to
read and report incorrect runtime values; remove these ENV lines from the
Dockerfile (or leave them unset/empty) so the runtime picks up authoritative
values from the deployment environment (k8s/pod or compose overrides) rather
than baking "docker" and "development" into the image; ensure any CI/deploy
manifests set III_CONTAINER and III_ENV explicitly where needed instead.

Comment on lines +379 to +381
if let Some(host_user_id) = environment::detect_host_user_id() {
props["host_user_id"] = serde_json::Value::String(host_user_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't emit the host correlation ID verbatim.

III_HOST_USER_ID is passed straight through to Amplitude as host_user_id. Since this value comes from the host and is used for cross-runtime correlation, it becomes another stable identifier on top of install_id and environment.machine_id; please hash or otherwise pseudonymize it before sending.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 379 - 381, The code
currently takes environment::detect_host_user_id() and sets
props["host_user_id"] = serde_json::Value::String(host_user_id) directly;
instead, pseudonymize this value before sending by hashing or HMACing it (e.g.,
compute a SHA-256 or HMAC-SHA256 of host_user_id and encode as hex/base64) and
assign the hashed string to props["host_user_id"] (or rename to
props["host_user_id_hash"] if preferred); update the logic around
environment::detect_host_user_id(), the assignment to props["host_user_id"], and
any tests/consumers to expect the hashed value rather than the raw host_user_id.

Comment on lines +649 to +679
let is_active = delta_invocations_total > 0;
let uptime_secs = start_time.elapsed().as_secs();

let ft = collect_functions_and_triggers(&engine);
let wd = collect_worker_data(&engine);
let project = resolve_project_context(wd.sdk_telemetry.as_ref());

let properties = serde_json::json!({
"invocations_total": delta_invocations_total,
"invocations_success": delta_invocations_success,
"invocations_error": delta_invocations_error,
"api_requests": delta_api_requests,
"queue_emits": delta_queue_emits,
"queue_consumes": delta_queue_consumes,
"pubsub_publishes": delta_pubsub_publishes,
"pubsub_subscribes": delta_pubsub_subscribes,
"cron_executions": delta_cron_executions,
"function_count": ft.function_count,
"trigger_count": ft.trigger_count,
"functions": ft.functions,
"trigger_types": ft.trigger_types,
"worker_count_total": wd.worker_count_total,
"worker_count_motia": wd.worker_count_motia,
"workers": wd.workers,
"sdk_languages": wd.sdk_languages,
"client_type": wd.client_type,
"project_id": project.project_id,
"project_name": project.project_name,
"period_secs": interval_secs,
"uptime_secs": uptime_secs,
"invocations": {
"total": invocations_total,
"success": invocations_success,
"error": invocations_error,
},
"rates": {
"invocations_per_sec": rate_invocations,
"queue_emits_per_sec": rate_queue_emits,
"api_requests_per_sec": rate_api_requests,
},
"workers": {
"spawns": workers_spawns,
"deaths": workers_deaths,
"active": workers_spawns.saturating_sub(workers_deaths),
"runtimes": runtime_counts,
},
"modules": telemetry_snapshot,
"registry": registry_data,
"client_context": client_context,
"is_active": is_active,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

is_active ignores non-invocation traffic.

Line 649 only checks delta_invocations_total, so a period with API requests, queue activity, cron executions, or pubsub traffic still reports is_active=false. That makes this flag disagree with the rest of the heartbeat payload.

Suggested fix
-                        let is_active = delta_invocations_total > 0;
+                        let is_active = delta_invocations_total > 0
+                            || delta_api_requests > 0
+                            || delta_queue_emits > 0
+                            || delta_queue_consumes > 0
+                            || delta_pubsub_publishes > 0
+                            || delta_pubsub_subscribes > 0
+                            || delta_cron_executions > 0;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let is_active = delta_invocations_total > 0;
let uptime_secs = start_time.elapsed().as_secs();
let ft = collect_functions_and_triggers(&engine);
let wd = collect_worker_data(&engine);
let project = resolve_project_context(wd.sdk_telemetry.as_ref());
let properties = serde_json::json!({
"invocations_total": delta_invocations_total,
"invocations_success": delta_invocations_success,
"invocations_error": delta_invocations_error,
"api_requests": delta_api_requests,
"queue_emits": delta_queue_emits,
"queue_consumes": delta_queue_consumes,
"pubsub_publishes": delta_pubsub_publishes,
"pubsub_subscribes": delta_pubsub_subscribes,
"cron_executions": delta_cron_executions,
"function_count": ft.function_count,
"trigger_count": ft.trigger_count,
"functions": ft.functions,
"trigger_types": ft.trigger_types,
"worker_count_total": wd.worker_count_total,
"worker_count_motia": wd.worker_count_motia,
"workers": wd.workers,
"sdk_languages": wd.sdk_languages,
"client_type": wd.client_type,
"project_id": project.project_id,
"project_name": project.project_name,
"period_secs": interval_secs,
"uptime_secs": uptime_secs,
"invocations": {
"total": invocations_total,
"success": invocations_success,
"error": invocations_error,
},
"rates": {
"invocations_per_sec": rate_invocations,
"queue_emits_per_sec": rate_queue_emits,
"api_requests_per_sec": rate_api_requests,
},
"workers": {
"spawns": workers_spawns,
"deaths": workers_deaths,
"active": workers_spawns.saturating_sub(workers_deaths),
"runtimes": runtime_counts,
},
"modules": telemetry_snapshot,
"registry": registry_data,
"client_context": client_context,
"is_active": is_active,
let is_active = delta_invocations_total > 0
|| delta_api_requests > 0
|| delta_queue_emits > 0
|| delta_queue_consumes > 0
|| delta_pubsub_publishes > 0
|| delta_pubsub_subscribes > 0
|| delta_cron_executions > 0;
let uptime_secs = start_time.elapsed().as_secs();
let ft = collect_functions_and_triggers(&engine);
let wd = collect_worker_data(&engine);
let project = resolve_project_context(wd.sdk_telemetry.as_ref());
let properties = serde_json::json!({
"invocations_total": delta_invocations_total,
"invocations_success": delta_invocations_success,
"invocations_error": delta_invocations_error,
"api_requests": delta_api_requests,
"queue_emits": delta_queue_emits,
"queue_consumes": delta_queue_consumes,
"pubsub_publishes": delta_pubsub_publishes,
"pubsub_subscribes": delta_pubsub_subscribes,
"cron_executions": delta_cron_executions,
"function_count": ft.function_count,
"trigger_count": ft.trigger_count,
"functions": ft.functions,
"trigger_types": ft.trigger_types,
"worker_count_total": wd.worker_count_total,
"worker_count_motia": wd.worker_count_motia,
"workers": wd.workers,
"sdk_languages": wd.sdk_languages,
"client_type": wd.client_type,
"project_id": project.project_id,
"project_name": project.project_name,
"period_secs": interval_secs,
"uptime_secs": uptime_secs,
"is_active": is_active,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 649 - 679, The is_active
flag currently only checks delta_invocations_total; update the is_active
computation in mod.rs so it also considers other traffic deltas
(delta_api_requests, delta_queue_emits, delta_queue_consumes,
delta_pubsub_publishes, delta_pubsub_subscribes, delta_cron_executions) — e.g.,
compute a total_activity from those deltas (or OR them) and set is_active =
total_activity > 0 (replace the existing is_active = delta_invocations_total >
0). Reference the is_active variable near where uptime_secs, ft
(collect_functions_and_triggers), and wd (collect_worker_data) are used.

Comment on lines +951 to +989
fn test_resolve_project_context_env_fallback() {
unsafe {
env::set_var("III_PROJECT_ID", "proj-123");
env::remove_var("III_PROJECT_ROOT");
}
let result = resolve_project_id();
assert_eq!(result, Some("proj-123".to_string()));
let ctx = resolve_project_context(None);
assert_eq!(ctx.project_id, Some("proj-123".to_string()));
unsafe {
env::remove_var("III_PROJECT_ID");
}
}

#[test]
#[serial]
fn test_resolve_project_id_when_unset() {
fn test_resolve_project_context_sdk_telemetry_project_name() {
unsafe {
env::remove_var("III_PROJECT_ID");
env::remove_var("III_PROJECT_ROOT");
}
let result = resolve_project_id();
assert_eq!(result, None);
let telemetry = WorkerTelemetryMeta {
language: None,
project_name: Some("my-sdk-project".to_string()),
framework: None,
};
let ctx = resolve_project_context(Some(&telemetry));
assert_eq!(ctx.project_name, Some("my-sdk-project".to_string()));
}

#[test]
#[serial]
fn test_resolve_project_id_when_empty() {
unsafe {
env::set_var("III_PROJECT_ID", "");
}
let result = resolve_project_id();
assert_eq!(result, None);
fn test_resolve_project_context_none_when_unset() {
unsafe {
env::remove_var("III_PROJECT_ID");
env::remove_var("III_PROJECT_ROOT");
}
let ctx = resolve_project_context(None);
assert_eq!(ctx.project_id, None);
assert_eq!(ctx.project_name, None);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

These project-context tests depend on the repo's current working directory.

resolve_project_context() walks up from current_dir(), but these tests only unset III_PROJECT_ROOT. If the repository ever gains a real .iii/project.ini, they'll start reading that file and fail unexpectedly. Point III_PROJECT_ROOT at an isolated temp dir instead of removing it.

Also applies to: 1270-1304

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 951 - 989, Tests calling
resolve_project_context() currently unset III_PROJECT_ROOT which makes them
depend on the repo CWD; instead, in the tests (e.g.,
test_resolve_project_context_env_fallback,
test_resolve_project_context_sdk_telemetry_project_name,
test_resolve_project_context_none_when_unset and the similar block around lines
1270-1304) set III_PROJECT_ROOT to a fresh temporary directory (create a temp
dir per test and set env::set_var("III_PROJECT_ROOT", temp_dir.path())) so
resolve_project_context() cannot find a real .iii/project.ini; revert or remove
the temp dir afterwards and keep any existing III_PROJECT_ID handling as-is.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

♻️ Duplicate comments (5)
engine/src/modules/telemetry/mod.rs (5)

371-373: ⚠️ Potential issue | 🟠 Major

Hash host_user_id before sending it.

This value is a stable cross-runtime identifier from the host, so emitting it verbatim adds another durable identifier on top of install_id and environment.machine_id.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 371 - 373, The telemetry
code currently assigns the raw host identifier into props["host_user_id"] using
environment::detect_host_user_id(); instead, compute a stable hash (e.g.,
SHA-256 or another project-standard hashing function) of the returned
host_user_id and store the hex (or base64) digest in props["host_user_id"] so
the original identifier is not emitted; update the block that handles
environment::detect_host_user_id() to perform hashing and set the hashed value
into props.

213-215: ⚠️ Potential issue | 🟠 Major

sdk_api_key should keep telemetry enabled.

This returns NoApiKey whenever api_key is blank, even though active_client() will use sdk_client when sdk_api_key is present. Only disable when both keys are empty.

🔑 Suggested guard
-    if config.api_key.is_empty() {
+    let has_any_api_key = !config.api_key.trim().is_empty()
+        || config
+            .sdk_api_key
+            .as_deref()
+            .is_some_and(|k| !k.trim().is_empty());
+    if !has_any_api_key {
         return Some(DisableReason::NoApiKey);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 213 - 215, The current
guard in telemetry/mod.rs returns DisableReason::NoApiKey whenever
config.api_key is empty, but active_client() can fall back to the sdk client;
change the check to only disable when both keys are missing by verifying
config.api_key.is_empty() && config.sdk_api_key.is_empty() before returning
Some(DisableReason::NoApiKey). Update the condition near the existing api_key
check and keep the DisableReason::NoApiKey return the same; reference
active_client(), config.api_key, config.sdk_api_key, and DisableReason::NoApiKey
to locate and modify the logic.

648-667: ⚠️ Potential issue | 🟠 Major

Keep heartbeat inventory fields bounded.

The recurring heartbeat event now ships full functions, trigger_types, workers, and sdk_languages arrays every interval. On larger projects the payload grows with registry size and becomes easy to drop or rate-limit. Keep the counts here and move capped or one-shot inventories to a separate event.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 648 - 667, The heartbeat
payload is sending full inventory arrays (ft.functions, ft.trigger_types,
wd.workers, wd.sdk_languages) causing unbounded growth; change the properties
built for the heartbeat to include only counts (function_count, trigger_count,
worker_count_total, worker_count_motia) and remove the raw arrays from the
`properties` JSON used by the heartbeat event, then emit a separate
capped/one-shot inventory event (e.g., send an "inventory" event or call a new
emit_inventory function) that ships limited-length snapshots (truncate arrays to
a safe max and/or only send on change or at a lower cadence) so full inventories
are not included in every heartbeat.

943-980: ⚠️ Potential issue | 🟡 Minor

Pin III_PROJECT_ROOT to a temp dir in these tests.

Unsetting III_PROJECT_ROOT lets resolve_project_context() walk up into the real repository, so these tests can start reading a checked-in .iii/project.ini and fail depending on the current working directory.

Also applies to: 1263-1304

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 943 - 980, Tests that
remove III_PROJECT_ROOT allow resolve_project_context() to traverse the real
repo and pick up checked-in .iii/project.ini; instead, in the tests
(test_resolve_project_context_env_fallback,
test_resolve_project_context_sdk_telemetry_project_name,
test_resolve_project_context_none_when_unset and the duplicate block) set
III_PROJECT_ROOT to a temporary directory (e.g., create a tempdir and set
env::set_var("III_PROJECT_ROOT", temp_dir.path())) before calling
resolve_project_context(), and restore or remove the env var after the test to
ensure the function cannot walk into the repository when resolving project
context.

641-671: ⚠️ Potential issue | 🟡 Minor

Count non-invocation traffic in is_active.

A period with API requests, queue activity, pubsub traffic, or cron executions still reports false here.

📈 Suggested fix
-                        let is_active = delta_invocations_total > 0;
+                        let is_active = delta_invocations_total > 0
+                            || delta_api_requests > 0
+                            || delta_queue_emits > 0
+                            || delta_queue_consumes > 0
+                            || delta_pubsub_publishes > 0
+                            || delta_pubsub_subscribes > 0
+                            || delta_cron_executions > 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 641 - 671, The is_active
calculation currently only checks delta_invocations_total; update it so
non-invocation traffic also marks the node active by including the other delta
counters (delta_api_requests, delta_queue_emits, delta_queue_consumes,
delta_pubsub_publishes, delta_pubsub_subscribes, delta_cron_executions) in the
boolean expression used to set is_active (in the block where is_active is
assigned and the telemetry properties are constructed), e.g. set is_active =
delta_invocations_total > 0 || delta_api_requests > 0 || delta_queue_emits > 0
|| delta_queue_consumes > 0 || delta_pubsub_publishes > 0 ||
delta_pubsub_subscribes > 0 || delta_cron_executions > 0 so any of those deltas
> 0 yields true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/install.sh`:
- Around line 841-862: The telemetry bootstrap (calls to iii_gen_uuid,
iii_get_or_create_telemetry_id and the initial iii_send_event) is being executed
unconditionally; move this block so it runs only after the telemetry eligibility
check but still before any early exit/error paths that are instrumented, and
before any calls that modify user shell state (e.g., iii_export_host_user_id);
ensure you still set install_event_prefix and send either "upgrade_started" or
"install_started" via iii_send_event (with the same payloads) immediately after
determining eligibility so instrumented failures that occur later include
install_id/telemetry_id; adjust call sites to reference the same functions
(iii_gen_uuid, iii_get_or_create_telemetry_id, iii_send_event,
iii_export_host_user_id) without changing their payload structure.
- Around line 99-110: The installer helper function iii_telemetry_enabled()
currently ignores the III_TELEMETRY_DEV flag; update iii_telemetry_enabled() to
treat III_TELEMETRY_DEV=true (or "1") as disabling telemetry by checking that
variable at the top (same semantics as III_TELEMETRY_ENABLED) and returning 1
when set, ensuring the installer and cli/src/telemetry.rs behave consistently
regarding III_TELEMETRY_DEV.
- Around line 256-258: The background telemetry curl POST to AMPLITUDE_ENDPOINT
is unbounded and can keep the installer waiting when err() performs a bare wait;
update the curl invocation that posts "$payload" to include explicit timeouts
(e.g., --connect-timeout 5 and --max-time 10 or similar) so the background job
is bounded and won't hang forever; ensure you keep the existing flags (-s -o
/dev/null -X POST -H "Content-Type: application/json" --data-raw "$payload" &)
and only add the timeout flags to that same command.

In `@cli/src/telemetry.rs`:
- Around line 12-17: telemetry_ini_path currently hardcodes ~/.iii/telemetry.ini
causing tests that call build_event() / get_or_create_telemetry_id() to mutate
real user state; make the telemetry path injectable by changing
telemetry_ini_path to read an override (e.g. from an env var like
TELEMETRY_INI_PATH) or by providing a variant telemetry_ini_path_from(path:
Option<PathBuf>) and then update build_event and get_or_create_telemetry_id to
accept a &Path or use the override so tests can supply a temp file, and update
tests to set the env var or pass a temp PathBuf; ensure default behavior remains
the same when no override is provided.

In `@engine/install.sh`:
- Around line 13-25: The telemetry code is splicing unescaped error text into
the JSON fragments before calling iii_send_event, which can break JSON and let
the background curl hang; change implementation so install.sh stops inserting
raw _err_msg into the JSON payload and instead pass the raw error string as a
separate parameter to iii_send_event (or call a new helper) and perform proper
JSON escaping/quoting inside iii_send_event when building _payload, and add
short curl timeouts (e.g., --connect-timeout and --max-time) and explicit
--silent/--fail handling in the background upload path so err()'s wait cannot
block indefinitely; update both the upgrade_failed/install_failed call sites and
the iii_send_event function to perform escaping and timeout handling
consistently (also apply the same fix for the other occurrence noted around
lines 166-189).
- Around line 316-318: The installer currently calls
iii_get_or_create_telemetry_id (and performs iii_export_host_user_id)
unconditionally, which creates ~/.iii/telemetry.ini and shell-profile exports
even when telemetry is disabled or no Amplitude key is set; wrap the calls that
generate/persist telemetry (iii_get_or_create_telemetry_id and
iii_export_host_user_id) in a guard that checks telemetry is enabled and an
Amplitude key/config is present (i.e., only call iii_get_or_create_telemetry_id
and iii_export_host_user_id when the telemetry opt-in flag is true and the
amplitude key is configured), leaving iii_gen_uuid (install_id) unaffected if it
must always be generated. Ensure the same guard is applied to the other
occurrence noted in the comment.

In `@engine/src/modules/telemetry/environment.rs`:
- Around line 60-81: set_ini_key currently swallows all filesystem errors,
causing telemetry.ini persistence to appear successful when it may not be;
change set_ini_key to return a Result<(), std::io::Error> (or a suitable error
type), propagate and return errors from create_dir_all, std::fs::write,
set_permissions (on unix), and std::fs::rename instead of ignoring them, and
keep the existing atomic tmp->rename flow using telemetry_ini_path, parse_ini,
and serialize_ini; update callers to handle the Result so failures are reported
and do not silently lose install ID/state.

In `@engine/src/modules/telemetry/mod.rs`:
- Around line 83-109: read_project_ini currently only matches exact
"project_id=" / "project_name=" prefixes and thus misses lines formatted as "key
= value"; update read_project_ini to split each trimmed line on the first '='
(e.g., using split_once('=')), trim both key and value, then match the key
against "project_id" and "project_name" and set project_id/project_name when the
trimmed value is non-empty, preserving the existing Option semantics and return
behavior in the read_project_ini function.

---

Duplicate comments:
In `@engine/src/modules/telemetry/mod.rs`:
- Around line 371-373: The telemetry code currently assigns the raw host
identifier into props["host_user_id"] using environment::detect_host_user_id();
instead, compute a stable hash (e.g., SHA-256 or another project-standard
hashing function) of the returned host_user_id and store the hex (or base64)
digest in props["host_user_id"] so the original identifier is not emitted;
update the block that handles environment::detect_host_user_id() to perform
hashing and set the hashed value into props.
- Around line 213-215: The current guard in telemetry/mod.rs returns
DisableReason::NoApiKey whenever config.api_key is empty, but active_client()
can fall back to the sdk client; change the check to only disable when both keys
are missing by verifying config.api_key.is_empty() &&
config.sdk_api_key.is_empty() before returning Some(DisableReason::NoApiKey).
Update the condition near the existing api_key check and keep the
DisableReason::NoApiKey return the same; reference active_client(),
config.api_key, config.sdk_api_key, and DisableReason::NoApiKey to locate and
modify the logic.
- Around line 648-667: The heartbeat payload is sending full inventory arrays
(ft.functions, ft.trigger_types, wd.workers, wd.sdk_languages) causing unbounded
growth; change the properties built for the heartbeat to include only counts
(function_count, trigger_count, worker_count_total, worker_count_motia) and
remove the raw arrays from the `properties` JSON used by the heartbeat event,
then emit a separate capped/one-shot inventory event (e.g., send an "inventory"
event or call a new emit_inventory function) that ships limited-length snapshots
(truncate arrays to a safe max and/or only send on change or at a lower cadence)
so full inventories are not included in every heartbeat.
- Around line 943-980: Tests that remove III_PROJECT_ROOT allow
resolve_project_context() to traverse the real repo and pick up checked-in
.iii/project.ini; instead, in the tests
(test_resolve_project_context_env_fallback,
test_resolve_project_context_sdk_telemetry_project_name,
test_resolve_project_context_none_when_unset and the duplicate block) set
III_PROJECT_ROOT to a temporary directory (e.g., create a tempdir and set
env::set_var("III_PROJECT_ROOT", temp_dir.path())) before calling
resolve_project_context(), and restore or remove the env var after the test to
ensure the function cannot walk into the repository when resolving project
context.
- Around line 641-671: The is_active calculation currently only checks
delta_invocations_total; update it so non-invocation traffic also marks the node
active by including the other delta counters (delta_api_requests,
delta_queue_emits, delta_queue_consumes, delta_pubsub_publishes,
delta_pubsub_subscribes, delta_cron_executions) in the boolean expression used
to set is_active (in the block where is_active is assigned and the telemetry
properties are constructed), e.g. set is_active = delta_invocations_total > 0 ||
delta_api_requests > 0 || delta_queue_emits > 0 || delta_queue_consumes > 0 ||
delta_pubsub_publishes > 0 || delta_pubsub_subscribes > 0 ||
delta_cron_executions > 0 so any of those deltas > 0 yields true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 881b4196-37a5-4efc-aa0d-8d937201f6f3

📥 Commits

Reviewing files that changed from the base of the PR and between 3d2b813 and 898bbc8.

📒 Files selected for processing (5)
  • cli/install.sh
  • cli/src/telemetry.rs
  • engine/install.sh
  • engine/src/modules/telemetry/environment.rs
  • engine/src/modules/telemetry/mod.rs

Comment on lines +99 to +110
iii_telemetry_enabled() {
if [[ "${III_TELEMETRY_ENABLED:-}" == "false" || "${III_TELEMETRY_ENABLED:-}" == "0" ]]; then
return 1
fi
local ci_vars=(CI GITHUB_ACTIONS GITLAB_CI CIRCLECI JENKINS_URL TRAVIS BUILDKITE TF_BUILD CODEBUILD_BUILD_ID BITBUCKET_BUILD_NUMBER DRONE TEAMCITY_VERSION)
for v in "${ci_vars[@]}"; do
if [[ -n "${!v:-}" ]]; then
return 1
fi
done
return 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Honor III_TELEMETRY_DEV in the installer too.

cli/src/telemetry.rs disables telemetry when III_TELEMETRY_DEV=true, but this helper does not. That makes dev/opt-out behavior inconsistent across the two entry points.

Suggested change
 iii_telemetry_enabled() {
   if [[ "${III_TELEMETRY_ENABLED:-}" == "false" || "${III_TELEMETRY_ENABLED:-}" == "0" ]]; then
     return 1
   fi
+  if [[ "${III_TELEMETRY_DEV:-}" == "true" ]]; then
+    return 1
+  fi
   local ci_vars=(CI GITHUB_ACTIONS GITLAB_CI CIRCLECI JENKINS_URL TRAVIS BUILDKITE TF_BUILD CODEBUILD_BUILD_ID BITBUCKET_BUILD_NUMBER DRONE TEAMCITY_VERSION)
   for v in "${ci_vars[@]}"; do
     if [[ -n "${!v:-}" ]]; then
       return 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 99 - 110, The installer helper function
iii_telemetry_enabled() currently ignores the III_TELEMETRY_DEV flag; update
iii_telemetry_enabled() to treat III_TELEMETRY_DEV=true (or "1") as disabling
telemetry by checking that variable at the top (same semantics as
III_TELEMETRY_ENABLED) and returning 1 when set, ensuring the installer and
cli/src/telemetry.rs behave consistently regarding III_TELEMETRY_DEV.

Comment on lines +256 to +258
curl -s -o /dev/null -X POST "$AMPLITUDE_ENDPOINT" \
-H "Content-Type: application/json" \
--data-raw "$payload" &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

nl -ba cli/install.sh | sed -n '62,77p;246,259p'
rg -n --fixed-strings -- '--connect-timeout' cli/install.sh || true
rg -n --fixed-strings -- '--max-time' cli/install.sh || true

Repository: iii-hq/iii

Length of output: 96


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the curl command at lines 256-258
cat -n cli/install.sh | sed -n '250,265p'

# Search for timeout flags
echo "=== Searching for --connect-timeout ==="
rg -n --fixed-strings -- '--connect-timeout' cli/install.sh || echo "Not found"

echo "=== Searching for --max-time ==="
rg -n --fixed-strings -- '--max-time' cli/install.sh || echo "Not found"

# Find the err() function definition
echo "=== Searching for err() function ==="
rg -n 'err\s*\(\s*\)' cli/install.sh -A 5 || echo "Not found"

Repository: iii-hq/iii

Length of output: 1526


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get more lines of the err() function
cat -n cli/install.sh | sed -n '62,90p'

# Also search for wait in err function context
rg -n 'wait' cli/install.sh | head -20

Repository: iii-hq/iii

Length of output: 1254


Bound the telemetry curl before waiting on it.

err() does a bare wait at line 76, so this background POST can keep the installer open until curl/network timeouts fire. Add explicit --connect-timeout and --max-time limits to the curl command at lines 256-258.

Suggested change
-  curl -s -o /dev/null -X POST "$AMPLITUDE_ENDPOINT" \
+  curl -s -o /dev/null --connect-timeout 5 --max-time 5 -X POST "$AMPLITUDE_ENDPOINT" \
     -H "Content-Type: application/json" \
     --data-raw "$payload" &
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
curl -s -o /dev/null -X POST "$AMPLITUDE_ENDPOINT" \
-H "Content-Type: application/json" \
--data-raw "$payload" &
curl -s -o /dev/null --connect-timeout 5 --max-time 5 -X POST "$AMPLITUDE_ENDPOINT" \
-H "Content-Type: application/json" \
--data-raw "$payload" &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 256 - 258, The background telemetry curl POST to
AMPLITUDE_ENDPOINT is unbounded and can keep the installer waiting when err()
performs a bare wait; update the curl invocation that posts "$payload" to
include explicit timeouts (e.g., --connect-timeout 5 and --max-time 10 or
similar) so the background job is bounded and won't hang forever; ensure you
keep the existing flags (-s -o /dev/null -X POST -H "Content-Type:
application/json" --data-raw "$payload" &) and only add the timeout flags to
that same command.

Comment on lines +12 to +17
fn telemetry_ini_path() -> std::path::PathBuf {
dirs::home_dir()
.unwrap_or_else(std::env::temp_dir)
.join(".iii")
.join("telemetry.ini")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make the telemetry store path injectable for tests.

telemetry_ini_path() always targets ~/.iii/telemetry.ini, and this module’s tests call build_event() / get_or_create_telemetry_id(). That makes the suite mutate shared user state on a real path instead of a temp sandbox.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 12 - 17, telemetry_ini_path currently
hardcodes ~/.iii/telemetry.ini causing tests that call build_event() /
get_or_create_telemetry_id() to mutate real user state; make the telemetry path
injectable by changing telemetry_ini_path to read an override (e.g. from an env
var like TELEMETRY_INI_PATH) or by providing a variant
telemetry_ini_path_from(path: Option<PathBuf>) and then update build_event and
get_or_create_telemetry_id to accept a &Path or use the override so tests can
supply a temp file, and update tests to set the env var or pass a temp PathBuf;
ensure default behavior remains the same when no override is provided.

Comment on lines +13 to +25
if [ -n "${install_event_prefix:-}" ] && [ -n "${install_id:-}" ] && [ -n "${telemetry_id:-}" ]; then
_err_msg=$(echo "$*" | tr '"' "'")
if [ "$install_event_prefix" = "upgrade" ]; then
iii_send_event "upgrade_failed" \
"\"install_id\":\"${install_id}\",\"from_version\":\"${from_version:-}\",\"install_method\":\"sh\",\"target_binary\":\"${BIN_NAME}\",\"error_stage\":\"${_err_msg}\"" \
"$telemetry_id" "$install_id"
else
iii_send_event "install_failed" \
"\"install_id\":\"${install_id}\",\"install_method\":\"sh\",\"target_binary\":\"${BIN_NAME}\",\"error_stage\":\"${_err_msg}\"" \
"$telemetry_id" "$install_id"
fi
wait
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stop splicing raw strings into the telemetry JSON body.

iii_send_event() accepts raw JSON fragments and interpolates them straight into _payload; err() then feeds it arbitrary error text. A quote, backslash, or newline makes the body invalid, and because err() does wait, the background curl can stall installer exit indefinitely with no timeout. Move escaping/building into iii_send_event() and give curl a short connect/max time.

Also applies to: 166-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 13 - 25, The telemetry code is splicing
unescaped error text into the JSON fragments before calling iii_send_event,
which can break JSON and let the background curl hang; change implementation so
install.sh stops inserting raw _err_msg into the JSON payload and instead pass
the raw error string as a separate parameter to iii_send_event (or call a new
helper) and perform proper JSON escaping/quoting inside iii_send_event when
building _payload, and add short curl timeouts (e.g., --connect-timeout and
--max-time) and explicit --silent/--fail handling in the background upload path
so err()'s wait cannot block indefinitely; update both the
upgrade_failed/install_failed call sites and the iii_send_event function to
perform escaping and timeout handling consistently (also apply the same fix for
the other occurrence noted around lines 166-189).

Comment on lines +316 to +318
install_id=$(iii_gen_uuid)
telemetry_id=$(iii_get_or_create_telemetry_id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't persist telemetry IDs after opt-out.

iii_get_or_create_telemetry_id() and iii_export_host_user_id run even when telemetry is disabled or no Amplitude key is configured, so the installer still writes ~/.iii/telemetry.ini and shell-profile exports after explicit opt-out. Those are the same identifiers later consumed by engine/src/modules/telemetry/environment.rs.

🔒 Suggested guard
-install_id=$(iii_gen_uuid)
-telemetry_id=$(iii_get_or_create_telemetry_id)
+install_id=""
+telemetry_id=""
+if iii_telemetry_enabled && [ -n "$AMPLITUDE_API_KEY" ]; then
+  install_id=$(iii_gen_uuid)
+  telemetry_id=$(iii_get_or_create_telemetry_id)
+fi-iii_export_host_user_id
+if [ -n "$telemetry_id" ]; then
+  iii_export_host_user_id
+fi

Also applies to: 518-518

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 316 - 318, The installer currently calls
iii_get_or_create_telemetry_id (and performs iii_export_host_user_id)
unconditionally, which creates ~/.iii/telemetry.ini and shell-profile exports
even when telemetry is disabled or no Amplitude key is set; wrap the calls that
generate/persist telemetry (iii_get_or_create_telemetry_id and
iii_export_host_user_id) in a guard that checks telemetry is enabled and an
Amplitude key/config is present (i.e., only call iii_get_or_create_telemetry_id
and iii_export_host_user_id when the telemetry opt-in flag is true and the
amplitude key is configured), leaving iii_gen_uuid (install_id) unaffected if it
must always be generated. Ensure the same guard is applied to the other
occurrence noted in the comment.

Comment on lines +60 to +81
pub fn set_ini_key(section: &str, key: &str, value: &str) {
let path = telemetry_ini_path();
let contents = std::fs::read_to_string(&path).unwrap_or_default();
let mut sections = parse_ini(&contents);
sections
.entry(section.to_string())
.or_default()
.insert(key.to_string(), value.to_string());
let serialized = serialize_ini(&sections);
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent).ok();
}
let tmp = path.with_extension("tmp");
if std::fs::write(&tmp, &serialized).is_ok() {
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
std::fs::set_permissions(&tmp, perms).ok();
}
std::fs::rename(&tmp, &path).ok();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return an error when telemetry.ini persistence fails.

set_ini_key() swallows create_dir_all, write, set_permissions, and rename errors, but callers treat the value as durable. If this write fails, install IDs stop being stable and first_run_sent can be emitted again on every launch.

🛠 Suggested direction
-pub fn set_ini_key(section: &str, key: &str, value: &str) {
+pub fn set_ini_key(section: &str, key: &str, value: &str) -> std::io::Result<()> {
     let path = telemetry_ini_path();
     let contents = std::fs::read_to_string(&path).unwrap_or_default();
     let mut sections = parse_ini(&contents);
     sections
         .entry(section.to_string())
         .or_default()
         .insert(key.to_string(), value.to_string());
     let serialized = serialize_ini(&sections);
     if let Some(parent) = path.parent() {
-        std::fs::create_dir_all(parent).ok();
+        std::fs::create_dir_all(parent)?;
     }
     let tmp = path.with_extension("tmp");
-    if std::fs::write(&tmp, &serialized).is_ok() {
-        #[cfg(unix)]
-        {
-            use std::os::unix::fs::PermissionsExt;
-            let perms = std::fs::Permissions::from_mode(0o600);
-            std::fs::set_permissions(&tmp, perms).ok();
-        }
-        std::fs::rename(&tmp, &path).ok();
-    }
+    std::fs::write(&tmp, &serialized)?;
+    #[cfg(unix)]
+    {
+        use std::os::unix::fs::PermissionsExt;
+        let perms = std::fs::Permissions::from_mode(0o600);
+        std::fs::set_permissions(&tmp, perms)?;
+    }
+    std::fs::rename(&tmp, &path)?;
+    Ok(())
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn set_ini_key(section: &str, key: &str, value: &str) {
let path = telemetry_ini_path();
let contents = std::fs::read_to_string(&path).unwrap_or_default();
let mut sections = parse_ini(&contents);
sections
.entry(section.to_string())
.or_default()
.insert(key.to_string(), value.to_string());
let serialized = serialize_ini(&sections);
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent).ok();
}
let tmp = path.with_extension("tmp");
if std::fs::write(&tmp, &serialized).is_ok() {
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
std::fs::set_permissions(&tmp, perms).ok();
}
std::fs::rename(&tmp, &path).ok();
}
pub fn set_ini_key(section: &str, key: &str, value: &str) -> std::io::Result<()> {
let path = telemetry_ini_path();
let contents = std::fs::read_to_string(&path).unwrap_or_default();
let mut sections = parse_ini(&contents);
sections
.entry(section.to_string())
.or_default()
.insert(key.to_string(), value.to_string());
let serialized = serialize_ini(&sections);
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)?;
}
let tmp = path.with_extension("tmp");
std::fs::write(&tmp, &serialized)?;
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
std::fs::set_permissions(&tmp, perms)?;
}
std::fs::rename(&tmp, &path)?;
Ok(())
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/environment.rs` around lines 60 - 81,
set_ini_key currently swallows all filesystem errors, causing telemetry.ini
persistence to appear successful when it may not be; change set_ini_key to
return a Result<(), std::io::Error> (or a suitable error type), propagate and
return errors from create_dir_all, std::fs::write, set_permissions (on unix),
and std::fs::rename instead of ignoring them, and keep the existing atomic
tmp->rename flow using telemetry_ini_path, parse_ini, and serialize_ini; update
callers to handle the Result so failures are reported and do not silently lose
install ID/state.

Comment on lines +83 to +109
fn read_project_ini(root: &std::path::Path) -> Option<(Option<String>, Option<String>)> {
let ini_path = root.join(".iii").join("project.ini");
let contents = std::fs::read_to_string(&ini_path).ok()?;

let mut project_id: Option<String> = None;
let mut project_name: Option<String> = None;

for line in contents.lines() {
let line = line.trim();
if let Some(val) = line.strip_prefix("project_id=") {
let val = val.trim();
if !val.is_empty() {
project_id = Some(val.to_string());
}
} else if let Some(val) = line.strip_prefix("project_name=") {
let val = val.trim();
if !val.is_empty() {
project_name = Some(val.to_string());
}
}
}

if project_id.is_some() || project_name.is_some() {
Some((project_id, project_name))
} else {
None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Parse standard key = value lines in project.ini.

read_project_ini() only recognizes literal project_id= / project_name=. The new INI writer in engine/src/modules/telemetry/environment.rs serializes key = value, so the same formatting here resolves to None and drops project context from every event.

🧩 Suggested fix
     for line in contents.lines() {
         let line = line.trim();
-        if let Some(val) = line.strip_prefix("project_id=") {
-            let val = val.trim();
-            if !val.is_empty() {
-                project_id = Some(val.to_string());
-            }
-        } else if let Some(val) = line.strip_prefix("project_name=") {
-            let val = val.trim();
-            if !val.is_empty() {
-                project_name = Some(val.to_string());
-            }
-        }
+        if line.is_empty() || line.starts_with('#') || line.starts_with(';') || line.starts_with('[') {
+            continue;
+        }
+        let Some((key, value)) = line.split_once('=') else {
+            continue;
+        };
+        let key = key.trim();
+        let val = value.trim();
+        if key == "project_id" && !val.is_empty() {
+            project_id = Some(val.to_string());
+        } else if key == "project_name" && !val.is_empty() {
+            project_name = Some(val.to_string());
+        }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn read_project_ini(root: &std::path::Path) -> Option<(Option<String>, Option<String>)> {
let ini_path = root.join(".iii").join("project.ini");
let contents = std::fs::read_to_string(&ini_path).ok()?;
let mut project_id: Option<String> = None;
let mut project_name: Option<String> = None;
for line in contents.lines() {
let line = line.trim();
if let Some(val) = line.strip_prefix("project_id=") {
let val = val.trim();
if !val.is_empty() {
project_id = Some(val.to_string());
}
} else if let Some(val) = line.strip_prefix("project_name=") {
let val = val.trim();
if !val.is_empty() {
project_name = Some(val.to_string());
}
}
}
if project_id.is_some() || project_name.is_some() {
Some((project_id, project_name))
} else {
None
}
fn read_project_ini(root: &std::path::Path) -> Option<(Option<String>, Option<String>)> {
let ini_path = root.join(".iii").join("project.ini");
let contents = std::fs::read_to_string(&ini_path).ok()?;
let mut project_id: Option<String> = None;
let mut project_name: Option<String> = None;
for line in contents.lines() {
let line = line.trim();
if line.is_empty() || line.starts_with('#') || line.starts_with(';') || line.starts_with('[') {
continue;
}
let Some((key, value)) = line.split_once('=') else {
continue;
};
let key = key.trim();
let val = value.trim();
if key == "project_id" && !val.is_empty() {
project_id = Some(val.to_string());
} else if key == "project_name" && !val.is_empty() {
project_name = Some(val.to_string());
}
}
if project_id.is_some() || project_name.is_some() {
Some((project_id, project_name))
} else {
None
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 83 - 109, read_project_ini
currently only matches exact "project_id=" / "project_name=" prefixes and thus
misses lines formatted as "key = value"; update read_project_ini to split each
trimmed line on the first '=' (e.g., using split_once('=')), trim both key and
value, then match the key against "project_id" and "project_name" and set
project_id/project_name when the trimmed value is non-empty, preserving the
existing Option semantics and return behavior in the read_project_ini function.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
cli/src/telemetry.rs (2)

289-316: Test helper uses unsafe for environment variable manipulation.

The clear_opt_out_vars function uses unsafe blocks for env::remove_var and env::set_var. While this is necessary for the serial_test pattern, consider adding a comment explaining why unsafe is required (Rust 1.80+ marks these as unsafe due to potential data races in multithreaded contexts).

Add explanatory comment
 fn clear_opt_out_vars() {
+    // SAFETY: Tests are serialized via #[serial] attribute to prevent
+    // concurrent env var access. Rust 1.80+ marks env mutation as unsafe
+    // due to potential data races in multithreaded programs.
     unsafe {
         env::remove_var("III_TELEMETRY_ENABLED");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 289 - 316, The helper function
clear_opt_out_vars currently wraps env::remove_var calls in unsafe blocks; add
an explanatory comment above (or inside) clear_opt_out_vars noting that these
env::remove_var/set_var calls are marked unsafe in Rust 1.80+ because they can
cause data races in multithreaded contexts, and that the unsafe usage here is
deliberate and safe because tests run serially (serial_test) so concurrent
access is prevented; keep the unsafe blocks but document this rationale and
reference serial_test/serial to justify the safety.

81-81: Hard-coded API key exposed in source.

The Amplitude API key is embedded directly in the source code. While this is common for client-side telemetry keys, consider documenting that this key is intentionally public (write-only) and has appropriate rate limits configured on the Amplitude side.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` at line 81, Replace the hard-coded API key constant by
reading the Amplitude key from a runtime source (e.g., environment variable like
AMPLITUDE_API_KEY or a config loader) instead of keeping const API_KEY set to a
literal; update the code that references API_KEY to use the new getter/setting
and add a brief comment near the telemetry initialization (e.g., where API_KEY
was defined or in init_telemetry) documenting that the key is intentionally
public/read-only if that is the case and that rate limits are enforced on the
Amplitude side. Ensure any default behavior fails fast or logs a clear message
if the env/config value is missing.
engine/src/modules/telemetry/mod.rs (1)

28-28: Hard-coded API key in engine matches CLI.

The same Amplitude API key e8fb1f8d290a72dbb2d9b264926be4bf appears in both cli/src/telemetry.rs:81 and here. Consider extracting to a shared constant or build-time configuration to ensure consistency and simplify key rotation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` at line 28, The hard-coded API key in
the telemetry module is duplicated; replace the local const API_KEY with a
single source of truth (e.g., read from a shared telemetry constant or
build-time config/env var like AMPLITUDE_API_KEY) and update both places that
currently use the literal so they reference that shared value; change the const
API_KEY in mod.rs to fetch from the new shared symbol (or env/config getter) and
update the other telemetry usage to import that symbol so key rotation stays
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/install.sh`:
- Around line 261-275: The function iii_export_host_user_id currently appends an
export line to user shell profiles without consent; change it to be opt-in and
to notify when changes are made by (1) adding a flag/variable (e.g.,
INSTALL_MODIFY_PROFILES or a CLI option) checked by iii_export_host_user_id
before writing, (2) if the flag is not set, skip modification and optionally
print a message showing the export line and instructions for manual addition,
and (3) when the function does modify a file (the code that writes export_line
into profiles like "${HOME}/.bashrc" "${HOME}/.zshrc" "${HOME}/.profile"), print
a clear notice to stdout/stderr stating which profile was updated and what was
appended; keep use of iii_read_ini_key to fetch the id, and ensure the function
returns a nonzero status or prints guidance if modification was skipped.

In `@cli/src/telemetry.rs`:
- Around line 402-408: The test test_get_or_create_telemetry_id_is_stable
currently writes to the real user state via get_or_create_telemetry_id(); change
the test to isolate filesystem side effects by creating a temporary directory
(e.g., tempfile::TempDir) and configuring the code under test to use that path
(set the env var or inject the path used by get_or_create_telemetry_id(), or
refactor get_or_create_telemetry_id() to accept a base path) so the telemetry
file is written into the temp dir, then run the two calls, assert non-empty and
equality, and let the TempDir drop to clean up; alternatively, if refactoring is
not possible, mark the test #[serial] and explicitly remove the created file
after the test to avoid touching real user state.

In `@engine/install.sh`:
- Around line 203-215: The iii_export_host_user_id function currently appends
export lines to user shell profiles without consent; change it to be opt-in:
detect non-interactive mode via a NO_PROMPT or CI env var and skip modification,
otherwise prompt the user (e.g., read -p) asking permission to append
III_HOST_USER_ID to one of their profiles; if the user declines, do not modify
files and instead print the exact export line and instructions for manual
installation. Update iii_export_host_user_id to implement the prompt/opt-out
logic and to respect an explicit env flag (e.g., III_AUTO_EXPORT=true) so
automated installs can still opt in.

---

Nitpick comments:
In `@cli/src/telemetry.rs`:
- Around line 289-316: The helper function clear_opt_out_vars currently wraps
env::remove_var calls in unsafe blocks; add an explanatory comment above (or
inside) clear_opt_out_vars noting that these env::remove_var/set_var calls are
marked unsafe in Rust 1.80+ because they can cause data races in multithreaded
contexts, and that the unsafe usage here is deliberate and safe because tests
run serially (serial_test) so concurrent access is prevented; keep the unsafe
blocks but document this rationale and reference serial_test/serial to justify
the safety.
- Line 81: Replace the hard-coded API key constant by reading the Amplitude key
from a runtime source (e.g., environment variable like AMPLITUDE_API_KEY or a
config loader) instead of keeping const API_KEY set to a literal; update the
code that references API_KEY to use the new getter/setting and add a brief
comment near the telemetry initialization (e.g., where API_KEY was defined or in
init_telemetry) documenting that the key is intentionally public/read-only if
that is the case and that rate limits are enforced on the Amplitude side. Ensure
any default behavior fails fast or logs a clear message if the env/config value
is missing.

In `@engine/src/modules/telemetry/mod.rs`:
- Line 28: The hard-coded API key in the telemetry module is duplicated; replace
the local const API_KEY with a single source of truth (e.g., read from a shared
telemetry constant or build-time config/env var like AMPLITUDE_API_KEY) and
update both places that currently use the literal so they reference that shared
value; change the const API_KEY in mod.rs to fetch from the new shared symbol
(or env/config getter) and update the other telemetry usage to import that
symbol so key rotation stays consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21a2b80a-2865-49dd-b3b9-9b219c39cce3

📥 Commits

Reviewing files that changed from the base of the PR and between 898bbc8 and c696821.

📒 Files selected for processing (4)
  • cli/install.sh
  • cli/src/telemetry.rs
  • engine/install.sh
  • engine/src/modules/telemetry/mod.rs

Comment on lines +261 to +275
iii_export_host_user_id() {
local huid
huid=$(iii_read_ini_key "identity" "id")
if [[ -z "$huid" ]]; then
return 0
fi
local export_line="export III_HOST_USER_ID=\"${huid}\""
local profile
for profile in "${HOME}/.bashrc" "${HOME}/.zshrc" "${HOME}/.profile"; do
if [[ -f "$profile" ]] && ! grep -qF "III_HOST_USER_ID" "$profile" 2>/dev/null; then
printf '\n# iii host correlation\n%s\n' "$export_line" >> "$profile"
break
fi
done
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

iii_export_host_user_id modifies shell profiles without consent.

This function appends to .bashrc, .zshrc, or .profile if III_HOST_USER_ID is not already present. Users may not expect an installer to modify their shell configuration. Consider:

  1. Making this opt-in via a flag
  2. At minimum, informing the user that their profile was modified
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 261 - 275, The function iii_export_host_user_id
currently appends an export line to user shell profiles without consent; change
it to be opt-in and to notify when changes are made by (1) adding a
flag/variable (e.g., INSTALL_MODIFY_PROFILES or a CLI option) checked by
iii_export_host_user_id before writing, (2) if the flag is not set, skip
modification and optionally print a message showing the export line and
instructions for manual addition, and (3) when the function does modify a file
(the code that writes export_line into profiles like "${HOME}/.bashrc"
"${HOME}/.zshrc" "${HOME}/.profile"), print a clear notice to stdout/stderr
stating which profile was updated and what was appended; keep use of
iii_read_ini_key to fetch the id, and ensure the function returns a nonzero
status or prints guidance if modification was skipped.

Comment on lines +402 to +408
#[test]
fn test_get_or_create_telemetry_id_is_stable() {
let id1 = get_or_create_telemetry_id();
let id2 = get_or_create_telemetry_id();
assert!(!id1.is_empty());
assert_eq!(id1, id2);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test test_get_or_create_telemetry_id_is_stable writes to real user state.

This test calls get_or_create_telemetry_id() which writes to ~/.iii/telemetry.ini on the developer's machine. It should use a temp directory or be marked #[serial] and clean up after itself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 402 - 408, The test
test_get_or_create_telemetry_id_is_stable currently writes to the real user
state via get_or_create_telemetry_id(); change the test to isolate filesystem
side effects by creating a temporary directory (e.g., tempfile::TempDir) and
configuring the code under test to use that path (set the env var or inject the
path used by get_or_create_telemetry_id(), or refactor
get_or_create_telemetry_id() to accept a base path) so the telemetry file is
written into the temp dir, then run the two calls, assert non-empty and
equality, and let the TempDir drop to clean up; alternatively, if refactoring is
not possible, mark the test #[serial] and explicitly remove the created file
after the test to avoid touching real user state.

Comment on lines +203 to +215
iii_export_host_user_id() {
_huid=$(iii_read_ini_key "identity" "id")
if [ -z "$_huid" ]; then
return 0
fi
_export_line="export III_HOST_USER_ID=\"${_huid}\""
for _profile in "${HOME}/.bashrc" "${HOME}/.zshrc" "${HOME}/.profile"; do
if [ -f "$_profile" ] && ! grep -qF "III_HOST_USER_ID" "$_profile" 2>/dev/null; then
printf '\n# iii host correlation\n%s\n' "$_export_line" >> "$_profile"
break
fi
done
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

iii_export_host_user_id modifies shell profiles without user consent.

Similar to the CLI installer, this function appends to .bashrc, .zshrc, or .profile without informing the user. Users may not expect an installer to modify their shell configuration files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 203 - 215, The iii_export_host_user_id
function currently appends export lines to user shell profiles without consent;
change it to be opt-in: detect non-interactive mode via a NO_PROMPT or CI env
var and skip modification, otherwise prompt the user (e.g., read -p) asking
permission to append III_HOST_USER_ID to one of their profiles; if the user
declines, do not modify files and instead print the exact export line and
instructions for manual installation. Update iii_export_host_user_id to
implement the prompt/opt-out logic and to respect an explicit env flag (e.g.,
III_AUTO_EXPORT=true) so automated installs can still opt in.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
engine/src/modules/telemetry/mod.rs (3)

360-362: ⚠️ Potential issue | 🟠 Major

Pseudonymize III_HOST_USER_ID before sending it.

This value is emitted verbatim into host_user_id, which adds another stable cross-runtime identifier on top of install_id and environment.machine_id. Hash or HMAC it before it leaves the process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 360 - 362, The code
currently assigns the raw host identifier from
environment::detect_host_user_id() into props["host_user_id"]; instead,
pseudonymize it before sending by computing a stable, non-reversible value
(e.g., a SHA-256 hash or HMAC with a local secret key) of the detected
host_user_id and store that pseudonymized string in props["host_user_id"];
ensure the same method is used consistently with other identifiers (install_id /
environment.machine_id) so the value remains stable per host but cannot be
reversed to the original III_HOST_USER_ID.

89-101: ⚠️ Potential issue | 🟠 Major

read_project_ini() still rejects the normal key = value form.

Lines 91-99 only match the literal prefixes project_id= and project_name=. A standard INI file with spaces around = falls through here, so project context disappears from emitted events.

🧩 Suggested fix
     for line in contents.lines() {
         let line = line.trim();
-        if let Some(val) = line.strip_prefix("project_id=") {
-            let val = val.trim();
-            if !val.is_empty() {
-                project_id = Some(val.to_string());
-            }
-        } else if let Some(val) = line.strip_prefix("project_name=") {
-            let val = val.trim();
-            if !val.is_empty() {
-                project_name = Some(val.to_string());
-            }
-        }
+        if line.is_empty() || line.starts_with('#') || line.starts_with(';') || line.starts_with('[') {
+            continue;
+        }
+        let Some((key, value)) = line.split_once('=') else {
+            continue;
+        };
+        let key = key.trim();
+        let value = value.trim();
+        if key == "project_id" && !value.is_empty() {
+            project_id = Some(value.to_string());
+        } else if key == "project_name" && !value.is_empty() {
+            project_name = Some(value.to_string());
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 89 - 101, The loop in
read_project_ini() only matches exact prefixes "project_id=" and "project_name="
and therefore ignores lines like "project_id = value"; update the parsing logic
to split each trimmed line on the first '=' (or use a regex) and trim both the
key and value, then match the normalized key ("project_id" / "project_name") and
set project_id/project_name when the value is non-empty; modify the code around
the for line in contents.lines() loop and the conditional branches that
reference the literal prefixes so they use the split-and-trim approach.

671-698: ⚠️ Potential issue | 🟠 Major

heartbeat still ships full inventories on every interval.

This payload includes full functions, trigger_types, sdk_languages, and workers arrays on every heartbeat. On larger projects the event size scales with registry size, which makes drops and throttling much more likely; keep the counts inline, but cap/sample the inventories or move them to a separate inventory event.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 671 - 698, The heartbeat
telemetry currently includes full inventories (ft.functions, ft.trigger_types,
wd.sdk_languages, wd.workers) in the properties payload which can balloon event
size; modify the heartbeat path (where properties is built, e.g., in the
heartbeat function in telemetry::mod.rs) to only include counts (function_count,
trigger_count, worker_count_*) and remove or replace the full arrays with either
(a) truncated/sampled versions limited to a small constant N or (b) omit them
entirely and emit a separate periodic "inventory" event that contains the full
ft.functions, ft.trigger_types, wd.sdk_languages, wd.workers data; ensure the
heartbeat still sends period/uptime/is_active and update any downstream
schema/consumers to read inventories from the new inventory event if you choose
that approach.
engine/src/modules/telemetry/environment.rs (1)

23-37: ⚠️ Potential issue | 🟠 Major

Stop treating telemetry config writes as infallible.

write_atomic() still swallows every filesystem error, and set_config_key() gives callers no way to detect failure. That means install ID and first_run_sent can look persisted when they were actually lost, which makes IDs unstable and can re-emit one-shot events.

Also applies to: 49-60

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/environment.rs` around lines 23 - 37, The
write_atomic function currently swallows all filesystem errors; change its
signature to return Result<(), std::io::Error> (or suitable Error) and propagate
errors instead of .ok(), performing each filesystem action
(create_dir_all(parent), write tmp (path.with_extension("tmp")), set_permissions
on unix, and rename) with ? so failures surface; then update set_config_key to
return Result<(), std::io::Error> (or propagate the error type) and forward the
Result from write_atomic so callers can detect and handle write failures (ensure
both the tmp write and rename errors are surfaced rather than ignored).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/src/modules/telemetry/mod.rs`:
- Around line 165-182: The function check_and_mark_first_run performs a
read-then-write on shared state and can race when two engine processes start
concurrently; modify it to perform an atomic, interprocess-safe check-and-set
(e.g., acquire a filesystem-based lock or use atomic create_new of a sentinel
file) before calling environment::read_config_key / environment::set_config_key
so only one process can observe-and-set the first-run flag: update
check_and_mark_first_run to attempt to acquire a lock (or atomically create a
"first_run_sent" sentinel file) then re-check environment::read_config_key, set
the key via environment::set_config_key only if not already set, and release the
lock; this ensures a single process returns true and emits the first_run event.

---

Duplicate comments:
In `@engine/src/modules/telemetry/environment.rs`:
- Around line 23-37: The write_atomic function currently swallows all filesystem
errors; change its signature to return Result<(), std::io::Error> (or suitable
Error) and propagate errors instead of .ok(), performing each filesystem action
(create_dir_all(parent), write tmp (path.with_extension("tmp")), set_permissions
on unix, and rename) with ? so failures surface; then update set_config_key to
return Result<(), std::io::Error> (or propagate the error type) and forward the
Result from write_atomic so callers can detect and handle write failures (ensure
both the tmp write and rename errors are surfaced rather than ignored).

In `@engine/src/modules/telemetry/mod.rs`:
- Around line 360-362: The code currently assigns the raw host identifier from
environment::detect_host_user_id() into props["host_user_id"]; instead,
pseudonymize it before sending by computing a stable, non-reversible value
(e.g., a SHA-256 hash or HMAC with a local secret key) of the detected
host_user_id and store that pseudonymized string in props["host_user_id"];
ensure the same method is used consistently with other identifiers (install_id /
environment.machine_id) so the value remains stable per host but cannot be
reversed to the original III_HOST_USER_ID.
- Around line 89-101: The loop in read_project_ini() only matches exact prefixes
"project_id=" and "project_name=" and therefore ignores lines like "project_id =
value"; update the parsing logic to split each trimmed line on the first '=' (or
use a regex) and trim both the key and value, then match the normalized key
("project_id" / "project_name") and set project_id/project_name when the value
is non-empty; modify the code around the for line in contents.lines() loop and
the conditional branches that reference the literal prefixes so they use the
split-and-trim approach.
- Around line 671-698: The heartbeat telemetry currently includes full
inventories (ft.functions, ft.trigger_types, wd.sdk_languages, wd.workers) in
the properties payload which can balloon event size; modify the heartbeat path
(where properties is built, e.g., in the heartbeat function in
telemetry::mod.rs) to only include counts (function_count, trigger_count,
worker_count_*) and remove or replace the full arrays with either (a)
truncated/sampled versions limited to a small constant N or (b) omit them
entirely and emit a separate periodic "inventory" event that contains the full
ft.functions, ft.trigger_types, wd.sdk_languages, wd.workers data; ensure the
heartbeat still sends period/uptime/is_active and update any downstream
schema/consumers to read inventories from the new inventory event if you choose
that approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 499573d8-dd68-4998-a446-35a94e216534

📥 Commits

Reviewing files that changed from the base of the PR and between c696821 and 95c3440.

📒 Files selected for processing (3)
  • engine/Cargo.toml
  • engine/src/modules/telemetry/environment.rs
  • engine/src/modules/telemetry/mod.rs
✅ Files skipped from review due to trivial changes (1)
  • engine/Cargo.toml

Comment on lines +165 to +182
fn check_and_mark_first_run() -> bool {
if environment::read_config_key("state", "first_run_sent").as_deref() == Some("true") {
return false;
}

let legacy_path = dirs::home_dir()
.unwrap_or_else(std::env::temp_dir)
.join(".iii")
.join("state.ini");
if let Ok(contents) = std::fs::read_to_string(&legacy_path) {
if contents.contains("first_run_sent=true") {
environment::set_config_key("state", "first_run_sent", "true");
return false;
}
}

environment::set_config_key("state", "first_run_sent", "true");
true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

first_run can be emitted twice when two engines start together.

This is still a read-then-write check on shared state with no interprocess synchronization. If two processes cross this branch concurrently, both can observe first_run_sent != true, both return true, and both send a first_run event.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 165 - 182, The function
check_and_mark_first_run performs a read-then-write on shared state and can race
when two engine processes start concurrently; modify it to perform an atomic,
interprocess-safe check-and-set (e.g., acquire a filesystem-based lock or use
atomic create_new of a sentinel file) before calling
environment::read_config_key / environment::set_config_key so only one process
can observe-and-set the first-run flag: update check_and_mark_first_run to
attempt to acquire a lock (or atomically create a "first_run_sent" sentinel
file) then re-check environment::read_config_key, set the key via
environment::set_config_key only if not already set, and release the lock; this
ensures a single process returns true and emits the first_run event.

ytallo added 6 commits March 18, 2026 22:52
…roject context management

- Added `container_runtime` field to `EnvironmentInfo` struct to capture the container runtime type.
- Implemented `detect_container_runtime` function to identify the runtime based on environment variables and system checks.
- Introduced project context management with functions to find project root and read project configuration from `project.ini`.
- Updated telemetry data collection to include project ID and name.
- Refactored telemetry snapshot to provide a flat JSON structure for easier access to metrics.
- Enhanced CLI update functions to send telemetry events for update start, success, and failure.
…vents

- Added telemetry support for installation and upgrade events in the CLI and engine scripts.
- Introduced functions to manage telemetry data, including generating unique IDs and reading/writing to a telemetry configuration file.
- Enhanced error handling to send telemetry events on installation failures.
- Implemented methods to detect installation methods and gather environment information for telemetry.
- Updated the telemetry module to include user properties and installation context in event payloads.
- Replaced the dynamic API key retrieval with a constant API key in the CLI and engine telemetry modules.
- Simplified the event building process by removing unnecessary API key resolution logic.
- Updated telemetry event functions to directly use the constant API key, enhancing clarity and reducing complexity.
- Removed tests related to API key presence, as the new implementation does not require runtime API key configuration.
- Updated telemetry configuration from INI to TOML format for improved structure and parsing.
- Introduced new functions for reading and writing configuration keys in the TOML file.
- Replaced all instances of INI key handling with corresponding TOML functions.
- Added tests to ensure correct serialization and deserialization of TOML configuration data.
- Added support for TOML configuration in CLI and engine scripts, replacing previous INI format.
- Updated functions for reading and writing telemetry keys to work with TOML files.
- Enhanced error handling in installation scripts to include detailed error messages in telemetry events.
- Refactored telemetry data collection to ensure consistency in configuration management.
- Added tests to validate TOML serialization and deserialization processes.
…ate modules

- Reorganized imports in telemetry module for clarity.
- Improved code readability by formatting match statements in update module.
- Enhanced test assertions in telemetry module for better readability.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (18)
cli/src/telemetry.rs (4)

133-152: ⚠️ Potential issue | 🟡 Minor

Only return a new telemetry ID after it is durable.

get_or_create_telemetry_id() returns the freshly generated UUID even if set_toml_key() failed to write or rename the file. That breaks the stable-ID contract on the next run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 133 - 152, get_or_create_telemetry_id
currently generates and returns a new UUID before ensuring persistence; change
it so the function only returns a newly generated id after
set_toml_key("identity", "id", &id) has succeeded (or been atomically moved into
place). Specifically, after creating id with uuid::Uuid::new_v4().to_string(),
call set_toml_key and check its result/return value (or propagate/handle errors)
and only return id on success; on failure, retry or propagate an error/None
instead of returning an unpersisted id. Update callers if necessary to handle
the failure case.

252-259: ⚠️ Potential issue | 🟠 Major

Accept a stable error code instead of raw error text.

This signature makes wording, paths, and upstream error formatting part of the cli_update_failed dimension. Use a small error category/code and keep the full message local.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 252 - 259, The telemetry helper
send_cli_update_failed currently embeds raw error text into the event; change
its signature (send_cli_update_failed) to accept a stable error code/category
(e.g., error_code: &str or an ErrorCode enum) instead of the full error string,
update the event payload built by build_event to include "error_code":
error_code (remove or stop sending the raw "error" field), and keep the verbose
error text local (log it or attach to non-telemetry diagnostics). Update all
callers of send_cli_update_failed to pass the stable code and preserve the full
error message only for local logs; ensure detect_install_method and build_event
usage remains unchanged except for the renamed field.

14-19: ⚠️ Potential issue | 🟠 Major

Make telemetry_toml_path() overridable in tests.

build_event() and get_or_create_telemetry_id() still target ~/.iii/telemetry.toml, so this module's tests mutate developer state instead of an isolated temp file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 14 - 19, telemetry_toml_path() currently
always points at the real home file; make it test-overridable by first checking
an environment variable (e.g. "III_TELEMETRY_PATH") using std::env::var_os and,
if present, returning that PathBuf, otherwise falling back to the existing
dirs::home_dir() logic; leave build_event() and get_or_create_telemetry_id()
unchanged so they continue to call telemetry_toml_path(), and update tests to
set/restore the env var to a temp file to avoid mutating the developer home
state.

209-223: ⚠️ Potential issue | 🟠 Major

Don't spawn-and-forget update telemetry in a short-lived CLI.

tokio::spawn detaches the POST from the command flow, so cli_update_succeeded / cli_update_failed are easy to lose when the process exits. Make the send path async and await it with a tight timeout from the update flow.

Verify that the send path is detached all the way back to the update entrypoints:

#!/bin/bash
set -euo pipefail

# Expectation: `send_fire_and_forget` uses `tokio::spawn`, and the update flow only calls sync wrappers.
rg -n 'send_fire_and_forget|tokio::spawn|send_cli_update_(started|succeeded|failed)' cli/src/telemetry.rs cli/src/update.rs -C2
rg -n '#\[tokio::main\]|process::exit|std::process::exit' cli/src/main.rs -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 209 - 223, send_fire_and_forget currently
detaches the HTTP POST via tokio::spawn causing telemetry to be lost when the
CLI exits; change it to an async send function (e.g., async fn
send_event/amplitude_send) that builds the AmplitudePayload and performs the
reqwest post and returns a Result, remove tokio::spawn and AMPLITUDE_ENDPOINT
usage from inside a detached task, and update all callers
(send_cli_update_started / send_cli_update_succeeded / send_cli_update_failed
and any wrappers) to await the async send with a tight timeout
(tokio::time::timeout with a small Duration) from the update flow so the process
waits briefly for delivery before exiting; ensure callers run inside the runtime
(or use block_on if called from sync main) so the awaited send actually
executes.
engine/src/modules/telemetry/mod.rs (5)

936-974: ⚠️ Potential issue | 🟡 Minor

Pin III_PROJECT_ROOT to a temp dir in these tests.

Removing it makes resolve_project_context() walk from the repo CWD, so a real .iii/project.ini will make these tests fail unexpectedly. The same pattern repeats in the later build_user_properties tests.

Also applies to: 1220-1316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 936 - 974, These tests
remove III_PROJECT_ROOT which lets resolve_project_context() walk the real repo
and pick up a real .iii/project.ini; instead, set III_PROJECT_ROOT to a
temporary empty directory at the start of each test (e.g. create a temp dir and
set env var) and remove/unset it at teardown; apply the same change to the
related build_user_properties tests (the later tests referenced) so both
resolve_project_context and build_user_properties operate against an isolated
temp dir rather than the repository CWD.

362-364: ⚠️ Potential issue | 🟠 Major

Pseudonymize host_user_id before sending it.

III_HOST_USER_ID is a stable cross-runtime identifier sourced from the host. Emitting it raw adds another durable identifier on top of install_id and environment.machine_id.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 362 - 364, The code
currently emits the raw host_user_id (from environment::detect_host_user_id())
into props["host_user_id"]; instead, pseudonymize it before sending by replacing
the raw string with a deterministic, non-reversible identifier (e.g.,
HMAC-SHA256 or salted hash) derived from host_user_id plus a local secret/pepper
or install-scoped id so it stays stable per-install but is not reversible;
update the assignment that sets serde_json::Value::String(host_user_id) to
construct and set serde_json::Value::String(pseudonymized_value) using a helper
(e.g., pseudonymize_host_user_id) and ensure the helper lives in telemetry::mod
or a nearby util so tests/consumers reference the pseudonymization function
rather than the raw value.

165-182: ⚠️ Potential issue | 🟠 Major

Make check_and_mark_first_run() interprocess-safe.

This is still a read-then-write on shared state, so two engines starting together can both send first_run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 165 - 182, The function
check_and_mark_first_run() is vulnerable to a race (read-then-write) so
concurrent processes can both report first-run; make it interprocess-safe by
performing the check-and-set under an atomic lock or using an atomic filesystem
operation. Modify check_and_mark_first_run to acquire a process-wide lock (e.g.,
a lock file using create_new/OpenOptions or an OS-level file lock via fs2)
before calling environment::read_config_key and environment::set_config_key (or
replace the config key write with an atomic create of a marker file); ensure the
lock is released on all paths and preserve the current boolean behavior (return
true only for the single process that transitions the state). Reference
check_and_mark_first_run, environment::read_config_key,
environment::set_config_key when implementing the locking.

89-101: ⚠️ Potential issue | 🟠 Major

Parse standard key = value lines in project.ini.

The current prefix match still misses spaced assignments, so valid project.ini files resolve to None and drop project context from telemetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 89 - 101, The current loop
over contents.lines() uses
strip_prefix("project_id=")/strip_prefix("project_name=") and therefore fails
when the ini uses spaced assignments; change the parsing in that loop to split
each trimmed line on the first '=' (e.g. splitn on '=') then trim both the key
and value, and match against "project_id" and "project_name" to set project_id
and project_name (replacing the strip_prefix logic) so lines like "project_id =
123" are correctly parsed.

666-700: ⚠️ Potential issue | 🟡 Minor

Count non-invocation traffic in is_active.

A heartbeat with API, queue, pubsub, or cron traffic still reports is_active=false, which makes the flag disagree with the rest of this payload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/mod.rs` around lines 666 - 700, The is_active
calculation currently only checks delta_invocations_total; update the logic that
computes is_active in telemetry/mod.rs so it is true if any relevant delta
metric is > 0 (e.g., delta_invocations_total, delta_api_requests,
delta_queue_emits, delta_queue_consumes, delta_pubsub_publishes,
delta_pubsub_subscribes, delta_cron_executions). Locate the assignment to
is_active (near where uptime_secs and the properties JSON are built) and change
it to a boolean expression that ORs those delta_* variables so heartbeat
payloads with API/queue/pubsub/cron activity report is_active = true. Ensure you
reference the same variable names (is_active, delta_invocations_total,
delta_api_requests, delta_queue_emits, delta_queue_consumes,
delta_pubsub_publishes, delta_pubsub_subscribes, delta_cron_executions) to keep
consistency with the properties map.
cli/install.sh (4)

64-76: ⚠️ Potential issue | 🟠 Major

Build/escape telemetry JSON inside the helper and bound the POST.

iii_send_event() still expects pre-serialized JSON fragments, so err() only replacing " leaves backslashes/newlines able to break the payload. Since err() does wait, the background curl also needs short timeouts.

Also applies to: 236-261

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 64 - 76, The telemetry payload is being
assembled outside iii_send_event with only naive quote replacement, allowing
backslashes/newlines to break JSON and leaving the curl background call
unbounded; change to pass raw error text to iii_send_event (stop pre-serializing
JSON fragments in the install/upgrade branches) and implement robust
JSON-escaping inside iii_send_event (e.g., use jq -R -s `@json` or equivalent to
quote the message) when building the final payload, and also bound the POST by
adding curl timeouts (--connect-timeout and --max-time) and proper
background/job handling so err()'s wait won't leave curl hanging.

264-278: ⚠️ Potential issue | 🟠 Major

Don't edit shell profiles without an explicit opt-in.

iii_export_host_user_id() still appends to rc files automatically. Prompt/guard it, or print the export command for manual setup instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 264 - 278, The function iii_export_host_user_id
currently appends an export line into shell profiles without consent; change it
to avoid modifying files automatically by replacing the write logic with an
explicit opt-in: either prompt the user interactively (yes/no) before editing
profiles or, by default, just print the export line and a short instruction
telling the user to add it to their preferred file (e.g., ~/.bashrc, ~/.zshrc,
~/.profile); update iii_export_host_user_id to check an opt-in flag/env (e.g.,
III_AUTO_EXPORT=true) if automation is desired, and only perform the grep+append
into the profile files when consent is given. Ensure the behavior and messages
reference the export string produced by the function.

99-110: ⚠️ Potential issue | 🟠 Major

Honor III_TELEMETRY_DEV in iii_telemetry_enabled().

The installer still ignores the same dev opt-out that cli/src/telemetry.rs respects, so local dev installs keep sending and persisting telemetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 99 - 110, The installer function
iii_telemetry_enabled currently ignores the III_TELEMETRY_DEV env var; update
iii_telemetry_enabled() to honor III_TELEMETRY_DEV by checking it (treat "false"
or "0" as opt-out) and returning non-zero when set, before proceeding to CI-env
detection and the existing III_TELEMETRY_ENABLED check; modify the function so
it explicitly tests III_TELEMETRY_DEV and returns 1 when it indicates dev
opt-out (while keeping the existing checks for III_TELEMETRY_ENABLED and the
ci_vars loop intact).

844-865: ⚠️ Potential issue | 🟠 Major

Move telemetry bootstrap behind eligibility, and before earlier failures.

This block still runs unconditionally, so opt-out installs persist telemetry.toml and later export III_HOST_USER_ID, while release-discovery failures above this point still have no telemetry context at all.

Also applies to: 888-888

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/install.sh` around lines 844 - 865, The telemetry bootstrap (creation of
install_id and call to iii_get_or_create_telemetry_id and any iii_send_event
usage) is being executed unconditionally; move the bootstrap and subsequent
iii_send_event calls (the install_id, telemetry_id, install_event_prefix logic
and both iii_send_event "upgrade_started"/"install_started" branches) so they
run only after the opt-in/eligibility check completes and before any earlier
failure points (e.g., release-discovery paths) so that opt-out installs do not
persist telemetry.toml and failures still have telemetry context; update
references to install_id, iii_get_or_create_telemetry_id, install_event_prefix,
and iii_send_event accordingly.
cli/src/update.rs (1)

188-193: ⚠️ Potential issue | 🟠 Major

Emit cli_update_started before the fallible setup work.

Both flows still send the start event only after several fallible branches have succeeded, so failures in release lookup, version parsing, or asset resolution never get a matching start/fail pair.

Also applies to: 271-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/update.rs` around lines 188 - 193, Move the
telemetry::send_cli_update_started call so it runs immediately after computing
spec.name (i.e., before any fallible setup like release lookup, version parsing,
or asset resolution), ensuring the start event is emitted regardless of
subsequent errors; update both occurrences (the call near
previous_version/as_ref(...) and the duplicate at lines referenced 271-273) to
be invoked early in the update flow (before functions that perform release
lookup, parse versions, or resolve assets) so failures later will have matching
start/fail telemetry pairs.
engine/install.sh (3)

316-318: ⚠️ Potential issue | 🟠 Major

Gate telemetry ID creation on actual eligibility.

iii_get_or_create_telemetry_id() still runs even when telemetry is disabled, so the installer writes ~/.iii/telemetry.toml and the later iii_export_host_user_id() call can turn that into a persistent profile change after an opt-out.

Also applies to: 518-518

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 316 - 318, The installer currently calls
iii_get_or_create_telemetry_id() unconditionally (creating ~/.iii/telemetry.toml
even when telemetry is disabled); change the flow so telemetry_id is only
generated and iii_export_host_user_id() invoked when telemetry is actually
enabled/eligible—i.e., wrap the call to iii_get_or_create_telemetry_id() (and
any subsequent iii_export_host_user_id() usage) in a conditional that checks the
telemetry opt-in/eligibility flag or function (e.g., an existing
telemetry-enabled check), leaving install_id=$(iii_gen_uuid) untouched.

203-214: ⚠️ Potential issue | 🟠 Major

Make iii_export_host_user_id() opt-in.

This still appends to .bashrc / .zshrc / .profile without user consent. Prompt/guard it, or print the exact export line for manual setup instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 203 - 214, The function
iii_export_host_user_id currently appends an export line to shell profiles
without consent; change it to be opt-in by prompting the user (e.g., using a
read -p confirmation) before modifying any of the profiles in
iii_export_host_user_id, and if the session is non-interactive or the user
declines, do not write to files but instead print the exact export line stored
in _export_line with instructions for manual addition; also honor an opt-in
environment flag (e.g., III_AUTO_EXPORT) or detect CI/non-interactive sessions
to skip prompting and only print the line.

13-24: ⚠️ Potential issue | 🟠 Major

Stop accepting pre-encoded JSON fragments in iii_send_event.

iii_send_event() still splices _event_props straight into _payload, so err() only replacing " leaves backslashes/newlines able to break the JSON body. Because err() does a bare wait, the background curl also needs short timeouts or installer exit can block on network delays.

Also applies to: 166-189

engine/src/modules/telemetry/environment.rs (1)

23-36: ⚠️ Potential issue | 🟠 Major

Don't ignore atomic-write failures here.

write_atomic() / set_config_key() still swallow create_dir_all, write, set_permissions, and rename errors, so callers treat install IDs and first-run flags as durable when they may not have been persisted.

Also applies to: 49-59

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/modules/telemetry/environment.rs` around lines 23 - 36, The
write_atomic function currently swallows errors from create_dir_all, write,
set_permissions, and rename which makes persistence appear successful when it
may have failed; change write_atomic to return Result<(), std::io::Error> (or a
crate-specific error) and propagate explicit errors from
std::fs::create_dir_all, std::fs::write, std::fs::set_permissions, and
std::fs::rename instead of calling .ok(); update callers such as set_config_key
to also return/propagate Result and handle or log errors so install IDs /
first-run flags are not treated as durable when the underlying filesystem
operations failed. Ensure to preserve the unix-only permission logic but surface
its errors (e.g., via map_err/?).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/telemetry.rs`:
- Around line 118-130: In build_user_properties(), do not send the raw
III_HOST_USER_ID; instead read the env var (III_HOST_USER_ID), compute a stable
one-way hash (e.g., SHA-256 hex) of its value and serialize that hashed string
into the "host_user_id" field (preserving the Option semantics if the env var is
missing). Update the host_user_id construction so it maps
std::env::var("III_HOST_USER_ID").ok() through a hashing function (use a
cryptographic hash library available in the project) and return the hashed hex
string or None.
- Around line 84-90: detect_machine_id currently reads the HOSTNAME env var and
returns only 8 bytes of the SHA-256 hash, causing mismatch with engine telemetry
and collapsing hosts that lack the env var; update the detect_machine_id
function to obtain the real OS hostname (e.g. via hostname::get() or equivalent
used by engine) instead of std::env::var("HOSTNAME"), compute SHA-256 over that
value, and return the first 16 bytes (32 hex characters) of the digest so CLI
machine_id aligns with engine/src/modules/telemetry/environment.rs.

In `@cli/src/update.rs`:
- Around line 195-215: The new match blocks in update.rs are not formatted
according to project Rust style; run rustfmt (cargo fmt) to reformat the match
expressions around the download::download_and_install(...) call and the similar
block at lines 279-297, ensuring consistent indentation and line breaks for the
Ok(()) and Err(e) arms (which call state.record_install,
telemetry::send_cli_update_succeeded, telemetry::send_cli_update_failed, and
return UpdateResult::Updated / UpdateError::Download). After formatting, re-run
CI to confirm the match blocks compile and the diff is clean.

In `@engine/install.sh`:
- Around line 33-42: The iii_telemetry_enabled() function currently ignores the
III_TELEMETRY_DEV variable, so add an explicit check that returns 1 when
III_TELEMETRY_DEV is set to opt-out (e.g., "true" or "1") before the CI checks;
specifically, in iii_telemetry_enabled() test the value of
${III_TELEMETRY_DEV:-} with a case like true|1) return 1 ;; placed at the top of
the function so local dev opt-out takes precedence over other environment
checks.

---

Duplicate comments:
In `@cli/install.sh`:
- Around line 64-76: The telemetry payload is being assembled outside
iii_send_event with only naive quote replacement, allowing backslashes/newlines
to break JSON and leaving the curl background call unbounded; change to pass raw
error text to iii_send_event (stop pre-serializing JSON fragments in the
install/upgrade branches) and implement robust JSON-escaping inside
iii_send_event (e.g., use jq -R -s `@json` or equivalent to quote the message)
when building the final payload, and also bound the POST by adding curl timeouts
(--connect-timeout and --max-time) and proper background/job handling so err()'s
wait won't leave curl hanging.
- Around line 264-278: The function iii_export_host_user_id currently appends an
export line into shell profiles without consent; change it to avoid modifying
files automatically by replacing the write logic with an explicit opt-in: either
prompt the user interactively (yes/no) before editing profiles or, by default,
just print the export line and a short instruction telling the user to add it to
their preferred file (e.g., ~/.bashrc, ~/.zshrc, ~/.profile); update
iii_export_host_user_id to check an opt-in flag/env (e.g., III_AUTO_EXPORT=true)
if automation is desired, and only perform the grep+append into the profile
files when consent is given. Ensure the behavior and messages reference the
export string produced by the function.
- Around line 99-110: The installer function iii_telemetry_enabled currently
ignores the III_TELEMETRY_DEV env var; update iii_telemetry_enabled() to honor
III_TELEMETRY_DEV by checking it (treat "false" or "0" as opt-out) and returning
non-zero when set, before proceeding to CI-env detection and the existing
III_TELEMETRY_ENABLED check; modify the function so it explicitly tests
III_TELEMETRY_DEV and returns 1 when it indicates dev opt-out (while keeping the
existing checks for III_TELEMETRY_ENABLED and the ci_vars loop intact).
- Around line 844-865: The telemetry bootstrap (creation of install_id and call
to iii_get_or_create_telemetry_id and any iii_send_event usage) is being
executed unconditionally; move the bootstrap and subsequent iii_send_event calls
(the install_id, telemetry_id, install_event_prefix logic and both
iii_send_event "upgrade_started"/"install_started" branches) so they run only
after the opt-in/eligibility check completes and before any earlier failure
points (e.g., release-discovery paths) so that opt-out installs do not persist
telemetry.toml and failures still have telemetry context; update references to
install_id, iii_get_or_create_telemetry_id, install_event_prefix, and
iii_send_event accordingly.

In `@cli/src/telemetry.rs`:
- Around line 133-152: get_or_create_telemetry_id currently generates and
returns a new UUID before ensuring persistence; change it so the function only
returns a newly generated id after set_toml_key("identity", "id", &id) has
succeeded (or been atomically moved into place). Specifically, after creating id
with uuid::Uuid::new_v4().to_string(), call set_toml_key and check its
result/return value (or propagate/handle errors) and only return id on success;
on failure, retry or propagate an error/None instead of returning an unpersisted
id. Update callers if necessary to handle the failure case.
- Around line 252-259: The telemetry helper send_cli_update_failed currently
embeds raw error text into the event; change its signature
(send_cli_update_failed) to accept a stable error code/category (e.g.,
error_code: &str or an ErrorCode enum) instead of the full error string, update
the event payload built by build_event to include "error_code": error_code
(remove or stop sending the raw "error" field), and keep the verbose error text
local (log it or attach to non-telemetry diagnostics). Update all callers of
send_cli_update_failed to pass the stable code and preserve the full error
message only for local logs; ensure detect_install_method and build_event usage
remains unchanged except for the renamed field.
- Around line 14-19: telemetry_toml_path() currently always points at the real
home file; make it test-overridable by first checking an environment variable
(e.g. "III_TELEMETRY_PATH") using std::env::var_os and, if present, returning
that PathBuf, otherwise falling back to the existing dirs::home_dir() logic;
leave build_event() and get_or_create_telemetry_id() unchanged so they continue
to call telemetry_toml_path(), and update tests to set/restore the env var to a
temp file to avoid mutating the developer home state.
- Around line 209-223: send_fire_and_forget currently detaches the HTTP POST via
tokio::spawn causing telemetry to be lost when the CLI exits; change it to an
async send function (e.g., async fn send_event/amplitude_send) that builds the
AmplitudePayload and performs the reqwest post and returns a Result, remove
tokio::spawn and AMPLITUDE_ENDPOINT usage from inside a detached task, and
update all callers (send_cli_update_started / send_cli_update_succeeded /
send_cli_update_failed and any wrappers) to await the async send with a tight
timeout (tokio::time::timeout with a small Duration) from the update flow so the
process waits briefly for delivery before exiting; ensure callers run inside the
runtime (or use block_on if called from sync main) so the awaited send actually
executes.

In `@cli/src/update.rs`:
- Around line 188-193: Move the telemetry::send_cli_update_started call so it
runs immediately after computing spec.name (i.e., before any fallible setup like
release lookup, version parsing, or asset resolution), ensuring the start event
is emitted regardless of subsequent errors; update both occurrences (the call
near previous_version/as_ref(...) and the duplicate at lines referenced 271-273)
to be invoked early in the update flow (before functions that perform release
lookup, parse versions, or resolve assets) so failures later will have matching
start/fail telemetry pairs.

In `@engine/install.sh`:
- Around line 316-318: The installer currently calls
iii_get_or_create_telemetry_id() unconditionally (creating ~/.iii/telemetry.toml
even when telemetry is disabled); change the flow so telemetry_id is only
generated and iii_export_host_user_id() invoked when telemetry is actually
enabled/eligible—i.e., wrap the call to iii_get_or_create_telemetry_id() (and
any subsequent iii_export_host_user_id() usage) in a conditional that checks the
telemetry opt-in/eligibility flag or function (e.g., an existing
telemetry-enabled check), leaving install_id=$(iii_gen_uuid) untouched.
- Around line 203-214: The function iii_export_host_user_id currently appends an
export line to shell profiles without consent; change it to be opt-in by
prompting the user (e.g., using a read -p confirmation) before modifying any of
the profiles in iii_export_host_user_id, and if the session is non-interactive
or the user declines, do not write to files but instead print the exact export
line stored in _export_line with instructions for manual addition; also honor an
opt-in environment flag (e.g., III_AUTO_EXPORT) or detect CI/non-interactive
sessions to skip prompting and only print the line.

In `@engine/src/modules/telemetry/environment.rs`:
- Around line 23-36: The write_atomic function currently swallows errors from
create_dir_all, write, set_permissions, and rename which makes persistence
appear successful when it may have failed; change write_atomic to return
Result<(), std::io::Error> (or a crate-specific error) and propagate explicit
errors from std::fs::create_dir_all, std::fs::write, std::fs::set_permissions,
and std::fs::rename instead of calling .ok(); update callers such as
set_config_key to also return/propagate Result and handle or log errors so
install IDs / first-run flags are not treated as durable when the underlying
filesystem operations failed. Ensure to preserve the unix-only permission logic
but surface its errors (e.g., via map_err/?).

In `@engine/src/modules/telemetry/mod.rs`:
- Around line 936-974: These tests remove III_PROJECT_ROOT which lets
resolve_project_context() walk the real repo and pick up a real
.iii/project.ini; instead, set III_PROJECT_ROOT to a temporary empty directory
at the start of each test (e.g. create a temp dir and set env var) and
remove/unset it at teardown; apply the same change to the related
build_user_properties tests (the later tests referenced) so both
resolve_project_context and build_user_properties operate against an isolated
temp dir rather than the repository CWD.
- Around line 362-364: The code currently emits the raw host_user_id (from
environment::detect_host_user_id()) into props["host_user_id"]; instead,
pseudonymize it before sending by replacing the raw string with a deterministic,
non-reversible identifier (e.g., HMAC-SHA256 or salted hash) derived from
host_user_id plus a local secret/pepper or install-scoped id so it stays stable
per-install but is not reversible; update the assignment that sets
serde_json::Value::String(host_user_id) to construct and set
serde_json::Value::String(pseudonymized_value) using a helper (e.g.,
pseudonymize_host_user_id) and ensure the helper lives in telemetry::mod or a
nearby util so tests/consumers reference the pseudonymization function rather
than the raw value.
- Around line 165-182: The function check_and_mark_first_run() is vulnerable to
a race (read-then-write) so concurrent processes can both report first-run; make
it interprocess-safe by performing the check-and-set under an atomic lock or
using an atomic filesystem operation. Modify check_and_mark_first_run to acquire
a process-wide lock (e.g., a lock file using create_new/OpenOptions or an
OS-level file lock via fs2) before calling environment::read_config_key and
environment::set_config_key (or replace the config key write with an atomic
create of a marker file); ensure the lock is released on all paths and preserve
the current boolean behavior (return true only for the single process that
transitions the state). Reference check_and_mark_first_run,
environment::read_config_key, environment::set_config_key when implementing the
locking.
- Around line 89-101: The current loop over contents.lines() uses
strip_prefix("project_id=")/strip_prefix("project_name=") and therefore fails
when the ini uses spaced assignments; change the parsing in that loop to split
each trimmed line on the first '=' (e.g. splitn on '=') then trim both the key
and value, and match against "project_id" and "project_name" to set project_id
and project_name (replacing the strip_prefix logic) so lines like "project_id =
123" are correctly parsed.
- Around line 666-700: The is_active calculation currently only checks
delta_invocations_total; update the logic that computes is_active in
telemetry/mod.rs so it is true if any relevant delta metric is > 0 (e.g.,
delta_invocations_total, delta_api_requests, delta_queue_emits,
delta_queue_consumes, delta_pubsub_publishes, delta_pubsub_subscribes,
delta_cron_executions). Locate the assignment to is_active (near where
uptime_secs and the properties JSON are built) and change it to a boolean
expression that ORs those delta_* variables so heartbeat payloads with
API/queue/pubsub/cron activity report is_active = true. Ensure you reference the
same variable names (is_active, delta_invocations_total, delta_api_requests,
delta_queue_emits, delta_queue_consumes, delta_pubsub_publishes,
delta_pubsub_subscribes, delta_cron_executions) to keep consistency with the
properties map.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 71eb33fe-9f39-4d3b-9a6f-ccb122f54889

📥 Commits

Reviewing files that changed from the base of the PR and between 95c3440 and 710e033.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • cli/Cargo.toml
  • cli/install.sh
  • cli/src/main.rs
  • cli/src/telemetry.rs
  • cli/src/update.rs
  • engine/Cargo.toml
  • engine/Dockerfile
  • engine/docker-compose.prod.yml
  • engine/docker-compose.yml
  • engine/install.sh
  • engine/src/modules/telemetry/collector.rs
  • engine/src/modules/telemetry/environment.rs
  • engine/src/modules/telemetry/mod.rs
✅ Files skipped from review due to trivial changes (5)
  • engine/Cargo.toml
  • cli/Cargo.toml
  • cli/src/main.rs
  • engine/docker-compose.yml
  • engine/docker-compose.prod.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • engine/Dockerfile

Comment on lines +118 to +130
fn build_user_properties() -> serde_json::Value {
serde_json::json!({
"environment.os": std::env::consts::OS,
"environment.arch": std::env::consts::ARCH,
"environment.cpu_cores": std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1),
"environment.timezone": std::env::var("TZ").unwrap_or_else(|_| "Unknown".to_string()),
"environment.machine_id": detect_machine_id(),
"environment.is_container": detect_is_container(),
"env": std::env::var("III_ENV").unwrap_or_else(|_| "unknown".to_string()),
"install_method": detect_install_method(),
"cli_version": env!("CARGO_PKG_VERSION"),
"host_user_id": std::env::var("III_HOST_USER_ID").ok(),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hash III_HOST_USER_ID before serializing it.

This value is a stable host-scoped correlation ID. Sending it verbatim adds another durable identifier alongside device_id and environment.machine_id.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/telemetry.rs` around lines 118 - 130, In build_user_properties(), do
not send the raw III_HOST_USER_ID; instead read the env var (III_HOST_USER_ID),
compute a stable one-way hash (e.g., SHA-256 hex) of its value and serialize
that hashed string into the "host_user_id" field (preserving the Option
semantics if the env var is missing). Update the host_user_id construction so it
maps std::env::var("III_HOST_USER_ID").ok() through a hashing function (use a
cryptographic hash library available in the project) and return the hashed hex
string or None.

Comment on lines +33 to +42
iii_telemetry_enabled() {
case "${III_TELEMETRY_ENABLED:-}" in
false|0) return 1 ;;
esac
for ci_var in CI GITHUB_ACTIONS GITLAB_CI CIRCLECI JENKINS_URL TRAVIS BUILDKITE TF_BUILD CODEBUILD_BUILD_ID BITBUCKET_BUILD_NUMBER DRONE TEAMCITY_VERSION; do
if [ -n "$(eval "echo \${${ci_var}:-}")" ]; then
return 1
fi
done
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Respect III_TELEMETRY_DEV in iii_telemetry_enabled().

The runtime already treats III_TELEMETRY_DEV=true as an opt-out, but this gate still enables telemetry and persists IDs during local dev installs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/install.sh` around lines 33 - 42, The iii_telemetry_enabled() function
currently ignores the III_TELEMETRY_DEV variable, so add an explicit check that
returns 1 when III_TELEMETRY_DEV is set to opt-out (e.g., "true" or "1") before
the CI checks; specifically, in iii_telemetry_enabled() test the value of
${III_TELEMETRY_DEV:-} with a case like true|1) return 1 ;; placed at the top of
the function so local dev opt-out takes precedence over other environment
checks.

@anthonyiscoding
Copy link
Contributor

Here's my review, overall looks great, just a few things remain

  • Old API key is still hardcoded everywhere — the new Amplitude project + key swap listed as a prerequisite hasn't been done yet. (Unsure if this matters)
  • Heartbeat delta_ prefix diverges from PRD naming — update the PRD to match or align the code.
  • iii-tools project telemetry (section 4b) is not implemented — no project_created/project_initialized events and no .iii/project.ini generation, breaking the onboarding funnel.
  • engine_started is missing worker_count_by_language — the heartbeat includes it but the started event doesn't.
  • error_stage and error_message are set to the same value in both install scripts — error_stage should be the pipeline location (e.g. "download", "extract", "install"), not a copy of the error text.
  • Dockerfiles are missing their updates in the iii quickstart
    • adding ENV III_CONTAINER=docker and ENV III_ENV=development to:
      • cli-tooling/templates/iii/quickstart/services/client/Dockerfile
      • cli-tooling/templates/iii/quickstart/services/compute-service/Dockerfile
      • cli-tooling/templates/iii/quickstart/services/data-service/Dockerfile
      • cli-tooling/templates/iii/quickstart/services/payment-service/Dockerfile
  • And III_HOST_USER_ID to:
    • cli-tooling/templates/iii/quickstart/docker-compose.yaml
  • Autogenerated telemetry_id should use an auto- prefix (e.g. auto-299a8abd-...) so it can be trivially distinguished from real user IDs when accounts ship.

- Add worker_count_by_language to engine_started event (was only in heartbeat)
- Remove delta_ prefix from heartbeat properties to match PRD naming
- Remove duplicate cumulative counters from heartbeat (delta values are canonical)
- Add auto- prefix to newly generated telemetry_id for future user ID disambiguation
- Refactor err() in both install scripts to accept error_stage as first parameter
- Assign semantic stage names (args, dependency, platform, download, checksum, extract, binary_lookup, install) to all error call sites
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants