[Draft] Add perfetto style trace generation #136
base: main
Conversation
@UnamedRus This looks very promising! Please ping me once you finish and I will start the review. A couple of thoughts on the current draft after a brief look:

Right now it is true, a typo in the proto definition. But it should kind of work now.

Does it have much value? Let's add them to the ignore list.

It depends on which events happened. I guess it can be useful, but I am not sure; I need to play with it.
Pull request overview
This PR adds Perfetto trace generation and visualization capabilities to chdig, enabling users to analyze ClickHouse query performance using the Perfetto UI. The implementation includes generating protobuf-formatted traces from ClickHouse system logs and serving them via a local HTTP server with deep linking to the Perfetto web interface.
Key Changes:
- Added Perfetto trace generation from ClickHouse profiling data including CPU sampling, memory allocation, processor events, and system metrics
- Implemented local HTTP server to serve trace files with automatic browser opening
- Added two new commands: generate trace to .pb file and open trace in browser
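The deep-linking step mentioned in the overview can be sketched as a URL builder. The `/trace.pb` path, the host, and the function name are illustrative assumptions, not necessarily what chdig uses:

```rust
// Hedged sketch: build a Perfetto UI deep link that points the web UI at a
// trace file served by a local HTTP server on the given port.
fn perfetto_deep_link(port: u16) -> String {
    // The trace URL is percent-encoded so it survives as a single query value.
    let trace_url = format!("http%3A%2F%2F127.0.0.1%3A{}%2Ftrace.pb", port);
    format!("https://ui.perfetto.dev/#!/?url={}", trace_url)
}

fn main() {
    let link = perfetto_deep_link(9001);
    assert!(link.starts_with("https://ui.perfetto.dev/#!/?url="));
    println!("{}", link);
}
```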
Reviewed changes
Copilot reviewed 11 out of 15 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/view/processes_view.rs | Added two new view actions for Perfetto trace generation and browser viewing |
| src/utils.rs | Implemented HTTP server for serving Perfetto traces with deep linking support |
| src/lib.rs | Added generated module import for protobuf types |
| src/interpreter/worker.rs | Added event handlers for GeneratePerfettoTrace and OpenPerfettoTrace |
| src/interpreter/options.rs | Changed private fields to public for service options |
| src/interpreter/mod.rs | Added clickhouse_perfetto module |
| src/interpreter/clickhouse_perfetto.rs | Core implementation of Perfetto trace generation with interned data management |
| src/interpreter/clickhouse.rs | Added helper functions and made execute_simple public |
| src/bin.rs | Whitespace change only |
| build.rs | Added build script for protobuf compilation |
| Cargo.toml | Added prost, base64, and tonic-build dependencies |
```diff
  };
- cmd.stderr(Stdio::null()).stdout(Stdio::null());
+ cmd.stdout(Stdio::null());
```
Copilot AI · Nov 30, 2025
Removed stderr suppression from open_url_command, but this could cause error messages to appear in the terminal when opening URLs. This change appears unrelated to the PR's main purpose and may have been made by mistake. Consider restoring cmd.stderr(Stdio::null()) to maintain consistent behavior with the previous implementation.
```diff
  cmd.stdout(Stdio::null());
+ cmd.stderr(Stdio::null());
```
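The previous behaviour the review refers to, suppressing both streams, can be sketched with `std::process::Command`. The helper name `silence` and the `sh` invocation are illustrative assumptions, not code from the PR:

```rust
use std::process::{Command, Stdio};

// Route both stdout and stderr of a child process to /dev/null, matching
// the behaviour the review asks to restore in open_url_command.
fn silence(cmd: &mut Command) -> &mut Command {
    cmd.stdout(Stdio::null()).stderr(Stdio::null())
}

fn main() {
    // "sh" is only a stand-in here; any command is configured the same way.
    let mut cmd = Command::new("sh");
    cmd.args(["-c", "echo out; echo err 1>&2"]);
    let status = silence(&mut cmd).status().expect("failed to run sh");
    assert!(status.success());
}
```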
```rust
let listener = TcpListener::bind(("127.0.0.1", 0)).await.unwrap();
let addr = listener.local_addr().unwrap(); // <-- actual assigned port here
```
Copilot AI · Nov 30, 2025
Using unwrap() on TcpListener::bind() will panic if port binding fails. Since this is in a spawned thread, the panic won't be caught gracefully and the user won't see a helpful error message. Consider using proper error handling with the ? operator and returning a Result, or at minimum use expect() with a descriptive message like expect("Failed to bind to local address for Perfetto server").
```diff
- let addr = listener.local_addr().unwrap(); // <-- actual assigned port here
+ let addr = listener.local_addr().expect("Failed to get local address for Perfetto server"); // <-- actual assigned port here
```
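The fallible-binding pattern the review suggests can be sketched as follows. This uses std's blocking `TcpListener` to keep the example self-contained (the PR itself uses tokio's async listener), and `bind_ephemeral` is a hypothetical helper name:

```rust
use std::net::TcpListener;

// Port 0 asks the OS for a free ephemeral port; errors are propagated as a
// Result instead of panicking inside a spawned thread.
fn bind_ephemeral() -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

fn main() {
    let (_listener, port) =
        bind_ephemeral().expect("Failed to bind to local address for Perfetto server");
    assert!(port != 0); // the OS picked a real port for us
}
```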
```rust
// Use dynamic port allocation
let server = warp::serve(routes).incoming(listener);
```
Copilot AI · Nov 30, 2025
The incoming() method expects a stream that implements TryStream<Ok = impl Into<AddrStream>>, but TcpListener from tokio doesn't directly satisfy this. This code will likely fail to compile. Use warp::serve(routes).bind_with_graceful_shutdown() or convert the tokio listener appropriately using tokio_stream::wrappers::TcpListenerStream.
```rust
_ = tokio::time::sleep(tokio::time::Duration::from_secs(300)) => {
    log::info!("Perfetto HTTP server shutting down after 5 minutes");
}
```
Copilot AI · Nov 30, 2025
The 5-minute (300 seconds) timeout is a magic number with no explanation. Consider extracting this as a named constant at the module or function level (e.g., const PERFETTO_SERVER_TIMEOUT_SECS: u64 = 300;) and documenting why this specific duration was chosen.
```rust
let timestamp_start_ns = block.get::<i64, _>(row, "timestamp_start_ns")?;

// Extract actual data from ClickHouse
let mut timestamp_delta_us = block.get::<Vec<i64>, _>(row, "timestamp_delta_us")?;
```
Copilot AI · Nov 30, 2025
Modifying timestamp_delta_us[0] after retrieving it from the database assumes the array has at least one element. If the array is empty, this will panic. Add a bounds check or handle the empty case explicitly before accessing index 0.
```diff
  let mut timestamp_delta_us = block.get::<Vec<i64>, _>(row, "timestamp_delta_us")?;
+ if timestamp_delta_us.is_empty() {
+     anyhow::bail!("timestamp_delta_us vector is empty for row {}", row);
+ }
```
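The bounds check the review asks for can be sketched independently of the ClickHouse client. `rebase_first_delta` is a hypothetical helper illustrating the guard, not a function from the PR:

```rust
// Adjust the first delta only if the array is non-empty; an empty array
// becomes an error instead of a panic on index 0.
fn rebase_first_delta(deltas: &mut [i64], start_us: i64) -> Result<(), String> {
    match deltas.first_mut() {
        Some(first) => {
            *first += start_us;
            Ok(())
        }
        None => Err("timestamp_delta_us vector is empty".to_string()),
    }
}

fn main() {
    let mut deltas = vec![10i64, 5, 5];
    rebase_first_delta(&mut deltas, 100).unwrap();
    assert_eq!(deltas, vec![110, 5, 5]);

    let mut empty: Vec<i64> = vec![];
    assert!(rebase_first_delta(&mut empty, 100).is_err());
}
```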
```rust
source_locations: source_locations,
log_message_body: log_message_body,
callstacks: callstacks,
frames: frames,
function_names: function_names,
event_categories: event_categories,
event_names: event_names,
```
Copilot AI · Nov 30, 2025
In Rust, when a struct field and variable have the same name, you can use field init shorthand. Replace these lines with just source_locations,, log_message_body,, callstacks,, frames,, function_names,, event_categories,, and event_names, without the repetition.
```diff
- source_locations: source_locations,
- log_message_body: log_message_body,
- callstacks: callstacks,
- frames: frames,
- function_names: function_names,
- event_categories: event_categories,
- event_names: event_names,
+ source_locations,
+ log_message_body,
+ callstacks,
+ frames,
+ function_names,
+ event_categories,
+ event_names,
```
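For reference, field init shorthand works whenever a local variable has the same name as a struct field. `InternedData` below is a stand-in struct for illustration, not the generated Perfetto type:

```rust
// Stand-in struct; the real interned-data type is generated from the proto.
struct InternedData {
    frames: Vec<u64>,
    function_names: Vec<String>,
}

fn main() {
    let frames = vec![1, 2, 3];
    let function_names = vec![String::from("main")];
    // Shorthand: no `frames: frames` repetition is needed.
    let interned = InternedData { frames, function_names };
    assert_eq!(interned.frames.len(), 3);
    assert_eq!(interned.function_names[0], "main");
}
```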
```rust
let clock_monotonic_coarse_timestamp = block.get::<Vec<u64>, _>(row, "clock_monotonic_coarse_timestamp")?;
let address = block.get::<Vec<u64>, _>(row, "address")?;
let size = block.get::<Vec<u64>, _>(row, "size_arr")?;
let sequence_number = block.get::<Vec<u64>, _>(row, "sequence_number")?;
```
Copilot AI · Nov 30, 2025
[nitpick] Inconsistent spacing: some variables have extra spaces after the equals sign (lines 713, 714, 716) while size on line 715 has normal spacing. Standardize the spacing for consistency, preferably using single spaces as is conventional in Rust.
Suggested change (single spaces throughout):

```rust
let clock_monotonic_coarse_timestamp = block.get::<Vec<u64>, _>(row, "clock_monotonic_coarse_timestamp")?;
let address = block.get::<Vec<u64>, _>(row, "address")?;
let size = block.get::<Vec<u64>, _>(row, "size_arr")?;
let sequence_number = block.get::<Vec<u64>, _>(row, "sequence_number")?;
```
```rust
let io_wait_ns = block.get::<f64, _>(row, "io_wait_ns").unwrap_or(0.0) as u64;
let irq_ns = block.get::<f64, _>(row, "irq_ns").unwrap_or(0.0) as u64;
let softirq_ns: u64 = block.get::<f64, _>(row, "softirq_ns").unwrap_or(0.0) as u64;
let steal_ns: u64 = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;
```
Copilot AI · Nov 30, 2025
[nitpick] Type annotations are redundant here since the type is already explicitly cast with as u64 on the right-hand side. Remove : u64 type annotations for cleaner code, as they don't add value when the type is immediately coerced.
```diff
- let steal_ns: u64 = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;
+ let steal_ns = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;
```
```rust
&self,
database: &str,
query: &str,
_output: &str,
```
Copilot AI · Nov 30, 2025
The _output parameter is prefixed with underscore indicating it's intentionally unused, but it's included in the function signature. Since this parameter is not used in the function body, consider removing it entirely or document why it's kept for future use.
```diff
- _output: &str,
```
```toml
urlencoding = { version = "*", default-features = false }
percent-encoding = { version = "*", default-features = false }
warp = { version = "*", default-features = false, features = ["server"] }
base64 = { version = "*", default-features = false }
```
Copilot AI · Nov 30, 2025
The base64 dependency is added but never imported or used in any of the changed files. Remove this unused dependency from Cargo.toml to avoid unnecessary bloat.
```diff
- base64 = { version = "*", default-features = false }
```
@UnamedRus are you going to finish this? Or should I pick it up? I want to reimplement it without protobuf, to simplify the code, since shipping 50K lines just for the format is not worth it.
Which format do you intend to use?
Start with regular JSON; if it has too much overhead, then something more optimal. Even ProtoBuf can be OK, but without shipping these 50K lines (*.rs can be compiled I guess; as for *.proto, not sure, need to look).

Well, even optimized protobuf easily explodes over hundreds of MBs, so my last attempts were to optimize it even further.

I think it's possible to just cut it and remove the fields which we are not going to use; that should drastically reduce the protobuf size.
I also had an idea to run a custom server that serves SQL queries for Perfetto; that way we can query ClickHouse directly. But maybe Perfetto will query everything at once, and then it does not make any sense, though you can still defer at least some work from the browser AFAIU (i.e. not use the builtin WebAssembly to convert data from one format to another). But these are next steps; for now I think we need to focus on the basics. UPD: you already mentioned it in the PR description.
So @UnamedRus, are you going to continue working on this?
About *.proto, it is OK to bundle it as part of

Perfetto stores data in its own (columnar, it seems?) in-memory, heavily normalized DBMS, with even stranger layout/data optimizations. So I don't think it would be simple to hijack queries and redirect them to CH. BTW, there is a fork of perfetto by Jane Street https://github.com/janestreet/magic-trace, didn't look much at it. My last attempt was to optimize/reduce the trace size, as on the semi-production (and heavy) queries I tested, the trace file became very big.

My ideal scenario with this PR/task was to explore/define a set of rules, i.e. how to export data from CH tables and convert it to what Perfetto expects.

None of them is actually about making it production-ready. Still, I'm unlikely to have time to work on these points until the end of January.
Perfetto is an open-source suite of SDKs, daemons and tools which use tracing to help developers understand the behaviour of complex systems and root-cause functional and performance issues on client/embedded systems.
It was suggested by good people that it's one of the very few decent trace viewers.
https://ui.perfetto.dev/
2 commands added: one to generate a trace to a .pb file, and one to open the trace in the browser.
Doesn't work:

Problems of the generated trace / Perfetto:

- system.processors_profile_log, but flows are not going to work with that amount of spans if we use opentelemetry_trace_processors=1