
Conversation

@UnamedRus commented Aug 3, 2025

Perfetto is an open-source suite of SDKs, daemons, and tools which use tracing to help developers understand the behaviour of complex systems and root-cause functional and performance issues on client / embedded systems.

Good people suggested that it's one of the very few decent trace viewers.
https://ui.perfetto.dev/

Two commands added:

  1. Generate Perfetto .pb file
  2. Open in Perfetto (starts an HTTP server that serves a simple page and the trace.pb file, implementing deep linking to ui.perfetto.dev)
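As a sketch of what the second command does under the hood, the deep link can be built by pointing ui.perfetto.dev at the locally served trace file. The port, file name, and exact fragment syntax below are assumptions to be checked against the Perfetto UI docs, not chdig's actual implementation:

```rust
// Build a ui.perfetto.dev deep link that asks the UI to fetch a trace
// from a local HTTP server. The `/#!/?url=` fragment form is one of the
// documented deep-linking mechanisms; the local server must send
// permissive CORS headers so ui.perfetto.dev is allowed to fetch the file.
fn perfetto_deep_link(port: u16, trace_file: &str) -> String {
    format!(
        "https://ui.perfetto.dev/#!/?url=http://127.0.0.1:{}/{}",
        port, trace_file
    )
}

fn main() {
    // Hypothetical port and file name for illustration.
    println!("{}", perfetto_deep_link(9001, "trace.pb"));
}
```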

Doesn't work:

  1. It seems events are synchronous, so no other events run during trace generation. (And it takes a while.)
  2. Big traces (>200-400 MB) are better processed by the TraceProcessor server rather than by WASM in ui.perfetto.dev: https://perfetto.dev/docs/visualization/large-traces

Problems with the generated trace/Perfetto:

  1. Can't filter counters by category.
  2. A particular remote-processor event doesn't know which node it consumed data from. Related: ClickHouse/ClickHouse#77375 (opentelemetry_span_log: propagate the original trace_id and initial_query_id for distributed queries), ClickHouse/ClickHouse#77395 (connect opentelemetry_span_log to processors_profile_log).
  3. clock_sync_failure during import of stack traces. Related: ClickHouse/ClickHouse#78234 (clock synchronization column in system.query_log).
  4. No memory tracing.
  5. No flows out of system.processors_profile_log; and flows won't cope with the number of spans produced when opentelemetry_trace_processors=1 is used.
  6. Use of threads vs. processors for query threads.
  7. Better deduplication for InternedData.

@azat (Owner) commented Aug 4, 2025

@UnamedRus This looks very promising! Please ping me once you finish and I will start the review.

A couple of thoughts on the current draft after a brief look:

  • It is OK for now to execute the query again, but I am not sure I like this (no other action does this, but that is another story, since it requires lots of info); let's at least make this explicit in the action's name
  • I see the reason for having a separate action to store the profile data on disk; let's keep it for now, though I'm not sure this is the way to go
  • You may rely on symbols/lines, as we do for system.trace_log, instead of using trace
  • Can we render stack traces for ProfileEvents changes in the UI?
  • I guess we will also need to tune the UI to make it even better!

It seems events are synchronous, so no other events run during trace generation. (And it takes a while.)

Right now it is true.

@UnamedRus (Author)

pull_request / Spell Check with Typos (pull_request): failing after 8s

Typo in the proto definition.

But it kinda should work now.
Trace size is still an issue: traces over 500 MB don't work in the browser and need to be processed with an extra tool, trace_processor, which runs as a server (docs) that the UI connects to.

Can we render stack traces for ProfileEvents changes in the UI?

Does it have much value?
StreamingStackTraces belongs to a particular track, or even thread_id, and I haven't figured out a nice way to show multiple types of them per thread (like Real/CPU) yet.

@azat (Owner) commented Sep 1, 2025

Typo in the proto definition.

Let's add them to the ignore list.

Does it have much value?

It depends on the events that happened; I guess it can be useful, but I'm not sure, I need to play with it.
Once it's ready from your side, ping me and I will start looking into it.

@azat azat requested a review from Copilot November 30, 2025 14:40
Copilot AI left a comment

Pull request overview

This PR adds Perfetto trace generation and visualization capabilities to chdig, enabling users to analyze ClickHouse query performance using the Perfetto UI. The implementation includes generating protobuf-formatted traces from ClickHouse system logs and serving them via a local HTTP server with deep linking to the Perfetto web interface.

Key Changes:

  • Added Perfetto trace generation from ClickHouse profiling data including CPU sampling, memory allocation, processor events, and system metrics
  • Implemented local HTTP server to serve trace files with automatic browser opening
  • Added two new commands: generate trace to .pb file and open trace in browser

Reviewed changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 10 comments.

Summary per file:

  • src/view/processes_view.rs: Added two new view actions for Perfetto trace generation and browser viewing
  • src/utils.rs: Implemented HTTP server for serving Perfetto traces with deep linking support
  • src/lib.rs: Added generated module import for protobuf types
  • src/interpreter/worker.rs: Added event handlers for GeneratePerfettoTrace and OpenPerfettoTrace
  • src/interpreter/options.rs: Changed private fields to public for service options
  • src/interpreter/mod.rs: Added clickhouse_perfetto module
  • src/interpreter/clickhouse_perfetto.rs: Core implementation of Perfetto trace generation with interned data management
  • src/interpreter/clickhouse.rs: Added helper functions and made execute_simple public
  • src/bin.rs: Whitespace change only
  • build.rs: Added build script for protobuf compilation
  • Cargo.toml: Added prost, base64, and tonic-build dependencies


};

cmd.stderr(Stdio::null()).stdout(Stdio::null());
cmd.stdout(Stdio::null());
Copilot AI commented Nov 30, 2025

Removed stderr suppression from open_url_command, but this could cause error messages to appear in the terminal when opening URLs. This change appears unrelated to the PR's main purpose and may have been made by mistake. Consider restoring cmd.stderr(Stdio::null()) to maintain consistent behavior with the previous implementation.

Suggested change
cmd.stdout(Stdio::null());
cmd.stdout(Stdio::null());
cmd.stderr(Stdio::null());



let listener = TcpListener::bind(("127.0.0.1", 0)).await.unwrap();
let addr = listener.local_addr().unwrap(); // <-- actual assigned port here
Copilot AI commented Nov 30, 2025

Using unwrap() on TcpListener::bind() will panic if port binding fails. Since this is in a spawned thread, the panic won't be caught gracefully and the user won't see a helpful error message. Consider using proper error handling with the ? operator and returning a Result, or at minimum use expect() with a descriptive message like expect("Failed to bind to local address for Perfetto server").

Suggested change
let addr = listener.local_addr().unwrap(); // <-- actual assigned port here
let addr = listener.local_addr().expect("Failed to get local address for Perfetto server"); // <-- actual assigned port here

let addr = listener.local_addr().unwrap(); // <-- actual assigned port here

// Use dynamic port allocation
let server = warp::serve(routes).incoming(listener);
Copilot AI commented Nov 30, 2025

The incoming() method expects a stream that implements TryStream<Ok = impl Into<AddrStream>>, but TcpListener from tokio doesn't directly satisfy this. This code will likely fail to compile. Use warp::serve(routes).bind_with_graceful_shutdown() or convert the tokio listener appropriately using tokio_stream::wrappers::TcpListenerStream.

Comment on lines +432 to +434
_ = tokio::time::sleep(tokio::time::Duration::from_secs(300)) => {
log::info!("Perfetto HTTP server shutting down after 5 minutes");
}
Copilot AI commented Nov 30, 2025

The 5-minute (300 seconds) timeout is a magic number with no explanation. Consider extracting this as a named constant at the module or function level (e.g., const PERFETTO_SERVER_TIMEOUT_SECS: u64 = 300;) and documenting why this specific duration was chosen.

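The suggestion above can be as simple as a named constant; a minimal sketch (the constant name is hypothetical, not from the PR):

```rust
use std::time::Duration;

// Keep the auto-shutdown window in one named place instead of a magic
// number inside the select! arm. Five minutes is assumed to be enough
// for the browser to fetch the trace after the deep link opens.
const PERFETTO_SERVER_TIMEOUT: Duration = Duration::from_secs(300);

fn main() {
    println!("server timeout: {}s", PERFETTO_SERVER_TIMEOUT.as_secs());
}
```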
let timestamp_start_ns = block.get::<i64, _>(row, "timestamp_start_ns")?;

// Extract actual data from ClickHouse
let mut timestamp_delta_us = block.get::<Vec<i64>, _>(row, "timestamp_delta_us")?;
Copilot AI commented Nov 30, 2025

Modifying timestamp_delta_us[0] after retrieving it from the database assumes the array has at least one element. If the array is empty, this will panic. Add a bounds check or handle the empty case explicitly before accessing index 0.

Suggested change
let mut timestamp_delta_us = block.get::<Vec<i64>, _>(row, "timestamp_delta_us")?;
let mut timestamp_delta_us = block.get::<Vec<i64>, _>(row, "timestamp_delta_us")?;
if timestamp_delta_us.is_empty() {
anyhow::bail!("timestamp_delta_us vector is empty for row {}", row);
}

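The defensive decode suggested above can be sketched as a small helper. Names and delta semantics here are hypothetical (the real column layout is defined in src/interpreter/clickhouse_perfetto.rs); the point is rejecting the empty case explicitly instead of letting an index-0 access panic:

```rust
// Reconstruct absolute nanosecond timestamps from a start timestamp and
// microsecond deltas, returning an error for the empty case rather than
// panicking on an out-of-bounds access.
fn decode_timestamps(start_ns: i64, deltas_us: &[i64]) -> Result<Vec<i64>, String> {
    if deltas_us.is_empty() {
        return Err("timestamp_delta_us vector is empty".to_string());
    }
    let mut out = Vec::with_capacity(deltas_us.len());
    let mut current_ns = start_ns;
    for &delta in deltas_us {
        current_ns += delta * 1_000; // deltas are in microseconds
        out.push(current_ns);
    }
    Ok(out)
}

fn main() {
    // First delta of 0 makes the first timestamp equal to start_ns.
    let ts = decode_timestamps(1_000, &[0, 5, 10]).unwrap();
    println!("{:?}", ts);
}
```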
Comment on lines +455 to +461
source_locations: source_locations,
log_message_body: log_message_body,
callstacks: callstacks,
frames: frames,
function_names: function_names,
event_categories: event_categories,
event_names: event_names,
Copilot AI commented Nov 30, 2025

In Rust, when a struct field and variable have the same name, you can use field init shorthand. Replace these lines with just source_locations,, log_message_body,, callstacks,, frames,, function_names,, event_categories,, and event_names, without the repetition.

Suggested change
source_locations: source_locations,
log_message_body: log_message_body,
callstacks: callstacks,
frames: frames,
function_names: function_names,
event_categories: event_categories,
event_names: event_names,
source_locations,
log_message_body,
callstacks,
frames,
function_names,
event_categories,
event_names,

Comment on lines +713 to +716
let clock_monotonic_coarse_timestamp = block.get::<Vec<u64>, _>(row, "clock_monotonic_coarse_timestamp")?;
let address = block.get::<Vec<u64>, _>(row, "address")?;
let size = block.get::<Vec<u64>, _>(row, "size_arr")?;
let sequence_number = block.get::<Vec<u64>, _>(row, "sequence_number")?;
Copilot AI commented Nov 30, 2025

[nitpick] Inconsistent spacing: some variables have extra spaces after the equals sign (lines 713, 714, 716) while size on line 715 has normal spacing. Standardize the spacing for consistency, preferably using single spaces as is conventional in Rust.

Suggested change
let clock_monotonic_coarse_timestamp = block.get::<Vec<u64>, _>(row, "clock_monotonic_coarse_timestamp")?;
let address = block.get::<Vec<u64>, _>(row, "address")?;
let size = block.get::<Vec<u64>, _>(row, "size_arr")?;
let sequence_number = block.get::<Vec<u64>, _>(row, "sequence_number")?;
let clock_monotonic_coarse_timestamp = block.get::<Vec<u64>, _>(row, "clock_monotonic_coarse_timestamp")?;
let address = block.get::<Vec<u64>, _>(row, "address")?;
let size = block.get::<Vec<u64>, _>(row, "size_arr")?;
let sequence_number = block.get::<Vec<u64>, _>(row, "sequence_number")?;

let io_wait_ns = block.get::<f64, _>(row, "io_wait_ns").unwrap_or(0.0) as u64;
let irq_ns = block.get::<f64, _>(row, "irq_ns").unwrap_or(0.0) as u64;
let softirq_ns: u64 = block.get::<f64, _>(row, "softirq_ns").unwrap_or(0.0) as u64;
let steal_ns: u64 = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;
Copilot AI commented Nov 30, 2025

[nitpick] Type annotations are redundant here since the type is already explicitly cast with as u64 on the right-hand side. Remove : u64 type annotations for cleaner code, as they don't add value when the type is immediately coerced.

Suggested change
let steal_ns: u64 = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;
let steal_ns = block.get::<f64, _>(row, "steal_ns").unwrap_or(0.0) as u64;

&self,
database: &str,
query: &str,
_output: &str,
Copilot AI commented Nov 30, 2025

The _output parameter is prefixed with underscore indicating it's intentionally unused, but it's included in the function signature. Since this parameter is not used in the function body, consider removing it entirely or document why it's kept for future use.

Suggested change
_output: &str,

urlencoding = { version = "*", default-features = false }
percent-encoding = { version = "*", default-features = false }
warp = { version = "*", default-features = false, features = ["server"] }
base64 = { version = "*", default-features = false}
Copilot AI commented Nov 30, 2025

The base64 dependency is added but never imported or used in any of the changed files. Remove this unused dependency from Cargo.toml to avoid unnecessary bloat.

Suggested change
base64 = { version = "*", default-features = false}

@azat (Owner) commented Dec 21, 2025

@UnamedRus are you going to finish this? Or should I pick it?

I want to reimplement it without protobuf, to simplify the code, since shipping 50K lines just for a format is not worth it.

@UnamedRus (Author)

Which format do you intend to use?

@azat (Owner) commented Dec 22, 2025

Start with regular JSON; if it has too much overhead, something more optimal. Even ProtoBuf can be OK, but without shipping these 50K lines (*.rs can be compiled I guess; as for *.proto, not sure, need to look).

@UnamedRus (Author)

Start with regular JSON; if it has too much overhead, something more optimal.

Well, even an optimized protobuf easily explodes past hundreds of MBs, so my last attempts were to optimize it even further.
The problem is that processor tracing happens per block, so if your query is somewhat big (reads billions of rows), it creates quite a few events. Also, AFAIK, support for the JSON (Chrome trace) format was added to ClickHouse itself, but the number of event types you can share that way is limited.
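For context, the Chrome trace event format mentioned here is a plain JSON object with an array of events; a minimal, illustrative fragment (event names and ids are made up) with one complete slice ("ph": "X"):

```json
{
  "traceEvents": [
    {
      "name": "ReadFromMergeTree",
      "cat": "processor",
      "ph": "X",
      "ts": 1000,
      "dur": 250,
      "pid": 1,
      "tid": 42
    }
  ]
}
```

Here ts and dur are in microseconds. Perfetto can open this format directly, but it supports fewer event types than the protobuf TracePacket format (no interned data, for example), which matches the limitation noted above.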

Even ProtoBuf can be OK, but without shipping these 50K lines (*.rs can be compiled I guess; as for *.proto, not sure, need to look).

I think it's possible to just cut it and remove the fields we are not going to use; that should drastically reduce the protobuf size.

@azat (Owner) commented Dec 22, 2025

I also had an idea to run a custom server that serves SQL queries for Perfetto; that way we could query ClickHouse directly. But maybe Perfetto queries everything at once, and then it does not make any sense. Though you can still offload at least some work from the browser AFAIU (i.e. not use the builtin WebAssembly to convert data from one format to another).

But these are the next steps; for now I think we need to focus on the basics.

UPD: you already mentioned it in the PR description

@azat (Owner) commented Dec 22, 2025

So @UnamedRus you are going to continue working on this?

@azat (Owner) commented Dec 22, 2025

About *.proto: it is OK to bundle it as part of chdig, but we need to exclude the 100%-kernel-only stuff, i.e. ftrace, generic_kernel, and lots of other things, and provide a script to generate the .trace file for chdig as well, both for periodic updates (if any) and for transparency.

@UnamedRus (Author)

I also had an idea to run custom server that serves SQL queries for perfetto, that way we can query ClickHouse directly, but maybe perfetto will query all at once, and then, it does not make any sense, though you can still defer at least some stuff from the browser

Perfetto stores data in its own in-memory, heavily normalized DBMS (columnar, it seems?), with even stranger layout/data optimizations. So I don't think it would be simple to hijack queries and redirect them to ClickHouse.

BTW, there is a fork of Perfetto by Jane Street (https://github.com/janestreet/magic-trace); I didn't look much at it.

My last attempt was to optimize/reduce the trace size, as on the semi-production (and heavy) queries I tested, the trace file became very big.
For that you need to use some quirky features of the Perfetto proto format, but I ended up stuck trying to fix issues with clocks.
Alternatively, it's possible to remove "features" (which is less desirable), or maybe adjust max_block_size to a bigger value.

So @UnamedRus you are going to continue working on this?

My ideal scenario for this PR/task was to explore/define a set of rules, i.e. how to export data from ClickHouse tables and convert it into what Perfetto expects.
So there are basically a few topics I may be willing to spend more time on:

  1. Export/map new data/events
  2. Optimize trace file size
  3. Improve trace files for a better UI/UX experience in Perfetto
  4. Distributed tracing?

None of these is actually about bringing it to a production state. Still, I'm unlikely to have time to work on these points until the end of January.
