diff --git a/deps/AIP-0003-visualization.md b/deps/AIP-0003-visualization.md
new file mode 100644
index 00000000..ab171c52
--- /dev/null
+++ b/deps/AIP-0003-visualization.md
@@ -0,0 +1,597 @@
+# Dynamo AIPerf Visualization Feature and Plot Command
+
+**Status**: Under Review
+
+**Authors**: [ilana-n]
+
+**Category**: Feature
+
+**Replaces**: GenAI-Perf Compare Command
+
+**Replaced By**: N/A
+
+**Sponsor**: [TBD]
+
+**Required Reviewers**: [ganesh-k, lkomali]
+
+**Review Date**: [TBD]
+
+**Pull Request**: [TBD]
+
+**Implementation PR / Tracking Issue**: [TBD]
+
+# Summary
+
+Introduce an `aiperf plot` command to generate static and interactive visualizations from profiling results. This addresses the common need to visualize performance metrics, compare multiple runs, and generate Pareto curves for analysis and reporting.
+
+# Motivation
+
+Users frequently run profiling experiments across multiple configurations to understand performance trade-offs. Currently, there is no built-in way to visualize AIPerf results, so users must rely on external tools to create plots.
+
+Common visualization needs include:
+1. **Comparing multiple runs** - Understanding how different configurations (concurrency, model size, etc.) affect performance.
+2. **Analyzing single runs** - Identifying anomalies, warmup effects, and performance trends over time.
+
+AIPerf can significantly improve the user experience by providing built-in visualization capabilities that work seamlessly with existing profiling workflows.
+
+## Goals
+
+* Support static artifact generation as the default (PNG).
+* Support an interactive visualization option where users can dynamically select metrics and plot types in the browser (HTML and locally hosted options).
+* Enable comparison plots of multiple profiling runs with Pareto curves and scatter/line plots.
+* Enable deep-dive analysis of single runs with time series and distribution plots.
+* Allow users to configure which plots are generated by default and to define their own plot presets.
+
+## Non Goals
+
+* Real-time visualization during profiling (plots are generated post-profiling).
+* Integration with external databases.
+
+### REQ 1 Plot Command
+
+AIPerf **MUST** provide a `plot` command that supports multiple input modes:
+1. Default (no argument): compare and plot runs in the `./artifacts` directory.
+2. One path to a directory containing multiple runs.
+3. One path to a single profile run.
+4. Multiple paths to individual profile runs (for manual comparison, assuming they are located in different directories).
+
+### REQ 2 Static Visualization Export
+
+AIPerf **MUST** support generating static visualization artifacts via `aiperf plot`:
+
+**Default PNG Mode:**
+- By default (no flags), `aiperf plot` **MUST** generate PNG images of all default plots as specified in user configuration (`~/.aiperf/plot_config.yaml`) or system defaults.
+- PNG files **MUST** be saved to the `{input_path}/plot_export/` directory, where `input_path` is either the directory the user specifies to load profiling run data from or the default `./artifacts` directory if left unspecified.
+- A summary text file **MUST** be included listing all generated plots.
+
+All formats **MUST** work with both multi-run comparisons and single-run deep-dive analysis. The mode (multi-run vs single-run) **MUST** be auto-detected based on directory structure, as sketched below.
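+
+For illustration, a minimal sketch of one possible auto-detection heuristic, assuming a run directory is identified by the presence of `profile_export.jsonl` (per the Mode Detection section below); the function name and layout are illustrative, not the final implementation:
+
+```python
+from pathlib import Path
+
+
+def detect_plot_mode(paths: list[Path]) -> str:
+    """Illustrative heuristic: classify input paths as multi-run or single-run."""
+    if len(paths) > 1:
+        # Multiple explicit paths always mean a comparison.
+        return "multi_run"
+    root = paths[0]
+    if (root / "profile_export.jsonl").exists():
+        # The directory itself holds one run's raw export.
+        return "single_run"
+    # Otherwise, look for run subdirectories containing exports.
+    runs = [d for d in root.iterdir()
+            if d.is_dir() and (d / "profile_export.jsonl").exists()]
+    return "single_run" if len(runs) == 1 else "multi_run"
+```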
+
+### REQ 3 Interactive Visualization Modes
+
+AIPerf **MUST** provide interactive visualization through two modes: HTML reports (`--html`) and live server (`--host`). Both modes **MUST** share the following core functionality:
+
+**Shared Interactive Features:**
+- **Dynamic axis selection**: Users choose any available metric for x-axis and y-axis
+- **Log scale toggles**: Enable/disable logarithmic scaling for axes
+- **Run selection**: Toggle which runs to display via checkboxes
+- **Plot type switching**: Switch between scatter, line, bar, box plots
+- **Interactive controls**: Zoom, pan, hover for detailed values (via Plotly.js)
+- **Filtering**: Apply filters on metrics and parameters
+- **Export functionality**: Download current view as PNG or save configuration
+
+**HTML Mode (`--html`) Specific Requirements:**
+- **MUST** generate a self-contained HTML file with all sweep data embedded as JSON
+- All interactivity **MUST** be client-side via JavaScript (no server required after generation)
+- **MUST** work completely offline
+- **MUST** be saved to `{input_path}/plot_export/dashboard.html`
+- Filtering and plot regeneration **MUST** operate on embedded data using JavaScript
+- Best for: sharing with team, remote environments (SLURM, K8s)
+
+**Server Mode (`--host`) Specific Requirements:**
+- **MUST** launch a local web server (default port 8080) using Plotly Dash
+- All plot regeneration and filtering **MUST** be server-side using Python
+- **MUST** support complex filters with Python logic and server-side data processing
+- **MUST** provide lazy loading for efficient handling of large datasets (100+ runs)
+- **SHOULD** support a `--watch` flag to monitor the directory for new runs and update automatically
+- **SHOULD** print the dashboard URL (e.g., `http://localhost:8080`) upon successful startup
+- **MUST** allow graceful shutdown via Ctrl+C
+- Best for: active exploratory analysis, large sweeps, complex computations
+
+**Shared Requirements for Both Modes:**
+- **MUST** run locally without requiring external services or internet connectivity
+- **MUST** use the same shared components for loading and plotting data as PNG mode for consistency
+- **MUST** respect user configuration for default plots from `~/.aiperf/plot_config.yaml`
+- **MUST** support both multi-run comparison mode and single-run time-series analysis mode
+- **MUST** auto-detect mode based on directory structure
+
+### REQ 4 Comprehensive Metric Support
+
+The visualization system **MUST** support all currently available metrics in AIPerf for axis selection and plotting ([see the full table here](#metrics)). The system **MUST** automatically detect which metrics are available for the given runs.
+
+### REQ 5 Essential Plot Types
+
+The system **MUST** provide the following essential visualizations:
+- **Comparison plots**: Pareto curves, metric vs metric (e.g., latency vs throughput), metric vs parameter (e.g., output token throughput per user vs concurrency), distribution comparisons
+- **Time series plots**: Per-request metrics over time
+  - **Future: Time slicing plots**: Per-time-slice metrics over time
+
+Users **SHOULD** be able to configure which plots are generated by default in both static and interactive modes.
+
+# Proposal
+
+## Plot Command
+
+Generates visualizations from stored results. Automatically detects whether to show multi-run comparison or single-run analysis:
+```bash
+# Default: Generate static PNG plots from all profiling runs in ./artifacts
+aiperf plot
+
+# Generate PNGs from a specific subdirectory
+aiperf plot subdir
+
+# Compare runs across multiple subdirectories
+aiperf plot subdir1 subdir2 subdir3
+
+# Generate interactive HTML report
+aiperf plot --html
+
+# Launch interactive web server
+aiperf plot --host
+
+# Analyze a single run
+aiperf plot subdir/Qwen3-0.6B-concurrency8
+
+# Compare specific runs
+aiperf plot subdir1/Qwen3-0.6B-concurrency8 subdir2/Qwen3-0.6B-concurrency16
+```
+
+### Visualization Modes
+
+**Default Mode (PNG):**
+
+Generates static PNG images of all default plots:
+- Fast generation for quick preview
+- Saves to the `{input_path}/plot_export/` directory
+- Includes all plots specified in config defaults
+- Best for: easy export and sharing
+
+**HTML Mode (`--html`):**
+
+Generates a self-contained interactive HTML report:
+- Single HTML file with all data embedded as JSON
+- Full interactivity via client-side JavaScript:
+  - Change x-axis/y-axis dynamically (redraws plots from embedded data)
+  - Toggle log scale on/off
+  - Toggle runs on/off via checkboxes
+  - Switch plot types (scatter, line, bar)
+  - Apply filters on the embedded dataset
+  - Add custom plots from available metrics
+- Uses Plotly.js for zoom, pan, hover interactions
+- Works completely offline (no server required)
+- Saves to `{input_path}/plot_export/dashboard.html`
+- Best for: sharing with team, reports, remote environments (SLURM, K8s)
+
+**Interactive Server Mode (`--host`):**
+
+Launches a live web dashboard with a Python backend:
+- Server runs on `localhost:8080` using Plotly Dash
+- Full server-side flexibility:
+  - Recompute metrics on-the-fly
+  - Complex filtering with Python logic
+  - Lazy loading for large datasets (100+ runs)
+- All Plotly.js interactions plus server-side computation
+- Requires a running server and network access
+- Best for: active exploratory analysis on a local machine and large sweeps (where the HTML report's embedded data might grow large)
+
+### Mode Detection
+
+The plot command automatically detects the visualization mode based on the input:
+- **Multi-run directory** (contains multiple run subdirectories) → Comparison mode
+- **Single run directory** (contains `profile_export.jsonl`) → Deep-dive mode
+- **Multiple paths** (multiple directories specified) → Comparison mode
+
+### Multi-Run Comparison Mode
+
+When multiple runs are detected, generates comparison visualizations. For example:
+- Pareto curve (throughput vs request latency; see the sketch below)
+- TTFT (Time to First Token) versus Average Request Throughput
+- TTFT versus ITL (Inter-Token Latency)
+
+Default plots are configurable via `~/.aiperf/plot_config.yaml` (see Configuration section).
+
+### Single-Run Analysis Mode
+
+When a single run is detected, generates time-series visualizations:
+- TTFT (Time to First Token) per request over time
+- ITL (Inter-Token Latency) per request over time
+- End-to-end latency per request over time
+- Throughput over time
+- GPU memory and utilization over time (if available)
+
+Default plots are configurable via `~/.aiperf/plot_config.yaml` (see Configuration section).
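+
+The Pareto curves generated in comparison mode (and highlighted via `highlight_pareto` in the configuration below) reduce to a simple dominance filter over per-run summary points. A minimal sketch, assuming one (latency, throughput) point per run where lower latency and higher throughput are both better; this is illustrative rather than the final PlotGenerator implementation:
+
+```python
+def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
+    """Keep only (latency, throughput) points not dominated by any other point."""
+    frontier = []
+    for latency, throughput in sorted(points):  # ascending latency
+        # Every point already on the frontier has lower-or-equal latency,
+        # so this point survives only if it improves on throughput.
+        if not frontier or throughput > frontier[-1][1]:
+            frontier.append((latency, throughput))
+    return frontier
+
+
+# e.g. p50 request latency (ms) and request throughput for four concurrency levels
+runs = [(95.0, 30.0), (100.0, 28.0), (120.0, 45.0), (140.0, 52.0)]
+print(pareto_frontier(runs))  # [(95.0, 30.0), (120.0, 45.0), (140.0, 52.0)]
+```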
+ + +## Configuration + +Users can configure default plots and make their own presets for multi-run comparison and single-run analysis in `~/.aiperf/plot_config.yaml`: +```yaml +visualization: + # ============================================================================= + # MULTI RUN COMPARISON: Default plots when comparing multiple runs + # ============================================================================= + multi_run_defaults: + - pareto_curve_latency_vs_throughput + - ttft_vs_throughput + - ttft_vs_itl + - output_token_throughput_per_user_vs_concurrency + + # ============================================================================= + # SINGLE RUN: Default plots for analyzing one run over time + # ============================================================================= + single_run_defaults: + - ttft_over_time + - itl_over_time + - latency_over_time + - throughput_over_time + + # ============================================================================= + # MULTI RUN COMPARISON PRESETS + # ============================================================================= + multi_run_plots: + pareto_curve_latency_vs_throughput: + name: "Pareto Curve" + description: "Throughput vs latency trade-offs across concurrency levels" + x: request_latency_p50 + y: request_throughput + y_scale: log # optional + labels: concurrency + highlight_pareto: true + + ttft_vs_throughput: + name: "TTFT vs Throughput" + description: "Time to first token vs request throughput across concurrency levels" + x: ttft_p50 + y: request_throughput + labels: concurrency + + ttft_vs_itl: + name: "TTFT vs ITL" + description: "Time to first token vs inter-token latency" + x: ttft_p50 + y: inter_token_latency_p50 + labels: concurrency + + output_token_throughput_per_user_vs_concurrency: + name: "Output Token Throughput per User" + description: "Per-user output token throughput at different concurrency levels" + x: concurrency + y: output_token_throughput_per_user + labels: concurrency + + # ============================================================================= + # SINGLE RUN PRESETS (time-series over duration of profiling run) + # ============================================================================= + single_run_plots: + ttft_over_time: + name: "TTFT Over Time" + description: "Time to first token for each request" + x: request_number + y: ttft + type: scatter + rolling_window: 20 + show_legend: true + + itl_over_time: + name: "Inter-Token Latency Over Time" + description: "Inter-token latency for each request" + x: request_number + y: itl + type: scatter + rolling_window: 20 + show_legend: true + + latency_over_time: + name: "Request Latency Over Time" + description: "End-to-end request latency over time" + x: timestamp + y: request_latency + type: area + + throughput_over_time: + name: "Throughput Over Time" + description: "Request throughput over time" + x: timestamp + y: request_throughput + type: area +``` +**[Interactive Dashboard - Multi Run Comparison Mode]** +![Example: Multi Run Comparison Dashboard](./AIP-0003_images/dashboard-multirun.gif) + +**[Interactive Dashboard - Single Run Analysis]** +![Example: Single Run Time Series Dashboard](./AIP-0003_images/dashboard-singlerun.png) + +# Implementation + +## Architecture Overview + +All three visualization modes share core components for consistency and maintainability: +``` +┌────────────────────────────────────────────────────────────────────┐ +│ User Command │ +│ aiperf plot ./results [--html | --host | default] │ 
+└─────────────────────────────┬──────────────────────────────────────┘
+                              │
+                ┌─────────────┴─────────────┐
+                │                           │
+                ▼                           ▼
+        1. Load Config             2. Load & Parse Data
+  ~/.aiperf/plot_config.yaml      profile_export.jsonl
+                │                           │
+                └─────────────┬─────────────┘
+                              │
+                              ▼
+                       3. Detect Mode
+                  multi_run or single_run
+                              │
+                              ▼
+                    4. Get Default Plots
+                   Based on mode + config
+                              │
+                              ▼
+                     5. Create Figures
+                    Using PlotGenerator
+                              │
+                ┌─────────────┼─────────────┐
+                │             │             │
+                ▼             ▼             ▼
+         ┌────────────┐   ┌────────┐   ┌──────────┐
+         │  PNG Mode  │   │  HTML  │   │   Dash   │
+         │ (default)  │   │  Mode  │   │   Mode   │
+         └─────┬──────┘   └───┬────┘   └────┬─────┘
+               │              │             │
+               ▼              ▼             ▼
+        Save PNGs to     Save HTML to   Start server
+        ./plot_export/   ./plot_export/ + callbacks
+               │              │             │
+               ▼              ▼             ▼
+           PNG files    dashboard.html  localhost:8080
+                                        (live server)
+```
+
+### Shared Components
+
+All modes reuse the following modules:
+
+**DataLoader** (`plot/core/data_loader.py`):
+- Parses `profile_export.jsonl` files from run directories
+- Extracts metadata and metrics per request
+- Aggregates statistics (p50, p99, averages)
+- Detects swept parameters by comparing runs
+- Returns a standardized data structure
+
+**PlotGenerator** (`plot/core/plot_generator.py`):
+- Creates Plotly Figure objects for all plot types
+- Implements Pareto curves, bar charts, time series, heatmaps
+- Shared by all modes (PNG converts to image, HTML converts to JSON, Dash returns directly)
+
+**PlotConfig** (`plot/core/config.py`):
+- Loads user configuration from `~/.aiperf/plot_config.yaml`
+- Merges it with system defaults
+- Provides plot settings based on mode (multi_run vs single_run)
+
+## PNG Mode Implementation
+
+The PNG generator retrieves the default plots from user configuration, then uses the shared PlotGenerator to create Plotly Figure objects for each plot. These figures are exported to PNG format using static image rendering and saved to the output directory. A summary text file is also generated listing all created plots with their file paths.
+
+**Key characteristics**:
+- One-time execution that completes and exits
+- Uses the `kaleido` library for high-quality static image export
+
+## HTML Mode Implementation
+
+The HTML generator embeds the complete dataset as JSON within a single HTML file. It converts the default Plotly figures to JSON specifications and generates JavaScript functions that can manipulate this embedded data to regenerate plots based on user interactions. The resulting HTML file contains all data, the Plotly.js library (inlined so the report works fully offline, per REQ 3), and event handlers for UI controls (dropdowns, checkboxes). When users interact with controls, JavaScript functions filter the embedded data and call Plotly.js to redraw plots entirely in the browser, as in the sketch below.
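+
+A minimal sketch of this embedding approach, assuming Jinja2 and Plotly are available; the template, file name, and single-figure layout are simplifications of the real report:
+
+```python
+import plotly.graph_objects as go
+import plotly.io as pio
+from jinja2 import Template
+from plotly.offline import get_plotlyjs
+
+# Hypothetical single-plot template; the real report has tabs and controls.
+TEMPLATE = Template("""<!DOCTYPE html>
+<html>
+<head><script>{{ plotly_js }}</script></head>
+<body>
+  <div id="plot"></div>
+  <script>
+    // Full figure (data + layout) embedded as JSON at generation time.
+    const fig = {{ fig_json }};
+    Plotly.newPlot("plot", fig.data, fig.layout);
+  </script>
+</body>
+</html>""")
+
+fig = go.Figure(go.Scatter(x=[8, 16, 32], y=[45.2, 61.8, 70.1], mode="lines+markers"))
+html = TEMPLATE.render(
+    plotly_js=get_plotlyjs(),      # inline the library so no network access is needed
+    fig_json=pio.to_json(fig),     # figure serialized as a JSON string
+)
+with open("dashboard.html", "w") as f:
+    f.write(html)
+```
+
+Inlining the library rather than referencing a CDN keeps the report functional with no network access, satisfying REQ 3's offline requirement.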
+
+**Key characteristics**:
+- All computation happens at generation time (Python) and interaction time (JavaScript)
+- No server required after the HTML file is created
+- File size scales with data volume
+- Uses Jinja2 templates for HTML structure
+- Self-contained
+
+## Interactive Hosted Mode: Plotly Dash
+
+**Framework Choice**: Plotly Dash
+
+**Rationale**:
+- Pure Python implementation maintains consistency with the AIPerf codebase
+- Callback architecture provides explicit event handling aligned with the plot regeneration model
+- Full customization enables AIPerf-specific visualizations and layouts
+- Flask-based foundation ensures a production-ready local server
+- Efficient update mechanism reruns only the changed callbacks, not the entire application
+
+**Alternatives Rejected:**
+- **Streamlit** - Lacks fine-grained control over layout and callbacks; more suited to simple apps than complex multi-plot dashboards with dynamic metric selection
+- **Grafana** - Requires database integration and persistent server setup; designed for real-time monitoring rather than file-based post-profiling analysis
+- **Gradio** - Focused on ML model I/O interfaces; insufficient charting capabilities and not designed for comprehensive data visualization dashboards
+- **TensorBoard** - Tailored for training metrics and neural network visualization; requires a specific logging format and is not suitable for general performance profiling data
+- **WandB** - Requires an external service, internet connectivity, and account authentication; violates the self-contained local execution requirement
+
+The Dash implementation loads data once at server startup using the shared DataLoader, then defines a layout with sidebar controls and a main plotting area. Default plots from user configuration are displayed in tabs. Python callbacks are registered for each plot to respond to user interactions (run selection, axis changes, filters). When triggered, callbacks use the shared PlotGenerator to create updated Plotly figures with the new parameters, which are then serialized to JSON and sent to the browser, where Plotly.js renders them.
+
+**Key characteristics**:
+- Server process runs continuously until manually stopped
+- Data kept in server memory for fast access
+- All plot regeneration happens server-side in Python
+- Callbacks can perform complex computations not feasible in browser JavaScript (large dataset filtering, on-demand metric derivation, Pareto optimization)
+- Supports lazy loading and streaming for very large datasets
+- Uses Dash Bootstrap Components for UI styling
+
+## Metrics for Visualization
+
+The plot command must support all metrics available in AIPerf for x-axis and y-axis selection. The system automatically detects which metrics are available in the provided result files by parsing `results.json` and `results.csv`, as in the sketch below.
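+
+A minimal sketch of that detection step, assuming `results.json` contains a flat mapping from metric names to their recorded statistics (the real AIPerf schema may differ); only metrics present in every selected run are offered for axis selection:
+
+```python
+import json
+from pathlib import Path
+
+
+def available_metrics(run_dir: Path) -> set[str]:
+    """Metric names recorded in one run's results.json (assumed flat schema)."""
+    with open(run_dir / "results.json") as f:
+        return set(json.load(f).keys())
+
+
+def common_metrics(run_dirs: list[Path]) -> set[str]:
+    """Metrics usable for axis selection across all selected runs."""
+    metric_sets = [available_metrics(d) for d in run_dirs]
+    return set.intersection(*metric_sets) if metric_sets else set()
+```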
+
+### Metrics
+
+A blank entry in the Statistics Available column indicates the full set of percentile statistics (e.g., p50, p99) is reported; otherwise only the listed aggregate is available.
+
+| Category | Metric Name | Display Name / Header | Unit | Statistics Available |
+|----------|-------------|-----------------------|------|----------------------|
+| **Latency Metrics** | `time_to_first_token` | Time to First Token (TTFT) | milliseconds | |
+| | `inter_token_latency` | Inter Token Latency (ITL) | milliseconds | |
+| | `request_latency` | Request Latency (E2E) | milliseconds | |
+| | `time_to_first_output_token` | Time to First Output Token (TTFO) | milliseconds | |
+| | `time_to_second_token` | Time to Second Token (TTST) | milliseconds | |
+| | `inter_chunk_latency` | Inter Chunk Latency (ICL) | milliseconds | |
+| | `credit_drop_latency` | Credit Drop Latency | milliseconds | |
+| | `stream_setup_latency` | Stream Setup Latency | milliseconds | |
+| | `stream_prefill_latency` | Stream Prefill Latency | milliseconds | |
+| **Throughput Metrics** | `request_throughput` | Request Throughput | requests/sec | avg only |
+| | `output_token_throughput` | Output Token Throughput | tokens/sec | avg only |
+| | `output_token_throughput_per_user` | Output Token Throughput Per User | tokens/sec/user | |
+| | `prefill_throughput` | Prefill Throughput | tokens/sec | avg only |
+| | `goodput` | Goodput | tokens/sec | avg only |
+| **Sequence Length Metrics** | `input_sequence_length` | Input Sequence Length (ISL) | tokens | |
+| | `output_sequence_length` | Output Sequence Length (OSL) | tokens | |
+| | `total_isl` | Total Input Sequence Length | tokens | sum |
+| | `total_osl` | Total Output Sequence Length | tokens | sum |
+| | `error_isl` | Error Input Sequence Length | tokens | |
+| | `total_error_isl` | Total Error Input Sequence Length | tokens | sum |
+| **Token Count Metrics** | `output_token_count` | Output Token Count | tokens | sum |
+| | `reasoning_token_count` | Reasoning Token Count | tokens | |
+| | `total_reasoning_tokens` | Total Reasoning Tokens | tokens | sum |
+| **Request Count Metrics** | `request_count` | Request Count | count | total |
+| | `good_request_count` | Good Request Count | count | total |
+| | `error_request_count` | Error Request Count | count | total |
+| **Efficiency Metrics** | `thinking_efficiency` | Thinking Efficiency | percent | |
+| | `overall_thinking_efficiency` | Overall Thinking Efficiency | percent | avg only |
+| **Time Metrics** | `benchmark_duration` | Benchmark Duration | seconds | total |
+| | `min_request_timestamp` | Minimum Request Timestamp | nanoseconds | min |
+| | `max_response_timestamp` | Maximum Response Timestamp | nanoseconds | max |
+| **GPU Telemetry Metrics** | `gpu_power_usage` | GPU Power Usage | watts | |
+| | `power_management_limit` | GPU Power Limit | watts | |
+| | `energy_consumption` | Energy Consumption | megajoules | |
+| | `gpu_utilization` | GPU Utilization | percent | |
+| | `memory_copy_utilization` | Memory Copy Utilization | percent | |
+| | `gpu_memory_used` | GPU Memory Used | gigabytes | |
+| | `gpu_memory_free` | GPU Memory Free | gigabytes | |
+| | `gpu_memory_total` | GPU Memory Total | gigabytes | |
+| | `sm_clock_frequency` | SM Clock Frequency | megahertz | |
+| | `memory_clock_frequency` | Memory Clock Frequency | megahertz | |
+| | `memory_temperature` | Memory Temperature | celsius | |
+| | `gpu_temperature` | GPU Temperature | celsius | |
+| | `xid_errors` | XID Errors | count | |
+| | `power_violation` | Power Violation | microseconds | |
+| | `thermal_violation` | Thermal Violation | microseconds | |
+| **Configuration Parameters** | `concurrency` | Concurrency | count | |
+| | `input_seq_len` | Input Sequence Length Config | tokens | |
+| | `output_seq_len` | Output Sequence Length Config | tokens | |
+| | `duration` | Benchmark Duration Config | seconds | |
+
+# Alternate Solutions
+
+## Alt 1: Single Command (Sweep + Auto-Visualize)
+
+Automatically generate visualizations after a sweep completes.
+
+**Pros:**
+- One command for everything
+- Immediate results
+
+**Cons:**
+- Sweeps can take hours; users may want to visualize later
+- Cannot re-visualize with different settings without re-running the sweep
+- Not flexible for CI/CD
+
+**Reason Rejected:**
+Separate commands provide better control: they avoid re-running expensive profiling operations when users only want different visualizations, and they avoid generating visualizations when users only want raw profiling exports.
+
+## Alt 2: External Tools (TensorBoard/WandB)
+
+Use existing visualization platforms.
+
+**Pros:**
+- No maintenance burden for UI/UX integrations.
+- Feature-rich
+
+**Cons:**
+- WandB requires an account login and internet access
+- TensorBoard is specific to ML training applications
+- Less control over AIPerf-specific visualizations
+
+## Alt 3: Jupyter Notebook as Primary Interface
+
+Generate Jupyter notebooks with embedded plots and analysis code.
+
+**Pros:**
+- Familiar to data scientists
+- Reproducible analysis
+- Can add custom analysis cells
+- Interactive exploration with code
+
+**Cons:**
+- Requires Jupyter installation and understanding
+- Heavier dependencies
+- Mixing code execution with visualization complicates simple viewing
+
+**Reason Rejected:**
+While valuable as an optional export format, making notebooks the primary interface excludes too many valid use cases (quick viewing, sharing with non-technical stakeholders, automated pipelines). Prioritizing PNG/HTML/Dash covers a broader user base.
+
+# Future Enhancements
+
+## Connector with Dynamo Metrics
+Access KVBM and Dynamo metrics through the Dynamo metrics endpoint.
+
+## 3D Plots
+Support three configurable axes. For example, a third axis can show how other metrics change as OSL/ISL vary.
+
+## Visualization Configuration Export from Interactive HTML or Hosted Dashboard
+
+**Description**: Allow users to export their interactive HTML session or hosted dashboard settings as an AIPerf configuration file.
+
+**Workflow**:
+1. User opens the HTML report or hosted dashboard and explores visualizations interactively
+2. User adjusts settings to their preferences:
+   - Selects which plots to display
+   - Configures axes for custom plots
+   - Sets filters and plot types
+   - Chooses color schemes and layout options
+3. User clicks the "Export Configuration" button in the HTML interface
+4. Browser downloads a `plot_config.yaml` file containing the current settings
+5.
User can save this configuration for reuse: +```bash + # Option 1: Save to default location + mv plot_config.yaml ~/.aiperf/plot_config.yaml + + # Option 2: Place in project directory and pass as an argument + mv plot_config.yaml ./my_project/.plot_config.yaml + aiperf plot ./results --config ./my_project/.plot_config.yaml +``` + +**Benefits**: +- Discover preferred visualizations through exploration rather than writing YAML manually +- Share team-standard visualization configurations +- Maintain project-specific visualization templates +- Quickly reproduce previous analysis views + +**Implementation Notes**: +- HTML JavaScript captures current UI state (selected plots, axes, filters) +- Generates valid YAML matching AIPerf config schema +- Includes comments explaining each setting +- Can be merged with existing user config or used standalone + +**Example Generated Config**: +```yaml +# Generated from interactive HTML session on 2025-10-22 +visualization: + multi_run_defaults: + - pareto_curve + - custom_efficiency_plot + + multi_run_plots: + custom_efficiency_plot: + name: "Efficiency Analysis" + x: gpu_memory_used + y: throughput + type: scatter + color_by: concurrency +``` + +This feature bridges the gap between exploratory analysis (HTML mode) and repeatable automated workflows (config-driven visualization). \ No newline at end of file diff --git a/deps/AIP-0003_images/dashboard-multirun.gif b/deps/AIP-0003_images/dashboard-multirun.gif new file mode 100644 index 00000000..86da71d6 Binary files /dev/null and b/deps/AIP-0003_images/dashboard-multirun.gif differ diff --git a/deps/AIP-0003_images/dashboard-singlerun.png b/deps/AIP-0003_images/dashboard-singlerun.png new file mode 100644 index 00000000..68b8bbe5 Binary files /dev/null and b/deps/AIP-0003_images/dashboard-singlerun.png differ