Skip to content

AMD-AGI/TraceLens

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TraceLens

TraceLens is a Python library focused on automating analysis from trace files and enabling rich performance insights. Designed with simplicity and extensibility in mind, this library provides tools to simplify the process of profiling and debugging complex distributed training and inference systems. Find the PyTorch Conference 2025 poster for TraceLens here.

Key Features

Hierarchical Performance Breakdowns: Pinpoint bottlenecks with a top-down view, moving from the overall GPU timeline (idle/busy) to operator categories (e.g., convolutions), individual operators, and right down to unique argument shapes.

⚙️ Compute & Roofline Modeling: Automatically translate raw timings into efficiency metrics like TFLOP/s and TB/s for popular operations. Determine if an op is compute- or memory-bound and see how effectively your code is using the hardware.

🔗 Multi-GPU Communication Analysis: Accurately diagnose scaling issues by dissecting collective operations. TraceLens separates pure communication time from synchronization skew and calculates effective bandwidth on your real workload, not a synthetic benchmark.

🔄 Trace Comparison: Quantify the impact of your changes with powerful trace diffing. By analyzing performance at the CPU dispatch level, TraceLens enables meaningful side-by-side comparisons across different hardware and software versions.

▶️ Event Replay: Isolate any operation for focused debugging. TraceLens generates minimal, self-contained replay scripts directly from trace metadata, making it simple to share IP-safe test cases with kernel developers.

🔧 Extensible SDK: Get started instantly with ready-to-use scripts, then build your own custom workflows using a flexible and hackable Python API.

Quick Start

Installation

1. Install TraceLens directly from GitHub:

pip install git+https://github.com/AMD-AGI/TraceLens.git

2. Command Line Scripts for popular analyses

  • Generate Excel Reports from Traces Detailed docs here (you can use compressed traces too such as .zip and .gz)
# PyTorch profiler traces
TraceLens_generate_perf_report_pytorch --profile_json_path path/to/your/trace.json

# rocprofv3 traces
TraceLens_generate_perf_report_rocprof --profile_json_path path/to/results.json
  • Compare Traces Detailed docs here
TraceLens_compare_perf_reports_pytorch \
    baseline.xlsx \
    candidate.xlsx \
    --names baseline candidate \
    --sheets all \
    -o comparison.xlsx
  • Generate Collective Performance Report Detailed docs here
TraceLens_generate_multi_rank_collective_report_pytorch \
    --trace_dir /path/to/traces \
    --world_size 8 \

Refer to the individual module docs in the docs/ directory and the example notebooks under examples/ for further guidance.

📦 Custom Workflows: Check out examples/custom_workflows/ for community-contributed utilities including roofline_analyzer and traceMap — powerful tools we're working on integrating more tightly into the core library.

Supported Profile Formats

TraceLens supports multiple profiling formats:

Format Tool Documentation
PyTorch torch.profiler docs/generate_perf_report.md
JAX XPlane protobuf docs/generate_perf_report_jax.md
rocprofv3 AMD ROCm rocprofiler-sdk docs/generate_perf_report_rocprof.md

rocprofv3 Support

TraceLens now supports AMD's rocprofv3 (rocprofiler-sdk) JSON format:

# Generate performance report from rocprofv3 trace
TraceLens_generate_perf_report_rocprof \
    --profile_json_path trace_results.json \
    --short_kernel_study \
    --kernel_details

Features:

  • GPU timeline breakdown (kernel, memory, idle time)
  • Kernel summary with statistical analysis
  • Automatic kernel categorization (GEMM, Attention, Elementwise, etc.)
  • Short kernel analysis
  • Grid/block dimension tracking

See docs/generate_perf_report_rocprof.md for detailed usage.

Contributing

We welcome issues, bug reports, and pull requests. Feel free to open discussions in the GitHub repository or contribute new performance models, operator mappings or analysis modules. Please see CONTRIBUTING.md for guidelines.

About

Automating analysis from trace files

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 23

Languages