Commit e59226b

bugfix: add driver support to CUPTI benchmark function, issue #2145 (#2154)
## 📌 Description

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **New Features**
  * GPU timing now captures driver-level activities alongside runtime and kernel events for more complete timing.
  * Activity records include richer metadata to improve event correlation and reporting.
  * CUPTI measurement window adjusted to ensure driver activity is collected during profiling.
* **Bug Fixes**
  * Improved filtering and aggregation so collected activities are correlated and reported more accurately.
1 parent dc37789 commit e59226b

File tree: 1 file changed (+18 −5 lines)

flashinfer/testing/utils.py — 18 additions, 5 deletions

```diff
@@ -740,7 +740,7 @@ def func_buffer_requested():
         return buffer_size, max_num_records

     def func_buffer_completed(
-        launches: list[tuple[float, float, int]],
+        launches: list[tuple[float, float, int, int, int]],
         kernels: list[tuple[str, float, float, int]],
         activities: list,
     ):
@@ -755,9 +755,20 @@ def func_buffer_completed(
                     activity.correlation_id,
                 )
             )
-            elif activity.kind == cupti.ActivityKind.RUNTIME:
-                # Runtime activity
-                launches.append((activity.start, activity.end, activity.correlation_id))
+            elif activity.kind in (
+                cupti.ActivityKind.RUNTIME,
+                cupti.ActivityKind.DRIVER,
+            ):
+                # Runtime or Driver activity
+                launches.append(
+                    (
+                        activity.start,
+                        activity.end,
+                        activity.correlation_id,
+                        activity.cbid,
+                        activity.kind,
+                    )
+                )

     if l2_flush:
         l2_flush_size = int(l2_flush_size_mb) * 1024 * 1024
@@ -815,11 +826,12 @@ def func_buffer_completed(
     torch.cuda.synchronize()

     # CUPTI measurement
-    launches: list[tuple[float, float, int]] = []
+    launches: list[tuple[float, float, int, int, int]] = []
     kernels: list[tuple[str, float, float, int]] = []
     iter_timestamps = []
     cupti.activity_enable(cupti.ActivityKind.RUNTIME)
     cupti.activity_enable(cupti.ActivityKind.CONCURRENT_KERNEL)
+    cupti.activity_enable(cupti.ActivityKind.DRIVER)
     cupti.activity_register_callbacks(
         func_buffer_requested, partial(func_buffer_completed, launches, kernels)
     )
@@ -836,6 +848,7 @@ def func_buffer_completed(
     cupti.activity_flush_all(0)
     cupti.activity_disable(cupti.ActivityKind.RUNTIME)
     cupti.activity_disable(cupti.ActivityKind.CONCURRENT_KERNEL)
+    cupti.activity_disable(cupti.ActivityKind.DRIVER)
     cupti.finalize()

     # Process activities
```
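The diff widens each launch record so that it carries the CUPTI `cbid` and `ActivityKind` in addition to the timing and correlation ID. A minimal, self-contained sketch of how such launch records can then be matched to kernel records by correlation ID — the aggregation step this PR touches. All values below are hypothetical, and `RUNTIME`/`DRIVER` are plain-integer stand-ins for the real `cupti.ActivityKind` enum values; this is an illustration, not the FlashInfer implementation.

```python
# Stand-ins for cupti.ActivityKind.RUNTIME / cupti.ActivityKind.DRIVER
RUNTIME, DRIVER = 1, 2

# Launch records with the widened shape from the PR:
# (start, end, correlation_id, cbid, kind) — values are made up.
launches = [
    (100.0, 110.0, 7, 211, RUNTIME),  # e.g. a runtime-API launch
    (200.0, 215.0, 8, 307, DRIVER),   # e.g. a driver-API launch
]

# Kernel records: (name, start, end, correlation_id)
kernels = [
    ("my_kernel", 120.0, 180.0, 7),
    ("my_kernel", 220.0, 290.0, 8),
]

def correlate(launches, kernels):
    """Pair each kernel with the launch that produced it via correlation_id."""
    by_id = {
        corr_id: (start, end, cbid, kind)
        for start, end, corr_id, cbid, kind in launches
    }
    matched = []
    for name, k_start, k_end, corr_id in kernels:
        if corr_id in by_id:
            # Launch-side span and cbid are available here for richer reporting.
            l_start, l_end, cbid, kind = by_id[corr_id]
            matched.append((name, k_end - k_start, kind))
    return matched

print(correlate(launches, kernels))
# → [('my_kernel', 60.0, 1), ('my_kernel', 70.0, 2)]
```

Keeping `kind` in the tuple is what lets the reporting side distinguish kernels launched through the runtime API from those launched through the driver API (e.g. by CUDA graphs), which is why enabling `ActivityKind.DRIVER` alone is not enough without the extra fields.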
