src/aiu_trace_analyzer/pipeline/rcu_utilization.py: average PT active…#83

tirtha-ibm wants to merge 1 commit into IBM:main
Conversation
… information with autopilot on in torch profiler

Signed-off-by: Tirtha Biswas <Tirtha.Biswas@ibm.com>
lasch left a comment:
Thanks for your contribution. A few comments to address please:
- currently it's failing the flake8 test with an unused global variable
- there are more flake8 issues reported:
  ```
  ./src/aiu_trace_analyzer/pipeline/flow_launch.py:71:9: E265 block comment should start with '# '
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:353:11: E271 multiple spaces after keyword
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:702:1: E305 expected 2 blank lines after class or function definition, found 1
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:704:1: C901 'compute_utilization' is too complex (18)
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:704:1: E302 expected 2 blank lines, found 0
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:710:121: E501 line too long (140 > 120 characters)
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:731:13: F824 `global jobhash_map` is unused: name is never assigned in scope
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:753:5: E303 too many blank lines (2)
  ./src/aiu_trace_analyzer/pipeline/rcu_utilization.py:758:16: E111 indentation is not a multiple of 4
  ```
- additional comments below
This is a welcome feature addition. I'm happy to work this out with you directly.
```diff
 else:
     aiulog.log(aiulog.WARN, "FLOWS: Ignoring event with ts after schedule wait", event)
-    assert self.queues[qid]["last_ts"] <= sched_wait_end, f"{qid}: {event}, {self.queues[qid]}"
+    #assert self.queues[qid]["last_ts"] <= sched_wait_end, f"{qid}: {event}, {self.queues[qid]}"
```
Is there a reason for commenting out this check? Did you encounter traces where it failed?
```python
_autopilot_pattern = re.compile(r'DSM-AutoPilot BEGIN')
_iteration_mode_pattern = re.compile(r'^\s+(DECODING|PREFILL)\s+$')
_non_kernel_names = ["Total"]
_total = re.compile(r'Total ')
```
Please follow the naming pattern of the other variables and use `_total_pattern`.
```python
jobhash_map: dict[int, bool] = {}
processID: int = 0
```
Global variables should be avoided here. The context classes should be used to hold 'global' state for the functions of the pipeline.
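To illustrate the point above, here is a hedged sketch of holding that state in a context object instead of module globals. The class and method names (`UtilizationContext`, `first_occurrence`) are hypothetical and not taken from the codebase; the real pipeline context classes would carry this state alongside their existing fields.

```python
# Hypothetical sketch: move the module-level globals (jobhash_map, processID)
# into a per-run context object so pipeline state is not shared across runs.

class UtilizationContext:
    """Holds per-run state that was previously kept in module globals."""

    def __init__(self) -> None:
        # maps a job hash to whether its first aiuScheduleWait was seen
        self.jobhash_seen: dict[int, bool] = {}
        self.process_id: int = 0

    def first_occurrence(self, jobhash: int) -> bool:
        """Return True the first time a jobhash is observed, False afterwards."""
        if jobhash not in self.jobhash_seen:
            self.jobhash_seen[jobhash] = True
            return True
        return False
```

The event handler would then receive the context as a parameter and call `context.first_occurrence(event["args"]["jobhash"])` instead of declaring `global jobhash_map`, which also resolves the F824 flake8 finding.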
```diff
 if event["name"] != "aiuScheduleWait":
     return [event]
 try:
-    ideal_dur = float(context.get_ideal_dur(kernel_name, pid, job_fingerprint))
+    if autopilot and event["name"] == "aiuScheduleWait":
+        global jobhash_map
+        if event["args"]["jobhash"] not in jobhash_map:
+            ideal_dur = list(totals)[1]
+            jobhash_map[event["args"]["jobhash"]] = True
+        else:
+            ideal_dur = list(totals)[0]
+    else:
+        ideal_dur = float(context.get_ideal_dur(kernel_name, pid, job_fingerprint))
```
This functionality should be baked into the context. I suggest the following rough idea if autopilot is enabled:
- we add the 'total' cycles to the corresponding table structure as a 'mock-up' kernel name: e.g. 'supernode_kernel'
- `extract_kernel_from_event_name` returns the 'supernode_kernel' as the name to look up
- the rest should then fall into place automatically
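A rough sketch of that idea follows. The name `supernode_kernel` comes from the suggestion above, but the table structure and function signatures here are simplified stand-ins, not the pipeline's actual API:

```python
# Simplified sketch of the suggestion: register the autopilot 'total' cycles
# under a mock-up kernel name so the normal lookup path handles it with no
# special-casing in the event handler.

SUPERNODE_KERNEL = "supernode_kernel"  # mock-up kernel name from the comment

ideal_dur_table: dict[str, float] = {}  # stand-in for the context's table


def register_autopilot_totals(total_cycles: float) -> None:
    # store the totals as if they belonged to a regular kernel
    ideal_dur_table[SUPERNODE_KERNEL] = total_cycles


def extract_kernel_from_event_name(name: str, autopilot: bool) -> str:
    # with autopilot enabled, every lookup resolves to the mock-up kernel,
    # so the rest of the pipeline needs no autopilot branch
    if autopilot:
        return SUPERNODE_KERNEL
    return name.split("::")[-1]


def get_ideal_dur(event_name: str, autopilot: bool) -> float:
    kernel = extract_kernel_from_event_name(event_name, autopilot)
    return ideal_dur_table[kernel]
```

With this shape, the `if autopilot ... else ...` branch in the handler collapses into a single `get_ideal_dur` call.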