Refactor aicpu logical #95

Merged
poursoul merged 2 commits into ChaoWao:main from poursoul:refactor-aicpu-logical
Feb 14, 2026

Conversation

@poursoul
Collaborator

  • Replace std::mutex with spinlocks and bitmask circular queues in
    AICPU scheduler ready queues; remove redundant atomic counters
  • Add orchestrator ready queue (SPSC ring) to push early-return ready
    tasks directly, replacing O(N) readiness scan in scheduler
  • Stack-allocate tensormap lookup results (PTO2LookupResult) to avoid
    per-lookup heap allocation from std::vector
  • Inline Tensor copy ctor/operator= into header; skip memset on task
    ring slot allocation; bulk memcpy params in submit_task
  • Add PTO2_SPIN_PAUSE_LIGHT() (yield without sched_yield) for
    tight spinloops on aarch64/x86_64
  • Add DEV_ALWAYS log level for diagnostics; change diagnostic reports
    from DEV_ERROR to DEV_ALWAYS, per-task logs from DEV_INFO to DEV_DEBUG
  • Add profiling instrumentation to scheduler, orchestrator, and host
    runtime init
  • Vectorize paged attention golden computation across batch dimension
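  • Fix the hang at batch size 64; temporarily set the window size to
    65536, with ring-style (circular) allocation for the task ring to be
    supported later
  • Replace the paged attention (pa) golden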
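
The SPSC ring and bitmask indexing mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `SpscRing`, its methods, and the capacity are all hypothetical. It assumes exactly one producer thread (e.g. an orchestrator) and one consumer thread (e.g. a scheduler), and a power-of-two capacity so that `index & mask` replaces a modulo.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring. Names are illustrative
// and are not taken from the PR.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so that index & kMask wraps");
    static constexpr std::size_t kMask = Capacity - 1;

public:
    // Producer side only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) {
            return false;
        }
        slots_[tail & kMask] = value;
        // Release so the consumer sees the slot contents before the new tail.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) {
            return false;
        }
        out = slots_[head & kMask];
        // Release so the producer may reuse the slot only after it was read.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity]{};
    // Monotonic counters; the bitmask maps them onto slot indices.
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

With one writer per counter, neither side needs a lock or a compare-and-swap, which is presumably why an SPSC ring suits a single orchestrator handing ready tasks to the scheduler.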

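A spinlock with a light CPU pause hint, in the spirit of the std::mutex replacement and PTO2_SPIN_PAUSE_LIGHT() items above, might look like the sketch below. The macro `SKETCH_SPIN_PAUSE_LIGHT` and the class `SpinLock` are hypothetical stand-ins; the actual definition of PTO2_SPIN_PAUSE_LIGHT() is not shown on this page. The sketch assumes GCC or Clang on aarch64 or x86_64 and falls back to a no-op elsewhere.

```cpp
#include <atomic>

// Hypothetical pause hint: tells the core we are spinning, without a syscall
// such as sched_yield().
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define SKETCH_SPIN_PAUSE_LIGHT() _mm_pause()
#elif defined(__aarch64__)
#define SKETCH_SPIN_PAUSE_LIGHT() __asm__ volatile("yield" ::: "memory")
#else
#define SKETCH_SPIN_PAUSE_LIGHT() ((void)0)
#endif

// Hypothetical test-and-test-and-set spinlock; not the PR's implementation.
class SpinLock {
public:
    // Single attempt; returns true if the lock was acquired.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        for (;;) {
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Spin on a plain load to avoid hammering the cache line with
            // writes, pausing lightly between checks.
            while (locked_.load(std::memory_order_relaxed)) {
                SKETCH_SPIN_PAUSE_LIGHT();
            }
        }
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

A lock like this only pays off when critical sections are very short (a queue push or pop) and each contending thread has its own core; under those assumptions it avoids the sleep and wake-up cost of a contended mutex.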
poursoul force-pushed the refactor-aicpu-logical branch from b43ff97 to d372c32 on February 14, 2026 07:02
poursoul merged commit a0bd37d into ChaoWao:main on Feb 14, 2026
3 checks passed

2 participants