Conversation

vicLin8712 (Collaborator)

Hi @jserv,

This is still a draft. I’ve implemented O(1) task selection using a
priority bitmap and per-priority queues. Key design points:

  • mo_task_spawn() enqueues new tasks, except the very first one which
    is assigned to task_current with state TASK_RUNNING.
  • sched_select_next_task() reinserts the running task into its ready
    queue, then scans ready_bitmap to select the highest-priority ready
    queue in O(1) (a sketch follows below).
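
Below is a minimal sketch of the selection step, for illustration only: the
helper names (queue_pop, queue_is_empty), the placeholder types, and the
bit-ordering convention (lower bit index = higher priority) are assumptions
rather than the PR's actual code, and callers are assumed to already hold the
scheduler's critical section.

#include <stdint.h>
#include <stddef.h>

#define NUM_PRIORITIES 8              /* assumption: 8 priority levels */

typedef struct tcb tcb_t;             /* task control block, defined elsewhere */
typedef struct queue queue_t;         /* placeholder ready-queue type */

typedef struct sched {
    uint32_t ready_bitmap;            /* bit i set => ready_queues[i] non-empty */
    queue_t *ready_queues[NUM_PRIORITIES];
} sched_t;

tcb_t *queue_pop(queue_t *q);         /* hypothetical helpers */
int queue_is_empty(const queue_t *q);

/* O(1) pick: find the highest-priority non-empty queue via the bitmap. */
static tcb_t *pick_next_task(sched_t *sched)
{
    if (!sched->ready_bitmap)
        return NULL;                              /* nothing is ready */

    uint32_t prio = (uint32_t) __builtin_ctz(sched->ready_bitmap);
    tcb_t *task = queue_pop(sched->ready_queues[prio]);

    if (queue_is_empty(sched->ready_queues[prio]))
        sched->ready_bitmap &= ~(1U << prio);     /* queue drained: clear bit */
    return task;
}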

Could you confirm if this approach aligns with the project’s design
expectations?
In particular, I’d like feedback on:

  1. The use of the current sched_t data structure.
  2. Whether revisions to sched_t or kcb would be preferred to better
    support future SMP design.

@jserv (Contributor) left a comment

Validate with proper statistics.


/* Scheduler attribution */
typedef struct sched {
    volatile uint32_t ready_bitmap; /* 8-bit priority bitmap */

Collaborator

Why is this variable set to volatile?

Collaborator Author

An ISR might modify this value, so I added volatile to ensure the bitmap is always read directly from memory.

Collaborator

I'm a bit doubtful whether this is the right approach, since it sounds like you're using volatile to handle a synchronization problem. Even with the volatile keyword, accesses to the variable are not guaranteed to be atomic, so a race condition can still occur. Should we reconsider how synchronization is handled here? Maybe something like CRITICAL_ENTER/LEAVE() would be more appropriate.

Reference: https://docs.kernel.org/process/volatile-considered-harmful.html

Collaborator Author

I appreciate your feedback and agree with your point of view. The CRITICAL_ENTER()/LEAVE() approach is better for protecting the bitmap's correctness. I'll make sure the bitmap is manipulated inside a critical section.
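
As an illustration of that plan (reusing the placeholder types and helpers
from the sketch in the PR description, and assuming CRITICAL_ENTER() and
CRITICAL_LEAVE() mask and restore interrupts), the enqueue side might look
like this:

/* Sketch only: update the queue and its bitmap bit as one step that is
 * atomic with respect to interrupts, so an ISR never sees them out of sync. */
void queue_push(queue_t *q, tcb_t *task);          /* hypothetical helper */

static void sched_mark_ready(sched_t *sched, tcb_t *task, uint32_t prio)
{
    CRITICAL_ENTER();                              /* mask interrupts */
    queue_push(sched->ready_queues[prio], task);
    sched->ready_bitmap |= (1U << prio);           /* queue now non-empty */
    CRITICAL_LEAVE();                              /* restore interrupts */
}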

CRITICAL_LEAVE();
return ERR_TASK_CANT_SUSPEND;
}

Collaborator

Please avoid adding or removing blank lines unnecessarily.

@visitorckw (Collaborator) left a comment

Please rebase your branch to keep the git history clean.


/* Scheduler attribution */
typedef struct sched {
    uint32_t ready_bitmap; /* 8-bit priority bitmap */

Collaborator

Hmm, the comment says this is an 8-bit bitmap, but it's declared as uint32_t?

Collaborator Author

Will more priority levels be added in the future? If not, I'll modify this declaration.

Collaborator

I think if it's necessary in the future, we can easily come back and modify this.

@visitorckw (Collaborator)

Regarding your question in the PR description about the use of sched_t and the changes to kcb_t:

I'm not sure why you split kcb_t and sched_t into two separate structures, and I don't see an explanation for this in the commit message. From my perspective, both seem to store data needed by the scheduler, though perhaps you had other reasons or considerations in mind?

@vicLin8712 (Collaborator Author)

Hi, @visitorckw,

The reason I split the sched data structure from kcb is that I thought there should be one major control block that manages the tasks of different hart IDs. So my plan is for sched to control per-hart tasks while kcb plays the central controller role.
It sounds like you would prefer I focus only on modifying kcb to support the O(1) scheduler, without further multi-hart considerations?

Introduce sched_t structure to encapsulate O(1) scheduler state.
sched_t maintains ready_queues separated by priority, enabling task
selection in O(1) complexity.

kcb now contains a sched_t instance, allowing per-hart scheduler
control and future SMP extensions.
New tasks created by `mo_task_spawn()` are enqueued into their priority
ready queue. The very first task is special-cased: it becomes
`task_current` and its state is set to `TASK_RUNNING` (not enqueued).

`sched_select_next_task()` first returns the running task to its
ready queue, then finds the highest non-empty priority by scanning the
`ready_bitmap`, pops one node from that queue, and assigns it to
`task_current`. The corresponding priority bit in the bitmap is
cleared when the queue becomes empty.

This reduces task selection from O(n) to O(1) while keeping existing
function calls intact.
`sched_enqueue_task`/`sched_dequeue_task` also add/remove the task
to/from the corresponding priority queue and check the queue length
afterward to update the bitmap.

The bitmap macros used here keep the operations simple; they can be
revised in further development.

The enqueue and dequeue steps in `mo_task_spawn`/`sched_select_next_task`
are now handled by `sched_enqueue_task` and `sched_dequeue_task`.
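
To make the bitmap bookkeeping on the dequeue side concrete, here is a
minimal sketch under the same placeholder assumptions as the earlier snippets
(queue_remove and queue_is_empty are hypothetical helpers, not the PR's exact
code); the enqueue counterpart was sketched in the review thread above.

/* Sketch only: removing a task may leave its priority queue empty,
 * in which case the corresponding bitmap bit must be cleared. */
void queue_remove(queue_t *q, tcb_t *task);        /* hypothetical helper */

static void sched_dequeue_task_sketch(sched_t *sched, tcb_t *task, uint32_t prio)
{
    CRITICAL_ENTER();
    queue_remove(sched->ready_queues[prio], task);
    if (queue_is_empty(sched->ready_queues[prio]))
        sched->ready_bitmap &= ~(1U << prio);      /* last task left: clear bit */
    CRITICAL_LEAVE();
}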

@visitorckw (Collaborator)

I don't have a strong opinion on the implementation, but I have a question about the design.

It appears there might be some redundancy. sched_t stores a per-hart ready_queue, while kcb_t retains the tasks list and task_current. When I looked for an explanation for this separation, I couldn't find one.

Could we clarify the design? It seems we should clearly define what data is per-hart and belongs in sched_t, versus what data is shared among all harts and should be in kcb_t.

@visitorckw (Collaborator)

This design also raises a question about task affinity. If we add a task to a specific hart's queue, is the intention for it to be bound to that hart, or is there a mechanism for tasks to be rebalanced across different hart queues later?

@vicLin8712 (Collaborator Author)

Hi @visitorckw ,

Thanks for your feedback, and let me clarify my original design.

kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

sched maintains the per-hart ready queues and the corresponding queue information. The reason it is split from kcb is to avoid frequent accesses to kcb while tasks are being removed from or added to the ready queue. A separate sched also makes it easier to trace information per hart.

Taking one hart as an example, the diagram below illustrates the design:
[attached diagram: untitled drawio drawing]
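
In rough code form, the split described above might look like the following;
apart from ready_bitmap, the per-priority ready queues, the global task list,
and task_current/current_task, the names and sizes are placeholders rather
than the PR's exact layout.

#include <stdint.h>

#define NUM_PRIORITIES 8              /* assumption: matches the 8-bit bitmap */

typedef struct tcb tcb_t;             /* task control block, defined elsewhere */
typedef struct list list_t;           /* project list type, defined elsewhere */

/* Per-hart scheduler state. */
typedef struct sched {
    uint32_t ready_bitmap;                  /* bit i set => queue i non-empty */
    list_t  *ready_queues[NUM_PRIORITIES];  /* per-priority ready queues */
    tcb_t   *current_task;                  /* task running on this hart */
} sched_t;

/* Global kernel control block. */
typedef struct kcb {
    list_t *tasks;          /* global list of all tasks, in any state */
    tcb_t  *task_current;   /* planned to become a per-hart array */
    sched_t sched;          /* one instance now; per-hart instances later for SMP */
} kcb_t;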

@visitorckw (Collaborator)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

But since we expect to modify current_task in kcb_t into an array where each hart has its own entry, would it be more like per-hart information that belongs in sched_t? I do see that you have indeed added a current_task field in sched_t, but what are the respective roles of it and task_current in kcb_t?

@vicLin8712 (Collaborator Author)

> This design also raises a question about task affinity. If we add a task to a specific hart's queue, is the intention for it to be bound to that hart, or is there a mechanism for tasks to be rebalanced across different hart queues later?

Yes, this does touch on a performance issue. I plan for tasks to be bound to each hart, and I will add a per-hart ID into the tcb to preserve task affinity across task state transitions.

For rebalancing across different harts, I think I will use an IPI mechanism to rearrange which hart each task belongs to, but the implementation details still need to be discussed.

Currently, I'm focused on designing experiments to show that the new design works and outperforms the current O(n) round-robin scheduler. If you think the idea is achievable, I'll keep working on validation and revise the code based on what we discussed above.

@vicLin8712 (Collaborator Author)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

> But since we expect to modify current_task in kcb_t into an array where each hart has its own entry, would it be more like per-hart information that belongs in sched_t? I do see that you have indeed added a current_task field in sched_t, but what are the respective roles of it and task_current in kcb_t?

You are correct. I tried to make the context switch per-hart, which means I wouldn't access kcb during a per-hart context switch. But I found that your design in #6 is more backward compatible and more efficient. I'll modify tcb and kcb later.

@visitorckw (Collaborator)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

I'm still not sure if it's necessary to maintain separate task lists in both kcb_t and sched_t.

Consider an API like find_task_by_id(), which needs to iterate over every task across all harts. To do this safely, we must prevent any hart from modifying its ready queue during the iteration to avoid race conditions. This means we would need a mechanism to lock all queues anyway.

If that's the case, we can simply iterate through each per-hart list from sched_t without needing the global list in kcb_t, like this:

for (i = 0; i < hart_nums; i++) {
    list_for_each(...) {
        if (node->id == id) {
            return node;
        }
    }
}

This approach seems to work. Am I missing a more complex use case that justifies keeping the global task list in kcb_t?

@vicLin8712 (Collaborator Author)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.
>
> I'm still not sure if it's necessary to maintain separate task lists in both kcb_t and sched_t.
>
> Consider an API like find_task_by_id(), which needs to iterate over every task across all harts. To do this safely, we must prevent any hart from modifying its ready queue during the iteration to avoid race conditions. This means we would need a mechanism to lock all queues anyway.
>
> If that's the case, we can simply iterate through each per-hart list from sched_t without needing the global list in kcb_t, like this:
>
> for (i = 0; i < hart_nums; i++) {
>     list_for_each(...) {
>         if (node->id == id) {
>             return node;
>         }
>     }
> }
>
> This approach seems to work. Am I missing a more complex use case that justifies keeping the global task list in kcb_t?

I guess you mean that the list in kcb, which records all tasks, can be split into each sched, and that each sched also has a list, ready_queue, to maintain all the tasks on that hart whose state is TASK_RUNNING or TASK_READY? Like:

kcb_t->task_list            // Redundant? Split into per-hart local_hart_task_list.
sched_t {
    list local_hart_task_list;   // Originally kept in kcb->task_list, now split per hart
    list local_hart_ready_queue; // Ready queue of tasks in TASK_RUNNING or TASK_READY state
    ...
}

@visitorckw (Collaborator)

I'm just thinking that each task should only appear in one list at a time, and never be duplicated across two different lists.

For running and ready tasks, using a per-hart data structure seems appropriate, as it would be efficient for the scheduler on each hart.

I'm less certain about tasks in other states. It's unclear to me whether placing them in a single global list or in per-hart lists would be more suitable. This decision likely depends on how we intend to handle task affinity.

@vicLin8712 (Collaborator Author)

> I'm just thinking that each task should only appear in one list at a time, and never be duplicated across two different lists.
>
> For running and ready tasks, using a per-hart data structure seems appropriate, as it would be efficient for the scheduler on each hart.
>
> I'm less certain about tasks in other states. It's unclear to me whether placing them in a single global list or in per-hart lists would be more suitable. This decision likely depends on how we intend to handle task affinity.

Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.

I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.

To solve task affinity, maybe we can add last_exe_hart_id into tcb?

@visitorckw (Collaborator)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.

I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.

I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?

> To solve task affinity, maybe we can add last_exe_hart_id into tcb?

No, I didn't mean adding a feature that lets the user specify which hart a task must run on — that should be relatively easy. What I meant is that the scheduler decides when a task will be placed onto a particular hart's ready queue, which determines which hart it will run on.

@vicLin8712 (Collaborator Author)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?

Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.

To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?

When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.

@visitorckw (Collaborator)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?
>
> Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.
>
> To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.

I don't think this is the reason why we must keep the running task in the ready queue. If we are really concerned about the efficiency of list_push_back, we can simply add a pointer to keep track of both ends of the list to make the insertion operation have an O(1) time complexity.
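
For example (a sketch with placeholder names, not the project's actual list
API), keeping a tail pointer makes the append O(1) regardless of queue length:

/* Sketch only: singly linked queue with head and tail pointers,
 * so appending a node is O(1) regardless of queue length. */
typedef struct qnode {
    struct qnode *next;
    void *task;                  /* payload, e.g. a tcb pointer */
} qnode_t;

typedef struct {
    qnode_t *head;               /* dequeue from here */
    qnode_t *tail;               /* enqueue here in O(1) */
} squeue_t;

static void squeue_push_back(squeue_t *q, qnode_t *node)
{
    node->next = NULL;
    if (!q->tail) {              /* empty queue */
        q->head = q->tail = node;
    } else {
        q->tail->next = node;    /* link after current tail */
        q->tail = node;
    }
}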

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?
>
> When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.

Still confused. I still don't see the reason why the list needs to be kept sorted. At the same time, I'm also not quite clear on why this approach allows us to delete a task by only removing it from each hart's queue.

@vicLin8712 (Collaborator Author)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?
>
> Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.
> To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.
>
> I don't think this is the reason why we must keep the running task in the ready queue. If we are really concerned about the efficiency of list_push_back, we can simply add a pointer to keep track of both ends of the list to make the insertion operation have an O(1) time complexity.

Thanks for your suggestion, I'll follow this idea.

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?
>
> When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.
>
> Still confused. I still don't see the reason why the list needs to be kept sorted. At the same time, I'm also not quite clear on why this approach allows us to delete a task by only removing it from each hart's queue.

Yes, currently no feature depends on whether the task list is sorted; I'm just trying to keep the structure as close as possible to the current version and to focus on the single-hart scheduler case first. Maybe we can run experiments later to compare the performance of holding tasks per hart versus globally.
