Conversation

vicLin8712 (Collaborator)

Hi @jserv,

This is still a draft. I’ve implemented O(1) task selection using a
priority bitmap and per-priority queues. Key design points:

  • mo_task_spawn() enqueues new tasks, except the very first one which
    is assigned to task_current with state TASK_RUNNING.
  • sched_select_next_task() reinserts the running task into its ready
    queue, then scans ready_bitmap to select the highest-priority ready
    queue in O(1) (a sketch follows below).
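
Below is a minimal sketch of the selection step, for illustration only: the
helper names (queue_pop, queue_is_empty), the placeholder types, and the
bit-ordering convention (lower bit index = higher priority) are assumptions
rather than the PR's actual code, and callers are assumed to already hold the
scheduler's critical section.

#include <stdint.h>
#include <stddef.h>

#define NUM_PRIORITIES 8              /* assumption: 8 priority levels */

typedef struct tcb tcb_t;             /* task control block, defined elsewhere */
typedef struct queue queue_t;         /* placeholder ready-queue type */

typedef struct sched {
    uint32_t ready_bitmap;            /* bit i set => ready_queues[i] non-empty */
    queue_t *ready_queues[NUM_PRIORITIES];
} sched_t;

tcb_t *queue_pop(queue_t *q);         /* hypothetical helpers */
int queue_is_empty(const queue_t *q);

/* O(1) pick: find the highest-priority non-empty queue via the bitmap. */
static tcb_t *pick_next_task(sched_t *sched)
{
    if (!sched->ready_bitmap)
        return NULL;                              /* nothing is ready */

    uint32_t prio = (uint32_t) __builtin_ctz(sched->ready_bitmap);
    tcb_t *task = queue_pop(sched->ready_queues[prio]);

    if (queue_is_empty(sched->ready_queues[prio]))
        sched->ready_bitmap &= ~(1U << prio);     /* queue drained: clear bit */
    return task;
}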

Could you confirm if this approach aligns with the project’s design
expectations?
In particular, I’d like feedback on:

  1. The use of the current sched_t data structure.
  2. Whether revisions to sched_t or kcb would be preferred to better
    support future SMP design.

@jserv (Contributor) left a comment

Validate with proper statistics.


/* Scheduler attribution */
typedef struct sched {
    volatile uint32_t ready_bitmap; /* 8-bit priority bitmap */

Collaborator

Why is this variable set to volatile?

Collaborator Author

An ISR might modify this value, so I added volatile to ensure the bitmap is always read directly from memory.

Collaborator

I'm a bit doubtful whether this is the right approach, since it sounds like you're using volatile to handle a synchronization problem. Even with the volatile keyword, accesses to the variable are not guaranteed to be atomic, so a race condition can still occur. Should we reconsider how synchronization is handled here? Maybe something like CRITICAL_ENTER/LEAVE() would be more appropriate.

Reference: https://docs.kernel.org/process/volatile-considered-harmful.html

Collaborator Author

I appreciate your feedback and agree with your point of view. The CRITICAL_ENTER()/LEAVE() approach is better for protecting the bitmap's correctness. I'll make sure the bitmap is manipulated inside a critical section.
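
As an illustration of that plan (reusing the placeholder types and helpers
from the sketch in the PR description, and assuming CRITICAL_ENTER() and
CRITICAL_LEAVE() mask and restore interrupts), the enqueue side might look
like this:

/* Sketch only: update the queue and its bitmap bit as one step that is
 * atomic with respect to interrupts, so an ISR never sees them out of sync. */
void queue_push(queue_t *q, tcb_t *task);          /* hypothetical helper */

static void sched_mark_ready(sched_t *sched, tcb_t *task, uint32_t prio)
{
    CRITICAL_ENTER();                              /* mask interrupts */
    queue_push(sched->ready_queues[prio], task);
    sched->ready_bitmap |= (1U << prio);           /* queue now non-empty */
    CRITICAL_LEAVE();                              /* restore interrupts */
}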

CRITICAL_LEAVE();
return ERR_TASK_CANT_SUSPEND;
}

Collaborator

Please avoid adding or removing blank lines unnecessarily.

@visitorckw (Collaborator) left a comment

Please rebase your branch to keep the git history clean.


/* Scheduler attribution */
typedef struct sched {
    uint32_t ready_bitmap; /* 8-bit priority bitmap */

Collaborator

Hmm, the comment says this is an 8-bit bitmap, but it's declared as uint32_t?

Collaborator Author

Will more priority levels be added in the future? If not, I'll modify this declaration.

Collaborator

I think if it's necessary in the future, we can easily come back and modify this.

@visitorckw (Collaborator)

Regarding your question in the PR description about the use of sched_t and the changes to kcb_t:

I'm not sure why you split kcb_t and sched_t into two separate structures, and I don't see an explanation for this in the commit message. From my perspective, both seem to store data needed by the scheduler, though perhaps you had other reasons or considerations in mind?

@vicLin8712 (Collaborator Author)

Hi, @visitorckw,

The reason I split the sched data structure from kcb is that I thought there should be one major control block that manages the tasks of different hart IDs. So my plan is for sched to control per-hart tasks while kcb plays the central controller role.
It sounds like you would prefer I focus only on modifying kcb to support the O(1) scheduler, without further multi-hart considerations?

Introduce sched_t structure to encapsulate O(1) scheduler state.
sched_t maintains ready_queues separated by priority, enabling task
selection in O(1) complexity.

kcb now contains a sched_t instance, allowing per-hart scheduler
control and future SMP extensions.
New tasks created by `mo_task_spawn()` are enqueued into their priority
ready queue. The very first task is special-cased: it becomes
`task_current` and its state is set to `TASK_RUNNING` (not enqueued).

`sched_select_next_task()` first returns the running task to its
ready queue, then finds the highest non-empty priority by scanning the
`ready_bitmap`, pops one node from that queue, and assigns it to
`task_current`. The corresponding priority bit in the bitmap is
cleared when the queue becomes empty.

This reduces task selection from O(n) to O(1) while keeping existing
function calls intact.
`sched_enqueue_task`/`sched_dequeue_task` also add/remove the task
to/from the corresponding priority queue and check the queue length
afterward to update the bitmap.

The bitmap macros used here keep the operations simple; they can be
revised in further development.

The enqueue and dequeue steps in `mo_task_spawn`/`sched_select_next_task`
are now handled by `sched_enqueue_task` and `sched_dequeue_task`.
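
To make the bitmap bookkeeping on the dequeue side concrete, here is a
minimal sketch under the same placeholder assumptions as the earlier snippets
(queue_remove and queue_is_empty are hypothetical helpers, not the PR's exact
code); the enqueue counterpart was sketched in the review thread above.

/* Sketch only: removing a task may leave its priority queue empty,
 * in which case the corresponding bitmap bit must be cleared. */
void queue_remove(queue_t *q, tcb_t *task);        /* hypothetical helper */

static void sched_dequeue_task_sketch(sched_t *sched, tcb_t *task, uint32_t prio)
{
    CRITICAL_ENTER();
    queue_remove(sched->ready_queues[prio], task);
    if (queue_is_empty(sched->ready_queues[prio]))
        sched->ready_bitmap &= ~(1U << prio);      /* last task left: clear bit */
    CRITICAL_LEAVE();
}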

@visitorckw (Collaborator)

I don't have a strong opinion on the implementation, but I have a question about the design.

It appears there might be some redundancy. sched_t stores a per-hart ready_queue, while kcb_t retains the tasks list and task_current. When I looked for an explanation for this separation, I couldn't find one.

Could we clarify the design? It seems we should clearly define what data is per-hart and belongs in sched_t, versus what data is shared among all harts and should be in kcb_t.

@visitorckw (Collaborator)

This design also raises a question about task affinity. If we add a task to a specific hart's queue, is the intention for it to be bound to that hart, or is there a mechanism for tasks to be rebalanced across different hart queues later?

@vicLin8712 (Collaborator Author)

Hi @visitorckw ,

Thanks for your feedback, and let me clarify my original design.

kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

sched maintains the per-hart ready queues and the corresponding queue information. The reason it is split from kcb is to avoid frequent accesses to kcb while tasks are being removed from or added to the ready queue. A separate sched also makes it easier to trace information per hart.

Taking one hart as an example, the diagram below illustrates the design:
[attached diagram: untitled drawio drawing]
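
In rough code form, the split described above might look like the following;
apart from ready_bitmap, the per-priority ready queues, the global task list,
and task_current/current_task, the names and sizes are placeholders rather
than the PR's exact layout.

#include <stdint.h>

#define NUM_PRIORITIES 8              /* assumption: matches the 8-bit bitmap */

typedef struct tcb tcb_t;             /* task control block, defined elsewhere */
typedef struct list list_t;           /* project list type, defined elsewhere */

/* Per-hart scheduler state. */
typedef struct sched {
    uint32_t ready_bitmap;                  /* bit i set => queue i non-empty */
    list_t  *ready_queues[NUM_PRIORITIES];  /* per-priority ready queues */
    tcb_t   *current_task;                  /* task running on this hart */
} sched_t;

/* Global kernel control block. */
typedef struct kcb {
    list_t *tasks;          /* global list of all tasks, in any state */
    tcb_t  *task_current;   /* planned to become a per-hart array */
    sched_t sched;          /* one instance now; per-hart instances later for SMP */
} kcb_t;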

@visitorckw (Collaborator)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

But since we expect to modify current_task in kcb_t into an array where each hart has its own entry, would it be more like per-hart information that belongs in sched_t? I do see that you have indeed added a current_task field in sched_t, but what are the respective roles of it and task_current in kcb_t?

@vicLin8712 (Collaborator Author)

> This design also raises a question about task affinity. If we add a task to a specific hart's queue, is the intention for it to be bound to that hart, or is there a mechanism for tasks to be rebalanced across different hart queues later?

Yes, this does touch on a performance issue. I plan for tasks to be bound to each hart, and I will add a per-hart ID into the tcb to preserve task affinity across task state transitions.

For rebalancing across different harts, I think I will use an IPI mechanism to rearrange which hart each task belongs to, but the implementation details still need to be discussed.

Currently, I'm focused on designing experiments to show that the new design works and outperforms the current O(n) round-robin scheduler. If you think the idea is achievable, I'll keep working on validation and revise the code based on what we discussed above.

@vicLin8712 (Collaborator Author)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

> But since we expect to modify current_task in kcb_t into an array where each hart has its own entry, would it be more like per-hart information that belongs in sched_t? I do see that you have indeed added a current_task field in sched_t, but what are the respective roles of it and task_current in kcb_t?

You are correct. I tried to make the context switch per-hart, which means I wouldn't access kcb during a per-hart context switch. But I found that your design in #6 is more backward compatible and more efficient. I'll modify tcb and kcb later.

@visitorckw (Collaborator)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.

I'm still not sure if it's necessary to maintain separate task lists in both kcb_t and sched_t.

Consider an API like find_task_by_id(), which needs to iterate over every task across all harts. To do this safely, we must prevent any hart from modifying its ready queue during the iteration to avoid race conditions. This means we would need a mechanism to lock all queues anyway.

If that's the case, we can simply iterate through each per-hart list from sched_t without needing the global list in kcb_t, like this:

for (i = 0; i < hart_nums; i++) {
    list_for_each(...) {
        if (node->id == id) {
            return node;
        }
    }
}

This approach seems to work. Am I missing a more complex use case that justifies keeping the global task list in kcb_t?

@vicLin8712 (Collaborator Author)

> kcb plays the role of holding a list that contains all tasks, for simple task tracing and data collection; for example, find_task_by_id and similar APIs can still work. Also, task_current remains in kcb for context switching, which means the scheduler updates it there with each newly selected task's node. I will change task_current into an array for multi-hart use.
>
> I'm still not sure if it's necessary to maintain separate task lists in both kcb_t and sched_t.
>
> Consider an API like find_task_by_id(), which needs to iterate over every task across all harts. To do this safely, we must prevent any hart from modifying its ready queue during the iteration to avoid race conditions. This means we would need a mechanism to lock all queues anyway.
>
> If that's the case, we can simply iterate through each per-hart list from sched_t without needing the global list in kcb_t, like this:
>
> for (i = 0; i < hart_nums; i++) {
>     list_for_each(...) {
>         if (node->id == id) {
>             return node;
>         }
>     }
> }
>
> This approach seems to work. Am I missing a more complex use case that justifies keeping the global task list in kcb_t?

I guess you mean that the list in kcb, which records all tasks, can be split into each sched, and that each sched also has a list, ready_queue, to maintain all the tasks on that hart whose state is TASK_RUNNING or TASK_READY? Like:

kcb_t->task_list            // Redundant? Split into per-hart local_hart_task_list.
sched_t {
    list local_hart_task_list;   // Originally kept in kcb->task_list, now split per hart
    list local_hart_ready_queue; // Ready queue of tasks in TASK_RUNNING or TASK_READY state
    ...
}

@visitorckw (Collaborator)

I'm just thinking that each task should only appear in one list at a time, and never be duplicated across two different lists.

For running and ready tasks, using a per-hart data structure seems appropriate, as it would be efficient for the scheduler on each hart.

I'm less certain about tasks in other states. It's unclear to me whether placing them in a single global list or in per-hart lists would be more suitable. This decision likely depends on how we intend to handle task affinity.

@vicLin8712 (Collaborator Author)

> I'm just thinking that each task should only appear in one list at a time, and never be duplicated across two different lists.
>
> For running and ready tasks, using a per-hart data structure seems appropriate, as it would be efficient for the scheduler on each hart.
>
> I'm less certain about tasks in other states. It's unclear to me whether placing them in a single global list or in per-hart lists would be more suitable. This decision likely depends on how we intend to handle task affinity.

Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.

I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.

To solve task affinity, maybe we can add last_exe_hart_id into tcb?

@visitorckw (Collaborator)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.

I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.

I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?

> To solve task affinity, maybe we can add last_exe_hart_id into tcb?

No, I didn't mean adding a feature that lets the user specify which hart a task must run on — that should be relatively easy. What I meant is that the scheduler decides when a task will be placed onto a particular hart's ready queue, which determines which hart it will run on.

@vicLin8712 (Collaborator Author)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?

Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.

To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?

When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.

@visitorckw (Collaborator)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?
>
> Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.
>
> To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.

I don't think this is the reason why we must keep the running task in the ready queue. If we are really concerned about the efficiency of list_push_back, we can simply add a pointer to keep track of both ends of the list to make the insertion operation have an O(1) time complexity.
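
For example (a sketch with placeholder names, not the project's actual list
API), keeping a tail pointer makes the append O(1) regardless of queue length:

/* Sketch only: singly linked queue with head and tail pointers,
 * so appending a node is O(1) regardless of queue length. */
typedef struct qnode {
    struct qnode *next;
    void *task;                  /* payload, e.g. a tcb pointer */
} qnode_t;

typedef struct {
    qnode_t *head;               /* dequeue from here */
    qnode_t *tail;               /* enqueue here in O(1) */
} squeue_t;

static void squeue_push_back(squeue_t *q, qnode_t *node)
{
    node->next = NULL;
    if (!q->tail) {              /* empty queue */
        q->head = q->tail = node;
    } else {
        q->tail->next = node;    /* link after current tail */
        q->tail = node;
    }
}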

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?
>
> When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.

Still confused. I still don't see the reason why the list needs to be kept sorted. At the same time, I'm also not quite clear on why this approach allows us to delete a task by only removing it from each hart's queue.

@vicLin8712 (Collaborator Author)

> Yes, but I think we still need another list to handle the remaining tasks in other states. And the ready queue needs external nodes to link the tasks whose state is READY or RUNNING.
>
> I thought a running task would not appear in the ready queue, but instead each hart would keep track of its own currently running task node. Sorry if I misunderstood your description — could you clarify or correct my misunderstanding?
>
> Once a RUNNING task needs to go back to READY, it must go through an "enqueue" step to return it to the tail of the ready queue. This is a redundant and cycle-costly process, because list_push_back has a time cost that depends on the length of the queue.
> To avoid this, a pointer traces the current RUNNING task, and this pointer advances to the next node via the list_cnext API on a task switch.
>
> I don't think this is the reason why we must keep the running task in the ready queue. If we are really concerned about the efficiency of list_push_back, we can simply add a pointer to keep track of both ends of the list to make the insertion operation have an O(1) time complexity.

Thanks for your suggestion, I'll follow this idea.

> I suggest kcb hold all tasks rather than per-hart lists, because the tasks are linked in ascending tid order in that list, while tids will be out of order in each hart's task list. For example, 1->2->3->4->5->6->7 might be split as 1->2, 5->6, 3->7, 4 on different harts.
>
> I am not necessarily against this, but I don't see why we need to keep the task id in sorted order or what benefit we gain from it. Do we have any feature that depends on maintaining this ordering?
>
> When a task is to be deleted, I think each hart can just delete it from its local ready queue without further operations, as long as the task list is held globally.
>
> Still confused. I still don't see the reason why the list needs to be kept sorted. At the same time, I'm also not quite clear on why this approach allows us to delete a task by only removing it from each hart's queue.

Yes, currently no feature depends on whether the task list is sorted; I'm just trying to keep the structure as close as possible to the current version and to focus on the single-hart scheduler case first. Maybe we can run experiments later to compare the performance of holding tasks per hart versus globally.
