Skip to content

Conversation

@lsf37
Copy link
Member

@lsf37 lsf37 commented Sep 10, 2025

This RFC proposes to add the ability to set, maintain, and switch between domain schedules at runtime, authorised by the existing DomainCap, retaining the current information flow proof and behaviour when the DomainCap no longer exists in the system or is not used.

Rendered version

@Indanz
Copy link
Contributor

Indanz commented Sep 10, 2025

Initial draft implementation can be found at seL4/seL4#1511. But please keep design discussions here. That implementation differs in details from the RFC because I didn't had the draft RFC in front of me while writing the code.

  • The implementation only checks the domain length for end markers and there is no need to force zero domain. As domain 0 is also a valid domain, requiring both length and domain to be set to zero for end markers does not make much sense. If we do set requirements for the end marker domain value, an invalid domain would make more sense. But that would be less user friendly as it depends on numDomains and is harder to explain (and again, the actual code will just ignore the domain value).

  • Another difference is that the domain time is not checked for MIN_BUDGET. The rationale being that if the user messes up their scheduling so much that nothing can run, then having a domain time less than MIN_BUDGET is the least of their problems. The other reason to not enforce it is if the tasks in such a domain make no system calls, then there is no reason to enforce length > MIN_BUDGET. Most MIN_BUDGET checks in the kernel seem arbitrary, I rather not add more of them.

  • One issue I ran into was that having the time in word_t makes it either 32-bit or 64-bit. Perhaps we should make it fixed size, either always 32-bits or 64-bits instead of word_t. Limiting dom_t to 32-bits if the length is 32-bits would also make sense. With a 1GHz timer clock, 32-bit is enough for a 4 second long domain time, so seems sufficient. Users can always concatenate entries if 32-bits isn't enough time for one domain for whatever reason.

  • Now that we're breaking the API anyway, it's an excellent moment to rename DomainSetSet to DomainSet. I understand the humour, but as the function is actually changing the domain of a TCB and does not modify the set of domains, calling it DomainSetSet is somewhat misleading.

@lsf37
Copy link
Member Author

lsf37 commented Sep 10, 2025

If we are standardising on ticks, making duration always uint64_t would definitely make sense.

@Indanz
Copy link
Contributor

Indanz commented Sep 10, 2025

If we are standardising on ticks, making duration always uint64_t would definitely make sense.

If it was up to me, I would combine domain and duration in 64 bits, e.g. 16 bits domain and 48 bits for ticks.

Anyway, a downside of making it more than 32-bits is that retrieving the argument becomes harder, as it needs two registers on 32-bit platforms and only one on 64-bit. So no matter what we do, it will result in some ugliness somewhere. (Need something similar to mode_parseTimeArg on non-MCS.)

@lsf37 lsf37 marked this pull request as ready for review October 12, 2025 09:32
Co-authored-by: Rafal Kolanski <rafal.kolanski.proofcraft.systems>
Co-authored-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Gerwin Klein <gerwin.klein@proofcraft.systems>
@lsf37 lsf37 force-pushed the 0200-domain-schedules branch from 173b662 to 70afb70 Compare October 15, 2025 00:22
@lsf37
Copy link
Member Author

lsf37 commented Dec 2, 2025

  • The implementation only checks the domain length for end markers and there is no need to force zero domain. As domain 0 is also a valid domain, requiring both length and domain to be set to zero for end markers does not make much sense. If we do set requirements for the end marker domain value, an invalid domain would make more sense. But that would be less user friendly as it depends on numDomains and is harder to explain (and again, the actual code will just ignore the domain value).

Forcing the marker to be (0,0) is a guard against programming errors and misunderstanding the API. It does not really matter much either way, and guarding against programming errors is not really a design goal of seL4, but it would be available for no real cost here.

  • Another difference is that the domain time is not checked for MIN_BUDGET. The rationale being that if the user messes up their scheduling so much that nothing can run, then having a domain time less than MIN_BUDGET is the least of their problems. The other reason to not enforce it is if the tasks in such a domain make no system calls, then there is no reason to enforce length > MIN_BUDGET. Most MIN_BUDGET checks in the kernel seem arbitrary, I rather not add more of them.

I have to confess that I only put that in because that check is everywhere else in MCS. I was thinking that having a domain time < MIN_BUDGET would make no thread schedulable in that domain, which would be a situation we'd want to avoid. I might be wrong about that. Happy to remove it. The check itself only affects the decode function, so it is very cheap to have if we do want it.

  • One issue I ran into was that having the time in word_t makes it either 32-bit or 64-bit. Perhaps we should make it fixed size, either always 32-bits or 64-bits instead of word_t. Limiting dom_t to 32-bits if the length is 32-bits would also make sense. With a 1GHz timer clock, 32-bit is enough for a 4 second long domain time, so seems sufficient. Users can always concatenate entries if 32-bits isn't enough time for one domain for whatever reason.

I don't think having time as sometimes 32 bit and sometimes 64 bit in MCS is a good idea. For MCS, the kernel has a standardised time_t which is 64 bit and that seems perfectly appropriate to use here, including the existing functions for parsing it as a 2-word argument where necessary on 32 bit architectures.

This would only affect MCS -- the domain schedule is in kernel ticks for master, which are fine to be word_t, because kernel ticks are much longer than timer ticks.

  • Now that we're breaking the API anyway, it's an excellent moment to rename DomainSetSet to DomainSet. I understand the humour, but as the function is actually changing the domain of a TCB and does not modify the set of domains, calling it DomainSetSet is somewhat misleading.

The naming scheme comes from the Set operation on the DomainSet cap, but I don't mind. It would make sense to rename the cap in the manual to just Domain cap. If people feel this is a good time to make this a bit clearer, I'm happy to add it to the RFC. This will be a breaking change for just "cleanup", but it is true that we are changing interaction with the domain schedule anyway and for most frameworks/libraries this would be hidden from the developer.

@lsf37
Copy link
Member Author

lsf37 commented Dec 2, 2025

My summary of the discussion so far would be that we have not fully converged yet on the minor encoding details above, but nobody has voiced any objections to the overall design.

@Indanz
Copy link
Contributor

Indanz commented Dec 2, 2025

I've implemented the 64-bit only version and I must admit that looks very clean, it only adds conditional padding to struct dschedule.

My main objection is that it wastes a lot of memory, but that can be solved in the future by manually packing the 8-bit domain with a 56-bit length into one 64-bit value, if we really want to fix it.

I'm not keen on making it word_t for non-MCS, as that would give it a different API depending on whether it's MCS or non-MCS, causing more ifdeffery. I suppose we could define tick_t as word_t for non-MCS (and same for seL4_Time or seL4_Tick in libsel4).

Checking for (0,0) is one extra if-check, it's up to you whether you mind doing the additional verification work for that.

@lsf37 lsf37 added the active approved RFC that is being implemented label Dec 15, 2025
@lsf37 lsf37 mentioned this pull request Dec 15, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

active approved RFC that is being implemented

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants