Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion searchindex.js

This file was deleted.

2 changes: 2 additions & 0 deletions src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -931,3 +931,5 @@
- [Post Exploitation](todo/post-exploitation.md)
- [Investment Terms](todo/investment-terms.md)
- [Cookies Policy](todo/cookies-policy.md)

- [Posix Cpu Timers Toctou Cve 2025 38352](linux-hardening/privilege-escalation/linux-kernel-exploitation/posix-cpu-timers-toctou-cve-2025-38352.md)
7 changes: 7 additions & 0 deletions src/linux-hardening/privilege-escalation/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,13 @@ Tools that could help to search for kernel exploits are:
[linux-exploit-suggester2.pl](https://github.com/jondonas/linux-exploit-suggester-2)\
[linuxprivchecker.py](http://www.securitysift.com/download/linuxprivchecker.py) (execute IN victim,only checks exploits for kernel 2.x)

New kernel exploit note:
- POSIX CPU timers TOCTOU race (CVE-2025-38352): expiry vs deletion under task exit can corrupt timer state in kernels with CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n. See details:

{{#ref}}
linux-kernel-exploitation/posix-cpu-timers-toctou-cve-2025-38352.md
{{#endref}}

Always **search the kernel version in Google**, maybe your kernel version is written in some kernel exploit and then you will be sure that this exploit is valid.

### CVE-2016-5195 (DirtyCow)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
# POSIX CPU Timers TOCTOU race (CVE-2025-38352)

{{#include ../../../banners/hacktricks-training.md}}

This page documents a TOCTOU race condition in Linux/Android POSIX CPU timers that can corrupt timer state and crash the kernel, and under some circumstances be steered toward privilege escalation.

- Affected component: kernel/time/posix-cpu-timers.c
- Primitive: expiry vs deletion race under task exit
- Config sensitive: CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (IRQ-context expiry path)

Quick internals recap (relevant for exploitation)
- Three CPU clocks drive accounting for timers via cpu_clock_sample():
- CPUCLOCK_PROF: utime + stime
- CPUCLOCK_VIRT: utime only
- CPUCLOCK_SCHED: task_sched_runtime()
- Timer creation wires a timer to a task/pid and initializes the timerqueue nodes:

```c
static int posix_cpu_timer_create(struct k_itimer *new_timer) {
struct pid *pid;
rcu_read_lock();
pid = pid_for_clock(new_timer->it_clock, false);
if (!pid) { rcu_read_unlock(); return -EINVAL; }
new_timer->kclock = &clock_posix_cpu;
timerqueue_init(&new_timer->it.cpu.node);
new_timer->it.cpu.pid = get_pid(pid);
rcu_read_unlock();
return 0;
}
```

- Arming inserts into a per-base timerqueue and may update the next-expiry cache:

```c
static void arm_timer(struct k_itimer *timer, struct task_struct *p) {
struct posix_cputimer_base *base = timer_base(timer, p);
struct cpu_timer *ctmr = &timer->it.cpu;
u64 newexp = cpu_timer_getexpires(ctmr);
if (!cpu_timer_enqueue(&base->tqhead, ctmr)) return;
if (newexp < base->nextevt) base->nextevt = newexp;
}
```

- Fast path avoids expensive processing unless cached expiries indicate possible firing:

```c
static inline bool fastpath_timer_check(struct task_struct *tsk) {
struct posix_cputimers *pct = &tsk->posix_cputimers;
if (!expiry_cache_is_inactive(pct)) {
u64 samples[CPUCLOCK_MAX];
task_sample_cputime(tsk, samples);
if (task_cputimers_expired(samples, pct))
return true;
}
return false;
}
```

- Expiration collects expired timers, marks them firing, moves them off the queue; actual delivery is deferred:

```c
#define MAX_COLLECTED 20
static u64 collect_timerqueue(struct timerqueue_head *head,
struct list_head *firing, u64 now) {
struct timerqueue_node *next; int i = 0;
while ((next = timerqueue_getnext(head))) {
struct cpu_timer *ctmr = container_of(next, struct cpu_timer, node);
u64 expires = cpu_timer_getexpires(ctmr);
if (++i == MAX_COLLECTED || now < expires) return expires;
ctmr->firing = 1; // critical state
rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr);
list_add_tail(&ctmr->elist, firing);
}
return U64_MAX;
}
```

Two expiry-processing modes
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y: expiry is deferred via task_work on the target task
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n: expiry handled directly in IRQ context

```c
void run_posix_cpu_timers(void) {
struct task_struct *tsk = current;
__run_posix_cpu_timers(tsk);
}
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
if (WARN_ON_ONCE(tsk->posix_cputimers_work.scheduled)) return;
tsk->posix_cputimers_work.scheduled = true;
task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
}
#else
static inline void __run_posix_cpu_timers(struct task_struct *tsk) {
lockdep_posixtimer_enter();
handle_posix_cpu_timers(tsk); // IRQ-context path
lockdep_posixtimer_exit();
}
#endif
```

In the IRQ-context path, the firing list is processed outside sighand

```c
static void handle_posix_cpu_timers(struct task_struct *tsk) {
struct k_itimer *timer, *next; unsigned long flags, start;
LIST_HEAD(firing);
if (!lock_task_sighand(tsk, &flags)) return; // may fail on exit
do {
start = READ_ONCE(jiffies); barrier();
check_thread_timers(tsk, &firing);
check_process_timers(tsk, &firing);
} while (!posix_cpu_timers_enable_work(tsk, start));
unlock_task_sighand(tsk, &flags); // race window opens here
list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
int cpu_firing;
spin_lock(&timer->it_lock);
list_del_init(&timer->it.cpu.elist);
cpu_firing = timer->it.cpu.firing; // read then reset
timer->it.cpu.firing = 0;
if (likely(cpu_firing >= 0)) cpu_timer_fire(timer);
rcu_assign_pointer(timer->it.cpu.handling, NULL);
spin_unlock(&timer->it_lock);
}
}
```

Root cause: TOCTOU between IRQ-time expiry and concurrent deletion under task exit
Preconditions
- CONFIG_POSIX_CPU_TIMERS_TASK_WORK is disabled (IRQ path in use)
- The target task is exiting but not fully reaped
- Another thread concurrently calls posix_cpu_timer_del() for the same timer

Sequence
1) update_process_times() triggers run_posix_cpu_timers() in IRQ context for the exiting task.
2) collect_timerqueue() sets ctmr->firing = 1 and moves the timer to the temporary firing list.
3) handle_posix_cpu_timers() drops sighand via unlock_task_sighand() to deliver timers outside the lock.
4) Immediately after unlock, the exiting task can be reaped; a sibling thread executes posix_cpu_timer_del().
5) In this window, posix_cpu_timer_del() may fail to acquire state via cpu_timer_task_rcu()/lock_task_sighand() and thus skip the normal in-flight guard that checks timer->it.cpu.firing. Deletion proceeds as if not firing, corrupting state while expiry is being handled, leading to crashes/UB.

Why TASK_WORK mode is safe by design
- With CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, expiry is deferred to task_work; exit_task_work runs before exit_notify, so the IRQ-time overlap with reaping does not occur.
- Even then, if the task is already exiting, task_work_add() fails; gating on exit_state makes both modes consistent.

Fix (Android common kernel) and rationale
- Add an early return if current task is exiting, gating all processing:

```c
// kernel/time/posix-cpu-timers.c (Android common kernel commit 157f357d50b5038e5eaad0b2b438f923ac40afeb)
if (tsk->exit_state)
return;
```

- This prevents entering handle_posix_cpu_timers() for exiting tasks, eliminating the window where posix_cpu_timer_del() could miss it.cpu.firing and race with expiry processing.

Impact
- Kernel memory corruption of timer structures during concurrent expiry/deletion can yield immediate crashes (DoS) and is a strong primitive toward privilege escalation due to arbitrary kernel-state manipulation opportunities.

Triggering the bug (safe, reproducible conditions)
Build/config
- Ensure CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n and use a kernel without the exit_state gating fix.

Runtime strategy
- Target a thread that is about to exit and attach a CPU timer to it (per-thread or process-wide clock):
- For per-thread: timer_create(CLOCK_THREAD_CPUTIME_ID, ...)
- For process-wide: timer_create(CLOCK_PROCESS_CPUTIME_ID, ...)
- Arm with a very short initial expiration and small interval to maximize IRQ-path entries:

```c
static timer_t t;
static void setup_cpu_timer(void) {
struct sigevent sev = {0};
sev.sigev_notify = SIGEV_SIGNAL; // delivery type not critical for the race
sev.sigev_signo = SIGUSR1;
if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &t)) perror("timer_create");
struct itimerspec its = {0};
its.it_value.tv_nsec = 1; // fire ASAP
its.it_interval.tv_nsec = 1; // re-fire
if (timer_settime(t, 0, &its, NULL)) perror("timer_settime");
}
```

- From a sibling thread, concurrently delete the same timer while the target thread exits:

```c
void *deleter(void *arg) {
for (;;) (void)timer_delete(t); // hammer delete in a loop
}
```

- Race amplifiers: high scheduler tick rate, CPU load, repeated thread exit/re-create cycles. The crash typically manifests when posix_cpu_timer_del() skips noticing firing due to failing task lookup/locking right after unlock_task_sighand().

Detection and hardening
- Mitigation: apply the exit_state guard; prefer enabling CONFIG_POSIX_CPU_TIMERS_TASK_WORK when feasible.
- Observability: add tracepoints/WARN_ONCE around unlock_task_sighand()/posix_cpu_timer_del(); alert when it.cpu.firing==1 is observed together with failed cpu_timer_task_rcu()/lock_task_sighand(); watch for timerqueue inconsistencies around task exit.

Audit hotspots (for reviewers)
- update_process_times() → run_posix_cpu_timers() (IRQ)
- __run_posix_cpu_timers() selection (TASK_WORK vs IRQ path)
- collect_timerqueue(): sets ctmr->firing and moves nodes
- handle_posix_cpu_timers(): drops sighand before firing loop
- posix_cpu_timer_del(): relies on it.cpu.firing to detect in-flight expiry; this check is skipped when task lookup/lock fails during exit/reap

Notes for exploitation research
- The disclosed behavior is a reliable kernel crash primitive; turning it into privilege escalation typically needs an additional controllable overlap (object lifetime or write-what-where influence) beyond the scope of this summary. Treat any PoC as potentially destabilizing and run only in emulators/VMs.

## References
- [Race Against Time in the Kernel’s Clockwork (StreyPaws)](https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/)
- [Android security bulletin – September 2025](https://source.android.com/docs/security/bulletin/2025-09-01)
- [Android common kernel patch commit 157f357d50b5…](https://android.googlesource.com/kernel/common/+/157f357d50b5038e5eaad0b2b438f923ac40afeb%5E%21/#F0)

{{#include ../../../banners/hacktricks-training.md}}