Skip to content

sys/xtimer: Fix possible race condition in xtimer where sleep is added to the wrong list.#7116

Closed
DipSwitch wants to merge 1 commit intoRIOT-OS:masterfrom
DipSwitch:pr/fix/xtimer_race_condition
Closed

sys/xtimer: Fix possible race condition in xtimer where sleep is added to the wrong list.#7116
DipSwitch wants to merge 1 commit intoRIOT-OS:masterfrom
DipSwitch:pr/fix/xtimer_race_condition

Conversation

@DipSwitch
Copy link
Member

I have a problem where xtimer_usleep thinks that it should place the timer into the timers long list while (timer->long_target > _long_cnt) are both 0 but || !_this_high_period(target) says it's not this period. But when the timer is added to the long list. It doesn't trigger for aprox 1,5 hour instead of in 69uS.

Below the log and possible solution to fix.

11:58:43:726	COM9	8	_xtimer_set64() offset=207 long_offset=0
11:58:43:726	COM9	8	timer_set(): offset=207 now=1524772 (17446)
11:58:43:726	COM9	8	timer_set_absolute(): now=1525813 target=1526018
11:58:43:726	COM9	8	timer_set_absolute(): timer will expire in this timer period.
11:58:43:726	COM9	8	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:43:774	COM9	8	_xtimer_set64() offset=69 long_offset=0
11:58:43:793	COM9	8	timer_set(): offset=69 now=1526992 (19666)
11:58:43:793	COM9	8	timer_set_absolute(): now=1528012 target=1528079
11:58:43:793	COM9	8	timer_set_absolute(): timer will expire in this timer period.
11:58:43:794	COM9	8	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:43:861	COM9	8	_xtimer_set64() offset=207 long_offset=0
11:58:43:861	COM9	8	timer_set(): offset=207 now=1529335 (22008)
11:58:43:861	COM9	8	timer_set_absolute(): now=1530376 target=1530581
11:58:43:861	COM9	8	timer_set_absolute(): timer will expire in this timer period.
11:58:43:861	COM9	8	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:43:909	COM9	8	INF: irq: clear
11:58:43:909	COM9	8	INF: mode: command
11:58:43:936	COM9	9	INF: data recv: < 8
11:58:43:936	COM9	9	INF: irq: set
11:58:43:936	COM9	9	INF: data recv: < 8
11:58:43:936	COM9	9	INF: cmd proc: AT+HDL=1
11:58:43:937	COM9	9	INF: cmd resp: OK
11:58:43:937	COM9	9	INF: mode data: start write
11:58:43:937	COM9	9	_xtimer_set64() offset=207 long_offset=0
11:58:43:937	COM9	9	timer_set(): offset=207 now=1540075 (32748)
11:58:43:937	COM9	9	timer_set_absolute(): now=1541116 target=1541321
11:58:43:937	COM9	9	timer_set_absolute(): timer will expire in this timer period.
11:58:43:937	COM9	9	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:44:008	COM9	9	_xtimer_set64() offset=207 long_offset=0
11:58:44:008	COM9	9	timer_set(): offset=207 now=1546722 (39395)
11:58:44:008	COM9	9	timer_set_absolute(): now=1547762 target=1547968
11:58:44:008	COM9	9	timer_set_absolute(): timer will expire in this timer period.
11:58:44:008	COM9	9	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:44:056	COM9	9	INF: mode data: write 3 bytes
11:58:44:056	COM9	9	INF: data send: > 9
11:58:44:073	COM9	9	INF: mode: command
11:58:44:073	COM9	9	INF: cmd proc: AT+IRQ?1
11:58:44:073	COM9	9	INF: cmd resp: OK
11:58:44:095	COM9	9	INF: cmd proc: AT+IRQ=23
11:58:44:095	COM9	9	INF: cmd resp: OK
11:58:44:095	COM9	9	INF: mode data: send 3
11:58:44:095	COM9	9	_xtimer_set64() offset=207 long_offset=0
11:58:44:095	COM9	9	timer_set(): offset=207 now=1568667 (61340)
11:58:44:095	COM9	9	timer_set_absolute(): now=1569708 target=1569913
11:58:44:095	COM9	9	timer_set_absolute(): timer will expire in this timer period.
11:58:44:095	COM9	9	timer_set_absolute(): timer is new list head. updating lltimer.
11:58:44:162	COM9	9	_xtimer_set64() offset=69 long_offset=0
>> line of interrest 11:58:44:162	COM9	9	timer_set(): offset=69 now=1570886 (63559)
11:58:44:162	COM9	9	timer_set_absolute(): now=1571905 target=1571972
11:58:44:162	COM9	9	xtimer_set_absolute(): the timer doesn't fit into the low-level timer's mask.
11:58:45:106	COM9	9	Master error (IOException): No data received from the modem

To solve this problem xtimer_set_absolute should be changed to something like this:

int _xtimer_set_absolute_dbg(xtimer_t *timer, uint32_t target)
{
>>	/* move IRQ disable to top */
	unsigned state = irq_disable();
	if (_is_set(timer)) {
		_remove(timer);
	}

	uint32_t now = _xtimer_now();
>>	/* to avoid a race condition through printing or other interrupts interrupting the process and make less than 32 bit timers miss there overflow interrupt */
	int res = 0;

	printf("timer_set_absolute(): now=%" PRIu32 " target=%" PRIu32 "\n", now, target);

	timer->next = NULL;
	if ((target >= now) && ((target - XTIMER_BACKOFF) < now)) {
		/* backoff */
		irq_restore(state);
		xtimer_spin_until(target + XTIMER_BACKOFF);
		_shoot(timer);
		return 0;
	}

	timer->target = target;
	timer->long_target = _long_cnt;
	if (target < now) {
		timer->long_target++;
	}

	if ( (timer->long_target > _long_cnt) || !_this_high_period(target)) {
		if (timer->long_target == _long_cnt) {
			printf("xtimer_set_absolute(): increasing long count.\n");
	>> /* tried fix but didn't work */		//++timer->long_target;
		}
		else
			printf("xtimer_set_absolute(): the timer doesn't fit into the low-level timer's mask.\n");
		_add_timer_to_long_list(&long_list_head, timer);
	}
	else {
		if (_xtimer_lltimer_mask(now) >= target) {
			printf("xtimer_set_absolute(): the timer will expire in the next timer period\n");
			_add_timer_to_list(&overflow_list_head, timer);
		}
		else {
			printf("timer_set_absolute(): timer will expire in this timer period.\n");
			_add_timer_to_list(&timer_list_head, timer);

			if (timer_list_head == timer) {
				printf("timer_set_absolute(): timer is new list head. updating lltimer.\n");
				_lltimer_set(target - XTIMER_OVERHEAD);
			}
		}
	}

	irq_restore(state);

	return res;
}

Possible related threads:

@DipSwitch DipSwitch added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Impact: major The PR changes a significant part of the code base. It should be reviewed carefully Area: timers Area: timer subsystems labels May 31, 2017
@DipSwitch DipSwitch added this to the Release 2017.04 milestone May 31, 2017
@DipSwitch
Copy link
Member Author

fixed by: #7053

@aabadie
Copy link
Contributor

aabadie commented Jun 21, 2017

What should we do here ? I'm not sure #7053 has consensus.

@smlng
Copy link
Member

smlng commented Jan 15, 2018

inconclusive discussion, remove milestone

@smlng smlng removed this from the Release 2018.01 milestone Jan 15, 2018
@miri64
Copy link
Member

miri64 commented Feb 26, 2019

Is there still interest to merge this bug fix??!?

@cladmi
Copy link
Contributor

cladmi commented Jun 18, 2019

Ended up here as I was testing xtimer_now in masked interrupts/interrupt context and found this issue also listed in the release-notes

I disagree that #7053 solves the issue in general. You could still be de-scheduled before doing irq_disable and I think it does not handle it.

I agree with that change, as it is the only way to ensure that the comparison with the target has an effect and that we do not get de-scheduled after.

However I am afraid as xtimer_now is unsafe to call from masked interrupt context #5338 I was writing tests to show it is easy to make it fail but it was known for years.

@cladmi cladmi added this to the Release 2019.07 milestone Jun 18, 2019
@kaspar030
Copy link
Contributor

I think this has been fixed with #9530.

@kaspar030 kaspar030 closed this Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: timers Area: timer subsystems Impact: major The PR changes a significant part of the code base. It should be reviewed carefully Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants