Skip to content

xtimer: set now pointer correctly in _update_short_timers() loop#13850

Merged
kaspar030 merged 3 commits intoRIOT-OS:masterfrom
miri64:xtimer/fix/now32-overflow
Apr 10, 2020
Merged

xtimer: set now pointer correctly in _update_short_timers() loop#13850
kaspar030 merged 3 commits intoRIOT-OS:masterfrom
miri64:xtimer/fix/now32-overflow

Conversation

@miri64
Copy link
Member

@miri64 miri64 commented Apr 9, 2020

Contribution description

This fixes xtimer to use xtimer_now64() instead of xtimer_now() for updating the *now variable during the iteration in _update_short_timers() function. The same function is used to initialize *now in _timer_callback() below.

While using xtimer_now() in this iteration step does not hinder the proper execution of all timers in the short term timers (for those the xtimer module only looks at the start_time member, not the long_start_time member) at least for the current long term time window (I did not test higher cases), it sets the long_start_time member to 0 for all timers following in the list of timers after this iteration step. However, external modules that rely on this to be correct, e.g. evtimer, fail their calculations when trying to compare to the current value to xtimer_now64().

If the changed header becomes the new head this behavior won't show up, as it is overwritten once the update of all headers is done. However, all timers later in the list will now have the wrong values.

Testing procedure

A regression test testing this behavior is provided. Confirm the bug, by reverting the fix and running it, run it with the patch to confirm that it fixes the bug:

export BOARD="<knock yourself out>"
make -C tests/xtimer_now32_overflow flash test

Issues/PRs references

None, but should fix a known, but not reported issue, where RPL starts "spamming" DIS every minute instead of every 5 minutes after >71min.

@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: timers Area: timer subsystems Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch labels Apr 9, 2020
@miri64 miri64 added this to the Release 2020.04 milestone Apr 9, 2020
@kaspar030
Copy link
Contributor

Good job tracking this down @miri64 and @cgundogan!

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Apr 9, 2020
@miri64 miri64 force-pushed the xtimer/fix/now32-overflow branch from 3c34649 to 11d4b6f Compare April 9, 2020 22:00
@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

Right. I wanted to mark @cgundogan as a co-author for the commit ^^". My mishap is now fixed ;-)

@kaspar030
Copy link
Contributor

Reminds me of this: #9530 (comment) 😁

@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

Right. I wanted to mark @cgundogan as a co-author for the commit ^^". My mishap is now fixed ;-)

Wrong mail address apparently :D

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK for the fix. Please someone else take a look at the test.

This fixes `xtimer` to use `xtimer_now64()` instead of `xtimer_now()`
for updating the `*now` variable during the iteration in
`_update_short_timers()` function. The same function is used to
initialize `*now` in `_timer_callback()` below.

While using `xtimer_now()` in this iteration step does not hinder the
proper execution of all timers in the short term timers (for those the
`xtimer` module only looks at the `start_time` member, not the
`long_start_time` member) at least for the current long term time window
(I did not test higher cases), it sets the `long_start_time` member to 0
for all timers following in the list of timers after this iteration
step. However, external modules that rely on this to be correct,
e.g. evtimer [1], fail their calculations when trying to compare to
the current value to `xtimer_now64()`.

[1] https://github.com/RIOT-OS/RIOT/blob/11f3d68/sys/evtimer/evtimer.c#L118-L121

Co-Authored-By: Cenk Gündoğan <mail+dev@gundogan.net>
@miri64 miri64 force-pushed the xtimer/fix/now32-overflow branch from 11d4b6f to ca8afaf Compare April 9, 2020 22:02
@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

Wrong mail address apparently :D

Fixed!

@tcschmidt
Copy link
Member

ACK for the fix. Please someone else take a look at the test.

ACK without test? Why is the Ham playing the Chicken here?

@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

ACK for the fix. Please someone else take a look at the test.

ACK without test? Why is the Ham playing the Chicken here?

Murdock should be running the test.

@kaspar030
Copy link
Contributor

ACK without test?

@miri64 kept me up to date during the bug fixing. Once you see the fix, the bug is obvious.

@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

Oops, flake8 reminds me that forgot to check the time during the test, as I originally wanted (which is why I imported the time module in the first place).

@miri64
Copy link
Member Author

miri64 commented Apr 9, 2020

Also piggy-backed some major style issues (ignoring the >80 char per line ones) for xtimer_core.c

@JulianHolzwarth
Copy link
Contributor

Because

timer_list_head->long_start_time = (uint32_t)(now >> 32);
fixes the wrong long_start_time

Meaning there should be another timer in between.

@JulianHolzwarth
Copy link
Contributor

This works for me: Did I understand the problem correctly?

Diff:

--- a/tests/xtimer_now32_overflow/main.c
+++ b/tests/xtimer_now32_overflow/main.c
@@ -27,9 +27,11 @@ msg_t _main_msg_queue[MAIN_MSG_QUEUE_SIZE];
 
 int main(void)
 {
-    xtimer_t timer1, timer2;
+    xtimer_t timer1, timer2, timer_test;
     msg_t msg1 = { .content = { .value = 1U } };
     msg_t msg2 = { .content = { .value = 2U } };
+    msg_t msg3 = { .content = { .value = 3U } };
+
 
     msg_init_queue(_main_msg_queue, MAIN_MSG_QUEUE_SIZE);
     /* ensure that xtimer_now64() is greater than UINT32_MAX */
@@ -40,6 +42,7 @@ int main(void)
     print_str(" #1 in 1 sec\n");
     print_str(" #2 in 3 sec\n\n");
     xtimer_set_msg64(&timer1, 1 * US_PER_SEC, &msg1, thread_getpid());
+    xtimer_set_msg64(&timer_test, 2* US_PER_SEC, &msg3, thread_getpid());
     xtimer_set_msg64(&timer2, 3 * US_PER_SEC, &msg2, thread_getpid());
     print_str("now=");
     print_u64_dec(xtimer_now64().ticks64);
@@ -50,6 +53,9 @@ int main(void)
         msg_t msg;
 
         msg_receive(&msg);
+        if (msg.content.value == 3U) {
+            continue;
+        }
         print_str("#");
         print_u32_dec(msg.content.value);
         print_str(":now=");

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

Yepp that was the same idea I had yesterday before going to bed and it works for me as well!

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

Added a third timer to the test. With that the bug becomes reproducible as expected.

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

If this PR will be merged, the description should be updated. In general, I'm not in favor of merging because we don't completely understand the fix.

I think I pretty much understand it now. I updated the OP as such.

@JulianHolzwarth
Copy link
Contributor

Works for me and looks good to me.

@miri64 miri64 force-pushed the xtimer/fix/now32-overflow branch from e1ecae4 to f59ea80 Compare April 10, 2020 10:41
@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

Works for me and looks good to me.

So I can squash?

@JulianHolzwarth
Copy link
Contributor

yes

miri64 and others added 2 commits April 10, 2020 12:42
@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

While I wrote the test last night, I think we pretty much came to the same idea while I was keeping it behind the curtain. So I added you as a co-author :-).

@miri64 miri64 force-pushed the xtimer/fix/now32-overflow branch from f59ea80 to 085c62e Compare April 10, 2020 10:43
@JulianHolzwarth
Copy link
Contributor

thanks

@kaspar030
Copy link
Contributor

The shortest reproducer should be:

  1. Wait the first 2**32 usec (so every timer set has long_start_time > 0).
  2. Set a timer to trigger sometime (doesn't need to be an evtmer)
  3. Set an evtimer to trigger sometime after
  4. Wait until 2. Has triggered, but not 3.
  5. Set another evtimer to trigger after the first

This should basically instantly trigger 3.

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

The shortest reproducer should be:

1. Wait the first 2**32 usec (so every timer set has long_start_time > 0).

This alone makes the test anything but short :-P

2. Set a timer to trigger sometime (doesn't need to be an evtmer)

3. Set an evtimer to trigger sometime after

4. Wait until 2. Has triggered, but not 3.

5. Set another evtimer to trigger after the first

This should basically instantly trigger 3.

Unsurprisingly if you set 3 to trigger just after 2.

I think my test is faster, involves less modules, and has less potential to be unstable because precise timings are required.

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

@JulianHolzwarth seems to be happy about the current state of the tests and has understood the bug enough to come up with his own version of the test. So can we merge?

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

Please don't piggybag style fixes. The actual oneliner fix is hard enough to find.

Go through the list of changes, as I typically do to understand the thoughts behind a PR, and you find it pretty easy.

@kaspar030
Copy link
Contributor

Go through the list of changes,

Exactly. Nicer if it's length is one, not a multiple of that. If a PR fixes an important bug, it should do just that, and not piggyback style changes. If the diff of a PR fixes one line and that is the first, with only additions of a regression test following, that's easy to parse. If a PR fixes one line but the diff shows a couple of others, it is harder to parse. Just my opinion, I guess...

I think my test is faster, involves less modules, and has less potential to be unstable because precise timings are required.

With wait 2**32min, I obviously meant "set _xtimer_current_time". :)

Anyhow, I see this as sport, so here's something even simpler, faster, involving less modules and is even less timing critical (so overall much better):

https://gist.github.com/272d7eb3e79b47a551e6c14798dfdadb

Feel free to use that instead, I can also PR that after this is merged.

@miri64
Copy link
Member Author

miri64 commented Apr 10, 2020

Nah, I keep it as is now.

@kaspar030
Copy link
Contributor

Ok then, let's go!

@miri64
Copy link
Member Author

miri64 commented Apr 11, 2020

Backport provided in #13854

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants