Skip to content

sys/xtimer: compare two offsets rather than absolute times#13212

Merged
MichelRottleuthner merged 1 commit intoRIOT-OS:masterfrom
Hyungsin:forupstream_xtimer_bug
Jan 30, 2020
Merged

sys/xtimer: compare two offsets rather than absolute times#13212
MichelRottleuthner merged 1 commit intoRIOT-OS:masterfrom
Hyungsin:forupstream_xtimer_bug

Conversation

@Hyungsin
Copy link

@Hyungsin Hyungsin commented Jan 27, 2020

Contribution description

This PR fixes a bug in #9530 reported by #13209, making sure to compare two offsets rather than two absolute times when ordering timers.

Testing procedure

Issues/PRs references

#9530
#13209

@benpicco benpicco added Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) labels Jan 27, 2020
@benpicco benpicco added this to the Release 2020.01 milestone Jan 27, 2020
@benpicco benpicco added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Jan 27, 2020
@aabadie
Copy link
Contributor

aabadie commented Jan 28, 2020

I'm testing this PR on iotlab using the testing procedure described in #13209. There are 2 nodes: one running this PR and the other one running with master. I expect master will crash after some time and hope this PR won't. Let's wait a bit for some results.

uint32_t elapsed = timerB->start_time - timerA->start_time;
/* it is necessary to compare two offsets, instead of two absolute times
* two conditions: (1) timerA was already expired before timerB starts
* (2) timerA will expire ealier than or equal to timerB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* (2) timerA will expire ealier than or equal to timerB
* (2) timerA will expire earlier than or equal to timerB

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@aabadie
Copy link
Contributor

aabadie commented Jan 28, 2020

The node running the master version already crashed. The one running this PR is still alive. Let's wait a couple of hours more.

@MichelRottleuthner
Copy link
Contributor

@aabadie can you try to reproduce it with the steps of my comment int the related Issue?

@fjmolinas
Copy link
Contributor

fjmolinas commented Jan 28, 2020

@aabadie can you try to reproduce it with the steps of my comment int the related Issue?

I still get the fault:
CFLAGS=-DDEBUG_ASSERT_VERBOSE BOARD=iotlab-m3 make -C tests/xtimer_drift/ clean all flash term -j3
I had not edited properly, so the error was on my side. I DO NOT get the fault with this pr.

@aabadie
Copy link
Contributor

aabadie commented Jan 28, 2020

can you try to reproduce it with the steps of my comment int the related Issue?

I can reproduce the crash in tests/xtimer_drift when applying your change. And with this PR it doesn't crash.
I'm testing this on examples/gnrc_border_router, to see if it crashes earlier.

@fjmolinas
Copy link
Contributor

Also tested on 8bit:

/home/francisco/workspace/RIOT/dist/tools/pyterm/pyterm -p "/dev/riot/tty-atmega256rfr2-xpro" -b "115200" 
2020-01-28 11:20:49,064 # Connect to serial port /dev/riot/tty-atmega256rfr2-xpro
Welcome to pyterm!
Type '/exit' to exit.
2020-01-28 11:20:50,066 # now=4281.021104 (0xffcaccac ticks), drift=-80 us, jitter=-456 us
2020-01-28 11:20:50,788 # now=4282.021040 (0xffce9d2c ticks), drift=-80 us, jitter=0 us
2020-01-28 11:20:51,788 # now=4283.020976 (0xffd26dac ticks), drift=-80 us, jitter=0 us
2020-01-28 11:20:52,788 # now=4284.020912 (0xffd63e2c ticks), drift=-80 us, jitter=0 us
2020-01-28 11:20:53,788 # now=4285.020908 (0xffda0ebb ticks), drift=-20 us, jitter=60 us
2020-01-28 11:20:54,788 # now=4286.021060 (0xffdddf71 ticks), drift=196 us, jitter=216 us
2020-01-28 11:20:55,789 # now=4287.021252 (0xffe1b031 ticks), drift=452 us, jitter=256 us
2020-01-28 11:20:56,788 # now=4288.020656 (0xffe5802c ticks), drift=-80 us, jitter=-532 us
2020-01-28 11:20:57,788 # now=4289.020868 (0xffe950f1 ticks), drift=196 us, jitter=276 us
2020-01-28 11:20:58,788 # now=4290.020528 (0xffed212c ticks), drift=-80 us, jitter=-276 us
2020-01-28 11:20:59,788 # now=4291.020740 (0xfff0f1f1 ticks), drift=196 us, jitter=276 us
2020-01-28 11:21:00,788 # now=4292.020400 (0xfff4c22c ticks), drift=-80 us, jitter=-276 us
2020-01-28 11:21:01,788 # now=4293.020908 (0xfff8933b ticks), drift=492 us, jitter=572 us
2020-01-28 11:21:02,788 # now=4294.020684 (0xfffc6393 ticks), drift=332 us, jitter=-160 us
2020-01-28 11:21:03,787 # now=0.052912 (0x000033ac ticks), drift=-80 us, jitter=-412 us
2020-01-28 11:21:04,787 # now=1.053124 (0x00040471 ticks), drift=196 us, jitter=276 us
2020-01-28 11:21:05,787 # now=2.052784 (0x0007d4ac ticks), drift=-80 us, jitter=-276 us

@aabadie
Copy link
Contributor

aabadie commented Jan 28, 2020

I'm testing this on examples/gnrc_border_router, to see if it crashes earlier.

Well, it didn't crash earlier with your patch @MichelRottleuthner but it did. Using this PR, for the moment the border router is still running.

So from what I've tried so far, this PR seems to solve the problem reported in #13209.

@fjmolinas
Copy link
Contributor

I think I also noted somewhere that It showed a problem with the wraparound sometimes

@MichelRottleuthner does the issue fix here match what you sometimes saw when testing #9530?

Otherwise do you ACK the changes?


/**
* @brief compare two timers. return true if timerA expires earlier than or equal to timerB and false otherwise.
* assume that timerB starts later than timerA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be safely assumed here in any case?
If not 100% sure, I'd prefer to add an explicit check for the start times to make it a proper compare function without any preconditions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

@MichelRottleuthner
Copy link
Contributor

@MichelRottleuthner does the issue fix here match what you sometimes saw when testing #9530?

Yes, it looks like it solves that. The only thing that bothers me a little, is the compare function having a precondition that is not absolutely needed and can be overlooked easily. Looks like a good entry point for the next bug ;)

Copy link
Author

@Hyungsin Hyungsin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that bothers me a little, is the compare function having a precondition that is not absolutely needed and can be overlooked easily. Looks like a good entry point for the next bug ;)

Added another commit to resolve this issue. Thanks!

@MichelRottleuthner
Copy link
Contributor

After the last changes I tested this PR again on iotlab-m3 with all automated xtimer tests and the procedure described in #13209 (comment) everything worked as expected.
I'm fine with the changes, please go ahead and squash.

@Hyungsin Hyungsin force-pushed the forupstream_xtimer_bug branch from 6e70ae4 to eadd4c9 Compare January 29, 2020 18:53
Copy link
Contributor

@MichelRottleuthner MichelRottleuthner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@Hyungsin
Copy link
Author

@MichelRottleuthner Thanks for the effort. We have only one (#13103) remaining now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants