Skip to content

xtimer: check in xtimer_mutex_lock_timeout timeout callback if a thread blocked on mutex#11543

Closed
JulianHolzwarth wants to merge 4 commits intoRIOT-OS:masterfrom
JulianHolzwarth:pr/xtimer_mutex_lock_timeout/already_locked
Closed

xtimer: check in xtimer_mutex_lock_timeout timeout callback if a thread blocked on mutex#11543
JulianHolzwarth wants to merge 4 commits intoRIOT-OS:masterfrom
JulianHolzwarth:pr/xtimer_mutex_lock_timeout/already_locked

Conversation

@JulianHolzwarth
Copy link
Contributor

@JulianHolzwarth JulianHolzwarth commented May 17, 2019

The tracking pr: #11660

This pr is the same pr as #10872 with a test and edited commit message.
it also includes a throwaway test commit.

Contribution description

Prevent a possible race condition for xtimer_mutex_lock_timeout when _mutex_timeout fires just after the mutex was locked but before the xtimer was removed

The flow

int xtimer_mutex_lock_timeout(mutex_t *mutex, uint64_t timeout) {
   ...
    mutex_lock(mutex);
    <- mutex locked ->
    <- _mutex_timeout fires and tries to remove thread from mutex queue ->
    xtimer_remove(&t);
    ...
}

Testing procedure

with the fix the test works,

BOARD=native make -C tests/xtimer_mutex_lock_timeout/ flash test
...
starting test: xtimer mutex lock timeout
OK
> 

without the fix

git revert HEAD
BOARD=native make -C tests/xtimer_mutex_lock_timeout/ flash test
...
starting test: xtimer mutex lock timeout
sys/xtimer/xtimer.c:239 => 0x56596856
*** RIOT kernel panic:
FAILED ASSERTION.

*** halted.


native: exiting
Unexpected end of file in expect script at "child.expect("OK")" (tests/xtimer_mutex_lock_timeout/tests/01-run.py:24)

issues/PRs references

This is the same pr as #10872 with a test.
#11660: tracking pr for xtimer_mutex_lock_timeout function

JulianHolzwarth and others added 3 commits May 17, 2019 17:14
Check case where xtimer_mutex_lock_timeout() does not time out.

This will be used to show concurrency issue in further commit.
test commit to show concurrence issue
Prevent a possible race condition when _mutex_timeout fires just after the
mutex was locked but before the xtimer was removed

The flow

int xtimer_mutex_lock_timeout(mutex_t *mutex, uint64_t timeout) {
   ...
    mutex_lock(mutex);
    /* mutex locked */
    /* _mutex_timeout fires and tries to remove thread from mutex queue */
    /* DEBUG: simulate callback call between lock and remove */
    xtimer_spin(xtimer_ticks_from_usec(timeout*2));
    xtimer_remove(&t);
    ...
}
@cladmi cladmi added Area: timers Area: timer subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable labels May 17, 2019
@cladmi cladmi added this to the Release 2019.07 milestone May 17, 2019
@JulianHolzwarth JulianHolzwarth changed the title Pr/xtimer mutex lock timeout/already locked xtimer: check in xtimer_mutex_lock_timeout timeout callback if a thread blocked on mutex May 17, 2019
@cladmi cladmi requested review from lebrush and vincent-d May 17, 2019 15:26
@cladmi
Copy link
Contributor

cladmi commented May 17, 2019

We updated the pull request from @pwillem with a test to show the assertion.

The difference in the commit will be reverted when removing the throwaway commit.

The fix is included in #11225 but the pull request does other things that do not all works. Better fix one thing at a time to simplify the review and documentation.

@cladmi cladmi requested review from MrKevinWeiss and jcarrano May 17, 2019 15:28
@cladmi
Copy link
Contributor

cladmi commented May 17, 2019

I contributed to this pull request so cannot really review myself.
The goal was to add the test and include a way to show the issue in the commit message.

The testing procedure gives the same result with BOARD=samr21-xpro.

2019-05-17 17:31:52,573 - INFO # mutex_timeout_test   tests xtimer mutex lock timeout
> 2019-05-17 17:31:52,614 - INFO #  mutex_timeout_test
2019-05-17 17:31:52,618 - INFO # starting test: xtimer mutex lock timeout
2019-05-17 17:31:52,618 - INFO # 0x16ff
2019-05-17 17:31:52,621 - INFO # *** RIOT kernel panic:
2019-05-17 17:31:52,622 - INFO # FAILED ASSERTION.
2019-05-17 17:31:52,622 - INFO # 
2019-05-17 17:31:52,623 - INFO # *** halted.

USEMODULE += xtimer
USEMODULE += shell

include $(RIOTBASE)/Makefile.include No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing new line at the end.

@cladmi
Copy link
Contributor

cladmi commented May 17, 2019

We noticed more issues as this change breaks the case where xtimer_sleep spins…
It needs more pre-changes to have this correct.

@vincent-d
Copy link
Member

@JulianHolzwarth @cladmi thanks for taking care of this. I've not been very active on RIOT recently, but I'll try to take a look at this one and test ASAP.

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels May 20, 2019
@cladmi
Copy link
Contributor

cladmi commented May 20, 2019

@vincent-d there are way more issues than this, and basically, what does not go in the normal timeout during mutex_lock is not working correctly, and cannot work. We just need to find a way to see how to fix one case without introducing new bugs in a minimal way.

@JulianHolzwarth
Copy link
Contributor Author

I will close this pr.
There are more issues.
I will solve them with the tracking pr:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: timers Area: timer subsystems CI: needs squashing Commits in this PR need to be squashed; If set, CI systems will mark this PR as unmergable CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants