Skip to content

pyterm: try to reconnect on SerialException during connect #10482

Merged
aabadie merged 3 commits intoRIOT-OS:masterfrom
miri64:pyterm/enh/try-reconnect-on-error
Mar 18, 2020
Merged

pyterm: try to reconnect on SerialException during connect #10482
aabadie merged 3 commits intoRIOT-OS:masterfrom
miri64:pyterm/enh/try-reconnect-on-error

Conversation

@miri64
Copy link
Member

@miri64 miri64 commented Nov 27, 2018

Contribution description

This is what the user would do anyway.

Testing procedure

See testing procedures in #10481. The effect of this PR is that instead of exiting on a SerialException it tries to reconnect every 10 seconds

Issues/PRs references

Based on #10481.

@miri64 miri64 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Area: tools Area: Supplementary tools labels Nov 27, 2018
@miri64 miri64 requested a review from aabadie November 27, 2018 16:42
@cladmi
Copy link
Contributor

cladmi commented Nov 27, 2018

Problem for me to try reconnecting, is that in the case or make test, we cannot detect that the term is not ready, and would fail for strange reasons.
What I would love more, is that flash waits until the device is ready to say it finished flashing.
It happens a lot with samr21-xpro.

@miri64
Copy link
Member Author

miri64 commented Nov 28, 2018

Problem for me to try reconnecting, is that in the case or make test, we cannot detect that the term is not ready, and would fail for strange reasons.
What I would love more, is that flash waits until the device is ready to say it finished flashing.
It happens a lot with samr21-xpro.

Since this case mostly happens when the device is unplugged, I'm not sure how valid that concern is.

@miri64 miri64 force-pushed the pyterm/enh/try-reconnect-on-error branch from b5c1c5e to 43506cb Compare November 28, 2018 12:23
@miri64
Copy link
Member Author

miri64 commented Nov 28, 2018

(rebased to current master to remove changes from #10481.

@miri64
Copy link
Member Author

miri64 commented Dec 17, 2018

@cladmi @aabadie ping?

@miri64 miri64 added the Community: Hack'n'ACK candidate This PR is a candidate for review and discussion during one of RIOT's monthly Hack'n'ACK parties label Dec 17, 2018
@aabadie
Copy link
Contributor

aabadie commented Dec 17, 2018

I'm fine with the changes as they are now but I'd also like to have @cladmi's Python expertise here as well ;)

@cladmi
Copy link
Contributor

cladmi commented Dec 17, 2018

I agree with the python implementation.

I still have my remark for the issue when testing, but as said below, it may just mean, not use pyterm as termprog for automated testing.


As make term will never fail if it does not find the right USB and try connecting to it, if you do not have a board, make test will only fail after the expect timeout.

In the case where after flashing, a device is in a "Device or resource busy" state, which happen to me from time to time on samr21-xpro, delaying the moment a node is available by 20 seconds will break our automated tests and any test synchronization mechanism which are not pyterm aware.
And they will not break in setup part but in the test execution with a TIMEOUT instead of an EOF.

I found out that the nodes have there status leds blinking a lot after being plugged and I managed to get this device or resource busy by doing make flash test shortly after connecting the board.

BOARD=samr21-xpro make -C tests/shell/ flash test
make: Entering directory '/home/harter/work/git/RIOT/tests/shell'
Building application "tests_shell" for "samr21-xpro" with MCU "samd21".

"make" -C /home/harter/work/git/RIOT/boards/samr21-xpro
"make" -C /home/harter/work/git/RIOT/core
"make" -C /home/harter/work/git/RIOT/cpu/samd21
"make" -C /home/harter/work/git/RIOT/cpu/cortexm_common
"make" -C /home/harter/work/git/RIOT/cpu/cortexm_common/periph
"make" -C /home/harter/work/git/RIOT/cpu/sam0_common
"make" -C /home/harter/work/git/RIOT/cpu/sam0_common/periph
"make" -C /home/harter/work/git/RIOT/cpu/samd21/periph
"make" -C /home/harter/work/git/RIOT/drivers
"make" -C /home/harter/work/git/RIOT/drivers/periph_common
"make" -C /home/harter/work/git/RIOT/sys
"make" -C /home/harter/work/git/RIOT/sys/isrpipe
"make" -C /home/harter/work/git/RIOT/sys/newlib_syscalls_default
"make" -C /home/harter/work/git/RIOT/sys/pm_layered
"make" -C /home/harter/work/git/RIOT/sys/ps
"make" -C /home/harter/work/git/RIOT/sys/shell
"make" -C /home/harter/work/git/RIOT/sys/shell/commands
"make" -C /home/harter/work/git/RIOT/sys/stdio_uart
"make" -C /home/harter/work/git/RIOT/sys/tsrb
   text    data     bss     dec     hex filename
   9876     140    2612   12628    3154 /home/harter/work/git/RIOT/tests/shell/bin/samr21-xpro/tests_shell.elf
/home/harter/work/git/RIOT/dist/tools/edbg/edbg  -t atmel_cm0p -b -v -p -f /home/harter/work/git/RIOT/tests/shell/bin/samr21-xpro/tests_shell.bin
Debugger: ATMEL EDBG CMSIS-DAP ATML2127031800001911 01.1A.00FB (S)
Clock frequency: 16.0 MHz
Target: SAM R21G18 (Rev B)
Programming........................................... done.
Verification........................................... done.
/home/harter/work/git/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200"
No handlers could be found for logger "root"
2018-12-17 15:19:16,517 - INFO # Connect to serial port /dev/ttyACM0
2018-12-17 15:19:16,517 - ERROR # could not open port /dev/ttyACM0: [Errno 16] Device or resource busy: '/dev/ttyACM0'
2018-12-17 15:19:16,517 - INFO # Trying to reconnect to /dev/ttyACM0 in 10 sec
2018-12-17 15:19:26,527 - ERROR # could not open port /dev/ttyACM0: [Errno 16] Device or resource busy: '/dev/ttyACM0'
2018-12-17 15:19:26,527 - INFO # Trying to reconnect to /dev/ttyACM0 in 10 sec
Timeout in expect script at "child.expect('test_shell.')" (tests/shell/tests/01-run.py:52)

/home/harter/work/git/RIOT/tests/shell/../../Makefile.include:557: recipe for target 'test' failed
make: *** [test] Error 1
make: Leaving directory '/home/harter/work/git/RIOT/tests/shell'

And when the test has 120s timeout or more… it means waiting for nothing.

Having this added non deterministic behavior would push me to not use pyterm when running automated tests. It may not be a bad thing to not have to deal with the user specific interactive for automated testing but at least I want to mention it.
When testing with nodes on IoT-LAB we are already using socat.

@miri64
Copy link
Member Author

miri64 commented Dec 17, 2018

How is TIMEOUT less deterministic than EOF?

@miri64
Copy link
Member Author

miri64 commented Dec 17, 2018

You have the same behavior already when you disconnect your board during testing ;-).

@cladmi
Copy link
Contributor

cladmi commented Dec 17, 2018

How is TIMEOUT less deterministic than EOF?

EOF is immediate. It will happen instantly on the first call to expect and show an issue with the system under test as make term returned. Not after many seconds as if the valid string was not printed for a long test.

You have the same behavior already when you disconnect your board during testing ;-).

If I disconnect my board when I am running automated testing on it, then I am stupid. It may have happen, but it was my fault to do it.

I understand it may be useful when considering pyterm as an interactive program with user interaction. I personally prefer fail fast but if another maintainer approves the change I do not care.

Just want to clarify it can clash with automated execution, as other things in pyterm already do (empty line does not send an empty line) so could lead to not using it for tests anymore.
murdock is already using picocom instead of pyterm for tests.

@miri64 miri64 removed the Community: Hack'n'ACK candidate This PR is a candidate for review and discussion during one of RIOT's monthly Hack'n'ACK parties label Jan 29, 2019
@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@miri64 miri64 removed the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@miri64 miri64 force-pushed the pyterm/enh/try-reconnect-on-error branch from 43506cb to 86c2a61 Compare September 10, 2019 14:41
@miri64
Copy link
Member Author

miri64 commented Sep 10, 2019

Rebased and inspired by #12120 I made the behavior configurable (however with reconnecting as the new default). @cladmi can you live with that?

@cladmi
Copy link
Contributor

cladmi commented Sep 10, 2019

If used in conjunction with disabling it for tests with #12107 or equivalent yes. It would prevent introducing test failing with a wrong reason, but allow a different user behavior.

@miri64
Copy link
Member Author

miri64 commented Sep 10, 2019

Ok, then @aabadie can you give it another look. I promise to provide the proper flags for cleanterm once either is merged.

Copy link
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested this one and it doesn't work in the current state. See my comment below.

@fjmolinas
Copy link
Contributor

@aabadie your comments seem to have been addressed on this one.

@miri64 miri64 force-pushed the pyterm/enh/try-reconnect-on-error branch from e51e7be to dada82a Compare March 11, 2020 13:53
@miri64
Copy link
Member Author

miri64 commented Mar 11, 2020

Squashed in preparation for a rebase to include config for #12107.

@miri64 miri64 force-pushed the pyterm/enh/try-reconnect-on-error branch from dada82a to 09c05ff Compare March 11, 2020 13:57
@miri64 miri64 requested a review from fjmolinas as a code owner March 11, 2020 13:57
@miri64
Copy link
Member Author

miri64 commented Mar 11, 2020

Squashed in preparation for a rebase to include config for #12107.

Rebased and adapted to #12107

@miri64 miri64 added CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Mar 11, 2020
@miri64
Copy link
Member Author

miri64 commented Mar 11, 2020

Failing tests seem to be the same failing in the nightlies...

@miri64 miri64 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Mar 11, 2020
@miri64
Copy link
Member Author

miri64 commented Mar 17, 2020

Ping @aabadie? Your comments were addressed.

@aabadie
Copy link
Contributor

aabadie commented Mar 18, 2020

I'm trying this PR but I can't reproduce the original error (Device or resource busy message).

@fjmolinas
Copy link
Contributor

I'm trying this PR but I can't reproduce the original error (Device or resource busy message).

I get it alot, I'lk try to reproduce.

@fjmolinas
Copy link
Contributor

Tested rebased on #12304 (the issue shows up a lot form there):

BOARD=arduino-mkrfox1200 make -C tests/xtimer_usleep reset term
make: Entering directory '/home/francisco/workspace/RIOT/tests/xtimer_usleep'
stty -F /dev/ttyACM0 raw ispeed 600 ospeed 600 cs8 -cstopb ignpar eol 255 eof 255 
sleep 2
/home/francisco/workspace/RIOT/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200" 
2020-03-18 09:50:56,262 # Connect to serial port /dev/ttyACM0
2020-03-18 09:50:56,262 # could not open port /dev/ttyACM0: [Errno 16] Device or resource busy: '/dev/ttyACM0'
2020-03-18 09:50:56,263 # Trying to reconnect to /dev/ttyACM0 in 10 sec
2020-03-18 09:51:06,272 # could not open port /dev/ttyACM0: [Errno 16] Device or resource busy: '/dev/ttyACM0'
2020-03-18 09:51:06,272 # Trying to reconnect to /dev/ttyACM0 in 10 sec
Welcome to pyterm!
Type '/exit' to exit.
help
2020-03-18 09:51:24,511 # Help: Press s to start test, r to print it is ready
2020-03-18 09:51:24,512 # Help: Press s to start test, r to print it is ready
2020-03-18 09:51:24,513 # Help: Press s to start test, r to print it is ready
2020-03-18 09:51:24,514 # Help: Press s to start test, r to print it is ready

Copy link
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are good. As reported by @fjmolinas, the new behavior is working well. I couldn't find any regression when testing locally either.

ACK and go

@aabadie aabadie merged commit 8814e60 into RIOT-OS:master Mar 18, 2020
@miri64 miri64 deleted the pyterm/enh/try-reconnect-on-error branch March 18, 2020 10:24
@miri64
Copy link
Member Author

miri64 commented Mar 18, 2020

Thanks for the review and testing!

@leandrolanzieri leandrolanzieri added this to the Release 2020.04 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: tools Area: Supplementary tools CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants