Skip to content

tests: add a simple proof of concept test for gnrc_sock + gnrc_neterr#12678

Merged
miri64 merged 1 commit intoRIOT-OS:masterfrom
miri64:tests/enh/gnrc_sock+gnrc_neterr
Nov 12, 2019
Merged

tests: add a simple proof of concept test for gnrc_sock + gnrc_neterr#12678
miri64 merged 1 commit intoRIOT-OS:masterfrom
miri64:tests/enh/gnrc_sock+gnrc_neterr

Conversation

@miri64
Copy link
Member

@miri64 miri64 commented Nov 8, 2019

Contribution description

This introduces a simple proof-of-concept test that GNRC's sock implementation still works with gnrc_neterr compiled in and that errors reported by the stack are handed to the sock object.

Testing procedure

Confirm that EHOSTUNREACH is not reported by gnrc_sock itself:

git grep EHOSTUNREACH -- sys/net/gnrc/sock

Run the test on a board of your choice:

make -C tests/gnrc_sock_neterr flash test

Issues/PRs references

Issue popped up in #12625, but overall unrelated.

@miri64 miri64 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: network Area: Networking Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR labels Nov 8, 2019
@miri64 miri64 requested review from benpicco and cgundogan November 8, 2019 19:26
@miri64 miri64 force-pushed the tests/enh/gnrc_sock+gnrc_neterr branch 2 times, most recently from 46565c4 to 1f3ebcc Compare November 8, 2019 19:54
@benpicco
Copy link
Contributor

benpicco commented Nov 9, 2019

When I add USEMODULE += gnrc_netdev_default to the Makefile and run this on actual hardware (e.g. samr21-xpro), I get

2019-11-09 18:07:58,169 # main(): This is RIOT! (Version: 2020.01-devel-578-ge4f1b)
2019-11-09 18:07:58,174 # FAILURE: gnrc_udp_send() had an unexpected error code: 8

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

When I add USEMODULE += gnrc_netdev_default to the Makefile and run this on actual hardware (e.g. samr21-xpro), I get

2019-11-09 18:07:58,169 # main(): This is RIOT! (Version: 2020.01-devel-578-ge4f1b)
2019-11-09 18:07:58,174 # FAILURE: gnrc_udp_send() had an unexpected error code: 8

Yes, because the packet is actually send in that case (there is no ND for link-local addresses
with 6LoWPAN hosts, address resolution is done by reversing the EUI-64 to IID algorithm). If you change the test parameters, the test fails of course ;-).

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

(you can easily confirm this with a sniffer)

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

If you change to a global address it should read (on native):

main(): This is RIOT! (Version: 2020.01-devel-579-g1f3eb-tests/enh/gnrc_sock+gnrc_neterr)
FAILURE: gnrc_udp_send() had an unexpected error code: -101
$ errno 101
ENETUNREACH 101 Network is unreachable

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

(only of course, if you don't configure a route to that global address beforehand :P)

@benpicco
Copy link
Contributor

benpicco commented Nov 9, 2019

Hm that's what I tried too:

--- a/tests/gnrc_sock_neterr/main.c 	2019-11-09 21:50:10.033931663 +0100
+++ b/tests/gnrc_sock_neterr/main.c	2019-11-09 21:56:06.783147677 +0100
@@ -22,7 +22,7 @@
 #include "net/sock/udp.h"
 
 #define TEST_PORT               (38664U)
-#define TEST_REMOTE             { 0xfe, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
+#define TEST_REMOTE             { 0x20, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, \
                                   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02 }
 #define TEST_PAYLOAD            { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef }
 
@@ -44,6 +44,10 @@
     memcpy(remote.addr.ipv6, _test_remote, sizeof(_test_remote));
     remote.port = TEST_PORT - 1;
 
+    char addr_str[IPV6_ADDR_MAX_STR_LEN];
+    ipv6_addr_to_str(addr_str,(ipv6_addr_t*) &remote.addr.ipv6, sizeof(addr_str));
+    printf("sending %d bytes to %s\n", sizeof(_test_payload), addr_str);
+
     /* remote is not reachable, so it should return an error */
     res = sock_udp_send(&_udp_sock, _test_payload, sizeof(_test_payload), &remote);
     if (res == -EHOSTUNREACH) {
2019-11-09 21:56:12,364 # sending 8 bytes to 2001::2
2019-11-09 21:56:12,369 # FAILURE: gnrc_udp_send() had an unexpected error code: 8

No routes or border routers anywhere to be seen here.

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

Again with changed module behavior?

@miri64
Copy link
Member Author

miri64 commented Nov 9, 2019

It may be, that you are just triggering a condition, where it reaches a normal gnrc_pktbuf_release(), so that is why it is noted as success. Again, this is just a simple proof of concept, that packets released with gnrc_pktbuf_release_error() are reported as such. On native (so with classic ND) it works.

@benpicco
Copy link
Contributor

So the expected behavior is to get -EHOSTUNREACH even on real hardware (when a non-reachable global address is used).

Then this test would indeed have uncovered a bug - even more reason to have it!

@miri64
Copy link
Member Author

miri64 commented Nov 10, 2019

Then this test would indeed have uncovered a bug - even more reason to have it!

Yepp, but fixing this bug is the task of another PR. I will amend the test however for the second check.

@miri64
Copy link
Member Author

miri64 commented Nov 10, 2019

The bug is caused by a number of factors playing together btw:

  1. The function gnrc_neterr_reg() just writes the subscriber to the first snip:
    static inline int gnrc_neterr_reg(gnrc_pktsnip_t *pkt)
    {
    if (pkt->err_sub != KERNEL_PID_UNDEF) {
    return EALREADY;
    }
    pkt->err_sub = sched_active_pid;
    return 0;
    }
  2. Compiling in a network device, causes a netif to exist here
    #if GNRC_NETIF_NUMOF == 1
    /* set interface implicitly */
    gnrc_netif_t *netif = gnrc_netif_iter(NULL);
    if (netif != NULL) {
    out->netif = netif->pid;
    }
    #endif
    which sets the interface for the sock implicitly which causes a NETIF header to be generated here
    if (iface != KERNEL_PID_UNDEF) {
    gnrc_pktsnip_t *netif = gnrc_netif_hdr_build(NULL, 0, NULL, 0);
    gnrc_netif_hdr_t *netif_hdr;
    if (netif == NULL) {
    gnrc_pktbuf_release(pkt);
    return -ENOMEM;
    }
    netif_hdr = netif->data;
    netif_hdr->if_pid = iface;
    LL_PREPEND(pkt, netif);
    }
  3. Said NETIF header is removed by the IPv6 send function so it does not have to be checked for correctness beyond the interface
    /* discard to avoid complex checks for correctness (will be re-added
    * with correct addresses anyway as for the case were there is no
    * netif header provided)
    * Also re-establish temporary pointer used for write protection as
    * actual pointer */
    pkt = gnrc_pktbuf_remove_snip(tmp_pkt, tmp_pkt);
    .

My proposed fix is

  1. Set the status report subscription for all snips of a packet in gnrc_neterr_reg() (if any snip is already set return EALREADY).
  2. Only send the status report in gnrc_neterr_report() when (pkt->err_sub != KERNEL_PID_UNDEF) && ((pkt->next == NULL) || (pkt->next->err_sub != pkt->next->err_sub))

@miri64
Copy link
Member Author

miri64 commented Nov 10, 2019

My proposed fix is

1. Set the status report subscription for all snips of a packet in `gnrc_neterr_reg()` (if any snip is already set return `EALREADY`).

2. Only send the status report in [`gnrc_neterr_report()`](https://github.com/RIOT-OS/RIOT/blob/e4f1b67612ef2458e2262301c665ea47415f552d/sys/include/net/gnrc/neterr.h#L52-L62) when `(pkt->err_sub != KERNEL_PID_UNDEF) && ((pkt->next == NULL) || (pkt->next->err_sub != pkt->next->err_sub))`

Mh... no, that doesn't fix the problem, as gnrc_pktbuf_remove_snip() isolates the NETIF hdr snip. So we have to subscribe to the packet snips explicitly.

@miri64
Copy link
Member Author

miri64 commented Nov 10, 2019

Yepp, that fixes it :-).

miri64 added a commit to miri64/RIOT that referenced this pull request Nov 10, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
miri64 added a commit to miri64/RIOT that referenced this pull request Nov 10, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
miri64 added a commit to miri64/RIOT that referenced this pull request Nov 10, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
miri64 added a commit to miri64/RIOT that referenced this pull request Nov 10, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
@miri64
Copy link
Member Author

miri64 commented Nov 10, 2019

See #12682


memcpy(remote.addr.ipv6, _test_link_local_remote,
sizeof(_test_link_local_remote));
_test_udp_send(&remote, "EHOSTUNREACH", EHOSTUNREACH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something like

Suggested change
_test_udp_send(&remote, "EHOSTUNREACH", EHOSTUNREACH);
#ifdef MODULE_GNRC_SIXLOWPAN
_test_udp_send(&remote, "OK", -sizeof(_test_payload));
#else
_test_udp_send(&remote, "EHOSTUNREACH", EHOSTUNREACH);
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect anyone to test this this way in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view the code in examples/ and tests/ as extended documentation. I suppose I'm not the only one who does that.

So when I want to understand how to use the gnrc_neterr module, I would look at tests/gnrc_sock_neterr and conclude that sending a datagram to a non-existing link-local address would always generate an error - after all, there is a testcase to ensure it is so.
It wouldn't be an unreasonable thing to assume if ACK_REQ is enabled.

So at least a comment would be appropriate that with 6LoWPAN the behaviour is different, referencing RFC 6775.

Copy link
Member Author

@miri64 miri64 Nov 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when I want to understand how to use the gnrc_neterr module, I would look at tests/gnrc_sock_neterr and conclude that sending a datagram to a non-existing link-local address would always generate an error - after all, there is a testcase to ensure it is so.

Again, if you change the test parameters (adding a device that acts differently under IPv6) you may get different results.

It wouldn't be an unreasonable thing to assume if ACK_REQ is enabled.

I don't get this comment... what has an option in the MAC layer to do with error handling in the application layer? The sock API is not there to ensure the packet reached the next host. This is also not how sockets in POSIX work. If you want reliability there you have to configure the socket as such (e.g. by using TCP).
I will however adapt the test. add a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will however adapt the test. add a comment.

The #ifdef MODULE_GNRC_SIXLOWPAN is the wrong macro. It should rather check if the netif the packet is sent over is a 6LN. This however is getting too complicated for what is supposed to be a simple test, just for the case that somebody gets the idea of changing test parameters around (btw if GNRC_NETIF_NUMOF>1 the behavior for link-local addresses changes yet again!) and expecting the same results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this comment... what has an option in the MAC layer to do with error handling in the application layer? The sock API is not there to ensure the packet reached the next host. This is also not how sockets in POSIX work.

I'm just saying that someone not familiar with the implementation might assume the lower layer only reports success then if a link-layer ACK was received.
But the comment should help clear that up.

miri64 added a commit to miri64/RIOT that referenced this pull request Nov 11, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test demonstrates the use of gnrc_neterr and helped uncover a bug in it, so even more reason to have it.
Please squash.

@miri64 miri64 force-pushed the tests/enh/gnrc_sock+gnrc_neterr branch from 2f367f9 to 6b50db1 Compare November 12, 2019 10:45
@miri64
Copy link
Member Author

miri64 commented Nov 12, 2019

Squashed.

@miri64 miri64 merged commit b546540 into RIOT-OS:master Nov 12, 2019
@miri64 miri64 deleted the tests/enh/gnrc_sock+gnrc_neterr branch November 12, 2019 18:29
@fjmolinas fjmolinas added this to the Release 2020.01 milestone Dec 13, 2019
gdoffe pushed a commit to gdoffe/RIOT that referenced this pull request Dec 17, 2019
As analyzed in RIOT-OS#12678 there are cases where different reports can be
generated for the different snips of the packet send via the `sock`.

To catch all errors generated by the stack, the sock has to subscribe
for all snips of the packet sent. If any of the snips reports an error
distinct from `GNRC_NETERR_SUCCESS` or the previous one, we report that
status instead of just the first we receive. This way we are ensured to
have the first error reported by the stack for the given packet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: network Area: Networking Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants