gnrc_sock_tcp: add gnrc sock tcp#16494
gnrc_sock_tcp: add gnrc sock tcp#16494benpicco merged 1 commit intoRIOT-OS:masterfrom brummer-simon:gnrc_sock_tcp-add_gnrc_sock_tcp
Conversation
|
Now this also needs a rebase 😉 You want to fix the last commit message while you are at it:
|
|
Done |
benpicco
left a comment
There was a problem hiding this comment.
Looks good! If you use the existing types intended for implementing this you should be able to get rid of this casting show 😉
| if (err) { | ||
| printf("%s: returns %s\n", name, strerror(err)); | ||
| } else { | ||
| printf("%s: returns %d\n", name, err); | ||
| } |
There was a problem hiding this comment.
No need for this. strerror(0) returns "Success"
There was a problem hiding this comment.
Not on every platform ;). And the python code calling the functions matches for the specific result. Thats why I handle 0 manually.
There was a problem hiding this comment.
BTW.: You added an implementation returning OK on 0, a few hours ago. :)
There was a problem hiding this comment.
arg sorry for that - feel free to change to return value to "Success" so you can get rid of this adapter function
There was a problem hiding this comment.
I prefer to keep it as it is. I am not sure if the returned string is always "Success" on all platforms. For any error code the return value should always be textual representation of the enum value but for 0 different strings are possible. Since this is test code I think keeping the tiny function is okay.
| dump_args(argc, argv); | ||
|
|
There was a problem hiding this comment.
| dump_args(argc, argv); | |
| if (argc < 3) { | |
| printf("usage: %s <host> <port>\n", argv[0]); | |
| return -1; | |
| } |
let's be a bit more user friendly. Right now if you just call gnrc_tcp_open expecting to get usage information, the test will simply crash on you.
There was a problem hiding this comment.
Also when I try
> gnrc_tcp_open fe80::90a7:a6ff:fe4b:2e32 12345
gnrc_tcp_open: argc=3, argv[0] = gnrc_tcp_open, argv[1] = fe80::90a7:a6ff:fe4b:2e32, argv[2] = 12345
gnrc_tcp_open: returns -EAFNOSUPPORT
> gnrc_tcp_open fe80::90a7:a6ff:fe4b:2e32%5 12345
gnrc_tcp_open: argc=3, argv[0] = gnrc_tcp_open, argv[1] = fe80::90a7:a6ff:fe4b:2e32%5, argv[2] = 12345
gnrc_tcp_open: returns -EAFNOSUPPORT
There was a problem hiding this comment.
I am not sure if user-friendlyness should be is the design goal for a test API only called from python code.
For the -EAFNOSUPPORT thing I need to take a deeper look. This test should only expose sock_connect. Did you run this agains the gnrc_tcp tests instead of the gnrc_sock_tests that are the subject of this PR?
There was a problem hiding this comment.
Oh sorry I ran tests/gnrc_tcp instead of tests/gnrc_sock_tcp 😅
I also figured out the format
> sock_tcp_connect [fe80::90a7:a6ff:fe4b:2e32]:12345 6666
sock_tcp_connect: argc=3, argv[0] = sock_tcp_connect, argv[1] = [fe80::90a7:a6ff:fe4b:2e32]:12345, argv[2] = 6666
sock_tcp_connect: returns Unknown error -110
I wanted to run this against netcat since I expected it to behave very much like netcat, just with more manual steps
There was a problem hiding this comment.
Ah I have to specify the interface as well
> sock_tcp_connect [fe80::90a7:a6ff:fe4b:2e32%5]:12345 7412
sock_tcp_connect: argc=3, argv[0] = sock_tcp_connect, argv[1] = [fe80::90a7:a6ff:fe4b:2e32%5]:12345, argv[2] = 7412
sock_tcp_connect: returns 0
Have you considered using netutils_get_ipv6() in gnrc_tcp_ep_from_str() 😉
There was a problem hiding this comment.
Hmm since we use strerror, most of the used errorcodes seem to be Unknown :(
I have not considered netutils_get_ipv6(), but I would argue that this is not in scope of this PR. Cleaning this up should happen as a seperate change. This PR is for the interface between gnrc and sock, not changes in gnrc.
There was a problem hiding this comment.
Hmm since we use strerror, most of the used errorcodes seem to be Unknown :(
That's because strerror expects positive error codes
With this
--- a/tests/gnrc_sock_tcp/main.c
+++ b/tests/gnrc_sock_tcp/main.c
@@ -36,7 +36,7 @@ void dump_args(int argc, char **argv)
void print_result(const char *name, int err)
{
if (err) {
- printf("%s: returns %s\n", name, strerror(err));
+ printf("%s: returns %s\n", name, strerror(-err));
} else {
printf("%s: returns %d\n", name, err);
}I get
sock_tcp_read: returns Resource temporarily unavailable
instead of
sock_tcp_read: returns Unknown error -11
I have not considered netutils_get_ipv6(), but I would argue that this is not in scope of this PR. Cleaning this up should happen as a seperate change.
I agree. But we still should make the CLI interface of the test a bit nicer (print usage instead of crashing if argument count does not match, maybe even use default timeout / local port if the argument was omitted).
There is value in being able to test out individual bits of the API in an exploitative way, it shouldn't be a frustrating experience.
tests/gnrc_sock_tcp/main.c
Outdated
| int timeout = atol(argv[1]); | ||
| unsigned to_receive = atol(argv[2]); |
There was a problem hiding this comment.
Would be nice if those were swapped
| int timeout = atol(argv[1]); | |
| unsigned to_receive = atol(argv[2]); | |
| int timeout = argv > 2 ? atol(argv[2]) : 0; | |
| unsigned to_receive = atol(argv[1]); |
But I actually get stuck in a loop here if I try to read more than what's in the buffer:
> sock_tcp_read 100 10
sock_tcp_read: argc=3, argv[0] = sock_tcp_read, argv[1] = 100, argv[2] = 10
sock_tcp_read: received 10 It's worki
> sock_tcp_read 0 10
sock_tcp_read: argc=3, argv[0] = sock_tcp_read, argv[1] = 0, argv[2] = 10
sock_tcp_read: returns Unknown error -11
sock_tcp_read: returns Unknown error -11
sock_tcp_read: returns Unknown error -11
…
There was a problem hiding this comment.
Without a timeout, you basically trigger non-blocking behavior. If no data is available, -EAGAIN ist returned. This is not a fault of the implementation, the test just tries to continue to read until the required amount of data is available. The test assumes that amount of data that should be read, is actually sent.
There was a problem hiding this comment.
If you want to return here, you have to specify a non-zero timeout.
There was a problem hiding this comment.
Doesn't this contradict the description of SOCK_NO_TIMEOUT though?
There was a problem hiding this comment.
Hmm you might have a point here. I'll need to think about it in detail after my working hours ;)
|
I tried to build a simple echo server, but it will only accept the first connection: #include "net/sock/tcp.h"
#define SOCK_TCP_QUEUE_SIZE (1)
static sock_tcp_queue_t sock_queue;
static sock_tcp_t socks[SOCK_TCP_QUEUE_SIZE];
static sock_tcp_t *client;
static char echo_stack[THREAD_STACKSIZE_DEFAULT +
THREAD_EXTRA_STACKSIZE_PRINTF];
// HACK
#undef SOCK_NO_TIMEOUT
#define SOCK_NO_TIMEOUT (UINT32_MAX - 1)
static void *echo_thread(void *arg)
{
(void)arg;
static char rx_buf[64];
while (1) {
ssize_t res = sock_tcp_accept(&sock_queue, &client, SOCK_NO_TIMEOUT);
if (res) {
printf("accept error: %s\n", strerror(-res));
continue;
}
puts("connected");
while (1) {
res = sock_tcp_read(client, rx_buf, sizeof(rx_buf), SOCK_NO_TIMEOUT);
if (res == -ETIMEDOUT) {
continue;
}
if (res < 0) {
printf("read error: %s\n", strerror(-res));
break;
}
printf("%.*s", res, rx_buf);
res = sock_tcp_write(client, rx_buf, res);
}
sock_tcp_disconnect(client);
}
return NULL;
}
int echo_start(void)
{
sock_tcp_ep_t ep = SOCK_IPV6_EP_ANY;
ep.port = 23; // not htons(23); ?
int res = sock_tcp_listen(&sock_queue, &ep, socks, ARRAY_SIZE(socks), 0);
if (res) {
return res;
}
/* initiate echo server */
thread_create(echo_stack, sizeof(echo_stack),
THREAD_PRIORITY_MAIN - 1, THREAD_CREATE_STACKTEST,
echo_thread, NULL, "echo");
return 0;
}After I disconnect (I have to kill |
This might actually be standard behavior. Are you sure the sock_tcp_disconnect was finished before you try the next connection attempt? If you try to establish a new connection before sock_tcp_disconnect was finisched, the only TCB that could accept the connection is still in use. Depending on what side terminates the connection, sock_tcp_disconnect can take up to 2 minutes before it is usable again. This duration is defined by the TCP standard. TCP is not really designed for embedded use, full blown TCP implementations would just allocate a new TCB to handle the next connection. |
How do I know |
Yes it blocks. Internally it tries to close the connection. If the other side doesn't send anymore it still has to wait two minutes before it is unblocked. This has something todo with slow roundtrip times from the early internet. So if you try to open a new connection before sock_tcp_disconnect finished it must fail. The entire process is quite complicated to explain in a code review, if you wanna know details, the TCP RFC is your friend. For your test: just print after your disconnect and then call nc again. In addition: you can tweak all TCP Timouts or the number of preallocated buffers via kconfig |
|
The TCP sock API actually contains an example echo server. I would expect this to work with |
I haven't tried it but after the changes related to SOCK_NO_TIMEOUT, we should test it. |
|
please squash! (&rebase) |
Done. I fixed the NO_TIMOUT mapping and the asserts as well. |
|
Hmm the echo server example still does not behave as one would expect :/ main.cMakefile |
|
I managed to figure out whats happening here. In short: The example is broken. Here are more details: We see here interesting behavior: On entering Ctrl+C netcat actually tries to teardown the connection normally by sending a FIN that gets an ACK in reply. Receiving a FIN before sending our own FIN puts a TCP connection into a state called CLOSED_WAIT. This state means basically, our peer has sent all its data and excepts from us to send a FIN in return (aka. call close). The zero bytes read returns mean here: You have read all data and from this connection no more data will come. This behavior is not a bug in gnrc_tcp. Infact there was a bug (see #12367) a while ago that this behavior was not implemented this way. If you change: to It works as intended. |
|
That's good news (or maybe bad news as we have a bug in the example code 😕) I moved the example code to a proper example (#16739) and noticed something else: The example client will not disconnect when the example server disconnects: clientserverMaybe there is a bug in the client code too? |
Not necessarily. Depending on the timing in connection teardown, disconnect can take up to two minutes for its return. |
Hm I thought this was only the case if the other side does not properly disconnect? |
|
Retransmissions of FIN Packets in cases where the ACK sent in response was lost on the way. A TCP connection is practically dead if after two times the maximum segment lifetime no packet was transmitted. Linux handles this this internally and return fast after close and allocates a new TCB for the next connection. RIOT can't do this since everything is statically allocated. Thats why close/disconnect has to wait until the used TCB is reusable again. If you want more details look for TIME-WAIT in the TCP RFC. TCP connection teardown is pretty complicated |
|
But I'm testing with I don't see this issue with |
|
Okay. Let me explain the Issue in detail: At some point later B sends a FIN to A. Now A knows that B wants to terminate its side of the connection.
At this Point A never knows if B received the ACK because in either case B would not answer immediately. So if the so called LAST WAIT timer times out and B didn't send anything within the last to minutes -> A knows that B received the ACK before and closes its connection. If A receives a FIN from B triggered by its re-transmit mechanism, during its waiting period, A sends a new ACK, resets the timer and waits again. That's why you observe this waiting period regardless of the underlying network stability. TCP must assume the most unstable underlying connection (TCP was invented around 1981, just assume the network quality and roundtrip times back then ;)). I don't know how LWIP is handling this (maybe they use some hack like sending reset packages on connection teardown, who knows). I thought about how I implement it differently but all solutions I came up with, would enable easily exploitable memory leakage in the packet buffer that's why I didn't try this. You could of course shorten the maximum segment lifetime constant by defines. The gnrc_tcp tests do this to speedup the connection teardown process. |
|
A possible solution might be to calculate a better fitting and much shorter maximum segment lifetime dynamically based on to the mean round-trip time used by the re-transmission mechanism. This could speed things up, but this is definitely non-standard behavior and out of scope of this PR. |
benpicco
left a comment
There was a problem hiding this comment.
I'm not really happy yet as the explanation doesn't make the unexpected behavior go away or give a way for an application developer to mitigate that (other than always using timeouts).
However, this is indeed unrelated to the SOCK API adoption that now really just is a thin wrapper. So ACK!
|
@benpicco - I did some digging regarding that timeout. Current Linux and BSD Systems don't use the 2 Minute MSL anymore. They use one minute instead. We can to this too. This eases the pain a little bit. |
|
Interestingly enough this has been done already. The current MSL value is 30s. I think I'll take a lock into the dynamic MSL calculation thing. This might be a good but experimental optional solution. |
Contribution description
This PR adds gnrc_sock_tcp, finally adding a sock_tcp implementation based on gnrc_tcp.
Testing procedure
To test this PR a simplified version of the gnrc_tcp testsuite is supplied under tests/gnrc_sock_tcp.
To execute it, just follow the instructions there.
Issues/PRs references
Depends on PR #16459
Depends on PR #16461
Depends on PR #16493
Closes #16493 - TCP Sockets can not be used / built