Skip to content
This repository was archived by the owner on Oct 20, 2024. It is now read-only.

Conversation

@eunikolsky
Copy link

A crash report of an application using PocketSocket shows this
stacktrace:

EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000030
Crashed: PSWebSocket <http://1.2.3.4:5/path>
0  CoreFoundation                 0x88885e9e0 _CFStreamScheduleWithRunLoop + 60
1  CoreFoundation                 0x8888dda34 _CFStreamSetDispatchQueue + 236
2  PocketSocket                   0x888803114 -[PSWebSocket connect] (PSWebSocket.m:341)
3  libdispatch.dylib              0x888818aa0 _dispatch_call_block_and_release + 24
…

Line 341 is if(_inputStream.streamStatus == NSStreamStatusNotOpen) {
it's strange that it would call CFStreamSetDispatchQueue, so the line
number is probably incorrect. The crash could be reproduced by
foregrounding and backgrounding the app quickly (in ~1 second) a few
times; it happened on line 338: CFWriteStreamSetDispatchQueue((__bridge CFWriteStreamRef)_outputStream, _workQueue);, which seems more correct.
The analysis of the disassembled _CFStreamScheduleWithRunLoop showed
the line 0x1fbc01e58 <+60>: ldr x28, [x19, #0x30] and the
exception was Thread 35: EXC_BAD_ACCESS (code=1, address=0x30), so the
x19 register must be zero. x19 is set to x0 a few instructions
above, where x0 is the first argument (according to the "Procedure
Call Standard for the ARM 64-bit Architecture"
). This pointed to
and confirmed the problem: CFWriteStreamSetDispatchQueue crashes when
the stream parameter is nil. So when the socket is disconnected in
the middle of -connect, _inputStream and/or _outputStream become
nil and cause the crash.

The fix is to check if the streams are no longer valid, and if so, stop.
To avoid a race condition when the streams may become nil right after
the check, they are retained strongly for the rest of the -connect
method.

A unit test to reproduce the crash is possible (setup a server socket,
then connect to and disconnect from it without a delay multiple times
(~500) in a loop), but not included here because:

  • it's very unreliable and ugly — reproduced this crash in ~5% of runs
    and required as much CPU usage as possible to slow down the socket
    processing to mess with the concurrency;

  • it caused at least three other types of crashes during the stress
    test;

  • it required changes in the -connect method to add artificial delays
    to force the crash.

A crash report of an application using `PocketSocket` shows this
stacktrace:

```
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000030
Crashed: PSWebSocket <http://1.2.3.4:5/path>
0  CoreFoundation                 0x88885e9e0 _CFStreamScheduleWithRunLoop + 60
1  CoreFoundation                 0x8888dda34 _CFStreamSetDispatchQueue + 236
2  PocketSocket                   0x888803114 -[PSWebSocket connect] (PSWebSocket.m:341)
3  libdispatch.dylib              0x888818aa0 _dispatch_call_block_and_release + 24
…
```

Line 341 is `if(_inputStream.streamStatus == NSStreamStatusNotOpen) {` —
it's strange that it would call `CFStreamSetDispatchQueue`, so the line
number is probably incorrect. The crash could be reproduced by
foregrounding and backgrounding the app quickly (in ~1 second) a few
times; it happened on line 338: `CFWriteStreamSetDispatchQueue((__bridge
CFWriteStreamRef)_outputStream, _workQueue);`, which seems more correct.
The analysis of the disassembled `_CFStreamScheduleWithRunLoop` showed
the line `0x1fbc01e58 <+60>:   ldr    x28, [x19, #0x30]` and the
exception was `Thread 35: EXC_BAD_ACCESS (code=1, address=0x30)`, so the
`x19` register must be zero. `x19` is set to `x0` a few instructions
above, where `x0` is the first argument (according to the ["Procedure
Call Standard for the ARM 64-bit Architecture"][arm64]). This pointed to
and confirmed the problem: `CFWriteStreamSetDispatchQueue` crashes when
the `stream` parameter is `nil`. So when the socket is disconnected in
the middle of `-connect`, `_inputStream` and/or `_outputStream` become
`nil` and cause the crash.

The fix is to check if the streams are no longer valid, and if so, stop.
To avoid a race condition when the streams may become `nil` right after
the check, they are retained strongly for the rest of the `-connect`
method.

A unit test to reproduce the crash is possible (setup a server socket,
then connect to and disconnect from it without a delay multiple times
(~500) in a loop), but not included here because:

* it's very unreliable and ugly — reproduced this crash in ~5% of runs
and required as much CPU usage as possible to slow down the socket
processing to mess with the concurrency;

* it caused at least three other types of crashes during the stress
test;

* it required changes in the `-connect` method to add artificial delays
to force the crash.

[arm64]: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant