If one worker is fast and another is slow, the fast worker can loop several times during a single I/O callback. Each pass calls mStartWorkingSemaphore.wait() and decrements the semaphore again, consuming signals that were meant for the slower workers. I've verified that this occurs when using more than two threads on an iPhone.
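This can be fixed by alternating between two start-working semaphores from one callback to the next. A worker that finishes early then blocks on the other semaphore, which is not signaled until the next callback.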
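
Here is a minimal sketch of the alternating scheme, assuming a pool in which the I/O callback signals the start semaphore once per worker and then waits for every worker to report completion. Only mStartWorkingSemaphore comes from the description above; Semaphore, WorkerPool, runCycle, mDoneSemaphore and the other names are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (C++11). A stand-in for whatever
// semaphore type the real code uses.
class Semaphore {
public:
    void signal() {
        std::lock_guard<std::mutex> lock(mMutex);
        ++mCount;
        mCondition.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondition.wait(lock, [this] { return mCount > 0; });
        --mCount;
    }
private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    int mCount = 0;
};

// Hypothetical worker pool; the layout is an assumption, not the original code.
class WorkerPool {
public:
    explicit WorkerPool(int workerCount)
        : mWorkerCount(workerCount), mCyclesRun(workerCount, 0) {
        for (int i = 0; i < workerCount; ++i)
            mThreads.emplace_back([this, i] { workerLoop(i); });
    }
    ~WorkerPool() {
        mQuit = true;
        // Wake every worker on the semaphore it is currently blocked on.
        for (int i = 0; i < mWorkerCount; ++i)
            mStartWorkingSemaphore[mCycle % 2].signal();
        for (auto& t : mThreads) t.join();
    }
    // Call once per I/O callback.
    void runCycle() {
        Semaphore& start = mStartWorkingSemaphore[mCycle % 2];
        ++mCycle;  // the next callback uses the other semaphore
        for (int i = 0; i < mWorkerCount; ++i) start.signal();
        for (int i = 0; i < mWorkerCount; ++i) mDoneSemaphore.wait();
    }
    const std::vector<int>& cyclesRun() const { return mCyclesRun; }

private:
    void workerLoop(int index) {
        unsigned cycle = 0;  // each worker tracks the parity itself
        for (;;) {
            mStartWorkingSemaphore[cycle % 2].wait();
            ++cycle;
            if (mQuit) return;
            ++mCyclesRun[index];  // placeholder for the real per-callback work
            mDoneSemaphore.signal();
        }
    }

    const int mWorkerCount;
    std::vector<int> mCyclesRun;          // callbacks handled, per worker
    Semaphore mStartWorkingSemaphore[2];  // even and odd callbacks
    Semaphore mDoneSemaphore;
    std::atomic<bool> mQuit{false};
    unsigned mCycle = 0;
    std::vector<std::thread> mThreads;
};
```

In this sketch a worker that finishes early loops around and waits on the semaphore for the next callback, which still has a count of zero, so it cannot take a second signal from the current callback. The semaphore it waits on is only signaled after runCycle has collected a completion from every worker, so a fast worker can never get more than one callback ahead.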