SdFat file not clearing, publishing stops after getting too backed up #9

@atwalsh

I'm having trouble with the SdFat implementation of the library after using the queue for long periods of time. I am collecting data at fairly fast rates: at most one message queued every 1.3 seconds, but on average closer to one message every 5 seconds when active, and less frequently during slower publish periods. The device I am using has good cellular reception (>= 70% signal strength).

After publishing with the queue for 3-4 weeks, I noticed that my device stopped publishing data to the Cloud. I then added some log messages and noticed the following:

  • header.size = 65528, i.e. 8 below 65536, so almost a maxed-out uint16_t (maximum 65535).
  • oldestPos = 8
  • header.numEvents was around 6000 when I became aware of this problem, but steadily increased as more calls to queue.publish() were made
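
To make the arithmetic behind these values concrete, here is a minimal sketch. It assumes that header.size really is stored as a uint16_t, which the logged value suggests but which I have not confirmed against the library source. Both helper names are hypothetical and not part of the library.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not library code): how many bytes remain before a
// 16-bit size field wraps around to 0.
constexpr uint32_t bytesUntilWrap(uint16_t size) {
    return 65536u - size;
}

// Hypothetical helper (not library code): true if growing a 16-bit size
// field by `extra` bytes would exceed the uint16_t maximum (65535).
constexpr bool wouldOverflow16(uint16_t size, uint32_t extra) {
    return static_cast<uint32_t>(size) + extra >
           std::numeric_limits<uint16_t>::max();
}
```

Under that assumption, a size of 65528 leaves room for only 7 more bytes before the field can no longer represent the true size, which would be consistent with the size staying pinned while numEvents keeps growing.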

Info about my application:

  • I'm sharing the SD SPI with another IC. (“SPI Transactions” are enabled)
  • The SD card is full of both published and unpublished messages (I'm storing published messages on a server). When I first pulled the SD card from the device, the events.dat file held almost 74 MB of messages.

At this point, publishing had ceased. Modifying this else statement as shown below allowed the queue to clear out messages over time.

else {
    pubqLogger.info("DISCARD!");
    // There must be some number of events in the queue, even though lookup failed
    if (this->getNumEvents() > 5)
        discardOldEvents(false);
}

The DISCARD! text was printed 8 times after flashing the new code to my device.

It appears that the associated if statement was evaluating to false because the lookup was returning NULL at the following check:

if (header.size >= header.numEvents) {
    return NULL;
}

I've been running the same application with the updated else statement above for about 5 days, and data appears to be streaming in properly, but the updated else seems like a bad workaround. I'm curious if you have any suggestions on where else I could add logging statements to help debug the issue?

My initial thought is that frequent SPI communication with the other IC (every 50 ms), while the queue is also using SPI to access the SD card, could occasionally result in malformed or missing data in the queue's event file. I am considering a shared locking mechanism to prevent this.
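
As a rough illustration of the shared-lock idea, the sketch below wraps every SPI user in the same mutex with an RAII guard. It uses only the C++ standard library. The function names, the counter, and the busInUse flag are hypothetical stand-ins for the real SD card and IC access code, and whether the platform or SdFat already provides an equivalent lock is something I have not verified.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical: a single mutex shared by every user of the SPI bus.
static std::mutex spiBusMutex;

// Bookkeeping for the sketch only; both are guarded by spiBusMutex.
static bool busInUse = false;
static int completedTransactions = 0;

// Stand-in for the queue writing an event to the SD card.
void writeEventToSd() {
    std::lock_guard<std::mutex> lock(spiBusMutex);  // released on return
    assert(!busInUse);   // nobody else may be mid-transaction
    busInUse = true;
    // ... real SD card access over SPI would go here ...
    ++completedTransactions;
    busInUse = false;
}

// Stand-in for the 50 ms read from the other IC on the same bus.
void readOtherIc() {
    std::lock_guard<std::mutex> lock(spiBusMutex);
    assert(!busInUse);
    busInUse = true;
    // ... real IC access over SPI would go here ...
    ++completedTransactions;
    busInUse = false;
}
```

If the queue does its SD access from a different thread than the IC reads, each side would call its own function, and the guard would make one wait until the other's transaction has finished. This only helps if every code path that touches the bus takes the same lock, including any access made inside the library.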
