suprsync: Add error handling to each database transaction #973
Description
This PR adds error handling for all database transactions within the `run` process. If an error occurs, the agent waits for 5 seconds before continuing at the top of the process loop and retrying all operations.
I believe this is a safe operation, but it'd be good to get a second opinion. It may reattempt a file transfer (if files were transferred but couldn't be marked as such), but that should be fine.
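For reviewers, here is a minimal sketch of the retry pattern described above. It is not the agent code itself: `run_loop`, `do_transactions`, and the `completed` counter are hypothetical names, and it catches the standard library's `sqlite3.OperationalError` as a stand-in for whatever exception class the agent's database layer actually raises.

```python
import sqlite3
import time


def run_loop(do_transactions, counters, iterations, retry_wait=5, sleep=time.sleep):
    """Hypothetical stand-in for the agent's process loop.

    do_transactions: callable performing all database work for one pass.
    counters: dict of error/success counts (analogous to session data).
    """
    for _ in range(iterations):
        try:
            do_transactions()
        except sqlite3.OperationalError:
            # Count the failure so its frequency is visible, then wait
            # and restart from the top of the loop, retrying everything.
            counters["errors_sqlite"] = counters.get("errors_sqlite", 0) + 1
            sleep(retry_wait)
            continue
        counters["completed"] = counters.get("completed", 0) + 1
    return counters
```

Because the whole pass is retried, any step that already succeeded (such as a file copy) runs again, which is why the operations need to be safe to repeat.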
I added `errors_sqlite` to the list of errors in the counters stat, stored in session data, so we can see how often this is happening.
I also added some comments, mainly to describe whether a write operation was happening within the called functions. (I had thought implementing #886 would have helped us here, but it's mostly writes. Only in the case of hitting a lock during `srfm.get_archive_stats()` will this actually help. That said, this is a relatively slow step, especially as the sqlite file grows.)
Motivation and Context
We've seen various `OperationalError` messages in the suprsync agent, which can occur at any point in the process that interacts with the database. This is because the Pysmurf Monitor agent also writes to the database file. This should fix the regularly occurring crashes in the suprsync agents on site.
Resolves #483.
Resolves #874.
How Has This Been Tested?
This branch was run on the E2E testing system. Timestreams were generated with the SMuRF file emulator and then manually added to the suprsync database.
The timestream suprsync agent then picked up and copied the files:
I don't really have a good method for testing the database lock and corresponding handling. But I'm satisfied that normal behavior works. Ideas for testing certainly welcome.
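One possible way to exercise the lock handling, offered as a sketch and not something that was run against this branch: hold an exclusive lock on a SQLite file from one connection and attempt a write from a second one. The example uses only the standard `sqlite3` module; `provoke_lock` and the `files` table are hypothetical and unrelated to the real suprsync schema.

```python
import os
import sqlite3
import tempfile


def provoke_lock(db_path):
    """Return the error message from writing to a locked database, or None."""
    # isolation_level=None disables implicit transactions so that the
    # explicit BEGIN EXCLUSIVE below is the only open transaction.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("CREATE TABLE IF NOT EXISTS files (path TEXT)")
    holder.execute("BEGIN EXCLUSIVE")  # take and hold the lock

    # timeout=0 makes the second connection fail immediately
    # instead of waiting for the lock to clear.
    writer = sqlite3.connect(db_path, timeout=0)
    try:
        writer.execute("INSERT INTO files VALUES ('a.dat')")
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        writer.close()
        holder.execute("ROLLBACK")
        holder.close()
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(provoke_lock(os.path.join(tmp, "test.db")))
```

A test along these lines could hold the lock while the agent's loop runs one pass, then release it and check that the next pass succeeds.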
Types of changes
Checklist: