mtda/storage/writer.py: 5 changes (3 additions, 2 deletions)
@@ -93,8 +93,9 @@ def flush(self, size):
         self._receiving = False
         self._size = size
 
-        self.mtda.debug(2, "storage.writer.flush(): waiting on thread...")
-        self._thread.join()
+        if self._thread is not None:
+            self.mtda.debug(2, "storage.writer.flush(): waiting on thread...")
+            self._thread.join()
Member

This just shortens the window of the race condition, right?

Author

Yes, this should shorten the window and also ensure that the thread is not None before join() is called.

Member

Why "ensure"? If I understand the underlying problem correctly, we still have a TOCTOU issue here. How can it be that the self._thread object no longer exists at this point?

Author (shikl3x7, Dec 17, 2025)

The issue we observed was not consistently reproducible: when we ran the storage write as below, we got the error only some of the time.

gourav@gourav:~/images/lx1/example$ mtda-cli -r 134.86.254.94 storage write isar-image-installer-lx1-x86-uefi.wic
Discovered bmap file 'isar-image-installer-lx1-x86-uefi.wic.bmap'
isar-image-installer-lx1-x86-uefi.wic: [####################] 100% (650 MiB read, 5.70 GiB written, 36 KiB/s)

'storage write' failed! ('NoneType' object has no attribute 'join')

The failure was intermittent, and we observed no corruption of the written image.

I looked at the stop() method and saw that it uses the same logic: check that the thread is not None, then join it. That is why I used the word "ensure".
I said "shortens" because the check also narrows the time window if join() is reached from two places at the same time.

I am not sure this is a TOCTOU issue, because the while loop makes sure the writing is complete, so the thread is no longer in use. At the moment I suspect that stop() is being called somewhere, because it is the only method that sets the thread to None, but I am not sure why or how.
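To make the TOCTOU concern concrete, here is a minimal Python sketch. It assumes, as suspected above, that stop() may run on another thread and is the only place that sets self._thread to None; the Writer class is a hypothetical stand-in, not mtda's actual writer. With the patch, flush() reads self._thread twice (once for the None check, once for join()), so stop() can still clear it in between. Reading the attribute once into a local variable closes that gap:

```python
import threading


class Writer:
    """Hypothetical stand-in for the storage writer (not mtda's real class)."""

    def __init__(self):
        # Start a short-lived worker, standing in for the write thread.
        self._thread = threading.Thread(target=lambda: None)
        self._thread.start()

    def flush(self):
        # Take one snapshot of the attribute. Even if stop() sets
        # self._thread to None right after this line, the local
        # reference still points at the Thread object, so join()
        # is never called on None.
        thread = self._thread
        if thread is None:
            return False  # nothing to wait for
        thread.join()
        return True

    def stop(self):
        # Same snapshot pattern on the stop() side: keep a local
        # reference, clear the shared one, then join the local copy.
        thread = self._thread
        self._thread = None
        if thread is not None:
            thread.join()  # joining a finished thread returns immediately
```

This only removes the AttributeError; it does not explain why stop() runs during flush(), so the root cause would still need to be found.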

Member

> I have some suspects that, somewhere stop is being called, because only that has capability to set to none type, but not sure of why or how.

You could print a traceback whenever stop() is called to debug this. Anyway, I approved the MR, as it solves the issue for us. Once the root cause is found, we can still revert the patch.
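The suggested traceback can be captured with Python's standard traceback module. The sketch below is hypothetical: it uses a stand-in Writer class and the standard logging module instead of mtda's own debug helper. traceback.format_stack() returns the current call stack as a list of formatted strings, and threading.current_thread().name identifies which thread made the call:

```python
import logging
import threading
import traceback

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("writer-debug")


class Writer:
    """Hypothetical stand-in for the storage writer (not mtda's real class)."""

    def __init__(self):
        self._thread = threading.Thread(target=lambda: None)
        self._thread.start()
        self.stop_stacks = []  # every captured call stack, kept for inspection

    def stop(self):
        # Capture the full call stack (outermost frame first) and the name
        # of the calling thread, so the log shows who cleared self._thread.
        stack = "".join(traceback.format_stack())
        self.stop_stacks.append(stack)
        log.debug("stop() called on thread %s:\n%s",
                  threading.current_thread().name, stack)

        thread = self._thread
        self._thread = None
        if thread is not None:
            thread.join()
```

If stop() is reached through a function such as a hypothetical shutdown_handler(), the logged stack contains a frame line ending in "in shutdown_handler", which points directly at the unexpected caller.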

         result = not self._failed
 
         self.mtda.debug(3, f"storage.writer.flush(): {result}")