fix: synchronize with udev via BSD flock after losetup to avoid race condition#330
Conversation
… condition

After `losetup --find --show --partscan`, the kernel creates partition device nodes (e.g. /dev/loop7p1, /dev/loop7p2) asynchronously via udev. If code tries to mount a partition immediately after attaching, it may fail with "special device /dev/loopNpM does not exist".

Fix this by adding `_wait_for_loopdev_partitions()`, which:

1. Calls `udevadm settle` to flush all pending udev events
2. Polls for each expected partition device node with a configurable timeout
3. Raises ImageError if the nodes don't appear within the timeout

The wait is called from:

- `Image.attach_loopdev()` in imagecraft/pack/image.py
- `ImageService.attach_images()` in imagecraft/services/image.py (fresh attaches only)

Adds unit tests for the new helper function and the integration points.

Agent-Logs-Url: https://github.com/canonical/imagecraft/sessions/dfb57e1c-b428-4e70-b2b1-83a7428353fc
Co-authored-by: lengau <4305943+lengau@users.noreply.github.com>
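As a rough illustration of the settle-then-poll approach this commit describes, the sketch below combines `udevadm settle` with a polling loop. The function name, signature, timeout default, and the use of `TimeoutError` (instead of imagecraft's `ImageError`) are assumptions, not the actual imagecraft code.

```python
# Hypothetical sketch of the settle-then-poll helper described above;
# not the real imagecraft implementation.
import pathlib
import subprocess
import time


def wait_for_partitions(
    loop_device: str, partition_nums: list[int], timeout: float = 10.0
) -> None:
    # Flush pending udev events so partition nodes queued by --partscan
    # have a chance to be created before we start polling.
    subprocess.run(["udevadm", "settle"], check=True)

    deadline = time.monotonic() + timeout
    for num in partition_nums:
        node = pathlib.Path(f"{loop_device}p{num}")  # e.g. /dev/loop7p1
        while not node.exists():
            if time.monotonic() > deadline:
                raise TimeoutError(f"{node} did not appear within {timeout}s")
            time.sleep(0.1)
```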
…rtition number helper

- Replace broad `except Exception` with `except (subprocess.CalledProcessError, OSError)` to be precise about what failures are expected from udevadm settle
- Extract the duplicated partition-number calculation into a `_volume_partition_nums()` helper function and import it into services/image.py to avoid code duplication

Agent-Logs-Url: https://github.com/canonical/imagecraft/sessions/dfb57e1c-b428-4e70-b2b1-83a7428353fc
Co-authored-by: lengau <4305943+lengau@users.noreply.github.com>
This misattributes the reason for the issue and thus is not the correct fix. It's not a race with udev triggers, it's a race with udev's inotify watch. As explained in https://systemd.io/BLOCK_DEVICE_LOCKING/, the correct fix is to flock the device containing the partition while doing anything (such as mounting) with the partition itself.
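A minimal sketch of what that suggestion looks like in Python, assuming a hypothetical `mount_partition()` helper; only the documented pattern (a shared BSD flock on the whole-disk device around partition operations) is taken from the linked systemd page, the rest is illustrative.

```python
# Hypothetical illustration of the locking pattern from
# https://systemd.io/BLOCK_DEVICE_LOCKING/; not imagecraft code.
import fcntl
import subprocess


def mount_partition(whole_disk: str, partition: str, mountpoint: str) -> None:
    with open(whole_disk, "rb") as disk:
        # udev takes an exclusive BSD lock on the whole-disk device while it
        # probes it, so a shared lock blocks until udev is finished and then
        # keeps udev from re-probing while we work with the partition.
        fcntl.flock(disk, fcntl.LOCK_SH)
        subprocess.run(["mount", partition, mountpoint], check=True)
    # Closing the file releases the lock.
```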
emit.debug(
    f"Attached image {self.disk_path} as loop device {self.loop_device}"
)
_wait_for_loopdev_partitions(
@copilot We should wait for the partitions before logging that we've attached the device.
Fixed in f7b72b2 — the wait is now called before emit.debug(...) so the log only fires once all partition nodes are confirmed to exist.
def _wait_for_loopdev_partitions(
@copilot since this is being used in another file it can't be private.
Fixed in f7b72b2 — renamed both _wait_for_loopdev_partitions → wait_for_loopdev_partitions and _volume_partition_nums → volume_partition_nums and updated all call sites.
…ug log

- Rename _wait_for_loopdev_partitions -> wait_for_loopdev_partitions
- Rename _volume_partition_nums -> volume_partition_nums (both are imported by services/image.py, so they can't be private)
- In Image.attach_loopdev(), call wait_for_loopdev_partitions() before emit.debug() so the log only fires once the partitions are actually ready
- Update all references in services/image.py and both test files

Agent-Logs-Url: https://github.com/canonical/imagecraft/sessions/065c4f4e-7f04-47a7-9ac7-f5d74ae39dfc
Co-authored-by: lengau <4305943+lengau@users.noreply.github.com>
@copilot please address mwhudson's feedback too
Per systemd BLOCK_DEVICE_LOCKING, the correct fix for the partition-node
race is to hold a shared BSD flock on the loop device while using its
partitions. udev holds an exclusive lock while processing; a shared lock
blocks until udev is done and then keeps it held during our operations.
- imagecraft/pack/image.py:
  - Remove volume_partition_nums() and wait_for_loopdev_partitions()
  - Remove _UDEVADM_BIN, _PARTITION_WAIT_TIMEOUT, time, subprocess imports
  - In Image.attach_loopdev(): open the loop device and acquire LOCK_SH for the duration of the context manager
- imagecraft/services/image.py:
  - Remove import of removed helpers
  - Add _loop_fds list to track open file handles holding flocks
  - In attach_images(): open device and acquire LOCK_SH after attaching (both fresh attach and reuse); store fd in _loop_fds
  - In detach_images(): close all fds (releasing flocks) before detaching
- tests: remove TestWaitForLoopdevPartitions; update all tests to mock fcntl.flock and builtins.open; add test for flock release on detach

Agent-Logs-Url: https://github.com/canonical/imagecraft/sessions/1d13f3fb-01b2-4424-b470-cbc2611b70e0
Co-authored-by: lengau <4305943+lengau@users.noreply.github.com>
- Wrap fcntl.flock in try/except in attach_images() to close the fd if flock raises, preventing a resource leak
- Move inline `import fcntl` statements to module level in both test files

Agent-Logs-Url: https://github.com/canonical/imagecraft/sessions/1d13f3fb-01b2-4424-b470-cbc2611b70e0
Co-authored-by: lengau <4305943+lengau@users.noreply.github.com>
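The resource-leak fix in this commit boils down to the pattern sketched below; the device path and variable names are illustrative, not the exact imagecraft code.

```python
# Illustrative close-on-failure pattern for a deliberately long-lived fd.
import fcntl

loop_fd = open("/dev/loop7", "rb")  # hypothetical attached loop device
try:
    fcntl.flock(loop_fd, fcntl.LOCK_SH)
except OSError:
    loop_fd.close()  # don't leak the fd if locking fails
    raise
```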
Done in 3814238. Replaced the udevadm settle + polling approach with proper BSD flock synchronization, as described in https://systemd.io/BLOCK_DEVICE_LOCKING/.
# exclusive lock while it processes the new device; taking a shared
# lock here blocks until udev is done and then holds it for the
# duration of the context so udev does not interfere with our use.
with open(self.loop_device, "rb") as loop_fd:
@copilot I think this is going to hold the lock for longer than we want and block udev from running on the device at all, which might not be what is wanted. In my experience it has been better to lock the device only while the partition is being mounted.
# exclusive lock while it processes the device; a shared lock here
# blocks until udev is done and then holds it while we use the
# device's partitions, preventing udev from interfering.
loop_fd = open(attached_device, "rb")  # noqa: SIM115 (held deliberately)
@copilot Again, I think this holds the lock for too long. We should only hold the lock while mounting or formatting the partition.
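One way to act on that feedback would be a small context manager that scopes the shared lock to individual mount/format operations rather than the whole attachment. This is a sketch of the reviewer's suggestion, not code from the PR; the name `locked_whole_disk` and the device paths are hypothetical.

```python
# Hypothetical narrower-scope locking, as suggested in the review above.
import contextlib
import fcntl
import subprocess


@contextlib.contextmanager
def locked_whole_disk(device: str):
    with open(device, "rb") as fd:
        fcntl.flock(fd, fcntl.LOCK_SH)  # wait for udev, then hold the lock
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)  # also released when fd closes


# Usage: hold the lock only around the mount itself.
# with locked_whole_disk("/dev/loop7"):
#     subprocess.run(["mount", "/dev/loop7p1", "/mnt"], check=True)
```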
After `losetup --find --show --partscan`, udev holds an exclusive BSD flock on the whole-disk device while processing it. Mounting immediately after attach can fail with "special device /dev/loopNpM does not exist" because udev's inotify watch hasn't finished. The correct fix per systemd BLOCK_DEVICE_LOCKING is to hold a shared flock on the loop device while performing partition operations.

Changes

- imagecraft/pack/image.py: `Image.attach_loopdev()` now opens the loop device and acquires `fcntl.LOCK_SH` after attaching; this blocks until udev releases its exclusive lock, then holds the shared lock for the full duration of the context manager so udev cannot interfere with partition operations.
- imagecraft/services/image.py: `ImageService.attach_images()` opens each loop device and acquires `LOCK_SH` after attaching (for both fresh attaches and reused devices). File handles are stored in `_loop_fds` and closed (releasing the lock) in `detach_images()`. The fd is closed on `flock` failure to prevent resource leaks.

Tests

- All tests mock `fcntl.flock` and `builtins.open`
- Added `test_attach_images_flock_released_on_detach` to verify the shared lock is released when images are detached
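To make the attach/detach lifecycle described above concrete, here is a stripped-down sketch of the fd-tracking pattern; the class and method names are illustrative and not the real `ImageService` API.

```python
# Minimal sketch of holding shared flocks across attach/detach; hypothetical
# names, not the real ImageService.
import fcntl


class LoopLockTracker:
    def __init__(self) -> None:
        self._loop_fds: list = []

    def lock_after_attach(self, attached_device: str) -> None:
        fd = open(attached_device, "rb")  # noqa: SIM115 (held deliberately)
        try:
            fcntl.flock(fd, fcntl.LOCK_SH)  # blocks until udev releases its lock
        except OSError:
            fd.close()  # avoid leaking the fd if locking fails
            raise
        self._loop_fds.append(fd)

    def release_before_detach(self) -> None:
        for fd in self._loop_fds:
            fd.close()  # closing the fd releases the BSD lock
        self._loop_fds.clear()
```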