Skip to content

fix rootfs propagation mode to shared / unbindable #4724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

saku3
Copy link

@saku3 saku3 commented Apr 13, 2025

This PR adds support for applying mount propagation settings (MS_SHARED or MS_UNBINDABLE) to the container root based on the value of config.RootPropagation.
We apply mount propagation after executing pivot_root and rootfsParentMountPrivate

Fixes #1755

Related:
#1815
youki-dev/youki#3141

Signed-off-by: Yusuke Sakurai yusuke.sakurai@3-shake.com

# Set rootfsPropagation to shared
update_config ' .linux.rootfsPropagation = "shared" '

update_config ' .process.args = ["sh", "-c", "findmnt --noheadings -o PROPAGATION /"] '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to wrap findmnt in sh

}

@test "runc run [rootfsPropagation shared]" {
# Set rootfsPropagation to shared
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obvious from the code and does not need to be explained in a code comment line

# Run the container and capture the output
runc run test_shared_rootfs
[ "$status" -eq 0 ]
[[ "$output" == *"shared"* ]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[[ "$output" == *"shared"* ]]
[ "$output" == "shared" ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly speaking this should be

[ "$output" = "shared" ]

(i.e. single =), since test (aka [) only support = for string comparison (unlike [[ which supports ==).

@@ -232,6 +236,27 @@ func prepareRootfs(pipe *syncSocket, iConfig *initConfig) (err error) {
return nil
}

// adjustRootMountPropagation applies mount propagation flags such as MS_SHARED or MS_UNBINDABLE
// to the root mount after the pivot_root or chroot has taken place.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment to explain why this cannot be done in prepareRoot:

func prepareRoot(config *configs.Config) error {
flag := unix.MS_SLAVE | unix.MS_REC
if config.RootPropagation != 0 {
flag = config.RootPropagation
}
if err := mount("", "/", "", uintptr(flag), ""); err != nil {
return err
}
if err := rootfsParentMountPrivate(config.Rootfs); err != nil {
return err
}
return mount(config.Rootfs, config.Rootfs, "bind", unix.MS_BIND|unix.MS_REC, "")
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment.

@rata
Copy link
Member

rata commented Apr 16, 2025

This issue has a lot of history, I wonder if @cyphar or @kolyshkin would like to take a look

@AkihiroSuda
Copy link
Member

Please squash and sign off the commits

@saku3 saku3 force-pushed the fix-rootfspropagation branch 3 times, most recently from 24745a2 to 897ae54 Compare April 18, 2025 03:53
@saku3 saku3 force-pushed the fix-rootfspropagation branch from 897ae54 to bb317a0 Compare May 1, 2025 22:03
@rata
Copy link
Member

rata commented May 5, 2025

This sounds reasonable to me. Do we know what crun is doing?

Also, @cyphar , do you want to take a look?

@saku3
Copy link
Author

saku3 commented May 9, 2025

The behavior of shared mount propagation, as it relates to both crun and runc, can be demonstrated using the following steps, as outlined in the reference below.
opencontainers/runc#1815

Runc

$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.3.0-rc.1+dev  
commit: bb317a03 
spec: 1.2.1  
go: go1.23.0  
libseccomp: 2.5.5

$ mkdir ~/hoge && mount -t tmpfs tmpfs ~/hoge
$ podman run -itd --privileged --name testcon --volume ~/hoge:/hoge:z,shared fedora-minimal:42 /bin/bash
babefd2d44a9d237a236482ddf593b578e8d56225c3f42220c582e484a779620
$ cat /var/lib/containers/storage/overlay-containers/babefd2d44a9d237a236482ddf593b578e8d56225c3f42220c582e484a779620/userdata/config.json | jq .linux.rootfsPropagation
"shared"
$ podman exec testcon mkdir /test
$ podman exec testcon mount -t tmpfs tmpfs /test
$ podman exec testcon findmnt -o "TARGET,PROPAGATION"
TARGET               PROPAGATION  
/                    shared  
|-/test              shared  
|-/hoge              shared  
|-/proc              private  
|-/dev               private  
| |-/dev/console     private  
| |-/dev/pts         private  
| |-/dev/mqueue      private  
| `-/dev/shm         private  
|-/sys               private  
| `-/sys/fs/cgroup   private  
|-/etc/resolv.conf   private  
|-/etc/hosts         private  
|-/run/.containerenv private  
`-/etc/hostname      private

With crun

$ podman info --format {{.Host.OCIRuntime.Version}}
crun version 1.14.1  
commit: de537a7965bfbe9992e2cfae0baeb56a08128171  
rundir: /run/crun  
spec: 1.0.0  
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +WASM:wasmedge +YAJL

$ podman run -itd --privileged --name testcon --volume ~/hoge:/hoge:z,shared fedora-minimal:42 /bin/bash
90a94c7a21000c8a83389ae085c4846a505da2e2891ab25f5b0e325f79de1f62
$ cat /var/lib/containers/storage/overlay-containers/90a94c7a21000c8a83389ae085c4846a505da2e2891ab25f5b0e325f79de1f62/userdata/config.json | jq .linux.rootfsPropagation
"shared"
$ podman exec testcon mkdir /test
$ podman exec testcon mount -t tmpfs tmpfs /test
$ podman exec testcon findmnt -o "TARGET,PROPAGATION"
TARGET               PROPAGATION  
/                    shared  
|-/test              shared  
|-/hoge              shared  
|-/proc              private  
|-/dev               private  
| |-/dev/console     private  
| |-/dev/pts         private  
| |-/dev/mqueue      private  
| `-/dev/shm         private  
|-/sys               private  
| `-/sys/fs/cgroup   private  
|-/run/.containerenv private  
|-/etc/hostname      private  
|-/etc/resolv.conf   private  
`-/etc/hosts         private

As shown above, the result of findmnt is identical between crun and runc (with the changes from this PR), confirming consistent behavior regarding shared mount propagation.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

return nil
}

if err := unix.Mount("", "/", "", flag, ""); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use rootPropagation directly?

@kolyshkin kolyshkin self-requested a review May 15, 2025 23:13
Comment on lines 218 to 229
if err := adjustRootMountPropagation(uintptr(config.RootPropagation)); err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that we can use config.RootPropagation directly, and not use a separate function.

Something like

// Apply root mount propagation flags.
if config.RootPropagation != 0 {
 	if err := mount("", "/", "", uintptr(flag), ""); err != nil { 
 		return fmt.Errorf("unable to apply root propagation flags: %w", err )
 	} 
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin @lifubang
Thank you for the review. I've fixed.

@saku3 saku3 force-pushed the fix-rootfspropagation branch from bb317a0 to 1b28a8a Compare May 16, 2025 23:33
// to the current root ("/"), and not to the old rootfs before it becomes "/". Applying the
// flag in prepareRoot would affect the host mount namespace if the container's
// root mount is shared.
if config.RootPropagation != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we add a unix.MS_PRIVATE comparison, it will reduce a mount syscall.

if config.RootPropagation != 0 && config.RootPropagation & unix.MS_PRIVATE == 0 {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@saku3 saku3 force-pushed the fix-rootfspropagation branch from 1b28a8a to ecec73d Compare May 19, 2025 12:34
Signed-off-by: Yusuke Sakurai <yusuke.sakurai@3-shake.com>
@saku3 saku3 force-pushed the fix-rootfspropagation branch from ecec73d to 04be81b Compare May 19, 2025 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rootfsPropagation=shared does not work
5 participants