Deployment of m3fs failed on the RXE (Soft RDMA) network. #119

@Stephen-Pu

I successfully deployed a 3FS cluster using m3fs on nodes with an eRDMA network.
However, deployment failed when I used m3fs over a regular Ethernet network on which I had already configured Soft RDMA (RXE). Each individual node was able to communicate and operate properly over Soft RDMA.
The failure occurred at the final step: creating the FUSE client failed.
The same steps worked on the eRDMA network without any issues.
The specific error message is as follows:
root@ip-172-31-25-206:~# ./m3fs cluster create -c ./cluster.yml
INFO[0000] Running task CreateFdbClusterTask
INFO[0000] Started fdb container 3fs-fdb successfully NODE=Hydra-Store-01 TASK=CreateFdbClusterTask
INFO[0000] Started fdb container 3fs-fdb successfully NODE=Hydra-Store-02 TASK=CreateFdbClusterTask
INFO[0000] Initializing fdb cluster NODE=Hydra-Store-01 TASK=CreateFdbClusterTask
INFO[0003] Waiting for fdb cluster initialized NODE=Hydra-Store-01 TASK=CreateFdbClusterTask
INFO[0006] Initialized fdb cluster NODE=Hydra-Store-01 TASK=CreateFdbClusterTask
INFO[0006] Running task CreateClickhouseClusterTask
INFO[0011] Started clickhouse container 3fs-clickhouse successfully NODE=Hydra-Store-01 TASK=CreateClickhouseClusterTask
INFO[0011] Initializing clickhouse cluster NODE=Hydra-Store-01 TASK=CreateClickhouseClusterTask
INFO[0011] Initialized clickhouse cluster NODE=Hydra-Store-01 TASK=CreateClickhouseClusterTask
INFO[0011] Running task CreateGrafanaServiceTask
INFO[0011] Generating datasource.yaml NODE=Hydra-Store-01 TASK=CreateGrafanaServiceTask
INFO[0011] Generating dashboard.yaml NODE=Hydra-Store-01 TASK=CreateGrafanaServiceTask
INFO[0011] Generating 3fs.json NODE=Hydra-Store-01 TASK=CreateGrafanaServiceTask
INFO[0011] Started grafana container 3fs-grafana successfully, service endpoint is http://172.31.25.206:3000, login with username "admin" and password "admin" NODE=Hydra-Store-01 TASK=CreateGrafanaServiceTask
INFO[0011] Running task CreateMonitorTask
INFO[0011] Started monitor container 3fs-monitor successfully NODE=Hydra-Store-01 TASK=CreateMonitorTask
INFO[0011] Running task CreateMgmtdServiceTask
INFO[0011] Create mgmtd_main config dir /opt/3fs/mgmtd NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Generating mgmtd_main_app.toml to /tmp/prepare-3fs-config.7BA1vU/mgmtd_main_app.toml NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Generating mgmtd_main_launcher.toml to /tmp/prepare-3fs-config.7BA1vU/mgmtd_main_launcher.toml NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Generating mgmtd_main.toml to /tmp/prepare-3fs-config.7BA1vU/mgmtd_main.toml NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Save admin cli config to /tmp/prepare-3fs-config.7BA1vU/admin_cli.toml NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Generating fdb.cluster to /tmp/prepare-3fs-config.7BA1vU/fdb.cluster NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0011] Copying mgmtd_main configs from /tmp/prepare-3fs-config.7BA1vU to Hydra-Store-01 /opt/3fs/mgmtd/config.d NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0012] Cluster initialization success NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0012] Starting mgmtd_main container 3fs-mgmtd NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0012] Started mgmtd_main container 3fs-mgmtd successfully NODE=Hydra-Store-01 TASK=CreateMgmtdServiceTask
INFO[0012] Running task CreateMetaServiceTask
INFO[0012] Create meta_main config dir /opt/3fs/meta NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main_app.toml to /tmp/prepare-3fs-config.WYQw05/meta_main_app.toml NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main_launcher.toml to /tmp/prepare-3fs-config.WYQw05/meta_main_launcher.toml NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main.toml to /tmp/prepare-3fs-config.WYQw05/meta_main.toml NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Save admin cli config to /tmp/prepare-3fs-config.WYQw05/admin_cli.toml NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Generating fdb.cluster to /tmp/prepare-3fs-config.WYQw05/fdb.cluster NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Copying meta_main configs from /tmp/prepare-3fs-config.WYQw05 to Hydra-Store-01 /opt/3fs/meta/config.d NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0012] Create meta_main config dir /opt/3fs/meta NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main_app.toml to /tmp/prepare-3fs-config.qlknub/meta_main_app.toml NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main_launcher.toml to /tmp/prepare-3fs-config.qlknub/meta_main_launcher.toml NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Generating meta_main.toml to /tmp/prepare-3fs-config.qlknub/meta_main.toml NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Save admin cli config to /tmp/prepare-3fs-config.qlknub/admin_cli.toml NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Generating fdb.cluster to /tmp/prepare-3fs-config.qlknub/fdb.cluster NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Copying meta_main configs from /tmp/prepare-3fs-config.qlknub to Hydra-Store-02 /opt/3fs/meta/config.d NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0012] Upload meta_main main config NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0013] Service meta_main main config uploaded NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0013] Starting meta_main container 3fs-meta NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0013] Starting meta_main container 3fs-meta NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0013] Started meta_main container 3fs-meta successfully NODE=Hydra-Store-01 TASK=CreateMetaServiceTask
INFO[0013] Started meta_main container 3fs-meta successfully NODE=Hydra-Store-02 TASK=CreateMetaServiceTask
INFO[0013] Running task CreateStorageServiceTask
INFO[0013] Start to run script disk_tool.sh on node NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Scp /tmp/remote-run-script.K64dgD/tmp_script.sh to /tmp/tmp.satNi8ALwV NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Run disk_tool.sh with [/opt/3fs/storage 8 nvme prepare] NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Start to run script disk_tool.sh on node NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0013] Scp /tmp/remote-run-script.eHL38r/tmp_script.sh to /tmp/tmp.64YpZvMGAs NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0013] Run disk_tool.sh with [/opt/3fs/storage 8 nvme prepare] NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0013] Run disk_tool.sh success NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Run disk_tool.sh success NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0013] Create storage_main config dir /opt/3fs/storage NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Generating storage_main_app.toml to /tmp/prepare-3fs-config.Rfexmx/storage_main_app.toml NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Generating storage_main_launcher.toml to /tmp/prepare-3fs-config.Rfexmx/storage_main_launcher.toml NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Generating storage_main.toml to /tmp/prepare-3fs-config.Rfexmx/storage_main.toml NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Save admin cli config to /tmp/prepare-3fs-config.Rfexmx/admin_cli.toml NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Generating fdb.cluster to /tmp/prepare-3fs-config.Rfexmx/fdb.cluster NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0013] Copying storage_main configs from /tmp/prepare-3fs-config.Rfexmx to Hydra-Store-01 /opt/3fs/storage/config.d NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0014] Create storage_main config dir /opt/3fs/storage NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Generating storage_main_app.toml to /tmp/prepare-3fs-config.7QUoMz/storage_main_app.toml NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Generating storage_main_launcher.toml to /tmp/prepare-3fs-config.7QUoMz/storage_main_launcher.toml NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Generating storage_main.toml to /tmp/prepare-3fs-config.7QUoMz/storage_main.toml NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Save admin cli config to /tmp/prepare-3fs-config.7QUoMz/admin_cli.toml NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Generating fdb.cluster to /tmp/prepare-3fs-config.7QUoMz/fdb.cluster NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Copying storage_main configs from /tmp/prepare-3fs-config.7QUoMz to Hydra-Store-02 /opt/3fs/storage/config.d NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Upload storage_main main config NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0014] Service storage_main main config uploaded NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0014] Starting storage_main container 3fs-storage NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0014] Starting storage_main container 3fs-storage NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Started storage_main container 3fs-storage successfully NODE=Hydra-Store-01 TASK=CreateStorageServiceTask
INFO[0014] Started storage_main container 3fs-storage successfully NODE=Hydra-Store-02 TASK=CreateStorageServiceTask
INFO[0014] Running task InitUserAndChainTask
INFO[0021] Running task Create3FSClientServiceTask
INFO[0021] Create hf3fs_fuse_main config dir /opt/3fs/client NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Generating hf3fs_fuse_main_app.toml to /tmp/prepare-3fs-config.poRqjK/hf3fs_fuse_main_app.toml NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Generating hf3fs_fuse_main_launcher.toml to /tmp/prepare-3fs-config.poRqjK/hf3fs_fuse_main_launcher.toml NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Generating hf3fs_fuse_main.toml to /tmp/prepare-3fs-config.poRqjK/hf3fs_fuse_main.toml NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Save admin cli config to /tmp/prepare-3fs-config.poRqjK/admin_cli.toml NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Save token.txt to /tmp/prepare-3fs-config.poRqjK/token.txt NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Generating fdb.cluster to /tmp/prepare-3fs-config.poRqjK/fdb.cluster NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0021] Copying hf3fs_fuse_main configs from /tmp/prepare-3fs-config.poRqjK to Hydra-Client-01 /opt/3fs/client/config.d NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
INFO[0022] Upload hf3fs_fuse_main main config NODE=Hydra-Client-01 TASK=Create3FSClientServiceTask
2025/06/05 00:07:12 create cluster: run task Create3FSClientServiceTask: sudo run cmd [cmd: docker run --name 3fs-client --network host --entrypoint '' --rm --privileged --ulimit nofile=1048576:1048576 --volume /opt/3fs/client/config.d:/opt/3fs/etc/ --volume /dev:/dev --volume /opt/3fs/bin/ibdev2netdev:/usr/sbin/ibdev2netdev open3fs/3fs:20250410 /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.31.25.206:8000"]' 'set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml']: run sudo docker run --name 3fs-client --network host --entrypoint '' --rm --privileged --ulimit nofile=1048576:1048576 --volume /opt/3fs/client/config.d:/opt/3fs/etc/ --volume /dev:/dev --volume /opt/3fs/bin/ibdev2netdev:/usr/sbin/ibdev2netdev open3fs/3fs:20250410 /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.31.25.206:8000"]' 'set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml' failed: Process exited with status 1
root@ip-172-31-25-206:~#
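
A minimal sketch for narrowing down where the run stopped, assuming the output above was saved to a file. The file name `m3fs-create.log` and the helper `last_task_and_node` are hypothetical and not part of m3fs. It prints the `NODE=` and `TASK=` fields of the last INFO line in the log:

```shell
#!/usr/bin/env bash
# Sketch: find the task and node m3fs was working on when it failed.
# Assumes the output of `./m3fs cluster create` was saved to a file;
# m3fs itself is not known to write such a file.

# Print the NODE= and TASK= fields from the last INFO line of a log file.
last_task_and_node() {
  local log_file="$1"
  grep '^INFO\[' "$log_file" | tail -n 1 | grep -oE '(NODE|TASK)=[^ ]+'
}
```

Usage would be something like `./m3fs cluster create -c ./cluster.yml 2>&1 | tee m3fs-create.log` followed by `last_task_and_node m3fs-create.log`. For the log above, the last INFO line carries NODE=Hydra-Client-01 and TASK=Create3FSClientServiceTask, which suggests the failing command was issued for the client node. If so, a manual re-run on that node would be the closer comparison.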

However, when I manually ran the last failed docker command, it succeeded.
See the output below:
root@ip-172-31-25-206:~# sudo docker run --name 3fs-client --network host --entrypoint '' --rm --privileged --ulimit nofile=1048576:1048576 --volume /opt/3fs/client/config.d:/opt/3fs/etc/ --volume /dev:/dev --volume /opt/3fs/bin/ibdev2netdev:/usr/sbin/ibdev2netdev open3fs/3fs:20250410 /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.31.25.206:8000"]' 'set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml'
Succeed
ConfigVersion 1
root@ip-172-31-25-206:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a70352a525e3 open3fs/3fs:20250410 "/opt/3fs/bin/storag…" About a minute ago Up About a minute 3fs-storage
8a2baf350e22 open3fs/3fs:20250410 "/opt/3fs/bin/meta_m…" About a minute ago Up About a minute 3fs-meta
a0dc97d70b0f open3fs/3fs:20250410 "/opt/3fs/bin/mgmtd_…" About a minute ago Up About a minute 3fs-mgmtd
c83febf1e660 open3fs/3fs:20250410 "/opt/3fs/bin/monito…" About a minute ago Up About a minute 3fs-monitor
6f451cb7ae34 open3fs/grafana:12.0.0 "/run.sh" About a minute ago Up About a minute 3fs-grafana
aa9deb5f4a71 open3fs/clickhouse:25.1-jammy "/entrypoint.sh" About a minute ago Up About a minute 3fs-clickhouse
e1d802477515 open3fs/foundationdb:7.3.63 "/usr/bin/tini -g --…" About a minute ago Up About a minute 3fs-fdb

=====================================
But the client node doesn't have any FUSE mount or any 3FS container:

root@ip-172-31-11-231:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@ip-172-31-11-231:~# mount
/dev/nvme0n1p1 on / type ext4 (rw,relatime,discard,errors=remount-ro)
devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,relatime,size=97424244k,nr_inodes=24356061,mode=755,inode64)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=38973068k,nr_inodes=819200,mode=755,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=28674)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
ramfs on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
/var/lib/snapd/snaps/amazon-ssm-agent_9881.snap on /snap/amazon-ssm-agent/9881 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/core20_2496.snap on /snap/core20/2496 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/core22_1748.snap on /snap/core22/1748 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/lxd_31333.snap on /snap/lxd/31333 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/snapd_23545.snap on /snap/snapd/23545 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/dev/nvme0n1p15 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/snapd/ns type tmpfs (rw,nosuid,nodev,size=38973068k,nr_inodes=819200,mode=755,inode64)
nsfs on /run/snapd/ns/lxd.mnt type nsfs (rw)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=19486532k,nr_inodes=4871633,mode=700,uid=1000,gid=1000,inode64)
/var/lib/snapd/snaps/snapd_24505.snap on /snap/snapd/24505 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/core20_2582.snap on /snap/core20/2582 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
/var/lib/snapd/snaps/core22_1981.snap on /snap/core22/1981 type squashfs (ro,nodev,relatime,errors=continue,threads=single,x-gdu.hide)
nsfs on /run/docker/netns/default type nsfs (rw)
root@ip-172-31-11-231:~# docker ps -a | grep 3fs
root@ip-172-31-11-231:~# mount | grep /mnt/3fs
root@ip-172-31-11-231:~#
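
The checks above can be bundled into a small script for the client node. This is only a sketch: it assumes the iproute2 `rdma` tool is installed, that a working RXE link shows up as `state ACTIVE` in `rdma link show`, and that /mnt/3fs is the intended mount point (taken from the grep above). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of client-side checks. Each check reads command output on stdin,
# so it can be tried against saved output as well as on a live node.

# Succeeds if `rdma link show` output (on stdin) has at least one ACTIVE link.
has_active_rdma_link() {
  grep -q 'state ACTIVE'
}

# Succeeds if `mount` output (on stdin) contains a mount at the given path.
has_mount_at() {
  local mount_point="$1"
  grep -Fq " on ${mount_point} type "
}

# Print a short report from both checks; meant to be run on the client node.
report_client_state() {
  if rdma link show | has_active_rdma_link; then
    echo "rdma: active link found"
  else
    echo "rdma: no active link"
  fi
  if mount | has_mount_at /mnt/3fs; then
    echo "fuse: /mnt/3fs mounted"
  else
    echo "fuse: /mnt/3fs not mounted"
  fi
}
```

Calling `report_client_state` on the client node gives a two-line summary that can be pasted alongside the m3fs log. It does not explain the failure on its own, but it separates "no RDMA link on the client" from "link present but FUSE never mounted".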
