FL Alliance is a decentralized federated learning protocol where multiple participants collaboratively train a shared model — without ever exposing their private data. Participants stake tokens, train on local datasets, and are rewarded or slashed based on the quality of their contributions, all enforced by on-chain smart contracts.
This repository is the production client for FL Alliance. It handles the full lifecycle — staking, model download, local training, parameter upload, voting, aggregation, and reward claiming — so you can participate with a single command.
- Four operating modes — on-chain testnet, local chain dev, fully offline, and chainless pure FL
- Two runtime backends — Docker container or local Python process; the repository defaults to runtime.mode=docker, while OCM addon deployments typically override to local (direct client, no FLocKit sidecar)
- Seal encryption — optional end-to-end encryption of model parameters via Mysten Labs' Seal
- LAN-ready — run multi-client simulations on a single machine or across a local network
- Cross-platform — Linux, macOS (Apple Silicon / Intel), and Windows 11 + WSL2; see Runtime Modes for the support matrix
- Concurrent backend runs — the FastAPI backend supports multiple simultaneous client runs with per-run port and environment isolation
- Structured logging — rotating file logs with full source tracing for production debugging; subprocess (FLocKit) errors and warnings are forwarded to the parent log in real time so model-side failures surface without a manual tail -f
- Operator-friendly failure modes — SIGHUP / SIGTERM / SIGINT are caught and logged before exit (no more silent deaths from SSH disconnects), and every long wait emits an INFO heartbeat naming the polled URL and the subprocess log path
| Requirement | When needed |
|---|---|
| Python >= 3.11 | Always |
| uv (recommended) or pip | Always |
| Docker | Only for docker runtime mode (default) |
| $FLOCK tokens (get whitelisted (TBD)) | Online mode only |
| Base Sepolia ETH (Alchemy Faucet) | Online mode only |
Local simulation and pure FL modes require no tokens, no ETH, and no internet (after initial dependency install).
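The requirements table above can be checked before the first run. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper that only looks at the Python version and at whether `uv`, `pip`, and `docker` are on PATH. It does not check tokens or ETH balances.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # from the requirements table


def check_prerequisites(runtime_mode="docker", version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `version` and `which` are injectable so the check can be exercised
    without changing the real environment (hypothetical helper).
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, need >= 3.11")
    if which("uv") is None and which("pip") is None:
        problems.append("neither uv nor pip is on PATH")
    # Docker is only needed for the docker runtime mode (the default).
    if runtime_mode == "docker" and which("docker") is None:
        problems.append("docker is required for runtime.mode=docker")
    return problems
```

Calling `check_prerequisites("local")` skips the Docker check, which matches the "When needed" column.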
The default quick start is the online testnet (on-chain) flow.
git clone https://github.com/FLock-io/FL-Alliance-Client.git
cd FL-Alliance-Client
# Using uv (recommended)
uv sync
# Or using pip
pip install -r requirements.txt

uv sync is the recommended path for the repository-managed environment. requirements.txt is kept as a compatibility install path and may pull a heavier dependency set for model/runtime workflows.
cp .env.onchain.example .env
# Edit .env and set:
# PRIVATE_KEY=<your wallet private key>
# BLOCKCHAIN_RPC=<Base Sepolia RPC URL> # WEB3_RPC_URL is also supported
# TOKEN_ADDRESS=<FlockToken address>
# TASK_ADDRESS=<FlockTask address>
# Optional but recommended on testnet/mainnet:
# EXPECTED_CHAIN_ID=84532 # Base Sepolia; refuses to start on mismatch
# BLOCKCHAIN_TX_RECEIPT_TIMEOUT=180 # Seconds for tx receipt (default 120)
# FLOCK_CONTRACTS_FILE=/path/to/contracts.json # Highest-priority contracts source
# Optional:
# HF_TOKEN=<token for gated models>
# Optional — bump for LLM tasks on a cold cache (HF download + venv install):
# PROCESS_STARTUP_TIMEOUT=7200 # Seconds before model startup is declared failed (default 1800)
# PROCESS_RESPONSE_TIMEOUT=7200 # Seconds per train / evaluate / aggregate SDK call (default 3600)

python main.py -c config/conf.yaml \
--task-address <TASK_ADDRESS> \
--dataset <DATASET_PATH> \
--hf-token <HF_TOKEN> \
--gpu

Use a custom mounted env file (for example in Kubernetes):

python main.py -c config/conf.yaml --env-file /data/.env

Example:
python main.py -c config/conf.yaml \
--task-address 0x47B0397C6ae306002788D093b29bcD2EDAd19924 \
--dataset data/asr_sarawakmalay_whisper_format_client_ids.json \
--hf-token $HF_TOKEN \
--gpu

Long-running training: wrap the command in tmux / nohup / systemd so the client survives SSH disconnects. The client now installs SIGHUP / SIGTERM handlers that log the signal name before exit, but SIGKILL (OOM-killer) still terminates the process silently — a session manager is the only reliable defence.
# Use a different PRIVATE_KEY and runtime.port per process
python main.py -c config/conf.yaml \
--task-address <TASK_ADDRESS> \
--dataset <DATASET_PATH> \
--hf-token <HF_TOKEN> \
--gpu \
--override runtime.port=<UNIQUE_PORT>

That's it. You are now running the incentive-enabled FL Alliance flow on Base Sepolia.
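When several clients run on one machine, each needs its own runtime.port. One way to pick unused ports is to let the OS assign them. The sketch below assumes nothing about the client beyond the `--override runtime.port=<UNIQUE_PORT>` flag shown above; `find_free_port` and `override_args` are hypothetical helper names.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def override_args(n_clients):
    """Build one ['--override', 'runtime.port=<port>'] pair per client."""
    ports = set()
    while len(ports) < n_clients:  # the set guarantees distinct ports
        ports.add(find_free_port())
    return [["--override", f"runtime.port={port}"] for port in sorted(ports)]
```

The port is released before the client starts, so another process could claim it in between. For a handful of local clients this is usually acceptable, and a fixed port plan avoids the race entirely.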
Recommended: publish both latest and a git-SHA tag, then deploy the SHA tag from flock-addon.
Recommended setup:
export IMAGE_SHA=$(git rev-parse --short=12 HEAD)

Build locally:

make image-build IMAGE_OWNER=ray-ruisun IMAGE_TAG=latest IMAGE_IMMUTABLE_TAG="$IMAGE_SHA"

Inspect the local image:

make image-inspect IMAGE_OWNER=ray-ruisun IMAGE_TAG=latest IMAGE_IMMUTABLE_TAG="$IMAGE_SHA"

Push manually:

make image-login GHCR_USER="$GHCR_USER" GHCR_PAT="$GHCR_PAT"
make image-push IMAGE_OWNER=ray-ruisun IMAGE_TAG=latest IMAGE_IMMUTABLE_TAG="$IMAGE_SHA"

One-command publish flow:
make image-publish \
IMAGE_OWNER=ray-ruisun \
IMAGE_TAG=latest \
IMAGE_IMMUTABLE_TAG="$IMAGE_SHA" \
GHCR_USER="$GHCR_USER" \
GHCR_PAT="$GHCR_PAT"

If Docker on your machine requires sudo, use:
make image-publish \
DOCKER='sudo docker' \
IMAGE_OWNER=ray-ruisun \
IMAGE_TAG=latest \
IMAGE_IMMUTABLE_TAG="$IMAGE_SHA" \
GHCR_USER="$GHCR_USER" \
GHCR_PAT="$GHCR_PAT"

Print the exact published tags:

make image-print IMAGE_OWNER=ray-ruisun IMAGE_TAG=latest IMAGE_IMMUTABLE_TAG="$IMAGE_SHA"

Recommended handoff to flock-addon:

export IMAGE_TAG=$(git rev-parse --short=12 HEAD)

Automatic publishing:

- GitHub Actions now publishes to ghcr.io/<repository-owner-lowercase>/fl-alliance-client
- Pushes on main and version tags such as v0.1.0 will publish automatically
- workflow_dispatch can also publish on demand
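The two-tag scheme (a mutable tag plus an immutable git-SHA tag) can be sketched as a small function. It assumes the manual make targets use the same `ghcr.io/<owner-lowercase>/fl-alliance-client` naming as the GitHub Actions workflow, which this README does not state explicitly. `image_refs` is a hypothetical helper.

```python
def image_refs(owner, tag="latest", immutable_tag=None, name="fl-alliance-client"):
    """Return the GHCR references for the mutable tag and the optional SHA tag.

    Assumption: the registry path is ghcr.io/<owner-lowercase>/<name>.
    """
    base = f"ghcr.io/{owner.lower()}/{name}"  # GHCR paths are lowercase
    refs = [f"{base}:{tag}"]
    if immutable_tag:
        refs.append(f"{base}:{immutable_tag}")
    return refs
```

Deploying the second reference (the SHA tag) pins the deployment to one build, whereas `latest` moves with every publish.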
Dataset format:
DATASET accepts a single file or a directory. main.py first stages every input into a temporary directory by copying (shutil.copytree / shutil.copy2); the runtime backend then exposes that staging directory to the model — Docker via a read-only bind mount at /app/data, and runtime.mode=local via a symlink (falling back to an NTFS junction or a full copy on Windows). See Configuration and Runtime Modes for details.
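The staging step can be sketched with the two `shutil` calls named above. This is an illustration under assumptions, not the code in main.py: the function name, the temp-directory prefix, and the choice to place the input under its own name inside the staging directory are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_dataset(dataset):
    """Copy a dataset file or directory into a fresh temporary staging directory."""
    source = Path(dataset)
    if not source.exists():
        raise FileNotFoundError(f"dataset not found: {source}")
    staging = Path(tempfile.mkdtemp(prefix="fl_dataset_"))
    target = staging / source.name  # assumed layout: input keeps its own name
    if source.is_dir():
        shutil.copytree(source, target)  # recursive copy of the whole tree
    else:
        shutil.copy2(source, target)  # copy2 also preserves file metadata
    return staging
```

Because the model only ever sees the copy, the original dataset is never exposed directly to the runtime backend.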
Prefer local simulation first? Use offline mode:
cp .env.local.example .env
make chain MODEL_DEFINITION_HASH=$(sha256sum model.tar.gz | cut -d' ' -f1)
make sim1 DATASET=data/train.jsonl

On macOS, replace sha256sum with:

shasum -a 256 model.tar.gz | cut -d' ' -f1
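If neither `sha256sum` nor `shasum` is available, the same hex digest can be computed with Python's standard library. This is a portable sketch; `sha256_file` is a hypothetical helper and not a command shipped with the repository.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result of `sha256_file("model.tar.gz")` is the value to pass as MODEL_DEFINITION_HASH.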
For all scenarios (testnet, dev mode, offline mode, pure FL, and LAN deployment), see the Run Playbook.
| Mode | Chain | Storage | Internet | Config | Command |
|---|---|---|---|---|---|
| Online (testnet) | Base Sepolia | S3 | Required | config/conf.yaml | python main.py -c config/conf.yaml ... |
| Dev (local chain + object storage) | Local Anvil | S3 Signer / direct S3-compatible + HuggingFace | Required | config/simulation-online.yaml | make dev1 |
| Offline (fully local) | Local Anvil | Local filesystem | Not needed | config/simulation.yaml | make sim1 |
| Pure FL (chainless) | None | Local filesystem | Not needed | config/pure-fl.yaml | make pure-fl1 |
All modes use the same client code — only the configuration differs. Each mode has a dedicated YAML config template and corresponding Makefile targets (dev/sim: up to 20 clients, pure-fl: 3 clients by default).
Choosing a mode:
- Just exploring? Start with Offline mode (make sim1) — zero external dependencies.
- Developing with real storage? Use Dev mode (make dev1) — local chain + S3 Signer or direct S3-compatible storage.
- Running on testnet? Use Online mode (python main.py -c config/conf.yaml ...) — requires $FLOCK tokens and Base Sepolia ETH.
- No blockchain needed? Use Pure FL mode (make pure-fl1) — coordination via shared files only.
For step-by-step instructions for each mode, see the Run Playbook.
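The mode table and the "Choosing a mode" guidance can be read as a small decision procedure. The sketch below restates them as data plus a function. The dictionary keys and the `choose_mode` helper are hypothetical names; only the config paths and commands come from this README.

```python
# Mode table from this README, expressed as data.
MODES = {
    "online": {"chain": "Base Sepolia", "config": "config/conf.yaml",
               "command": "python main.py -c config/conf.yaml ..."},
    "dev": {"chain": "Local Anvil", "config": "config/simulation-online.yaml",
            "command": "make dev1"},
    "offline": {"chain": "Local Anvil", "config": "config/simulation.yaml",
                "command": "make sim1"},
    "pure-fl": {"chain": None, "config": "config/pure-fl.yaml",
                "command": "make pure-fl1"},
}


def choose_mode(needs_chain=True, on_testnet=False, real_storage=False):
    """Pick a mode key following the 'Choosing a mode' bullets."""
    if not needs_chain:
        return "pure-fl"   # coordination via shared files only
    if on_testnet:
        return "online"    # needs $FLOCK tokens and Base Sepolia ETH
    if real_storage:
        return "dev"       # local chain + S3-style storage
    return "offline"       # zero external dependencies
```

For example, `MODES[choose_mode()]["command"]` yields `make sim1`, the suggested starting point for exploration.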
.
├── client/ # Core FL client runtime and managers
│ ├── contracts/ # Smart contract wrappers and ABIs
│ ├── managers/ # Container, storage, sync, metrics, coordination managers
│ ├── encryption/ # Seal encryption integration
│ └── logging_utils.py # Centralized logging configuration
├── contracts/ # Solidity contracts and deployment scripts
├── config/ # Configuration templates (one per mode)
│ ├── conf.yaml # Online mode (Base Sepolia)
│ ├── simulation-online.yaml # Dev mode: local chain + online storage
│ ├── simulation.yaml # Offline mode: local chain + local storage
│ └── pure-fl.yaml # Pure FL mode (chainless)
├── docs/ # Detailed documentation
├── .env.onchain.example # .env template for online mode
├── .env.local.example # .env template for local chain modes
├── main.py # Client entry point
├── docker-compose.yml # Local chain + deployer services
├── Makefile # Developer shortcuts
└── output/ # Runtime logs and task outputs (git-ignored)
| Document | Description |
|---|---|
| Configuration | Config files, env vars, YAML settings, CLI overrides |
| Run Playbook | Step-by-step commands for every scenario |
| Runtime Modes | Docker / local execution backends |
| Local Chain Simulation | Offline and LAN deployment, shared storage setup (NFS/SMB/sshfs) |
| Pure FL Mode | Chainless federated learning without incentive mechanism |
| Encryption & Storage | Seal encryption, S3/Nami/local storage backends |
| FL Alliance Protocol | Protocol deep-dive and smart contract lifecycle |
| Backend API | FastAPI service for runs, metrics, events, artifacts, and task admin |
| Parameter | Default | Description |
|---|---|---|
| DATASET | (required) | Path to dataset file or directory |
| GPU | true | Enable GPU acceleration (true/false) |
| CHAIN_HOST | localhost | Anvil chain host IP (for remote LAN clients) |
| TOKEN_ADDRESS | (auto) | FlockToken contract address (auto-detected from $FLOCK_CONTRACTS_FILE, /data/contracts.json, or data/contracts.json — first match wins) |
| TASK_ADDRESS | (auto) | FlockTask contract address (auto-detected from the same set as TOKEN_ADDRESS) |
| MODEL_DEFINITION_HASH | (required for make chain) | SHA-256 hash of model archive |
| ROUNDS | 10 | Number of training rounds |
| MIN_PARTICIPANTS | 3 | Minimum participants per round |
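The "first match wins" lookup for the contracts file can be sketched as follows. Two things here are assumptions: the function name, and the behaviour when $FLOCK_CONTRACTS_FILE is set but the file is missing (this sketch falls through to the next candidate). The JSON layout of contracts.json is not described here, so the sketch only resolves the path and does not parse it.

```python
import os
from pathlib import Path

# Fallback locations listed in the parameter table, in priority order.
DEFAULT_CANDIDATES = ("/data/contracts.json", "data/contracts.json")


def resolve_contracts_file(env=None, candidates=DEFAULT_CANDIDATES):
    """Return the first existing contracts file, or None if none is found."""
    env = os.environ if env is None else env
    ordered = []
    if env.get("FLOCK_CONTRACTS_FILE"):
        ordered.append(env["FLOCK_CONTRACTS_FILE"])  # highest priority
    ordered.extend(candidates)
    for candidate in ordered:
        path = Path(candidate)
        if path.is_file():
            return path
    return None
```

Passing `env` and `candidates` explicitly makes the lookup order easy to exercise without touching the real environment.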
This project uses uv for Python package management:
uv sync # install dependencies
uv run python main.py # run in project environment
uv add <package> # add a dependency

Before submitting changes:

make test
uv run python -m compileall main.py client

If pytest fails during startup because an external plugin is auto-loaded by your environment, run:

PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run pytest -q

Local mode — dependency or module errors
When using runtime.mode: local, the client creates virtual environments in tmp_envs/ for each task. If you see ModuleNotFoundError or similar after updating baseline packages, remove the cache and retry:
rm -rf tmp_envs/

By default, local runtime environments are preserved to speed up restarts. Set FL_KEEP_MODEL_ENV=false to force cleanup on each stop.
Process exited silently after Waiting for model to start...
Almost always one of:
- SSH session dropped — on legacy builds the parent received SIGHUP and was killed before any handler could run. Recent builds log the signal name before exit; either way, run inside tmux / nohup / systemd so the client outlives the shell.
- OOM-killer — SIGKILL cannot be caught. Confirm with sudo dmesg -T | grep -iE 'killed process|out of memory'. Lower batch size, lower model precision, or move to a larger box.
- Genuine startup timeout — bump PROCESS_STARTUP_TIMEOUT (default 1800 seconds) and PROCESS_RESPONSE_TIMEOUT (default 3600 seconds). Both are env-var overridable; LLM cold-starts (HF download + venv install + GPU load) frequently need 2 hours.
In every case the model subprocess uses start_new_session=True, so it survives parent death — inspect output/task_outputs/process_*.log to see exactly how far it got.
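The combination described above (a detached subprocess, an env-overridable startup timeout, and a heartbeat during the wait) can be sketched with the standard library. This is an illustration of the technique, not the client's implementation: all three function names, the heartbeat text, and the polling interval are assumptions.

```python
import os
import subprocess
import sys
import time


def timeout_from_env(name, default, env=None):
    """Read a timeout in seconds from the environment, falling back to default."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    return float(raw) if raw else float(default)


def launch_detached(cmd, log_path):
    """Start cmd in its own session so it is not tied to the parent's terminal.

    start_new_session=True calls setsid() in the child (POSIX only).
    """
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT,
                                start_new_session=True)
    finally:
        log.close()  # the child keeps its own duplicate of the descriptor


def wait_until_ready(is_ready, timeout, poll_interval=1.0, heartbeat=print):
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        heartbeat("still waiting for model startup")  # hypothetical message
        time.sleep(poll_interval)
    return False
```

A caller might combine them as `wait_until_ready(check, timeout_from_env("PROCESS_STARTUP_TIMEOUT", 1800))`, where `check` is whatever readiness probe applies. Because the child writes to its own log file, that file stays readable even if the parent dies first.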
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
FL Alliance is based on academic research by the FLock team. See the paper: Defending Against Poisoning Attacks in Federated Learning With Blockchain.