2 changes: 1 addition & 1 deletion AGENTS.md
@@ -231,7 +231,7 @@ For distributed development:
3. Build Store: `mvn clean package -pl hugegraph-store -am -DskipTests`
4. Build Server with HStore backend: `mvn clean package -pl hugegraph-server -am -DskipTests`

See Docker Compose example: `hugegraph-server/hugegraph-dist/docker/example/`
See Docker Compose examples in the `docker/` directory: single-node quickstart with pre-built images (`docker/docker-compose.yml`), single-node dev build from source (`docker/docker-compose-dev.yml`), and a 3-node cluster (`docker/docker-compose-3pd-3store-3server.yml`). See `docker/README.md` for the full setup guide.

### Debugging Tips

7 changes: 5 additions & 2 deletions README.md
@@ -209,8 +209,11 @@ docker run -itd --name=hugegraph -e PASSWORD=your_password -p 8080:8080 hugegrap

For advanced Docker configurations, see:
- [Docker Documentation](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#3-deploy)
- [Docker Compose Example](./hugegraph-server/hugegraph-dist/docker/example)
- [Docker README](hugegraph-server/hugegraph-dist/docker/README.md)
- [Docker Compose Examples](./docker/)
- [Docker README](./docker/README.md)
- [Server Docker README](hugegraph-server/hugegraph-dist/docker/README.md)

> **Docker Desktop (Mac/Windows)**: The 3-node distributed cluster (`docker/docker-compose-3pd-3store-3server.yml`) uses Docker bridge networking and works on all platforms including Docker Desktop. Allocate at least 12 GB memory to Docker Desktop.

> **Note**: Docker images are convenience releases, not **official ASF distribution artifacts**. See [ASF Release Distribution Policy](https://infra.apache.org/release-distribution.html#dockerhub) for details.
>
259 changes: 259 additions & 0 deletions docker/README.md
@@ -0,0 +1,259 @@
# HugeGraph Docker Deployment

This directory contains Docker Compose files for running HugeGraph:

| File | Description |
|------|-------------|
| `docker-compose.yml` | Single-node cluster using pre-built images from Docker Hub |
| `docker-compose-dev.yml` | Single-node cluster built from source (for developers) |
| `docker-compose-3pd-3store-3server.yml` | 3-node distributed cluster (PD + Store + Server) |

## Prerequisites

- **Docker Engine** 20.10+ (or Docker Desktop 4.x+)
- **Docker Compose** v2 (included in Docker Desktop)
- **Memory**: Allocate at least **12 GB** to Docker Desktop (Settings → Resources → Memory). The 3-node cluster runs 9 JVM processes (3 PD + 3 Store + 3 Server) which are memory-intensive. Insufficient memory causes OOM kills that appear as silent Raft failures.

> [!IMPORTANT]
> The 12 GB minimum is for Docker Desktop (Mac/Windows). On Linux with native Docker, ensure the host has at least 12 GB of free memory.

> [!WARNING]
> **Temporary workaround: source clone currently required.** The quickstart (`docker-compose.yml`) and 3-node (`docker-compose-3pd-3store-3server.yml`) compose files mount entrypoint scripts directly from the source tree, because the published Docker Hub images do not yet include the updated entrypoints. As a result, these two files require a full repository clone to run. Once updated images with the new entrypoints baked in are published to Docker Hub, this requirement will be removed in a follow-up release. The `docker-compose-dev.yml` (dev build) is unaffected, since it builds images from source.

## Why Bridge Networking (Not Host Mode)

Previous versions used `network_mode: host`, which only works on Linux and is incompatible with Docker Desktop on Mac/Windows. The cluster now uses a proper Docker bridge network (`hg-net`) where services communicate via container hostnames (`pd0`, `pd1`, `store0`, etc.) instead of `127.0.0.1`. This makes the cluster portable across all platforms.
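
As a minimal illustrative sketch (service and option values here are shortened; consult the actual compose files for the real definitions), the bridge setup looks like:

```yaml
# Illustrative fragment only — not copied verbatim from the repo's compose files.
networks:
  hg-net:
    driver: bridge

services:
  pd0:
    image: hugegraph/pd:latest
    hostname: pd0            # other containers reach this node as "pd0"
    networks:
      - hg-net
    environment:
      HG_PD_GRPC_HOST: pd0   # container hostname, never 127.0.0.1
```

Because every service joins `hg-net`, Docker's embedded DNS resolves `pd0`, `store0`, etc. to the right container on any platform.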

---

## Single-Node Setup

Two compose files are available for running a single-node cluster (1 PD + 1 Store + 1 Server):

### Option A: Quick Start (pre-built images)

Uses pre-built images from Docker Hub. Best for **end users** who want to run HugeGraph quickly.

```bash
cd docker
docker compose up -d
```

- Images: `hugegraph/pd:latest`, `hugegraph/store:latest`, `hugegraph/server:latest`
- `pull_policy: always` — always pulls the latest image
- PD healthcheck endpoint: `/` (root)
- Single PD, single Store (`HG_PD_INITIAL_STORE_LIST: store:8500`), single Server
- No `STORE_REST` or `wait-partition.sh` — simpler startup

### Option B: Development Build (build from source)

Builds images locally from source Dockerfiles. Best for **developers** who want to test local changes.

```bash
cd docker
docker compose -f docker-compose-dev.yml up -d
```

- Images: built from source via `build: context: ..` with Dockerfiles
- No `pull_policy` — builds locally, doesn't pull
- Entrypoint scripts are baked into the built image (no volume mounts)
- PD healthcheck endpoint: `/v1/health`
- Otherwise identical env vars and structure to the quickstart file
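
A hedged sketch of how a source build is declared in compose (the `dockerfile` path shown is an assumption for illustration; check `docker-compose-dev.yml` for the actual value):

```yaml
# Illustrative fragment — the dockerfile path is a hypothetical example.
services:
  pd0:
    build:
      context: ..                          # repository root
      dockerfile: hugegraph-pd/Dockerfile  # hypothetical path
    # no pull_policy: the locally built image is used as-is
```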

### Key Differences

| | `docker-compose.yml` (quickstart) | `docker-compose-dev.yml` (dev build) |
|---|---|---|
| **Images** | Pull from Docker Hub | Build from source |
| **Who it's for** | End users | Developers |
| **pull_policy** | `always` | not set (build) |
| **PD healthcheck** | `/` (root) | `/v1/health` |
| **Entrypoint scripts** | Mounted from source tree | Baked into built image |

**Verify** (both options):
```bash
curl http://localhost:8080/versions
```

---

## 3-Node Cluster Quickstart

```bash
cd docker
docker compose -f docker-compose-3pd-3store-3server.yml up -d
```

**Startup ordering** is enforced via `depends_on` with `condition: service_healthy`:

1. **PD nodes** start first and must pass healthchecks (`/v1/health`)
2. **Store nodes** start after all PD nodes are healthy
3. **Server nodes** start after all Store nodes are healthy

This ensures Raft leader election and partition assignment complete before dependent services attempt connections.
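
The ordering mechanism can be sketched as follows (an illustrative fragment showing only the `depends_on` wiring, not the full service definitions):

```yaml
# Illustrative fragment — full service definitions live in the compose file.
services:
  store0:
    depends_on:
      pd0:
        condition: service_healthy   # wait for the PD healthcheck to pass
  server0:
    depends_on:
      store0:
        condition: service_healthy   # wait for the Store healthcheck to pass
```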

**Verify the cluster is healthy**:

```bash
# Check PD health
curl http://localhost:8620/v1/health

# Check Store health
curl http://localhost:8520/v1/health

# Check Server (Graph API)
curl http://localhost:8080/versions

# List registered stores via PD
curl http://localhost:8620/v1/stores

# List partitions
curl http://localhost:8620/v1/partitions
```

---

## Environment Variable Reference

Configuration is injected via environment variables. The old `docker/configs/application-pd*.yml` and `docker/configs/application-store*.yml` files are no longer used.

### PD Environment Variables

| Variable | Required | Default | Maps To (`application.yml`) | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_PD_GRPC_HOST` | Yes | — | `grpc.host` | This node's hostname/IP for gRPC |
| `HG_PD_RAFT_ADDRESS` | Yes | — | `raft.address` | This node's Raft address (e.g. `pd0:8610`) |
| `HG_PD_RAFT_PEERS_LIST` | Yes | — | `raft.peers-list` | All PD peers (e.g. `pd0:8610,pd1:8610,pd2:8610`) |
| `HG_PD_INITIAL_STORE_LIST` | Yes | — | `pd.initial-store-list` | Expected stores (e.g. `store0:8500,store1:8500,store2:8500`) |
| `HG_PD_GRPC_PORT` | No | `8686` | `grpc.port` | gRPC server port |
| `HG_PD_REST_PORT` | No | `8620` | `server.port` | REST API port |
| `HG_PD_DATA_PATH` | No | `/hugegraph-pd/pd_data` | `pd.data-path` | Metadata storage path |
| `HG_PD_INITIAL_STORE_COUNT` | No | `1` | `pd.initial-store-count` | Min stores for cluster availability |

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `GRPC_HOST` | `HG_PD_GRPC_HOST` |
| `RAFT_ADDRESS` | `HG_PD_RAFT_ADDRESS` |
| `RAFT_PEERS` | `HG_PD_RAFT_PEERS_LIST` |
| `PD_INITIAL_STORE_LIST` | `HG_PD_INITIAL_STORE_LIST` |
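
Putting the required PD variables together, the `pd0` node of the 3-node cluster is configured along these lines (values taken from the examples in the table above):

```shell
# pd0 environment for the 3-node cluster (values from the table above)
HG_PD_GRPC_HOST=pd0
HG_PD_RAFT_ADDRESS=pd0:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
HG_PD_INITIAL_STORE_COUNT=3   # optional override; default is 1
```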

### Store Environment Variables

| Variable | Required | Default | Maps To (`application.yml`) | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_STORE_PD_ADDRESS` | Yes | — | `pdserver.address` | PD gRPC addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
| `HG_STORE_GRPC_HOST` | Yes | — | `grpc.host` | This node's hostname (e.g. `store0`) |
| `HG_STORE_RAFT_ADDRESS` | Yes | — | `raft.address` | This node's Raft address (e.g. `store0:8510`) |
| `HG_STORE_GRPC_PORT` | No | `8500` | `grpc.port` | gRPC server port |
| `HG_STORE_REST_PORT` | No | `8520` | `server.port` | REST API port |
| `HG_STORE_DATA_PATH` | No | `/hugegraph-store/storage` | `app.data-path` | Data storage path |

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `PD_ADDRESS` | `HG_STORE_PD_ADDRESS` |
| `GRPC_HOST` | `HG_STORE_GRPC_HOST` |
| `RAFT_ADDRESS` | `HG_STORE_RAFT_ADDRESS` |
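
Assembled from the table above, a `store0` node in the 3-node cluster sets the required variables like this:

```shell
# store0 environment for the 3-node cluster (values from the table above)
HG_STORE_PD_ADDRESS=pd0:8686,pd1:8686,pd2:8686
HG_STORE_GRPC_HOST=store0
HG_STORE_RAFT_ADDRESS=store0:8510
```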

### Server Environment Variables

| Variable | Required | Default | Maps To | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_SERVER_BACKEND` | Yes | — | `backend` in `hugegraph.properties` | Storage backend (e.g. `hstore`) |
| `HG_SERVER_PD_PEERS` | Yes | — | `pd.peers` | PD cluster addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
| `STORE_REST` | No | — | Used by `wait-partition.sh` | Store REST endpoint for partition verification (e.g. `store0:8520`) |
| `PASSWORD` | No | — | Enables auth mode | Optional authentication password |

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `BACKEND` | `HG_SERVER_BACKEND` |
| `PD_PEERS` | `HG_SERVER_PD_PEERS` |
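
Likewise, a `server0` node pointing at the 3-node PD cluster sets (values from the table above):

```shell
# server0 environment for the 3-node cluster (values from the table above)
HG_SERVER_BACKEND=hstore
HG_SERVER_PD_PEERS=pd0:8686,pd1:8686,pd2:8686
STORE_REST=store0:8520   # optional: lets wait-partition.sh verify partitions
```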

---

## Port Reference

| Service | Container Port | Host Port | Protocol | Purpose |
|---------|---------------|-----------|----------|---------|
| pd0 | 8620 | 8620 | HTTP | REST API |
| pd0 | 8686 | 8686 | gRPC | PD gRPC |
| pd0 | 8610 | — | TCP | Raft (internal only) |
| pd1 | 8620 | 8621 | HTTP | REST API |
| pd1 | 8686 | 8687 | gRPC | PD gRPC |
| pd2 | 8620 | 8622 | HTTP | REST API |
| pd2 | 8686 | 8688 | gRPC | PD gRPC |
| store0 | 8500 | 8500 | gRPC | Store gRPC |
| store0 | 8510 | 8510 | TCP | Raft |
| store0 | 8520 | 8520 | HTTP | REST API |
| store1 | 8500 | 8501 | gRPC | Store gRPC |
| store1 | 8510 | 8511 | TCP | Raft |
| store1 | 8520 | 8521 | HTTP | REST API |
| store2 | 8500 | 8502 | gRPC | Store gRPC |
| store2 | 8510 | 8512 | TCP | Raft |
| store2 | 8520 | 8522 | HTTP | REST API |
| server0 | 8080 | 8080 | HTTP | Graph API |
| server1 | 8080 | 8081 | HTTP | Graph API |
| server2 | 8080 | 8082 | HTTP | Graph API |

---

## Healthcheck Endpoints

| Service | Endpoint | Expected |
|---------|----------|----------|
| PD | `GET /v1/health` | `200 OK` |
| Store | `GET /v1/health` | `200 OK` |
| Server | `GET /versions` | `200 OK` with version JSON |
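
A compose-level healthcheck hitting these endpoints might look like the sketch below. The interval, timeout, and retry values are illustrative assumptions (and the quickstart file checks `/` for PD rather than `/v1/health`); the actual compose files are authoritative:

```yaml
# Illustrative fragment — timing values are assumptions, not the repo's actual settings.
services:
  pd0:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8620/v1/health"]
      interval: 10s
      timeout: 5s
      retries: 10
```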

---

## Troubleshooting

### Containers Exiting or Restarting (OOM Kills)

**Symptom**: Containers exit with code 137, or restart loops. Raft logs show election timeouts.

**Cause**: Docker Desktop (or the Docker host) does not have enough memory allocated. The 9 JVM processes require at least 12 GB.

**Fix**: Docker Desktop → Settings → Resources → Memory → set to **12 GB** or higher. Restart Docker Desktop.

```bash
# Check if containers were OOM killed
docker inspect hg-pd0 | grep -i oom
docker stats --no-stream
```

### Raft Leader Election Failure

**Symptom**: PD logs show repeated `Leader election timeout`. Store nodes cannot register.

**Cause**: PD nodes cannot reach each other on the Raft port (8610), or `HG_PD_RAFT_PEERS_LIST` is misconfigured.

**Fix**:
1. Verify all PD containers are running: `docker compose -f docker-compose-3pd-3store-3server.yml ps`
2. Check PD logs: `docker logs hg-pd0`
3. Verify network connectivity: `docker exec hg-pd0 ping pd1`
4. Ensure `HG_PD_RAFT_PEERS_LIST` is identical on all PD nodes

### Partition Assignment Not Completing

**Symptom**: Server starts but graph operations fail. Store logs show `partition not found`.

**Cause**: PD has not finished assigning partitions to stores, or stores did not register successfully.

**Fix**:
1. Check registered stores: `curl http://localhost:8620/v1/stores`
2. Check partition status: `curl http://localhost:8620/v1/partitions`
3. Wait for partition assignment (can take 1–3 minutes after all stores register)
4. Check server logs for the `wait-partition.sh` script output: `docker logs hg-server0`

### Connection Refused Errors

**Symptom**: Stores cannot connect to PD, or Server cannot connect to Store.

**Cause**: Services are using `127.0.0.1` instead of container hostnames, or the `hg-net` bridge network is misconfigured.

**Fix**: Ensure all `HG_*` env vars use container hostnames (`pd0`, `store0`, etc.), not `127.0.0.1` or `localhost`.
4 changes: 2 additions & 2 deletions hugegraph-pd/AGENTS.md
@@ -247,7 +247,7 @@ store:
### Common Configuration Errors

1. **Raft peer discovery failure**: `raft.peers-list` must include all PD nodes' `raft.address` values
2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments
2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments. In Docker bridge networking, use the container hostname (e.g., `pd0`) set via `HG_PD_GRPC_HOST` env var.
3. **Split-brain scenarios**: Always run 3 or 5 PD nodes in production for Raft quorum
4. **Partition imbalance**: Adjust `patrol-interval` for faster/slower rebalancing

@@ -331,7 +331,7 @@ docker run -d -p 8620:8620 -p 8686:8686 -p 8610:8610 \
hugegraph-pd:latest

# For production clusters, use Docker Compose or Kubernetes
# See: hugegraph-server/hugegraph-dist/docker/example/
# See: docker/docker-compose-3pd-3store-3server.yml and docker/README.md
```

Exposed ports: 8620 (REST), 8686 (gRPC), 8610 (Raft)
39 changes: 36 additions & 3 deletions hugegraph-pd/README.md
@@ -154,6 +154,36 @@ raft:

For detailed configuration options and production tuning, see [Configuration Guide](docs/configuration.md).

#### Docker Bridge Network Example

When running PD in Docker with bridge networking (e.g., `docker/docker-compose-3pd-3store-3server.yml`), configuration is injected via environment variables instead of editing `application.yml` directly. Container hostnames are used instead of IP addresses:

**pd0** container:
```bash
HG_PD_GRPC_HOST=pd0
HG_PD_RAFT_ADDRESS=pd0:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```

**pd1** container:
```bash
HG_PD_GRPC_HOST=pd1
HG_PD_RAFT_ADDRESS=pd1:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```

**pd2** container:
```bash
HG_PD_GRPC_HOST=pd2
HG_PD_RAFT_ADDRESS=pd2:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```

See [docker/README.md](../docker/README.md) for the full environment variable reference.

### Verify Deployment

Check if PD is running:
@@ -210,15 +240,18 @@ docker run -d \
-p 8620:8620 \
-p 8686:8686 \
-p 8610:8610 \
-v /path/to/conf:/hugegraph-pd/conf \
-e HG_PD_GRPC_HOST=<your-ip> \
-e HG_PD_RAFT_ADDRESS=<your-ip>:8610 \
-e HG_PD_RAFT_PEERS_LIST=<your-ip>:8610 \
-e HG_PD_INITIAL_STORE_LIST=<store-ip>:8500 \
-v /path/to/data:/hugegraph-pd/pd_data \
--name hugegraph-pd \
hugegraph-pd:latest
hugegraph/pd:latest
```

For Docker Compose examples with HugeGraph Store and Server, see:
```
hugegraph-server/hugegraph-dist/docker/example/
docker/docker-compose-3pd-3store-3server.yml
```

## Documentation