411 changes: 411 additions & 0 deletions DEPLOYMENT_ALTERNATIVES.md

Large diffs are not rendered by default.

265 changes: 265 additions & 0 deletions DEPLOY_TO_FAL_QUICKSTART.md
@@ -0,0 +1,265 @@
# Deploy to fal.ai - Quick Start

## TL;DR

```bash
# 1. Install fal CLI
pip install fal
fal auth login

# 2. Update fal_app.py with your Docker image name

# 3. Deploy
fal deploy fal_app.py

# 4. Test
curl https://your-fal-url/health
```

## Prerequisites

- [ ] Docker image built and pushed to a registry
- [ ] fal.ai account (sign up at https://fal.ai)
- [ ] fal CLI installed (`pip install fal`)

## Step-by-Step

### 1. Prepare Your Docker Image

```bash
# Build
docker build -t your-dockerhub-username/scope-runner:latest .

# Push to Docker Hub (or your registry)
docker push your-dockerhub-username/scope-runner:latest
```

### 2. Update Deployment File

Edit `fal_app.py`:

```python
# Change this line:
DOCKER_IMAGE = "your-dockerhub-username/scope-runner:latest"
```

### 3. Authenticate with fal.ai

```bash
# Option A: Interactive login
fal auth login

# Option B: Use API key
export FAL_KEY="your-fal-api-key"
```

Get your API key from: https://fal.ai/dashboard/keys

### 4. Deploy

```bash
fal deploy fal_app.py
```

You'll get a URL like: `https://your-username-scope-runner.fal.run`

### 5. Test

```bash
# Health check
curl https://your-fal-url/health

# Check if server is responding
curl https://your-fal-url/

# Test WebSocket (if you have a client)
# Connect to: wss://your-fal-url/live (wss, since the endpoint is served over HTTPS)
```
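Because the deployed endpoint is served over HTTPS, WebSocket clients should connect with `wss://` rather than `ws://`. A minimal helper for deriving the WebSocket URL (the `/live` path is taken from this guide; adapt it to your app):

```python
# Sketch: convert the HTTPS endpoint URL into its WebSocket URL.
# The /live path is an assumption from this guide, not confirmed fal.ai behavior.
from urllib.parse import urlparse

def build_ws_url(base_url: str, path: str = "/live") -> str:
    """Map https:// to wss:// (and http:// to ws://) and append the WS path."""
    parsed = urlparse(base_url)
    scheme = "wss" if parsed.scheme == "https" else "ws"
    return f"{scheme}://{parsed.netloc}{path}"

if __name__ == "__main__":
    print(build_ws_url("https://your-username-scope-runner.fal.run"))
```

Point any WebSocket client (e.g. the `websockets` package) at the resulting URL.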

## Configuration Options

Edit `fal_app.py` to customize:

```python
class ScopeRunnerApp(fal.App, kind="container", image=custom_image):
# GPU type
machine_type = "GPU-A100" # or "GPU-A100-80GB"

# Keep instance warm (seconds)
keep_alive = 300 # 5 minutes

# Port your app listens on
exposed_port = 8000
```

### GPU Options

- `"GPU-T4"` - 16GB VRAM (may be insufficient)
- `"GPU-A10G"` - 24GB VRAM
- `"GPU-A100"` - 40GB VRAM (recommended)
- `"GPU-A100-80GB"` - 80GB VRAM (for large models)

### Keep Alive Options

- `0` - No warm instances (coldest start, cheapest)
- `300` - 5 minutes (balanced)
- `3600` - 1 hour (fastest, more expensive)

## Troubleshooting

### "Image not found"

Make sure your Docker image is either:
- pushed to a public registry, or
- reachable via private registry credentials configured in `fal_app.py`

### "GPU not available"

Check that:
- Your Docker image includes NVIDIA drivers
- You're using a CUDA base image
- `nvidia-smi` works in the container

### "Port not accessible"

Verify:
- `exposed_port = 8000` matches your app's port
- ai-runner is actually listening on that port
- fal.ai supports port forwarding (check their docs)
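When debugging locally (e.g. running the container on your own machine first), a quick TCP reachability check confirms whether anything is actually listening on the port; host and port here are examples:

```python
# Sketch: check whether a TCP port accepts connections.
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # e.g. after `docker run -p 8000:8000 ...` on your workstation
    print(is_port_open("localhost", 8000))
```

If this returns `False` locally, the problem is in the container (wrong port or app not started), not in fal.ai's forwarding.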

### "Container exits immediately"

Check logs with:
```bash
fal logs your-username/scope-runner
```

Common issues:
- Missing models (models not in image)
- Environment variables not set
- CMD in Dockerfile is incorrect
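A container that exits immediately has often just failed a startup precondition. A tiny pre-flight check at the top of the entrypoint turns a silent crash into a readable log line; the variable names below are hypothetical placeholders, not settings this project actually defines:

```python
# Sketch: fail fast with a clear message when required env vars are missing.
import os
import sys

# Hypothetical required settings; replace with whatever your entrypoint reads.
REQUIRED_VARS = ["MODEL_DIR", "HF_HUB_OFFLINE"]

def missing_vars(required, env=os.environ):
    """Return the subset of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars(REQUIRED_VARS)
    if missing:
        sys.exit(f"startup aborted, missing env vars: {', '.join(missing)}")
```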

## Monitoring

### View Logs

```bash
fal logs your-username/scope-runner
```

### Check Status

```bash
fal status your-username/scope-runner
```

### View Metrics

Go to: https://fal.ai/dashboard

## Cost Estimation

**A100 GPU:**
- Compute: ~$0.0015/second (~$5.40/hour)
- Keep alive (5 min): ~$0.45/hour baseline
- Per request: Variable based on generation time

**Example costs:**
- Always on (24/7): ~$3,900/month
- With keep_alive (busy hours): ~$500-1,500/month
- Pure pay-per-use: ~$0.10-0.50 per video
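The figures above can be sanity-checked with a small calculator. The $0.0015/second rate is the estimate from this section, not an official fal.ai price:

```python
# Sketch: reproduce the cost estimates above from the per-second rate.
RATE_PER_SECOND = 0.0015  # estimated A100 rate from this guide

def hourly_cost(rate=RATE_PER_SECOND):
    return rate * 3600

def monthly_always_on(rate=RATE_PER_SECOND, hours=730):
    """Cost of running 24/7 for an average month (~730 hours)."""
    return hourly_cost(rate) * hours

def keep_alive_baseline(keep_alive_seconds, rate=RATE_PER_SECOND):
    """Idle cost per hour if one warm window of keep_alive_seconds is billed."""
    return rate * keep_alive_seconds

def per_video_cost(generation_seconds, rate=RATE_PER_SECOND):
    return rate * generation_seconds

if __name__ == "__main__":
    print(f"hourly: ${hourly_cost():.2f}")                 # ~$5.40
    print(f"always-on month: ${monthly_always_on():,.0f}")  # ~$3,942
    print(f"5-min keep_alive: ${keep_alive_baseline(300):.2f}/hr")
    print(f"60s generation: ${per_video_cost(60):.2f}")
```

Note the per-video range of $0.10-0.50 quoted above corresponds to roughly 70-330 seconds of A100 time at this rate.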

## Alternative: If fal.ai Doesn't Work

If you encounter issues with fal.ai's container model:

### RunPod (Proven Alternative)

1. Go to runpod.io
2. Deploy → GPU Pod
3. Select: RTX A5000 or A100
4. Docker Image: `your-dockerhub-username/scope-runner:latest`
5. Expose Port: 8000
6. Deploy

- Cost: ~$0.20-0.80/hour
- Setup time: ~5 minutes

See `DEPLOYMENT_ALTERNATIVES.md` for more options.

## Next Steps

After successful deployment:

1. **Test thoroughly**
- Health endpoints
- WebSocket connections
- Video generation
- Performance under load

2. **Monitor costs**
- Check fal.ai dashboard
- Adjust keep_alive if needed
- Consider alternatives if too expensive

3. **Set up CI/CD** (optional)
- Use `.github/workflows/deploy-fal.yml`
- Automate deployments on push

4. **Document your endpoint**
- Share URL with team
- Document API usage
- Set up monitoring/alerts
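The "test thoroughly" step above can start as a scripted smoke test. A stdlib-only sketch (the base URL is a placeholder, and the `/health` path is the one assumed throughout this guide):

```python
# Sketch: probe the deployed endpoints and report pass/fail.
import urllib.error
import urllib.request

def check(url, timeout=5.0):
    """Return the HTTP status for url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None

def summarize(results):
    """True only if every probed endpoint answered 200."""
    return all(status == 200 for status in results.values())

if __name__ == "__main__":
    base = "https://your-fal-url"  # placeholder, use your deployment URL
    results = {path: check(base + path) for path in ("/health", "/")}
    print(results, "OK" if summarize(results) else "FAILED")
```

Running this on a schedule (cron, CI) doubles as a cheap uptime monitor.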

## Resources

- **Complete guide**: `FAL_REALTIME_DEPLOYMENT.md`
- **Alternatives**: `DEPLOYMENT_ALTERNATIVES.md`
- **Corrections**: `FAL_INTEGRATION_CORRECTED.md`
- **fal.ai docs**: https://docs.fal.ai
- **Support**: https://fal.ai/support

## Quick Reference

```bash
# Deploy
fal deploy fal_app.py

# Check logs
fal logs your-username/scope-runner

# Check status
fal status your-username/scope-runner

# Update deployment
# (edit fal_app.py, then)
fal deploy fal_app.py

# Remove deployment
fal delete your-username/scope-runner
```

## Success Checklist

- [ ] Docker image built and pushed
- [ ] fal CLI installed and authenticated
- [ ] `fal_app.py` updated with image name
- [ ] Deployed successfully
- [ ] Health endpoint responds
- [ ] WebSocket connections work
- [ ] Video generation works
- [ ] Costs are acceptable
- [ ] Monitoring set up

## Need Help?

- **fal.ai issues**: https://fal.ai/support or Discord
- **Scope Runner issues**: GitHub Issues
- **Architecture questions**: `FAL_REALTIME_DEPLOYMENT.md`

---

**Remember:** This is a realtime streaming server, not a simple function. Make sure fal.ai's container model supports indefinite runtime before committing to production use.

30 changes: 29 additions & 1 deletion Dockerfile
@@ -1,6 +1,7 @@
ARG BASE_IMAGE=livepeer/ai-runner:live-base-v0.14.1
FROM ${BASE_IMAGE}

# Install Python 3.12 and gcc-13 FIRST before creating venv
RUN apt update && apt install -yqq \
wget git curl \
build-essential software-properties-common \
@@ -9,18 +10,44 @@ RUN apt update && apt install -yqq \
python3-dev \
&& apt clean && rm -rf /var/lib/apt/lists/*

RUN apt-get update && apt-get install -y \
software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa -y \
&& add-apt-repository ppa:ubuntu-toolchain-r/test -y \
&& apt-get update \
&& apt-get install -y \
python3.12 \
python3.12-dev \
python3.12-venv \
gcc-13 \
g++-13 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Set Python 3.12 as the default python3 and set up compiler alternatives
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 \
&& update-alternatives --set python3 /usr/bin/python3.12 \
&& ln -sf /usr/bin/python3.12 /usr/bin/python \
&& update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 100 \
&& update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-13 100

WORKDIR /app

COPY pyproject.toml uv.lock ./
# Copy stub for editable install validation
COPY src/scope_runner/pipeline/__init__.py ./src/scope_runner/pipeline/

# Now sync with Python 3.12 and gcc-13
RUN uv sync --locked --no-install-project

COPY src/scope_runner/ ./src/scope_runner/

RUN uv sync --locked

# Force rebuild sageattention from source with the new environment (Python 3.12 + gcc-13)
# This replaces the precompiled wheel with a version built for this container
RUN uv pip install --force-reinstall --no-binary sageattention sageattention

ENV HF_HUB_OFFLINE=1

ARG GIT_SHA
@@ -29,4 +56,5 @@ ARG VERSION="undefined"
ENV GIT_SHA="${GIT_SHA}" \
VERSION="${VERSION}"

CMD ["uv", "run", "--frozen", "scope-runner"]
# Use the venv binary directly to avoid uv reinstalling the precompiled sageattention wheel
CMD ["/app/.venv/bin/scope-runner"]
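After building this image, it is worth confirming the toolchain swap actually took effect, e.g. `docker run --rm <image> python3 --version` and `docker run --rm <image> gcc --version`. A small helper for asserting expected versions from that output; the sample strings below are illustrative:

```python
# Sketch: parse `gcc --version` / `python3 --version` output into a tuple.
# Both tools print the version as the last token of their first line.
def version_tuple(version_output):
    """e.g. 'gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0' -> (13, 2, 0)."""
    first_line = version_output.strip().splitlines()[0]
    return tuple(int(part) for part in first_line.split()[-1].split("."))

if __name__ == "__main__":
    gcc_out = "gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0"  # sample output
    py_out = "Python 3.12.3"                          # sample output
    assert version_tuple(gcc_out)[0] == 13, "expected gcc-13"
    assert version_tuple(py_out)[:2] == (3, 12), "expected Python 3.12"
    print("toolchain OK")
```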