Merged
17 commits
9536f78
chore(deps): add fastapi, uvicorn, and slowapi
spideystreet Mar 10, 2026
cd43304
feat(api): add API config module with pydantic-settings
spideystreet Mar 10, 2026
6550e67
feat(api): add connection pool with psycopg2 SimpleConnectionPool
spideystreet Mar 10, 2026
4c54745
feat(api): add pydantic response schemas
spideystreet Mar 10, 2026
95692eb
feat(api): add FastAPI app with health endpoint and rate limiting
spideystreet Mar 10, 2026
9a7998a
feat(api): add categories, domains, and techstacks endpoints
spideystreet Mar 10, 2026
14cc9a3
feat(api): add project search, detail, and similarity endpoints
spideystreet Mar 10, 2026
7bed7c9
feat(api): add trending recommendations endpoint
spideystreet Mar 10, 2026
b47437c
fix(api): escape ILIKE wildcards and add consistent type::text cast
spideystreet Mar 10, 2026
6d8af99
chore(docker): add FastAPI service to compose stack
spideystreet Mar 10, 2026
59fdaa9
test(api): add auto-marker for api test directory
spideystreet Mar 10, 2026
775ed4b
docs(env): add API configuration variables to .env.example
spideystreet Mar 10, 2026
b798d07
style(api): fix lint and type issues
spideystreet Mar 10, 2026
f53dd64
test(api): verify SQL params and response relations in project tests
spideystreet Mar 10, 2026
a13455d
fix(api): harden security for deployment
spideystreet Mar 10, 2026
3ba0089
docs: update CLAUDE.md and architecture for REST API & MCP
spideystreet Mar 10, 2026
02442fb
test(api): add MCP contract tests for all endpoints
spideystreet Mar 10, 2026
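
Commit b47437c mentions escaping ILIKE wildcards, but the route code that does this is not among the files shown here. The following is a minimal sketch of what such escaping could look like, assuming backslash is the escape character (PostgreSQL's default for LIKE/ILIKE). The helper names `escape_ilike` and `contains_pattern` are hypothetical and are not taken from the PR.

```python
def escape_ilike(term: str, escape: str = "\\") -> str:
    """Escape LIKE/ILIKE metacharacters so user input is matched literally."""
    # Escape the escape character first, otherwise the backslashes added
    # for % and _ below would themselves be doubled.
    escaped = term.replace(escape, escape + escape)
    escaped = escaped.replace("%", escape + "%")
    escaped = escaped.replace("_", escape + "_")
    return escaped


def contains_pattern(term: str) -> str:
    """Build a 'contains' pattern intended to be passed as a bound parameter."""
    return f"%{escape_ilike(term)}%"
```

The resulting pattern would still be passed as a query parameter (for example `cur.execute("... WHERE title ILIKE %s", (contains_pattern(q),))`, where the column name is an assumption), so escaping handles wildcard semantics while parameter binding handles SQL injection.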
20 changes: 19 additions & 1 deletion .claude/rules/architecture.md
@@ -17,6 +17,10 @@ GitHub API (Go scraper) [ingestion]
User profiles (public.User) [user_ml]
-> dbt user ML prep + embeddings
-> user recommendations

REST API (FastAPI, read-only) [serving]
-> consumed by ost-mcp (MCP server)
-> surfaces data to AI assistants
```

## Resources (`src/linker/resources/`)
@@ -44,7 +48,7 @@ Both are invoked as subprocesses by Dagster assets via `subprocess.run()`.
2. **Python Builder** (`python:3.11-slim`) — exports deps via uv to `requirements.txt`
3. **Runtime** (`python:3.11-slim`) — installs deps, copies Go binaries to `/usr/local/bin/`, runs Dagster

-`docker-compose.yml` runs two services: `ost-linker` (app) and `db` (PostgreSQL with pgvector via `ankane/pgvector:v0.4.1`). DB is exposed on port 5433 by default.
+`docker-compose.yml` runs four services: `webserver` (Dagster UI), `daemon` (Dagster daemon), `api` (FastAPI REST API on port 8000), and `db` (PostgreSQL with pgvector, dev only via override). DB is exposed on port 5433 by default.

## Database Schema

@@ -63,6 +67,20 @@ Seed data lives in `prisma/seed/` (categories, domains, techstacks).
- `language_detection.py` — `has_non_latin_chars()`, `parse_fasttext_labels()`, `is_blacklisted()` + constants (`NON_LATIN_LANGS`, `NON_LATIN_CHAR_RE`)
- `serialization.py` — `make_serializable()` (datetime/UUID → string), `clean_llm_json()` (strip markdown fences)

## REST API (`src/services/api/`)

Lightweight, read-only FastAPI service consumed by the [ost-mcp](https://github.com/opensource-together/ost-mcp) MCP server. Runs as a separate Docker container with minimal env (only `DATABASE_URL`, no Dagster/LLM secrets).

- `main.py` — FastAPI app with lifespan (connection pool), rate limit handler
- `config.py` — `APIConfig` (pydantic-settings) reads env vars
- `database.py` — `ConnectionPool` wrapper around psycopg2 `SimpleConnectionPool`
- `dependencies.py` — FastAPI dependency injection (`get_pool`)
- `schemas.py` — Pydantic v2 response models
- `rate_limit.py` — slowapi `Limiter` instance (60 req/min/IP)
- `routes/` — `health.py`, `projects.py`, `recommendations.py`, `references.py`

**Endpoints:** `/health`, `/projects/search`, `/projects/{id}`, `/projects/{id}/similar`, `/recommendations/trending`, `/categories`, `/domains`, `/techstacks`

## Python Services (`src/services/python/`)

- `db.py` — shared DB cursor context manager (`get_db_cursor`) used by assets
6 changes: 6 additions & 0 deletions .env.example
@@ -51,6 +51,12 @@ FASTTEXT_MODEL_PATH="models/lid.176.ftz"
# OpenRouter API key — used by LLMClassifierResource (Mistral Small).
OPENROUTER_API_KEY="<your_openrouter_api_key>"

# --- API ---
# FastAPI REST API configuration (read-only, for MCP server consumption).
API_HOST=0.0.0.0
API_PORT=8000
API_RATE_LIMIT=60

# --- dbt ---
# Target profile: "local" (port 5433, default) or "docker" (port 5432, host "db").
# Set to "docker" when running inside a container.
10 changes: 10 additions & 0 deletions CLAUDE.md
@@ -26,6 +26,13 @@ uv sync # Install Python dependencies
dagster dev -h 0.0.0.0 -p 3000 # Run Dagster locally (outside Docker)
```

### REST API (FastAPI)
```bash
uvicorn src.services.api.main:app --host 0.0.0.0 --port 8000 # Run API locally
pytest -m api # Run API tests only
```
The API is a lightweight, read-only service consumed by the [ost-mcp](https://github.com/opensource-together/ost-mcp) MCP server. It exposes project search, similarity, trending recommendations, and reference data.

### dbt
```bash
cd dbt && dbt deps # Install dbt packages
@@ -84,6 +91,9 @@ scripts/clean_docker_images.sh # Docker image cleanup
| `DBT_TARGET` | dbt target profile (`local` by default, `docker` in container) |
| `DBT_PROJECT_DIR` | dbt project directory (default: `<repo>/dbt`, set to `/app/dbt` in Docker) |
| `DAGSTER_HOME` | Dagster metadata directory (default: `./dagster_home`) |
| `API_HOST` | API listen host (default: `0.0.0.0`) |
| `API_PORT` | API listen port (default: `8000`) |
| `API_RATE_LIMIT` | Requests per minute per IP (default: `60`) |

## Bug Fixing

2 changes: 1 addition & 1 deletion Dockerfile
@@ -96,7 +96,7 @@ RUN groupadd -g 1000 appuser \
USER appuser

# Expose Dagster webserver port
-EXPOSE 3000
+EXPOSE 3000 8000

# Healthcheck for Dagster webserver
HEALTHCHECK --interval=30s --timeout=5s --start-period=120s --retries=3 \
9 changes: 9 additions & 0 deletions docker-compose.override.yml
@@ -26,6 +26,15 @@ services:
      - ./dbt:/app/dbt
      - ./scripts:/app/scripts

  api:
    environment:
      DATABASE_URL: postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
    volumes:
      - ./src:/app/src
    depends_on:
      db:
        condition: service_healthy

# ============================================================================
# DATABASE (Postgres + PGVector) — dev only
# ============================================================================
24 changes: 24 additions & 0 deletions docker-compose.yml
@@ -52,5 +52,29 @@ services:
        condition: service_healthy
    command: ["./scripts/init.sh", "dagster-daemon", "run", "-w", "/app/workspace.yaml"]

# ============================================================================
# REST API (FastAPI — lightweight, read-only)
# Minimal env: only DATABASE_URL, no Dagster/GitHub/LLM secrets
# ============================================================================
  api:
    build: .
    container_name: ost-linker-api
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: ${DATABASE_URL}
      API_HOST: ${API_HOST:-0.0.0.0}
      API_PORT: ${API_PORT:-8000}
      API_RATE_LIMIT: ${API_RATE_LIMIT:-60}
      DAGSTER_ROLE: api
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    command: ["./scripts/init.sh", "uvicorn", "src.services.api.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]

volumes:
  dagster_data:
2 changes: 1 addition & 1 deletion docs
Submodule docs updated from 79796e to f4f3bf
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -35,6 +35,9 @@ dependencies = [
    "dagster-dbt>=0.28.17,<0.29",
    "dbt-core>=1.8.0,<2",
    "dbt-postgres>=1.8.0,<2",
    "fastapi>=0.115.0,<1",
    "uvicorn[standard]>=0.34.0,<1",
    "slowapi>=0.1.9,<0.2",
]

[project.urls]
@@ -95,6 +98,7 @@ select = [

[tool.ruff.lint.per-file-ignores]
"src/linker/definitions.py" = ["E402"]
"src/services/api/routes/*.py" = ["B008"]

[tool.ruff.lint.isort]
known-first-party = ["src"]
7 changes: 7 additions & 0 deletions scripts/init.sh
@@ -10,6 +10,13 @@ if [ "$DAGSTER_ROLE" = "daemon" ]; then
    exec "$@"
fi

# API skips dbt init — only needs DB
if [ "$DAGSTER_ROLE" = "api" ]; then
    echo "API role: skipping dbt init."
    echo "Executing command: $@"
    exec "$@"
fi

# Wait for Postgres
echo "Waiting for Postgres to be ready..."
# Use Python to check connection using standard environment variables.
Empty file added src/services/api/__init__.py
Empty file.
13 changes: 13 additions & 0 deletions src/services/api/config.py
@@ -0,0 +1,13 @@
from pydantic import Field
from pydantic_settings import BaseSettings


class APIConfig(BaseSettings):
    """API configuration loaded from environment variables."""

    database_url: str = Field(alias="DATABASE_URL")
    host: str = Field(default="0.0.0.0", alias="API_HOST")
    port: int = Field(default=8000, alias="API_PORT")
    rate_limit: int = Field(default=60, alias="API_RATE_LIMIT")

    model_config = {"populate_by_name": True}
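
To make the mapping in `config.py` concrete, here is a standard-library approximation of what `APIConfig` reads: one required variable and three optional ones with the defaults listed in `.env.example`. This is only a sketch. The names `PlainAPIConfig` and `load_config` are hypothetical, and pydantic-settings performs the lookup and type coercion itself rather than through code like this.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainAPIConfig:
    """Hypothetical stdlib counterpart of APIConfig, for illustration only."""

    database_url: str
    host: str = "0.0.0.0"
    port: int = 8000
    rate_limit: int = 60


def load_config(env: Mapping[str, str] = os.environ) -> PlainAPIConfig:
    # DATABASE_URL has no default, so a missing value fails fast at startup.
    if "DATABASE_URL" not in env:
        raise KeyError("DATABASE_URL is required")
    return PlainAPIConfig(
        database_url=env["DATABASE_URL"],
        host=env.get("API_HOST", "0.0.0.0"),
        # Environment values are strings, so numeric fields are converted.
        port=int(env.get("API_PORT", "8000")),
        rate_limit=int(env.get("API_RATE_LIMIT", "60")),
    )
```

With only `DATABASE_URL` set, the other three fields fall back to their defaults, which matches the "minimal env" setup described for the `api` container.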
31 changes: 31 additions & 0 deletions src/services/api/database.py
@@ -0,0 +1,31 @@
from collections.abc import Generator
from contextlib import contextmanager
from typing import Any

from psycopg2.extras import RealDictCursor
from psycopg2.pool import SimpleConnectionPool


class ConnectionPool:
    """Thin wrapper around psycopg2 SimpleConnectionPool."""

    def __init__(self, database_url: str, minconn: int = 1, maxconn: int = 5) -> None:
        self._pool = SimpleConnectionPool(minconn, maxconn, database_url)

    @contextmanager
    def get_cursor(self) -> Generator[Any, None, None]:
        """Yield a RealDictCursor, rollback on exit, return conn to pool."""
        conn = self._pool.getconn()
        try:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                yield cur
            conn.rollback()
        except Exception:
            conn.rollback()
            raise
        finally:
            self._pool.putconn(conn)

    def close(self) -> None:
        """Close all pooled connections."""
        self._pool.closeall()
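
The key property of `get_cursor` is that the connection goes back to the pool whether the query succeeds or raises. The sketch below isolates that borrow/return pattern using a hypothetical in-memory `FakePool` in place of psycopg2, so it runs without a database. `FakePool` and `borrowed` are illustrative names and are not part of the PR.

```python
from collections.abc import Generator
from contextlib import contextmanager


class FakePool:
    """Hypothetical stand-in for a connection pool; counts checked-out connections."""

    def __init__(self) -> None:
        self.checked_out = 0

    def getconn(self) -> object:
        self.checked_out += 1
        return object()

    def putconn(self, conn: object) -> None:
        self.checked_out -= 1


@contextmanager
def borrowed(pool: FakePool) -> Generator[object, None, None]:
    conn = pool.getconn()
    try:
        yield conn
    finally:
        # Runs on success and on error, so the pool never leaks a connection.
        pool.putconn(conn)


pool = FakePool()

# Success path: the connection is returned after the block.
with borrowed(pool):
    pass

# Failure path: the exception propagates, but the connection is still returned.
try:
    with borrowed(pool):
        raise ValueError("query failed")
except ValueError:
    pass
```

With `maxconn=5` in the real pool, a leaked connection on every failed request would exhaust the pool after five errors, which is why the `finally` branch matters.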
22 changes: 22 additions & 0 deletions src/services/api/dependencies.py
@@ -0,0 +1,22 @@
from src.services.api.database import ConnectionPool

_pool: ConnectionPool | None = None


def init_pool(database_url: str) -> None:
    """Initialize the global connection pool."""
    global _pool
    _pool = ConnectionPool(database_url, minconn=1, maxconn=5)


def close_pool() -> None:
    """Close the global connection pool."""
    if _pool:
        _pool.close()


def get_pool() -> ConnectionPool:
    """FastAPI dependency: returns the connection pool."""
    if _pool is None:
        raise RuntimeError("Connection pool not initialized")
    return _pool
53 changes: 53 additions & 0 deletions src/services/api/main.py
@@ -0,0 +1,53 @@
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse
from slowapi.errors import RateLimitExceeded

from src.services.api.config import APIConfig
from src.services.api.dependencies import close_pool, init_pool
from src.services.api.rate_limit import limiter
from src.services.api.routes import health, projects, recommendations, references


def _get_config() -> APIConfig:
    return APIConfig()  # type: ignore[call-arg]


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Startup: init pool. Shutdown: close pool."""
    config = _get_config()
    init_pool(config.database_url)
    yield
    close_pool()


def _rate_limit_handler(request: Request, exc: RateLimitExceeded) -> Response:
    return JSONResponse(
        status_code=429,
        content={"detail": "Rate limit exceeded"},
    )


app = FastAPI(
    title="OST Linker API",
    description="Open-source project recommendations",
    version="1.0.0",
    lifespan=lifespan,
)

# Rate limiting via @limiter.limit() decorators on routes.
# slowapi's SlowAPIMiddleware has compatibility issues with sync endpoints,
# so we use the per-route decorator approach instead.
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_handler)  # type: ignore[arg-type]

# NOTE: No CORS middleware — this API is consumed server-to-server by the MCP
# backend, not by browsers. Add CORSMiddleware if browser access is needed later.

app.include_router(health.router)
app.include_router(references.router)
app.include_router(projects.router)
app.include_router(recommendations.router)
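
The `lifespan` function above relies on `asynccontextmanager`: code before `yield` runs at startup, code after it runs at shutdown. The sketch below shows that ordering with the standard library only, and wraps the `yield` in `try`/`finally` as one possible variation so that cleanup also runs if an exception is thrown into the generator. The names `lifespan_sketch`, `events` and `run` are hypothetical, and the PR's own `lifespan` does not use `try`/`finally`.

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def lifespan_sketch() -> AsyncGenerator[None, None]:
    events.append("init_pool")  # startup work happens before the yield
    try:
        yield  # the application would serve requests here
    finally:
        events.append("close_pool")  # shutdown work, also reached on errors


async def run() -> None:
    async with lifespan_sketch():
        events.append("serving")


asyncio.run(run())
```

After this runs, `events` records the pool being initialized, the serving phase, and the pool being closed, in that order.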
4 changes: 4 additions & 0 deletions src/services/api/rate_limit.py
@@ -0,0 +1,4 @@
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
Empty file.
14 changes: 14 additions & 0 deletions src/services/api/routes/health.py
@@ -0,0 +1,14 @@
from fastapi import APIRouter, Depends

from src.services.api.database import ConnectionPool
from src.services.api.dependencies import get_pool

router = APIRouter()


@router.get("/health")
def health(pool: ConnectionPool = Depends(get_pool)) -> dict[str, str]:
    """Health check endpoint -- verifies DB connectivity."""
    with pool.get_cursor() as cur:
        cur.execute("SELECT 1")
    return {"status": "ok"}