Merged
13 changes: 13 additions & 0 deletions README.md
@@ -206,6 +206,18 @@ curl -X POST localhost:8000/api/v1/timepoints/generate/stream \

---

## Model Control

Downstream apps can control model selection and generation behavior per-request:

- **`model_policy: "permissive"`** — Routes all generation through open-weight models (DeepSeek, Llama, Qwen, Mistral) via OpenRouter, uses Pollinations for images, and skips Google grounding. Fully Google-free.
- **`text_model` / `image_model`** — Override preset models with any OpenRouter-compatible model ID (e.g. `qwen/qwen3-235b-a22b`) or Google native model.
- **`llm_params`** — Fine-grained control over temperature, max_tokens, top_p, top_k, penalties, stop sequences, and system prompt injection. Applied to all 14 pipeline agents.

All three are composable: `model_policy` + explicit models + `llm_params` work together with clear priority ordering. See [docs/API.md](docs/API.md) for the full parameter reference.
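
As a sketch, a single request body combining all three controls might look like the following (field names follow the docs; the specific values are illustrative, not recommendations):

```python
import json

# Illustrative request body: permissive policy + explicit text model + llm_params.
# The explicit text_model takes priority over the policy's auto-selection.
payload = {
    "query": "Apollo 11 Moon Landing, 1969",
    "model_policy": "permissive",            # Google-free routing for everything else
    "text_model": "qwen/qwen3-235b-a22b",    # explicit override wins over the policy
    "llm_params": {"temperature": 0.5, "max_tokens": 4096},
    "generate_image": True,
}
body = json.dumps(payload)
```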

---

## API

| Endpoint | Description |
@@ -279,6 +291,7 @@ python3.10 -m pytest tests/ -v # 522 fast + integration, 11 skipped
- [iOS Integration](docs/IOS_INTEGRATION.md) — Auth flow, credit system, endpoint map for iOS client
- [Agent Architecture](docs/AGENTS.md) — Pipeline breakdown with example output
- [Temporal Navigation](docs/TEMPORAL.md) — Time travel mechanics
- [Downstream Model Control](docs/DOWNSTREAM_MODEL_CONTROL.md) — Model policy, LLM params, and per-request control for downstream apps
- [Eval Roadmap](docs/EVAL_ROADMAP.md) — Quality scoring and benchmark plans
- [Deployment](docs/DEPLOY.md) — Local, Replit, and production deployment

85 changes: 82 additions & 3 deletions docs/API.md
@@ -62,6 +62,77 @@ Override preset models for custom configurations:
}
```

**Permissive Mode (Google-Free):**

Use only open-weight, distillable models — zero Google API calls:
```json
{
"query": "The signing of the Magna Carta, 1215",
"generate_image": true,
"model_policy": "permissive"
}
```
Text routes to DeepSeek/Llama/Qwen via OpenRouter, images route to Pollinations, and Google grounding is skipped. Response metadata reflects the actual models used:
```json
{
"text_model_used": "deepseek/deepseek-r1-0528",
"image_model_used": "pollinations",
"model_provider": "openrouter",
"model_permissiveness": "permissive"
}
```

**Composing model_policy with explicit models:**

`model_policy` and explicit model names are composable — explicit models take priority:
```json
{
"query": "Apollo 11 Moon Landing, 1969",
"model_policy": "permissive",
"text_model": "qwen/qwen3-235b-a22b",
"generate_image": true
}
```
This uses the specified Qwen model for text, Pollinations for images (from permissive policy), and skips Google grounding.

---

## LLM Parameters

The `llm_params` object gives downstream callers fine-grained control over generation hyperparameters. All fields are optional — unset fields use agent/preset defaults. These parameters are applied to every agent in the 14-step pipeline.

```json
{
"query": "Turing breaks Enigma, 1941",
"text_model": "deepseek/deepseek-r1-0528",
"llm_params": {
"temperature": 0.5,
"max_tokens": 4096,
"top_p": 0.9,
"system_prompt_suffix": "Keep all descriptions under 200 words. Use British English."
}
}
```

| Parameter | Type | Range | Providers | Description |
|-----------|------|-------|-----------|-------------|
| `temperature` | float | 0.0–2.0 | All | Sampling temperature. Overrides per-agent defaults (which range from 0.2 for factual agents to 0.85 for creative agents). |
| `max_tokens` | int | 1–32768 | All | Maximum output tokens per agent call. Preset defaults: hyper=1024, balanced=2048, hd=8192. |
| `top_p` | float | 0.0–1.0 | All | Nucleus sampling — only consider tokens whose cumulative probability is <= top_p. |
| `top_k` | int | >= 1 | All | Top-k sampling — only consider the k most likely tokens at each step. |
| `frequency_penalty` | float | -2.0–2.0 | OpenRouter | Penalize tokens proportionally to how often they've appeared in the output. |
| `presence_penalty` | float | -2.0–2.0 | OpenRouter | Penalize tokens that have appeared at all in the output so far. |
| `repetition_penalty` | float | 0.0–2.0 | OpenRouter | Multiplicative penalty for repeated tokens. |
| `stop` | string[] | max 4 | All | Stop sequences — generation halts when any of these strings is produced. |
| `thinking_level` | string | — | Google | Reasoning depth for thinking models: `"none"`, `"low"`, `"medium"`, `"high"`. |
| `system_prompt_prefix` | string | max 2000 | All | Text prepended to every agent's system prompt. Use for tone, persona, or style injection. |
| `system_prompt_suffix` | string | max 2000 | All | Text appended to every agent's system prompt. Use for constraints, formatting rules, or output instructions. |

**Notes:**
- Parameters marked "OpenRouter" are silently ignored when the request routes to Google; likewise, `thinking_level` is silently ignored when the request routes to OpenRouter.
- `system_prompt_prefix` and `system_prompt_suffix` affect all 14 pipeline agents. Use these to inject cross-cutting concerns (e.g., language, tone, verbosity constraints).
- Request-level `llm_params` override per-agent defaults. For example, if `llm_params.temperature` is set, it overrides the judge agent's default of 0.3, the scene agent's default of 0.7, etc.

---

## Endpoints Overview
@@ -120,12 +191,20 @@ Generate a scene with real-time progress updates via Server-Sent Events.
| query | string | Yes | Historical moment (3-500 chars) |
| generate_image | boolean | No | Generate AI image (default: false) |
| preset | string | No | Quality preset: `hd`, `hyper`, `balanced` (default), `gemini3` |
| text_model | string | No | Override text model (ignores preset) |
| image_model | string | No | Override image model (ignores preset) |
| text_model | string | No | Text model ID — OpenRouter format (`org/model`) or Google native (`gemini-*`). Overrides preset. |
| image_model | string | No | Image model ID — `pollinations` for free open-source, or Google native. Overrides preset. |
| model_policy | string | No | `"permissive"` — selects only open-weight models (Llama, DeepSeek, Qwen) and skips Google-dependent steps. Fully Google-free. Works alongside explicit model overrides. |
| llm_params | object | No | Fine-grained LLM parameters applied to all pipeline agents. See **LLM Parameters** below. |
| visibility | string | No | `public` (default) or `private` — controls who can see full data |
| callback_url | string | No | URL to POST results to when generation completes (async endpoint only) |
| request_context | object | No | Opaque context passed through to response (e.g. `{"source": "clockchain", "job_id": "..."}`) |

**Model selection priority** (highest first):
1. Explicit `text_model` / `image_model` — use exactly these models
2. `model_policy: "permissive"` — auto-select open-weight models, skip Google grounding
3. `preset` — use preset's default models
4. Server defaults
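
A minimal sketch of that resolution order, assuming hypothetical preset and default tables (the model IDs are taken from examples elsewhere in this doc, not from the server's actual configuration):

```python
# Hypothetical lookup tables for illustration only.
PRESET_TEXT_MODELS = {"balanced": "gemini-2.5-flash", "hd": "gemini-2.5-pro"}
PERMISSIVE_TEXT_MODEL = "deepseek/deepseek-r1-0528"
SERVER_DEFAULT_TEXT_MODEL = "gemini-2.5-flash"

def resolve_text_model(text_model=None, model_policy=None, preset=None):
    # 1. Explicit model wins outright.
    if text_model:
        return text_model
    # 2. Permissive policy auto-selects an open-weight model.
    if model_policy == "permissive":
        return PERMISSIVE_TEXT_MODEL
    # 3. Preset default, if a known preset was named.
    if preset in PRESET_TEXT_MODELS:
        return PRESET_TEXT_MODELS[preset]
    # 4. Server default.
    return SERVER_DEFAULT_TEXT_MODEL
```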

**Response:** SSE stream with events:

```
@@ -1036,4 +1115,4 @@ Rate limit: 60 requests/minute per IP.

---

*Last updated: 2026-02-23*
*Last updated: 2026-03-11*
82 changes: 82 additions & 0 deletions docs/DOWNSTREAM_MODEL_CONTROL.md
@@ -0,0 +1,82 @@
# Downstream Model Control — TIMEPOINT Flash

**For teams building on TIMEPOINT Flash (Web App, iPhone App, Clockchain, Billing, Enterprise integrations)**

TIMEPOINT Flash now supports full downstream control of model selection and generation hyperparameters on every generation request.

Downstream apps can set `model_policy: "permissive"` to route all 14 pipeline agents through open-weight models (DeepSeek R1, Llama, Qwen, Mistral) via OpenRouter, with Pollinations for images. This makes the entire pipeline fully Google-free, with zero Google API calls, including grounding.

Apps can also specify exact models by name using `text_model` and `image_model` (any OpenRouter-compatible model ID like `qwen/qwen3-235b-a22b`, or a Google native model like `gemini-2.5-flash`). These explicit overrides take priority over `model_policy`, which in turn takes priority over `preset`.

In addition, the new `llm_params` object provides fine-grained control over generation hyperparameters (temperature, max_tokens, top_p, top_k, frequency/presence/repetition penalties, stop sequences, thinking level, and system prompt prefix/suffix injection), applied uniformly across every agent in the pipeline. Request-level `llm_params` override each agent's built-in defaults, so setting `temperature: 0.3` overrides the scene agent's default of 0.7, the dialog agent's default of 0.85, and so on.

All of these controls are composable: `model_policy`, explicit models, `preset`, and `llm_params` can be combined in the same request.

## Request Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `query` | string | Yes | Historical moment description (3-500 chars) |
| `generate_image` | boolean | No | Generate AI image (default: false) |
| `preset` | string | No | Quality preset: `hyper`, `balanced` (default), `hd`, `gemini3` |
| `text_model` | string | No | Text model ID — OpenRouter format (`org/model`) or Google native (`gemini-*`). Overrides preset. |
| `image_model` | string | No | Image model ID — `pollinations` for free, or Google native. Overrides preset. |
| `model_policy` | string | No | `"permissive"` for open-weight only, Google-free generation. |
| `llm_params` | object | No | Fine-grained LLM hyperparameters (see table below). |
| `visibility` | string | No | `public` (default) or `private` |
| `callback_url` | string | No | URL to POST results when generation completes (async only) |
| `request_context` | object | No | Opaque context passed through to response |

## LLM Parameters (`llm_params`)

| Parameter | Type | Range | Providers | Description |
|-----------|------|-------|-----------|-------------|
| `temperature` | float | 0.0–2.0 | All | Sampling temperature. Overrides per-agent defaults (0.2 for factual, 0.85 for creative). |
| `max_tokens` | int | 1–32768 | All | Max output tokens per agent call. Preset defaults: hyper=1024, balanced=2048, hd=8192. |
| `top_p` | float | 0.0–1.0 | All | Nucleus sampling threshold. |
| `top_k` | int | >= 1 | All | Top-k sampling — consider only the k most likely tokens. |
| `frequency_penalty` | float | -2.0–2.0 | OpenRouter | Penalize tokens proportionally to frequency in output. |
| `presence_penalty` | float | -2.0–2.0 | OpenRouter | Penalize tokens that have appeared at all in output. |
| `repetition_penalty` | float | 0.0–2.0 | OpenRouter | Multiplicative penalty for repeated tokens. |
| `stop` | string[] | max 4 | All | Stop sequences — generation halts when produced. |
| `thinking_level` | string | — | Google | Reasoning depth: `"none"`, `"low"`, `"medium"`, `"high"`. |
| `system_prompt_prefix` | string | max 2000 | All | Text prepended to every agent's system prompt. |
| `system_prompt_suffix` | string | max 2000 | All | Text appended to every agent's system prompt. |
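
The ranges in the table above can be checked client-side before sending a request. A minimal validation sketch (the function name and error strings are ours, not part of the API):

```python
def validate_llm_params(p: dict) -> list[str]:
    """Return a list of violations against the documented llm_params ranges."""
    errors = []
    if "temperature" in p and not 0.0 <= p["temperature"] <= 2.0:
        errors.append("temperature must be in 0.0-2.0")
    if "top_p" in p and not 0.0 <= p["top_p"] <= 1.0:
        errors.append("top_p must be in 0.0-1.0")
    if "max_tokens" in p and not 1 <= p["max_tokens"] <= 32768:
        errors.append("max_tokens must be in 1-32768")
    if "stop" in p and len(p["stop"]) > 4:
        errors.append("stop allows at most 4 sequences")
    for key in ("system_prompt_prefix", "system_prompt_suffix"):
        if key in p and len(p[key]) > 2000:
            errors.append(key + " must be at most 2000 chars")
    if "thinking_level" in p and p["thinking_level"] not in ("none", "low", "medium", "high"):
        errors.append("thinking_level must be one of none/low/medium/high")
    return errors
```

Catching out-of-range values before the request avoids a round trip to the server for a validation error.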

## Model Selection Priority (highest first)

1. Explicit `text_model` / `image_model`
2. `model_policy: "permissive"` (auto-selects open-weight models, skips Google grounding)
3. `preset` (uses preset's default models)
4. Server defaults

## Examples

**Google-free generation:**
```json
{
"query": "The signing of the Magna Carta, 1215",
"generate_image": true,
"model_policy": "permissive"
}
```

**Specific model with custom params:**
```json
{
"query": "Turing breaks Enigma, 1941",
"text_model": "deepseek/deepseek-r1-0528",
"llm_params": {
"temperature": 0.5,
"max_tokens": 4096,
"top_p": 0.9,
"system_prompt_suffix": "Keep all descriptions under 200 words. Use British English."
}
}
```

**Permissive mode with explicit model override:**
```json
{
"query": "Apollo 11 Moon Landing, 1969",
"model_policy": "permissive",
"text_model": "qwen/qwen3-235b-a22b",
"generate_image": true
}
```

---

*Last updated: 2026-03-11*