diff --git a/README.md b/README.md
index 05abac8..9206ed3 100644
--- a/README.md
+++ b/README.md
@@ -206,6 +206,18 @@ curl -X POST localhost:8000/api/v1/timepoints/generate/stream \
 
 ---
 
+## Model Control
+
+Downstream apps can control model selection and generation behavior per-request:
+
+- **`model_policy: "permissive"`** — Routes all generation through open-weight models (DeepSeek, Llama, Qwen, Mistral) via OpenRouter, uses Pollinations for images, and skips Google grounding. Fully Google-free.
+- **`text_model` / `image_model`** — Override preset models with any OpenRouter-compatible model ID (e.g. `qwen/qwen3-235b-a22b`) or a Google native model.
+- **`llm_params`** — Fine-grained control over temperature, max_tokens, top_p, top_k, penalties, stop sequences, and system prompt injection. Applied to all 14 pipeline agents.
+
+All three are composable: `model_policy` + explicit models + `llm_params` work together with clear priority ordering. See [docs/API.md](docs/API.md) for the full parameter reference.
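+For example, one request body can combine all three controls (a sketch; the query is illustrative, and the model ID is the one shown above):
+
+```json
+{
+  "query": "The fall of Constantinople, 1453",
+  "generate_image": true,
+  "model_policy": "permissive",
+  "text_model": "qwen/qwen3-235b-a22b",
+  "llm_params": { "temperature": 0.4, "max_tokens": 2048 }
+}
+```
+
+Here the explicit `text_model` wins for text generation, while `model_policy` still routes images to Pollinations and skips Google grounding.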
+
+---
+
 ## API
 
 | Endpoint | Description |
@@ -279,6 +291,7 @@ python3.10 -m pytest tests/ -v  # 522 fast + integration, 11 skipped
 - [iOS Integration](docs/IOS_INTEGRATION.md) — Auth flow, credit system, endpoint map for iOS client
 - [Agent Architecture](docs/AGENTS.md) — Pipeline breakdown with example output
 - [Temporal Navigation](docs/TEMPORAL.md) — Time travel mechanics
+- [Downstream Model Control](docs/DOWNSTREAM_MODEL_CONTROL.md) — Model policy, LLM params, and per-request control for downstream apps
 - [Eval Roadmap](docs/EVAL_ROADMAP.md) — Quality scoring and benchmark plans
 - [Deployment](docs/DEPLOY.md) — Local, Replit, and production deployment
diff --git a/docs/API.md b/docs/API.md
index b36f73a..63870f7 100644
--- a/docs/API.md
+++ b/docs/API.md
@@ -62,6 +62,77 @@ Override preset models for custom configurations:
 }
 ```
 
+**Permissive Mode (Google-Free):**
+
+Use only open-weight, distillable models — zero Google API calls:
+```json
+{
+  "query": "The signing of the Magna Carta, 1215",
+  "generate_image": true,
+  "model_policy": "permissive"
+}
+```
+Text routes to DeepSeek/Llama/Qwen via OpenRouter, images route to Pollinations, and Google grounding is skipped. Response metadata reflects the actual models used:
+```json
+{
+  "text_model_used": "deepseek/deepseek-r1-0528",
+  "image_model_used": "pollinations",
+  "model_provider": "openrouter",
+  "model_permissiveness": "permissive"
+}
+```
+
+**Composing model_policy with explicit models:**
+
+`model_policy` and explicit model names are composable — explicit models take priority:
+```json
+{
+  "query": "Apollo 11 Moon Landing, 1969",
+  "model_policy": "permissive",
+  "text_model": "qwen/qwen3-235b-a22b",
+  "generate_image": true
+}
+```
+This uses the specified Qwen model for text, Pollinations for images (from the permissive policy), and skips Google grounding.
+
+---
+
+## LLM Parameters
+
+The `llm_params` object gives downstream callers fine-grained control over generation hyperparameters.
+All fields are optional — unset fields use agent/preset defaults. These parameters are applied to every agent in the 14-step pipeline.
+
+```json
+{
+  "query": "Turing breaks Enigma, 1941",
+  "text_model": "deepseek/deepseek-r1-0528",
+  "llm_params": {
+    "temperature": 0.5,
+    "max_tokens": 4096,
+    "top_p": 0.9,
+    "system_prompt_suffix": "Keep all descriptions under 200 words. Use British English."
+  }
+}
+```
+
+| Parameter | Type | Range | Providers | Description |
+|-----------|------|-------|-----------|-------------|
+| `temperature` | float | 0.0–2.0 | All | Sampling temperature. Overrides per-agent defaults (which range from 0.2 for factual agents to 0.85 for creative agents). |
+| `max_tokens` | int | 1–32768 | All | Maximum output tokens per agent call. Preset defaults: hyper=1024, balanced=2048, hd=8192. |
+| `top_p` | float | 0.0–1.0 | All | Nucleus sampling — only consider tokens whose cumulative probability is <= `top_p`. |
+| `top_k` | int | >= 1 | All | Top-k sampling — only consider the k most likely tokens at each step. |
+| `frequency_penalty` | float | -2.0 to 2.0 | OpenRouter | Penalize tokens proportionally to how often they have appeared in the output. |
+| `presence_penalty` | float | -2.0 to 2.0 | OpenRouter | Penalize tokens that have appeared at all in the output so far. |
+| `repetition_penalty` | float | 0.0–2.0 | OpenRouter | Multiplicative penalty for repeated tokens. |
+| `stop` | string[] | max 4 | All | Stop sequences — generation halts when any of these strings is produced. |
+| `thinking_level` | string | — | Google | Reasoning depth for thinking models: `"none"`, `"low"`, `"medium"`, `"high"`. |
+| `system_prompt_prefix` | string | max 2000 | All | Text prepended to every agent's system prompt. Use for tone, persona, or style injection. |
+| `system_prompt_suffix` | string | max 2000 | All | Text appended to every agent's system prompt. Use for constraints, formatting rules, or output instructions. |
+
+**Notes:**
+- Parameters marked "OpenRouter" are silently ignored when the request routes to Google; likewise, `thinking_level` is ignored when the request routes to OpenRouter.
+- `system_prompt_prefix` and `system_prompt_suffix` affect all 14 pipeline agents. Use these to inject cross-cutting concerns (e.g. language, tone, verbosity constraints).
+- Request-level `llm_params` override per-agent defaults. For example, if `llm_params.temperature` is set, it overrides the judge agent's default of 0.3, the scene agent's default of 0.7, etc.
+
 ---
 
 ## Endpoints Overview
@@ -120,12 +191,20 @@ Generate a scene with real-time progress updates via Server-Sent Events.
 | query | string | Yes | Historical moment (3-500 chars) |
 | generate_image | boolean | No | Generate AI image (default: false) |
 | preset | string | No | Quality preset: `hd`, `hyper`, `balanced` (default), `gemini3` |
-| text_model | string | No | Override text model (ignores preset) |
-| image_model | string | No | Override image model (ignores preset) |
+| text_model | string | No | Text model ID — OpenRouter format (`org/model`) or Google native (`gemini-*`). Overrides preset. |
+| image_model | string | No | Image model ID — `pollinations` for free open-source, or Google native. Overrides preset. |
+| model_policy | string | No | `"permissive"` — selects only open-weight models (Llama, DeepSeek, Qwen) and skips Google-dependent steps. Fully Google-free. Works alongside explicit model overrides. |
+| llm_params | object | No | Fine-grained LLM parameters applied to all pipeline agents. See **LLM Parameters** above. |
 | visibility | string | No | `public` (default) or `private` — controls who can see full data |
 | callback_url | string | No | URL to POST results to when generation completes (async endpoint only) |
 | request_context | object | No | Opaque context passed through to response (e.g. `{"source": "clockchain", "job_id": "..."}`) |
 
+**Model selection priority** (highest first):
+
+1. Explicit `text_model` / `image_model` — use exactly these models
+2. `model_policy: "permissive"` — auto-select open-weight models, skip Google grounding
+3. `preset` — use the preset's default models
+4. Server defaults
+
 **Response:** SSE stream with events:
 ```
@@ -1036,4 +1115,4 @@ Rate limit: 60 requests/minute per IP.
 
 ---
 
-*Last updated: 2026-02-23*
+*Last updated: 2026-03-11*
diff --git a/docs/DOWNSTREAM_MODEL_CONTROL.md b/docs/DOWNSTREAM_MODEL_CONTROL.md
new file mode 100644
index 0000000..73147db
--- /dev/null
+++ b/docs/DOWNSTREAM_MODEL_CONTROL.md
@@ -0,0 +1,82 @@
+# Downstream Model Control — TIMEPOINT Flash
+
+**For teams building on TIMEPOINT Flash (Web App, iPhone App, Clockchain, Billing, Enterprise integrations)**
+
+TIMEPOINT Flash now supports full downstream control of model selection and generation hyperparameters on every generation request. Downstream apps can set `model_policy: "permissive"` to route all 14 pipeline agents through open-weight models (DeepSeek R1, Llama, Qwen, Mistral) via OpenRouter, with Pollinations for images — making the entire pipeline fully Google-free, with zero Google API calls, including grounding.
+
+Apps can also specify exact models by name using `text_model` and `image_model` — any OpenRouter-compatible model ID like `qwen/qwen3-235b-a22b`, or a Google native model like `gemini-2.5-flash`. These explicit overrides take priority over `model_policy`, which in turn takes priority over `preset`.
+
+In addition, the new `llm_params` object provides fine-grained control over generation hyperparameters — temperature, max_tokens, top_p, top_k, frequency/presence/repetition penalties, stop sequences, thinking level, and system prompt injection (prefix/suffix) — all applied uniformly across every agent in the pipeline. Request-level `llm_params` override each agent's built-in defaults, so setting `temperature: 0.3` overrides the scene agent's default of 0.7, the dialog agent's default of 0.85, etc.
+All of these controls are composable: you can combine `model_policy`, explicit models, `preset`, and `llm_params` in the same request.
+
+## Request Parameters
+
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `query` | string | Yes | Historical moment description (3-500 chars) |
+| `generate_image` | boolean | No | Generate AI image (default: false) |
+| `preset` | string | No | Quality preset: `hyper`, `balanced` (default), `hd`, `gemini3` |
+| `text_model` | string | No | Text model ID — OpenRouter format (`org/model`) or Google native (`gemini-*`). Overrides preset. |
+| `image_model` | string | No | Image model ID — `pollinations` for free, or Google native. Overrides preset. |
+| `model_policy` | string | No | `"permissive"` for open-weight-only, Google-free generation. |
+| `llm_params` | object | No | Fine-grained LLM hyperparameters (see table below). |
+| `visibility` | string | No | `public` (default) or `private` |
+| `callback_url` | string | No | URL to POST results when generation completes (async only) |
+| `request_context` | object | No | Opaque context passed through to response |
+
+## LLM Parameters (`llm_params`)
+
+| Parameter | Type | Range | Providers | Description |
+|-----------|------|-------|-----------|-------------|
+| `temperature` | float | 0.0–2.0 | All | Sampling temperature. Overrides per-agent defaults (0.2 for factual, 0.85 for creative). |
+| `max_tokens` | int | 1–32768 | All | Max output tokens per agent call. Preset defaults: hyper=1024, balanced=2048, hd=8192. |
+| `top_p` | float | 0.0–1.0 | All | Nucleus sampling threshold. |
+| `top_k` | int | >= 1 | All | Top-k sampling — consider only the k most likely tokens. |
+| `frequency_penalty` | float | -2.0 to 2.0 | OpenRouter | Penalize tokens proportionally to frequency in output. |
+| `presence_penalty` | float | -2.0 to 2.0 | OpenRouter | Penalize tokens that have appeared at all in output. |
+| `repetition_penalty` | float | 0.0–2.0 | OpenRouter | Multiplicative penalty for repeated tokens. |
+| `stop` | string[] | max 4 | All | Stop sequences — generation halts when produced. |
+| `thinking_level` | string | — | Google | Reasoning depth: `"none"`, `"low"`, `"medium"`, `"high"`. |
+| `system_prompt_prefix` | string | max 2000 | All | Text prepended to every agent's system prompt. |
+| `system_prompt_suffix` | string | max 2000 | All | Text appended to every agent's system prompt. |
+
+## Model Selection Priority (highest first)
+
+1. Explicit `text_model` / `image_model`
+2. `model_policy: "permissive"` (auto-selects open-weight models, skips Google grounding)
+3. `preset` (uses the preset's default models)
+4. Server defaults
+
+## Examples
+
+**Google-free generation:**
+```json
+{
+  "query": "The signing of the Magna Carta, 1215",
+  "generate_image": true,
+  "model_policy": "permissive"
+}
+```
+
+**Specific model with custom params:**
+```json
+{
+  "query": "Turing breaks Enigma, 1941",
+  "text_model": "deepseek/deepseek-r1-0528",
+  "llm_params": {
+    "temperature": 0.5,
+    "max_tokens": 4096,
+    "top_p": 0.9,
+    "system_prompt_suffix": "Keep all descriptions under 200 words. Use British English."
+  }
+}
+```
+
+**Permissive mode with explicit model override:**
+```json
+{
+  "query": "Apollo 11 Moon Landing, 1969",
+  "model_policy": "permissive",
+  "text_model": "qwen/qwen3-235b-a22b",
+  "generate_image": true
+}
+```
+
+---
+
+*Last updated: 2026-03-11*