24 commits
b45f564
docs(adr): add ADR0056 for diagram awareness architecture
m2ux Dec 29, 2025
3a7c7ac
feat(domain): add Visual model and VisualRepository interface
m2ux Dec 29, 2025
8d29334
feat(infra): add LanceDB visual repository implementation
m2ux Dec 29, 2025
2c4ca57
feat(scripts): add migration script for visuals table
m2ux Dec 29, 2025
f616750
feat(visual): add visual extraction pipeline (M2)
m2ux Dec 29, 2025
ef8fcf7
feat(visual): add description generation script (M3)
m2ux Dec 29, 2025
a01ea5e
feat(mcp): add get_visuals tool (M4)
m2ux Dec 29, 2025
906b455
docs: add get_visuals to tool selection guide (M4)
m2ux Dec 29, 2025
d0d1a8c
docs(adr): update ADR status to Accepted
m2ux Dec 29, 2025
93c2b90
test(visual): add test database seeding and verification scripts
m2ux Dec 29, 2025
610d3cd
feat(visual): extract embedded images from PDFs using pdfimages
m2ux Dec 29, 2025
57b2e51
feat(visual): use human-readable folder names for extracted images
m2ux Dec 30, 2025
ce598a2
feat(tools): add catalog_id and title to search outputs, integrate vi…
m2ux Dec 30, 2025
724a923
docs: update tool schemas to reflect catalog_id and visuals integration
m2ux Dec 30, 2025
8971483
test(e2e): add visual search integration tests
m2ux Dec 30, 2025
fd8e7f9
test(e2e): add semantic relevance validation for visual search
m2ux Dec 30, 2025
9f95dc3
chore(config): update default concept model to gemini-3-flash-preview
m2ux Dec 30, 2025
2f7d6a5
fix(visual): suppress noisy parse warnings for empty LLM responses
m2ux Dec 30, 2025
2dd9c2a
feat(visual): embed EXIF metadata in extracted PNG images
m2ux Jan 1, 2026
5e44bde
Merge feat/content-metadata-extraction into feat/diagram-awareness
m2ux Jan 2, 2026
b16d6d6
feat(visuals): add pre-filter pipeline for OCR-scanned documents
m2ux Jan 2, 2026
ce2e192
feat(visuals): add local classification using LayoutParser
m2ux Jan 2, 2026
90ce5ef
fix: skip visual extraction for scanned/OCR documents
m2ux Jan 4, 2026
b9afa01
feat(visual): add EPUB visual extraction support
m2ux Jan 4, 2026
116 changes: 98 additions & 18 deletions docs/api-reference.md
@@ -1,7 +1,7 @@
# Concept-RAG API Reference

**Schema Version:** v7 (December 2025)
**Tools:** 10 MCP tools
**Schema Version:** v8 (December 2025)
**Tools:** 12 MCP tools

This document provides JSON input and output schemas for all MCP tools. For tool selection guidance, decision trees, and usage patterns, see [tool-selection-guide.md](tool-selection-guide.md).

@@ -32,7 +32,8 @@ Search document summaries and metadata to discover relevant documents.
```json
[
{
"source": "string",
"catalog_id": 0,
"title": "string",
"summary": "string",
"score": "string",
"expanded_terms": ["string"]
@@ -42,7 +43,8 @@ Search document summaries and metadata to discover relevant documents.

| Field | Type | Description |
|-------|------|-------------|
| `source` | string | Full file path to document |
| `catalog_id` | number | Document ID for subsequent tool calls |
| `title` | string | Document title |
| `summary` | string | Document summary text |
| `score` | string | Combined hybrid score (0.000-1.000) |
| `expanded_terms` | string[] | Expanded query terms |
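
Because `score` is serialized as a string, a client that wants to rank or threshold results has to convert it first. The following is a minimal sketch, assuming the tool response has already been decoded into a list of dicts shaped like the updated schema above. The helper name `top_documents` and the `min_score` parameter are hypothetical and not part of the API.

```python
def top_documents(results, min_score=0.5):
    """Return (catalog_id, title, score) tuples at or above min_score, best first.

    `results` is assumed to be the decoded catalog_search output:
    a list of dicts with "catalog_id", "title" and a string "score".
    """
    ranked = []
    for item in results:
        score = float(item["score"])  # the schema reports scores as strings
        if score >= min_score:
            ranked.append((item["catalog_id"], item["title"], score))
    # Sort on the numeric score, highest first.
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```

The returned `catalog_id` values are the ones later tool calls expect, so keeping them next to the title avoids a second lookup.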
@@ -88,8 +90,11 @@ Search across all document chunks using hybrid search.
```json
[
{
"catalog_id": 0,
"title": "string",
"text": "string",
"source": "string",
"page_number": 0,
"concepts": ["string"],
"score": "string",
"expanded_terms": ["string"]
}
@@ -98,8 +103,11 @@ Search across all document chunks using hybrid search.

| Field | Type | Description |
|-------|------|-------------|
| `catalog_id` | number | Document ID for subsequent tool calls |
| `title` | string | Document title |
| `text` | string | Chunk content |
| `source` | string | Source document path |
| `page_number` | number | Page number in document |
| `concepts` | string[] | Concept names in chunk |
| `score` | string | Combined hybrid score (0.000-1.000) |
| `expanded_terms` | string[] | Expanded query terms |

@@ -127,25 +135,24 @@ Search within a single known document.
```json
{
"text": "string",
"source": "string"
"catalog_id": 0
}
```

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `text` | string | ✅ | — | Search query |
| `source` | string | ✅ | — | Full file path of document |
| `catalog_id` | number | ✅ | — | Document ID from `catalog_search` |

> **Debug Output:** Enable via `DEBUG_SEARCH=true` environment variable.
> **Note:** First use `catalog_search` to find the document and get its `catalog_id`.
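
The two-step flow in the note (look up the document, then search inside it) can be sketched as below. This assumes `catalog_hits` is the decoded `catalog_search` output; the helper name `build_document_search_args` is hypothetical, and picking the highest-scoring hit is an illustrative choice rather than documented behaviour.

```python
def build_document_search_args(query, catalog_hits):
    """Build the input object for a single-document search.

    `catalog_hits` is assumed to be a list of dicts with "catalog_id"
    and a string "score", as in the catalog_search output schema.
    """
    if not catalog_hits:
        raise ValueError("catalog_search returned no documents")
    # Choose the best hit explicitly instead of relying on result order.
    best = max(catalog_hits, key=lambda hit: float(hit["score"]))
    return {"text": query, "catalog_id": int(best["catalog_id"])}
```

The resulting dict matches the updated input schema, which takes `catalog_id` in place of the old `source` path.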

#### Output Schema

```json
[
{
"text": "string",
"source": "string",
"title": "string",
"text": "string",
"concepts": ["string"],
"concept_ids": [0]
}
@@ -154,13 +161,12 @@ Search within a single known document.

| Field | Type | Description |
|-------|------|-------------|
| `text` | string | Chunk content |
| `source` | string | Source document path |
| `title` | string | Document title |
| `text` | string | Chunk content |
| `concepts` | string[] | Concept names in chunk |
| `concept_ids` | number[] | Concept IDs |

**Limits:** 5 chunks max (fixed limit for single-document search).
**Limits:** Top chunks from the document (fixed limit for single-document search).

---

@@ -175,14 +181,14 @@ Find chunks associated with a concept, organized hierarchically.
```json
{
"concept": "string",
"source_filter": "string"
"title_filter": "string"
}
```

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `concept` | string | ✅ | — | Concept to search for |
| `source_filter` | string | ❌ | — | Filter by source path |
| `title_filter` | string | ❌ | — | Filter by document title |

**Result Filtering:** Returns all matching sources and chunks (no fixed limit).

@@ -195,12 +201,14 @@ Find chunks associated with a concept, organized hierarchically.
"concept": "string",
"concept_id": 0,
"summary": "string",
"image_ids": [0],
"related_concepts": ["string"],
"synonyms": ["string"],
"broader_terms": ["string"],
"narrower_terms": ["string"],
"sources": [
{
"catalog_id": 0,
"title": "string",
"pages": [0],
"match_type": "primary|related",
@@ -209,8 +217,9 @@ Find chunks associated with a concept, organized hierarchically.
],
"chunks": [
{
"text": "string",
"catalog_id": 0,
"title": "string",
"text": "string",
"page": 0,
"concept_density": "string",
"concepts": ["string"]
@@ -220,7 +229,8 @@ Find chunks associated with a concept, organized hierarchically.
"total_documents": 0,
"total_chunks": 0,
"sources_returned": 0,
"chunks_returned": 0
"chunks_returned": 0,
"images_found": 0
},
"score": "string"
}
@@ -231,18 +241,23 @@ Find chunks associated with a concept, organized hierarchically.
| `concept` | string | Matched concept name |
| `concept_id` | number | Concept identifier |
| `summary` | string | Concept summary |
| `image_ids` | number[] | Visual IDs for `get_visuals` |
| `related_concepts` | string[] | Related concepts |
| `synonyms` | string[] | Alternative names |
| `broader_terms` | string[] | More general concepts |
| `narrower_terms` | string[] | More specific concepts |
| `sources[].catalog_id` | number | Document ID |
| `sources[].title` | string | Document title |
| `sources[].pages` | number[] | Page numbers |
| `sources[].match_type` | string | `"primary"` or `"related"` |
| `sources[].via_concept` | string? | Linking concept if related |
| `chunks[].catalog_id` | number | Document ID |
| `chunks[].title` | string | Document title |
| `chunks[].text` | string | Chunk content |
| `chunks[].page` | number | Page number |
| `chunks[].concept_density` | string | Prominence (0.000-1.000) |
| `stats` | object | Search statistics |
| `stats.images_found` | number | Count of associated visuals |
| `score` | string | Combined hybrid score (0.000-1.000) |
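
The new `image_ids` field is meant to be handed to `get_visuals`. A minimal sketch of that hand-off follows, assuming the `concept_search` output has been decoded into a dict. The helper name `visual_ids_for_concept` is hypothetical, and removing duplicate IDs is a defensive assumption, not something the schema promises is needed.

```python
def visual_ids_for_concept(concept_result):
    """Return a get_visuals input object for a concept_search result, or None.

    `concept_result` is assumed to be a dict with an optional "image_ids"
    list of numbers, as in the concept_search output schema.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    ids = list(dict.fromkeys(concept_result.get("image_ids", [])))
    if not ids:
        return None  # no visuals are linked to this concept
    return {"ids": ids}
```

Returning `None` when the list is empty lets the caller skip the second tool call entirely.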

#### Additional Fields with Debug Enabled
@@ -578,6 +593,70 @@ Find concepts in a category's documents.

---

## Visual Content

### get_visuals

Retrieve visual content (diagrams, charts, tables, figures) from documents.

#### Input Schema

```json
{
"ids": [0],
"catalog_id": 0,
"visual_type": "diagram|flowchart|chart|table|figure",
"concept": "string",
"limit": 20
}
```

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `ids` | number[] | ❌ | — | Retrieve specific visuals by ID (from `concept_search` `image_ids`) |
| `catalog_id` | number | ❌ | — | Filter by document ID |
| `visual_type` | string | ❌ | — | Filter by type |
| `concept` | string | ❌ | — | Filter by associated concept |
| `limit` | number | ❌ | `20` | Maximum results |

> **Note:** Use `ids` to fetch visuals returned by `concept_search` `image_ids`. Use `catalog_id` to browse all visuals in a document.
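
Since every `get_visuals` parameter is optional, a client may want to send only the filters it actually set and reject unknown `visual_type` values before calling the tool. The sketch below shows one way to do that. The helper name `get_visuals_args` is hypothetical, and the client-side validation is an assumption; the server may apply its own checks.

```python
VISUAL_TYPES = {"diagram", "flowchart", "chart", "table", "figure"}

def get_visuals_args(ids=None, catalog_id=None, visual_type=None,
                     concept=None, limit=20):
    """Build a get_visuals input object containing only the filters that were set."""
    if visual_type is not None and visual_type not in VISUAL_TYPES:
        raise ValueError(f"unknown visual_type: {visual_type!r}")
    candidate = {
        "ids": ids,
        "catalog_id": catalog_id,
        "visual_type": visual_type,
        "concept": concept,
    }
    # Drop unset filters so the request stays minimal.
    args = {key: value for key, value in candidate.items() if value is not None}
    args["limit"] = limit  # documented default is 20
    return args
```

For example, `get_visuals_args(catalog_id=12, visual_type="chart")` browses the charts of one document, while `get_visuals_args(ids=[3, 8])` fetches specific visuals by ID.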

#### Output Schema

```json
{
"visuals": [
{
"id": 0,
"catalog_id": 0,
"catalog_title": "string",
"visual_type": "string",
"page_number": 0,
"description": "string",
"image_path": "string",
"concepts": ["string"]
}
],
"total_returned": 0,
"filters_applied": {}
}
```

| Field | Type | Description |
|-------|------|-------------|
| `visuals[].id` | number | Visual ID |
| `visuals[].catalog_id` | number | Document ID |
| `visuals[].catalog_title` | string | Document title |
| `visuals[].visual_type` | string | Type: diagram, flowchart, chart, table, figure |
| `visuals[].page_number` | number | Page in document |
| `visuals[].description` | string | Semantic description |
| `visuals[].image_path` | string | Path to image file |
| `visuals[].concepts` | string[] | Associated concept names |
| `total_returned` | number | Count of visuals returned |
| `filters_applied` | object | Applied filter parameters |
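
A client browsing a document's visuals will often want them organised by page. The following is a small sketch, assuming the `get_visuals` response has been decoded into a dict matching the output schema above. The helper name `visuals_by_page` is hypothetical.

```python
from collections import defaultdict

def visuals_by_page(response):
    """Map page_number -> list of image paths, with pages in ascending order."""
    pages = defaultdict(list)
    for visual in response["visuals"]:
        pages[visual["page_number"]].append(visual["image_path"])
    # Convert to a plain dict whose keys are sorted by page number.
    return dict(sorted(pages.items()))
```

Within each page, paths keep the order in which the tool returned them.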

---

## Error Schema

All tools return errors in this format:
@@ -630,3 +709,4 @@ All tools return errors in this format:
| `category_search` | 30-130ms |
| `list_categories` | 10-50ms |
| `list_concepts_in_category` | 30-100ms |
| `get_visuals` | 20-100ms |