A minimal, end-to-end Retrieval-Augmented Generation (RAG) demo built with Next.js. Upload a .txt file, ask a question, and get a grounded answer streamed back — powered by local embeddings and the You.com Express Agent.
Large language models are powerful, but they only know what was in their training data. When you need answers grounded in your documents — not general knowledge — you need RAG.
RAG solves this in three steps:
```
① INDEXING (per request)
   Your Document → [Chunk & Embed] → In-Memory Vector Store

② QUERYING
   Your Question → [Embed] → Question Vector
                                 ↓
                 Compare against Vector Store
                                 ↓
                 Top Chunks (most semantically similar passages)
                                 ↓
③ Your Question + Top Chunks → [You.com Express Agent] → Streamed Answer
```
The uploaded document is split into small, overlapping chunks of text. Each chunk is converted into a vector embedding — an array of numbers (e.g. [0.21, -0.87, 0.54, ...]) that represents the meaning of that text. Similar ideas produce similar numbers, which is what makes semantic search possible.
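"Similar ideas produce similar numbers" is usually measured with cosine similarity, the angle between two embedding vectors. A minimal sketch in TypeScript (illustrative only; the vector store performs the equivalent comparison internally):

```typescript
// Cosine similarity between two embedding vectors.
// Returns 1 for identical directions (same meaning), near 0 for unrelated text.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two "similar ideas" point in nearly the same direction, so the score is close to 1:
console.log(cosineSimilarity([0.21, -0.87, 0.54], [0.25, -0.8, 0.5]));
```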
Unlike a traditional RAG system, this demo holds the index entirely in memory for the duration of the request — no database, no disk writes.
When a user asks a question, that question is run through the same embedding model to produce its own vector. The in-memory vector store compares it against all stored document vectors and returns the closest matches — the chunks most semantically related to the question.
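That comparison step amounts to a brute-force nearest-neighbor search. A sketch of the idea (the `Chunk` shape here is hypothetical; LlamaIndex's vector store does this for you):

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Score every stored chunk against the question vector and keep the best k.
function retrieveTopK(questionVec: number[], store: Chunk[], k: number): Chunk[] {
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return [...store]
    .sort((x, y) => cosine(questionVec, y.embedding) - cosine(questionVec, x.embedding))
    .slice(0, k);
}
```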
This is why "What movies involve a kid and a father?" can surface content about Interstellar even if the word "father" never appears in the text.
The retrieved chunks are inserted into a prompt alongside the original question. The You.com Express Agent reads that context and streams a grounded answer back to the browser — without hallucinating facts it doesn't have.
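Assembling that grounded prompt is plain string formatting. A sketch of one possible shape (the exact instruction wording the demo sends lives in route.ts; this text is illustrative):

```typescript
// Format retrieved chunks into a numbered context block and prepend an
// instruction telling the agent to answer only from that context.
function buildPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    "Answer the question using only the context below.",
    'If the context does not contain the answer, say "I don\'t know."',
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```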
You.com's Express Agent is a fast, capable LLM endpoint designed for agentic and RAG workflows. In this pipeline it handles the generation step — it receives the retrieved context and question, then streams a precise, grounded answer.
Key advantages:
- Streaming by default — responses stream token-by-token, keeping latency low
- No hallucination pressure — explicitly prompted to answer only from provided context
- No mandatory web search — tools are opt-in, so the agent stays focused on your documents
- Simple API — one SDK call with the `express` agent
```
app/
  page.tsx                        # UI: file upload, query input, streamed response
  api/
    query/
      route.ts                    # API route: embed → retrieve → generate pipeline
public/
  classroom-favorite-movies.txt   # Sample document
```
Accepts a POST request with { files, query, apiKey } and runs all three RAG steps server-side.
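A minimal sketch of that handler's core, assuming files arrive as `{ name, text }` objects (an assumption; the real pipeline calls are stubbed as comments):

```typescript
interface QueryBody {
  files: { name: string; text: string }[]; // assumed file shape
  query: string;
  apiKey: string;
}

// Core of the route handler; the exported POST(req) would do
// `return handleQuery(await req.json())`.
function handleQuery(body: QueryBody): Response {
  const { files, query, apiKey } = body;
  if (!query || !apiKey || !files?.length) {
    return new Response("files, query, and apiKey are required", { status: 400 });
  }
  // ① Chunk & embed `files` into an in-memory VectorStoreIndex
  // ② Embed `query` and retrieve the top chunks
  // ③ Prompt the Express Agent and stream its answer back
  return new Response(`ok: ${files.length} file(s), query "${query}"`);
}
```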
Embedding model: BAAI/bge-small-en-v1.5 runs entirely on-device via @llamaindex/huggingface — no data leaves your machine during the embedding step. The model is bundled with the deployment under /models so no download is needed at runtime.
In-memory index: Unlike a persistent RAG system, there's no disk storage. VectorStoreIndex.fromDocuments() is called without a storageContext, so the entire index lives in memory for the lifetime of the request. This keeps the architecture simple and stateless.
```ts
// No storageContext → stays entirely in memory, gone after the request
return VectorStoreIndex.fromDocuments(documents);
```

Chunking & retrieval: LlamaIndex's VectorStoreIndex handles splitting the document into chunks and running cosine-similarity search. retrieve() returns the top-3 most relevant chunks, which are formatted into a numbered context block and injected into the You.com prompt.
Streaming: The You.com Express Agent streams its response token-by-token. The route forwards that stream directly to the browser as a text/plain ReadableStream:
```ts
const stream = await you.agentsRuns({ agent: "express", input, stream: true });
for await (const chunk of stream) {
  if (chunk.data.type === "response.output_text.delta") {
    controller.enqueue(encoder.encode(chunk.data.response.delta));
  }
}
```

The browser handles everything else:
- API key input (passed directly in the request — no `.env.local` needed)
- File selection, including multiple files, via the File API (or loading the built-in example)
- Reading files as text and POSTing them to `/api/query`
- Reading the streamed response and appending tokens to the UI in real time
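That last step, reading the stream, is a short loop over the response body. A sketch, assuming an `onToken` callback that appends each decoded chunk to the UI:

```typescript
// Consume a streamed text/plain response body, invoking onToken per chunk.
async function readAnswerStream(
  body: ReadableStream<Uint8Array>,
  onToken: (token: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onToken(decoder.decode(value, { stream: true }));
  }
}
```

In the browser this would run after the fetch, e.g. `await readAnswerStream(res.body!, t => appendToUI(t))` (call shape hypothetical).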
1. Install dependencies
```bash
npm install
```

2. Run the dev server

```bash
npm run dev
```

Open localhost:3000.
3. Try it
Enter your You.com API key (get one at you.com/platform), upload one or more .txt files (or click "Use our example"), and ask a question. The example file is a short story about a fifth-grade class sharing their favorite movies — try asking:
- "What is Elijah's favorite movie?"
- "Which students like animated films?"
- "Who relates most to their favorite movie character?"
| Package | Purpose |
|---|---|
| `next` | App framework — UI and API routes |
| `llamaindex` | Document chunking, vector store, and similarity search |
| `@llamaindex/huggingface` | On-device embeddings via BAAI/bge-small-en-v1.5 |
| `@youdotcom-oss/sdk` | You.com Express Agent for answer generation |