youdotcom-oss/ydc-private-rag-sample

RAG Demo — You.com

Live demo →

A minimal, end-to-end Retrieval-Augmented Generation (RAG) demo built with Next.js. Upload a .txt file, ask a question, and get a grounded answer streamed back — powered by local embeddings and the You.com Express Agent.


How RAG Works

Large language models are powerful, but they only know what was in their training data. When you need answers grounded in your documents — not general knowledge — you need RAG.

RAG solves this in three steps:

① INDEXING (per request)
Your Document → [Chunk & Embed] → In-Memory Vector Store

② QUERYING
Your Question → [Embed] → Question Vector
                               ↓
                  Compare against Vector Store
                               ↓
                         Top Chunks
                    (most semantically similar passages)
                               ↓
③ Your Question + Top Chunks → [You.com Express Agent] → Streamed Answer

1. Index

The uploaded document is split into small, overlapping chunks of text. Each chunk is converted into a vector embedding — an array of numbers (e.g. [0.21, -0.87, 0.54, ...]) that represents the meaning of that text. Similar ideas produce similar numbers, which is what makes semantic search possible.
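The chunking step can be sketched as a simple sliding window. (This is an illustrative sketch only — LlamaIndex's real splitter is sentence- and token-aware, and the sizes below are hypothetical, not the demo's defaults.)

```typescript
// Split text into overlapping windows so an idea cut off at one chunk
// boundary still appears whole in the next chunk.
function chunkText(text: string, chunkSize = 200, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each of these chunks is then what gets embedded and stored in the vector store.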

Unlike a traditional RAG system, this demo holds the index entirely in memory for the duration of the request — no database, no disk writes.

2. Retrieve

When a user asks a question, that question is run through the same embedding model to produce its own vector. The in-memory vector store compares it against all stored document vectors and returns the closest matches — the chunks most semantically related to the question.

This is why "What movies involve a kid and a father?" can surface content about Interstellar even if the word "father" never appears in the text.
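Under the hood, "closest matches" means cosine similarity between the question's vector and every stored chunk vector. A minimal sketch (LlamaIndex does this for you inside `VectorStoreIndex`; the 2-D vectors in the test are toy values):

```typescript
// Cosine similarity: 1.0 = same direction (same meaning), ~0 = unrelated.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the texts of the k chunks scoring highest against the query vector.
function topK(
  query: number[],
  store: { text: string; embedding: number[] }[],
  k = 3,
): string[] {
  return [...store]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((c) => c.text);
}
```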

3. Generate

The retrieved chunks are inserted into a prompt alongside the original question. The You.com Express Agent reads that context and streams a grounded answer back to the browser — without hallucinating facts it doesn't have.
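The prompt assembly can be sketched roughly like this. (The demo's exact wording lives in `app/api/query/route.ts` and may differ — this is a hypothetical illustration of the "numbered context block" pattern.)

```typescript
// Build a grounded prompt: retrieved chunks as numbered context, then the
// user's question, with an instruction to answer only from that context.
function buildPrompt(question: string, chunks: string[]): string {
  const context = chunks.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    "Answer using ONLY the context below.",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Constraining the model to the provided context is what keeps the answer grounded rather than drawn from general training data.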


Why You.com?

You.com's Express Agent is a fast, capable LLM endpoint designed for agentic and RAG workflows. In this pipeline it handles the generation step — it receives the retrieved context and question, then streams a precise, grounded answer.

Key advantages:

  • Streaming by default — responses stream token-by-token, keeping latency low
  • No hallucination pressure — explicitly prompted to answer only from provided context
  • No mandatory web search — tools are opt-in, so the agent stays focused on your documents
  • Simple API — one SDK call with the express agent

Project Structure

app/
  page.tsx            # UI: file upload, query input, streamed response
  api/
    query/
      route.ts        # API route: embed → retrieve → generate pipeline
public/
  classroom-favorite-movies.txt   # Sample document

How the Code Works

app/api/query/route.ts — Index, Retrieve & Generate

Accepts a POST request with { files, query, apiKey } and runs all three RAG steps server-side.

Embedding model: BAAI/bge-small-en-v1.5 runs entirely on-device via @llamaindex/huggingface — no data leaves your machine during the embedding step. The model is bundled with the deployment under /models so no download is needed at runtime.

In-memory index: Unlike a persistent RAG system, there's no disk storage. VectorStoreIndex.fromDocuments() is called without a storageContext, so the entire index lives in memory for the lifetime of the request. This keeps the architecture simple and stateless.

// No storageContext → stays entirely in memory, gone after the request
return VectorStoreIndex.fromDocuments(documents);

Chunking & retrieval: LlamaIndex's VectorStoreIndex handles splitting the document into chunks and cosine-similarity search. retrieve() returns the top-3 most relevant chunks, which are formatted into a numbered context block and injected into the You.com prompt.

Streaming: The You.com Express Agent streams its response token-by-token. The route forwards that stream directly to the browser as a text/plain ReadableStream:

const stream = await you.agentsRuns({ agent: "express", input, stream: true });

for await (const chunk of stream) {
  if (chunk.data.type === "response.output_text.delta") {
    controller.enqueue(encoder.encode(chunk.data.response.delta));
  }
}
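On the receiving end, the browser can consume that `text/plain` stream with a standard `ReadableStream` reader. A minimal sketch of the pattern (the demo's `page.tsx` does the equivalent with `fetch(...).body`; the function name here is illustrative):

```typescript
// Read a byte stream chunk-by-chunk, decoding each chunk to text and
// handing it to a callback so the UI can append tokens as they arrive.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onToken: (t: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    // stream: true handles multi-byte characters split across chunks
    const token = decoder.decode(value, { stream: true });
    full += token;
    onToken(token);
  }
  return full;
}
```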

app/page.tsx — UI

The browser handles everything else:

  • API key input (passed directly in the request — no .env.local needed)
  • File selection, including multiple files, via the File API (or loading the built-in example)
  • Reading files as text and POSTing them to /api/query
  • Reading the streamed response and appending tokens to the UI in real time

Getting Started

1. Install dependencies

npm install

2. Run the dev server

npm run dev

Open http://localhost:3000 in your browser.

3. Try it

Enter your You.com API key (get one at you.com/platform), upload one or more .txt files (or click "Use our example"), and ask a question. The example file is a short story about a fifth-grade class sharing their favorite movies — try asking:

  • "What is Elijah's favorite movie?"
  • "Which students like animated films?"
  • "Who relates most to their favorite movie character?"

Dependencies

Package                    Purpose
next                       App framework — UI and API routes
llamaindex                 Document chunking, vector store, and similarity search
@llamaindex/huggingface    On-device embeddings via BAAI/bge-small-en-v1.5
@youdotcom-oss/sdk         You.com Express Agent for answer generation

About

This demo shows how to feed the You.com Express Agent private RAG data to answer questions in the browser.
