This repo is a small retrieval-augmented generation (RAG) playground I put together: a Next.js chat UI that can add text to a knowledge base and answer from what it has stored—not from the model's training memory alone. I use the AI SDK for streaming, embeddings, and tool calling, Drizzle ORM against PostgreSQL with the pgvector extension, and simple sentence-style chunking before embedding.
RAG stands for retrieval-augmented generation. Instead of hoping the model already "knows" your facts, you retrieve snippets that match the user's question and augment the prompt with that text so generation is grounded in your data.
Large language models only see what they were trained on (plus whatever you put in the context window). That breaks down for private or fresh information: internal docs, product details, notes you typed yesterday, or anything after a model's knowledge cutoff. RAG fixes that by fetching relevant passages at query time and passing them in as context, so answers can stay accurate and attributable to your sources.
A minimal intuition: without extra context, "What's my favorite food?" gets a generic refusal. With retrieved context ("user loves pepperoni + pineapple pizza"), the model can answer correctly—because you gave it the fact, not because it remembered you.
Note: Retrieval can be web search, a file store, or anything else. Here I use semantic search over embeddings stored in Postgres. That's one solid pattern, not the only one.
An embedding is a fixed-length vector of numbers that represents text (or other inputs) in a space where similar meaning tends to sit near similar vectors. In practice you compare vectors with cosine similarity (or related distance metrics): scores near 1 mean very aligned, lower scores mean less related.
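To make the comparison concrete, here is a minimal cosine-similarity implementation (for illustration only; in practice you'd use a library helper or let the database compute the distance):

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). Values near 1 mean the vectors
// point in nearly the same direction (similar meaning).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

An identical pair scores 1; orthogonal vectors score 0. The pgvector `<=>` operator computes cosine *distance* (1 − similarity), so the two views are interchangeable.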
Embeddings work best on bounded text. Very long inputs often get a muddier vector, which is why we don't embed whole documents as one blob in this project—we chunk first, then embed each chunk.
For a deeper visual intro to the idea behind word vectors, The Illustrated Word2Vec is still a great read.
Chunking means splitting source text into smaller units before embedding. The right strategy depends on the content (legal prose vs. chat logs vs. Markdown). This codebase uses a deliberately simple approach: split on periods after trim, so each chunk is roughly sentence-sized. That's easy to reason about and easy to swap for something smarter (fixed token windows, paragraph splits, overlap between chunks, etc.) later.
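The period-splitting chunker described above can be sketched in a few lines (the name `generateChunks` is an assumption for illustration; the exact helper in the codebase may differ):

```typescript
// Naive sentence-style chunking: trim the input, split on periods,
// and drop empty fragments. Deliberately simple and easy to swap
// for token windows, paragraph splits, or overlapping chunks later.
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split(".")
    .filter(chunk => chunk !== "");
```

Note that this keeps leading whitespace on later chunks and ignores other sentence terminators (`?`, `!`); that roughness is acceptable for a playground, since each chunk still embeds to a usable vector.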
After chunking, each piece gets an embedding row linked back to the parent resource in the database.
- Ingest: A server action saves a resource (full text), chunks it, calls `embedMany` on the chunks, and inserts rows into an `embeddings` table (1536-dim vectors, HNSW index for cosine similarity).
- Chat: The UI uses `useChat` and streams from `POST /api/chat`.
- Agent: `streamText` runs a chat model with `stopWhen: stepCountIs(5)` so the model can call tools and still produce a follow-up natural-language reply (multi-step tool use).
- Tools: `addResource` stores text and refreshes embeddings (ingest path). `getInformation` embeds the user's question, runs a vector similarity query in SQL (cosine distance via Drizzle), filters by a similarity threshold, and returns top matches for the model to cite from.
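The retrieval logic behind the `getInformation` tool can be illustrated with an in-memory stand-in (in the repo this runs as a SQL query via Drizzle against pgvector; the threshold of 0.5 and top-4 limit here are illustrative assumptions, not confirmed values):

```typescript
type StoredEmbedding = { content: string; embedding: number[] };

// Cosine similarity, as the database would compute it (1 - cosine distance).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory sketch of the tool's query: score every stored chunk
// against the (already-embedded) question, drop weak matches,
// and return the top-k by similarity.
function findRelevantContent(
  queryEmbedding: number[],
  store: StoredEmbedding[],
  threshold = 0.5,
  topK = 4,
): { content: string; similarity: number }[] {
  return store
    .map(({ content, embedding }) => ({
      content,
      similarity: cosineSimilarity(queryEmbedding, embedding),
    }))
    .filter(r => r.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

The real version pushes this work into Postgres, where the HNSW index makes the nearest-neighbor scan fast; the filtering and top-k shape stay the same.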
The system prompt instructs the model to lean on tool results and say it doesn't know if nothing relevant turns up, so behavior stays tied to the knowledge base.
| Piece | Role |
|---|---|
| Next.js 14 (App Router) | UI, route handler for chat, server actions |
| AI SDK (`ai`, `@ai-sdk/react`, `@ai-sdk/openai`) | `streamText`, `embed` / `embedMany`, tools, streaming UI messages |
| OpenAI (via `@ai-sdk/openai`) | Chat: `gpt-4o`; embeddings: `text-embedding-ada-002` |
| Drizzle ORM | Schema + queries; vector column + HNSW index |
| Postgres + pgvector | Storage and similarity search |
| shadcn-style UI + Tailwind | Minimal chat shell |
You'll need a DATABASE_URL for Postgres with pgvector (see .env.example). For OpenAI, configure credentials the way @ai-sdk/openai expects in your environment (e.g. OPENAI_API_KEY—check the OpenAI provider docs for your setup).
Typical workflow: pnpm install → set env → pnpm db:migrate (or db:push as you prefer) → pnpm dev.
It's a compact reference for RAG plumbing: chunk → embed → store → retrieve → tool-grounded answers—with streaming and multi-step tool loops in one place. Good for prototyping support bots, personal knowledge bases, or teaching how vectors and chat agents connect without a heavyweight framework.
This is the starter project for the Vercel AI SDK Retrieval-Augmented Generation (RAG) guide.
In this project, you will build a chatbot that responds only with information it has in its knowledge base. The chatbot will be able to both store and retrieve information. This project has many interesting use cases, from customer support through to building your own second brain!
This project will use the following stack:
- Next.js 14 (App Router)
- Vercel AI SDK
- OpenAI
- Drizzle ORM
- Postgres with pgvector
- shadcn-ui and TailwindCSS for styling