longda/ai-sdk-rag

AI SDK + RAG chat (Postgres / pgvector)

This repo is a small retrieval-augmented generation (RAG) playground I put together: a Next.js chat UI that can add text to a knowledge base and answer from what it has stored—not from the model's training memory alone. I use the AI SDK for streaming, embeddings, and tool calling, Drizzle ORM against PostgreSQL with the pgvector extension, and simple sentence-style chunking before embedding.


What is RAG?

RAG stands for retrieval-augmented generation. Instead of hoping the model already "knows" your facts, you retrieve snippets that match the user's question and augment the prompt with that text so generation is grounded in your data.


Why is RAG important?

Large language models only see what they were trained on (plus whatever you put in the context window). That breaks down for private or fresh information: internal docs, product details, notes you typed yesterday, or anything after a model's knowledge cutoff. RAG fixes that by fetching relevant passages at query time and passing them in as context, so answers can stay accurate and attributable to your sources.

A minimal intuition: without extra context, "What's my favorite food?" gets a generic refusal. With retrieved context ("user loves pepperoni + pineapple pizza"), the model can answer correctly—because you gave it the fact, not because it remembered you.

Note: Retrieval can be web search, a file store, or anything else. Here I use semantic search over embeddings stored in Postgres. That's one solid pattern, not the only one.


What is an embedding?

An embedding is a fixed-length vector of numbers that represents text (or other inputs) in a space where similar meaning tends to sit near similar vectors. In practice you compare vectors with cosine similarity (or related distance metrics): scores near 1 mean very aligned, lower scores mean less related.
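As a concrete sketch of that comparison step, here's the cosine similarity math spelled out by hand (the AI SDK also ships a comparable helper, so in practice you wouldn't write this yourself):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Assumes equal-length, non-zero vectors; scores near 1 mean closely aligned.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```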

Embeddings work best on bounded text. Very long inputs often get a muddier vector, which is why we don't embed whole documents as one blob in this project—we chunk first, then embed each chunk.

For a deeper visual intro to the idea behind word vectors, The Illustrated Word2Vec is still a great read.


What is chunking?

Chunking means splitting source text into smaller units before embedding. The right strategy depends on the content (legal prose vs. chat logs vs. Markdown). This codebase uses a deliberately simple approach: trim the text and split on periods, so each chunk is roughly sentence-sized. That's easy to reason about and easy to swap for something smarter (fixed token windows, paragraph splits, overlap between chunks, etc.) later.

After chunking, each piece gets an embedding row linked back to the parent resource in the database.
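The sentence-style split fits in a few lines. This is a sketch of the strategy described above; the repo's actual helper may differ in small details:

```typescript
// Sentence-style chunking: trim, split on periods, drop empty pieces.
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split('.')
    .filter((chunk) => chunk.trim() !== '');
```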


How this implementation fits together

  1. Ingest: A server action saves a resource (full text), chunks it, calls embedMany on the chunks, and inserts rows into an embeddings table (1536-dim vectors, HNSW index for cosine similarity).
  2. Chat: The UI uses useChat and streams from POST /api/chat.
  3. Agent: streamText runs a chat model with stopWhen: stepCountIs(5) so the model can call tools and still produce a follow-up natural-language reply (multi-step tool use).
  4. Tools:
    • addResource — stores text and refreshes embeddings (ingest path).
    • getInformation — embeds the user's question, runs a vector similarity query in SQL (cosine distance via Drizzle), filters by a similarity threshold, returns top matches for the model to cite from.
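The retrieval tool's post-processing (similarity threshold plus top-k) boils down to logic like this. An in-memory illustration only—the real query does the filtering and ordering in SQL via Drizzle, and the threshold and k values here are assumptions:

```typescript
type Match = { content: string; similarity: number };

// Keep only rows above a similarity threshold, best matches first,
// capped at k results for the model to cite from.
function topMatches(rows: Match[], threshold = 0.5, k = 4): Match[] {
  return rows
    .filter((r) => r.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```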

The system prompt instructs the model to lean on tool results and say it doesn't know if nothing relevant turns up, so behavior stays tied to the knowledge base.
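Put together, the chat route looks roughly like this. An illustrative sketch, not the repo's exact code: it assumes AI SDK v5-style APIs (streamText, stepCountIs, tool), and createResource / findRelevantContent are hypothetical stand-ins for the server action and vector-search helper.

```typescript
import { openai } from '@ai-sdk/openai';
import { convertToModelMessages, stepCountIs, streamText, tool, type UIMessage } from 'ai';
import { z } from 'zod';
// Stand-ins for the repo's server action and vector-search helper:
import { createResource } from '@/lib/actions/resources';
import { findRelevantContent } from '@/lib/ai/embedding';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    system:
      'Only answer using information from tool calls. ' +
      'If no relevant information is found, say you do not know.',
    messages: convertToModelMessages(messages),
    stopWhen: stepCountIs(5), // allow tool calls plus a final text reply
    tools: {
      addResource: tool({
        description: 'Add new content to the knowledge base.',
        inputSchema: z.object({ content: z.string() }),
        execute: async ({ content }) => createResource({ content }),
      }),
      getInformation: tool({
        description: 'Search the knowledge base for relevant passages.',
        inputSchema: z.object({ question: z.string() }),
        execute: async ({ question }) => findRelevantContent(question),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}
```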


Stack (this repo)

| Piece | Role |
| --- | --- |
| Next.js 14 (App Router) | UI, route handler for chat, server actions |
| AI SDK (ai, @ai-sdk/react, @ai-sdk/openai) | streamText, embed / embedMany, tools, streaming UI messages |
| OpenAI (via @ai-sdk/openai) | Chat: gpt-4o; embeddings: text-embedding-ada-002 |
| Drizzle ORM | Schema + queries; vector column + HNSW index |
| Postgres + pgvector | Storage and similarity search |
| shadcn-style UI + Tailwind | Minimal chat shell |
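The embeddings table described above can be sketched with Drizzle's pgvector support along these lines. This is an assumption about the schema shape, not a copy of the repo's file—table and column names are illustrative:

```typescript
import { index, pgTable, serial, text, varchar, vector } from 'drizzle-orm/pg-core';

export const embeddings = pgTable(
  'embeddings',
  {
    id: serial('id').primaryKey(),
    resourceId: varchar('resource_id', { length: 191 }),
    content: text('content').notNull(),
    // 1536 dimensions matches text-embedding-ada-002 output
    embedding: vector('embedding', { dimensions: 1536 }).notNull(),
  },
  (table) => [
    // HNSW index for fast approximate nearest-neighbor cosine search
    index('embedding_index').using('hnsw', table.embedding.op('vector_cosine_ops')),
  ],
);
```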

You'll need a DATABASE_URL for Postgres with pgvector (see .env.example). For OpenAI, configure credentials the way @ai-sdk/openai expects in your environment (e.g. OPENAI_API_KEY—check the OpenAI provider docs for your setup).
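A minimal local .env might look like the following—variable names are assumed from the text above, and .env.example remains the authoritative list:

```
DATABASE_URL=postgres://user:password@localhost:5432/mydb
OPENAI_API_KEY=your-openai-key
```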

Typical workflow: pnpm install → set env → pnpm db:migrate (or db:push as you prefer) → pnpm dev.


Why I keep this around

It's a compact reference for RAG plumbing: chunk → embed → store → retrieve → tool-grounded answers—with streaming and multi-step tool loops in one place. Good for prototyping support bots, personal knowledge bases, or teaching how vectors and chat agents connect without a heavyweight framework.


Vercel AI SDK RAG Guide Starter Project

This is the starter project for the Vercel AI SDK Retrieval-Augmented Generation (RAG) guide.

In this project, you will build a chatbot that responds only with information from its knowledge base. The chatbot will be able to both store and retrieve information. This project has many interesting use cases, from customer support to building your own second brain!

The guide builds on the same stack listed in the Stack section above.

About

Next.js RAG chat using the AI SDK, OpenAI embeddings, and Postgres + pgvector for semantic search and tool-grounded answers.
