longda/ai-sdk-rag

AI SDK + RAG chat (Postgres / pgvector)

This repo is a small retrieval-augmented generation (RAG) playground I put together: a Next.js chat UI that can add text to a knowledge base and answer from what it has stored—not from the model's training memory alone. I use the AI SDK for streaming, embeddings, and tool calling, Drizzle ORM against PostgreSQL with the pgvector extension, and simple sentence-style chunking before embedding.


What is RAG?

RAG stands for retrieval-augmented generation. Instead of hoping the model already "knows" your facts, you retrieve snippets that match the user's question and augment the prompt with that text so generation is grounded in your data.


Why is RAG important?

Large language models only see what they were trained on (plus whatever you put in the context window). That breaks down for private or fresh information: internal docs, product details, notes you typed yesterday, or anything after a model's knowledge cutoff. RAG fixes that by fetching relevant passages at query time and passing them in as context, so answers can stay accurate and attributable to your sources.

A minimal intuition: without extra context, "What's my favorite food?" gets a generic refusal. With retrieved context ("user loves pepperoni + pineapple pizza"), the model can answer correctly—because you gave it the fact, not because it remembered you.

Note: Retrieval can be web search, a file store, or anything else. Here I use semantic search over embeddings stored in Postgres. That's one solid pattern, not the only one.


What is an embedding?

An embedding is a fixed-length vector of numbers that represents text (or other inputs) in a space where similar meaning tends to sit near similar vectors. In practice you compare vectors with cosine similarity (or related distance metrics): scores near 1 mean very aligned, lower scores mean less related.
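As a concrete sketch of that comparison step, here's the cosine similarity math spelled out by hand (the AI SDK also ships a comparable helper, so in practice you wouldn't write this yourself):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Assumes equal-length, non-zero vectors; scores near 1 mean closely aligned.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```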

Embeddings work best on bounded text. Very long inputs often get a muddier vector, which is why we don't embed whole documents as one blob in this project—we chunk first, then embed each chunk.

For a deeper visual intro to the idea behind word vectors, The Illustrated Word2Vec is still a great read.


What is chunking?

Chunking means splitting source text into smaller units before embedding. The right strategy depends on the content (legal prose vs. chat logs vs. Markdown). This codebase uses a deliberately simple approach: trim the text and split on periods, so each chunk is roughly sentence-sized. That's easy to reason about and easy to swap for something smarter (fixed token windows, paragraph splits, overlap between chunks, etc.) later.

After chunking, each piece gets an embedding row linked back to the parent resource in the database.
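The sentence-style split fits in a few lines. This is a sketch of the strategy described above; the repo's actual helper may differ in small details:

```typescript
// Sentence-style chunking: trim, split on periods, drop empty pieces.
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split('.')
    .filter((chunk) => chunk.trim() !== '');
```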


How this implementation fits together

  1. Ingest: A server action saves a resource (full text), chunks it, calls embedMany on the chunks, and inserts rows into an embeddings table (1536-dim vectors, HNSW index for cosine similarity).
  2. Chat: The UI uses useChat and streams from POST /api/chat.
  3. Agent: streamText runs a chat model with stopWhen: stepCountIs(5) so the model can call tools and still produce a follow-up natural-language reply (multi-step tool use).
  4. Tools:
    • addResource — stores text and refreshes embeddings (ingest path).
    • getInformation — embeds the user's question, runs a vector similarity query in SQL (cosine distance via Drizzle), filters by a similarity threshold, returns top matches for the model to cite from.
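The retrieval tool's post-processing (similarity threshold plus top-k) boils down to logic like this. An in-memory illustration only—the real query does the filtering and ordering in SQL via Drizzle, and the threshold and k values here are assumptions:

```typescript
type Match = { content: string; similarity: number };

// Keep only rows above a similarity threshold, best matches first,
// capped at k results for the model to cite from.
function topMatches(rows: Match[], threshold = 0.5, k = 4): Match[] {
  return rows
    .filter((r) => r.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```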

The system prompt instructs the model to lean on tool results and say it doesn't know if nothing relevant turns up, so behavior stays tied to the knowledge base.
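Put together, the chat route looks roughly like this. An illustrative sketch, not the repo's exact code: it assumes AI SDK v5-style APIs (streamText, stepCountIs, tool), and createResource / findRelevantContent are hypothetical stand-ins for the server action and vector-search helper.

```typescript
import { openai } from '@ai-sdk/openai';
import { convertToModelMessages, stepCountIs, streamText, tool, type UIMessage } from 'ai';
import { z } from 'zod';
// Stand-ins for the repo's server action and vector-search helper:
import { createResource } from '@/lib/actions/resources';
import { findRelevantContent } from '@/lib/ai/embedding';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    system:
      'Only answer using information from tool calls. ' +
      'If no relevant information is found, say you do not know.',
    messages: convertToModelMessages(messages),
    stopWhen: stepCountIs(5), // allow tool calls plus a final text reply
    tools: {
      addResource: tool({
        description: 'Add new content to the knowledge base.',
        inputSchema: z.object({ content: z.string() }),
        execute: async ({ content }) => createResource({ content }),
      }),
      getInformation: tool({
        description: 'Search the knowledge base for relevant passages.',
        inputSchema: z.object({ question: z.string() }),
        execute: async ({ question }) => findRelevantContent(question),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}
```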


Stack (this repo)

| Piece | Role |
| --- | --- |
| Next.js 14 (App Router) | UI, route handler for chat, server actions |
| AI SDK (ai, @ai-sdk/react, @ai-sdk/openai) | streamText, embed / embedMany, tools, streaming UI messages |
| OpenAI (via @ai-sdk/openai) | Chat: gpt-4o; embeddings: text-embedding-ada-002 |
| Drizzle ORM | Schema + queries; vector column + HNSW index |
| Postgres + pgvector | Storage and similarity search |
| shadcn-style UI + Tailwind | Minimal chat shell |
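The embeddings table described above can be sketched with Drizzle's pgvector support along these lines. This is an assumption about the schema shape, not a copy of the repo's file—table and column names are illustrative:

```typescript
import { index, pgTable, serial, text, varchar, vector } from 'drizzle-orm/pg-core';

export const embeddings = pgTable(
  'embeddings',
  {
    id: serial('id').primaryKey(),
    resourceId: varchar('resource_id', { length: 191 }),
    content: text('content').notNull(),
    // 1536 dimensions matches text-embedding-ada-002 output
    embedding: vector('embedding', { dimensions: 1536 }).notNull(),
  },
  (table) => [
    // HNSW index for fast approximate nearest-neighbor cosine search
    index('embedding_index').using('hnsw', table.embedding.op('vector_cosine_ops')),
  ],
);
```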

You'll need a DATABASE_URL for Postgres with pgvector (see .env.example). For OpenAI, configure credentials the way @ai-sdk/openai expects in your environment (e.g. OPENAI_API_KEY—check the OpenAI provider docs for your setup).
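A minimal local .env might look like the following—variable names are assumed from the text above, and .env.example remains the authoritative list:

```
DATABASE_URL=postgres://user:password@localhost:5432/mydb
OPENAI_API_KEY=your-openai-key
```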

Typical workflow: pnpm install → set env → pnpm db:migrate (or db:push as you prefer) → pnpm dev.


Why I keep this around

It's a compact reference for RAG plumbing: chunk → embed → store → retrieve → tool-grounded answers—with streaming and multi-step tool loops in one place. Good for prototyping support bots, personal knowledge bases, or teaching how vectors and chat agents connect without a heavyweight framework.


Vercel AI SDK RAG Guide Starter Project

This is the starter project for the Vercel AI SDK Retrieval-Augmented Generation (RAG) guide.

In this project, you will build a chatbot that responds only with information from its knowledge base. The chatbot will be able to both store and retrieve information. This project has many interesting use cases, from customer support to building your own second brain!

The guide builds on the same stack listed in the Stack section above.

About

Next.js RAG chat using the AI SDK, OpenAI embeddings, and Postgres + pgvector for semantic search and tool-grounded answers.
