A RAG-based personal knowledge base using natural language for both input and querying, built with Go.
- Vector Database: Milvus
- Metadata Storage: Redis
- Embedding Generation: VoyageAI
- Reranking: VoyageAI Rerank
- LLM: Anthropic Claude Sonnet
- Email Service: Mailjet (interface-based design)
- Backend Language: Go
The personal knowledge base consists of these major components:
- Document Processing Pipeline: Handles incoming natural language text
- Storage System: Manages both vector embeddings and metadata
- Query Processing System: Processes natural language queries with reranking for improved accuracy
- Response Generation System: Creates natural language responses
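The four components above could be expressed as Go interfaces along these lines. This is only a sketch: the interface names, method signatures, and the `stubGenerator` type are illustrative, not the actual definitions under `internal/`.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical interfaces sketching the four components; the real
// definitions under internal/ may differ.

// DocumentProcessor handles incoming natural language text.
type DocumentProcessor interface {
	Process(title, content string) (docID string, err error)
}

// Storage manages both vector embeddings and metadata.
type Storage interface {
	StoreVectors(docID string, vectors [][]float32) error
	StoreMetadata(docID string, meta map[string]string) error
}

// QueryProcessor retrieves relevant chunks for a query.
type QueryProcessor interface {
	Retrieve(query string, topK int) (chunks []string, err error)
}

// ResponseGenerator creates natural language responses.
type ResponseGenerator interface {
	Generate(query string, context []string) (string, error)
}

// stubGenerator is a toy ResponseGenerator standing in for Claude.
type stubGenerator struct{}

func (stubGenerator) Generate(query string, context []string) (string, error) {
	return fmt.Sprintf("Q: %s | context: %s", query, strings.Join(context, "; ")), nil
}

func main() {
	var g ResponseGenerator = stubGenerator{}
	out, _ := g.Generate("what is stored?", []string{"chunk A", "chunk B"})
	fmt.Println(out)
}
```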
- Docker and Docker Compose
- Go 1.21 or later
- VoyageAI API key
- Anthropic Claude API key
- Copy the example environment file and fill in your API keys:
```bash
cp example.env .env
```
- Edit `.env` with your VoyageAI and Anthropic API keys
Start all services using Docker Compose:
```bash
docker-compose up -d
```

Navigate to `localhost:3000` to interact with hippocamp using the web UI. Here you can add new documents and ask questions using natural language queries.
Add a document:

```bash
curl -X POST http://localhost:8080/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Example Document",
    "content": "This is the content of the document that will be processed and stored in the knowledge base."
  }'
```

List all documents:

```bash
curl http://localhost:8080/api/documents
```

Get a document by ID:

```bash
curl http://localhost:8080/api/documents/{document_id}
```

Query the knowledge base:

```bash
curl -X POST http://localhost:8080/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "text": "What information do you have about example topics?"
  }'
```

Delete a document:

```bash
curl -X DELETE http://localhost:8080/api/documents/{document_id}
```

The system uses VoyageAI's reranking models to improve retrieval quality. This is configured through the following environment variables:
- `RERANKER_ENABLED`: Set to `true` to enable reranking (default: `true`)
- `RERANKER_MODEL`: The reranking model to use (default: `rerank-2`). Available options:
  - `rerank-2`: Best quality, 16K token context
  - `rerank-2-lite`: Good quality with faster speed, 8K token context
  - `rerank-1`: Legacy model (not recommended)
  - `rerank-lite-1`: Legacy model (not recommended)
- `RERANKER_TOP_K`: Number of top results to return after reranking (default: `10`)
Reranking improves retrieval quality by using a more sophisticated cross-encoder model that considers both the query and document together for relevance assessment.
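After the cross-encoder scores each candidate chunk, the system keeps only the best `RERANKER_TOP_K` results. A minimal sketch of that selection step in Go (the `RerankResult` shape is an assumption; VoyageAI's actual response fields may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// RerankResult is a hypothetical shape for one reranker score.
type RerankResult struct {
	Index int     // position in the candidate chunk list
	Score float64 // cross-encoder relevance score
}

// topK sorts results by descending score and keeps the best k,
// mirroring what RERANKER_TOP_K controls.
func topK(results []RerankResult, k int) []RerankResult {
	sorted := append([]RerankResult(nil), results...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	if k < len(sorted) {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	results := []RerankResult{{0, 0.12}, {1, 0.87}, {2, 0.55}}
	for _, r := range topK(results, 2) {
		fmt.Printf("chunk %d score %.2f\n", r.Index, r.Score)
	}
}
```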
When documents are added, they are:
- Split into chunks with configurable size and overlap
- Embedded using VoyageAI
- Stored in Milvus for vector search
- Recorded in Redis along with their metadata
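The chunking step above can be sketched as follows. This toy version measures size and overlap in bytes; the real splitter's parameters and units may differ (e.g. tokens, with sentence-boundary awareness):

```go
package main

import "fmt"

// chunkText splits text into overlapping chunks. size and overlap
// are byte counts here for simplicity; a production splitter would
// work on runes or tokens and respect sentence boundaries.
func chunkText(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	step := size - overlap
	var chunks []string
	for start := 0; start < len(text); start += step {
		end := start + size
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
	}
	return chunks
}

func main() {
	// With size 4 and overlap 2, consecutive chunks share 2 bytes.
	for i, c := range chunkText("abcdefghij", 4, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```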
When queries are processed:
- The query is embedded using the same VoyageAI model
- Similar chunks are retrieved from Milvus
- Chunks are used as context for Claude to generate a response
- The response is returned with citations to the source material
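The retrieval step can be illustrated with a brute-force cosine-similarity search. Milvus does this at scale with approximate indexes; this is only an in-memory toy showing the idea:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine computes cosine similarity between two embedding vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest returns chunk indices ordered by similarity to the query
// embedding, most similar first.
func nearest(query []float32, chunks [][]float32) []int {
	idx := make([]int, len(chunks))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(i, j int) bool {
		return cosine(query, chunks[idx[i]]) > cosine(query, chunks[idx[j]])
	})
	return idx
}

func main() {
	query := []float32{1, 0}
	chunks := [][]float32{{0, 1}, {1, 0.1}, {0.5, 0.5}}
	fmt.Println(nearest(query, chunks)) // most similar chunk index first
}
```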
The application is configured via environment variables:
- `VOYAGEAI_API_KEY`: API key for VoyageAI embedding service
- `ANTHROPIC_API_KEY`: API key for Anthropic Claude
- Milvus and Redis connection settings
- Chunking parameters
- Server port and logging level
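A `.env` might look like the following. The API-key and `RERANKER_*` names come from this README; the Milvus, Redis, chunking, server, and logging variable names are illustrative guesses (check the example env file for the real ones):

```env
VOYAGEAI_API_KEY=your-voyageai-key
ANTHROPIC_API_KEY=your-anthropic-key

RERANKER_ENABLED=true
RERANKER_MODEL=rerank-2
RERANKER_TOP_K=10

# The names below are illustrative; see the example env file for the real ones.
MILVUS_HOST=localhost
MILVUS_PORT=19530
REDIS_ADDR=localhost:6379
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
SERVER_PORT=8080
LOG_LEVEL=info
```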
Run the tests:

```bash
go test ./...
```

Project structure:

- `config/`: Configuration loading from environment
- `internal/api/`: HTTP server and API endpoints
- `internal/model/`: Data models
- `internal/processor/`: Document and query processing logic
- `internal/storage/`: Vector and metadata storage interfaces