YCombuster/dojang

Dojang - AI-Powered Learning Platform

Transform your PDF textbooks into intelligent study materials with AI-generated flashcards, quizzes, and semantic search.

License: AGPL v3 | Python 3.11+ | Next.js 15 | FastAPI


🎯 Project Overview

Dojang is an AI-powered learning platform that transforms large PDF textbooks into intelligent study materials. Upload a PDF, and the system:

  1. Extracts semantic content using Marker (preserves structure: headings, lists, tables)
  2. Generates embeddings with OpenAI (768-dimensional vectors)
  3. Stores in PostgreSQL with pgvector for fast similarity search
  4. Enables learning features like flashcards, quizzes, and spaced repetition (in development)

Vision

Create a comprehensive CDN (Content Delivery Network) that integrates with vector databases, making AI-powered educational features easy to implement and open source for everyone.


✨ Features

Currently Implemented ✅

  • PDF Processing Pipeline

    • Semantic extraction with Marker
    • Hierarchical content storage
    • Memory-optimized for large textbooks
  • Vector Embeddings

    • OpenAI text-embedding-ada-002
    • Batch processing (100 chunks at a time)
    • pgvector storage for similarity search
  • Beautiful UI

    • Modern Next.js frontend
    • Drag-and-drop PDF upload
    • Real-time progress tracking
  • Robust Backend

    • FastAPI with async support
    • PostgreSQL with pgvector
    • Comprehensive test suite
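
The "100 chunks at a time" batching can be sketched as follows. This is a minimal illustration, not the project's actual code: `batched`, `embed_all`, and the `embed_batch` callable are hypothetical names, and `embed_batch` stands in for whatever wraps the OpenAI embeddings call in the backend.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 100  # the "100 chunks at a time" figure from the feature list


def batched(chunks: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `chunks`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(chunks), size):
        yield list(chunks[start:start + size])


def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    size: int = BATCH_SIZE,
) -> List[List[float]]:
    """Embed every chunk, calling `embed_batch` once per batch.

    `embed_batch` is a hypothetical callable (texts in, vectors out);
    in a real backend it would wrap the embeddings API request.
    """
    vectors: List[List[float]] = []
    for batch in batched(chunks, size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors
```

Batching this way keeps the number of API requests proportional to the chunk count divided by 100, and bounds how many chunk texts travel in a single request.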

In Development 🚧

  • Vector similarity search (RAG)
  • AI flashcard generation
  • Quiz creation
  • Spaced repetition system
  • Chat with documents
  • User authentication
  • Study progress tracking
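
For contributors picking up vector similarity search, the ranking step can be sketched in plain Python. The sketch below computes cosine distance, which is the quantity pgvector's `<=>` operator returns, and sorts rows by it. The function names and the `(content_id, embedding)` row shape are assumptions for illustration; in the real system the database would do this work.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[int, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Rank (content_id, embedding) rows by distance to `query`, nearest first."""
    scored = [(content_id, cosine_distance(query, emb)) for content_id, emb in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

A RAG feature would embed the user's question, fetch the nearest chunks this way, and pass their text to the language model as context.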

🚀 Quick Start

Get up and running in 5 minutes:

  1. Prerequisites: Docker Desktop + OpenAI API key
  2. Clone and setup:
    git clone <repository-url>
    cd dojang
    echo "OPENAI_API_KEY=your_key_here" > .env
  3. Start everything:
    docker-compose up --build
  4. Access the app: http://localhost:3000

For detailed instructions, see QUICKSTART.md


📚 Documentation

| Document        | Description                                     |
|-----------------|-------------------------------------------------|
| QUICKSTART.md   | Get started in 5 minutes with Docker            |
| SETUP.md        | Detailed development setup (Docker & manual)    |
| ARCHITECTURE.md | System design, data flow, and technical details |
| FEATURES.md     | Implementation guides for new features          |

🏗️ Technology Stack

Frontend

  • Framework: Next.js 15 (React 18)
  • Language: TypeScript
  • Styling: Tailwind CSS + Shadcn UI
  • Testing: Playwright

Backend

  • Framework: FastAPI
  • Language: Python 3.11+
  • ORM: SQLAlchemy (async)
  • PDF Processing: PyMuPDF + Marker
  • AI: OpenAI API
  • Testing: Pytest

Database

  • DBMS: PostgreSQL 15+
  • Vector Search: pgvector extension
  • Schema: Hierarchical content storage

Infrastructure

  • Containerization: Docker + Docker Compose
  • Development: Hot reload for frontend & backend

📊 Architecture

┌─────────────────────────────────────────────────────────────┐
│  Frontend (Next.js)                                         │
│  • PDF Upload UI                                            │
│  • Flashcards (TODO)                                        │
│  • Quizzes (TODO)                                           │
└─────────────────────────────────────────────────────────────┘
                          │ HTTP/REST
                          ▼
┌─────────────────────────────────────────────────────────────┐
│  Backend (FastAPI)                                          │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Document Processing Pipeline                       │    │
│  │  PDF → Marker → JSON → DB → OpenAI → Embeddings     │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│  PostgreSQL + pgvector                                      │
│  • Document metadata                                        │
│  • Hierarchical content                                     │
│  • 768-dim embeddings                                       │
└─────────────────────────────────────────────────────────────┘

For detailed architecture, see ARCHITECTURE.md


🛠️ Development

Running Tests

Backend:

# Inside Docker
docker exec -it dojang-backend-1 pytest

# Or use the script
cd backend
./run_tests.sh  # Unix/Mac
.\run_tests.ps1  # Windows

Frontend:

docker exec -it dojang-frontend-1 npx playwright test

Making Changes

  • Backend: Edit files in backend/app/ - auto-reloads
  • Frontend: Edit files in frontend/ - Fast Refresh
  • Database: Modify backend/app/models.py + create migration

Viewing Logs

docker-compose logs -f              # All services
docker-compose logs -f backend      # Backend only
docker-compose logs -f frontend     # Frontend only

🗄️ Database Schema

Core Tables

knowledge_base_sources - Document metadata

  • source_id, name, author, publisher, etc.

knowledge_base_content - Hierarchical content with embeddings

  • content_id, source_id, parent_content_id
  • title, content, content_type
  • embedding (VECTOR(768))

users - User management (for future auth)

tags - Content tagging system

user_activity_log - Track learning progress

See ARCHITECTURE.md for full schema details.
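
To illustrate how `parent_content_id` encodes the hierarchy, here is a minimal sketch that flattens a nested document tree into rows shaped like `knowledge_base_content`. The input dictionary shape (`title`, `content`, `content_type`, `children`) and the `flatten` helper are assumptions for illustration, not Marker's actual output format or the project's ingestion code. The `embedding` column is omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Optional


@dataclass
class ContentRow:
    content_id: int
    source_id: int
    parent_content_id: Optional[int]  # None marks a top-level node
    title: str
    content: str
    content_type: str


def flatten(
    node: Dict,
    source_id: int,
    ids: Optional[Iterator[int]] = None,
    parent_id: Optional[int] = None,
) -> List[ContentRow]:
    """Walk a nested node depth-first and emit one row per node."""
    ids = ids if ids is not None else count(1)
    row = ContentRow(
        content_id=next(ids),
        source_id=source_id,
        parent_content_id=parent_id,
        title=node.get("title", ""),
        content=node.get("content", ""),
        content_type=node.get("content_type", "section"),
    )
    rows = [row]
    for child in node.get("children", []):
        # Each child points back at the row created for its parent.
        rows.extend(flatten(child, source_id, ids, row.content_id))
    return rows
```

Storing the tree as rows with a parent pointer keeps every chunk individually searchable while still letting a query walk up to the enclosing section or chapter.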


🎓 Contributing Features

We welcome contributions! Here's how to get started:

  1. Pick a feature from FEATURES.md
  2. Read the architecture in ARCHITECTURE.md
  3. Set up your environment with SETUP.md
  4. Create a feature branch: git checkout -b feat/your-feature
  5. Implement & test
  6. Submit a pull request

High-Priority Features

  • 🔴 Vector Similarity Search
  • 🔴 Flashcard Generation
  • 🔴 Quiz Generation
  • 🟡 Spaced Repetition System
  • 🟡 Chat with Documents

See FEATURES.md for detailed implementation guides.
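
As a starting point for the spaced repetition item, the sketch below shows a simplified SM-2-style scheduler. This is one common approach, not necessarily the design described in FEATURES.md; `CardState` and `review` are hypothetical names. Quality is graded 0 to 5, and a grade below 3 resets the card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never below 1.3


def review(state: CardState, quality: int) -> CardState:
    """Return the next scheduling state after a review graded 0..5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition count, keep the ease factor.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease + 0.1 - penalty)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * ease)
    return CardState(state.repetitions + 1, interval, ease)
```

Three perfect reviews in a row schedule a new card at 1, 6, and then 17 days, which shows how intervals grow as recall stays strong.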


📦 Project Structure

dojang/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI app
│   │   ├── models.py                  # Database models
│   │   ├── database.py                # DB connection
│   │   ├── routers/                   # API endpoints
│   │   └── services/
│   │       ├── document_processor.py  # Core pipeline
│   │       └── source_intake.py       # Content ingestion
│   ├── tests/                         # Pytest tests
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── app/                           # Next.js pages
│   ├── components/
│   │   ├── FileUpload.tsx             # Upload UI
│   │   └── ui/                        # Shadcn components
│   ├── tests/                         # Playwright tests
│   ├── Dockerfile
│   └── package.json
├── docker-compose.yml                 # Multi-container setup
├── init-db.sh                         # Database initialization
├── QUICKSTART.md                      # Quick start guide
├── SETUP.md                           # Detailed setup
├── ARCHITECTURE.md                    # System design
└── FEATURES.md                        # Feature guides

🔐 Environment Variables

Backend (.env)

OPENAI_API_KEY=your_openai_api_key_here
DATABASE_URL=postgresql+asyncpg://postgres:zany12@localhost:5433/studyai
UPLOAD_DIR=./uploads
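
A backend could read these variables at startup roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical, and treating `OPENAI_API_KEY` and `DATABASE_URL` as required while defaulting `UPLOAD_DIR` to `./uploads` is a guess at sensible behavior, not the project's documented rule.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    database_url: str
    upload_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""
    required = ("OPENAI_API_KEY", "DATABASE_URL")  # assumed to be mandatory
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        database_url=env["DATABASE_URL"],
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),  # assumed default
    )
```

Failing at startup with the list of missing names is usually easier to debug than a later error from the first database or API call.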

🚢 Deployment

Current: Development

All services run in Docker containers locally.

Future: Production

  • Frontend: Vercel/Netlify
  • Backend: AWS ECS / GCP Cloud Run
  • Database: AWS RDS / GCP Cloud SQL
  • Background Jobs: Celery + Redis
  • Monitoring: Sentry, Prometheus, Grafana

📝 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

See LICENSE for full details.


🙏 Acknowledgments

  • Marker - Excellent PDF semantic extraction
  • pgvector - PostgreSQL vector similarity search
  • OpenAI - Embeddings and GPT models
  • FastAPI - Modern Python web framework
  • Next.js - React framework for production

📧 Contact & Support


🗺️ Roadmap

  • [x] PDF upload and processing
  • [x] Semantic content extraction
  • [x] Embedding generation
  • [x] Vector storage (pgvector)
  • [x] Beautiful UI
  • [ ] Vector similarity search
  • [ ] Flashcard generation
  • [ ] Quiz creation
  • [ ] Spaced repetition
  • [ ] User authentication
  • [ ] Progress tracking
  • [ ] Multi-modal support (images, videos)
  • [ ] Mobile app
  • [ ] Open source CDN for education

Ready to get started? See QUICKSTART.md to begin!

About

RevisionDojo (YC F24) but it's opencourseware
