A web application that scrapes auction websites and automatically categorizes items using AI. Built with a Node.js/Express backend and a React frontend.
- Web Scraping: Scrape auction data from configurable sources
- AI Categorization: Automatically categorize auction items using Azure OpenAI
- Category Management: Create and manage categories with descriptions
- Category Probabilities: AI provides probability scores for each category
- 50% Confidence Threshold: Items only get assigned a main category if AI confidence ≥ 50%
- Responsive UI: Works on desktop and mobile devices
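
The 50% confidence rule above can be sketched as a small selection function. This is a minimal illustration, not the project's actual implementation: the `CategoryProbabilities` shape and the `pickMainCategory` name are hypothetical, and only the 0.5 threshold comes from the feature list.

```typescript
// Hypothetical shape: one probability per category name, each in [0, 1].
type CategoryProbabilities = Record<string, number>;

// Threshold taken from the feature list (50%).
const CONFIDENCE_THRESHOLD = 0.5;

// Returns the highest-scoring category if it reaches the threshold,
// otherwise null (the item keeps its probabilities but gets no main category).
function pickMainCategory(
  probabilities: CategoryProbabilities,
  threshold: number = CONFIDENCE_THRESHOLD,
): string | null {
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const [name, score] of Object.entries(probabilities)) {
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best !== null && bestScore >= threshold ? best : null;
}

console.log(pickMainCategory({ Furniture: 0.72, Tools: 0.2, Art: 0.08 })); // prints: Furniture
console.log(pickMainCategory({ Furniture: 0.4, Tools: 0.35, Art: 0.25 })); // prints: null
```

Returning `null` rather than a fallback category keeps low-confidence items visibly uncategorized, which matches the "only get assigned a main category if" wording above.
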
- Runtime: Bun (recommended) / Node.js
- Framework: Express.js 5.x
- Database: SQLite with Drizzle ORM + libsql client
- AI: Azure OpenAI (GPT-4.1-mini)
- API Documentation: OpenAPI
- Framework: React 18
- Routing: React Router DOM 6
- Styling: Tailwind CSS
- State Management: Zustand
- Bundler: Vite
- Node.js (v18 or higher)

      # Check version
      node --version

- Bun (recommended for faster installation and execution)

      # Install Bun
      curl -fsSL https://bun.sh/install | bash

- Git

      # Check version
      git --version

- Azure OpenAI (for AI categorization)
  - Azure subscription
  - Azure OpenAI resource
  - Deployment of gpt-4.1-mini or similar model
- Clone the repository

      git clone <repository-url>
      cd auction-categorization

- Install dependencies

      # Using bun (recommended)
      bun install
      # Or using npm
      npm install

- Set up the database

  No command is needed: the SQLite database is created automatically at packages/server/db/dev.db.

- Configure environment (optional)

      # Copy example config if needed
      cp packages/server/config/config-example.toml packages/server/config/config.toml
      # Edit packages/server/config/config.toml

Edit packages/server/config/config.toml to configure the application:

    [ai]
    # AI model configuration
    model = "gpt-4o"
    # API key - set to empty or use environment variable
    api_key = ""
    # Base URL for self-hosted models (optional)
    base_url = ""
    # Azure AI Foundry configuration
    azure_endpoint = "https://your-resource.openai.azure.com/"
    azure_api_version = "2025-01-01-preview"
    azure_deployment = "gpt-4.1-mini"

You can also configure using environment variables:

| Variable | Description |
|---|---|
| AI_API_KEY | Azure OpenAI API key |
| AI_MODEL | AI model name |
| AI_AZURE_ENDPOINT | Azure OpenAI endpoint URL |
| AI_AZURE_API_VERSION | Azure API version |
| AI_AZURE_DEPLOYMENT | Azure deployment name |
| SERVER_PORT | Backend server port (default: 3000) |

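
To illustrate how the environment variables could relate to the `[ai]` table in config.toml, here is a minimal sketch. It assumes that a non-empty environment variable takes precedence over the file value; the README does not state the precedence, and the names `AiConfig`, `applyEnvOverrides`, and `resolvePort` are hypothetical rather than taken from the project's `lib/config.ts`.

```typescript
// Field names mirror the [ai] table in config.toml.
interface AiConfig {
  model: string;
  api_key: string;
  azure_endpoint: string;
  azure_api_version: string;
  azure_deployment: string;
}

// Maps each documented environment variable to the field it would override.
const ENV_TO_FIELD: Record<string, keyof AiConfig> = {
  AI_MODEL: "model",
  AI_API_KEY: "api_key",
  AI_AZURE_ENDPOINT: "azure_endpoint",
  AI_AZURE_API_VERSION: "azure_api_version",
  AI_AZURE_DEPLOYMENT: "azure_deployment",
};

type Env = Record<string, string | undefined>;

// Assumption: a non-empty environment variable wins over the file value.
function applyEnvOverrides(fileConfig: AiConfig, env: Env): AiConfig {
  const merged: AiConfig = { ...fileConfig };
  for (const [variable, field] of Object.entries(ENV_TO_FIELD)) {
    const value = env[variable];
    if (value !== undefined && value !== "") {
      merged[field] = value;
    }
  }
  return merged;
}

// SERVER_PORT falls back to the documented default of 3000 when unset or invalid.
function resolvePort(env: Env): number {
  const parsed = Number(env.SERVER_PORT);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3000;
}

// Usage with an explicit env object (in the server this would be process.env).
const fromFile: AiConfig = {
  model: "gpt-4o",
  api_key: "",
  azure_endpoint: "https://your-resource.openai.azure.com/",
  azure_api_version: "2025-01-01-preview",
  azure_deployment: "gpt-4.1-mini",
};
const effective = applyEnvOverrides(fromFile, { AI_API_KEY: "example-key" });
console.log(effective.api_key); // prints: example-key
console.log(resolvePort({})); // prints: 3000
```

Keeping the API key in `AI_API_KEY` rather than in config.toml avoids committing the secret alongside the rest of the configuration.
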
Start both backend and frontend in development mode:

    # Start both client and server concurrently
    bun run dev
    # Or start them separately:
    bun run dev:server   # Backend only
    bun run dev:client   # Frontend only

    # Build server
    cd packages/server && bun run build
    # Start production server
    cd packages/server && bun run start

- Frontend: http://localhost:5173
- Backend API: http://localhost:3000
- API Documentation: http://localhost:3000/api (OpenAPI)
This is a Bun workspace monorepo with separate client and server packages.

    auction-categorization/
    ├── packages/
    │   ├── client/                     # React frontend
    │   │   ├── src/
    │   │   │   ├── app.tsx             # Main React app with routing
    │   │   │   ├── main.tsx            # React entry point
    │   │   │   ├── pages/              # Page components
    │   │   │   │   ├── HomePage.tsx
    │   │   │   │   ├── AuctionsPage.tsx
    │   │   │   │   ├── AuctionDetailPage.tsx
    │   │   │   │   ├── AllItemsPage.tsx
    │   │   │   │   ├── CategoriesPage.tsx
    │   │   │   │   ├── CategoryDetailPage.tsx
    │   │   │   │   ├── ScrapingPage.tsx
    │   │   │   │   └── DatabasePage.tsx
    │   │   │   ├── components/         # Reusable UI components
    │   │   │   └── stores/             # Zustand state stores
    │   │   └── package.json
    │   └── server/                     # Express backend
    │       ├── src/
    │       │   ├── index.ts            # Express server entry
    │       │   ├── db/
    │       │   │   ├── schema.ts       # Drizzle schema
    │       │   │   └── db.ts           # Database client
    │       │   ├── routes/api/         # File-based API routes
    │       │   │   ├── auctions/
    │       │   │   ├── categories/
    │       │   │   ├── scrapers/
    │       │   │   ├── items/
    │       │   │   ├── websites/
    │       │   │   ├── categorization/
    │       │   │   ├── database/
    │       │   │   ├── health/
    │       │   │   └── stats/
    │       │   ├── services/
    │       │   │   ├── aiCategorization.ts
    │       │   │   ├── aiProbability.ts
    │       │   │   └── scrapingService.ts
    │       │   ├── scrapers/
    │       │   │   ├── index.ts
    │       │   │   └── bopaScraper.ts
    │       │   └── lib/
    │       │       └── config.ts
    │       ├── db/                     # SQLite database & migrations
    │       │   └── dev.db
    │       ├── drizzle/                # Drizzle migrations
    │       ├── config/                 # Configuration files
    │       └── package.json
    ├── package.json                    # Root workspace configuration
    └── README.md

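
The tree describes `routes/api/` as file-based API routes. The sketch below shows one common way such a convention maps a file path to a URL path. It is an assumption for illustration only: the README does not document the actual rules, and the `index.ts` and `[id].ts` file names as well as the `routePathFromFile` helper are hypothetical.

```typescript
// Assumed convention (not confirmed by the project):
//   routes/api/<segments>/index.ts  ->  /api/<segments>
//   a [param] segment               ->  Express-style :param
function routePathFromFile(relativeFile: string): string {
  // Drop the .ts or .js extension.
  const withoutExt = relativeFile.replace(/\.(ts|js)$/, "");
  const segments = withoutExt
    .split("/")
    // "index" files stand for their parent directory.
    .filter((segment) => segment !== "" && segment !== "index")
    // Turn [id] into :id.
    .map((segment) => segment.replace(/^\[(.+)\]$/, ":$1"));
  return "/" + segments.join("/");
}

console.log(routePathFromFile("api/auctions/index.ts")); // prints: /api/auctions
console.log(routePathFromFile("api/categories/[id].ts")); // prints: /api/categories/:id
```

Under this assumed convention, adding an endpoint means adding a file under the matching directory rather than editing a central router.
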

    # Free port 3000 if another process is using it (this kills that process)
    lsof -ti:3000 | xargs kill -9
    # Check configuration
    cat packages/server/config/config.toml
    # Check database
    ls -la packages/server/db/

    # Free port 5173 if another process is using it (this kills that process)
    lsof -ti:5173 | xargs kill -9
    # Reinstall dependencies
    rm -rf node_modules bun.lockb
    bun install

- Check that the API key is set correctly
- Verify Azure OpenAI endpoint and deployment name
- Check server logs for errors
- Ensure categories exist in the database
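
The checklist above can be turned into a quick self-check. This is a hedged sketch: `AiSettings` and `findAiConfigProblems` are hypothetical names, the category count is passed in rather than read from the database, and the placeholder test relies only on the example endpoint shown in config.toml.

```typescript
// Hypothetical subset of the AI settings relevant to the checklist.
interface AiSettings {
  apiKey: string;
  azureEndpoint: string;
  azureDeployment: string;
}

// Returns a human-readable list of problems; an empty list means every check passed.
function findAiConfigProblems(settings: AiSettings, categoryCount: number): string[] {
  const problems: string[] = [];
  if (settings.apiKey.trim() === "") {
    problems.push("API key is empty");
  }
  if (!/^https:\/\/.+/.test(settings.azureEndpoint)) {
    problems.push("Azure endpoint is not an https URL");
  }
  if (settings.azureEndpoint.includes("your-resource")) {
    problems.push("Azure endpoint still uses the placeholder from the example config");
  }
  if (settings.azureDeployment.trim() === "") {
    problems.push("Azure deployment name is empty");
  }
  if (categoryCount === 0) {
    problems.push("No categories exist in the database");
  }
  return problems;
}

const problems = findAiConfigProblems(
  { apiKey: "", azureEndpoint: "https://your-resource.openai.azure.com/", azureDeployment: "gpt-4.1-mini" },
  0,
);
console.log(problems.length); // prints: 3
```

A check like this only covers local configuration; authentication or deployment errors returned by Azure still have to be read from the server logs.
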

    # The project uses Drizzle ORM with SQLite
    # Database file is at: packages/server/db/dev.db

MIT License