A web application for analyzing Hacker News topics using machine learning clustering and AI-powered summaries. The application fetches stories from Hacker News via the Algolia API, embeds them using sentence-transformers, clusters them using UMAP and KMeans, and generates cluster summaries using Azure OpenAI.
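The Algolia "Search HN" endpoint used for fetching is public; a minimal sketch of building a search request URL (the `query`, `tags`, and `numericFilters` parameters are the real Algolia API; the helper name and the 5-day window, which mirrors the default noted under Known Limitations, are illustrative):

```python
import time
from urllib.parse import urlencode

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"

def algolia_search_url(query: str, days: int = 5, hits_per_page: int = 100) -> str:
    """Build an Algolia Search HN URL for stories from the last `days` days."""
    since = int(time.time()) - days * 86400
    params = {
        "query": query,
        "tags": "story",  # restrict results to stories (exclude comments)
        "numericFilters": f"created_at_i>{since}",
        "hitsPerPage": hits_per_page,
    }
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"
```

Fetching the URL with any HTTP client returns a JSON body whose `hits` array carries `title`, `points`, and `objectID` for each story.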
- Dual Search Interface: Analyze two different topics simultaneously in a split-screen layout
- Story Statistics: View story counts and most upvoted stories for each search
- ML-Powered Clustering: Uses sentence-transformers, UMAP, and KMeans for intelligent story clustering
- Interactive Visualization: Click on clusters in a 2D scatter plot to explore stories
- AI Summaries: LLM-generated titles and summaries for each cluster using Azure OpenAI
- Real-time Analysis: Fast processing with in-memory caching
- No Authentication Required: Uses public Hacker News Algolia API (no API keys needed)
┌─────────────────────────────────────────┐
│ Frontend: HTML + CSS + JavaScript │
│ - Split-screen UI │
│ - Plotly.js visualization │
└─────────────────────────────────────────┘
│ REST API
▼
┌─────────────────────────────────────────┐
│ Backend: Python FastAPI │
│ - Hacker News Algolia API Integration │
│ - Sentence Transformers (embeddings) │
│ - UMAP + KMeans (clustering) │
│ - Azure OpenAI (summaries) │
└─────────────────────────────────────────┘
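The backend stages above can be sketched as follows. The embedding step is stubbed with random vectors so the sketch stays runnable without sentence-transformers or umap-learn installed; in the real service, all-MiniLM-L6-v2 produces 384-dimensional embeddings and UMAP reduces them to 2D before KMeans runs. The k heuristic is illustrative, not necessarily what clustering_service.py uses:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for model.encode(titles): 20 stories x 384-dim embeddings.
embeddings = rng.normal(size=(20, 384))

# The real pipeline uses UMAP to reduce to 2D for plotting; plain
# truncation is a cheap stand-in that keeps the sketch self-contained.
points_2d = embeddings[:, :2]

# Illustrative heuristic: at least 2 clusters, roughly one per 5 stories, capped at 8.
k = min(max(2, len(points_2d) // 5), 8)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points_2d)
```

Each resulting label indexes a cluster that the frontend can color in the Plotly scatter plot and hand to the summarization endpoint.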
- Python 3.9 or higher
- Azure OpenAI API credentials (for cluster summaries)
- Modern web browser
- No Hacker News API credentials needed (uses public Algolia API)
cd hackit/twitterit

# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

cd backend
pip install -r requirements.txt

Create a .env file in the backend directory:
cp .env.example .env

Edit .env and add your credentials:
# Hacker News API (no credentials needed - uses public Algolia API)
# Azure OpenAI Credentials
AZURE_OPENAI_ENDPOINT=https://abpatra-7946-resource.openai.azure.com/openai/v1/
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4.1-mini
# Application Settings (optional)
CACHE_SIZE=1000
MAX_TWEETS_PER_SEARCH=1000
EMBEDDING_BATCH_SIZE=32

- Go to Azure Portal
- Create an Azure OpenAI resource
- Deploy a model (e.g., gpt-4.1-mini)
- Copy the endpoint and API key to your .env file
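A minimal sketch of reading these settings. The variable names match the .env above; the Settings class itself is illustrative and may differ from backend/app/config.py (which likely loads the .env file via a dotenv helper, while this sketch reads plain environment variables):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    azure_openai_endpoint: str
    azure_openai_api_key: str
    azure_openai_deployment_name: str
    cache_size: int = 1000
    max_tweets_per_search: int = 1000
    embedding_batch_size: int = 32

def load_settings() -> Settings:
    """Read settings from the environment; optional ones fall back to defaults."""
    return Settings(
        azure_openai_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        azure_openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_openai_deployment_name=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
        cache_size=int(os.getenv("CACHE_SIZE", "1000")),
        max_tweets_per_search=int(os.getenv("MAX_TWEETS_PER_SEARCH", "1000")),
        embedding_batch_size=int(os.getenv("EMBEDDING_BATCH_SIZE", "32")),
    )
```

Missing required Azure variables raise a KeyError at startup, which surfaces configuration mistakes early.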
cd backend
uvicorn app.main:app --reload --port 8000

The backend API will be available at http://localhost:8000
You can view the API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Open a new terminal:
cd frontend
python -m http.server 3000

The frontend will be available at http://localhost:3000
Alternatively, you can use any static file server or simply open index.html in your browser.
- Enter a topic in either search bar (e.g., "AI", "Rust", "Open Source")
- Click "Search" button
- View story statistics: total count and most upvoted story
- Click the "Analyse" button after searching
- Wait for the ML pipeline to complete:
- Generating embeddings (sentence-transformers)
- Reducing dimensions (UMAP)
- Clustering stories (KMeans)
- View the interactive cluster visualization
- Click on any point in the scatter plot
- View AI-generated cluster title and summary
- Browse all stories in the cluster
- Close the modal to explore other clusters
- Use the split-screen interface to analyze two topics
- Each section operates independently
- Compare clustering patterns across different topics
POST /api/v1/twitter/search - Search for Hacker News stories
GET /api/v1/twitter/stats/{search_id} - Get story statistics
POST /api/v1/analysis/embed - Generate embeddings
POST /api/v1/analysis/cluster - Perform clustering
POST /api/v1/analysis/summarize - Generate cluster summary
GET /health - Check API health
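A minimal client sketch for the search endpoint, using only the standard library. The request body field name (`query`) is an assumption; verify it against the Swagger UI at /docs:

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8000"

def build_search_request(query: str) -> Request:
    """Build a POST request for /api/v1/twitter/search.
    The `query` body field is assumed; check the /docs schema."""
    body = json.dumps({"query": query}).encode()
    return Request(
        f"{API_BASE}/api/v1/twitter/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (backend must be running on port 8000):
#   with urlopen(build_search_request("rust")) as resp:
#       print(json.load(resp))
```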
- FastAPI: Modern Python web framework
- httpx: Async HTTP client for Hacker News Algolia API
- sentence-transformers: Text embeddings (all-MiniLM-L6-v2)
- UMAP: Dimensionality reduction
- KMeans: K-means clustering from scikit-learn
- Azure OpenAI: LLM summaries
- Pydantic: Data validation
- HTML5/CSS3: User interface
- Vanilla JavaScript: Application logic
- Plotly.js: Interactive visualizations
hackit/twitterit/
├── backend/
│ ├── app/
│ │ ├── __init__.py
│ │ ├── main.py # FastAPI application
│ │ ├── config.py # Configuration
│ │ ├── models.py # Data models
│ │ ├── routers/
│ │ │ ├── twitter.py # Hacker News endpoints (kept name for compatibility)
│ │ │ └── analysis.py # Analysis endpoints
│ │ └── services/
│ │ ├── hackernews_service.py # Hacker News API
│ │ ├── embedding_service.py # Embeddings
│ │ ├── clustering_service.py # Clustering
│ │ └── llm_service.py # Azure OpenAI
│ ├── requirements.txt
│ ├── .env.example
│ └── .env # Your credentials
├── frontend/
│ ├── index.html # Main page
│ ├── css/
│ │ └── styles.css # Styling
│ └── js/
│ ├── api.js # API client
│ ├── visualization.js # Plotly charts
│ └── app.js # Main logic
└── README.md
Error: Hacker News API request failed
- Check your internet connection
- Verify the Algolia API is accessible
- Check for rate limiting (unlikely but possible)
Error: Module not found
- Ensure virtual environment is activated
- Run pip install -r requirements.txt again
Error: Port already in use
- Change the port:
uvicorn app.main:app --port 8001
Error: CORS policy blocking requests
- Ensure backend is running on http://localhost:8000
- Check CORS configuration in backend/app/config.py
Error: API requests failing
- Verify backend URL in frontend/js/api.js
- Check browser console for detailed errors
Slow first request
- The sentence-transformers model downloads on first use
- Subsequent requests will be faster
Out of memory
- Reduce EMBEDDING_BATCH_SIZE in .env
- Limit MAX_TWEETS_PER_SEARCH
- Caching: Embeddings are cached in memory
- Batch Processing: Stories are embedded in batches
- Rate Limiting: Algolia API rate limits are respected
- Async Processing: FastAPI handles requests asynchronously
- Hacker News search limited to last 5 days by default (configurable)
- KMeans requires minimum 2 stories for clustering (optimal results with 5+ stories)
- Azure OpenAI costs apply per API call
- The Algolia API has rate limits, but they are generous for normal use
- Save analysis results to database
- Export cluster data to CSV/JSON
- Support for more clustering algorithms (DBSCAN, Agglomerative Clustering)
- Sentiment analysis for each cluster
- Time-series analysis of topics
- User authentication and saved searches
This project is for educational and research purposes.
- sentence-transformers: Reimers & Gurevych, 2019
- UMAP: McInnes et al., 2018
- KMeans: Lloyd's algorithm (scikit-learn implementation)
- Plotly: Open-source visualization library
For issues or questions:
- Check the troubleshooting section
- Review API documentation at http://localhost:8000/docs
- Check Hacker News Algolia API status: https://hn.algolia.com/api