A comprehensive AI-powered platform for analyzing and understanding community sentiment around public safety in the Dorchester community of Boston. This project combines traditional data science approaches with LLM-based chat interactions to explore and make sense of community data.
This platform enables:
- Interactive Data Exploration: Query 311 requests, crime reports, and community events through natural language
- Community Engagement: Access community newsletters, meeting transcripts, and policy documents via semantic search
- Intelligent Question Routing: Automatically routes questions to SQL queries (structured data), RAG retrieval (documents), or both (hybrid mode)
- Event Discovery: Find upcoming community events with temporal queries
```
ml-misi-community-sentiment/
├── api/                       # Flask REST API (v2.0)
│   ├── api_v2.py              # Main API endpoint (agent-powered)
│   ├── api.py                 # Legacy API (deprecated)
│   ├── datastore/             # Static data files
│   ├── prompts/               # LLM prompt templates
│   └── requirements.txt       # API dependencies
│
├── on_the_porch/              # Core chatbot and data processing
│   ├── unified_chatbot.py     # Main chatbot orchestration
│   ├── sql_chat/              # SQL query generation and execution
│   ├── rag stuff/             # RAG retrieval system
│   ├── data_ingestion/        # Automated data sync (Google Drive, Email)
│   ├── calendar/              # Event extraction and processing
│   └── new_metadata/          # Database schema metadata generation
│
├── dataset-documentation/     # Dataset documentation (see below)
├── test_frontend/             # Frontend testing interface
├── public/                    # Static frontend assets
└── Old_exp/                   # Legacy experiments (ignored in git)
```
For instructors and evaluators, a lightweight demo setup is available in the demo/ folder. This avoids any client credentials and uses a small demo database snapshot and vector store without running the data ingestion pipeline, since ingestion requires additional setup of Google Drive and Gmail credentials.
To keep setup instructions in one place (and avoid the main README getting out of sync with the actual scripts), all demo-specific setup steps are documented in demo/README.md.
From the project root, see `demo/README.md` for how to:
- Run `demo/setup.sh` or `demo/setup_windows.bat`
- Bring up the Dockerized MySQL demo database
- Configure the minimal `.env` values needed for the demo
- Run the API and frontend against the demo data
Once you've followed the steps in `demo/README.md`, you can skip the Installation section below and just use the Configuration and Running API/frontend sections as reference.
- Python 3.11+
- MySQL 8.0+ (for structured data)
- Google Gemini API key
1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd ml-misi-community-sentiment
   ```

2. **Create and activate a virtual environment**

   ```bash
   python3 -m venv venv
   source venv/bin/activate    # On Mac/Linux
   venv\Scripts\activate       # On Windows
   ```

3. **Install dependencies**

   ```bash
   # Install all dependencies from root requirements.txt
   pip install -r requirements.txt
   ```
4. **Set up environment variables**
   - Copy the example environment file to `.env` at the repo root
   - See the Configuration section below
5. **Set up the database**
   - Create the MySQL database: `rethink_ai_boston`
   - Run the database setup scripts (see `on_the_porch/data_ingestion/`)
6. **Run the API**

   ```bash
   cd api
   python api_v2.py
   ```

   The API will start on `http://127.0.0.1:8888`.

7. **Run the frontend** (in a separate terminal)

   ```bash
   # From project root
   cd public
   python -m http.server 8000
   ```

   Then open `http://localhost:8000` in your browser.
Note: Make sure the backend API is running before starting the frontend. The frontend connects to the API at http://127.0.0.1:8888 by default.
The project uses a single .env file at the repo root.
- Copy `example_env.txt` to `.env`: `cp example_env.txt .env`
- Edit `.env` and fill in the values for your environment.
Key Variables (non-exhaustive):
- `GEMINI_API_KEY` - Google Gemini API key (required)
- `RETHINKAI_API_KEYS` - API authentication keys (comma-separated)
- `MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DB` - MySQL connection
- `VECTORDB_DIR` - path to the ChromaDB/vector DB directory
- `GOOGLE_DRIVE_FOLDER_ID` and related `GOOGLE_*`/`GMAIL_*` settings - data ingestion
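For reference, a minimal root `.env` might look like the sketch below. The variable names come from the list above, but all values are illustrative placeholders (only the database name `rethink_ai_boston` matches the Installation section); see `example_env.txt` for the authoritative list.

```env
# LLM and API authentication
GEMINI_API_KEY=your-gemini-api-key
RETHINKAI_API_KEYS=key-for-frontend,key-for-testing

# MySQL connection
MYSQL_HOST=127.0.0.1
MYSQL_PORT=3306
MYSQL_USER=your-mysql-user
MYSQL_PASSWORD=your-mysql-password
MYSQL_DB=rethink_ai_boston

# Vector store
VECTORDB_DIR=/path/to/vectordb

# Data ingestion (only needed when running the ingestion pipeline)
GOOGLE_DRIVE_FOLDER_ID=your-drive-folder-id
```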
- 311 Requests: Service requests from Boston 311 system
- 911 Reports: Crime and emergency reports
- Community Events: Calendar events extracted from newsletters
- Meeting Transcripts: Community meeting notes and discussions
- Policy Documents: City planning documents, budgets, and initiatives
The system automatically syncs data from:
- Google Drive: Client-uploaded documents (PDF, DOCX, TXT, MD)
- Email Newsletters: Automated extraction of events to calendar
See on_the_porch/data_ingestion/README.md for details.
- `POST /chat` - Main chat interaction with intelligent routing
- `POST /log` - Log interactions
- `PUT /log` - Update interaction feedback
- `GET /events` - Fetch upcoming community events
- `GET /health` - Health check
See api/README.md for detailed API documentation.
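As a quick illustration (assuming the default local address from the Installation section and the `RethinkAI-API-Key` header used in the Development Workflow example below; see `api/README.md` for which endpoints actually require authentication), the read-only endpoints can be exercised with curl:

```bash
# Health check
curl http://127.0.0.1:8888/health

# Fetch upcoming community events
curl -H "RethinkAI-API-Key: your-key" http://127.0.0.1:8888/events
```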
Comprehensive dataset documentation is available in the dataset-documentation/ folder. This includes:
- Data source descriptions
- Schema documentation
- Data quality notes
- Usage examples
See dataset-documentation/README.md for details.
This project implements a hybrid AI system that combines:
- SQL-based queries for structured data (311, 911, events)
- RAG (Retrieval-Augmented Generation) for document-based answers
- Intelligent routing that selects the best approach for each question
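As a simplified sketch of the routing idea only (the real logic lives in `on_the_porch/unified_chatbot.py` and is more involved; the keyword heuristic and function names below are illustrative assumptions, not the project's implementation), routing can be pictured as classifying each question and dispatching to one or both backends:

```python
# Illustrative sketch of hybrid routing -- not the project's actual implementation.
def route_question(question: str) -> str:
    """Classify a question as 'sql', 'rag', or 'hybrid' (toy heuristic)."""
    q = question.lower()
    structured = any(k in q for k in ("311", "911", "how many", "event", "crime"))
    documents = any(k in q for k in ("newsletter", "meeting", "policy", "budget", "plan"))
    if structured and documents:
        return "hybrid"
    return "sql" if structured else "rag"

def answer(question: str, sql_backend, rag_backend) -> str:
    """Dispatch to one or both backends and stitch the answers together."""
    mode = route_question(question)
    parts = []
    if mode in ("sql", "hybrid"):
        parts.append(sql_backend(question))  # text-to-SQL over 311/911/event tables
    if mode in ("rag", "hybrid"):
        parts.append(rag_backend(question))  # semantic retrieval over documents
    return "\n\n".join(parts)
```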
1. **Unified Chatbot** (`on_the_porch/unified_chatbot.py`)
   - Routes questions to SQL, RAG, or hybrid mode
   - Manages conversation history and context
   - Handles source citations

2. **Data Ingestion Pipeline** (`on_the_porch/data_ingestion/`)
   - Automated sync from Google Drive and email
   - Event extraction from newsletters
   - Vector database updates

3. **API Layer** (`api/api_v2.py`)
   - RESTful endpoints for frontend integration
   - Session management
   - Interaction logging
- Start Here: Review `on_the_porch/unified_chatbot.py` to understand the core routing logic
- Test the API: Use `api/test_api_v2.py` to test endpoints
- Explore Data: Check `dataset-documentation/` for available data sources
- Frontend Integration:
  - See `public/` for the production frontend (see `public/README.md` for details)
  - See `test_frontend/` for example frontend code
  - Both can be used to test the API via a web interface
1. **Local Development** (a Python version of the curl call is sketched after this list)

   ```bash
   # Start API server
   cd api
   python api_v2.py

   # Test with curl or Postman
   curl -X POST http://localhost:8888/chat \
     -H "RethinkAI-API-Key: your-key" \
     -H "Content-Type: application/json" \
     -d '{"message": "What events are happening this weekend?"}'
   ```

2. **Data Updates**

   ```bash
   # Run data ingestion
   cd on_the_porch/data_ingestion
   python boston_data_sync/boston_data_sync.py
   ```

3. **Database Setup**
   - See `on_the_porch/data_ingestion/README.md` for database initialization
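If you prefer Python to curl for local testing, the same `/chat` request can be sketched with the `requests` library. The `message` field and the API key header mirror the curl example above; treat this as an illustrative script rather than repo code, and check `api/README.md` for the authoritative request and response schema.

```python
# Hypothetical local test script -- adjust the host, key, and payload to your setup.
import requests

resp = requests.post(
    "http://127.0.0.1:8888/chat",
    headers={"RethinkAI-API-Key": "your-key"},
    json={"message": "What events are happening this weekend?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```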
- API Key Errors: Ensure `GEMINI_API_KEY` is set in `.env`
- Database Connection: Verify the MySQL credentials and that the database exists
- Vector DB Issues: Check the `VECTORDB_DIR` path and permissions
- Import Errors: Ensure the virtual environment is activated and dependencies are installed
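As a quick configuration sanity check, a small helper like the one below can report which of the key variables are missing from the root `.env`. This is a sketch only: it assumes python-dotenv is installed and that the app reads the root `.env`; the script itself is not part of the repository.

```python
# check_env.py -- hypothetical helper; not part of the repository.
from dotenv import dotenv_values  # pip install python-dotenv

REQUIRED = ["GEMINI_API_KEY", "MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DB", "VECTORDB_DIR"]

values = dotenv_values(".env")  # parse the root .env without touching os.environ
missing = [name for name in REQUIRED if not values.get(name)]
print("Missing or empty:", ", ".join(missing) if missing else "none")
```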
- API Documentation: `api/README.md`
- Data Ingestion: `on_the_porch/data_ingestion/README.md`
- Dataset Info: `dataset-documentation/README.md`
- API v2 Details: `on_the_porch/api_readme.md`
See scripts/dreamhost/ for deployment scripts:
- `setup.sh` - Initial server setup
- `deploy.sh` - Application deployment
- `database_setup.sh` - Database initialization
- Use `gunicorn` or a similar WSGI server for production (a sketch follows this list)
- Set `FLASK_SESSION_COOKIE_SECURE=True` for HTTPS
- Configure proper CORS origins
- Set up database backups
- Monitor API usage and costs
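For the gunicorn item above, a production invocation might look like the following sketch. The `api_v2:app` entry point is an assumption; confirm the name of the Flask app object exported by `api/api_v2.py` before using it.

```bash
# From the api/ directory; assumes api_v2.py exposes a Flask app object named `app`.
gunicorn --workers 4 --bind 0.0.0.0:8888 api_v2:app
```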
See LICENSE.md for license information.
- Project Owner: buspark@bu.edu
- Repository: [GitHub Link]
- Interactive Dashboard: [Add dashboard URL if hosted]
- API Documentation: See `api/README.md`
- Dataset Documentation: See `dataset-documentation/README.md`
Note: The Old_exp/ folder contains legacy experiments and is excluded from version control. Focus on the api/ and on_the_porch/ directories for active development.