A Streamlit-based dashboard for analyzing car news and reviews data with NLP capabilities including sentiment analysis, topic modeling, keyword and n-gram extraction, named entity recognition, and time series analysis.
- Sentiment Analysis: VADER + TextBlob analysis for news and reviews
- Topic Modeling: LDA-based topic extraction and visualization
- Keyword Analysis: Enhanced keyword extraction with co-occurrence analysis
- N-gram Analysis: Bigram and trigram pattern identification
- Named Entity Recognition: Extraction of brands, organizations, and locations
- Time Series Analysis: Temporal trends and patterns
- Correlation Analysis: Relationships between different metrics
- User-friendly Interface: Designed for non-technical users
- Multiple Data Views: News articles, car reviews, or combined analysis
- Advanced Filtering: Filter by date, sentiment, brand, rating, and source
- Interactive Visualizations: Charts, graphs, and word clouds
- Real-time Insights: Dynamic analysis based on selected filters
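As an illustration of the N-gram Analysis feature listed above, here is a minimal sketch of bigram and trigram counting using only the standard library. It is not the project's implementation: the function names, the tiny stop-word set, and the sample sentences are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical, tiny stop-word set for illustration only; the project keeps
# its own list in src/features/enhanced_stopwords.py.
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "in", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def count_ngrams(texts: Iterable[str], n: int) -> Counter:
    """Count n-grams (as tuples of tokens) across all texts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Zipping n shifted views of the token list yields consecutive n-grams.
        # Stop words are removed first, so an n-gram may span a removed word.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up sample sentences
reviews = [
    "Great fuel economy and a smooth ride",
    "Smooth ride, but fuel economy is average",
]
bigrams = count_ngrams(reviews, 2)
print(bigrams.most_common(2))
```

Calling `count_ngrams(reviews, 3)` gives trigram counts in the same way.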
- Python 3.8 or higher
- PostgreSQL database (optional, can use CSV files)
- Docker (optional, for containerized deployment)
- Clone or download the project
# If using git
git clone <repository-url>
cd demo_project
- Install dependencies
pip install -r requirements.txt
- Install additional requirements
# Install spaCy model for NER
python -m spacy download en_core_web_sm
# Install NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"
If you want to use PostgreSQL instead of CSV files:
- Install PostgreSQL and create a database
- Set environment variables:
export DB_HOST=localhost
export DB_PORT=5432
export DB_NAME=car_analysis
export DB_USER=your_username
export DB_PASSWORD=your_password
export NEWS_TABLE=car_news
export REVIEWS_TABLE=car_reviews
- Run database migration:
python scripts/migrate_to_database.py
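For readers who want to see how the `DB_*` variables above might be consumed, the following is a rough sketch that reads them and assembles a PostgreSQL connection URL. The function names, the choice of defaults, and the URL form are assumptions for illustration; the project's own loading code in `src/config/` may work differently.

```python
import os
from urllib.parse import quote

# Defaults mirror the example values shown above (an assumption of this sketch).
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "car_analysis",
}
REQUIRED = ("DB_USER", "DB_PASSWORD")


def load_db_settings(env=os.environ) -> dict:
    """Collect the DB_* variables, failing early if credentials are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env[name] for name in REQUIRED})
    return settings


def build_postgres_url(settings: dict) -> str:
    """Build a postgresql:// URL, percent-encoding the user and password."""
    user = quote(settings["DB_USER"], safe="")
    password = quote(settings["DB_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}@"
        f"{settings['DB_HOST']}:{settings['DB_PORT']}/{settings['DB_NAME']}"
    )


# A plain dict stands in for os.environ so the example has no side effects.
example_env = {"DB_USER": "analyst", "DB_PASSWORD": "p@ss word"}
print(build_postgres_url(load_db_settings(example_env)))
```

Percent-encoding the credentials keeps characters such as `@` or spaces in a password from breaking the URL.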
# Start the dashboard
streamlit run streamlit_app.py
Open your browser to http://localhost:8501
# Run the analysis pipeline
python run_analysis.py
# Build and run with Docker Compose
docker-compose up --build
# Or build and run manually
docker build -t car-dashboard .
docker run -p 8501:8501 car-dashboard
# Test database connection
python scripts/test_database.py
# Query analysis results
python scripts/query_analysis_results.py
# Run analysis pipeline
python scripts/run_analysis_pipeline.py
demo_project/
├── datasets/                      # CSV data files
│   ├── car_news_dataset.csv       # News articles data
│   ├── car_reviews_dataset.csv    # Car reviews data
│   └── car_work_data.csv          # Census/market data
├── src/                           # Core analysis modules
│   ├── analysis/                  # Main analysis framework
│   ├── config/                    # Configuration management
│   ├── data/                      # Data loading utilities
│   └── features/                  # Feature extraction modules
├── utils/                         # Streamlit utilities
│   ├── analysis_utils.py          # Analysis display functions
│   ├── chart_utils.py             # Chart creation functions
│   └── data_utils.py              # Data processing utilities
├── scripts/                       # Utility scripts
│   ├── migrate_to_database.py     # Database setup
│   ├── query_analysis_results.py  # Query analysis results
│   ├── run_analysis_pipeline.py   # Run complete analysis
│   └── test_database.py           # Test database connection
├── streamlit_app.py               # Main dashboard application
├── run_analysis.py                # Analysis pipeline runner
├── requirements.txt               # Python dependencies
└── README.md                      # This file
- Data Selection: Choose between News, Reviews, or Both datasets
- Filters: Apply date range, sentiment, brand, and rating filters
- Analysis Tabs: Navigate through different analysis types
- Overview & Metrics: High-level insights and key statistics
- Sentiment Analysis: Sentiment distribution and trends
- Topic Modeling: Topic discovery and word clouds
- Keyword Analysis: Important words and relationships
- N-gram Analysis: Phrase patterns and combinations
- Named Entity Recognition: Brands, organizations, locations
- Time Series Analysis: Temporal trends and market data
- Raw Data Explorer: Browse and filter raw data
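The date, sentiment, brand, and rating filters mentioned above combine as a logical AND: a row is shown only if it passes every filter that is set. The sketch below illustrates that idea with plain Python objects. The `Review` class, the `apply_filters` function, the sentiment labels, and the sample rows are all hypothetical and are not taken from the dashboard code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Review:
    brand: str
    rating: int             # star rating, 1-5
    sentiment: str          # hypothetical label: "positive", "neutral", "negative"
    publication_date: date


def apply_filters(
    rows: Iterable[Review],
    start: Optional[date] = None,
    end: Optional[date] = None,
    sentiment: Optional[str] = None,
    brand: Optional[str] = None,
    min_rating: Optional[int] = None,
) -> List[Review]:
    """Keep rows that satisfy every filter that is set; None means no filter."""
    kept = []
    for row in rows:
        if start is not None and row.publication_date < start:
            continue
        if end is not None and row.publication_date > end:
            continue
        if sentiment is not None and row.sentiment != sentiment:
            continue
        if brand is not None and row.brand != brand:
            continue
        if min_rating is not None and row.rating < min_rating:
            continue
        kept.append(row)
    return kept


# Made-up sample rows
rows = [
    Review("Toyota", 5, "positive", date(2024, 1, 10)),
    Review("Toyota", 2, "negative", date(2024, 3, 5)),
    Review("Ford", 4, "positive", date(2024, 2, 20)),
]
selected = apply_filters(
    rows, start=date(2024, 1, 1), end=date(2024, 2, 28), sentiment="positive"
)
print([r.brand for r in selected])
```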
# Build and run the dashboard
docker-compose up --build
# Access the dashboard at http://localhost:8501
The included Docker setup provides:
- Optimized Image: Python 3.11 slim with all dependencies
- Security: Non-root user execution
- Health Checks: Automatic container health monitoring
- Persistent Data: Datasets mounted as read-only volumes
- Production Ready: Headless mode with proper CORS settings
Copy env.template to .env and customize:
cp env.template .env
- Local Docker: docker-compose up --build
- Cloud Platforms: Deploy to AWS, GCP, Azure, etc.
- VPS Hosting: Use Docker on any Linux server
- Container Orchestration: Kubernetes, Docker Swarm
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=car_analysis
DB_USER=username
DB_PASSWORD=password
# Table Names
NEWS_TABLE=car_news
REVIEWS_TABLE=car_reviews
# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
- Keywords: Edit src/config/config.py to modify analysis keywords
- Stop Words: Customize in src/features/enhanced_stopwords.py
- Charts: Modify visualization settings in utils/chart_utils.py
- spaCy Model Not Found
  python -m spacy download en_core_web_sm
- Database Connection Failed
  - Check that PostgreSQL is running
  - Verify environment variables
  - Run python scripts/test_database.py
- Memory Issues with Large Datasets
  - Use database mode instead of CSV
  - Reduce dataset size for testing
- Streamlit Not Starting
  pip install --upgrade streamlit
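When diagnosing a failed database connection, a quick first check is whether anything is listening on the configured host and port at all. The sketch below does only that, using the standard library. It is not a substitute for `scripts/test_database.py`, and the function name is an assumption of this example. An open port does not prove the credentials or database name are correct.

```python
import os
import socket


def postgres_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = os.environ.get("DB_HOST", "localhost")
    port = int(os.environ.get("DB_PORT", "5432"))
    state = "reachable" if postgres_port_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```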
- Use database mode for better performance with large datasets
- Filter data using the dashboard filters to reduce processing time
- Close unused browser tabs to free memory
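One way to follow the advice to reduce dataset size for testing is to copy only the first rows of a CSV file into a smaller file. The sketch below streams the file row by row, so the full dataset is never held in memory. The function name and the sample output filename are hypothetical.

```python
import csv
from itertools import islice


def write_sample(src_path: str, dst_path: str, max_rows: int) -> int:
    """Copy the header and the first max_rows data rows; return rows written."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file: nothing to copy
        writer.writerow(header)
        written = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written


# Example call (the output filename is hypothetical):
# write_sample("datasets/car_reviews_dataset.csv", "datasets/car_reviews_sample.csv", 1000)
```

Taking the first rows is the simplest option, but it is not a random sample; if the file is sorted by date or brand, the sample will be skewed accordingly.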
News Data (car_news_dataset.csv):
- title or headline: Article title
- content: Article content
- publication_date: Date of publication
- source: News source
Reviews Data (car_reviews_dataset.csv):
- verdict: Review text
- rating: Star rating (1-5)
- brand: Car brand
- publication_date: Review date
- source: Review source
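Before loading your own CSV files, it can help to confirm that the header row contains the columns listed above. The following is a minimal sketch of such a check. It assumes that every listed column is required and that names are matched case-sensitively, neither of which is stated by the project. The function and constant names are hypothetical.

```python
import csv
import io
from typing import Iterable, List

# Each inner tuple lists interchangeable names; at least one must be present.
NEWS_COLUMNS = [("title", "headline"), ("content",), ("publication_date",), ("source",)]
REVIEW_COLUMNS = [("verdict",), ("rating",), ("brand",), ("publication_date",), ("source",)]


def missing_columns(header: Iterable[str], required) -> List[str]:
    """Return a readable description of each required column group not found."""
    present = {name.strip() for name in header}
    return [" or ".join(group) for group in required if not present.intersection(group)]


def check_csv(handle, required) -> List[str]:
    """Read only the header row of an open CSV file and report missing columns."""
    header = next(csv.reader(handle), [])
    return missing_columns(header, required)


# In-memory stand-in for a news file whose header lacks publication_date
sample = io.StringIO("headline,content,source\nNew EV launched,Details...,Example News\n")
print(check_csv(sample, NEWS_COLUMNS))
```

For a real file, pass an open handle instead, for example `open("datasets/car_news_dataset.csv", newline="", encoding="utf-8")`.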
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source and available under the MIT License.
For issues and questions:
- Check the troubleshooting section above
- Review the code comments and documentation
- Test with smaller datasets first
- Ensure all dependencies are properly installed
Enjoy analyzing car data with this comprehensive dashboard!