olastephen/autointel

🚗 Car Analysis Dashboard

A Streamlit dashboard for analyzing car news and review data. It applies NLP techniques including sentiment analysis, topic modeling, keyword extraction, n-gram analysis, and named entity recognition.

📋 Features

πŸ” Analysis Capabilities

  • Sentiment Analysis: VADER + TextBlob analysis for news and reviews
  • Topic Modeling: LDA-based topic extraction and visualization
  • Keyword Analysis: Enhanced keyword extraction with co-occurrence analysis
  • N-gram Analysis: Bigram and trigram pattern identification
  • Named Entity Recognition: Extraction of brands, organizations, and locations
  • Time Series Analysis: Temporal trends and patterns
  • Correlation Analysis: Relationships between different metrics
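To make the N-gram Analysis item concrete, here is a minimal standard-library sketch of bigram and trigram counting. It is not the project's implementation: the helper names `tokenize` and `ngram_counts` and the sample sentences are made up for illustration, and no stop-word removal is applied.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngram_counts(texts, n=2):
    """Count n-grams across documents; n-grams never span two documents."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-tuples
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Made-up sample sentences, not taken from the project's datasets
reviews = [
    "Fuel economy is great and the ride is smooth",
    "Great fuel economy but the cabin is noisy",
]
bigrams = ngram_counts(reviews, n=2)
trigrams = ngram_counts(reviews, n=3)
print(bigrams.most_common(3))
```

Counting per document keeps phrases from being stitched together across unrelated reviews, which would otherwise inflate spurious patterns.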

📊 Interactive Dashboard

  • User-friendly Interface: Designed for non-technical users
  • Multiple Data Views: News articles, car reviews, or combined analysis
  • Advanced Filtering: Filter by date, sentiment, brand, rating, and source
  • Interactive Visualizations: Charts, graphs, and word clouds
  • Real-time Insights: Dynamic analysis based on selected filters

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • PostgreSQL database (optional, can use CSV files)
  • Docker (optional, for containerized deployment)

Installation

  1. Clone or download the project

    # If using git
    git clone <repository-url>
    cd demo_project
  2. Install dependencies

    pip install -r requirements.txt
  3. Install additional requirements

    # Install spaCy model for NER
    python -m spacy download en_core_web_sm
    
    # Install NLTK data
    python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"

πŸ—„οΈ Database Setup (Optional)

If you want to use PostgreSQL instead of CSV files:

  1. Install PostgreSQL and create a database

  2. Set environment variables:

    export DB_HOST=localhost
    export DB_PORT=5432
    export DB_NAME=car_analysis
    export DB_USER=your_username
    export DB_PASSWORD=your_password
    export NEWS_TABLE=car_news
    export REVIEWS_TABLE=car_reviews
  3. Run database migration:

    python scripts/migrate_to_database.py
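The environment variables from step 2 could be read in Python roughly as follows. This is a sketch under assumptions: `load_db_settings` is a hypothetical helper, and the project's real configuration code in src/config/ may work differently. The defaults simply mirror the example values shown above.

```python
import os

# Defaults mirror the example values above; adjust them to your setup.
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "car_analysis",
    "NEWS_TABLE": "car_news",
    "REVIEWS_TABLE": "car_reviews",
}

def load_db_settings(env=None):
    """Collect database settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Credentials have no safe default, so fail early if they are missing.
    missing = [key for key in ("DB_USER", "DB_PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings["DB_USER"] = env["DB_USER"]
    settings["DB_PASSWORD"] = env["DB_PASSWORD"]
    settings["DB_PORT"] = int(settings["DB_PORT"])
    return settings

# Passing a plain dict instead of os.environ makes the helper easy to try out.
example = load_db_settings({"DB_USER": "demo", "DB_PASSWORD": "secret", "DB_PORT": "5433"})
print(example["DB_HOST"], example["DB_PORT"])
```

Failing early on missing credentials gives a clearer error than a connection failure later in the migration script.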

🎯 Usage

Method 1: Streamlit Dashboard (Recommended)

streamlit run streamlit_app.py

Open your browser to http://localhost:8501

Method 2: Run Complete Analysis Pipeline

python run_analysis.py

Method 3: Docker Deployment (Recommended for Hosting)

# Build and run with Docker Compose
docker-compose up --build

# Or build and run manually
docker build -t car-dashboard .
docker run -p 8501:8501 car-dashboard

Method 4: Individual Scripts

# Test database connection
python scripts/test_database.py

# Query analysis results
python scripts/query_analysis_results.py

# Run analysis pipeline
python scripts/run_analysis_pipeline.py

πŸ“ Project Structure

demo_project/
β”œβ”€β”€ πŸ“Š datasets/                    # CSV data files
β”‚   β”œβ”€β”€ car_news_dataset.csv       # News articles data
β”‚   β”œβ”€β”€ car_reviews_dataset.csv    # Car reviews data
β”‚   └── car_work_data.csv          # Census/market data
β”œβ”€β”€ πŸ”§ src/                        # Core analysis modules
β”‚   β”œβ”€β”€ analysis/                  # Main analysis framework
β”‚   β”œβ”€β”€ config/                    # Configuration management
β”‚   β”œβ”€β”€ data/                      # Data loading utilities
β”‚   └── features/                  # Feature extraction modules
β”œβ”€β”€ πŸ› οΈ utils/                      # Streamlit utilities
β”‚   β”œβ”€β”€ analysis_utils.py          # Analysis display functions
β”‚   β”œβ”€β”€ chart_utils.py             # Chart creation functions
β”‚   └── data_utils.py              # Data processing utilities
β”œβ”€β”€ πŸ“œ scripts/                    # Utility scripts
β”‚   β”œβ”€β”€ migrate_to_database.py     # Database setup
β”‚   β”œβ”€β”€ query_analysis_results.py  # Query analysis results
β”‚   β”œβ”€β”€ run_analysis_pipeline.py   # Run complete analysis
β”‚   └── test_database.py           # Test database connection
β”œβ”€β”€ 🎨 streamlit_app.py            # Main dashboard application
β”œβ”€β”€ βš™οΈ run_analysis.py             # Analysis pipeline runner
β”œβ”€β”€ πŸ“‹ requirements.txt            # Python dependencies
└── πŸ“– README.md                   # This file

🎨 Dashboard Guide

Main Interface

  1. Data Selection: Choose between News, Reviews, or Both datasets
  2. Filters: Apply date range, sentiment, brand, and rating filters
  3. Analysis Tabs: Navigate through different analysis types
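The filtering step can be pictured as a chain of optional conditions applied to each record. The sketch below is illustrative only: `filter_records` is a hypothetical function, the sample rows are made up, and the dashboard's own filter code is not shown in this README.

```python
from datetime import date

def filter_records(records, start=None, end=None, brands=None, min_rating=None):
    """Keep records that satisfy every filter that was supplied."""
    kept = []
    for row in records:
        if start is not None and row["publication_date"] < start:
            continue
        if end is not None and row["publication_date"] > end:
            continue
        if brands is not None and row["brand"] not in brands:
            continue
        if min_rating is not None and row["rating"] < min_rating:
            continue
        kept.append(row)
    return kept

# Made-up sample rows using the review column names from the Data Format section
reviews = [
    {"brand": "Toyota", "rating": 5, "publication_date": date(2024, 1, 10)},
    {"brand": "Ford", "rating": 3, "publication_date": date(2024, 2, 5)},
    {"brand": "Toyota", "rating": 2, "publication_date": date(2024, 3, 1)},
]
selected = filter_records(reviews, start=date(2024, 1, 1), brands={"Toyota"}, min_rating=4)
print(selected)
```

A filter left as `None` is skipped, so calling the function with no arguments returns every record unchanged.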

Available Tabs

  • 📊 Overview & Metrics: High-level insights and key statistics
  • 😊 Sentiment Analysis: Sentiment distribution and trends
  • 🏷️ Topic Modeling: Topic discovery and word clouds
  • 🔑 Keyword Analysis: Important words and relationships
  • πŸ“ N-gram Analysis: Phrase patterns and combinations
  • 🏒 Named Entity Recognition: Brands, organizations, locations
  • 📈 Time Series Analysis: Temporal trends and market data
  • 📋 Raw Data Explorer: Browse and filter raw data

🐳 Docker Hosting

Quick Docker Deployment

# Build and run the dashboard
docker-compose up --build

# Access the dashboard at http://localhost:8501

Docker Configuration

The included Docker setup provides:

  • Optimized Image: Python 3.11 slim with all dependencies
  • Security: Non-root user execution
  • Health Checks: Automatic container health monitoring
  • Persistent Data: Datasets mounted as read-only volumes
  • Production Ready: Headless mode with proper CORS settings

Environment Variables for Docker

Copy env.template to .env and customize:

cp env.template .env

Hosting Options

  • Local Docker: docker-compose up --build
  • Cloud Platforms: Deploy to AWS, GCP, Azure, etc.
  • VPS Hosting: Use Docker on any Linux server
  • Container Orchestration: Kubernetes, Docker Swarm

βš™οΈ Configuration

Environment Variables (Optional)

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=car_analysis
DB_USER=username
DB_PASSWORD=password

# Table Names
NEWS_TABLE=car_news
REVIEWS_TABLE=car_reviews

# Streamlit Configuration
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true

Customization

  • Keywords: Edit src/config/config.py to modify analysis keywords
  • Stop Words: Customize in src/features/enhanced_stopwords.py
  • Charts: Modify visualization settings in utils/chart_utils.py

🔧 Troubleshooting

Common Issues

  1. spaCy Model Not Found

    python -m spacy download en_core_web_sm
  2. Database Connection Failed

    • Check PostgreSQL is running
    • Verify environment variables
    • Run python scripts/test_database.py
  3. Memory Issues with Large Datasets

    • Use database mode instead of CSV
    • Reduce dataset size for testing
  4. Streamlit Not Starting

    pip install --upgrade streamlit

Performance Tips

  • Use database mode for better performance with large datasets
  • Filter data using the dashboard filters to reduce processing time
  • Close unused browser tabs to free memory

📊 Data Format

Expected CSV Columns

News Data (car_news_dataset.csv):

  • title or headline: Article title
  • content: Article content
  • publication_date: Date of publication
  • source: News source

Reviews Data (car_reviews_dataset.csv):

  • verdict: Review text
  • rating: Star rating (1-5)
  • brand: Car brand
  • publication_date: Review date
  • source: Review source
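Before loading a file into the dashboard, it can help to check that the expected columns are present. The following is a minimal sketch using only the standard library. `missing_columns` is a hypothetical helper and the sample row is made up; the column set is the one listed above for the reviews data.

```python
import csv
import io

# Column names taken from the reviews list above
REQUIRED_REVIEW_COLUMNS = {"verdict", "rating", "brand", "publication_date", "source"}

def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header row."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(required - header)

# Made-up in-memory CSV that lacks the publication_date column
sample = io.StringIO("verdict,rating,brand,source\nSolid commuter car,4,Toyota,example\n")
problems = missing_columns(sample, REQUIRED_REVIEW_COLUMNS)
print(problems)
```

For a real file, pass an open handle such as `open("datasets/car_reviews_dataset.csv", newline="")`. The news data would need a slightly different check, since either title or headline is accepted.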

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is open source and available under the MIT License.

🆘 Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review the code comments and documentation
  3. Test with smaller datasets first
  4. Ensure all dependencies are properly installed

🎉 Enjoy analyzing car data with this comprehensive dashboard!
