AI-Powered Data Analysis Platform

🚀 Live Demo

Try it now: https://ai-data-analytics-agent.streamlit.app/

Upload your data and start asking questions in natural language instantly!

Overview

A platform for automated data analysis, query processing, and visualization. Built with Streamlit and powered by Google's Gemini AI, the application answers natural language queries about CSV, Excel, and JSON datasets and returns filtered tables and charts.

Features

Core Functionality

Multi-format Data Support: Process CSV, Excel (xlsx/xls), and JSON files seamlessly
Natural Language Queries: Ask questions about your data in plain English
AI-Powered Analysis: Leverages Google Gemini AI for intelligent data interpretation
Smart Visualizations: Automatically generates appropriate charts based on query context
Performance Optimization: Implements intelligent caching for faster response times
Google Sheets Integration: Direct connection to Google Sheets for live data analysis

Advanced Features

Security Framework: Comprehensive input validation and file size limitations
Error Boundaries: Robust error handling with graceful degradation
Multi-source Fallback: Web search integration when local analysis is insufficient
Cloud Deployment Ready: Pre-configured for Streamlit Cloud with secrets management
Interactive Visualizations: Plotly-powered charts with hover details and zoom capabilities

Supported Visualizations

Line charts for time series analysis
Bar charts for categorical comparisons
Scatter plots for correlation analysis
Histograms for distribution analysis
Pie charts for summary data
Automatic chart type selection based on data characteristics
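
The automatic selection described in the last item could work along the following lines. This is a minimal sketch under stated assumptions, not the application's actual logic: the function name `choose_chart_type`, the column-kind labels ("datetime", "numeric", "categorical"), and the six-category cutoff for pie charts are all hypothetical.

```python
from typing import Optional


def choose_chart_type(x_kind: str, y_kind: Optional[str] = None,
                      n_categories: int = 0) -> str:
    """Pick a chart type from coarse column kinds (hypothetical heuristic)."""
    if y_kind is None:
        # Only one column is involved.
        if x_kind == "numeric":
            return "histogram"  # distribution of a single numeric column
        # A handful of categories reads well as a pie; otherwise use bars.
        return "pie" if 0 < n_categories <= 6 else "bar"
    if x_kind == "datetime" and y_kind == "numeric":
        return "line"  # time series
    if x_kind == "numeric" and y_kind == "numeric":
        return "scatter"  # correlation between two numeric columns
    return "bar"  # categorical comparison is the default


print(choose_chart_type("datetime", "numeric"))
print(choose_chart_type("categorical", n_categories=4))
```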

Technical Architecture

Technology Stack

Frontend: Streamlit 1.40.0 for interactive web interface
Data Processing: Pandas 2.2.3 with NumPy 2.1.3 for data manipulation
AI Integration: Google Generative AI 0.8.5 with Gemini 1.5-flash/pro models
Visualizations: Plotly 5.24.1 for interactive charts
Cloud Services: Google APIs for Sheets integration and web search
File Processing: OpenPyXL for Excel support, native JSON parsing

Performance Features

Response Caching: 30-minute TTL for AI responses to reduce API calls
Dataset Caching: 1-hour TTL for dataset analysis results
Lazy Loading: On-demand chart generation to optimize performance
Memory Management: Safe processing decorators to handle large datasets

Security Implementation

File Validation: 50MB size limit with extension verification
Query Sanitization: SQL injection prevention and content filtering
Input Validation: Comprehensive user input checking
Error Isolation: Safe execution boundaries to prevent crashes
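
A safe execution boundary of the kind listed above is often written as a decorator that catches exceptions and returns a fallback value. The sketch below assumes that pattern; `safe_execute` and `column_mean` are hypothetical names and are not taken from the project's code.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def safe_execute(fallback=None):
    """Decorator factory: run the function, return `fallback` if it raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Log the failure and degrade gracefully instead of crashing.
                logger.warning("%s failed: %s", func.__name__, exc)
                return fallback
        return wrapper
    return decorator


@safe_execute(fallback=0.0)
def column_mean(values):
    """Mean of a list; raises ZeroDivisionError on an empty list."""
    return sum(values) / len(values)
```

With this wrapper, `column_mean([])` returns the fallback `0.0` rather than raising, so one failed computation does not take down the rest of the page.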

Installation and Setup

Prerequisites

Python 3.8 or higher (note that the NumPy 2.1.3 release listed under Technology Stack requires Python 3.10 or later)
pip package manager
Google AI Studio API key

Local Development Setup

Clone the Repository

git clone https://github.com/ark5234/AI-Agent-Project.git
cd AI-Agent-Project

Install Dependencies
```
pip install -r requirements.txt
```

Environment Configuration

Create a .env file in the project root:

GEMINI_API_KEY=your_gemini_api_key_here
GOOGLE_API_KEY=your_google_api_key_here
SEARCH_ENGINE_ID=your_search_engine_id_here

Obtain API Keys
- Gemini API: Visit Google AI Studio to get your API key
- Google API: Create credentials in Google Cloud Console
- Search Engine: Set up a Custom Search Engine in Google

Run the Application
```
streamlit run main.py
```

Streamlit Cloud Deployment

Push to GitHub
- Ensure your code is in a GitHub repository

Deploy to Streamlit Cloud
- Visit Streamlit Cloud
- Connect your GitHub repository
- Select main.py as the main file

Configure Secrets

In your Streamlit Cloud app settings, add secrets in TOML format:

GEMINI_API_KEY = "your_api_key_here"
GOOGLE_API_KEY = "your_google_api_key_here"
SEARCH_ENGINE_ID = "your_search_engine_id_here"

Usage Guide

Basic Data Analysis

Upload Your Data
- Select the "Upload CSV File" option
- Choose from supported formats: CSV, Excel, JSON
- Review the automatic data preview

Ask Natural Language Questions

Example queries:
- "Show me records where sales > 1000"
- "What is the average price by category?"
- "Count customers by region"
- "Find products with low inventory"

Review Results
- View filtered data tables
- Examine automatically generated visualizations
- Download results in CSV format

Google Sheets Integration

Select Google Sheets Option

Provide Sheet URL

https://docs.google.com/spreadsheets/d/your-sheet-id/edit

Specify Sheet Name (e.g., "Sheet1")
Analyze Live Data with the same query interface
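
For reference, the sheet ID can be pulled out of a URL of the form shown above with a regular expression. The sketch below also builds a CSV export URL through Google's `gviz/tq` endpoint. That part is an assumption: it only works for sheets shared so that anyone with the link can view them, and the application may use the Google Sheets API with credentials instead. `extract_sheet_id` and `csv_export_url` are hypothetical helper names.

```python
import re
from urllib.parse import quote

# The sheet ID is the path segment that follows /spreadsheets/d/
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    """Return the sheet ID from a Google Sheets URL, or raise ValueError."""
    match = SHEET_ID_PATTERN.search(url)
    if match is None:
        raise ValueError("Not a Google Sheets URL")
    return match.group(1)


def csv_export_url(sheet_url: str, sheet_name: str) -> str:
    """Build a CSV export URL for one tab (assumes a link-shared sheet)."""
    sheet_id = extract_sheet_id(sheet_url)
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name)}"
    )


print(csv_export_url(
    "https://docs.google.com/spreadsheets/d/your-sheet-id/edit", "Sheet1"))
```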

Advanced Query Examples

Trend Analysis: "Show sales trend over time"
Comparison: "Compare revenue by product category"
Distribution: "Show age distribution of customers"
Correlation: "Relationship between price and sales"
Filtering: "Products launched in 2023 with rating > 4"
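
The categories above suggest a mapping from query wording to chart type. A simple keyword lookup can approximate that mapping. This is an illustrative sketch only: the application relies on Gemini for interpretation, and the keyword table and the function name `guess_chart_from_query` are assumptions.

```python
# Hypothetical keyword table: chart type -> phrases that hint at it.
INTENT_KEYWORDS = {
    "line": ("trend", "over time"),
    "bar": ("compare",),
    "histogram": ("distribution",),
    "scatter": ("relationship", "correlation"),
}


def guess_chart_from_query(query: str) -> str:
    """Return a chart type hinted at by the query, or 'table' if none match."""
    text = query.lower()
    for chart, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return chart
    return "table"  # plain filtering queries get a data table


for q in ("Show sales trend over time",
          "Products launched in 2023 with rating > 4"):
    print(q, "->", guess_chart_from_query(q))
```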

Project Structure

AI-Agent-Project/
├── main.py                     # Primary application logic
├── gemini_api.py              # AI integration module
├── google_api.py              # Google services integration  
├── final_test.py              # Integration testing
├── requirements.txt           # Python dependencies
├── runtime.txt                # Python version for deployment
├── .streamlit/
│   └── config.toml            # Streamlit configuration
├── .env                       # Local environment variables
├── .env.example               # Environment template
├── .gitignore                 # Git ignore patterns
├── LICENSE                    # MIT license
├── README.md                  # This documentation
├── GOOGLE_SHEETS_SETUP.md     # Google Sheets integration guide
└── STREAMLIT_DEPLOYMENT.md    # Deployment instructions

API Reference

Core Functions

`validate_csv_file(file)`

Validates uploaded files for security and format compliance.

Parameters: file object from Streamlit file uploader
Returns: tuple (is_valid: bool, message: str)
Security: Size limits, extension validation, content checking
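
The size and extension checks could look roughly like this. It is a sketch, not the project's `validate_csv_file`: the helper `validate_upload` is hypothetical, it takes the file name and size directly instead of a file object, and it assumes that 50MB means 50 * 1024 * 1024 bytes. Content checking is left out.

```python
import os
from typing import Tuple

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumed meaning of "50MB"
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".json"}


def validate_upload(name: str, size: int) -> Tuple[bool, str]:
    """Return (is_valid, message) for an uploaded file's name and byte size."""
    extension = os.path.splitext(name)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported file type: {extension or 'none'}"
    if size == 0:
        return False, "File is empty"
    if size > MAX_FILE_SIZE_BYTES:
        return False, "File exceeds the 50MB limit"
    return True, "File is valid"


print(validate_upload("sales.csv", 2048))
print(validate_upload("report.pdf", 2048))
```

Streamlit's uploaded file objects expose `name` and `size` attributes, so a call inside the app could be `validate_upload(file.name, file.size)`.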

`process_query(data, query, main_column)`

Processes natural language queries against dataset.

Parameters:
- data: pandas DataFrame
- query: string query in natural language
- main_column: primary column for analysis focus
Returns: processed DataFrame or analysis results

`generate_smart_visualizations(data, query, result_data)`

Creates appropriate visualizations based on query intent.

Parameters:
- data: original dataset
- query: user query for context
- result_data: filtered/processed results
Returns: Plotly figure object

Configuration Options

Caching Settings

@st.cache_data(ttl=1800)  # 30 minutes for AI responses
@st.cache_data(ttl=3600)  # 1 hour for dataset analysis
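
To make the `ttl` values concrete, here is a standard-library sketch of time-to-live caching. It is not Streamlit's implementation. `ttl_cache` and its injectable `clock` parameter are hypothetical and exist only to show when a cached value is reused and when it is recomputed.

```python
import functools
import time


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache results per positional arguments for `ttl_seconds` seconds."""
    def decorator(func):
        store = {}  # args -> (time stored, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: reuse the cached value
            value = func(*args)  # missing or expired: recompute and store
            store[args] = (now, value)
            return value
        return wrapper
    return decorator


@ttl_cache(1800)  # 30 minutes, matching the AI response setting
def ask_model(prompt):
    # Stand-in for an expensive API call.
    return prompt.upper()
```

Repeating the same prompt within 30 minutes returns the stored answer without another call. After the TTL expires, the next call recomputes and refreshes the entry.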

Security Parameters

Maximum file size: 50MB
Supported formats: CSV, XLSX, XLS, JSON
Query length limit: 1000 characters
SQL injection prevention: Active
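
A query check that enforces these parameters might be sketched as follows. The pattern list is an assumption chosen for illustration and is neither the project's actual filter nor a complete defence. `check_query` is a hypothetical name.

```python
import re
from typing import Tuple

MAX_QUERY_LENGTH = 1000

# A few SQL-like fragments to reject (illustrative, not exhaustive).
SUSPICIOUS = re.compile(
    r"\b(drop\s+table|delete\s+from|insert\s+into|union\s+select)\b|--|;",
    re.IGNORECASE,
)


def check_query(query: str) -> Tuple[bool, str]:
    """Return (is_allowed, message) for a natural language query."""
    text = query.strip()
    if not text:
        return False, "Query is empty"
    if len(text) > MAX_QUERY_LENGTH:
        return False, "Query exceeds 1000 characters"
    if SUSPICIOUS.search(text):
        return False, "Query contains disallowed SQL-like content"
    return True, "Query accepted"


print(check_query("What is the average price by category?"))
print(check_query("sales; DROP TABLE users"))
```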

Contributing

Development Guidelines

Code Style: Follow PEP 8 Python style guidelines
Documentation: Include docstrings for all functions
Testing: Test all features before submitting pull requests
Security: Maintain input validation and error handling

Pull Request Process

Fork the repository
Create a feature branch: git checkout -b feature/your-feature-name
Commit your changes: git commit -m 'Add comprehensive feature'
Push to the branch: git push origin feature/your-feature-name
Submit a pull request with a detailed description

Troubleshooting

Common Issues

Import Errors

Ensure all dependencies are installed: pip install -r requirements.txt
Verify Python version compatibility (3.8+)

API Key Issues

Confirm API keys are correctly set in environment variables or Streamlit secrets
Verify API key permissions and quotas in respective consoles
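
A quick way to confirm the first point locally is to list which of the expected variables are unset. The sketch below treats all three names from the .env example as required, which is an assumption. `missing_keys` is a hypothetical helper.

```python
import os
from typing import List, Mapping

# Names taken from the .env example; treating all as required is an assumption.
REQUIRED_KEYS = ("GEMINI_API_KEY", "GOOGLE_API_KEY", "SEARCH_ENGINE_ID")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys()
print("All keys set" if not missing else "Missing: " + ", ".join(missing))
```

Note that variables in a .env file only appear in `os.environ` once something has loaded that file into the process environment.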

File Upload Problems

Check file size (must be under 50MB)
Ensure the file is in a supported format (CSV, Excel, JSON)
Verify file encoding (UTF-8 recommended)

Performance Issues

Large datasets may require increased memory allocation
Consider data sampling for very large files
Monitor API usage to avoid rate limiting
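
For the sampling suggestion, one option is reservoir sampling, which draws a uniform random sample in a single pass without loading the whole file into memory. This is a sketch of that technique (Algorithm R), not something the application is known to do. `sample_rows` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Optional


def sample_rows(rows: Iterable[list], k: int,
                seed: Optional[int] = None) -> List[list]:
    """Return up to k rows chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir: List[list] = []
    for index, row in enumerate(rows):
        if index < k:
            reservoir.append(row)  # fill the reservoir first
        else:
            # Keep the new row with probability k / (index + 1).
            slot = rng.randint(0, index)  # upper bound is inclusive
            if slot < k:
                reservoir[slot] = row
    return reservoir


rows = [[i, i * 2] for i in range(10_000)]
print(len(sample_rows(rows, 100, seed=42)))
```

The `rows` argument can be any iterable, for example a `csv.reader` over an open file. If the data is already in a pandas DataFrame, `DataFrame.sample` serves the same purpose.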

Support

For technical support and bug reports:

Create an issue in the GitHub repository
Provide detailed error messages and reproduction steps
Include system information and Python version

License

This project is licensed under the MIT License. See the LICENSE file for complete details.

Acknowledgments

Google AI Studio for Gemini API access
Streamlit team for the excellent web framework
Plotly for interactive visualization capabilities
Open source community for various Python libraries

Author

Vikrant Kawadkar (@ark5234)

Email: vikrantkawadkar2099@gmail.com
GitHub: https://github.com/ark5234

Version: 2.0.0
Last Updated: September 2025
Compatibility: Python 3.8+, Streamlit 1.40.0+
