Try it now: https://ai-data-analytics-agent.streamlit.app/
Upload your data and start asking questions in natural language instantly!
A comprehensive artificial intelligence platform for automated data analysis, query processing, and visualization. Built with Streamlit and powered by Google's Gemini AI, this application provides intelligent insights from CSV, Excel, and JSON datasets with natural language queries.
- Multi-format Data Support: Process CSV, Excel (xlsx/xls), and JSON files seamlessly
- Natural Language Queries: Ask questions about your data in plain English
- AI-Powered Analysis: Leverages Google Gemini AI for intelligent data interpretation
- Smart Visualizations: Automatically generates appropriate charts based on query context
- Performance Optimization: Implements intelligent caching for faster response times
- Google Sheets Integration: Direct connection to Google Sheets for live data analysis
- Security Framework: Comprehensive input validation and file size limitations
- Error Boundaries: Robust error handling with graceful degradation
- Multi-source Fallback: Web search integration when local analysis is insufficient
- Cloud Deployment Ready: Pre-configured for Streamlit Cloud with secrets management
- Interactive Visualizations: Plotly-powered charts with hover details and zoom capabilities
- Line charts for time series analysis
- Bar charts for categorical comparisons
- Scatter plots for correlation analysis
- Histograms for distribution analysis
- Pie charts for summary data
- Automatic chart type selection based on data characteristics
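Automatic chart selection can be driven by column data types. The heuristic below is one possible approach, not the project's code. The function name `suggest_chart_type` and its thresholds are assumptions; for example, a pie chart is only chosen for six or fewer categories.

```python
from typing import Optional

import pandas as pd


def suggest_chart_type(df: pd.DataFrame, x: str, y: Optional[str] = None) -> str:
    """Pick a chart type from column dtypes (hypothetical heuristic, not the app's logic)."""
    x_col = df[x]
    if y is None:
        # Single column: numeric values get a histogram, few categories a pie chart
        if pd.api.types.is_numeric_dtype(x_col):
            return "histogram"
        return "pie" if x_col.nunique() <= 6 else "bar"
    y_col = df[y]
    if pd.api.types.is_datetime64_any_dtype(x_col):
        return "line"  # time series
    if pd.api.types.is_numeric_dtype(x_col) and pd.api.types.is_numeric_dtype(y_col):
        return "scatter"  # correlation between two measures
    return "bar"  # categorical comparison
```

The returned label could then be mapped to the matching Plotly Express function, such as `px.line` or `px.bar`.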
- Frontend: Streamlit 1.40.0 for interactive web interface
- Data Processing: Pandas 2.2.3 with NumPy 2.1.3 for data manipulation
- AI Integration: Google Generative AI 0.8.5 with Gemini 1.5-flash/pro models
- Visualizations: Plotly 5.24.1 for interactive charts
- Cloud Services: Google APIs for Sheets integration and web search
- File Processing: OpenPyXL for Excel support, native JSON parsing
- Response Caching: 30-minute TTL for AI responses to reduce API calls
- Dataset Caching: 1-hour TTL for dataset analysis results
- Lazy Loading: On-demand chart generation to optimize performance
- Memory Management: Safe processing decorators to handle large datasets
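In the app, the cache TTLs above are handled by Streamlit's `st.cache_data(ttl=...)`. The stand-alone stand-in below makes the TTL behaviour concrete without needing Streamlit. The `ttl_cache` decorator is hypothetical and is not how Streamlit implements caching. It returns the stored result until that result is older than `ttl` seconds.

```python
import functools
import time
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl: float, clock: Callable[[], float] = time.monotonic):
    """Hypothetical TTL cache keyed on positional arguments only."""
    def decorator(func):
        store: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # fresh entry: skip the expensive call
            result = func(*args)
            store[args] = (now, result)  # remember when this result was computed
            return result
        return wrapper
    return decorator


@ttl_cache(ttl=1800)  # 30 minutes, matching the AI response TTL
def ask_model(prompt: str) -> str:
    # Placeholder for a real Gemini call (hypothetical)
    return f"answer to {prompt!r}"
```

The `clock` parameter exists so that expiry can be tested without waiting 30 minutes.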
- File Validation: 50MB size limit with extension verification
- Query Sanitization: SQL injection prevention and content filtering
- Input Validation: Comprehensive user input checking
- Error Isolation: Safe execution boundaries to prevent crashes
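A "safe execution boundary" is commonly written as a decorator that catches exceptions and returns a fallback value. The sketch below shows that pattern under that assumption; `safe_execute` and `divide` are hypothetical names, not taken from the project.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_execute(fallback: Any = None):
    """Hypothetical error boundary: log any exception and return `fallback` instead."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:  # deliberately broad: this is the boundary
                logger.exception("Error in %s", func.__name__)
                return fallback
        return wrapper
    return decorator


@safe_execute(fallback="Sorry, that query could not be processed.")
def divide(a: float, b: float) -> float:
    return a / b
```

Inside a Streamlit page, the `except` branch could also call `st.error(...)` so that the user sees a message instead of a crashed app.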
- Python 3.10 or higher (the pinned NumPy 2.1.3 does not support older Python versions)
- pip package manager
- Google AI Studio API key
- Clone the Repository:
  git clone https://github.com/ark5234/AI-Agent-Project.git
  cd AI-Agent-Project
- Install Dependencies:
  pip install -r requirements.txt
- Environment Configuration: Create a .env file in the project root:
  GEMINI_API_KEY=your_gemini_api_key_here
  GOOGLE_API_KEY=your_google_api_key_here
  SEARCH_ENGINE_ID=your_search_engine_id_here
- Obtain API Keys:
  - Gemini API: Visit Google AI Studio to get your API key
  - Google API: Create credentials in Google Cloud Console
  - Search Engine: Set up a Custom Search Engine in Google
- Run the Application:
  streamlit run main.py
- Push to GitHub: Ensure your code is in a GitHub repository
- Deploy to Streamlit Cloud:
  - Visit Streamlit Cloud
  - Connect your GitHub repository
  - Select main.py as the main file
- Configure Secrets: In your Streamlit Cloud app settings, add secrets in TOML format:
  GEMINI_API_KEY = "your_api_key_here"
  GOOGLE_API_KEY = "your_google_api_key_here"
  SEARCH_ENGINE_ID = "your_search_engine_id_here"
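The same three keys can come from a local .env file or from Streamlit Cloud secrets, so the app needs one lookup that works in both places. The sketch below assumes secrets take precedence over environment variables; that precedence and the helper names `get_setting` and `missing_settings` are assumptions. On Streamlit Cloud you would pass `st.secrets` as the mapping. Locally, calling `load_dotenv()` from python-dotenv first copies the .env values into `os.environ`.

```python
import os
from typing import List, Mapping, Optional

REQUIRED = ("GEMINI_API_KEY", "GOOGLE_API_KEY", "SEARCH_ENGINE_ID")


def get_setting(name: str, secrets: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical lookup: Streamlit-style secrets first, then environment variables."""
    if secrets is not None and name in secrets:
        return str(secrets[name])
    return os.environ.get(name)


def missing_settings(secrets: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are not configured in either place."""
    return [key for key in REQUIRED if not get_setting(key, secrets)]
```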
- Upload Your Data:
  - Select the "Upload CSV File" option
  - Choose from supported formats: CSV, Excel, JSON
  - Review the automatic data preview
- Ask Natural Language Questions. Example queries:
  - "Show me records where sales > 1000"
  - "What is the average price by category?"
  - "Count customers by region"
  - "Find products with low inventory"
- Review Results:
  - View filtered data tables
  - Examine automatically generated visualizations
  - Download results in CSV format
- Select the Google Sheets option
- Provide the Sheet URL, e.g. https://docs.google.com/spreadsheets/d/your-sheet-id/edit
- Specify the Sheet Name (e.g., "Sheet1")
- Analyze Live Data with the same query interface
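A sheet URL plus a sheet name is enough to locate one tab. This sketch extracts the sheet ID from the URL and builds a CSV export link using Google's `gviz/tq?tqx=out:csv` endpoint. That endpoint is an assumption on my part: the project may use the Google Sheets API instead, and the endpoint only works for sheets shared publicly. `sheet_csv_url` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# The sheet ID is the path segment after /spreadsheets/d/
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_csv_url(sheet_url: str, sheet_name: str = "Sheet1") -> str:
    """Hypothetical helper: turn an edit URL into a CSV export URL for one tab."""
    match = SHEET_ID_PATTERN.search(sheet_url)
    if match is None:
        raise ValueError("Not a Google Sheets URL: " + sheet_url)
    sheet_id = match.group(1)
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq"
        f"?tqx=out:csv&sheet={quote(sheet_name)}"  # percent-encode spaces etc.
    )
```

The result can be passed straight to `pd.read_csv(...)` to load the tab as a DataFrame.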
- Trend Analysis: "Show sales trend over time"
- Comparison: "Compare revenue by product category"
- Distribution: "Show age distribution of customers"
- Correlation: "Relationship between price and sales"
- Filtering: "Products launched in 2023 with rating > 4"
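The five query types above can be told apart, at least roughly, by keywords. The app itself relies on Gemini for interpretation. The keyword classifier below is a lightweight stand-in that only illustrates how queries map to intents. The keyword lists and the name `classify_intent` are assumptions.

```python
# Hypothetical keyword lists; checked in insertion order, first match wins
INTENT_KEYWORDS = {
    "trend": ("trend", "over time", "monthly", "daily"),
    "comparison": ("compare", " vs ", "versus", " by "),
    "distribution": ("distribution", "histogram", "spread"),
    "correlation": ("relationship", "correlation", "correlate"),
}


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query, else 'filtering'."""
    text = f" {query.lower()} "  # pad so " by " also matches at the edges
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "filtering"
```

Because the first match wins, a query such as "distribution by region" is classified as a comparison. A model-based interpreter avoids this kind of ordering artefact.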
AI-Agent-Project/
├── main.py # Primary application logic
├── gemini_api.py # AI integration module
├── google_api.py # Google services integration
├── final_test.py # Integration testing
├── requirements.txt # Python dependencies
├── runtime.txt # Python version for deployment
├── .streamlit/
│ └── config.toml # Streamlit configuration
├── .env # Local environment variables
├── .env.example # Environment template
├── .gitignore # Git ignore patterns
├── LICENSE # MIT license
├── README.md # This documentation
├── GOOGLE_SHEETS_SETUP.md # Google Sheets integration guide
└── STREAMLIT_DEPLOYMENT.md # Deployment instructions
Validates uploaded files for security and format compliance.
- Parameters: file object from Streamlit file uploader
- Returns: tuple (is_valid: bool, message: str)
- Security: Size limits, extension validation, content checking
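A file validator with this contract could look like the sketch below. The name `validate_file` is hypothetical because the original function name is not given here. The sketch relies only on the `name` and `size` attributes that Streamlit's `UploadedFile` exposes, and it assumes 50MB means 50 × 1024 × 1024 bytes. Content checking is left out.

```python
from pathlib import Path
from typing import Tuple

MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB, assuming binary megabytes
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".json"}


def validate_file(uploaded_file) -> Tuple[bool, str]:
    """Hypothetical validator returning (is_valid, message)."""
    if uploaded_file is None:
        return False, "No file uploaded."
    extension = Path(uploaded_file.name).suffix.lower()  # ".CSV" -> ".csv"
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported file type: {extension or 'none'}"
    if uploaded_file.size > MAX_FILE_SIZE:
        return False, "File exceeds the 50MB limit."
    if uploaded_file.size == 0:
        return False, "File is empty."
    return True, "File is valid."
```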
Processes natural language queries against the dataset.
- Parameters:
  - data: pandas DataFrame
  - query: string query in natural language
  - main_column: primary column for analysis focus
- Returns: processed DataFrame or analysis results
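With the signature above, simple queries such as "records where sales > 1000" or "average price by category" can be answered without a model call. The sketch below is a hypothetical rule-based fallback, not the project's Gemini-backed implementation. The regex patterns and the rule of falling back to `main_column` for unknown column names are assumptions.

```python
import re
from typing import Union

import pandas as pd

COMPARISON = re.compile(r"(\w+)\s*(>=|<=|==|>|<|=)\s*(-?\d+(?:\.\d+)?)")
AVERAGE_BY = re.compile(r"average\s+(\w+)\s+by\s+(\w+)")
OPS = {">": "gt", "<": "lt", ">=": "ge", "<=": "le", "==": "eq", "=": "eq"}


def process_query(data: pd.DataFrame, query: str, main_column: str) -> Union[pd.DataFrame, pd.Series]:
    """Hypothetical fallback for simple filter / group-average queries."""
    text = query.lower()
    columns = {name.lower(): name for name in data.columns}  # case-insensitive lookup

    grouped = AVERAGE_BY.search(text)
    if grouped and all(part in columns for part in grouped.groups()):
        value, key = (columns[part] for part in grouped.groups())
        return data.groupby(key)[value].mean()

    match = COMPARISON.search(text)
    if match:
        name, op, number = match.groups()
        column = columns.get(name, main_column)  # unknown name: use the primary column
        mask = getattr(data[column], OPS[op])(float(number))  # e.g. Series.gt(1000.0)
        return data[mask]

    return data.head(10)  # nothing recognised: return a preview
```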
Creates appropriate visualizations based on query intent.
- Parameters:
  - data: original dataset
  - query: user query for context
  - result_data: filtered/processed results
- Returns: Plotly figure object
@st.cache_data(ttl=1800)  # 30 minutes for AI responses
@st.cache_data(ttl=3600)  # 1 hour for dataset analysis
- Maximum file size: 50MB
- Supported formats: CSV, XLSX, XLS, JSON
- Query length limit: 1000 characters
- SQL injection prevention: Active
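The query limits above can be enforced before a query reaches the model. This sketch normalises whitespace, applies the 1000-character limit, and rejects a few SQL-injection-style patterns. The blocklist and the name `sanitize_query` are assumptions. A regex blocklist like this is a coarse first line of defence, not complete protection.

```python
import re
from typing import Tuple

MAX_QUERY_LENGTH = 1000
# Hypothetical blocklist of SQL-injection-style fragments
SUSPICIOUS = re.compile(
    r"(;\s*(drop|delete|insert|update|alter)\b|--|/\*|\bunion\s+select\b|\bexec\b|__import__)",
    re.IGNORECASE,
)


def sanitize_query(query: str) -> Tuple[bool, str]:
    """Return (True, cleaned_query) or (False, error_message)."""
    cleaned = " ".join(query.split())  # collapse runs of spaces, tabs and newlines
    if not cleaned:
        return False, "Query is empty."
    if len(cleaned) > MAX_QUERY_LENGTH:
        return False, f"Query exceeds {MAX_QUERY_LENGTH} characters."
    if SUSPICIOUS.search(cleaned):
        return False, "Query contains disallowed patterns."
    return True, cleaned
```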
- Code Style: Follow PEP 8 Python style guidelines
- Documentation: Include docstrings for all functions
- Testing: Test all features before submitting pull requests
- Security: Maintain input validation and error handling
- Fork the repository
- Create a feature branch:
  git checkout -b feature/your-feature-name
- Commit your changes:
  git commit -m 'Add comprehensive feature'
- Push to the branch:
  git push origin feature/your-feature-name
- Submit a pull request with a detailed description
Import Errors
- Ensure all dependencies are installed:
  pip install -r requirements.txt
- Verify Python version compatibility (3.10+ for the pinned NumPy 2.1.3)
API Key Issues
- Confirm API keys are correctly set in environment variables or Streamlit secrets
- Verify API key permissions and quotas in respective consoles
File Upload Problems
- Check file size (must be under 50MB)
- Ensure supported file format (CSV, Excel, JSON)
- Verify file encoding (UTF-8 recommended)
Performance Issues
- Large datasets may require increased memory allocation
- Consider data sampling for very large files
- Monitor API usage to avoid rate limiting
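The encoding and sampling advice above can be combined when loading files. The sketch first tries UTF-8 (with or without a byte-order mark) and falls back to Latin-1, which can decode any byte sequence. It then samples large frames reproducibly. The helper names, the encoding order, and the 100,000-row default are assumptions, not project settings.

```python
import io
from typing import Tuple

import pandas as pd


def read_csv_with_fallback(raw: bytes, encodings: Tuple[str, ...] = ("utf-8-sig", "latin-1")) -> pd.DataFrame:
    """Hypothetical loader: try each encoding in order until one decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), encoding=encoding)
        except UnicodeDecodeError as error:
            last_error = error  # try the next encoding
    raise ValueError(f"Could not decode file with any of {encodings}") from last_error


def sample_for_analysis(df: pd.DataFrame, max_rows: int = 100_000, seed: int = 0) -> pd.DataFrame:
    """Return small frames unchanged; otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)
```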
For technical support and bug reports:
- Create an issue in the GitHub repository
- Provide detailed error messages and reproduction steps
- Include system information and Python version
This project is licensed under the MIT License. See the LICENSE file for complete details.
- Google AI Studio for Gemini API access
- Streamlit team for the excellent web framework
- Plotly for interactive visualization capabilities
- Open source community for various Python libraries
Vikrant Kawadkar (@ark5234)
- Email: vikrantkawadkar2099@gmail.com
- GitHub: https://github.com/ark5234
Version: 2.0.0
Last Updated: September 2025
Compatibility: Python 3.10+, Streamlit 1.40.0+