🔮 MultiRegal - Automatic Multivariable Regression Analysis Calculator

An intelligent tool that automatically identifies the most important factors affecting your outcomes through AI-powered regression analysis.

✨ Features

🤖 AI-Powered Factor Identification: Uses multiple statistical methods and LLM intelligence to identify the most important variables
📊 Interactive Web Interface: Beautiful, user-friendly Streamlit interface for easy data analysis
🧹 Advanced Data Cleaning: Automatically handles dirty data including missing values, duplicates, outliers, mixed data types, and inconsistent formatting
🔍 Comprehensive Analysis: Performs correlation analysis, feature importance ranking, and regression modeling
📐 Formula Generation: Creates both precise and simplified mathematical formulas
📈 Rich Visualizations: Interactive charts, correlation heatmaps, and model comparisons
💡 Smart Insights: LLM-generated recommendations and actionable insights
📋 Detailed Reports: Complete cleaning and analysis reports with step-by-step documentation
💾 Export Capabilities: Download results in JSON and CSV formats

🚀 Quick Start

Installation

Install dependencies:
```
pip install -r requirements.txt
```
Install the package in development mode:
```
pip install -e .
```

Launch Web Interface

Option 1: Use the launch script (Recommended)

python run_app.py

Option 2: Direct Streamlit command

streamlit run app.py

The web interface will open at http://localhost:8501

📱 Web Interface Usage

1. Upload Data

Upload a CSV file with your dataset
Or use the demo data for quick testing
Data preview and validation happens automatically

2. Configure Analysis

Select your target variable (what you want to predict)
Choose input features (or use all available)
Set analysis parameters (max features, test size)

3. Run Analysis

Click "🚀 Run Analysis" to start the process
Watch real-time progress as the analysis runs
Results appear in organized tabs

4. Explore Results

🎯 Key Findings: Top factors and importance rankings
📊 Feature Analysis: Correlation heatmaps and detailed analysis
🤖 Model Comparison: Performance metrics across different models
📐 Formulas: Generated mathematical formulas
💡 Insights: AI-powered recommendations and insights
🧹 Data Cleaning Report: Detailed report of all cleaning steps performed

5. Export Results

Download comprehensive reports in JSON format
Export factor rankings as CSV
Save results for further analysis

🧹 Data Cleaning Capabilities

MultiRegal automatically handles messy, real-world data! It can clean:

🔄 Duplicate Records: Automatically removes identical rows
💰 Currency & Number Formatting: Handles "$1,234", "1,234.56", "€123", percentages
❌ Missing Values: Smart imputation using mean/median based on data distribution
🔤 Mixed Data Types: Converts "TRUE/FALSE", "Yes/No" to numeric, handles text in number columns
📊 Statistical Outliers: Removes extreme values using IQR or Z-score methods
🏷️ Column Names: Standardizes messy column names (spaces, special characters)
🔢 Data Type Issues: Converts text representations to proper numeric formats
📏 Inconsistent Formats: Handles spaces in numbers ("12 34" → 1234)

Try the "Dirty Demo Data" option to see cleaning in action!

📋 Data Requirements

Format: CSV file
Minimum Size: At least 10 rows (after cleaning)
Columns: At least 2 numeric columns (after conversion)
Missing Data: Up to 50% missing values (will be cleaned automatically)
Variable Types: Any data types (will be converted to numeric where possible)

🛠️ Programmatic Usage

You can also use the analysis tools programmatically:

from regression_analyzer.analysis_tools import (
    load_and_preprocess_data,
    identify_top_factors,
    perform_regression_analysis,
    generate_formula_and_insights
)
import pandas as pd

# Load your data
df = pd.read_csv("your_data.csv")

# Run analysis
top_factors = identify_top_factors(df, target_column="your_target")
model_results = perform_regression_analysis(df, "your_target", list(top_factors.keys()))
formulas = generate_formula_and_insights(df, "your_target", model_results)

print("Top factors:", top_factors)
print("Best model R²:", max([r['r2_score'] for r in model_results.values()]))

🎯 Use Cases

Business Analytics: Identify factors affecting sales, revenue, or performance
Scientific Research: Analyze experimental data and variable relationships
Financial Analysis: Understand factors influencing financial metrics
Healthcare: Analyze factors affecting patient outcomes
Marketing: Determine which factors drive customer behavior
Operations: Optimize processes by understanding key variables

🧠 How It Works

Data Preprocessing: Handles missing values, validates data structure
Factor Identification: Uses multiple methods:
- Pearson correlation analysis
- F-test statistical significance
- Random Forest feature importance
- Recursive feature elimination
Model Training: Tests multiple regression models:
- Linear Regression
- Ridge Regression (L2 regularization)
- Lasso Regression (L1 regularization)
Formula Generation: Creates mathematical representations
Insight Generation: LLM-powered analysis and recommendations

📊 Example Output

Top Factors Identified:

1. square_footage    0.8234
2. location_score    0.7891
3. age_of_building  -0.6543
4. num_bedrooms      0.5321
5. amenities_score   0.4567

Generated Formula:

price = 245.67 + 156.23 * square_footage + 89.45 * location_score - 12.34 * age_of_building

Key Insights:

Square footage has the strongest positive impact on price
Location score is the second most important factor
Building age negatively affects price
Model achieves 89.4% accuracy (R² = 0.894)

🤖 Google ADK Integration

This tool integrates with Google's Agent Development Kit for enhanced LLM capabilities:

# To use with ADK web interface
adk serve

# To use with ADK CLI
adk run regression_analyzer

🔧 Development

Project Structure

multiregal/
├── regression_analyzer/          # Core analysis package
│   ├── __init__.py
│   ├── agent.py                 # LLM agent integration
│   └── analysis_tools.py        # Statistical analysis functions
├── utils/                       # Web interface utilities
│   ├── __init__.py
│   └── interface_helpers.py     # Streamlit helper functions
├── app.py                       # Main Streamlit application
├── run_app.py                   # Launch script
├── demo.py                      # Command-line demo
├── test_installation.py         # Installation tester
├── sample_data.csv              # Demo dataset
└── requirements.txt             # Dependencies

Running Tests

python test_installation.py

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📄 License

This project is open source and available under the MIT License.

🆘 Support

If you encounter any issues:

Check the data requirements section
Ensure all dependencies are installed correctly
Try the demo data first to verify the installation
Check the console/terminal for error messages

For additional help, please open an issue in the repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🔮 MultiRegal - Automatic Multivariable Regression Analysis Calculator

✨ Features

🚀 Quick Start

Installation

Launch Web Interface

📱 Web Interface Usage

1. Upload Data

2. Configure Analysis

3. Run Analysis

4. Explore Results

5. Export Results

🧹 Data Cleaning Capabilities

📋 Data Requirements

🛠️ Programmatic Usage

🎯 Use Cases

🧠 How It Works

📊 Example Output

🤖 Google ADK Integration

🔧 Development

Project Structure

Running Tests

Contributing

📄 License

🆘 Support

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
regression_analyzer.egg-info		regression_analyzer.egg-info
regression_analyzer		regression_analyzer
utils		utils
README.md		README.md
app.py		app.py
demo.py		demo.py
dirty_sample_data.csv		dirty_sample_data.csv
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run_app.py		run_app.py
sample_data.csv		sample_data.csv
test_installation.py		test_installation.py

ThyFriendlyFox/multiregal

Folders and files

Latest commit

History

Repository files navigation

🔮 MultiRegal - Automatic Multivariable Regression Analysis Calculator

✨ Features

🚀 Quick Start

Installation

Launch Web Interface

📱 Web Interface Usage

1. Upload Data

2. Configure Analysis

3. Run Analysis

4. Explore Results

5. Export Results

🧹 Data Cleaning Capabilities

📋 Data Requirements

🛠️ Programmatic Usage

🎯 Use Cases

🧠 How It Works

📊 Example Output

🤖 Google ADK Integration

🔧 Development

Project Structure

Running Tests

Contributing

📄 License

🆘 Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages