A machine learning system for classifying GNSS (Global Navigation Satellite System) signals as Line-of-Sight (LOS) or Non-Line-of-Sight (NLOS) from satellite signal characteristics. It provides both a web interface and a command-line pipeline.
- Features
- Demo
- Quick Start
- Installation
- Usage
- Data Format
- Machine Learning Pipeline
- Model Performance
- API Documentation
- Configuration
- Project Structure
- Development
- Troubleshooting
- Contributing
- Citation
- License
- Support
- Multi-Model Training: Trains and compares 4 machine learning algorithms:
- Random Forest
- Gradient Boosting
- XGBoost
- Support Vector Machine (SVM)
- Automated Feature Engineering: 20+ derived features including:
- Geometric features (elevation categories, azimuth sectors)
- Signal quality features (SNR trends, moving averages)
- Temporal features (hour of day, day of week, weekends)
- Interaction features (elevation-SNR, azimuth-SNR)
- Comprehensive Data Preprocessing:
- Automatic outlier detection (IQR/Z-score methods)
- Missing value imputation
- Feature scaling and normalization
- PCA dimensionality reduction
- Advanced Model Evaluation:
- Confusion matrices
- ROC curves with AUC scores
- Precision-Recall curves
- Feature importance plots
- SHAP values for model interpretability
- Dual Interface:
- Modern web UI for non-technical users
- CLI pipeline for batch processing and automation
- Intuitive 4-step workflow
- Real-time progress tracking
- Interactive visualizations
- Downloadable results and trained models
- Support for Excel (.xlsx, .xls) file uploads
- Maximum file size: 16MB
- Upload Data: Upload your LOS and NLOS Excel files
- Process: Automated data preprocessing and feature engineering
- Download: Get processed training/testing datasets
- Train & Evaluate: Train models and view performance metrics
# Clone the repository
git clone https://github.com/yourusername/GNSS-CLASSIFICATION.git
cd GNSS-CLASSIFICATION/gnss_classification
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the web application
python app.py
# Access at http://localhost:5000
- Python: 3.8 or higher
- pip: Latest version recommended
- Virtual environment: Strongly recommended
git clone https://github.com/yourusername/GNSS-CLASSIFICATION.git
cd GNSS-CLASSIFICATION/gnss_classification
Linux/Mac:
python3 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
python -c "import flask, pandas, sklearn, xgboost, shap; print('All dependencies installed successfully!')"
python app.py
The server will start at http://localhost:5000 (default) or http://0.0.0.0:5000 for network access.
Step 1: Upload Data Files
- Navigate to http://localhost:5000
- Select your LOS data Excel file
- Select your NLOS data Excel file
- Click "Upload Files"
Step 2: Process Data
- Click "Process Data" after successful upload
- System combines datasets and performs 80-20 train-test split
- Features are automatically engineered and scaled
Step 3: Download Processed Data (Optional)
- Download X_train.csv, y_train.csv for the training set
- Download X_test.csv, y_test.csv for the testing set
Step 4: Train Models
- Click "Train Models" to start training pipeline
- View real-time performance metrics
- Download comprehensive results including:
- Model comparison CSV
- Confusion matrices (PNG)
- ROC curves (PNG)
- Feature importance plots (PNG)
python run_pipeline.py
This runs the complete pipeline:
- Data preprocessing
- Feature engineering
- Model training
- Model evaluation
# Modify run_pipeline.py to use custom paths
preprocessor = DataPreprocessor('path/to/your/data.csv')
The CLI generates:
- data/processed/: Cleaned and split datasets
- data/processed/engineered/: Feature-engineered datasets
- models/: Trained model files (.joblib)
- results/: Evaluation metrics and visualizations
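The trained models saved as .joblib files can be reloaded for later inference. The sketch below is a minimal example, assuming only that the models directory contains .joblib files; the helper name `load_models` is hypothetical and the actual file names written by the pipeline are not documented here.

```python
from pathlib import Path

import joblib


def load_models(models_dir="models"):
    """Load every .joblib file in models_dir, keyed by file name without extension.

    Hypothetical helper: the pipeline's real file names are not assumed.
    """
    return {
        path.stem: joblib.load(path)
        for path in sorted(Path(models_dir).glob("*.joblib"))
    }
```

joblib files are pickle-based, so only load model files that come from a source you trust.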
Your Excel files must contain the following columns:
| Column | Type | Range | Description |
|---|---|---|---|
| Year | Integer | - | Year of observation |
| Month | Integer | 1-12 | Month of observation |
| Date | Integer | 1-31 | Day of month |
| Hour | Integer | 0-23 | Hour of day |
| Min | Integer | 0-59 | Minute |
| Sec | Float | 0-59.999 | Second |
| PRN | Integer | 1-32 | Satellite PRN number |
| Elevation | Float | 0-90 | Elevation angle (degrees) |
| Azimuth | Float | 0-360 | Azimuth angle (degrees) |
| SNR | Float | >0 | Signal-to-Noise Ratio (dB-Hz) |
| Label | Integer | 0 or 1 | 1 = LOS, 0 = NLOS |
See data/examples/sample_data.xlsx for a properly formatted example.
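Before uploading, it can help to check a file against the column list and ranges above. The following is a minimal sketch using pandas; the function name `validate_gnss_frame` and the wording of the messages are assumptions for illustration, not part of the project's code.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Year", "Month", "Date", "Hour", "Min", "Sec",
                    "PRN", "Elevation", "Azimuth", "SNR", "Label"]

# Inclusive (low, high) bounds taken from the column table above.
RANGES = {
    "Month": (1, 12), "Date": (1, 31), "Hour": (0, 23), "Min": (0, 59),
    "PRN": (1, 32), "Elevation": (0, 90), "Azimuth": (0, 360),
}


def validate_gnss_frame(df):
    """Return a list of problems; an empty list means the frame passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for col, (low, high) in RANGES.items():
        bad = ~df[col].between(low, high)  # NaN also counts as out of range
        if bad.any():
            problems.append(f"{col}: {int(bad.sum())} value(s) outside [{low}, {high}]")
    if not ((df["Sec"] >= 0) & (df["Sec"] < 60)).all():
        problems.append("Sec: values outside [0, 60)")
    if not (df["SNR"] > 0).all():
        problems.append("SNR: non-positive values")
    if not df["Label"].isin([0, 1]).all():
        problems.append("Label: values other than 0 or 1")
    return problems
```

A typical use would be `validate_gnss_frame(pd.read_excel("los.xlsx"))`, where the file name is a placeholder.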
- Excel: .xlsx, .xls
- CSV: .csv (for CLI pipeline)
- Loading: Reads Excel/CSV files
- Validation: Ensures all required columns exist
- Range Checks: Validates elevation (0-90°), azimuth (0-360°)
- Missing Values: Median imputation for numerical, mode for categorical
- Outlier Removal: IQR-based outlier detection
- Temporal Features: Extracts hour_of_day, day_of_week, month, is_weekend
- Scaling: StandardScaler normalization
- Splitting: 80% training, 20% testing (stratified)
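The core of these steps can be sketched with pandas and scikit-learn. This is an illustration under assumptions, not the project's `DataPreprocessor`: the function names are hypothetical, outliers are filtered on SNR only, and the scaler is fitted after the split (on the training set only), which may differ from the order used in the actual pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def remove_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


def preprocess(df, feature_cols, label_col="Label", seed=42):
    """Impute, drop SNR outliers, split 80/20 (stratified), then scale."""
    df = df.copy()
    # Median imputation for numerical features.
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())
    df = remove_iqr_outliers(df, "SNR")
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[label_col],
        test_size=0.2, stratify=df[label_col], random_state=seed,
    )
    # Fit the scaler on the training split only so that test statistics
    # do not leak into training.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```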
Geometric Features:
- Elevation categories (low: <30°, medium: 30-60°, high: >60°)
- Azimuth sectors (N, E, S, W)
- Elevation-azimuth interaction
Signal Features:
- SNR quartile categories
- SNR rate of change
- 5-point moving average SNR
Satellite Features:
- Satellite count per observation window
- Satellite diversity (standard deviation)
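A few of the geometric and signal features listed above can be sketched as follows. The output column names, the sector boundaries (four 90° sectors centred on N, E, S, W), and the per-PRN grouping are assumptions made for illustration; the project's feature_engineering.py may define them differently. Rows are assumed to be sorted by time.

```python
import numpy as np
import pandas as pd


def add_basic_features(df):
    """Add elevation category, azimuth sector, SNR moving average and rate."""
    out = df.copy()

    # Elevation categories: low < 30, medium 30-60, high > 60 (degrees).
    elev = out["Elevation"]
    out["elev_cat"] = np.select([elev < 30, elev <= 60], ["low", "medium"], default="high")

    # Azimuth sectors: N covers [315, 45), E [45, 135), S [135, 225), W [225, 315).
    sector_idx = (((out["Azimuth"] + 45) % 360) // 90).astype(int).to_numpy()
    out["az_sector"] = np.array(["N", "E", "S", "W"])[sector_idx]

    # Per-satellite SNR signals: 5-point moving average and rate of change.
    snr_by_prn = out.groupby("PRN")["SNR"]
    out["snr_ma5"] = snr_by_prn.transform(lambda s: s.rolling(5, min_periods=1).mean())
    out["snr_rate"] = snr_by_prn.diff().fillna(0.0)

    # Simple interaction feature.
    out["elev_x_snr"] = out["Elevation"] * out["SNR"]
    return out
```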
Dimensionality Reduction:
- PCA with 95% variance retention
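In scikit-learn, retaining 95% of the variance corresponds to passing a fraction as `n_components`. The sketch below assumes the features are already scaled; the function name `reduce_with_pca` is hypothetical.

```python
from sklearn.decomposition import PCA


def reduce_with_pca(X_train, X_test, variance=0.95):
    """Fit PCA on the training set and project both splits.

    A float n_components keeps the smallest number of components whose
    cumulative explained variance exceeds that fraction.
    """
    pca = PCA(n_components=variance, svd_solver="full")
    X_train_reduced = pca.fit_transform(X_train)
    X_test_reduced = pca.transform(X_test)
    return X_train_reduced, X_test_reduced, pca
```

After fitting, `pca.n_components_` reports how many components were kept.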
Algorithms & Hyperparameters:
| Model | Hyperparameters Tuned |
|---|---|
| Random Forest | n_estimators: [100, 200, 300]; max_depth: [10, 20, None]; min_samples_split: [2, 5, 10] |
| Gradient Boosting | n_estimators: [100, 200, 300]; learning_rate: [0.01, 0.1, 0.2]; max_depth: [3, 5, 7] |
| XGBoost | n_estimators: [100, 200, 300]; learning_rate: [0.01, 0.1, 0.2]; max_depth: [3, 5, 7] |
| SVM | C: [0.1, 1, 10]; kernel: ['rbf', 'poly']; gamma: ['scale', 'auto'] |
Optimization:
- GridSearchCV with 5-fold stratified cross-validation
- F1-score as primary metric
- Automatic best model selection
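The search described above maps onto scikit-learn's `GridSearchCV`. The following is a minimal sketch for the Random Forest grid from the table; the function name `tune_random_forest` and the fixed random seed are assumptions, and the project's model_training.py may be organised differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Grid copied from the Random Forest row of the hyperparameter table.
RF_GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [10, 20, None],
    "min_samples_split": [2, 5, 10],
}


def tune_random_forest(X, y, param_grid=None, seed=42):
    """Grid search with 5-fold stratified CV, scored by F1."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid or RF_GRID,
        scoring="f1",
        cv=cv,
    )
    search.fit(X, y)
    return search
```

The fitted object exposes `best_params_`, `best_score_`, and `best_estimator_`. Repeating the search per algorithm and keeping the highest `best_score_` is one way to implement automatic best-model selection.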
Metrics Calculated:
- Accuracy
- Precision
- Recall
- F1-Score
- ROC AUC
- Average Precision
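All six metrics are available in `sklearn.metrics`. The sketch below shows one way to compute them together; the function name `compute_metrics` and the dictionary keys are assumptions. Note that ROC AUC and Average Precision need scores or probabilities for the positive class (LOS = 1), not hard predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def compute_metrics(y_true, y_pred, y_score):
    """y_pred holds hard 0/1 labels; y_score holds positive-class scores."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "average_precision": average_precision_score(y_true, y_score),
    }
```

For a fitted classifier `model`, `y_score` would typically be `model.predict_proba(X_test)[:, 1]`.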
Visualizations Generated:
- Confusion matrices (heatmaps)
- ROC curves with AUC scores
- Precision-Recall curves
- Feature importance rankings
- SHAP summary plots for interpretability
- Model comparison bar charts
Typical performance on GNSS classification tasks:
| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| Random Forest | ~92% | ~90% | ~93% | ~91% | ~0.96 |
| Gradient Boosting | ~91% | ~89% | ~92% | ~90% | ~0.95 |
| XGBoost | ~93% | ~91% | ~94% | ~92% | ~0.97 |
| SVM | ~88% | ~86% | ~89% | ~87% | ~0.93 |
Note: Performance varies with dataset quality and characteristics.
Upload LOS and NLOS Excel files.
Request:
- los_file: Excel file (multipart/form-data)
- nlos_file: Excel file (multipart/form-data)
Response:
{
"status": "success",
"message": "Files uploaded successfully",
"los_shape": [1000, 11],
"nlos_shape": [800, 11]
}
Process uploaded data (combine, split, engineer features).
Response:
{
"status": "success",
"train_size": 1440,
"test_size": 360
}
Train all models and generate evaluation metrics.
Response:
{
"status": "success",
"models_trained": ["random_forest", "gradient_boosting", "xgboost", "svm"],
"best_model": "xgboost",
"metrics": { ... }
}
Download processed data or results.
Parameters:
- type: train | test | results
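The sizes in the sample responses above are consistent with each other: 1000 LOS rows plus 800 NLOS rows give 1800 samples, and an 80-20 split yields 1440 training and 360 testing rows. A small sketch of that arithmetic follows; the helper name `split_sizes` is hypothetical, and for totals that do not divide evenly the library's rounding may differ by one row.

```python
def split_sizes(n_los, n_nlos, test_fraction=0.2):
    """Return (train_size, test_size) for the combined dataset."""
    total = n_los + n_nlos
    test_size = round(total * test_fraction)
    return total - test_size, test_size


print(split_sizes(1000, 800))  # (1440, 360)
```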
Create a .env file for configuration:
# Flask Configuration
FLASK_ENV=development
FLASK_DEBUG=True
SECRET_KEY=your-secret-key-here
# Upload Configuration
MAX_CONTENT_LENGTH=16777216 # 16MB in bytes
UPLOAD_FOLDER=uploads/
# Model Configuration
MODEL_PATH=models/
RESULTS_PATH=results/
Modify config.py for advanced configuration:
- Model hyperparameter grids
- Feature engineering parameters
- Data preprocessing thresholds
- Logging levels
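One way an application can read these settings is from environment variables with defaults matching the values shown above. This is a sketch only: the function name `load_settings` and the fallback values are assumptions, not the contents of the project's config.py. Values from a .env file reach `os.environ` only if something loads the file first, for example the python-dotenv package or exporting the variables in the shell.

```python
import os


def _clean(value):
    """Drop an inline '# comment' and surrounding whitespace."""
    return str(value).split("#")[0].strip()


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "SECRET_KEY": env.get("SECRET_KEY", "change-me"),  # placeholder default
        "DEBUG": _clean(env.get("FLASK_DEBUG", "False")).lower() in ("1", "true"),
        # 16 MB expressed in bytes: 16 * 1024 * 1024 = 16777216.
        "MAX_CONTENT_LENGTH": int(_clean(env.get("MAX_CONTENT_LENGTH", 16 * 1024 * 1024))),
        "UPLOAD_FOLDER": env.get("UPLOAD_FOLDER", "uploads/"),
        "MODEL_PATH": env.get("MODEL_PATH", "models/"),
        "RESULTS_PATH": env.get("RESULTS_PATH", "results/"),
    }
```

Passing a plain dictionary, as in `load_settings({"FLASK_DEBUG": "True"})`, makes the function easy to test without touching the real environment.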
GNSS-CLASSIFICATION/
├── gnss_classification/
│ ├── app.py # Flask web application
│ ├── run_pipeline.py # CLI pipeline orchestrator
│ ├── requirements.txt # Python dependencies
│ ├── config.py # Configuration settings
│ ├── src/
│ │ ├── __init__.py
│ │ ├── data_preprocessing.py # Data loading & preprocessing
│ │ ├── feature_engineering.py # Feature creation & PCA
│ │ ├── model_training.py # Model training & tuning
│ │ └── evaluation.py # Model evaluation & visualization
│ ├── templates/
│ │ └── index.html # Web UI template
│ ├── data/
│ │ ├── raw/ # Raw input data
│ │ ├── processed/ # Processed datasets
│ │ └── examples/ # Example data files
│ ├── models/ # Saved trained models (.joblib)
│ ├── results/ # Evaluation results & plots
│ └── uploads/ # Temporary file uploads
├── docs/ # Detailed documentation
│ ├── API.md # API reference
│ ├── DATA_FORMAT.md # Data format specifications
│ ├── TROUBLESHOOTING.md # Common issues & solutions
│ └── DEVELOPMENT.md # Development guide
├── tests/ # Unit and integration tests
├── examples/ # Usage examples & notebooks
│ └── example_usage.ipynb # Jupyter notebook tutorial
├── .github/
│ └── workflows/
│ └── ci.yml # GitHub Actions CI/CD
├── .gitignore
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── CHANGELOG.md
└── setup.py # Package installation script
# Clone and install in editable mode
git clone https://github.com/yourusername/GNSS-CLASSIFICATION.git
cd GNSS-CLASSIFICATION
pip install -e .
# Install development dependencies
pip install -r requirements-dev.txt
# Run all tests
pytest
# Run with coverage
pytest --cov=gnss_classification --cov-report=html
# Format code
black gnss_classification/
# Lint code
flake8 gnss_classification/
# Type checking
mypy gnss_classification/
ImportError: No module named 'xgboost'
Solution: Ensure all dependencies are installed:
pip install -r requirements.txt
Error: File too large
Solution: Check file size (<16MB) or increase MAX_CONTENT_LENGTH in config.
MemoryError during training
Solution:
- Reduce hyperparameter grid size
- Use smaller dataset
- Increase system RAM
KeyError: 'PRN'
Solution: Ensure Excel file has all required columns (see Data Format).
See docs/TROUBLESHOOTING.md for comprehensive solutions.
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
If you use this code in your research, please cite:
@software{gnss_classification_2024,
title = {GNSS Line-of-Sight Classification System},
author = {Your Name},
year = {2024},
url = {https://github.com/yourusername/GNSS-CLASSIFICATION}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the docs/ directory
- Issues: GitHub Issues
- Discussions: GitHub Discussions
For questions or collaboration opportunities:
- Email: your.email@example.com
- LinkedIn: Your Profile
- Developed for GNSS signal analysis and urban canyon navigation research
- Built with scikit-learn, XGBoost, Flask, and other open-source libraries
- Inspired by research in satellite positioning and machine learning
Made with ❤️ for the GNSS research community