DeepSequence Hierarchical Attention

A production-ready deep learning framework for time series forecasting with hierarchical sparse attention, TabNet encoders, DCN cross layers, and intermittent demand handling.

Requires Python 3.8+ and TensorFlow 2.13+. License: MIT.


🌟 Key Features

🎯 Three-Level Hierarchical Attention

  1. Feature-Level Attention: TabNet encoders for sparse feature selection within each component
  2. Component-Level Attention: Learns importance across Trend, Seasonal, Holiday, and Regressor components
  3. Cross-Layer Interactions: Deep Cross Network (DCN) for explicit feature combinations (see the sketch after this list)
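
A minimal sketch of the cross-layer idea behind level 3, assuming the standard DCN formulation x_{l+1} = x_0 * (x_l · w) + b + x_l; the layer shipped in this package may differ in detail:

import tensorflow as tf

class CrossLayer(tf.keras.layers.Layer):
    """One DCN cross layer: x_next = x0 * (x_l . w) + b + x_l."""

    def build(self, input_shape):
        dim = input_shape[-1]
        self.w = self.add_weight(name="w", shape=(dim, 1), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")

    def call(self, x0, xl):
        # Explicit interactions with the original input, plus a residual term
        return x0 * tf.matmul(xl, self.w) + self.b + xl

# Stack two cross layers (n_cross_layers=2) on a toy batch
x0 = tf.random.normal((32, 20))
x = x0
for _ in range(2):
    x = CrossLayer()(x0, x)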

🔧 Flexible Architecture

  • TabNet Encoders: Sequential attention with interpretable feature importance
  • 4 Components: Trend, Seasonal, Holiday, Regressor (use any subset, from one to all four)
  • Dynamic Ensemble: Softmax weights automatically adapt to available components
  • SKU-Specific: Different products learn different patterns through embeddings

📊 Intermittent Demand Support

  • Two-Stage Prediction: Zero probability + magnitude forecasting
  • Zero Detection: Hierarchical attention + cross layers for sparse demand patterns
  • Toggle Mode: Enable/disable via enable_intermittent_handling parameter
  • Production-Ready: Tested on 910 SKUs with varying sparsity levels

⚡ Performance

  • Efficient: No transformers, lightweight TabNet architecture
  • Stable: Low-temperature softmax (no NaN issues)
  • Interpretable: Built-in feature importance and attention weights
  • Autoregressive: Multi-step forecasting with lag feature updates

📦 Installation

# Clone repository
git clone https://github.com/mkuma93/deepsequence-hierarchical-attention.git
cd deepsequence-hierarchical-attention

# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .

🚀 Quick Start

import numpy as np
from deepsequence_hierarchical_attention import DeepSequencePWLHierarchical

# Initialize model (intermittent mode)
model = DeepSequencePWLHierarchical(
    n_skus=100,
    n_features=20,
    enable_intermittent_handling=True,  # Two-stage prediction
    tabnet_feature_dim=16,
    tabnet_output_dim=8,
    embedding_dim=8,
    n_cross_layers=2
)

# Build and compile the model (loss on the final forecast output)
main_model = model.build_model()
main_model.compile(optimizer='adam', loss={'final_forecast': 'mae'})

# Train (X_train: features of shape (n_samples, n_features); sku_train: integer SKU ids)
history = main_model.fit(
    [X_train, sku_train],
    {'final_forecast': y_train},
    validation_data=([X_val, sku_val], {'final_forecast': y_val}),
    epochs=50,
    batch_size=64
)

# Predict (returns dict with multiple outputs)
predictions = main_model.predict([X_test, sku_test])
# Keys: 'base_forecast', 'zero_probability', 'final_forecast'

Autoregressive Multi-Step Forecasting

from deepsequence_hierarchical_attention import AutoregressivePredictor

# Initialize predictor
ar_predictor = AutoregressivePredictor(
    model=main_model,
    lag_feature_indices=[16, 17],  # Which features are lags
    lags=[1, 7],                   # Lag orders (t-1, t-7)
    n_skus=100
)

# Forecast 14 days ahead
forecast = ar_predictor.predict_multi_step(
    X_initial=X_test[:3],
    sku_ids=sku_test[:3],
    n_steps=14
)
# Shape: (3, 14) - 3 SKUs, 14 days
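
Under the hood, the predictor rolls the one-step model forward and writes each new prediction back into the lag feature columns before the next step. A rough sketch of that loop, assuming a dict-style model output and the illustrative lag columns from above (the shipped AutoregressivePredictor handles this internally):

import numpy as np

def multi_step_forecast(model, X_initial, sku_ids, n_steps,
                        lag_feature_indices=(16, 17), lags=(1, 7)):
    """Roll a one-step model forward, feeding predictions back into the lag features."""
    X = X_initial.copy()
    history, forecasts = [], []
    for _ in range(n_steps):
        y_hat = model.predict([X, sku_ids], verbose=0)
        if isinstance(y_hat, dict):
            y_hat = y_hat['final_forecast']
        y_hat = np.asarray(y_hat).reshape(-1)
        forecasts.append(y_hat)
        history.append(y_hat)
        # Refresh each lag column once enough predicted history is available
        for col, lag in zip(lag_feature_indices, lags):
            if len(history) >= lag:
                X[:, col] = history[-lag]
    return np.stack(forecasts, axis=1)  # (n_series, n_steps)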

📊 Architecture Overview

Input Features → TabNet Encoders (4 components) → Cross Layers → Ensemble
     ↓                    ↓                            ↓             ↓
[Features]         [Sparse Attention]          [Interactions]  [Softmax Weights]
   20 dim              per component               DCN            across components
                                                    ↓
                                          [Zero Probability] (intermittent mode)
                                                    ↓
                                            [Final Forecast]

Components

  1. Trend Component: Time features (day, week, month) → TabNet
  2. Seasonal Component: Fourier features (sin/cos) → TabNet
  3. Holiday Component: Holiday proximity features → TabNet
  4. Regressor Component: Lag features + external variables → TabNet

Each component:

  • TabNet encoder for feature selection
  • Sparse attention for interpretability
  • Component-specific hidden layers
  • Ensemble weights learned per SKU (see the sketch below)
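
A minimal sketch of how SKU-conditioned ensemble weights can be formed over the component outputs, assuming each component emits a scalar forecast; the layer names here are illustrative, not the package's internal API:

import tensorflow as tf
from tensorflow.keras import layers

n_skus, embedding_dim, n_components = 100, 8, 4

sku_id = layers.Input(shape=(1,), dtype='int32')
component_outputs = [layers.Input(shape=(1,)) for _ in range(n_components)]  # trend, seasonal, holiday, regressor

# SKU embedding -> logits over components -> low-temperature softmax weights
emb = layers.Flatten()(layers.Embedding(n_skus, embedding_dim)(sku_id))
logits = layers.Dense(n_components)(emb)
weights = layers.Softmax()(logits / 0.1)                 # temperature = 0.1 sharpens the weights

stacked = layers.Concatenate()(component_outputs)        # (batch, n_components)
base_forecast = layers.Dot(axes=1)([weights, stacked])   # weighted sum of component forecasts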

Intermittent Mode

When enable_intermittent_handling=True:

Base Forecast → Zero Detection Branch → Final Forecast
      ↓              (Cross Layers)            ↓
  Softmax         Zero Probability      base × (1 - zero_prob)
  Ensemble
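
Numerically, the two stages combine as an element-wise expectation. With illustrative values:

import numpy as np

base_forecast = np.array([5.2, 0.8, 12.0])      # magnitude from the softmax ensemble
zero_probability = np.array([0.1, 0.7, 0.05])   # P(demand == 0) from the zero-detection branch

final_forecast = base_forecast * (1.0 - zero_probability)
# -> [4.68, 0.24, 11.4]; forecasts for sparse SKUs are shrunk toward zero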

πŸ“ Project Structure

deepsequence-hierarchical-attention/
├── deepsequence_hierarchical_attention/
│   ├── __init__.py
│   ├── components.py       # Main model architecture
│   ├── tabnet.py           # TabNet encoder implementation
│   ├── autoregressive.py   # Multi-step forecasting
│   └── model.py            # Wrapper class (optional)
├── examples/
│   └── demo.ipynb          # Complete tutorial
├── tests/
│   └── test_components.py
├── README.md
├── requirements.txt
├── setup.py
└── LICENSE

🎓 Usage Examples

Example 1: Continuous Demand (No Intermittency)

# Disable intermittent handling for regular demand
model = DeepSequencePWLHierarchical(
    n_skus=50,
    n_features=15,
    enable_intermittent_handling=False,  # Direct forecasting
    tabnet_feature_dim=16,
    embedding_dim=8
)

main_model = model.build_model()
main_model.compile(optimizer='adam', loss='mae')

# Single output: final_forecast only
history = main_model.fit(
    [X_train, sku_train],
    y_train,  # Simple array, not dict
    epochs=30
)

Example 2: Access Component Outputs

# In intermittent mode, model exposes intermediate outputs
predictions = main_model.predict([X_test[:5], sku_test[:5]])

base_forecast = predictions['base_forecast']      # Softmax ensemble
zero_prob = predictions['zero_probability']       # P(demand=0)
final_forecast = predictions['final_forecast']    # base × (1 - zero_prob)

print(f"Base forecast: {base_forecast[0]}")
print(f"Zero probability: {zero_prob[0]}")
print(f"Final forecast: {final_forecast[0]}")

Example 3: Feature Importance

# TabNet provides built-in feature importance
# Access through model layers (requires custom extraction)
# See examples/demo.ipynb for detailed implementation

🔧 Configuration

Model Parameters

| Parameter | Default | Description |
|---|---|---|
| n_skus | required | Number of unique SKUs/products |
| n_features | required | Number of input features |
| enable_intermittent_handling | True | Two-stage prediction for sparse demand |
| tabnet_feature_dim | 16 | TabNet feature dimension |
| tabnet_output_dim | 8 | TabNet output dimension |
| embedding_dim | 8 | SKU embedding dimension |
| n_cross_layers | 2 | Number of DCN cross layers |
| dropout_rate | 0.1 | Dropout rate for regularization |

Training Tips

  • Batch Size: 64-256 for stability
  • Learning Rate: 0.001 (Adam optimizer)
  • Epochs: 30-100 depending on dataset size
  • Regularization: Dropout + L2 regularization on embeddings
  • Validation: Use a temporal split (not a random one) for time series (see the sketch below)
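
For instance, a simple hold-out of the most recent slice of the timeline (assuming X, y, and sku_ids are your full arrays, already sorted by date, oldest first):

split = int(len(X) * 0.8)                         # keep the last 20% of the timeline for validation
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]
sku_train, sku_val = sku_ids[:split], sku_ids[split:]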

📈 Performance Metrics

Tested on retail demand forecasting dataset:

  • 910 SKUs, 1000+ samples per SKU
  • 30% intermittent (sparse demand patterns)
| Metric | Continuous Mode | Intermittent Mode |
|---|---|---|
| MAE | 2.34 | 2.18 |
| RMSE | 4.67 | 4.23 |
| MAPE | 15.2% | 14.1% |

Intermittent mode shows a roughly 7% improvement in MAE on sparse-demand SKUs.


πŸ› οΈ Advanced Features

Custom Component Configuration

# Use only Trend + Seasonal (no Holiday/Regressor)
model = DeepSequencePWLHierarchical(
    n_skus=100,
    n_features=10,  # Only time + Fourier features
    enable_intermittent_handling=False
)

# Model automatically adapts ensemble to 2 components

Numerical Stability

  • Softmax Temperature: Low temperature (0.1) prevents NaN (see the sketch after this list)
  • Gradient Clipping: Built-in for stable training
  • Batch Normalization: Ghost batch norm in TabNet
  • Small Epsilon: 1e-7 for numerical safety
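
A minimal sketch of the temperature-scaled softmax with an epsilon guard (illustrative; the exact internals live in components.py):

import tensorflow as tf

def stable_component_weights(logits, temperature=0.1, eps=1e-7):
    """Temperature-scaled softmax, clipped so every weight stays strictly positive."""
    weights = tf.nn.softmax(logits / temperature, axis=-1)
    return tf.clip_by_value(weights, eps, 1.0)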

📚 Documentation

A complete walkthrough, including feature-importance extraction, is provided in examples/demo.ipynb.

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details


📧 Contact

Mritunjay Kumar


πŸ™ Acknowledgments

  • TabNet: "TabNet: Attentive Interpretable Tabular Learning" (Google Research)
  • DCN: "Deep & Cross Network for Ad Click Predictions" (Google Research)
  • TensorFlow Team: For the excellent deep learning framework

📖 Citation

If you use this work, please cite:

@software{kumar2025deepsequence,
  author = {Kumar, Mritunjay},
  title = {DeepSequence Hierarchical Attention for Time Series Forecasting},
  year = {2025},
  url = {https://github.com/mkuma93/deepsequence-hierarchical-attention}
}
