HelpModel trains an LSTM (optionally bidirectional and with an intrinsically explainable attention mechanism) to predict request_help events in educational temporal interaction sequences. The project integrates:
- TensorFlow/Keras for modeling
- A custom attention mechanism (serializable, mask-aware)
- DVC for reproducible data & experiment pipeline
- TensorBoard for metric and training dynamics visualization
- Export of a standalone attention submodel for post‑hoc explainability
Pipeline stages:

- `download`: Download or generate raw source data.
- `transformation`: Convert raw sources into a unified CSV.
- `dataanalysis`: Exploratory analysis + SHAP computations.
- `featureselection`: Produce the filtered feature set.
- `train`: Train the main LSTM model (with/without attention).
- `test_train`: Inference, attention extraction, prediction visualization.
- `attentionheatmap` (optional): Generate a focused heatmap + architecture diagram from a chosen test sequence.
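These stages would be wired together in `dvc.yaml` roughly as follows (a sketch only: stage names match the pipeline above, but the exact `deps`, `outs`, and tracked params are illustrative):

```yaml
stages:
  train:
    cmd: >-
      python src/train.py --params-file params.yaml
      --input-csv-file data/featureselection.csv
      --output-model-file model/help_model.keras
      --plots-file-name metrics/plots.csv
      --metrics-file-name metrics/scores.json
    deps:
      - src/train.py
      - data/featureselection.csv
    params:
      - lstm_units
      - use_attention
      - training_epochs
    outs:
      - model/help_model.keras
    metrics:
      - metrics/scores.json
    plots:
      - metrics/plots.csv
```

With stages declared like this, `dvc repro` re-runs only what is affected by changed dependencies or parameters.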
Show the pipeline graph:

```bash
dvc dag
```

Reproduce the full pipeline:

```bash
dvc repro
```

Run only training:

```bash
dvc repro train
```

Project structure:

```
├── dvc.yaml                          # DVC stages definition
├── dvc.lock                          # Locked versions of stages after repro
├── params.yaml                       # Hyperparameters & configuration
├── src/
│   ├── train.py                      # Training entrypoint
│   ├── test_train.py                 # Inference + attention extraction
│   ├── generate_attention_heatmap.py # Focused attention visualization (single sequence)
│   ├── keras_custom_layers.py        # Custom serializable attention-related functions
│   ├── data_analysis.py, featureselection.py, download.py, formatcsv*.py
├── model/                            # Saved models (.keras)
├── metrics/                          # Metrics, attention weights, predictions, DVC plots data
├── images/                           # Figures (architecture, heatmaps, sequence prediction plots)
├── logs/                             # TensorBoard event logs
├── data/                             # Processed input data (after feature selection, etc.)
├── requirements.txt
├── README.md
└── LICENSE
```
Requirements:

- Python 3.11+
- (macOS) Optional Metal acceleration for TensorFlow.

Setup:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

GPU (macOS Metal), if missing:

```bash
pip install tensorflow-macos tensorflow-metal
```

| Param | Purpose |
|---|---|
| `mask_value` | Padding value used for sequence masking. |
| `percentage_train_size` | Percentage of sequences used for the training split. |
| `lstm_units` | Units in the principal LSTM layer. |
| `return_sequences` | Whether the final LSTM layer returns the full sequence. Must be `true` if attention is enabled. |
| `second_lstm_layer` | Adds an extra LSTM layer before the final one. |
| `use_dropouts`, `dropout_value` | Regularization settings. |
| `use_bidirectional` | Wraps the final LSTM in `Bidirectional`. |
| `use_attention` | Enables the intrinsic attention mechanism. |
| `initial_learning_rate` | Adam initial learning rate. |
| `training_epochs`, `training_batch_size` | Core training loop configuration. |
| `training_class_weights` | If `true`, compute class weights to handle imbalance. |
| `training_early_stopping_patience` | Early-stopping patience, in epochs. |
| `training_reduce_lr_patience`, `training_reduce_lr_factor` | Learning-rate scheduler settings. |
| `show_summary` | Print the model summary. |
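Put together, a `params.yaml` following the table above might look like this (all values are illustrative defaults, not the project's shipped configuration):

```yaml
mask_value: 0.0
percentage_train_size: 0.8
lstm_units: 64
return_sequences: true        # must be true when use_attention is true
second_lstm_layer: false
use_dropouts: true
dropout_value: 0.2
use_bidirectional: true
use_attention: true
initial_learning_rate: 0.001
training_epochs: 50
training_batch_size: 32
training_class_weights: true
training_early_stopping_patience: 5
training_reduce_lr_patience: 3
training_reduce_lr_factor: 0.5
show_summary: true
```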
Adjust values, then:

```bash
dvc repro train
```

or invoke the training script directly:

```bash
python src/train.py \
  --params-file params.yaml \
  --input-csv-file data/featureselection.csv \
  --output-model-file model/help_model.keras \
  --plots-file-name metrics/plots.csv \
  --metrics-file-name metrics/scores.json \
  --use-gpu \
  --output-dir images
```

Generates (when `use_attention: true`):

- `model/help_model.keras` (full model)
- `model/help_model_attention.keras` (attention submodel returning weights)
- `images/model.png` (general architecture)
- Metrics & per-epoch plots data
```bash
python src/test_train.py params.yaml model/help_model.keras data/featureselection.csv \
  --output_dir images --use_gpu --test-sequences 15
```

Outputs:

- `metrics/test_predictions.csv` (per-step real vs predicted probability)
- `metrics/test_attention.csv` (aligned attention weights)
- Sequence prediction & feature importance figures (unless `--no-viz` is specified)
If `use_attention` is enabled:

- A `Dense(1, tanh)` produces a raw attention score per timestep.
- A squeeze removes the singleton dimension → shape `(batch, time_steps)`.
- A mask is computed from padded steps (any timestep with all features == `mask_value`).
- Padded positions receive large negative values (effectively -inf) before the softmax.
- Softmax normalizes the valid positions → `attention_weights` (a probability distribution over the real steps).
- Element-wise weighting of the sequence features is applied (preserving the time dimension).
- A final `Dense(1, sigmoid)` yields per-step probabilities of `request_help`.
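The steps above can be sketched in plain TensorFlow as follows. This is a simplified illustration, not the project's actual layers: `MASK_VALUE` and the layer sizes are assumptions (the project reads `mask_value` from `params.yaml`).

```python
import numpy as np
import tensorflow as tf

MASK_VALUE = 0.0  # assumed padding value; the project takes it from params.yaml

def attention_block(inputs):
    """Mask-aware attention over (batch, time_steps, features)."""
    # (1) Dense(1, tanh): raw attention score per timestep -> (batch, T, 1)
    scores = tf.keras.layers.Dense(1, activation="tanh")(inputs)
    # (2) Squeeze the singleton dimension -> (batch, T)
    scores = tf.squeeze(scores, axis=-1)
    # (3) Mask: a timestep is padding when ALL its features equal MASK_VALUE
    is_padding = tf.reduce_all(tf.equal(inputs, MASK_VALUE), axis=-1)
    # (4) Large negative values (effectively -inf) on padded positions
    scores = tf.where(is_padding, tf.fill(tf.shape(scores), -1e9), scores)
    # (5) Softmax -> probability distribution over the real steps
    attention_weights = tf.nn.softmax(scores, axis=-1)
    # (6) Element-wise weighting, preserving the time dimension
    weighted = inputs * tf.expand_dims(attention_weights, axis=-1)
    return weighted, attention_weights

# Demo: 1 sequence, 4 timesteps (last two fully padded), 3 features
x = np.zeros((1, 4, 3), dtype="float32")
x[0, 0] = [0.5, 1.0, -0.3]
x[0, 1] = [0.2, 0.1, 0.9]
_, w = attention_block(tf.constant(x))
print(np.round(w.numpy(), 3))  # padded steps get ~0 weight; valid steps sum to ~1
```

Because padded scores are pushed to -1e9 before the softmax, padded timesteps receive (numerically) zero attention and the remaining weights form a valid distribution over real steps.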
Two complementary artifacts:

- Full model: for actual predictions.
- Attention submodel: returns only `attention_weights` for the same inputs, enabling auditing and temporal interpretability.
```python
import tensorflow as tf
from keras_custom_layers import (
    compute_mask_layer,
    squeeze_last_axis_func,
    mask_attention_scores_func,
    apply_attention_func,
    AttentionLayer,
    MaskedRepeatVector,
)

model = tf.keras.models.load_model(
    "model/help_model.keras", compile=False,
    custom_objects={
        "compute_mask_layer": compute_mask_layer,
        "squeeze_last_axis_func": squeeze_last_axis_func,
        "mask_attention_scores_func": mask_attention_scores_func,
        "apply_attention_func": apply_attention_func,
        "AttentionLayer": AttentionLayer,
        "MaskedRepeatVector": MaskedRepeatVector,
    },
)

att_model = tf.keras.models.load_model(
    "model/help_model_attention.keras", compile=False,
    custom_objects={
        "compute_mask_layer": compute_mask_layer,
        "squeeze_last_axis_func": squeeze_last_axis_func,
        "mask_attention_scores_func": mask_attention_scores_func,
        "apply_attention_func": apply_attention_func,
    },
)

# batch_x shape: (batch, time_steps, features)
attention_weights = att_model.predict(batch_x)
```

`generate_attention_heatmap.py` creates:
- A single-sequence attention heatmap (`images/attention_heatmap.png`).
- A model architecture diagram (`images/model_lstm_attention.png`).

If Graphviz/pydot are missing, a placeholder diagram is created instead (ensuring DVC output consistency). Install the real dependencies for a richer diagram:

```bash
pip install pydot graphviz
# macOS system Graphviz
brew install graphviz
```

TensorBoard event logs live in `logs/` (subdirectories: `train`, `validation`, `learning_rate`). Launch:

```bash
tensorboard --logdir logs --port 6006
```

Then navigate to http://localhost:6006.
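As a side note on the heatmap artifact: a single-sequence attention heatmap of this kind can be sketched with matplotlib. The weights below are fabricated for illustration; in the project they would come from `metrics/test_attention.csv`, and the real `generate_attention_heatmap.py` may style its output differently.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

# Fabricated attention weights for one sequence (sum to 1 over timesteps)
weights = np.array([[0.05, 0.10, 0.30, 0.40, 0.15]])

fig, ax = plt.subplots(figsize=(6, 1.5))
im = ax.imshow(weights, aspect="auto", cmap="viridis")
ax.set_yticks([])
ax.set_xlabel("timestep")
fig.colorbar(im, ax=ax, label="attention weight")
fig.savefig("attention_heatmap_demo.png", bbox_inches="tight")
```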
| File | Description |
|---|---|
| `metrics/scores.json` | Final aggregate metrics (loss, accuracy, precision, recall, AUC…). |
| `metrics/plots.csv` | Epoch-by-epoch logged metrics for DVC plots. |
| `metrics/test_predictions.csv` | Per-step ground truth vs predicted probability. |
| `metrics/test_attention.csv` | Attention weights aligned to the padded sequence length. |
| `images/model.png` | Trained model architecture snapshot. |
| `images/model_lstm_attention.png` | Attention architecture diagram (or placeholder). |
| `images/attention_heatmap.png` | Heatmap for the selected sequence. |
| `model/help_model.keras` | Full predictive model. |
| `model/help_model_attention.keras` | Isolated attention submodel. |
Enable GPU usage with `--use-gpu`. If no GPU devices are listed, ensure `tensorflow-macos` + `tensorflow-metal` are installed. Quick check:

```python
import tensorflow as tf
print(tf.config.list_logical_devices('GPU'))
```

All functions used by Lambda layers are defined and registered in `keras_custom_layers.py` using `@tf.keras.utils.register_keras_serializable`. This prevents deserialization errors such as:

```
Could not locate function 'func'
```

By avoiding anonymous inline lambdas, we ensure portability and stable loading with `compile=False`.
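A minimal sketch of the registration pattern (the demo function name is hypothetical; the project's real helpers live in `keras_custom_layers.py`):

```python
import os
import tempfile
import tensorflow as tf

# Registering the function lets Keras serialize Lambda layers by name,
# so the saved model reloads without hunting for the Python object.
@tf.keras.utils.register_keras_serializable(package="help_model")
def squeeze_last_axis_demo(x):  # hypothetical stand-in for the real helpers
    return tf.squeeze(x, axis=-1)

inp = tf.keras.Input(shape=(3, 1))
out = tf.keras.layers.Lambda(squeeze_last_axis_demo)(inp)
model = tf.keras.Model(inp, out)

# Round trip: save and reload without passing custom_objects explicitly
path = os.path.join(tempfile.mkdtemp(), "demo.keras")
model.save(path)
reloaded = tf.keras.models.load_model(path, compile=False)
print(reloaded(tf.zeros((2, 3, 1))).shape)
```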
| Symptom | Cause | Fix |
|---|---|---|
| Deserialization error for Lambda | Missing `@register_keras_serializable` | Ensure custom functions are registered (already done). |
| Extra feature columns mismatch | Data pipeline changed post-training | Re-align preprocessing or truncate extra columns (the script now truncates when safe). |
| Output file missing in DVC stage | Diagram generation failed (Graphviz absent) | Fallback placeholder now guarantees file creation. |
| Attention shape mismatch | Sequence dims differ from training | Validate & adjust the feature dimension before prediction. |
To add a new hyperparameter:

- Add it under `params.yaml`.
- Consume it in `train.py` / `test_train.py`.
- Reference it in the `dvc.yaml` stage params.
- Run `dvc repro`.
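For example, a hypothetical `new_param` would be threaded through the two config files like this (names and value are illustrative):

```yaml
# params.yaml
new_param: 0.5          # hypothetical new hyperparameter

# dvc.yaml (train stage excerpt)
stages:
  train:
    params:
      - new_param       # DVC now re-runs train whenever it changes
```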
To reset generated artifacts and retrain:

```bash
rm -f model/help_model*.keras
rm -f metrics/*.json metrics/*.csv
rm -f images/*.png
dvc repro train
```

`test_train.py` can produce per-sequence prediction plots and (if enabled) attention/importance figures. Disable visual generation with `--no-viz` for batch automation.
- Intrinsic attention (instead of post-hoc methods only) provides immediate temporal interpretability and aligns with pedagogical intervention needs.
- Mask-aware attention ensures padded timesteps cannot dilute probability mass.
- Separate attention submodel lowers computational & cognitive overhead for downstream analytics.
- DVC guarantees reproducibility for both data transformations and modeling decisions, crucial in educational settings where traceability is mandated.
- Eliminating inline lambdas future-proofs serialization across TensorFlow versions and mixed deployment targets.
- ONNX export for broader runtime compatibility.
- Unified explainer combining attention + SHAP per timestep.
- Walk-forward temporal validation to evaluate generalization drift.
- Docker image for hermetic reproducibility.
- Automated fairness auditing (e.g., segment-based performance).
```bash
# Full pipeline
dvc repro
# Training only
dvc repro train
# Inference + attention
python src/test_train.py params.yaml model/help_model.keras data/featureselection.csv --output_dir images --test-sequences 10 --no-viz
# Focused heatmap stage (if defined in dvc.yaml)
dvc repro attentionheatmap
# TensorBoard
tensorboard --logdir logs
# Metrics
dvc metrics show
# Plots
dvc plots show
```

MIT License (see LICENSE file).
(If publishing, add BibTeX entry here.)
If you need deeper troubleshooting guidance or deployment documentation, open an issue or extend the relevant section.