Dataset revamp + training fixes #5

Open
yaoderek wants to merge 4 commits into Denolle-Lab:main from yaoderek:main

@yaoderek
Contributor

Data (download_AK_only_data):
• Window length: 5–10s → 60s (captures full P-wave arrival and decay)
• P-wave alignment: human-validated times via libcomcat (ComCat API)
• Noise: separate quiet-period download (60s, ≥2h from any event), no pre-P segments
• Output: same .npy/.csv layout; metadata includes p_arrival_source, window_type
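
The windowing and quiet-period rules above can be sketched roughly as follows. This is only an illustration, not the code in download_AK_only_data: the function names, the 5 s pre-P offset, and the zero-padding at the trace end are assumptions made for the example.

```python
import numpy as np

def extract_window(trace, sampling_rate, p_index, window_s=60.0, pre_p_s=5.0):
    """Cut a fixed-length window starting pre_p_s seconds before the P pick.

    pre_p_s is an assumed offset; the PR does not state where the window starts.
    """
    n_samples = int(round(window_s * sampling_rate))
    start = max(p_index - int(round(pre_p_s * sampling_rate)), 0)
    window = trace[..., start:start + n_samples]
    # Zero-pad on the right if the trace ends before the window does (assumption).
    missing = n_samples - window.shape[-1]
    if missing > 0:
        pad = [(0, 0)] * (window.ndim - 1) + [(0, missing)]
        window = np.pad(window, pad)
    return window

def is_quiet(window_start, event_times, window_s=60.0, min_gap_s=2 * 3600):
    """True if the window lies at least min_gap_s seconds from every event time."""
    event_times = np.asarray(event_times, dtype=float)
    if event_times.size == 0:
        return True
    window_end = window_start + window_s
    # Gap is zero when an event falls inside the window.
    gap = np.maximum(np.maximum(window_start - event_times,
                                event_times - window_end), 0.0)
    return bool(gap.min() >= min_gap_s)
```

Times here are plain seconds on a common clock; a real download script would more likely work with UTC timestamps from the catalog.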

Training (train_cnn_multiclass):
• Augmentation (train only): 2% noise, ±0.25s shift, ±10% amplitude
• Regularization: ReduceLROnPlateau (patience 4), early stop (patience 10), grad clip 2.0, dropout 0.4, weight_decay 1e-3
• Class weights for imbalance; 80/10/10 stratified split
• Validation accuracy ~94–95% (train ~91%), up from ~86–87% validation accuracy with the prior short-window data
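
For reference, the three train-only augmentations listed above could look something like this in NumPy. It is a sketch under assumptions, not the notebook's DataAugmentation class: noise is taken as Gaussian with a standard deviation of 2% of the waveform's own standard deviation, the time shift is circular (np.roll), and the amplitude scale is drawn uniformly from [0.9, 1.1].

```python
import numpy as np

def augment(waveform, sampling_rate, rng,
            noise_frac=0.02, max_shift_s=0.25, amp_frac=0.10):
    """Apply noise, a small time shift, and amplitude scaling to one waveform.

    waveform: array shaped [channels, length] (or [length]).
    rng: a numpy.random.Generator, so results are reproducible per seed.
    """
    x = np.asarray(waveform, dtype=np.float32)

    # Additive Gaussian noise, scaled relative to the signal level (assumption).
    noise = rng.normal(0.0, noise_frac * float(x.std()), size=x.shape)
    x = x + noise.astype(np.float32)

    # Random shift of up to +/- max_shift_s seconds along the time axis.
    max_shift = int(round(max_shift_s * sampling_rate))
    shift = int(rng.integers(-max_shift, max_shift + 1))
    x = np.roll(x, shift, axis=-1)  # circular shift; padding is another option

    # Random amplitude scaling within +/- amp_frac.
    scale = rng.uniform(1.0 - amp_frac, 1.0 + amp_frac)
    return (x * scale).astype(np.float32)
```

Calling this only from the training dataset's item getter keeps validation and test waveforms untouched, which matches the "train only" note.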

Fixes:
• Labels: support [0,2] from data, remap to [0,1]; class_name_map for display
• Conv1d input: ensure [batch, channels, length] (squeeze/unsqueeze in AugmentedDataset)
• Defs: num_classes, batch_size, and DataAugmentation are now defined in the correct cells; removed the verbose argument from ReduceLROnPlateau
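
The label remap and the Conv1d shape fix can be illustrated with a small NumPy sketch. Both helper names are hypothetical, and the sketch assumes the raw labels take the values 0 and 2 and that each dataset item should come out as [channels, length] so that batching yields [batch, channels, length]. In PyTorch the same reshaping would use unsqueeze/squeeze on tensors.

```python
import numpy as np

def remap_labels(labels):
    """Map the sorted unique raw labels onto contiguous indices 0..K-1.

    Returns the remapped labels and the raw values, so a display map such as
    class_name_map can still be keyed on the original label values.
    """
    labels = np.asarray(labels)
    raw_values, remapped = np.unique(labels, return_inverse=True)
    return remapped.reshape(labels.shape), raw_values

def ensure_channels_first(x):
    """Return one sample shaped [channels, length]."""
    x = np.asarray(x)
    if x.ndim == 1:
        # Single-channel trace: add the channel axis.
        x = x[np.newaxis, :]
    elif x.ndim == 3 and x.shape[0] == 1:
        # Stray leading batch axis of size 1: drop it.
        x = x[0]
    if x.ndim != 2:
        raise ValueError(f"expected [channels, length], got shape {x.shape}")
    return x
```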

yaoderek and others added 4 commits February 22, 2026 19:22
…-only download

Fixes:
• Small-test warning when test set < 100 samples
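
As an illustration of how the 80/10/10 stratified split and the small-test warning might fit together, here is a NumPy-only sketch. The function name, the per-class rounding, and the warning text are assumptions; the notebook may well use a library splitter instead.

```python
import warnings
import numpy as np

def stratified_split(labels, rng, fractions=(0.8, 0.1, 0.1), min_test=100):
    """Split sample indices into train/val/test, preserving class proportions."""
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then slice by fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * idx.size))
        n_val = int(round(fractions[1] * idx.size))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to test
    if len(test) < min_test:
        warnings.warn(
            f"test set has only {len(test)} samples (< {min_test}); "
            "test metrics may be noisy"
        )
    return (np.array(train, dtype=int),
            np.array(val, dtype=int),
            np.array(test, dtype=int))
```

With 1,000 samples this leaves exactly 100 in the test set, so the warning only fires for smaller datasets.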

Co-authored-by: Cursor <cursoragent@cursor.com>