
Hush

Spatial Spiking Neural Network for Speech Recognition

A biologically plausible speech recognition system using spiking neural networks. No gradient descent. No backpropagation. No training loops. Just exposure, association, and self-organization.

Philosophy

No training. Exposure and self-adaptation.

The network starts minimal. Speech data shapes the structure through trial and error. Neurons spawn where needed, connections form through correlation, unused paths die. The problem sculpts the solution.

This is not machine learning in the traditional sense. There are no:

  • Loss functions
  • Gradient descent
  • Backpropagation
  • Weight matrices
  • Epochs or batches

Instead, learning happens through:

  • Exposure — streaming audio through the network
  • Association — binding MFCC patterns to teacher characters
  • Importance scoring — biological tagging of useful vs noise patterns
  • Consolidation — sleep-like memory cleanup between learning phases
  • Prediction-surprise — sequence expectations boost learning
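The interaction between these mechanisms can be sketched in a few lines. The +16/-8 importance deltas come from the Importance Scoring section below; the larger boost for a *surprising* correct output (32 here) is an illustrative assumption, as are the function and variable names:

```rust
/// Sketch of prediction-surprise reinforcement: the network predicts the
/// next character from sequence memory, and an unexpected-but-correct
/// output earns a larger importance boost than an expected one.
/// The value 32 for surprising success is assumed for illustration;
/// 16 and -8 match the Importance Scoring section.
fn reinforcement(predicted: char, produced: char, target: char) -> i16 {
    let correct = produced == target;
    let surprising = produced != predicted;
    match (correct, surprising) {
        (true, true) => 32,  // unexpected success: strong reinforcement
        (true, false) => 16, // expected success: normal reinforcement
        (false, _) => -8,    // wrong recall: decay
    }
}
```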

Results

| Configuration | Accuracy |
|---|---|
| Base SNN only | 35-40% |
| + Importance weighting | 79% |
| + Onset suppression | 85% |
| + Stability gating | 90% |
| + Learned LexicalBank | 100% |

100% accuracy on the test batch with only 169 learned word mappings (vs. 50k+ dictionary entries in traditional systems).

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 1: PRIMARY AUDITORY CORTEX (SpatialSpeechNet)                   │
│                                                                         │
│  MFCC → [Onset Suppression] → [Stability Gate] → Motor Output          │
│              (5 frames)          (>85% sim)        (raw chars)         │
└─────────────────────────────────────────────────────────────────────────┘
                                    ↓
                           raw: "ilustration"
                                    ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 2: WERNICKE'S AREA (LexicalBank)                                │
│                                                                         │
│  Raw chars → [Learned Mappings] → [Importance Weighted] → Refined      │
└─────────────────────────────────────────────────────────────────────────┘
                                    ↓
                         refined: "illustration"

Two-Stage Biological Pipeline

The system models the real auditory cortex → Wernicke's area pathway:

  1. Primary Auditory Cortex (SpatialSpeechNet)

    • 26 sensory neurons (MFCC input)
    • 29 motor neurons (alphabet output)
    • 32 memory neurons (databank interface)
    • Character-level pattern recognition
    • Produces raw transcription with systematic errors
  2. Wernicke's Area (LexicalBank)

    • Learned word-level refinement
    • Only applies high-importance (proven) corrections
    • Self-correcting through feedback
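The Stage-2 refinement step can be sketched as a lookup that passes raw words through unchanged unless a learned, high-importance correction exists. The struct and field names (`LexicalEntry`, `importance_floor`) are assumptions for illustration, not the repository's actual API:

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the Wernicke's-area stage: a learned mapping
/// from raw transcriptions to refined words, applied only when the
/// mapping's importance score has been proven through feedback.
struct LexicalEntry {
    refined: String,
    importance: u8, // 0-255, per the Importance Scoring section
}

struct LexicalBank {
    mappings: HashMap<String, LexicalEntry>,
    importance_floor: u8, // only high-importance corrections apply
}

impl LexicalBank {
    fn refine(&self, raw: &str) -> String {
        raw.split_whitespace()
            .map(|word| match self.mappings.get(word) {
                // Proven correction: substitute the refined form.
                Some(e) if e.importance >= self.importance_floor => e.refined.clone(),
                // Unknown or unproven: pass the raw word through.
                _ => word.to_string(),
            })
            .collect::<Vec<_>>()
            .join(" ")
    }
}
```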

Key Innovations

Onset Suppression

  • First 5 frames after speech onset are suppressed
  • Biological basis: auditory cortex shows ~50-100ms adaptation at sound onset
  • Fixes prepended character errors ("yes" → "syes")
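A minimal sketch of the gate, assuming a per-frame speech/silence flag from an energy threshold (the `OnsetGate` name and `admit` method are illustrative, not the crate's API):

```rust
/// Sketch: suppress the first N frames after speech onset, modeling the
/// ~50-100ms adaptation period of auditory cortex.
struct OnsetGate {
    frames_since_onset: u32,
    suppress_frames: u32, // 5 in the text
}

impl OnsetGate {
    fn new(suppress_frames: u32) -> Self {
        Self { frames_since_onset: 0, suppress_frames }
    }

    /// Called once per frame; `is_speech` would come from an energy
    /// threshold. Returns true when the frame should be processed.
    fn admit(&mut self, is_speech: bool) -> bool {
        if !is_speech {
            self.frames_since_onset = 0; // silence re-arms the gate
            return false;
        }
        self.frames_since_onset += 1;
        self.frames_since_onset > self.suppress_frames
    }
}
```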

Stability Gating

  • Only process frames with >85% similarity to previous
  • Biological basis: neurons are most discriminative during stable periods
  • Reduces false associations at phoneme boundaries
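One plausible form of the gate, assuming the frame-to-frame similarity is cosine similarity over MFCC vectors (the text says ">85% similarity" without naming the metric):

```rust
/// Cosine similarity between two MFCC frames.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// A frame is "stable" when it closely resembles its predecessor,
/// i.e. similarity exceeds the threshold (0.85 in the text).
fn is_stable(prev: &[f32], curr: &[f32], threshold: f32) -> bool {
    cosine_similarity(prev, curr) > threshold
}
```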

Importance Scoring

  • Patterns tagged with importance (0-255)
  • Correct recall: +16 importance
  • Wrong recall: -8 importance
  • Low-importance patterns pruned during consolidation
  • Self-correcting: bad patterns decay, good ones persist
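The update rule above maps directly onto saturating byte arithmetic; a minimal sketch (function names assumed for illustration):

```rust
/// Importance update per the rules above: +16 on correct recall,
/// -8 on wrong recall, saturating at the 0-255 bounds.
fn update_importance(score: u8, correct: bool) -> u8 {
    if correct {
        score.saturating_add(16)
    } else {
        score.saturating_sub(8)
    }
}

/// Consolidation sketch: prune patterns whose importance has decayed
/// below a floor, so noise dies off while useful patterns persist.
fn consolidate(scores: &mut Vec<u8>, floor: u8) {
    scores.retain(|&s| s >= floor);
}
```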

Mastery-Based Curriculum

  • Start with 10 samples, repeat until mastery
  • Expand batch size by 1.5x on advancement
  • Consolidate memory between grades (like sleep)
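The grade-advancement logic reduces to a small state machine; a sketch assuming accuracy is measured per pass (the `Curriculum` type and method names are illustrative):

```rust
/// Sketch of the mastery-based curriculum: a batch is repeated until
/// accuracy reaches the mastery threshold, then the batch grows by 1.5x.
struct Curriculum {
    batch_size: usize,
    mastery_threshold: f32,
}

impl Curriculum {
    /// Returns true if the grade was mastered and the batch expanded.
    fn advance_if_mastered(&mut self, accuracy: f32) -> bool {
        if accuracy >= self.mastery_threshold {
            self.batch_size = (self.batch_size as f32 * 1.5).ceil() as usize;
            true
        } else {
            false // repeat the same batch
        }
    }
}
```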

Usage

Prepare Data

Convert audio to pre-processed MFCC spool:

cargo run --bin hush-prepare -- \
    --manifest data/librispeech/manifest.json \
    --output data/dev-clean.spool

Train

cargo run --release --bin expose -- \
    --spool data/dev-clean.spool \
    --sort-by-length \
    --max-transcript-len 20 \
    --initial-batch 10 \
    --mastery-threshold 0.40 \
    --target-accuracy 0.50

Output

╔════════════════════════════════════════════════════════════════╗
║                    LEARNING COMPLETE                           ║
╠════════════════════════════════════════════════════════════════╣
║ CURRICULUM                                                     ║
║   Grades completed:       3                                    ║
║   Total passes:          21                                    ║
╠════════════════════════════════════════════════════════════════╣
║ PERFORMANCE                                                    ║
║   Total time:           45.23 s                                ║
║   Frames processed:    105847                                  ║
║   Ticks executed:      423388                                  ║
║   Frames/sec:          2341.2  (23.4x real-time)               ║
║   Ticks/sec:           9364.8                                  ║
╠════════════════════════════════════════════════════════════════╣
║ NETWORK STRUCTURE                                              ║
║   Neurons total:         64  (active: 48, healthy: 61)         ║
║   Synapses total:       847                                    ║
║     Sensory→Memory:     156                                    ║
║     Memory→Motor:        89                                    ║
╠════════════════════════════════════════════════════════════════╣
║ MEMORY BANKS                                                   ║
║   Associations:         129                                    ║
║   Sequences:              0                                    ║
║   Lexical mappings:     169  (127 high-importance)             ║
╠════════════════════════════════════════════════════════════════╣
║ IMPORTANCE SCORING                                             ║
║   Low (noise):           42                                    ║
║   High (useful):         87                                    ║
║   Average:             142.3                                   ║
╠════════════════════════════════════════════════════════════════╣
║ RESOURCE USAGE                                                 ║
║   Est. memory:          12.38 KB                               ║
║   Bytes/neuron:          145  (39 base + synapses)             ║
╚════════════════════════════════════════════════════════════════╝

Biological Plausibility

| Feature | Biological Basis |
|---|---|
| No backprop | Local learning rules only (Hebbian-like) |
| Spike-driven | Communication via discrete spikes |
| Spatial structure | Neurons exist in 3D, proximity-based connectivity |
| Memory separation | Databanks external (like hippocampus) |
| Sleep consolidation | Offline memory cleanup and strengthening |
| Importance tagging | Neuromodulator-like gating |
| Onset adaptation | Auditory cortex ~50-100ms adaptation |
| Stability gating | Discriminative during stable periods |
| Hierarchical processing | Primary cortex → Wernicke's area |
| Lexical access | Word-form vocabulary matching |

Key Insights

  1. Over-connect then prune beats sparse-grow
  2. Curriculum matters — short samples first
  3. Surprise drives learning — unexpected correct = strong reinforcement
  4. Consolidation is essential — without it, memory bloats with noise
  5. Two-stage refinement — neither stage perfect alone, together they work
  6. Learned corrections beat static rules — 169 mappings > 50k dictionary

Project Structure

src/
├── spatial.rs      # SpatialSpeechNet - onset suppression, stability gating
├── memory.rs       # SpeechIO, AssociationBank, LexicalBank
├── bin/
│   └── expose.rs   # Two-stage pipeline: SNN → LexicalBank refinement
├── mfcc.rs         # MFCC extraction
├── spool.rs        # Audio spool reading
└── decoding.rs     # CTC-like decoding with sustained-first-char

Dependencies

License

MIT OR Apache-2.0

References

  • DeepSpeech architecture (for comparison, not implementation)
  • Biological auditory cortex processing
  • Wernicke's area and lexical access
  • Hebbian learning and spike-timing-dependent plasticity
  • Memory consolidation during sleep

Built by Blackfall Labs
