Bird Vocalization Event Detector (CRNN)

Author: Mark Slipenkyi

Date: 2025

Task: Multi-label audio classification and time-frequency event detection.

📌 Project Overview

This project implements a Convolutional Recurrent Neural Network (CRNN) to identify and localize bird species in audio recordings. The model processes log-mel spectrograms to provide:

  1. Species Identification: Probability scores across 28 bird species.
  2. Temporal Localization: Identification of when a bird is singing.
  3. Frequency Localization: Identification of the frequency range (Hertz) of the vocalization.
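A minimal sketch of how these three outputs could be decoded from a per-species time-frequency prediction grid. The grid shape, the 0.5 threshold, and the max-pooled clip score are illustrative assumptions, not taken from the repository:

```python
import torch

# Assumed output: one logit per (species, time frame, frequency band) cell.
n_species, n_time, n_freq = 28, 500, 8
logits = torch.randn(n_species, n_time, n_freq)   # stand-in for model output

probs = torch.sigmoid(logits)                 # per-cell probabilities
clip_scores = probs.amax(dim=(1, 2))          # 1) 28 clip-level species scores
mask = probs > 0.5                            # detection mask over the grid
active_frames = mask.any(dim=2)               # 2) when each species is singing
active_bands = mask.any(dim=1)                # 3) which frequency bands fired
```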

📊 Dataset

The data comes from the *Fully-annotated soundscape recordings from the Southwestern Amazon Basin* dataset.

  • Audio: 21 hours of 48 kHz recordings (resampled to 32 kHz).
  • Labels: 14,798 manually annotated time-frequency bounding boxes.
  • Scope: Focused on the 28 most frequent species (those with >150 occurrences) to mitigate class imbalance.

πŸ— Project Structure

├───src
│   │   data_review.ipynb       # Exploratory Data Analysis
│   │   train_iso.ipynb         # Isolated training experiments
│   ├───classes
│   │       BirdDataset.py      # Custom PyTorch Dataset (log-mel conversion)
│   │       BirdModels.py       # CRNN architecture (CNN + GRU)
│   ├───common
│   │       metrics.py          # Macro-averaged F1, Dice, and precision/recall
│   │       plotting.py         # Result visualization
│   └───scripts
│           train_config.json   # Hyperparameters
│           train_script.py     # Training pipeline in script form
└───stable_logs                 # Saved metrics and training plots

🚀 Model Architecture

The model utilizes a CNN backbone for spatial feature extraction from spectrograms, followed by Gated Recurrent Units (GRU) to capture the temporal rhythm of bird songs.

  • Primary Metric: Macro-Averaged F1 Score.
  • Loss Function: BCEWithLogitsLoss with positive weights for sparse labels.
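The CNN-plus-GRU pipeline described above can be sketched as follows. This is a minimal illustrative model, not the architecture in `src/classes/BirdModels.py`; channel counts, pooling factors, and the hidden size are assumptions:

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Minimal CRNN: the CNN pools frequency, the GRU models time,
    and a linear head emits per-frame multi-label logits."""

    def __init__(self, n_mels: int = 128, n_species: int = 28, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d((4, 1)),   # pool frequency only, keep time resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d((4, 1)),
        )
        self.gru = nn.GRU(32 * (n_mels // 16), hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_species)

    def forward(self, x):           # x: (batch, 1, n_mels, time)
        f = self.cnn(x)             # (batch, 32, n_mels // 16, time)
        b, c, m, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * m)  # time-major features
        out, _ = self.gru(f)
        return self.head(out)       # per-frame logits: (batch, time, n_species)

model = CRNNSketch()
logits = model(torch.randn(2, 1, 128, 500))   # logits.shape: (2, 500, 28)
```

Frequency-only pooling is a common CRNN design choice for sound event detection: it preserves the frame rate so the GRU can localize events in time.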

πŸ“ˆ Results & Detections

The model identifies the most frequently recurring species with reasonable precision, even in soundscapes containing multiple overlapping vocalizations.

Training Progress

To improve the signal-to-noise ratio, background noise was removed from the training recordings in Adobe Audition.

Because bird vocalizations are sparse, positive examples of every class were up-weighted during training via the pos_weight argument of BCEWithLogitsLoss.
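One common way to derive pos_weight is the negative-to-positive ratio per class, which is sketched below. The counts here are synthetic; the repository's actual weights would come from its annotation statistics:

```python
import torch

n_frames = 10_000
positives = torch.tensor([150.0, 900.0, 3_000.0])   # frames where each class is active
negatives = n_frames - positives

# pos_weight = negatives / positives up-weights rare (sparse) classes the most.
pos_weight = negatives / positives
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(4, 3)                          # stand-in model outputs
targets = torch.randint(0, 2, (4, 3)).float()       # stand-in multi-hot labels
loss = criterion(logits, targets)
```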

Below are examples of the model's ability to detect species like the Thrush-like Wren and Amazonian Grosbeak.

Thrush-like Wren

Amazonian Grosbeak
