
michaeldvinci/syllabus


syllabus

A Go web application that tracks audiobook series release dates by scraping Audible and Amazon. It includes user authentication, database persistence, background scraping, and automatic refresh, with a clean web UI and a JSON API.

Data Collection & Terms of Service

⚠️ Important Notice: This application collects publicly available data from Audible and Amazon through automated web scraping for personal use only.

Compliance & Usage:

  • Personal Use: Intended for individual tracking of audiobook series you're interested in
  • Rate Limited: Adds deliberate delays between requests to avoid overloading provider servers
  • Public Data Only: Extracts only publicly visible release information (titles, dates, series counts)
  • No Account Required: Does not access or require user accounts on either platform

Performance Expectations:

  • Initial Setup: The first scrape may take 30-90 seconds, depending on how many series are configured.
  • Background Updates: Automatic refreshes only re-scrape stale data, so they finish much faster.
  • Rate Limiting: Intentional delays between requests prevent overwhelming provider servers.

Terms of Service Considerations: Users should review Audible's Terms of Use and Amazon's Conditions of Use to confirm that their personal use case complies. This tool is designed for personal tracking and is intended to stay within automated access restrictions when used responsibly.


Originally created to replace manual maintenance of an Obsidian "database" for tracking audiobook release dates. A sample config is included so you can see how the screenshots below were created.

I call it syllabus because it's a list of things to read.

Perfect for homelab deployment with Docker Compose.

Unified View

[Screenshot: unified view]

Tabbed View

[Screenshot: tabbed view]

Search

[Screenshot: search]

Settings

[Screenshot: settings]

Features

Core Functionality

  • YAML Configuration: Parse audiobook series from a simple YAML file
  • Multi-Provider Scraping: Fetch data from both Audible and Amazon
  • Release Date Tracking: Extract latest and next release dates automatically
  • Database Persistence: SQLite database for reliable data storage
  • Background Processing: Multi-threaded background scraper with job queue
  • Real-time Updates: Server-sent events for live UI updates

User Experience

  • Authentication System: Secure login with role-based access (Admin/User)
  • Responsive Web UI: Clean, mobile-friendly interface
  • Auto-refresh: Configurable automatic data refresh (2-10 hours)
  • Manual Refresh: On-demand data refresh with progress tracking
  • iCal Export: Subscribe to a release-date calendar from your calendar app
  • Settings Panel: Manage refresh intervals and preferences

Technical Features

  • File Watching: Auto-reload when YAML configuration changes
  • Graceful Shutdown: Clean application termination handling
  • Docker Support: Full containerization with Docker Compose
  • JSON API: Programmatic access to series data
  • Rate Limiting: Intelligent scraping to avoid provider restrictions

Quick Start

Default Credentials

  • Username: admin
  • Password: admin

⚠️ Change the default password immediately after first login

Requirements

  • Go 1.21+ (for local development)
  • Docker & Docker Compose (recommended)
  • Internet access for scraping
  • YAML configuration file

Configuration

YAML schema, including the optional application settings block:

# Application Settings (optional - defaults shown)
settings:
  auto_refresh_interval: 6  # Hours between automatic data refreshes (default: 6)
  default_workers: 4        # Number of concurrent scraper workers (default: 4)  
  server_port: 8080         # Port for the web server (default: 8080)
  cache_timeout: 6          # Cache timeout in hours (default: 6)
  log_level: "info"         # Logging level: debug, info, warn, error (default: info)
  main_view: "unified"      # Default view mode: unified, tabbed (default: unified)

# Audiobook/Ebook Series Configuration
audiobooks:
  - title: "1% Lifesteal"
    audible: "https://www.audible.com/series/1-Lifesteal-Audiobooks/B0F8QMLV9T"
    amazon: "https://www.amazon.com/dp/B0DGWCJ6JP"
  - title: "A Soldier's Life"
    audible: "https://www.audible.com/series/A-Soldiers-Life-Audiobooks/B0D34549LX"
    amazon: "https://www.amazon.com/dp/B0CW18NDBQ"

Required fields: Only title, audible, and amazon are required for scraping.

Settings section: All settings are optional; any setting you omit falls back to the default listed in the comments above. This lets you customize application behavior without modifying code.
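To illustrate the fallback behavior, here is a minimal Go sketch that fills unset settings with the defaults documented in the YAML comments. The `Settings` struct and `applyDefaults` function are hypothetical names, not syllabus's actual types, and treating a zero value as "not set" is an assumption of the sketch.

```go
package main

import "fmt"

// Settings mirrors the optional `settings:` block in the YAML file.
// Hypothetical type for illustration; field comments name the YAML keys.
type Settings struct {
	AutoRefreshInterval int    // auto_refresh_interval (hours)
	DefaultWorkers      int    // default_workers
	ServerPort          int    // server_port
	CacheTimeout        int    // cache_timeout (hours)
	LogLevel            string // log_level
	MainView            string // main_view
}

// applyDefaults replaces every zero-value field with the documented default.
func applyDefaults(s Settings) Settings {
	if s.AutoRefreshInterval == 0 {
		s.AutoRefreshInterval = 6
	}
	if s.DefaultWorkers == 0 {
		s.DefaultWorkers = 4
	}
	if s.ServerPort == 0 {
		s.ServerPort = 8080
	}
	if s.CacheTimeout == 0 {
		s.CacheTimeout = 6
	}
	if s.LogLevel == "" {
		s.LogLevel = "info"
	}
	if s.MainView == "" {
		s.MainView = "unified"
	}
	return s
}

func main() {
	// Only the port is set in YAML; everything else falls back to defaults.
	s := applyDefaults(Settings{ServerPort: 9090})
	fmt.Printf("%+v\n", s)
}
```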

Environment Variables (Docker Compose)

For containerized deployments, you can override any setting using environment variables. Environment variables take precedence over YAML configuration:

environment:
  # Server Configuration
  SYLLABUS_SERVER_PORT: "8080"         # Server port (1-65535)
  PORT: "8080"                         # Standard port env var (alternative)
  
  # Scraping Configuration
  SYLLABUS_AUTO_REFRESH_INTERVAL: "4"  # Hours between auto-refreshes (>0)
  SYLLABUS_DEFAULT_WORKERS: "2"        # Number of concurrent scraper workers (>0)
  SYLLABUS_CACHE_TIMEOUT: "6"          # Cache timeout in hours (>0)
  
  # UI Configuration  
  SYLLABUS_MAIN_VIEW: "unified"        # Default view mode: "unified" or "tabbed"
  
  # Logging Configuration
  SYLLABUS_LOG_LEVEL: "debug"          # Log level: "debug", "info", "warn", "error"

Configuration Priority:

  1. Runtime UI changes (highest priority) - persisted to database
  2. Environment Variables - override YAML at startup
  3. YAML Configuration - file-based defaults
  4. Built-in Defaults (lowest priority)
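The priority order above amounts to "take the first layer that has a value". A minimal Go sketch, assuming each layer reports an unset value as `nil`; `resolve` and `intPtr` are hypothetical helpers, not part of syllabus.

```go
package main

import "fmt"

// resolve returns the highest-priority value that is set (non-nil):
// UI/database first, then environment, then YAML, then the built-in default.
func resolve(ui, env, yaml *int, builtin int) int {
	for _, v := range []*int{ui, env, yaml} {
		if v != nil {
			return *v
		}
	}
	return builtin
}

// intPtr is a small helper for building optional values.
func intPtr(v int) *int { return &v }

func main() {
	// YAML says 6, the environment says 4, and the UI saved 8: the UI wins.
	fmt.Println(resolve(intPtr(8), intPtr(4), intPtr(6), 6))
	// Nothing saved from the UI: the environment value applies.
	fmt.Println(resolve(nil, intPtr(4), intPtr(6), 6))
}
```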

Database Persistence: UI changes (like auto-refresh interval) are saved to the database and survive container restarts. The UI will always show the current server state.

Installation & Deployment

Docker Compose (Recommended)

git clone https://github.com/michaeldvinci/syllabus.git
cd syllabus

# Basic deployment
docker compose up -d

# View logs
docker compose logs -f

Customization with Environment Variables:

# Copy example override file
cp docker-compose.override.example.yaml docker-compose.override.yaml

# Edit settings
nano docker-compose.override.yaml

# Deploy with custom settings
docker compose up -d

The override file allows you to customize settings without modifying the main docker-compose.yaml file.

Local Development

git clone https://github.com/michaeldvinci/syllabus.git
cd syllabus

# Run directly
go run cmd/syllabus/main.go config/books.yaml

# Or build first
go build -o syllabus cmd/syllabus/main.go
./syllabus config/books.yaml

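Access the Application

Open http://localhost:8080 (or your configured port) in a browser and sign in at /login with the default credentials above.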

Data Sources & Scraping

Audible Scraping

  • Series Count: Number of productlistitem occurrences in series page HTML
  • Latest Release: The most recent "Release date: MM-DD-YY" value on the series page
  • Next Release: Extracted from "Coming Soon" or pre-order sections
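The first two Audible rules can be sketched in Go with plain string matching. This is an illustration only: the sample HTML is made up, and syllabus may use an HTML parser rather than the `strings.Count` and regular-expression approach assumed here. The `01-02-06` layout is Go's way of writing MM-DD-YY.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// releaseRe captures the MM-DD-YY part of "Release date: MM-DD-YY".
var releaseRe = regexp.MustCompile(`Release date:\s*(\d{2}-\d{2}-\d{2})`)

// audibleCount counts productlistitem occurrences in the series page HTML.
func audibleCount(html string) int {
	return strings.Count(html, "productlistitem")
}

// latestRelease returns the most recent release date found on the page.
func latestRelease(html string) (time.Time, bool) {
	var latest time.Time
	found := false
	for _, m := range releaseRe.FindAllStringSubmatch(html, -1) {
		d, err := time.Parse("01-02-06", m[1]) // MM-DD-YY
		if err != nil {
			continue
		}
		if !found || d.After(latest) {
			latest, found = d, true
		}
	}
	return latest, found
}

func main() {
	// Made-up markup for illustration.
	page := `<li class="productlistitem">Release date: 01-15-24</li>
<li class="productlistitem">Release date: 06-02-23</li>`
	d, ok := latestRelease(page)
	fmt.Println(audibleCount(page), d.Format("2006-01-02"), ok)
}
```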

Amazon Scraping

  • Series Count: Parsed from the collection-size element, whose text reads "(N book series)"
  • Next Release: Date taken from span elements with the a-color-success and a-text-bold classes
  • Series Detection: Automatic ASIN extraction from Amazon URLs
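A rough Go sketch of the series-count and ASIN rules. It assumes the collection-size text has already been pulled out of the page and that ASINs appear as 10 upper-case alphanumeric characters after `/dp/` or `/gp/product/`; the helper names and the sample markup are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	// Matches the collection-size text, e.g. "(7 book series)".
	seriesSizeRe = regexp.MustCompile(`\((\d+) book series\)`)
	// Assumed ASIN shape: 10 upper-case alphanumerics after /dp/ or /gp/product/.
	asinRe = regexp.MustCompile(`/(?:dp|gp/product)/([A-Z0-9]{10})`)
)

// amazonCount extracts N from "(N book series)".
func amazonCount(text string) (int, bool) {
	m := seriesSizeRe.FindStringSubmatch(text)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	return n, err == nil
}

// extractASIN pulls the ASIN out of an Amazon product URL.
func extractASIN(url string) (string, bool) {
	m := asinRe.FindStringSubmatch(url)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	n, _ := amazonCount(`<span id="collection-size">(7 book series)</span>`)
	asin, _ := extractASIN("https://www.amazon.com/dp/B0DGWCJ6JP")
	fmt.Println(n, asin)
}
```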

Background Processing

  • Multi-threaded: Concurrent workers (4 by default, set via default_workers) for faster scraping
  • Job Queue: Persistent SQLite-based job management
  • Rate Limiting: Intelligent delays to respect provider limits
  • Error Handling: Automatic retry logic with exponential backoff

API Reference

Authentication Required

All API endpoints require authentication via session cookie (login at /login).

GET /api/series

Returns an array of series objects with the following fields:

{
  "Title": "Series Name",
  "AudibleCount": 5,
  "AudibleLatestTitle": "Book Title",
  "AudibleLatestDate": "2024-01-15T00:00:00Z",
  "AudibleNextTitle": "Next Book Title", 
  "AudibleNextDate": "2024-03-20T00:00:00Z",
  "AmazonCount": 5,
  "AmazonLatestTitle": "Book Title",
  "AmazonLatestDate": "2024-01-15T00:00:00Z",
  "AmazonNextTitle": "Next Book Title",
  "AmazonNextDate": "2024-03-20T00:00:00Z",
  "AudibleID": "B0EXAMPLE",
  "AmazonASIN": "B08EXAMPLE",
  "Err": null
}

POST /refresh

Triggers a manual refresh of all series data.

GET /calendar.ics

Returns an iCalendar (.ics) file containing all upcoming release dates.

POST /api/auto-refresh

Updates the automatic refresh interval, in hours (Admin only). Request body:

{"interval": 6}

All dates are returned in ISO 8601 format.

Data Storage & Persistence

SQLite Database

  • Location: ./data/syllabus.db
  • Schema: Series, books, and job queue tables
  • Persistence: Survives application restarts
  • Migration: Automatic schema updates on startup

User Management

  • Storage: ./data/users.json
  • Password Hashing: bcrypt
  • Roles: Admin and User access levels
  • Default: Admin user created on first run

Configuration Watching

  • Auto-reload: YAML file changes trigger incremental updates
  • Smart Updates: Only new series are scraped, existing data preserved
  • Hot Refresh: No application restart required
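The "only new series are scraped" rule can be expressed as a set difference between the old and new configuration. A minimal Go sketch, assuming series are keyed by title; the `Entry` type and `newSeries` function are hypothetical, and the file-watching part is not shown.

```go
package main

import "fmt"

// Entry is one item in the audiobooks: list. Hypothetical type for illustration.
type Entry struct {
	Title, Audible, Amazon string
}

// newSeries returns the entries in next whose titles were not in prev.
// Only these would be scraped; stored data for existing titles is left alone.
func newSeries(prev, next []Entry) []Entry {
	known := make(map[string]bool, len(prev))
	for _, e := range prev {
		known[e.Title] = true
	}
	var added []Entry
	for _, e := range next {
		if !known[e.Title] {
			added = append(added, e)
		}
	}
	return added
}

func main() {
	prev := []Entry{{Title: "1% Lifesteal"}}
	next := []Entry{{Title: "1% Lifesteal"}, {Title: "A Soldier's Life"}}
	for _, e := range newSeries(prev, next) {
		fmt.Println("scrape:", e.Title)
	}
}
```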

Auto-Refresh System

Configurable Intervals

  • Options: 2, 4, 6, 8, 10 hours
  • Default: 6 hours
  • UI Control: Settings panel slider
  • Persistence: Interval survives restarts

Refresh Behavior

  • Incremental: Only updates stale data
  • Background: Non-blocking operation
  • Progress: Real-time updates via Server-Sent Events
  • Manual Override: Refresh button forces immediate update

Troubleshooting

Log Locations

  • Docker: docker compose logs syllabus
  • Local: Console output
  • Scraper: Detailed job progress in logs