Database Migration: Move from JSON Files to AWS EC2 on Ruby on Rails #245

@NelsonLee-Code

Description

Database Migration: JSON Files → PostgreSQL (AWS RDS) + Ruby on Rails GraphQL API + StatsCan Integration

Overview

Migrate CanadaSpends from file-based JSON data storage to a professional full-stack architecture:

Infrastructure:

  • Database: PostgreSQL 15 on AWS RDS (Multi-AZ optional)
  • Backend API: Ruby on Rails 7.1 with GraphQL
  • Frontend: Next.js 15 (existing, updated to consume GraphQL)
  • Compute: AWS EC2 (single instance to start, Auto Scaling later)
  • Background Jobs: Sidekiq + Redis (local or ElastiCache)
  • Load Balancing: Application Load Balancer (ALB, later phase)
  • Emails: transactional email provider (Resend/Postmark) for notifications

New Capabilities:

  • ✅ Real-time data updates without rebuilding the site
  • ✅ GraphQL API for flexible data querying
  • ✅ Statistics Canada (StatsCan) integration for population, GDP, census data
  • ✅ Email notification system when data is updated/corrected
  • ✅ Admin dashboard for data management
  • ✅ Automated StatsCan data syncing (Sidekiq cron jobs)
  • ✅ Better scalability and data integrity
  • ✅ Professional production-grade architecture

Current Architecture Analysis

Data Storage

  • Format: JSON/CSV files in /data directory
  • Structure: Hierarchical (Province → Year → Department → Programs → Line Items)
  • Size: ~28 departments per province, multiple years, deeply nested spending data
  • Flow: Build-time file reading (fs.readFileSync()) → Static Site Generation (SSG)

Current Data Entities

  1. Jurisdictions (provinces/municipalities)
  2. Financial Years (year-specific totals and metadata)
  3. Departments/Ministries (per jurisdiction-year)
  4. Spending Items (hierarchical tree, 3-5 levels deep)
  5. Revenue Items (hierarchical breakdown)
  6. Editorial Content (markdown descriptions, methodology, credits)

Current Tech Stack

  • Next.js 15 (App Router) + React 19 + TypeScript
  • Data visualization: D3.js (Sankey) + Recharts
  • No current database or API layer
  • Server Components with file system reads

Limitations

  • ❌ Data updates require full site rebuild
  • ❌ No ability to query or filter data dynamically
  • ❌ No population/economic context (per-capita calculations return null)
  • ❌ Manual data entry process
  • ❌ No notifications when data changes
  • ❌ Doesn't scale with data growth

Architecture Decision: AWS RDS + Ruby on Rails + GraphQL

Why This Stack?

PostgreSQL on AWS RDS

  • ✅ Fully managed, production-grade PostgreSQL 15
  • ✅ Automatic backups, point-in-time recovery (PITR)
  • ✅ Multi-AZ deployment for 99.95% uptime SLA
  • ✅ Scalable (start db.t3.medium, scale to db.m5.large+)
  • ✅ VPC isolation for security
  • ✅ No vendor lock-in (standard Postgres, can export/migrate)
  • ✅ Read replicas for heavy analytics
  • ✅ Excellent for complex queries (recursive CTEs, window functions, JSONB)

Ruby on Rails 7.1

  • ✅ Mature, battle-tested web framework (18+ years)
  • ✅ Convention over configuration → rapid development
  • ✅ ActiveRecord ORM → elegant database queries
  • ✅ Built-in authentication (Devise gem)
  • ✅ Background jobs (ActiveJob + Sidekiq)
  • ✅ Large ecosystem of gems
  • ✅ Strong MVC patterns
  • ✅ Easy deployment (Capistrano, Docker)

GraphQL API (graphql-ruby gem)

  • ✅ Flexible querying (clients request exactly what they need)
  • ✅ Perfect for hierarchical spending data (nested queries)
  • ✅ Single endpoint (vs. dozens of REST routes)
  • ✅ Type-safe schema with introspection
  • ✅ Excellent Next.js integration (Apollo Client, urql)
  • ✅ Reduces over-fetching and under-fetching
  • ✅ Query batching (via graphql-batch) and caching support

AWS EC2 + Auto Scaling

  • ✅ Full control over application server
  • ✅ Auto Scaling Groups scale with traffic
  • ✅ Load Balancer distributes traffic
  • ✅ Standard deployment target
  • ✅ Can run Rails + Sidekiq workers

Sidekiq (Background Jobs)

  • ✅ Efficient multi-threaded job processing
  • ✅ Perfect for StatsCan API syncing (scheduled jobs)
  • ✅ Email notifications via ActionMailer
  • ✅ Retry logic, job monitoring
  • ✅ Web UI for job inspection

AWS SES (Email) — considered; the simplified plan below uses Resend/Postmark instead (see Email Service section)

  • ✅ Transactional email service
  • ✅ High deliverability rates
  • ✅ Pay-as-you-go ($0.10 per 1,000 emails)
  • ✅ SMTP integration with ActionMailer

PostgreSQL Advantages

  • JSONB columns: Store nested spending trees efficiently
  • Recursive CTEs: Query hierarchical data elegantly
  • Materialized views: Cache complex aggregations (Sankey data)
  • Full-text search: Search spending items (future feature)
  • Triggers & functions: Auto-update totals, send notifications
  • Window functions: Advanced analytics (year-over-year growth, rankings)
  • PostGIS extension: Geographic analysis (future)
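As one illustration of the window-function point above, year-over-year spending growth can be computed directly against the financial_years table defined later in this document. This is a sketch, not final query code; the column names match the schema below:

```sql
-- Year-over-year spending growth per jurisdiction (window function sketch,
-- using the financial_years table from the "Database Schema" section)
SELECT
  jurisdiction_id,
  year,
  total_spending,
  total_spending - LAG(total_spending) OVER w AS yoy_change,
  ROUND(
    100.0 * (total_spending - LAG(total_spending) OVER w)
          / NULLIF(LAG(total_spending) OVER w, 0), 2
  ) AS yoy_pct
FROM financial_years
WINDOW w AS (PARTITION BY jurisdiction_id ORDER BY year);
```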

AWS Infrastructure Architecture (Simplified)

High-Level Diagram

┌──────────────────────────────────────────────────────────┐
│                       Internet                           │
└────────────────────┬─────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
┌───────────────────┐   ┌─────────────────────────┐
│   Next.js App     │   │   Single EC2 Instance   │
│   (Vercel/CDN)    │   │   (your-domain.com)     │
│                   │   │                         │
│   - SSR/SSG       │   │   Rails 7 + GraphQL     │
│   - Apollo Client │───┤   + Sidekiq (same box)  │
│                   │   │   + Nginx/Puma          │
└───────────────────┘   │   + Redis (local)       │
                        └──────────┬──────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
                    ▼              ▼              ▼
              ┌──────────┐   ┌─────────┐   ┌────────────┐
              │   RDS    │   │  Redis  │   │  StatsCan  │
              │ Postgres │   │ (local) │   │    API     │
              │          │   │   OR    │   │ (External) │
              │ Public   │   │MemoryS- │   └────────────┘
              │ Access   │   │  tore   │
              └──────────┘   └─────────┘

Infrastructure Components

Single EC2 Instance (All-in-One)

  • Instance Type: t3.medium (2 vCPU, 4GB RAM) or t3.large (2 vCPU, 8GB RAM)
  • OS: Ubuntu 22.04 LTS
  • Software Stack:
    • Ruby 3.2 + Rails 7.1
    • GraphQL API (graphql-ruby)
    • Nginx (reverse proxy, SSL termination with Let's Encrypt)
    • Puma (Rails app server)
    • Sidekiq (background jobs)
    • Redis (local install or in-memory)
  • Public IP: Elastic IP for stable DNS pointing
  • Storage: 50GB SSD (expand as needed)
  • Deployment: Git-based (Capistrano) or Docker

RDS PostgreSQL (Public Access)

  • Instance Type: db.t3.small (2 vCPU, 2GB RAM) → start, scale to db.t3.medium if needed
  • Storage: 50GB GP3 SSD, auto-scaling to 200GB
  • Multi-AZ: Optional (adds cost, but provides failover)
  • Backup: Automated daily snapshots, 7-day retention
  • Version: PostgreSQL 15.x
  • Public Access: YES - accessible from internet (read-only users)
  • Encryption: At rest (default AWS encryption)
  • Security:
    • PostgreSQL firewall rules (allow specific IPs or 0.0.0.0/0 with strong passwords)
    • Separate read-only user for public queries
    • Admin user only for EC2 instance

Redis (Local or AWS)

Option A: Local Redis on EC2 (Simplest, $0)

  • Install Redis via apt
  • Use for Rails cache + Sidekiq job queue
  • Persists to disk

Option B: AWS ElastiCache (Managed) (~$15/month)

  • cache.t4g.micro (0.5GB)
  • If you want managed Redis

Recommendation: Start with local Redis on EC2

Email Service (NOT AWS SES)

Options:

  • Resend (resend.com) - 3,000 emails/month free
  • Postmark - 100 emails/month free, then $1.50/1K
  • SendGrid - 100 emails/day free
  • Mailgun - 5,000 emails/month free for 3 months
  • Self-hosted SMTP (Gmail, etc.) - Use with caution

Recommendation: Resend or Postmark for transactional emails

DNS (Managed Externally)

  • You manage DNS elsewhere (Cloudflare, Namecheap, etc.)
  • Point api.canadaspends.ca → EC2 Elastic IP
  • SSL via Let's Encrypt (free, auto-renewing with Certbot)

CloudWatch (Optional)

  • Basic EC2 monitoring included free
  • Optional: Detailed monitoring, custom metrics (~$5/month)

Security Configuration (Simplified)

EC2 Security Group:

Inbound:
  - HTTP (80) from 0.0.0.0/0      # Redirect to HTTPS
  - HTTPS (443) from 0.0.0.0/0    # GraphQL API
  - SSH (22) from YOUR_IP         # Admin access only

Outbound:
  - All traffic (0.0.0.0/0)       # For StatsCan API, package updates, etc.

RDS Security Group:

Inbound:
  - PostgreSQL (5432) from 0.0.0.0/0   # Public read access
  OR
  - PostgreSQL (5432) from EC2 security group only (if you don't want fully public)

Outbound:
  - None required

Database Users:

-- Admin user (EC2 only, full access)
-- Note: RDS does not permit SUPERUSER; grant the rds_superuser role instead
CREATE USER admin WITH PASSWORD 'strong_password' CREATEDB CREATEROLE;
GRANT rds_superuser TO admin;

-- Public read-only user (for external queries)
CREATE USER public_reader WITH PASSWORD 'public_password';
GRANT CONNECT ON DATABASE canadaspends TO public_reader;
GRANT USAGE ON SCHEMA public TO public_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO public_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO public_reader;
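Since this role is reachable from the internet, it may be worth adding resource guard rails on top of the read-only grants. A sketch; the specific limits (20 connections, 5-second timeout) are illustrative assumptions, not measured values:

```sql
-- Optional hardening for the internet-facing read-only role
ALTER ROLE public_reader CONNECTION LIMIT 20;                     -- cap concurrent connections
ALTER ROLE public_reader SET statement_timeout = '5s';            -- abort runaway queries
ALTER ROLE public_reader SET default_transaction_read_only = on;  -- belt-and-suspenders read-only
```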

IAM Roles:

  • EC2 Role: CloudWatch logs (optional), S3 for backups (future)

Estimated AWS Costs (Monthly, USD)

Service                   Configuration                    Monthly Cost
EC2 (single instance)     t3.medium (2 vCPU, 4GB RAM)      ~$30
RDS PostgreSQL            db.t3.small, 50GB, Single-AZ     ~$25
Elastic IP                1 address (free when attached)   $0
Data Transfer             ~20GB egress (modest usage)      ~$2
Snapshots/Backups         ~10GB RDS backups                ~$1
Email (Resend/Postmark)   Free tier or ~$1-5               ~$2
CloudWatch (optional)     Basic monitoring                 $0-$2
Total                                                      ~$60/month

Optional Add-ons:

  • RDS Multi-AZ: +$25/month (high availability failover)
  • Larger instances: t3.large EC2 (+$30), db.t3.medium RDS (+$25)
  • ElastiCache Redis: cache.t4g.micro (+$15/month)

Cost Optimization:

  • Use Reserved Instances for EC2 & RDS → ~40% savings (total drops to roughly $36/month)
  • Start small: Single-AZ RDS, t3.small/medium instances
  • Scale up later: Easy to upgrade instance types as traffic grows

Database Schema

Core CanadaSpends Tables

jurisdictions

CREATE TABLE jurisdictions (
  id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug             VARCHAR(100) UNIQUE NOT NULL,  -- 'ontario', 'toronto'
  name             VARCHAR(200) NOT NULL,          -- 'Ontario', 'Toronto'
  type             VARCHAR(20) NOT NULL,           -- 'provincial', 'municipal'
  province         VARCHAR(100),                   -- For municipalities
  created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_jurisdictions_slug ON jurisdictions(slug);
CREATE INDEX idx_jurisdictions_type ON jurisdictions(type);

financial_years

CREATE TABLE financial_years (
  id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  jurisdiction_id       UUID NOT NULL REFERENCES jurisdictions(id) ON DELETE CASCADE,
  year                  VARCHAR(20) NOT NULL,  -- '2023-24'
  total_spending        DECIMAL(15,2),         -- In billions
  total_revenue         DECIMAL(15,2),
  total_employees       INTEGER,
  net_debt              DECIMAL(15,2),
  total_debt            DECIMAL(15,2),
  debt_interest         DECIMAL(15,2),
  source_url            TEXT,
  methodology           TEXT,                  -- Markdown
  credits               TEXT,                  -- Markdown
  generated_at          TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  UNIQUE(jurisdiction_id, year)
);

CREATE INDEX idx_financial_years_jurisdiction ON financial_years(jurisdiction_id);
CREATE INDEX idx_financial_years_year ON financial_years(year);

sources

CREATE TABLE sources (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  label             VARCHAR(500) NOT NULL,
  url               TEXT NOT NULL,
  scope             VARCHAR(200),
  sort_order        INTEGER,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_sources_financial_year ON sources(financial_year_id);

departments

CREATE TABLE departments (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  slug              VARCHAR(200) NOT NULL,
  name              VARCHAR(500) NOT NULL,
  total_spending    DECIMAL(15,2),
  percentage        DECIMAL(5,2),
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  UNIQUE(financial_year_id, slug)
);

CREATE INDEX idx_departments_year ON departments(financial_year_id);
CREATE INDEX idx_departments_slug ON departments(slug);

spending_items (Adjacency List for Hierarchy)

CREATE TABLE spending_items (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  department_id UUID NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
  parent_id     UUID REFERENCES spending_items(id) ON DELETE CASCADE,  -- NULL for root
  name          TEXT NOT NULL,
  amount        DECIMAL(15,2),
  level         INTEGER NOT NULL,  -- 0=root, 1=program, 2=sub-program, etc.
  sort_order    INTEGER,
  path          TEXT[],            -- Materialized path: ['health', 'hospitals', 'capital']
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_spending_items_department ON spending_items(department_id);
CREATE INDEX idx_spending_items_parent ON spending_items(parent_id);
CREATE INDEX idx_spending_items_path ON spending_items USING GIN(path);
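The adjacency list above pairs naturally with a recursive CTE for pulling a department's full tree. A sketch; `:department_id` is a bind-parameter placeholder, not literal SQL:

```sql
-- Fetch a department's entire spending tree, roots first
WITH RECURSIVE tree AS (
  SELECT id, parent_id, name, amount, level, sort_order
  FROM spending_items
  WHERE department_id = :department_id   -- placeholder for the target department
    AND parent_id IS NULL
  UNION ALL
  SELECT si.id, si.parent_id, si.name, si.amount, si.level, si.sort_order
  FROM spending_items si
  JOIN tree t ON si.parent_id = t.id
)
SELECT * FROM tree
ORDER BY level, sort_order;
```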

revenue_items

CREATE TABLE revenue_items (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  parent_id         UUID REFERENCES revenue_items(id) ON DELETE CASCADE,
  name              TEXT NOT NULL,
  amount            DECIMAL(15,2),
  level             INTEGER NOT NULL,
  sort_order        INTEGER,
  path              TEXT[],
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_revenue_items_year ON revenue_items(financial_year_id);
CREATE INDEX idx_revenue_items_parent ON revenue_items(parent_id);
CREATE INDEX idx_revenue_items_path ON revenue_items USING GIN(path);

department_content (Editorial CMS)

CREATE TABLE department_content (
  id                          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  department_id               UUID UNIQUE NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
  intro_text                  TEXT,
  description_text            TEXT,
  role_text                   TEXT,
  programs_heading            VARCHAR(200),
  programs_description        TEXT,
  leadership_heading          VARCHAR(200),
  leadership_description      TEXT,
  priorities_heading          VARCHAR(200),
  priorities_description      TEXT,
  agencies_heading            VARCHAR(200),
  agencies_description        TEXT,
  budget_projections_text     TEXT,
  created_at                  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at                  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_department_content_dept ON department_content(department_id);

Statistics Canada Integration Tables

statscan_geography

CREATE TABLE statscan_geography (
  id                      SERIAL PRIMARY KEY,
  sgc_code                VARCHAR(7) UNIQUE NOT NULL,  -- Standard Geographical Classification
  geo_level               VARCHAR(20) NOT NULL,        -- 'canada', 'province', 'division', 'subdivision'
  parent_sgc_code         VARCHAR(7),
  official_name           VARCHAR(200) NOT NULL,
  name_en                 VARCHAR(200),
  name_fr                 VARCHAR(200),
  geo_type                VARCHAR(50),                 -- 'City', 'Town', 'Municipality'
  province_code           VARCHAR(2),
  census_division_code    VARCHAR(4),
  latitude                DECIMAL(10,7),
  longitude               DECIMAL(10,7),
  vintage                 INTEGER,                     -- Census year (2021, 2016)
  created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (parent_sgc_code) REFERENCES statscan_geography(sgc_code)
);

CREATE INDEX idx_statscan_geo_province ON statscan_geography(province_code);
CREATE INDEX idx_statscan_geo_level ON statscan_geography(geo_level);
CREATE INDEX idx_statscan_geo_parent ON statscan_geography(parent_sgc_code);

statscan_population

CREATE TABLE statscan_population (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,
  ref_date              DATE NOT NULL,            -- July 1 for annual, quarter end for quarterly
  frequency             VARCHAR(10) NOT NULL,     -- 'annual', 'quarterly'
  population            BIGINT NOT NULL,
  age_group             VARCHAR(20),              -- NULL for total, '0-4', '5-9', etc.
  gender                VARCHAR(10),              -- NULL for total, 'Male', 'Female'
  vector_id             VARCHAR(15),              -- StatsCan vector ID
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);

-- UNIQUE constraints cannot contain expressions; enforce this with a unique expression index
CREATE UNIQUE INDEX idx_statscan_pop_unique ON statscan_population
  (sgc_code, ref_date, frequency, COALESCE(age_group, ''), COALESCE(gender, ''));

CREATE INDEX idx_statscan_pop_geo_date ON statscan_population(sgc_code, ref_date);
CREATE INDEX idx_statscan_pop_vector ON statscan_population(vector_id);

statscan_gdp

CREATE TABLE statscan_gdp (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,      -- Province/territory only
  year                  INTEGER NOT NULL,
  price_basis           VARCHAR(30) NOT NULL,     -- 'current', 'constant_2017', 'chained_2017'
  indicator             VARCHAR(100) NOT NULL,    -- 'GDP at market prices', 'Final consumption'
  value_millions        DECIMAL(15,2),            -- Always in millions
  vector_id             VARCHAR(15),
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
  UNIQUE(sgc_code, year, price_basis, indicator)
);

CREATE INDEX idx_statscan_gdp_geo_year ON statscan_gdp(sgc_code, year);
CREATE INDEX idx_statscan_gdp_indicator ON statscan_gdp(indicator);

statscan_govt_spending

CREATE TABLE statscan_govt_spending (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,
  year                  INTEGER NOT NULL,
  govt_level            VARCHAR(30),              -- 'Provincial', 'Consolidated PTLG', 'Local'
  ccofog_code           VARCHAR(20),              -- Canadian Classification of Functions of Government
  function_name         VARCHAR(200),             -- 'Health', 'Education', 'Social protection'
  amount_millions       DECIMAL(15,2),
  per_capita            DECIMAL(10,2),            -- Pre-calculated by StatsCan
  vector_id             VARCHAR(15),
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
  UNIQUE(sgc_code, year, govt_level, ccofog_code)
);

CREATE INDEX idx_statscan_govt_spend_geo ON statscan_govt_spending(sgc_code, year);
CREATE INDEX idx_statscan_govt_spend_function ON statscan_govt_spending(function_name);

statscan_census_demographics

CREATE TABLE statscan_census_demographics (
  id              SERIAL PRIMARY KEY,
  sgc_code        VARCHAR(7) NOT NULL,
  census_year     INTEGER NOT NULL,             -- 2021, 2016, 2011
  indicator       VARCHAR(100) NOT NULL,        -- 'Median household income', 'Employment rate'
  category        VARCHAR(100),                 -- Additional breakdown
  value           DECIMAL(15,2),
  value_text      VARCHAR(500),                 -- For non-numeric values
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);

-- Expression-based uniqueness requires a unique index rather than a UNIQUE constraint
CREATE UNIQUE INDEX idx_statscan_census_unique ON statscan_census_demographics
  (sgc_code, census_year, indicator, COALESCE(category, ''));

CREATE INDEX idx_statscan_census_geo_year ON statscan_census_demographics(sgc_code, census_year);
CREATE INDEX idx_statscan_census_indicator ON statscan_census_demographics(indicator);

statscan_vectors

CREATE TABLE statscan_vectors (
  vector_id           VARCHAR(15) PRIMARY KEY,
  product_id          VARCHAR(10) NOT NULL,
  coordinate          VARCHAR(500),
  frequency           VARCHAR(20),
  scalar_factor_code  VARCHAR(2),
  scalar_description  VARCHAR(50),
  uom                 VARCHAR(100),             -- Unit of measure
  description         TEXT,
  last_update_date    TIMESTAMPTZ,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_statscan_vector_product ON statscan_vectors(product_id);

statscan_products

CREATE TABLE statscan_products (
  product_id        VARCHAR(10) PRIMARY KEY,
  title_en          TEXT,
  title_fr          TEXT,
  subject_code      VARCHAR(4),
  frequency         VARCHAR(20),
  start_date        DATE,
  end_date          DATE,
  url               VARCHAR(500),
  last_update_date  TIMESTAMPTZ,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

jurisdiction_statscan_mapping

CREATE TABLE jurisdiction_statscan_mapping (
  id                  SERIAL PRIMARY KEY,
  jurisdiction_slug   VARCHAR(100) UNIQUE NOT NULL,
  sgc_code            VARCHAR(7) NOT NULL,
  jurisdiction_type   VARCHAR(20),              -- 'provincial', 'municipal', 'federal'
  province_code       VARCHAR(2),
  confidence          VARCHAR(20),              -- 'exact', 'fuzzy', 'manual'
  notes               TEXT,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE RESTRICT
);

CREATE INDEX idx_jurisdiction_mapping_slug ON jurisdiction_statscan_mapping(jurisdiction_slug);

-- Example mappings
INSERT INTO jurisdiction_statscan_mapping (jurisdiction_slug, sgc_code, jurisdiction_type, province_code, confidence) VALUES
  ('ontario', '35', 'provincial', '35', 'exact'),
  ('toronto', '3520001', 'municipal', '35', 'exact'),
  ('vancouver', '5915022', 'municipal', '59', 'exact'),
  ('british-columbia', '59', 'provincial', '59', 'exact'),
  ('alberta', '48', 'provincial', '48', 'exact'),
  ('edmonton', '4811061', 'municipal', '48', 'exact');
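With these mappings in place, StatsCan figures can be joined to CanadaSpends jurisdictions by slug. A sketch of the intended lookup ('ontario' is just an example slug from the seed data above):

```sql
-- Latest annual total population for a jurisdiction, via the SGC mapping
SELECT m.jurisdiction_slug, p.ref_date, p.population
FROM jurisdiction_statscan_mapping m
JOIN statscan_population p ON p.sgc_code = m.sgc_code
WHERE m.jurisdiction_slug = 'ontario'
  AND p.frequency = 'annual'
  AND p.age_group IS NULL   -- total, not an age breakdown
  AND p.gender IS NULL      -- total, not a gender breakdown
ORDER BY p.ref_date DESC
LIMIT 1;
```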

Notification System Tables

users (Devise Authentication)

CREATE TABLE users (
  id                       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email                    VARCHAR(255) UNIQUE NOT NULL,
  encrypted_password       VARCHAR(255) NOT NULL,
  reset_password_token     VARCHAR(255) UNIQUE,
  reset_password_sent_at   TIMESTAMPTZ,
  remember_created_at      TIMESTAMPTZ,
  sign_in_count            INTEGER DEFAULT 0,
  current_sign_in_at       TIMESTAMPTZ,
  last_sign_in_at          TIMESTAMPTZ,
  current_sign_in_ip       INET,
  last_sign_in_ip          INET,
  role                     VARCHAR(50) DEFAULT 'viewer',  -- 'admin', 'editor', 'viewer'
  created_at               TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at               TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_reset_token ON users(reset_password_token);

notification_subscriptions

CREATE TABLE notification_subscriptions (
  id                  SERIAL PRIMARY KEY,
  user_id             UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  entity_type         VARCHAR(50) NOT NULL,     -- 'jurisdiction', 'department', 'all'
  entity_id           UUID,                     -- NULL for 'all'
  event_types         VARCHAR(50)[] NOT NULL,   -- ['data_updated', 'data_corrected']
  notification_method VARCHAR(20) DEFAULT 'email',  -- 'email', 'webhook'
  webhook_url         VARCHAR(500),
  is_active           BOOLEAN DEFAULT true,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_notif_subs_user ON notification_subscriptions(user_id);
CREATE INDEX idx_notif_subs_entity ON notification_subscriptions(entity_type, entity_id);
CREATE INDEX idx_notif_subs_active ON notification_subscriptions(is_active) WHERE is_active = true;

notification_logs

CREATE TABLE notification_logs (
  id                SERIAL PRIMARY KEY,
  subscription_id   INTEGER NOT NULL REFERENCES notification_subscriptions(id) ON DELETE CASCADE,
  entity_type       VARCHAR(50) NOT NULL,
  entity_id         UUID,
  event_type        VARCHAR(50) NOT NULL,
  change_summary    JSONB,                    -- Details of what changed
  sent_at           TIMESTAMPTZ,
  status            VARCHAR(20),              -- 'sent', 'failed', 'pending'
  error_message     TEXT,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_notif_logs_subscription ON notification_logs(subscription_id);
CREATE INDEX idx_notif_logs_created ON notification_logs(created_at DESC);
CREATE INDEX idx_notif_logs_status ON notification_logs(status);

data_change_audit

CREATE TABLE data_change_audit (
  id              SERIAL PRIMARY KEY,
  table_name      VARCHAR(100) NOT NULL,
  record_id       UUID NOT NULL,
  action          VARCHAR(20) NOT NULL,     -- 'insert', 'update', 'delete'
  changed_by      UUID REFERENCES users(id),  -- NULL for automated/system
  old_values      JSONB,
  new_values      JSONB,
  change_reason   TEXT,
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_audit_table_record ON data_change_audit(table_name, record_id);
CREATE INDEX idx_audit_created ON data_change_audit(created_at DESC);
CREATE INDEX idx_audit_user ON data_change_audit(changed_by);

data_sync_jobs

CREATE TABLE data_sync_jobs (
  id                    SERIAL PRIMARY KEY,
  sync_type             VARCHAR(50) NOT NULL,    -- 'population', 'gdp', 'census', 'govt_spending'
  product_id            VARCHAR(10),
  start_time            TIMESTAMPTZ NOT NULL,
  end_time              TIMESTAMPTZ,
  records_fetched       INTEGER,
  records_inserted      INTEGER,
  records_updated       INTEGER,
  errors                INTEGER,
  error_details         TEXT,
  status                VARCHAR(20),             -- 'running', 'success', 'failed', 'partial'
  statscan_release_date DATE,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_sync_jobs_type_time ON data_sync_jobs(sync_type, start_time DESC);
CREATE INDEX idx_sync_jobs_status ON data_sync_jobs(status);

Database Views & Materialized Views

v_jurisdiction_enriched (Real-time View)

CREATE VIEW v_jurisdiction_enriched AS
SELECT
  j.id,
  j.slug,
  j.name,
  j.type,
  fy.id AS financial_year_id,
  fy.year,
  fy.total_spending AS spending_billions,
  fy.total_revenue AS revenue_billions,
  g.sgc_code,
  g.province_code,
  g.geo_level,
  p.population,
  p.ref_date AS population_date,
  -- Per capita spending
  CASE
    WHEN p.population > 0 THEN (fy.total_spending * 1000000000) / p.population
    ELSE NULL
  END AS spending_per_capita,
  -- GDP data
  gdp.value_millions AS gdp_millions,
  -- Spending as % of GDP
  CASE
    WHEN gdp.value_millions > 0 THEN (fy.total_spending * 1000) / gdp.value_millions * 100
    ELSE NULL
  END AS spending_pct_gdp
FROM jurisdictions j
LEFT JOIN financial_years fy ON j.id = fy.jurisdiction_id
LEFT JOIN jurisdiction_statscan_mapping m ON j.slug = m.jurisdiction_slug
LEFT JOIN statscan_geography g ON m.sgc_code = g.sgc_code
-- Get most recent annual population for this jurisdiction/year
LEFT JOIN LATERAL (
  SELECT population, ref_date
  FROM statscan_population
  WHERE sgc_code = g.sgc_code
    AND frequency = 'annual'
    AND age_group IS NULL
    AND gender IS NULL
  ORDER BY ABS(EXTRACT(YEAR FROM ref_date) - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
  LIMIT 1
) p ON true
-- Get GDP for matching year
LEFT JOIN LATERAL (
  SELECT value_millions
  FROM statscan_gdp
  WHERE sgc_code = g.sgc_code
    AND price_basis = 'current'
    AND indicator = 'GDP at market prices'
  ORDER BY ABS(year - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
  LIMIT 1
) gdp ON true;
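A typical consumer query against this view, showing the derived per-capita and %-of-GDP columns (the 'ontario' slug is illustrative):

```sql
-- Enriched figures for one jurisdiction, most recent year first
SELECT slug, year, spending_billions, population,
       ROUND(spending_per_capita, 2) AS per_capita,
       ROUND(spending_pct_gdp, 1)   AS pct_of_gdp
FROM v_jurisdiction_enriched
WHERE slug = 'ontario'
ORDER BY year DESC;
```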

mv_sankey_data (Materialized View - Cached)

CREATE MATERIALIZED VIEW mv_sankey_data AS
SELECT
  fy.id AS financial_year_id,
  j.slug AS jurisdiction_slug,
  jsonb_build_object(
    'total', fy.total_spending,
    'spending', fy.total_spending,
    'revenue', fy.total_revenue,
    'spending_data', (
      SELECT jsonb_build_object(
        'name', 'Spending',
        'children', COALESCE(jsonb_agg(dept_tree.tree ORDER BY dept_tree.sort_order), '[]'::jsonb)
      )
      FROM (
        SELECT
          d.total_spending * 100.0 / fy.total_spending AS sort_order,
          jsonb_build_object(
            'name', d.name,
            'amount', d.total_spending,
            'children', (
              SELECT COALESCE(jsonb_agg(
                jsonb_build_object(
                  'name', si.name,
                  'amount', si.amount
                ) ORDER BY si.sort_order
              ), '[]'::jsonb)
              FROM spending_items si
              WHERE si.department_id = d.id AND si.parent_id IS NULL
            )
          ) AS tree
        FROM departments d
        WHERE d.financial_year_id = fy.id
      ) dept_tree
    ),
    'revenue_data', (
      SELECT jsonb_build_object(
        'name', 'Revenue',
        'children', COALESCE(jsonb_agg(
          jsonb_build_object(
            'name', ri.name,
            'amount', ri.amount
          ) ORDER BY ri.sort_order
        ), '[]'::jsonb)
      )
      FROM revenue_items ri
      WHERE ri.financial_year_id = fy.id AND ri.parent_id IS NULL
    )
  ) AS sankey_json
FROM financial_years fy
JOIN jurisdictions j ON fy.jurisdiction_id = j.id;

CREATE UNIQUE INDEX idx_mv_sankey_fy ON mv_sankey_data(financial_year_id);
CREATE INDEX idx_mv_sankey_slug ON mv_sankey_data(jurisdiction_slug);

-- Refresh command (run after data changes):
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sankey_data;

Database Triggers & Functions

Auto-update Department Totals

CREATE OR REPLACE FUNCTION update_department_total()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE departments
  SET
    total_spending = (
      SELECT COALESCE(SUM(amount), 0)
      FROM spending_items
      WHERE department_id = COALESCE(NEW.department_id, OLD.department_id)
        AND parent_id IS NULL
    ),
    updated_at = NOW()
  WHERE id = COALESCE(NEW.department_id, OLD.department_id);

  -- Return value is ignored for AFTER row triggers; return OLD on DELETE since NEW is null there
  IF TG_OP = 'DELETE' THEN
    RETURN OLD;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_dept_total
AFTER INSERT OR UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION update_department_total();

Audit Trail Trigger

CREATE OR REPLACE FUNCTION log_data_change()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'UPDATE' THEN
    INSERT INTO data_change_audit (table_name, record_id, action, old_values, new_values, created_at)
    VALUES (
      TG_TABLE_NAME,
      OLD.id,
      'update',
      to_jsonb(OLD),
      to_jsonb(NEW),
      NOW()
    );
  ELSIF TG_OP = 'DELETE' THEN
    INSERT INTO data_change_audit (table_name, record_id, action, old_values, created_at)
    VALUES (
      TG_TABLE_NAME,
      OLD.id,
      'delete',
      to_jsonb(OLD),
      NOW()
    );
  END IF;
  RETURN COALESCE(NEW, OLD);  -- NEW is NULL on DELETE
END;
$$ LANGUAGE plpgsql;

-- Apply to critical tables
CREATE TRIGGER trg_audit_financial_years
AFTER UPDATE OR DELETE ON financial_years
FOR EACH ROW EXECUTE FUNCTION log_data_change();

CREATE TRIGGER trg_audit_departments
AFTER UPDATE OR DELETE ON departments
FOR EACH ROW EXECUTE FUNCTION log_data_change();

CREATE TRIGGER trg_audit_spending_items
AFTER UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION log_data_change();

Notification Trigger

CREATE OR REPLACE FUNCTION queue_notification()
RETURNS TRIGGER AS $$
BEGIN
  -- Notify Sidekiq via pg_notify
  PERFORM pg_notify(
    'data_changed',
    json_build_object(
      'table', TG_TABLE_NAME,
      'record_id', NEW.id,
      'action', TG_OP,
      'timestamp', NOW()
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_notify_financial_years
AFTER UPDATE ON financial_years
FOR EACH ROW EXECUTE FUNCTION queue_notification();

CREATE TRIGGER trg_notify_departments
AFTER UPDATE ON departments
FOR EACH ROW EXECUTE FUNCTION queue_notification();
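
On the consuming side, a LISTEN loop would parse each pg_notify payload and enqueue the matching background job. A sketch of just the payload routing (the dispatch tuples are illustrative, matching the job names used later in this plan):

```ruby
require 'json'

# Parse a payload produced by queue_notification() and decide which
# follow-up job to enqueue. Returns [job_name, entity_type, record_id],
# or nil for tables we don't route.
def route_notification(payload)
  event = JSON.parse(payload)
  case event['table']
  when 'financial_years'
    ['DataChangeNotificationJob', 'financial_year', event['record_id']]
  when 'departments'
    ['DataChangeNotificationJob', 'department', event['record_id']]
  end
end
```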

Ruby on Rails Application Architecture

Rails Stack

Gemfile (key gems):

# Gemfile
source 'https://rubygems.org'
ruby '3.2.2'

# Core
gem 'rails', '~> 7.1.0'
gem 'pg', '~> 1.5'
gem 'puma', '~> 6.0'

# GraphQL
gem 'graphql', '~> 2.2'
gem 'graphql-batch'  # batched loading to avoid N+1 queries
gem 'batch-loader'   # provides BatchLoader::GraphQL, used in the type resolvers below

# Authentication & Authorization
gem 'devise', '~> 4.9'
gem 'pundit', '~> 2.3'

# Background Jobs
gem 'sidekiq', '~> 7.2'
gem 'sidekiq-cron', '~> 1.12'  # Scheduled jobs

# Redis
gem 'redis', '~> 5.0'
gem 'hiredis', '~> 0.6'

# HTTP Client (for StatsCan API)
gem 'faraday', '~> 2.8'
gem 'faraday-retry'

# Performance
gem 'rack-cors'  # CORS for Next.js
gem 'bootsnap', require: false

# Development/Test
group :development, :test do
  gem 'rspec-rails'
  gem 'factory_bot_rails'
  gem 'faker'
  gem 'pry-rails'
end

Authentication (Devise)

User Model:

# app/models/user.rb
class User < ApplicationRecord
  devise :database_authenticatable, :registerable,
         :recoverable, :rememberable, :validatable

  enum role: { viewer: 0, editor: 1, admin: 2 }

  has_many :notification_subscriptions, dependent: :destroy

  def can_edit?
    editor? || admin?
  end

  def can_admin?
    admin?
  end
end

Authorization (Pundit)

# app/policies/financial_year_policy.rb
class FinancialYearPolicy < ApplicationPolicy
  def index?
    true  # Public read
  end

  def show?
    true  # Public read
  end

  def update?
    user&.can_edit?
  end

  def create?
    user&.can_admin?
  end

  def destroy?
    user&.can_admin?
  end
end
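
The role logic behind these policies is plain Ruby and can be exercised without loading Rails or Pundit. A sketch with a stand-in user (StubUser and update_allowed? are illustrative names):

```ruby
# Stand-in for the Devise User with the same role predicate, to show how
# FinancialYearPolicy#update? resolves per role.
StubUser = Struct.new(:role) do
  def can_edit?
    [:editor, :admin].include?(role)
  end
end

# Mirrors update?: truthy only for editors/admins, false for nil users.
def update_allowed?(user)
  !!(user && user.can_edit?)
end
```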

ActiveRecord Models

# app/models/jurisdiction.rb
class Jurisdiction < ApplicationRecord
  self.inheritance_column = nil  # 'type' column holds provincial/municipal, not STI

  has_many :financial_years, dependent: :destroy
  has_one :statscan_mapping, class_name: 'JurisdictionStatscanMapping',
          primary_key: :slug, foreign_key: :jurisdiction_slug

  validates :slug, presence: true, uniqueness: true
  validates :name, presence: true
  validates :type, inclusion: { in: %w[provincial municipal] }

  def latest_year
    financial_years.order(year: :desc).first
  end

  def population
    statscan_mapping&.latest_population
  end

  def spending_per_capita
    return nil unless latest_year && population
    (latest_year.total_spending * 1_000_000_000) / population
  end
end
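
The 1_000_000_000 multiplier implies total_spending is stored in billions of dollars. A worked check of the per-capita arithmetic, extracted as a standalone function:

```ruby
# spending_per_capita as computed in Jurisdiction: scale the stored
# billions figure up to dollars before dividing by population.
def spending_per_capita(total_spending_billions, population)
  return nil unless total_spending_billions && population
  (total_spending_billions * 1_000_000_000) / population
end
```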

# app/models/financial_year.rb
class FinancialYear < ApplicationRecord
  belongs_to :jurisdiction
  has_many :departments, dependent: :destroy
  has_many :sources, dependent: :destroy
  has_many :revenue_items, dependent: :destroy

  validates :year, presence: true, uniqueness: { scope: :jurisdiction_id }

  after_update :refresh_sankey_cache, if: :saved_change_to_total_spending?
  after_update :send_notifications

  def sankey_data
    Rails.cache.fetch("sankey_#{id}", expires_in: 1.hour) do
      SankeyDataBuilder.new(self).build
    end
  end

  private

  def refresh_sankey_cache
    RefreshSankeyMaterializedViewJob.perform_later
  end

  def send_notifications
    DataChangeNotificationJob.perform_later('financial_year', id, 'updated')
  end
end

# app/models/department.rb
class Department < ApplicationRecord
  belongs_to :financial_year
  has_many :spending_items, dependent: :destroy
  has_one :content, class_name: 'DepartmentContent', dependent: :destroy

  validates :slug, presence: true, uniqueness: { scope: :financial_year_id }

  def root_spending_items
    spending_items.where(parent_id: nil).order(:sort_order)
  end
end

# app/models/spending_item.rb
class SpendingItem < ApplicationRecord
  belongs_to :department
  belongs_to :parent, class_name: 'SpendingItem', optional: true
  has_many :children, class_name: 'SpendingItem', foreign_key: :parent_id

  before_save :set_level_and_path

  private

  def set_level_and_path
    if parent
      self.level = parent.level + 1
      self.path = parent.path + [name.parameterize]
    else
      self.level = 0
      self.path = [name.parameterize]
    end
  end
end
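
The level/path bookkeeping in set_level_and_path builds a materialized path of slugged names. A plain-Ruby sketch (slugify stands in for ActiveSupport's parameterize):

```ruby
# Simplified set_level_and_path: compute level and a materialized path
# of slugs from a parent record (nil for roots).
Item = Struct.new(:name, :level, :path)

def slugify(name)
  name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

def build_item(name, parent = nil)
  if parent
    Item.new(name, parent.level + 1, parent.path + [slugify(name)])
  else
    Item.new(name, 0, [slugify(name)])
  end
end
```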

StatsCan Models

# app/models/statscan/geography.rb
module Statscan
  class Geography < ApplicationRecord
    self.table_name = 'statscan_geography'

    has_many :population_records, class_name: 'Statscan::Population', foreign_key: :sgc_code, primary_key: :sgc_code
    has_many :gdp_records, class_name: 'Statscan::Gdp', foreign_key: :sgc_code, primary_key: :sgc_code

    def latest_population(year: Date.current.year)
      population_records.where(frequency: 'annual', age_group: nil, gender: nil)
                        .order(Arel.sql("ABS(EXTRACT(YEAR FROM ref_date) - #{year})"))
                        .first&.population
    end
  end
end

# app/models/statscan/population.rb
module Statscan
  class Population < ApplicationRecord
    self.table_name = 'statscan_population'

    belongs_to :geography, class_name: 'Statscan::Geography', foreign_key: :sgc_code, primary_key: :sgc_code
  end
end
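
latest_population orders by absolute distance from the requested year. The same nearest-year selection in plain Ruby, which is what the `ABS(EXTRACT(YEAR FROM ref_date) - year)` ordering does in SQL:

```ruby
# Pick the population record whose year is closest to the target year.
PopRecord = Struct.new(:year, :population)

def closest_population(records, target_year)
  records.min_by { |r| (r.year - target_year).abs }&.population
end
```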

GraphQL API Schema

Schema Definition

# app/graphql/types/query_type.rb
module Types
  class QueryType < Types::BaseObject
    # Jurisdictions
    field :jurisdictions, [Types::JurisdictionType], null: false,
          description: "List all jurisdictions (provinces and municipalities)"

    field :jurisdiction, Types::JurisdictionType, null: false do
      argument :slug, String, required: true
    end

    # Financial Years
    field :financial_year, Types::FinancialYearType, null: false do
      argument :jurisdiction_slug, String, required: true
      argument :year, String, required: false
      description "Get financial year data. If year is not provided, returns latest year."
    end

    # Departments
    field :department, Types::DepartmentType, null: false do
      argument :jurisdiction_slug, String, required: true
      argument :department_slug, String, required: true
      argument :year, String, required: false
    end

    # StatsCan Data
    field :population_data, [Types::PopulationDataType], null: false do
      argument :sgc_code, String, required: true
      argument :years, Integer, required: false, default_value: 5
    end

    # Resolvers
    def jurisdictions
      Jurisdiction.all.order(:name)
    end

    def jurisdiction(slug:)
      Jurisdiction.find_by!(slug: slug)
    end

    def financial_year(jurisdiction_slug:, year: nil)
      jurisdiction = Jurisdiction.find_by!(slug: jurisdiction_slug)
      query = jurisdiction.financial_years

      if year
        query.find_by!(year: year)
      else
        query.order(year: :desc).first!
      end
    end

    def department(jurisdiction_slug:, department_slug:, year: nil)
      fy = financial_year(jurisdiction_slug: jurisdiction_slug, year: year)
      fy.departments.find_by!(slug: department_slug)
    end

    def population_data(sgc_code:, years:)
      Statscan::Population.where(sgc_code: sgc_code, frequency: 'annual', age_group: nil, gender: nil)
                          .order(ref_date: :desc)
                          .limit(years)
    end
  end
end

GraphQL Types

# app/graphql/types/jurisdiction_type.rb
module Types
  class JurisdictionType < Types::BaseObject
    field :id, ID, null: false
    field :slug, String, null: false
    field :name, String, null: false
    field :type, String, null: false
    field :province, String, null: true

    # Associations
    field :financial_years, [Types::FinancialYearType], null: false
    field :latest_year, Types::FinancialYearType, null: true

    # StatsCan Enrichment
    field :sgc_code, String, null: true
    field :population, Integer, null: true
    field :spending_per_capita, Float, null: true

    def latest_year
      object.latest_year
    end

    def sgc_code
      object.statscan_mapping&.sgc_code
    end

    def population
      BatchLoader::GraphQL.for(object.id).batch do |ids, loader|
        Jurisdiction.includes(statscan_mapping: :geography).where(id: ids).each do |j|
          pop = j.statscan_mapping&.geography&.latest_population
          loader.call(j.id, pop)
        end
      end
    end

    def spending_per_capita
      object.spending_per_capita
    end
  end
end

# app/graphql/types/financial_year_type.rb
module Types
  class FinancialYearType < Types::BaseObject
    field :id, ID, null: false
    field :year, String, null: false
    field :total_spending, Float, null: false
    field :total_revenue, Float, null: true
    field :total_employees, Integer, null: true
    field :net_debt, Float, null: true
    field :total_debt, Float, null: true
    field :debt_interest, Float, null: true
    field :methodology, String, null: true
    field :credits, String, null: true

    field :departments, [Types::DepartmentType], null: false
    field :sources, [Types::SourceType], null: false
    field :sankey_data, GraphQL::Types::JSON, null: true

    def sankey_data
      object.sankey_data
    end
  end
end

# app/graphql/types/department_type.rb
module Types
  class DepartmentType < Types::BaseObject
    field :id, ID, null: false
    field :slug, String, null: false
    field :name, String, null: false
    field :total_spending, Float, null: false
    field :percentage, Float, null: true

    field :spending_items, [Types::SpendingItemType], null: false do
      argument :parent_id, ID, required: false
    end

    field :content, Types::DepartmentContentType, null: true

    def spending_items(parent_id: nil)
      if parent_id
        object.spending_items.where(parent_id: parent_id).order(:sort_order)
      else
        object.root_spending_items
      end
    end

    def content
      object.content
    end
  end
end

# app/graphql/types/spending_item_type.rb
module Types
  class SpendingItemType < Types::BaseObject
    field :id, ID, null: false
    field :name, String, null: false
    field :amount, Float, null: true
    field :level, Integer, null: false
    field :parent_id, ID, null: true

    # Recursive children
    field :children, [Types::SpendingItemType], null: false

    def children
      BatchLoader::GraphQL.for(object.id).batch(default_value: []) do |ids, loader|
        SpendingItem.where(parent_id: ids).order(:sort_order).each do |item|
          loader.call(item.parent_id) { |memo| memo << item }
        end
      end
    end
  end
end
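
The batch loader above avoids N+1 queries by collecting every parent id requested in the current response pass, issuing a single `WHERE parent_id IN (...)` query, and fanning the rows back out per parent. The fan-out step, sketched without the gem:

```ruby
# One-pass fan-out: given all requested parent ids and the rows returned
# by a single IN query, group children per parent, defaulting to an empty
# array (mirroring batch(default_value: [])).
Row = Struct.new(:id, :parent_id, :name)

def children_by_parent(parent_ids, rows)
  grouped = rows.group_by(&:parent_id)
  parent_ids.to_h { |pid| [pid, grouped.fetch(pid, [])] }
end
```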

Mutations

# app/graphql/types/mutation_type.rb
module Types
  class MutationType < Types::BaseObject
    field :update_financial_year, mutation: Mutations::UpdateFinancialYear
    field :create_department, mutation: Mutations::CreateDepartment
    field :subscribe_to_notifications, mutation: Mutations::SubscribeToNotifications
  end
end

# app/graphql/mutations/update_financial_year.rb
module Mutations
  class UpdateFinancialYear < BaseMutation
    argument :id, ID, required: true
    argument :total_spending, Float, required: false
    argument :total_revenue, Float, required: false
    argument :net_debt, Float, required: false

    field :financial_year, Types::FinancialYearType, null: true
    field :errors, [String], null: false

    def resolve(id:, **attributes)
      financial_year = FinancialYear.find(id)

      # Authorization
      raise Pundit::NotAuthorizedError unless FinancialYearPolicy.new(context[:current_user], financial_year).update?

      if financial_year.update(attributes)
        { financial_year: financial_year, errors: [] }
      else
        { financial_year: nil, errors: financial_year.errors.full_messages }
      end
    end
  end
end

Example GraphQL Queries (Client-Side)

# Get jurisdiction with latest year
query GetJurisdiction($slug: String!) {
  jurisdiction(slug: $slug) {
    id
    name
    slug
    type
    population
    spendingPerCapita
    latestYear {
      year
      totalSpending
      totalRevenue
      netDebt
      departments {
        id
        slug
        name
        totalSpending
        percentage
      }
      sankeyData
    }
  }
}

# Get department with nested spending (3 levels)
query GetDepartment($jurisdictionSlug: String!, $departmentSlug: String!) {
  department(
    jurisdictionSlug: $jurisdictionSlug
    departmentSlug: $departmentSlug
  ) {
    id
    name
    totalSpending
    spendingItems {
      id
      name
      amount
      children {
        id
        name
        amount
        children {
          id
          name
          amount
        }
      }
    }
    content {
      introText
      descriptionText
      programsDescription
    }
  }
}

# Get population trends
query GetPopulationTrends($sgcCode: String!, $years: Int!) {
  populationData(sgcCode: $sgcCode, years: $years) {
    refDate
    population
  }
}

# Update financial year (admin only)
mutation UpdateFinancialYear($id: ID!, $totalSpending: Float!) {
  updateFinancialYear(id: $id, totalSpending: $totalSpending) {
    financialYear {
      id
      totalSpending
    }
    errors
  }
}

Sidekiq Background Jobs

StatsCan Data Sync Jobs

# app/jobs/statscan/sync_population_job.rb
module Statscan
  class SyncPopulationJob < ApplicationJob
    queue_as :statscan_sync

    def perform(product_id = '1710000501', years = 10)
      sync_job = DataSyncJob.create!(
        sync_type: 'population',
        product_id: product_id,
        start_time: Time.current,
        status: 'running'
      )

      begin
        fetcher = Statscan::PopulationFetcher.new(product_id)
        results = fetcher.fetch_latest(years)

        sync_job.update!(
          records_fetched: results[:fetched],
          records_inserted: results[:inserted],
          records_updated: results[:updated],
          errors: results[:errors],
          status: results[:errors].zero? ? 'success' : 'partial',
          end_time: Time.current
        )

        Rails.logger.info "StatsCan population sync completed: #{results.inspect}"
      rescue => e
        sync_job.update!(
          status: 'failed',
          error_details: e.message,
          end_time: Time.current
        )
        raise
      end
    end
  end
end

# app/services/statscan/population_fetcher.rb
module Statscan
  class PopulationFetcher
    BASE_URL = 'https://www150.statcan.gc.ca/t1/wds/rest'

    def initialize(product_id)
      @product_id = product_id
    end

    def fetch_latest(years)
      vectors = province_vectors  # Mapping of SGC codes to vector IDs
      results = { fetched: 0, inserted: 0, updated: 0, errors: 0 }

      vectors.each do |sgc_code, vector_id|
        begin
          data = fetch_vector_data(vector_id, years)

          data.each do |point|
            pop = Statscan::Population.find_or_initialize_by(
              sgc_code: sgc_code,
              ref_date: point['refPer'],
              frequency: 'annual',
              age_group: nil,
              gender: nil
            )

            pop.assign_attributes(
              population: point['value'].to_i,
              vector_id: vector_id,
              statscan_update_date: Time.current
            )

            if pop.new_record?
              pop.save!
              results[:inserted] += 1
            elsif pop.changed?
              pop.save!
              results[:updated] += 1
            end

            results[:fetched] += 1
          end
        rescue => e
          Rails.logger.error "Error fetching vector #{vector_id}: #{e.message}"
          results[:errors] += 1
        end
      end

      results
    end

    private

    def fetch_vector_data(vector_id, periods)
      conn = Faraday.new(url: BASE_URL) do |f|
        f.request :json
        f.response :json
        f.adapter Faraday.default_adapter
      end

      response = conn.post('getDataFromVectorsAndLatestNPeriods') do |req|
        req.body = [{ vectorId: vector_id, latestN: periods }]
      end

      if response.success? && response.body['status'] == 'SUCCESS'
        response.body.dig('object', 0, 'vectorDataPoint') || []
      else
        raise "StatsCan API error: #{response.body}"
      end
    end

    def province_vectors
      {
        '35' => 'v466670',  # Ontario
        '59' => 'v466730',  # BC
        '48' => 'v466710',  # Alberta
        # Add more provinces...
      }
    end
  end
end
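
fetch_vector_data relies on the envelope shape shown in its success branch. The extraction logic, isolated so it can be exercised against a canned payload (the sample values below are made up; the field names follow the shape assumed by PopulationFetcher, not a verified copy of the WDS spec):

```ruby
require 'json'

# Extract data points from a getDataFromVectorsAndLatestNPeriods response
# body, mirroring the dig('object', 0, 'vectorDataPoint') in
# PopulationFetcher. Raises on a non-SUCCESS status.
def extract_data_points(body)
  parsed = body.is_a?(String) ? JSON.parse(body) : body
  raise "StatsCan API error: #{parsed}" unless parsed['status'] == 'SUCCESS'

  parsed.dig('object', 0, 'vectorDataPoint') || []
end
```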

Scheduled Jobs (Cron)

# config/initializers/sidekiq_cron.rb
require 'sidekiq/cron/job'

Sidekiq::Cron::Job.create(
  name: 'Sync StatsCan Population - Monthly',
  cron: '0 2 1 * *',  # 2 AM on 1st of each month
  class: 'Statscan::SyncPopulationJob'
)

Sidekiq::Cron::Job.create(
  name: 'Sync StatsCan GDP - Quarterly',
  cron: '0 3 1 */3 *',  # 3 AM on the 1st of Jan/Apr/Jul/Oct (every third month)
  class: 'Statscan::SyncGdpJob'
)

Sidekiq::Cron::Job.create(
  name: 'Refresh Sankey Materialized View - Daily',
  cron: '0 4 * * *',  # 4 AM daily
  class: 'RefreshSankeyMaterializedViewJob'
)

Notification Jobs

# app/jobs/data_change_notification_job.rb
class DataChangeNotificationJob < ApplicationJob
  queue_as :notifications

  def perform(entity_type, entity_id, event_type)
    entity = find_entity(entity_type, entity_id)
    subscriptions = NotificationSubscription.active
                      .where(entity_type: [entity_type, 'all'])
                      .where('? = ANY(event_types)', event_type)

    subscriptions.find_each do |subscription|
      next unless subscription.should_notify?(entity)

      case subscription.notification_method
      when 'email'
        send_email_notification(subscription, entity, event_type)
      when 'webhook'
        send_webhook_notification(subscription, entity, event_type)
      end
    end
  end

  private

  def send_email_notification(subscription, entity, event_type)
    log = NotificationLog.create!(
      subscription: subscription,
      entity_type: entity.class.name,
      entity_id: entity.id,
      event_type: event_type,
      change_summary: build_change_summary(entity),
      status: 'pending'
    )

    begin
      DataChangeMailer.notify(subscription.user, entity, event_type).deliver_now
      log.update!(status: 'sent', sent_at: Time.current)
    rescue => e
      log.update!(status: 'failed', error_message: e.message)
      raise
    end
  end

  def find_entity(type, id)
    type.camelize.constantize.find(id)  # 'financial_year' -> FinancialYear
  end

  def build_change_summary(entity)
    # Get changes from audit log
    audit = DataChangeAudit.where(table_name: entity.class.table_name, record_id: entity.id)
                           .order(created_at: :desc).first

    return {} unless audit

    {
      changes: audit.new_values.except('updated_at').select { |k, v|
        audit.old_values[k] != v
      }
    }
  end
end
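
build_change_summary reduces an audit row to just the fields that actually changed. The diff itself, exercised on plain hashes standing in for the JSONB columns:

```ruby
# Diff new_values against old_values, dropping updated_at -- the same
# selection build_change_summary performs on the audit record.
def changed_fields(old_values, new_values)
  new_values.except('updated_at').select { |k, v| old_values[k] != v }
end
```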

# app/mailers/data_change_mailer.rb
class DataChangeMailer < ApplicationMailer
  def notify(user, entity, event_type)
    @user = user
    @entity = entity
    @event_type = event_type
    @changes_url = "#{ENV['FRONTEND_URL']}/#{entity.class.table_name}/#{entity.id}"

    mail(
      to: user.email,
      subject: "Data Updated: #{entity_name(entity)}"
    )
  end

  helper_method :entity_name  # expose to the mailer view below

  private

  def entity_name(entity)
    case entity
    when FinancialYear
      "#{entity.jurisdiction.name} #{entity.year}"
    when Department
      entity.name
    else
      entity.to_s
    end
  end
end

Email Template

<!-- app/views/data_change_mailer/notify.html.erb -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    body { font-family: Arial, sans-serif; line-height: 1.6; }
    .container { max-width: 600px; margin: 0 auto; padding: 20px; }
    .header { background: #2563eb; color: white; padding: 20px; }
    .content { padding: 20px; background: #f9fafb; }
    .button { display: inline-block; background: #2563eb; color: white;
              padding: 12px 24px; text-decoration: none; border-radius: 4px; }
  </style>
</head>
<body>
  <div class="container">
    <div class="header">
      <h1>CanadaSpends Data Update</h1>
    </div>
    <div class="content">
      <p>Hello <%= @user.email %>,</p>

      <p>
        Data has been updated for <strong><%= entity_name(@entity) %></strong>.
      </p>

      <p>Event: <strong><%= @event_type.titleize %></strong></p>

      <p>
        <a href="<%= @changes_url %>" class="button">View Changes</a>
      </p>

      <p>
        <small>
          You're receiving this because you subscribed to notifications for this data.
          <a href="<%= ENV['FRONTEND_URL'] %>/profile/notifications">Manage subscriptions</a>
        </small>
      </p>
    </div>
  </div>
</body>
</html>

Migration Strategy

Phase 1: AWS Infrastructure Setup (Week 1)

  1. Launch EC2 Instance

    • Launch t3.medium Ubuntu 22.04 instance
    • Allocate and attach Elastic IP
    • Configure security group (ports 80, 443, 22)
    • SSH in and update system: sudo apt update && sudo apt upgrade -y
  2. Install Software Stack

    # Install Ruby 3.2
    sudo apt install -y ruby-full build-essential git
    
    # Install PostgreSQL client
    sudo apt install -y postgresql-client
    
    # Install Redis locally
    sudo apt install -y redis-server
    sudo systemctl enable redis-server
    
    # Install Nginx
    sudo apt install -y nginx
    
    # Install Certbot for Let's Encrypt SSL
    sudo apt install -y certbot python3-certbot-nginx
  3. Launch RDS PostgreSQL

    • Launch db.t3.small PostgreSQL 15 instance
    • Enable public access in RDS settings
    • Configure security group to allow port 5432 only from the EC2 instance's security group (avoid 0.0.0.0/0)
    • Create initial database canadaspends
    • Create admin user and read-only user
    • Enable automated backups (7-day retention)
  4. Configure DNS

    • Point your domain (e.g., api.canadaspends.ca) to Elastic IP
    • Wait for DNS propagation
  5. Set up SSL

    sudo certbot --nginx -d api.canadaspends.ca
    # Auto-renews via cron

Phase 2: Rails Application Setup (Week 2)

  1. Initialize Rails Project

    rails new canadaspends-api --api --database=postgresql
    cd canadaspends-api
  2. Database Migrations

    • Create all table migrations (jurisdictions, financial_years, etc.)
    • Create StatsCan tables
    • Create notification tables
    • Add indexes, triggers, views
  3. Configure Environment

    # .env.production
    DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/canadaspends
    REDIS_URL=redis://elasticache-endpoint:6379/0
    AWS_REGION=us-east-1
    SES_SMTP_ADDRESS=email-smtp.us-east-1.amazonaws.com
    FRONTEND_URL=https://www.canadaspends.ca

Phase 3: Data Migration (Week 2-3)

  1. Create Migration Scripts

    # lib/tasks/migrate_json_data.rake
    namespace :data do
      desc "Migrate JSON files to PostgreSQL"
      task migrate: :environment do
        DataMigrator.new.migrate_all
      end
    end
    
    # app/services/data_migrator.rb
    class DataMigrator
      def migrate_all
        migrate_jurisdictions
        migrate_financial_years
        migrate_departments
        migrate_spending_items
        validate_migration
      end
    
      def migrate_jurisdictions
        Dir.glob('data/provincial/*/').each do |path|
          slug = File.basename(path)
          # Parse summary.json, create Jurisdiction record
        end
      end
    
      # ... (detailed implementation)
    end
  2. Flatten Hierarchical Data

    def migrate_spending_items(department, spending_data, parent = nil, level = 0)
      spending_data['children']&.each_with_index do |item, index|
        spending_item = department.spending_items.create!(
          parent: parent,
          name: item['name'],
          amount: item['amount'],
          level: level,
          sort_order: index
        )
    
        # Recursively process children
        if item['children'].present?
          migrate_spending_items(department, item, spending_item, level + 1)
        end
      end
    end
  3. Validate Data Integrity

    • Compare totals from JSON vs. database
    • Check all relationships exist
    • Verify no data loss
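
The recursive flattening in step 2 can be exercised against a plain nested hash before running it against ActiveRecord. A sketch that collects rows instead of calling create! (flatten_spending and the row shape are illustrative):

```ruby
# Walk the JSON spending tree depth-first, emitting one flat row per node
# with level and sort_order -- the same shape migrate_spending_items writes.
def flatten_spending(node, parent_name = nil, level = 0, rows = [])
  (node['children'] || []).each_with_index do |item, index|
    rows << { name: item['name'], amount: item['amount'],
              parent: parent_name, level: level, sort_order: index }
    flatten_spending(item, item['name'], level + 1, rows)
  end
  rows
end
```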

Phase 4: StatsCan Integration (Week 3)

  1. Populate Geographic Reference Data

    • Load SGC 2021 codes into statscan_geography
    • Create jurisdiction mappings
  2. Initial StatsCan Data Sync

    • Run population sync for last 10 years
    • Run GDP sync
    • Run government spending sync (for validation)
  3. Test Scheduled Jobs

    • Verify Sidekiq cron jobs work
    • Check data freshness logic

Phase 5: GraphQL API Development (Week 4)

  1. Implement GraphQL Schema

    • All query types
    • Mutations for admin operations
    • Optimize with graphql-batch
  2. Add Authentication Middleware

    # app/graphql/canadaspends_schema.rb
    class CanadaSpendsSchema < GraphQL::Schema
      mutation(Types::MutationType)
      query(Types::QueryType)
    
      def self.unauthorized_object(error)
        raise GraphQL::ExecutionError, "Unauthorized"
      end
    end
    
    # app/controllers/graphql_controller.rb
    class GraphqlController < ApplicationController
      def execute
        context = {
          current_user: current_user
        }
    
        result = CanadaSpendsSchema.execute(
          params[:query],
          variables: params[:variables],
          context: context
        )
    
        render json: result
      end
    
      private
    
      def current_user
        token = request.headers['Authorization']&.split(' ')&.last
        return nil unless token
    
        # Verify JWT or session
        User.find_by(id: decode_token(token))
      end
    end

Phase 6: Deployment & Configuration (Week 5)

  1. Deploy Rails to EC2

    • Use Capistrano for git-based deployment
    • Configure Nginx reverse proxy (already set up with Certbot)
    • Configure Puma as systemd service
    • Set up Sidekiq as systemd service
  2. Nginx Configuration

    # /etc/nginx/sites-available/canadaspends-api
    upstream puma {
      server unix:///home/deploy/canadaspends-api/shared/tmp/sockets/puma.sock;
    }
    
    server {
      listen 80;
      server_name api.canadaspends.ca;
      return 301 https://$server_name$request_uri;
    }
    
    server {
      listen 443 ssl http2;
      server_name api.canadaspends.ca;
    
      ssl_certificate /etc/letsencrypt/live/api.canadaspends.ca/fullchain.pem;
      ssl_certificate_key /etc/letsencrypt/live/api.canadaspends.ca/privkey.pem;
    
      root /home/deploy/canadaspends-api/current/public;
    
      location / {
        proxy_pass http://puma;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
      }
    }
  3. Health Monitoring

    • Add /health endpoint in Rails
    • Monitor with CloudWatch (optional)
    • Or use external monitoring (UptimeRobot, Pingdom)

Phase 7: Next.js Frontend Updates (Week 5-6)

  1. Install Apollo Client

    pnpm add @apollo/client graphql
  2. Replace Data Fetching

    // src/lib/apollo-client.ts
    import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client'
    
    export const apolloClient = new ApolloClient({
      link: new HttpLink({
        uri: process.env.NEXT_PUBLIC_GRAPHQL_URL,
      }),
      cache: new InMemoryCache(),
    })
    
    // src/app/[lang]/(main)/[jurisdiction]/page.tsx
    import { gql } from '@apollo/client'
    import { apolloClient } from '@/lib/apollo-client'
    
    const GET_JURISDICTION = gql`
      query GetJurisdiction($slug: String!) {
        jurisdiction(slug: $slug) {
          id
          name
          population
          latestYear {
            totalSpending
            departments { id name totalSpending }
          }
        }
      }
    `
    
    export default async function JurisdictionPage({ params }) {
      const { jurisdiction } = await params  // params is async in Next.js 15
      const { data } = await apolloClient.query({
        query: GET_JURISDICTION,
        variables: { slug: jurisdiction },
      })
    
      return <JurisdictionView data={data.jurisdiction} />
    }

Phase 8: Admin Dashboard (Week 6)

  1. Build Admin UI

    • Option A: Rails Admin gem (quick)
    • Option B: Next.js admin routes + GraphQL mutations (flexible)
    • Features: CRUD for all entities, data import, user management
  2. Notification Subscription UI

    • User profile page
    • Subscribe/unsubscribe to entity changes
    • Email preferences

Phase 9: Testing & Optimization (Week 7)

  1. Performance Testing

    • Load test GraphQL endpoint (k6, Artillery)
    • Optimize slow queries
    • Add database indexes as needed
  2. Caching Implementation

    • Rails.cache for expensive queries
    • HTTP caching headers
    • CDN configuration for GraphQL (if applicable)
  3. Monitoring Setup

    • CloudWatch dashboards
    • Error tracking (Sentry, Rollbar)
    • Uptime monitoring

Phase 10: Launch & Monitoring (Week 8)

  1. Final Testing

    • Full QA testing on EC2 instance
    • Test all GraphQL queries
    • Verify StatsCan sync jobs running
    • Test email notifications
  2. Production Launch

    • Ensure DNS is pointed correctly
    • Update Next.js env vars (NEXT_PUBLIC_GRAPHQL_URL)
    • Deploy Next.js frontend with new API endpoint
    • Monitor for errors
  3. Post-Launch Monitoring

    • Monitor EC2 CPU/memory usage
    • Monitor RDS connections and queries
    • Check Sidekiq job success rates
    • Review email delivery stats
    • Gather user feedback
    • Optimize slow queries

Timeline Estimate

| Phase | Duration | Tasks |
|-------|----------|-------|
| Phase 1: AWS Setup | 1 week | EC2 launch, RDS setup, software install, SSL |
| Phase 2: Rails App | 1 week | Init project, migrations, models |
| Phase 3: Data Migration | 1-2 weeks | Scripts, flatten hierarchies, validate |
| Phase 4: StatsCan Integration | 1 week | Sync jobs, populate data |
| Phase 5: GraphQL API | 1 week | Schema, resolvers, auth |
| Phase 6: Deployment | 1 week | Capistrano, Nginx, systemd services |
| Phase 7: Frontend Updates | 1-2 weeks | Apollo Client, replace data layer |
| Phase 8: Admin Dashboard | 1 week | CRUD UI, subscriptions |
| Phase 9: Testing | 1 week | Performance, optimization |
| Phase 10: Launch | 1 week | Final testing, production deploy |
| **Total** | **10-12 weeks (2.5-3 months)** | |

Cost-Benefit Analysis

Costs

Development:

  • Developer time: 10-12 weeks at ~$100/hr = $40,000-48,000
  • Or internal team allocation

Infrastructure (Monthly):

  • AWS: $60/month ($36 with Reserved Instances)
  • Email (Resend/Postmark): ~$0-5/month (free tier or minimal)
  • Total: ~$60-65/month

Annual Infrastructure:

  • ~$720/year (on-demand)
  • ~$432/year (Reserved Instances)

Risks & Mitigation

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Migration data loss | High | Medium | Thorough validation, keep JSON backups, staged rollout |
| AWS costs exceed budget | Medium | Low | Use Reserved Instances, monitor with CloudWatch Alarms, set billing alerts |
| StatsCan API rate limits | Medium | Medium | Respect 25 req/sec limit, implement exponential backoff, cache aggressively |
| GraphQL N+1 queries | High | Medium | Use graphql-batch, database query monitoring, add indexes |
| Security breach | High | Low | Devise + Pundit auth, security groups, regular updates, penetration testing |
| Email deliverability | Low | Medium | Use AWS SES verified domain, DKIM/SPF records, monitor bounce rates |
| Timeline overruns | Medium | Medium | Prioritize MVP features, defer nice-to-haves, agile sprints |
