Database Migration: Move from JSON Files to AWS EC2 on Ruby on Rails #245

@NelsonLee-Code

Description

Database Migration: JSON Files → PostgreSQL (AWS RDS) + Ruby on Rails GraphQL API + StatsCan Integration

Overview

Migrate CanadaSpends from file-based JSON data storage to a professional full-stack architecture:

Infrastructure:

  • Database: PostgreSQL 15 on AWS RDS (Multi-AZ optional)
  • Backend API: Ruby on Rails 7.1 with GraphQL
  • Frontend: Next.js 15 (existing, updated to consume GraphQL)
  • Compute: AWS EC2 (single instance to start, Auto Scaling later)
  • Background Jobs: Sidekiq + Redis (local or ElastiCache)
  • Load Balancing: Application Load Balancer (ALB, later phase)
  • Emails: transactional email provider (Resend/Postmark) for notifications

New Capabilities:

  • ✅ Real-time data updates without rebuilding the site
  • ✅ GraphQL API for flexible data querying
  • ✅ Statistics Canada (StatsCan) integration for population, GDP, census data
  • ✅ Email notification system when data is updated/corrected
  • ✅ Admin dashboard for data management
  • ✅ Automated StatsCan data syncing (Sidekiq cron jobs)
  • ✅ Better scalability and data integrity
  • ✅ Professional production-grade architecture

Current Architecture Analysis

Data Storage

  • Format: JSON/CSV files in /data directory
  • Structure: Hierarchical (Province → Year → Department → Programs → Line Items)
  • Size: ~28 departments per province, multiple years, deeply nested spending data
  • Flow: Build-time file reading (fs.readFileSync()) → Static Site Generation (SSG)

Current Data Entities

  1. Jurisdictions (provinces/municipalities)
  2. Financial Years (year-specific totals and metadata)
  3. Departments/Ministries (per jurisdiction-year)
  4. Spending Items (hierarchical tree, 3-5 levels deep)
  5. Revenue Items (hierarchical breakdown)
  6. Editorial Content (markdown descriptions, methodology, credits)

Current Tech Stack

  • Next.js 15 (App Router) + React 19 + TypeScript
  • Data visualization: D3.js (Sankey) + Recharts
  • No current database or API layer
  • Server Components with file system reads

Limitations

  • ❌ Data updates require full site rebuild
  • ❌ No ability to query or filter data dynamically
  • ❌ No population/economic context (per-capita calculations return null)
  • ❌ Manual data entry process
  • ❌ No notifications when data changes
  • ❌ Doesn't scale with data growth

Architecture Decision: AWS RDS + Ruby on Rails + GraphQL

Why This Stack?

PostgreSQL on AWS RDS

  • ✅ Fully managed, production-grade PostgreSQL 15
  • ✅ Automatic backups, point-in-time recovery (PITR)
  • ✅ Multi-AZ deployment for 99.95% uptime SLA
  • ✅ Scalable (start db.t3.medium, scale to db.m5.large+)
  • ✅ VPC isolation for security
  • ✅ No vendor lock-in (standard Postgres, can export/migrate)
  • ✅ Read replicas for heavy analytics
  • ✅ Excellent for complex queries (recursive CTEs, window functions, JSONB)

Ruby on Rails 7.1

  • ✅ Mature, battle-tested web framework (18+ years)
  • ✅ Convention over configuration → rapid development
  • ✅ ActiveRecord ORM → elegant database queries
  • ✅ Built-in authentication (Devise gem)
  • ✅ Background jobs (ActiveJob + Sidekiq)
  • ✅ Large ecosystem of gems
  • ✅ Strong MVC patterns
  • ✅ Easy deployment (Capistrano, Docker)

GraphQL API (graphql-ruby gem)

  • ✅ Flexible querying (clients request exactly what they need)
  • ✅ Perfect for hierarchical spending data (nested queries)
  • ✅ Single endpoint (vs. dozens of REST routes)
  • ✅ Type-safe schema with introspection
  • ✅ Excellent Next.js integration (Apollo Client, urql)
  • ✅ Reduces over-fetching and under-fetching
  • ✅ Query batching (via graphql-batch) and caching support

AWS EC2 + Auto Scaling

  • ✅ Full control over application server
  • ✅ Auto Scaling Groups scale with traffic
  • ✅ Load Balancer distributes traffic
  • ✅ Standard deployment target
  • ✅ Can run Rails + Sidekiq workers

Sidekiq (Background Jobs)

  • ✅ Efficient multi-threaded job processing
  • ✅ Perfect for StatsCan API syncing (scheduled jobs)
  • ✅ Email notifications via ActionMailer
  • ✅ Retry logic, job monitoring
  • ✅ Web UI for job inspection

AWS SES (Email) — considered; the simplified plan below uses Resend/Postmark instead (see Email Service section)

  • ✅ Transactional email service
  • ✅ High deliverability rates
  • ✅ Pay-as-you-go ($0.10 per 1,000 emails)
  • ✅ SMTP integration with ActionMailer

PostgreSQL Advantages

  • JSONB columns: Store nested spending trees efficiently
  • Recursive CTEs: Query hierarchical data elegantly
  • Materialized views: Cache complex aggregations (Sankey data)
  • Full-text search: Search spending items (future feature)
  • Triggers & functions: Auto-update totals, send notifications
  • Window functions: Advanced analytics (year-over-year growth, rankings)
  • PostGIS extension: Geographic analysis (future)
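As one illustration of the window-function point above, year-over-year spending growth can be computed directly against the financial_years table defined later in this document. This is a sketch, not final query code; the column names match the schema below:

```sql
-- Year-over-year spending growth per jurisdiction (window function sketch,
-- using the financial_years table from the "Database Schema" section)
SELECT
  jurisdiction_id,
  year,
  total_spending,
  total_spending - LAG(total_spending) OVER w AS yoy_change,
  ROUND(
    100.0 * (total_spending - LAG(total_spending) OVER w)
          / NULLIF(LAG(total_spending) OVER w, 0), 2
  ) AS yoy_pct
FROM financial_years
WINDOW w AS (PARTITION BY jurisdiction_id ORDER BY year);
```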

AWS Infrastructure Architecture (Simplified)

High-Level Diagram

┌──────────────────────────────────────────────────────────┐
│                       Internet                           │
└────────────────────┬─────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
┌───────────────────┐   ┌─────────────────────────┐
│   Next.js App     │   │   Single EC2 Instance   │
│   (Vercel/CDN)    │   │   (your-domain.com)     │
│                   │   │                         │
│   - SSR/SSG       │   │   Rails 7 + GraphQL     │
│   - Apollo Client │───┤   + Sidekiq (same box)  │
│                   │   │   + Nginx/Puma          │
└───────────────────┘   │   + Redis (local)       │
                        └──────────┬──────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
                    ▼              ▼              ▼
              ┌──────────┐   ┌─────────┐   ┌────────────┐
              │   RDS    │   │  Redis  │   │  StatsCan  │
              │ Postgres │   │ (local) │   │    API     │
              │          │   │   OR    │   │ (External) │
              │ Public   │   │MemoryS- │   └────────────┘
              │ Access   │   │  tore   │
              └──────────┘   └─────────┘

Infrastructure Components

Single EC2 Instance (All-in-One)

  • Instance Type: t3.medium (2 vCPU, 4GB RAM) or t3.large (2 vCPU, 8GB RAM)
  • OS: Ubuntu 22.04 LTS
  • Software Stack:
    • Ruby 3.2 + Rails 7.1
    • GraphQL API (graphql-ruby)
    • Nginx (reverse proxy, SSL termination with Let's Encrypt)
    • Puma (Rails app server)
    • Sidekiq (background jobs)
    • Redis (local install or in-memory)
  • Public IP: Elastic IP for stable DNS pointing
  • Storage: 50GB SSD (expand as needed)
  • Deployment: Git-based (Capistrano) or Docker

RDS PostgreSQL (Public Access)

  • Instance Type: db.t3.small (2 vCPU, 2GB RAM) → start, scale to db.t3.medium if needed
  • Storage: 50GB GP3 SSD, auto-scaling to 200GB
  • Multi-AZ: Optional (adds cost, but provides failover)
  • Backup: Automated daily snapshots, 7-day retention
  • Version: PostgreSQL 15.x
  • Public Access: YES - accessible from internet (read-only users)
  • Encryption: At rest (default AWS encryption)
  • Security:
    • PostgreSQL firewall rules (allow specific IPs or 0.0.0.0/0 with strong passwords)
    • Separate read-only user for public queries
    • Admin user only for EC2 instance

Redis (Local or AWS)

Option A: Local Redis on EC2 (Simplest, $0)

  • Install Redis via apt
  • Use for Rails cache + Sidekiq job queue
  • Persists to disk

Option B: AWS ElastiCache (Managed) (~$15/month)

  • cache.t4g.micro (0.5GB)
  • If you want managed Redis

Recommendation: Start with local Redis on EC2

Email Service (NOT AWS SES)

Options:

  • Resend (resend.com) - 3,000 emails/month free
  • Postmark - 100 emails/month free, then $1.50/1K
  • SendGrid - 100 emails/day free
  • Mailgun - 5,000 emails/month free for 3 months
  • Self-hosted SMTP (Gmail, etc.) - Use with caution

Recommendation: Resend or Postmark for transactional emails

DNS (Managed Externally)

  • You manage DNS elsewhere (Cloudflare, Namecheap, etc.)
  • Point api.canadaspends.ca → EC2 Elastic IP
  • SSL via Let's Encrypt (free, auto-renewing with Certbot)

CloudWatch (Optional)

  • Basic EC2 monitoring included free
  • Optional: Detailed monitoring, custom metrics (~$5/month)

Security Configuration (Simplified)

EC2 Security Group:

Inbound:
  - HTTP (80) from 0.0.0.0/0      # Redirect to HTTPS
  - HTTPS (443) from 0.0.0.0/0    # GraphQL API
  - SSH (22) from YOUR_IP         # Admin access only

Outbound:
  - All traffic (0.0.0.0/0)       # For StatsCan API, package updates, etc.

RDS Security Group:

Inbound:
  - PostgreSQL (5432) from 0.0.0.0/0   # Public read access
  OR
  - PostgreSQL (5432) from EC2 security group only (if you don't want fully public)

Outbound:
  - None required

Database Users:

-- Admin user (EC2 only, full access)
-- Note: RDS does not permit SUPERUSER; grant the rds_superuser role instead
CREATE USER admin WITH PASSWORD 'strong_password' CREATEDB CREATEROLE;
GRANT rds_superuser TO admin;

-- Public read-only user (for external queries)
CREATE USER public_reader WITH PASSWORD 'public_password';
GRANT CONNECT ON DATABASE canadaspends TO public_reader;
GRANT USAGE ON SCHEMA public TO public_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO public_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO public_reader;
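Since this role is reachable from the internet, it may be worth adding resource guard rails on top of the read-only grants. A sketch; the specific limits (20 connections, 5-second timeout) are illustrative assumptions, not measured values:

```sql
-- Optional hardening for the internet-facing read-only role
ALTER ROLE public_reader CONNECTION LIMIT 20;                     -- cap concurrent connections
ALTER ROLE public_reader SET statement_timeout = '5s';            -- abort runaway queries
ALTER ROLE public_reader SET default_transaction_read_only = on;  -- belt-and-suspenders read-only
```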

IAM Roles:

  • EC2 Role: CloudWatch logs (optional), S3 for backups (future)

Estimated AWS Costs (Monthly, USD)

Service                   Configuration                    Monthly Cost
EC2 (single instance)     t3.medium (2 vCPU, 4GB RAM)      ~$30
RDS PostgreSQL            db.t3.small, 50GB, Single-AZ     ~$25
Elastic IP                1 address (free when attached)   $0
Data Transfer             ~20GB egress (modest usage)      ~$2
Snapshots/Backups         ~10GB RDS backups                ~$1
Email (Resend/Postmark)   Free tier or ~$1-5               ~$2
CloudWatch (optional)     Basic monitoring                 $0-$2
Total                                                      ~$60/month

Optional Add-ons:

  • RDS Multi-AZ: +$25/month (high availability failover)
  • Larger instances: t3.large EC2 (+$30), db.t3.medium RDS (+$25)
  • ElastiCache Redis: cache.t4g.micro (+$15/month)

Cost Optimization:

  • Use Reserved Instances for EC2 & RDS → ~40% savings (total drops to roughly $36/month)
  • Start small: Single-AZ RDS, t3.small/medium instances
  • Scale up later: Easy to upgrade instance types as traffic grows

Database Schema

Core CanadaSpends Tables

jurisdictions

CREATE TABLE jurisdictions (
  id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug             VARCHAR(100) UNIQUE NOT NULL,  -- 'ontario', 'toronto'
  name             VARCHAR(200) NOT NULL,          -- 'Ontario', 'Toronto'
  type             VARCHAR(20) NOT NULL,           -- 'provincial', 'municipal'
  province         VARCHAR(100),                   -- For municipalities
  created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_jurisdictions_slug ON jurisdictions(slug);
CREATE INDEX idx_jurisdictions_type ON jurisdictions(type);

financial_years

CREATE TABLE financial_years (
  id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  jurisdiction_id       UUID NOT NULL REFERENCES jurisdictions(id) ON DELETE CASCADE,
  year                  VARCHAR(20) NOT NULL,  -- '2023-24'
  total_spending        DECIMAL(15,2),         -- In billions
  total_revenue         DECIMAL(15,2),
  total_employees       INTEGER,
  net_debt              DECIMAL(15,2),
  total_debt            DECIMAL(15,2),
  debt_interest         DECIMAL(15,2),
  source_url            TEXT,
  methodology           TEXT,                  -- Markdown
  credits               TEXT,                  -- Markdown
  generated_at          TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  UNIQUE(jurisdiction_id, year)
);

CREATE INDEX idx_financial_years_jurisdiction ON financial_years(jurisdiction_id);
CREATE INDEX idx_financial_years_year ON financial_years(year);

sources

CREATE TABLE sources (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  label             VARCHAR(500) NOT NULL,
  url               TEXT NOT NULL,
  scope             VARCHAR(200),
  sort_order        INTEGER,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_sources_financial_year ON sources(financial_year_id);

departments

CREATE TABLE departments (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  slug              VARCHAR(200) NOT NULL,
  name              VARCHAR(500) NOT NULL,
  total_spending    DECIMAL(15,2),
  percentage        DECIMAL(5,2),
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  UNIQUE(financial_year_id, slug)
);

CREATE INDEX idx_departments_year ON departments(financial_year_id);
CREATE INDEX idx_departments_slug ON departments(slug);

spending_items (Adjacency List for Hierarchy)

CREATE TABLE spending_items (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  department_id UUID NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
  parent_id     UUID REFERENCES spending_items(id) ON DELETE CASCADE,  -- NULL for root
  name          TEXT NOT NULL,
  amount        DECIMAL(15,2),
  level         INTEGER NOT NULL,  -- 0=root, 1=program, 2=sub-program, etc.
  sort_order    INTEGER,
  path          TEXT[],            -- Materialized path: ['health', 'hospitals', 'capital']
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_spending_items_department ON spending_items(department_id);
CREATE INDEX idx_spending_items_parent ON spending_items(parent_id);
CREATE INDEX idx_spending_items_path ON spending_items USING GIN(path);
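The adjacency list above pairs naturally with a recursive CTE for pulling a department's full tree. A sketch; `:department_id` is a bind-parameter placeholder, not literal SQL:

```sql
-- Fetch a department's entire spending tree, roots first
WITH RECURSIVE tree AS (
  SELECT id, parent_id, name, amount, level, sort_order
  FROM spending_items
  WHERE department_id = :department_id   -- placeholder for the target department
    AND parent_id IS NULL
  UNION ALL
  SELECT si.id, si.parent_id, si.name, si.amount, si.level, si.sort_order
  FROM spending_items si
  JOIN tree t ON si.parent_id = t.id
)
SELECT * FROM tree
ORDER BY level, sort_order;
```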

revenue_items

CREATE TABLE revenue_items (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
  parent_id         UUID REFERENCES revenue_items(id) ON DELETE CASCADE,
  name              TEXT NOT NULL,
  amount            DECIMAL(15,2),
  level             INTEGER NOT NULL,
  sort_order        INTEGER,
  path              TEXT[],
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_revenue_items_year ON revenue_items(financial_year_id);
CREATE INDEX idx_revenue_items_parent ON revenue_items(parent_id);
CREATE INDEX idx_revenue_items_path ON revenue_items USING GIN(path);

department_content (Editorial CMS)

CREATE TABLE department_content (
  id                          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  department_id               UUID UNIQUE NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
  intro_text                  TEXT,
  description_text            TEXT,
  role_text                   TEXT,
  programs_heading            VARCHAR(200),
  programs_description        TEXT,
  leadership_heading          VARCHAR(200),
  leadership_description      TEXT,
  priorities_heading          VARCHAR(200),
  priorities_description      TEXT,
  agencies_heading            VARCHAR(200),
  agencies_description        TEXT,
  budget_projections_text     TEXT,
  created_at                  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at                  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_department_content_dept ON department_content(department_id);

Statistics Canada Integration Tables

statscan_geography

CREATE TABLE statscan_geography (
  id                      SERIAL PRIMARY KEY,
  sgc_code                VARCHAR(7) UNIQUE NOT NULL,  -- Standard Geographical Classification
  geo_level               VARCHAR(20) NOT NULL,        -- 'canada', 'province', 'division', 'subdivision'
  parent_sgc_code         VARCHAR(7),
  official_name           VARCHAR(200) NOT NULL,
  name_en                 VARCHAR(200),
  name_fr                 VARCHAR(200),
  geo_type                VARCHAR(50),                 -- 'City', 'Town', 'Municipality'
  province_code           VARCHAR(2),
  census_division_code    VARCHAR(4),
  latitude                DECIMAL(10,7),
  longitude               DECIMAL(10,7),
  vintage                 INTEGER,                     -- Census year (2021, 2016)
  created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (parent_sgc_code) REFERENCES statscan_geography(sgc_code)
);

CREATE INDEX idx_statscan_geo_province ON statscan_geography(province_code);
CREATE INDEX idx_statscan_geo_level ON statscan_geography(geo_level);
CREATE INDEX idx_statscan_geo_parent ON statscan_geography(parent_sgc_code);

statscan_population

CREATE TABLE statscan_population (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,
  ref_date              DATE NOT NULL,            -- July 1 for annual, quarter end for quarterly
  frequency             VARCHAR(10) NOT NULL,     -- 'annual', 'quarterly'
  population            BIGINT NOT NULL,
  age_group             VARCHAR(20),              -- NULL for total, '0-4', '5-9', etc.
  gender                VARCHAR(10),              -- NULL for total, 'Male', 'Female'
  vector_id             VARCHAR(15),              -- StatsCan vector ID
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);

-- UNIQUE constraints cannot contain expressions; enforce this with a unique expression index
CREATE UNIQUE INDEX idx_statscan_pop_unique ON statscan_population
  (sgc_code, ref_date, frequency, COALESCE(age_group, ''), COALESCE(gender, ''));

CREATE INDEX idx_statscan_pop_geo_date ON statscan_population(sgc_code, ref_date);
CREATE INDEX idx_statscan_pop_vector ON statscan_population(vector_id);

statscan_gdp

CREATE TABLE statscan_gdp (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,      -- Province/territory only
  year                  INTEGER NOT NULL,
  price_basis           VARCHAR(30) NOT NULL,     -- 'current', 'constant_2017', 'chained_2017'
  indicator             VARCHAR(100) NOT NULL,    -- 'GDP at market prices', 'Final consumption'
  value_millions        DECIMAL(15,2),            -- Always in millions
  vector_id             VARCHAR(15),
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
  UNIQUE(sgc_code, year, price_basis, indicator)
);

CREATE INDEX idx_statscan_gdp_geo_year ON statscan_gdp(sgc_code, year);
CREATE INDEX idx_statscan_gdp_indicator ON statscan_gdp(indicator);

statscan_govt_spending

CREATE TABLE statscan_govt_spending (
  id                    SERIAL PRIMARY KEY,
  sgc_code              VARCHAR(7) NOT NULL,
  year                  INTEGER NOT NULL,
  govt_level            VARCHAR(30),              -- 'Provincial', 'Consolidated PTLG', 'Local'
  ccofog_code           VARCHAR(20),              -- Canadian Classification of Functions of Government
  function_name         VARCHAR(200),             -- 'Health', 'Education', 'Social protection'
  amount_millions       DECIMAL(15,2),
  per_capita            DECIMAL(10,2),            -- Pre-calculated by StatsCan
  vector_id             VARCHAR(15),
  statscan_update_date  TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
  UNIQUE(sgc_code, year, govt_level, ccofog_code)
);

CREATE INDEX idx_statscan_govt_spend_geo ON statscan_govt_spending(sgc_code, year);
CREATE INDEX idx_statscan_govt_spend_function ON statscan_govt_spending(function_name);

statscan_census_demographics

CREATE TABLE statscan_census_demographics (
  id              SERIAL PRIMARY KEY,
  sgc_code        VARCHAR(7) NOT NULL,
  census_year     INTEGER NOT NULL,             -- 2021, 2016, 2011
  indicator       VARCHAR(100) NOT NULL,        -- 'Median household income', 'Employment rate'
  category        VARCHAR(100),                 -- Additional breakdown
  value           DECIMAL(15,2),
  value_text      VARCHAR(500),                 -- For non-numeric values
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);

-- Expression-based uniqueness requires a unique index rather than a UNIQUE constraint
CREATE UNIQUE INDEX idx_statscan_census_unique ON statscan_census_demographics
  (sgc_code, census_year, indicator, COALESCE(category, ''));

CREATE INDEX idx_statscan_census_geo_year ON statscan_census_demographics(sgc_code, census_year);
CREATE INDEX idx_statscan_census_indicator ON statscan_census_demographics(indicator);

statscan_vectors

CREATE TABLE statscan_vectors (
  vector_id           VARCHAR(15) PRIMARY KEY,
  product_id          VARCHAR(10) NOT NULL,
  coordinate          VARCHAR(500),
  frequency           VARCHAR(20),
  scalar_factor_code  VARCHAR(2),
  scalar_description  VARCHAR(50),
  uom                 VARCHAR(100),             -- Unit of measure
  description         TEXT,
  last_update_date    TIMESTAMPTZ,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_statscan_vector_product ON statscan_vectors(product_id);

statscan_products

CREATE TABLE statscan_products (
  product_id        VARCHAR(10) PRIMARY KEY,
  title_en          TEXT,
  title_fr          TEXT,
  subject_code      VARCHAR(4),
  frequency         VARCHAR(20),
  start_date        DATE,
  end_date          DATE,
  url               VARCHAR(500),
  last_update_date  TIMESTAMPTZ,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

jurisdiction_statscan_mapping

CREATE TABLE jurisdiction_statscan_mapping (
  id                  SERIAL PRIMARY KEY,
  jurisdiction_slug   VARCHAR(100) UNIQUE NOT NULL,
  sgc_code            VARCHAR(7) NOT NULL,
  jurisdiction_type   VARCHAR(20),              -- 'provincial', 'municipal', 'federal'
  province_code       VARCHAR(2),
  confidence          VARCHAR(20),              -- 'exact', 'fuzzy', 'manual'
  notes               TEXT,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

  FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE RESTRICT
);

CREATE INDEX idx_jurisdiction_mapping_slug ON jurisdiction_statscan_mapping(jurisdiction_slug);

-- Example mappings
INSERT INTO jurisdiction_statscan_mapping (jurisdiction_slug, sgc_code, jurisdiction_type, province_code, confidence) VALUES
  ('ontario', '35', 'provincial', '35', 'exact'),
  ('toronto', '3520001', 'municipal', '35', 'exact'),
  ('vancouver', '5915022', 'municipal', '59', 'exact'),
  ('british-columbia', '59', 'provincial', '59', 'exact'),
  ('alberta', '48', 'provincial', '48', 'exact'),
  ('edmonton', '4811061', 'municipal', '48', 'exact');
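With these mappings in place, StatsCan figures can be joined to CanadaSpends jurisdictions by slug. A sketch of the intended lookup ('ontario' is just an example slug from the seed data above):

```sql
-- Latest annual total population for a jurisdiction, via the SGC mapping
SELECT m.jurisdiction_slug, p.ref_date, p.population
FROM jurisdiction_statscan_mapping m
JOIN statscan_population p ON p.sgc_code = m.sgc_code
WHERE m.jurisdiction_slug = 'ontario'
  AND p.frequency = 'annual'
  AND p.age_group IS NULL   -- total, not an age breakdown
  AND p.gender IS NULL      -- total, not a gender breakdown
ORDER BY p.ref_date DESC
LIMIT 1;
```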

Notification System Tables

users (Devise Authentication)

CREATE TABLE users (
  id                       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email                    VARCHAR(255) UNIQUE NOT NULL,
  encrypted_password       VARCHAR(255) NOT NULL,
  reset_password_token     VARCHAR(255) UNIQUE,
  reset_password_sent_at   TIMESTAMPTZ,
  remember_created_at      TIMESTAMPTZ,
  sign_in_count            INTEGER DEFAULT 0,
  current_sign_in_at       TIMESTAMPTZ,
  last_sign_in_at          TIMESTAMPTZ,
  current_sign_in_ip       INET,
  last_sign_in_ip          INET,
  role                     VARCHAR(50) DEFAULT 'viewer',  -- 'admin', 'editor', 'viewer'
  created_at               TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at               TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_reset_token ON users(reset_password_token);

notification_subscriptions

CREATE TABLE notification_subscriptions (
  id                  SERIAL PRIMARY KEY,
  user_id             UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  entity_type         VARCHAR(50) NOT NULL,     -- 'jurisdiction', 'department', 'all'
  entity_id           UUID,                     -- NULL for 'all'
  event_types         VARCHAR(50)[] NOT NULL,   -- ['data_updated', 'data_corrected']
  notification_method VARCHAR(20) DEFAULT 'email',  -- 'email', 'webhook'
  webhook_url         VARCHAR(500),
  is_active           BOOLEAN DEFAULT true,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_notif_subs_user ON notification_subscriptions(user_id);
CREATE INDEX idx_notif_subs_entity ON notification_subscriptions(entity_type, entity_id);
CREATE INDEX idx_notif_subs_active ON notification_subscriptions(is_active) WHERE is_active = true;

notification_logs

CREATE TABLE notification_logs (
  id                SERIAL PRIMARY KEY,
  subscription_id   INTEGER NOT NULL REFERENCES notification_subscriptions(id) ON DELETE CASCADE,
  entity_type       VARCHAR(50) NOT NULL,
  entity_id         UUID,
  event_type        VARCHAR(50) NOT NULL,
  change_summary    JSONB,                    -- Details of what changed
  sent_at           TIMESTAMPTZ,
  status            VARCHAR(20),              -- 'sent', 'failed', 'pending'
  error_message     TEXT,
  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_notif_logs_subscription ON notification_logs(subscription_id);
CREATE INDEX idx_notif_logs_created ON notification_logs(created_at DESC);
CREATE INDEX idx_notif_logs_status ON notification_logs(status);

data_change_audit

CREATE TABLE data_change_audit (
  id              SERIAL PRIMARY KEY,
  table_name      VARCHAR(100) NOT NULL,
  record_id       UUID NOT NULL,
  action          VARCHAR(20) NOT NULL,     -- 'insert', 'update', 'delete'
  changed_by      UUID REFERENCES users(id),  -- NULL for automated/system
  old_values      JSONB,
  new_values      JSONB,
  change_reason   TEXT,
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_audit_table_record ON data_change_audit(table_name, record_id);
CREATE INDEX idx_audit_created ON data_change_audit(created_at DESC);
CREATE INDEX idx_audit_user ON data_change_audit(changed_by);

data_sync_jobs

CREATE TABLE data_sync_jobs (
  id                    SERIAL PRIMARY KEY,
  sync_type             VARCHAR(50) NOT NULL,    -- 'population', 'gdp', 'census', 'govt_spending'
  product_id            VARCHAR(10),
  start_time            TIMESTAMPTZ NOT NULL,
  end_time              TIMESTAMPTZ,
  records_fetched       INTEGER,
  records_inserted      INTEGER,
  records_updated       INTEGER,
  errors                INTEGER,
  error_details         TEXT,
  status                VARCHAR(20),             -- 'running', 'success', 'failed', 'partial'
  statscan_release_date DATE,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_sync_jobs_type_time ON data_sync_jobs(sync_type, start_time DESC);
CREATE INDEX idx_sync_jobs_status ON data_sync_jobs(status);

Database Views & Materialized Views

v_jurisdiction_enriched (Real-time View)

CREATE VIEW v_jurisdiction_enriched AS
SELECT
  j.id,
  j.slug,
  j.name,
  j.type,
  fy.id AS financial_year_id,
  fy.year,
  fy.total_spending AS spending_billions,
  fy.total_revenue AS revenue_billions,
  g.sgc_code,
  g.province_code,
  g.geo_level,
  p.population,
  p.ref_date AS population_date,
  -- Per capita spending
  CASE
    WHEN p.population > 0 THEN (fy.total_spending * 1000000000) / p.population
    ELSE NULL
  END AS spending_per_capita,
  -- GDP data
  gdp.value_millions AS gdp_millions,
  -- Spending as % of GDP
  CASE
    WHEN gdp.value_millions > 0 THEN (fy.total_spending * 1000) / gdp.value_millions * 100
    ELSE NULL
  END AS spending_pct_gdp
FROM jurisdictions j
LEFT JOIN financial_years fy ON j.id = fy.jurisdiction_id
LEFT JOIN jurisdiction_statscan_mapping m ON j.slug = m.jurisdiction_slug
LEFT JOIN statscan_geography g ON m.sgc_code = g.sgc_code
-- Get most recent annual population for this jurisdiction/year
LEFT JOIN LATERAL (
  SELECT population, ref_date
  FROM statscan_population
  WHERE sgc_code = g.sgc_code
    AND frequency = 'annual'
    AND age_group IS NULL
    AND gender IS NULL
  ORDER BY ABS(EXTRACT(YEAR FROM ref_date) - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
  LIMIT 1
) p ON true
-- Get GDP for matching year
LEFT JOIN LATERAL (
  SELECT value_millions
  FROM statscan_gdp
  WHERE sgc_code = g.sgc_code
    AND price_basis = 'current'
    AND indicator = 'GDP at market prices'
  ORDER BY ABS(year - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
  LIMIT 1
) gdp ON true;
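A typical consumer query against this view, showing the derived per-capita and %-of-GDP columns (the 'ontario' slug is illustrative):

```sql
-- Enriched figures for one jurisdiction, most recent year first
SELECT slug, year, spending_billions, population,
       ROUND(spending_per_capita, 2) AS per_capita,
       ROUND(spending_pct_gdp, 1)   AS pct_of_gdp
FROM v_jurisdiction_enriched
WHERE slug = 'ontario'
ORDER BY year DESC;
```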

mv_sankey_data (Materialized View - Cached)

CREATE MATERIALIZED VIEW mv_sankey_data AS
SELECT
  fy.id AS financial_year_id,
  j.slug AS jurisdiction_slug,
  jsonb_build_object(
    'total', fy.total_spending,
    'spending', fy.total_spending,
    'revenue', fy.total_revenue,
    'spending_data', (
      SELECT jsonb_build_object(
        'name', 'Spending',
        'children', COALESCE(jsonb_agg(dept_tree.tree ORDER BY dept_tree.sort_order), '[]'::jsonb)
      )
      FROM (
        SELECT
          d.total_spending * 100.0 / fy.total_spending AS sort_order,
          jsonb_build_object(
            'name', d.name,
            'amount', d.total_spending,
            'children', (
              SELECT COALESCE(jsonb_agg(
                jsonb_build_object(
                  'name', si.name,
                  'amount', si.amount
                ) ORDER BY si.sort_order
              ), '[]'::jsonb)
              FROM spending_items si
              WHERE si.department_id = d.id AND si.parent_id IS NULL
            )
          ) AS tree
        FROM departments d
        WHERE d.financial_year_id = fy.id
      ) dept_tree
    ),
    'revenue_data', (
      SELECT jsonb_build_object(
        'name', 'Revenue',
        'children', COALESCE(jsonb_agg(
          jsonb_build_object(
            'name', ri.name,
            'amount', ri.amount
          ) ORDER BY ri.sort_order
        ), '[]'::jsonb)
      )
      FROM revenue_items ri
      WHERE ri.financial_year_id = fy.id AND ri.parent_id IS NULL
    )
  ) AS sankey_json
FROM financial_years fy
JOIN jurisdictions j ON fy.jurisdiction_id = j.id;

CREATE UNIQUE INDEX idx_mv_sankey_fy ON mv_sankey_data(financial_year_id);
CREATE INDEX idx_mv_sankey_slug ON mv_sankey_data(jurisdiction_slug);

-- Refresh command (run after data changes):
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sankey_data;

Database Triggers & Functions

Auto-update Department Totals

CREATE OR REPLACE FUNCTION update_department_total()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE departments
  SET
    total_spending = (
      SELECT COALESCE(SUM(amount), 0)
      FROM spending_items
      WHERE department_id = COALESCE(NEW.department_id, OLD.department_id)
        AND parent_id IS NULL
    ),
    updated_at = NOW()
  WHERE id = COALESCE(NEW.department_id, OLD.department_id);

  -- Return value is ignored for AFTER row triggers; return OLD on DELETE since NEW is null there
  IF TG_OP = 'DELETE' THEN
    RETURN OLD;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_dept_total
AFTER INSERT OR UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION update_department_total();

Audit Trail Trigger

CREATE OR REPLACE FUNCTION log_data_change()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'UPDATE' THEN
    INSERT INTO data_change_audit (table_name, record_id, action, old_values, new_values, created_at)
    VALUES (
      TG_TABLE_NAME,
      OLD.id,
      'update',
      to_jsonb(OLD),
      to_jsonb(NEW),
      NOW()
    );
  ELSIF TG_OP = 'DELETE' THEN
    INSERT INTO data_change_audit (table_name, record_id, action, old_values, created_at)
    VALUES (
      TG_TABLE_NAME,
      OLD.id,
      'delete',
      to_jsonb(OLD),
      NOW()
    );
  END IF;
  RETURN COALESCE(NEW, OLD);  -- NEW is NULL on DELETE
END;
$$ LANGUAGE plpgsql;

-- Apply to critical tables
CREATE TRIGGER trg_audit_financial_years
AFTER UPDATE OR DELETE ON financial_years
FOR EACH ROW EXECUTE FUNCTION log_data_change();

CREATE TRIGGER trg_audit_departments
AFTER UPDATE OR DELETE ON departments
FOR EACH ROW EXECUTE FUNCTION log_data_change();

CREATE TRIGGER trg_audit_spending_items
AFTER UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION log_data_change();

Notification Trigger

CREATE OR REPLACE FUNCTION queue_notification()
RETURNS TRIGGER AS $$
BEGIN
  -- Notify Sidekiq via pg_notify
  PERFORM pg_notify(
    'data_changed',
    json_build_object(
      'table', TG_TABLE_NAME,
      'record_id', NEW.id,
      'action', TG_OP,
      'timestamp', NOW()
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_notify_financial_years
AFTER UPDATE ON financial_years
FOR EACH ROW EXECUTE FUNCTION queue_notification();

CREATE TRIGGER trg_notify_departments
AFTER UPDATE ON departments
FOR EACH ROW EXECUTE FUNCTION queue_notification();
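
On the consuming side, a LISTEN loop would parse each pg_notify payload and enqueue the matching background job. A sketch of just the payload routing (the dispatch tuples are illustrative, matching the job names used later in this plan):

```ruby
require 'json'

# Parse a payload produced by queue_notification() and decide which
# follow-up job to enqueue. Returns [job_name, entity_type, record_id],
# or nil for tables we don't route.
def route_notification(payload)
  event = JSON.parse(payload)
  case event['table']
  when 'financial_years'
    ['DataChangeNotificationJob', 'financial_year', event['record_id']]
  when 'departments'
    ['DataChangeNotificationJob', 'department', event['record_id']]
  end
end
```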

Ruby on Rails Application Architecture

Rails Stack

Gemfile (key gems):

# Gemfile
source 'https://rubygems.org'
ruby '3.2.2'

# Core
gem 'rails', '~> 7.1.0'
gem 'pg', '~> 1.5'
gem 'puma', '~> 6.0'

# GraphQL
gem 'graphql', '~> 2.2'
gem 'graphql-batch'  # batched loading to avoid N+1 queries
gem 'batch-loader'   # provides BatchLoader::GraphQL, used in the type resolvers below

# Authentication & Authorization
gem 'devise', '~> 4.9'
gem 'pundit', '~> 2.3'

# Background Jobs
gem 'sidekiq', '~> 7.2'
gem 'sidekiq-cron', '~> 1.12'  # Scheduled jobs

# Redis
gem 'redis', '~> 5.0'
gem 'hiredis', '~> 0.6'

# HTTP Client (for StatsCan API)
gem 'faraday', '~> 2.8'
gem 'faraday-retry'

# Performance
gem 'rack-cors'  # CORS for Next.js
gem 'bootsnap', require: false

# Development/Test
group :development, :test do
  gem 'rspec-rails'
  gem 'factory_bot_rails'
  gem 'faker'
  gem 'pry-rails'
end

Authentication (Devise)

User Model:

# app/models/user.rb
class User < ApplicationRecord
  devise :database_authenticatable, :registerable,
         :recoverable, :rememberable, :validatable

  enum role: { viewer: 0, editor: 1, admin: 2 }

  has_many :notification_subscriptions, dependent: :destroy

  def can_edit?
    editor? || admin?
  end

  def can_admin?
    admin?
  end
end

Authorization (Pundit)

# app/policies/financial_year_policy.rb
class FinancialYearPolicy < ApplicationPolicy
  def index?
    true  # Public read
  end

  def show?
    true  # Public read
  end

  def update?
    user&.can_edit?
  end

  def create?
    user&.can_admin?
  end

  def destroy?
    user&.can_admin?
  end
end
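
The role logic behind these policies is plain Ruby and can be exercised without loading Rails or Pundit. A sketch with a stand-in user (StubUser and update_allowed? are illustrative names):

```ruby
# Stand-in for the Devise User with the same role predicate, to show how
# FinancialYearPolicy#update? resolves per role.
StubUser = Struct.new(:role) do
  def can_edit?
    [:editor, :admin].include?(role)
  end
end

# Mirrors update?: truthy only for editors/admins, false for nil users.
def update_allowed?(user)
  !!(user && user.can_edit?)
end
```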

ActiveRecord Models

# app/models/jurisdiction.rb
class Jurisdiction < ApplicationRecord
  self.inheritance_column = nil  # 'type' column holds provincial/municipal, not STI

  has_many :financial_years, dependent: :destroy
  has_one :statscan_mapping, class_name: 'JurisdictionStatscanMapping',
          primary_key: :slug, foreign_key: :jurisdiction_slug

  validates :slug, presence: true, uniqueness: true
  validates :name, presence: true
  validates :type, inclusion: { in: %w[provincial municipal] }

  def latest_year
    financial_years.order(year: :desc).first
  end

  def population
    statscan_mapping&.latest_population
  end

  def spending_per_capita
    return nil unless latest_year && population
    (latest_year.total_spending * 1_000_000_000) / population
  end
end
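
The 1_000_000_000 multiplier implies total_spending is stored in billions of dollars. A worked check of the per-capita arithmetic, extracted as a standalone function:

```ruby
# spending_per_capita as computed in Jurisdiction: scale the stored
# billions figure up to dollars before dividing by population.
def spending_per_capita(total_spending_billions, population)
  return nil unless total_spending_billions && population
  (total_spending_billions * 1_000_000_000) / population
end
```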

# app/models/financial_year.rb
class FinancialYear < ApplicationRecord
  belongs_to :jurisdiction
  has_many :departments, dependent: :destroy
  has_many :sources, dependent: :destroy
  has_many :revenue_items, dependent: :destroy

  validates :year, presence: true, uniqueness: { scope: :jurisdiction_id }

  after_update :refresh_sankey_cache, if: :saved_change_to_total_spending?
  after_update :send_notifications

  def sankey_data
    Rails.cache.fetch("sankey_#{id}", expires_in: 1.hour) do
      SankeyDataBuilder.new(self).build
    end
  end

  private

  def refresh_sankey_cache
    RefreshSankeyMaterializedViewJob.perform_later
  end

  def send_notifications
    DataChangeNotificationJob.perform_later('financial_year', id, 'updated')
  end
end

# app/models/department.rb
class Department < ApplicationRecord
  belongs_to :financial_year
  has_many :spending_items, dependent: :destroy
  has_one :content, class_name: 'DepartmentContent', dependent: :destroy

  validates :slug, presence: true, uniqueness: { scope: :financial_year_id }

  def root_spending_items
    spending_items.where(parent_id: nil).order(:sort_order)
  end
end

# app/models/spending_item.rb
class SpendingItem < ApplicationRecord
  belongs_to :department
  belongs_to :parent, class_name: 'SpendingItem', optional: true
  has_many :children, class_name: 'SpendingItem', foreign_key: :parent_id

  before_save :set_level_and_path

  private

  def set_level_and_path
    if parent
      self.level = parent.level + 1
      self.path = parent.path + [name.parameterize]
    else
      self.level = 0
      self.path = [name.parameterize]
    end
  end
end
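
The level/path bookkeeping in set_level_and_path builds a materialized path of slugged names. A plain-Ruby sketch (slugify stands in for ActiveSupport's parameterize):

```ruby
# Simplified set_level_and_path: compute level and a materialized path
# of slugs from a parent record (nil for roots).
Item = Struct.new(:name, :level, :path)

def slugify(name)
  name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

def build_item(name, parent = nil)
  if parent
    Item.new(name, parent.level + 1, parent.path + [slugify(name)])
  else
    Item.new(name, 0, [slugify(name)])
  end
end
```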

StatsCan Models

# app/models/statscan/geography.rb
module Statscan
  class Geography < ApplicationRecord
    self.table_name = 'statscan_geography'

    has_many :population_records, class_name: 'Statscan::Population', foreign_key: :sgc_code, primary_key: :sgc_code
    has_many :gdp_records, class_name: 'Statscan::Gdp', foreign_key: :sgc_code, primary_key: :sgc_code

    def latest_population(year: Date.current.year)
      population_records.where(frequency: 'annual', age_group: nil, gender: nil)
                        .order(Arel.sql("ABS(EXTRACT(YEAR FROM ref_date) - #{year})"))
                        .first&.population
    end
  end
end

# app/models/statscan/population.rb
module Statscan
  class Population < ApplicationRecord
    self.table_name = 'statscan_population'

    belongs_to :geography, class_name: 'Statscan::Geography', foreign_key: :sgc_code, primary_key: :sgc_code
  end
end
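
latest_population orders by absolute distance from the requested year. The same nearest-year selection in plain Ruby, which is what the `ABS(EXTRACT(YEAR FROM ref_date) - year)` ordering does in SQL:

```ruby
# Pick the population record whose year is closest to the target year.
PopRecord = Struct.new(:year, :population)

def closest_population(records, target_year)
  records.min_by { |r| (r.year - target_year).abs }&.population
end
```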

GraphQL API Schema

Schema Definition

# app/graphql/types/query_type.rb
module Types
  class QueryType < Types::BaseObject
    # Jurisdictions
    field :jurisdictions, [Types::JurisdictionType], null: false,
          description: "List all jurisdictions (provinces and municipalities)"

    field :jurisdiction, Types::JurisdictionType, null: false do
      argument :slug, String, required: true
    end

    # Financial Years
    field :financial_year, Types::FinancialYearType, null: false do
      argument :jurisdiction_slug, String, required: true
      argument :year, String, required: false
      description "Get financial year data. If year is not provided, returns latest year."
    end

    # Departments
    field :department, Types::DepartmentType, null: false do
      argument :jurisdiction_slug, String, required: true
      argument :department_slug, String, required: true
      argument :year, String, required: false
    end

    # StatsCan Data
    field :population_data, [Types::PopulationDataType], null: false do
      argument :sgc_code, String, required: true
      argument :years, Integer, required: false, default_value: 5
    end

    # Resolvers
    def jurisdictions
      Jurisdiction.all.order(:name)
    end

    def jurisdiction(slug:)
      Jurisdiction.find_by!(slug: slug)
    end

    def financial_year(jurisdiction_slug:, year: nil)
      jurisdiction = Jurisdiction.find_by!(slug: jurisdiction_slug)
      query = jurisdiction.financial_years

      if year
        query.find_by!(year: year)
      else
        query.order(year: :desc).first!
      end
    end

    def department(jurisdiction_slug:, department_slug:, year: nil)
      fy = financial_year(jurisdiction_slug: jurisdiction_slug, year: year)
      fy.departments.find_by!(slug: department_slug)
    end

    def population_data(sgc_code:, years:)
      Statscan::Population.where(sgc_code: sgc_code, frequency: 'annual', age_group: nil, gender: nil)
                          .order(ref_date: :desc)
                          .limit(years)
    end
  end
end

GraphQL Types

# app/graphql/types/jurisdiction_type.rb
module Types
  class JurisdictionType < Types::BaseObject
    field :id, ID, null: false
    field :slug, String, null: false
    field :name, String, null: false
    field :type, String, null: false
    field :province, String, null: true

    # Associations
    field :financial_years, [Types::FinancialYearType], null: false
    field :latest_year, Types::FinancialYearType, null: true

    # StatsCan Enrichment
    field :sgc_code, String, null: true
    field :population, Integer, null: true
    field :spending_per_capita, Float, null: true

    def latest_year
      object.latest_year
    end

    def sgc_code
      object.statscan_mapping&.sgc_code
    end

    def population
      BatchLoader::GraphQL.for(object.id).batch do |ids, loader|
        Jurisdiction.includes(statscan_mapping: :geography).where(id: ids).each do |j|
          pop = j.statscan_mapping&.geography&.latest_population
          loader.call(j.id, pop)
        end
      end
    end

    def spending_per_capita
      object.spending_per_capita
    end
  end
end

# app/graphql/types/financial_year_type.rb
module Types
  class FinancialYearType < Types::BaseObject
    field :id, ID, null: false
    field :year, String, null: false
    field :total_spending, Float, null: false
    field :total_revenue, Float, null: true
    field :total_employees, Integer, null: true
    field :net_debt, Float, null: true
    field :total_debt, Float, null: true
    field :debt_interest, Float, null: true
    field :methodology, String, null: true
    field :credits, String, null: true

    field :departments, [Types::DepartmentType], null: false
    field :sources, [Types::SourceType], null: false
    field :sankey_data, GraphQL::Types::JSON, null: true

    def sankey_data
      object.sankey_data
    end
  end
end

# app/graphql/types/department_type.rb
module Types
  class DepartmentType < Types::BaseObject
    field :id, ID, null: false
    field :slug, String, null: false
    field :name, String, null: false
    field :total_spending, Float, null: false
    field :percentage, Float, null: true

    field :spending_items, [Types::SpendingItemType], null: false do
      argument :parent_id, ID, required: false
    end

    field :content, Types::DepartmentContentType, null: true

    def spending_items(parent_id: nil)
      if parent_id
        object.spending_items.where(parent_id: parent_id).order(:sort_order)
      else
        object.root_spending_items
      end
    end

    def content
      object.content
    end
  end
end

# app/graphql/types/spending_item_type.rb
module Types
  class SpendingItemType < Types::BaseObject
    field :id, ID, null: false
    field :name, String, null: false
    field :amount, Float, null: true
    field :level, Integer, null: false
    field :parent_id, ID, null: true

    # Recursive children
    field :children, [Types::SpendingItemType], null: false

    def children
      BatchLoader::GraphQL.for(object.id).batch(default_value: []) do |ids, loader|
        SpendingItem.where(parent_id: ids).order(:sort_order).each do |item|
          loader.call(item.parent_id) { |memo| memo << item }
        end
      end
    end
  end
end
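
The batch loader above avoids N+1 queries by collecting every parent id requested in the current response pass, issuing a single `WHERE parent_id IN (...)` query, and fanning the rows back out per parent. The fan-out step, sketched without the gem:

```ruby
# One-pass fan-out: given all requested parent ids and the rows returned
# by a single IN query, group children per parent, defaulting to an empty
# array (mirroring batch(default_value: [])).
Row = Struct.new(:id, :parent_id, :name)

def children_by_parent(parent_ids, rows)
  grouped = rows.group_by(&:parent_id)
  parent_ids.to_h { |pid| [pid, grouped.fetch(pid, [])] }
end
```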

Mutations

# app/graphql/types/mutation_type.rb
module Types
  class MutationType < Types::BaseObject
    field :update_financial_year, mutation: Mutations::UpdateFinancialYear
    field :create_department, mutation: Mutations::CreateDepartment
    field :subscribe_to_notifications, mutation: Mutations::SubscribeToNotifications
  end
end

# app/graphql/mutations/update_financial_year.rb
module Mutations
  class UpdateFinancialYear < BaseMutation
    argument :id, ID, required: true
    argument :total_spending, Float, required: false
    argument :total_revenue, Float, required: false
    argument :net_debt, Float, required: false

    field :financial_year, Types::FinancialYearType, null: true
    field :errors, [String], null: false

    def resolve(id:, **attributes)
      financial_year = FinancialYear.find(id)

      # Authorization
      raise Pundit::NotAuthorizedError unless FinancialYearPolicy.new(context[:current_user], financial_year).update?

      if financial_year.update(attributes)
        { financial_year: financial_year, errors: [] }
      else
        { financial_year: nil, errors: financial_year.errors.full_messages }
      end
    end
  end
end

Example GraphQL Queries (Client-Side)

# Get jurisdiction with latest year
query GetJurisdiction($slug: String!) {
  jurisdiction(slug: $slug) {
    id
    name
    slug
    type
    population
    spendingPerCapita
    latestYear {
      year
      totalSpending
      totalRevenue
      netDebt
      departments {
        id
        slug
        name
        totalSpending
        percentage
      }
      sankeyData
    }
  }
}

# Get department with nested spending (3 levels)
query GetDepartment($jurisdictionSlug: String!, $departmentSlug: String!) {
  department(
    jurisdictionSlug: $jurisdictionSlug
    departmentSlug: $departmentSlug
  ) {
    id
    name
    totalSpending
    spendingItems {
      id
      name
      amount
      children {
        id
        name
        amount
        children {
          id
          name
          amount
        }
      }
    }
    content {
      introText
      descriptionText
      programsDescription
    }
  }
}

# Get population trends
query GetPopulationTrends($sgcCode: String!, $years: Int!) {
  populationData(sgcCode: $sgcCode, years: $years) {
    refDate
    population
  }
}

# Update financial year (admin only)
mutation UpdateFinancialYear($id: ID!, $totalSpending: Float!) {
  updateFinancialYear(id: $id, totalSpending: $totalSpending) {
    financialYear {
      id
      totalSpending
    }
    errors
  }
}

Sidekiq Background Jobs

StatsCan Data Sync Jobs

# app/jobs/statscan/sync_population_job.rb
module Statscan
  class SyncPopulationJob < ApplicationJob
    queue_as :statscan_sync

    def perform(product_id = '1710000501', years = 10)
      sync_job = DataSyncJob.create!(
        sync_type: 'population',
        product_id: product_id,
        start_time: Time.current,
        status: 'running'
      )

      begin
        fetcher = Statscan::PopulationFetcher.new(product_id)
        results = fetcher.fetch_latest(years)

        sync_job.update!(
          records_fetched: results[:fetched],
          records_inserted: results[:inserted],
          records_updated: results[:updated],
          errors: results[:errors],
          status: results[:errors].zero? ? 'success' : 'partial',
          end_time: Time.current
        )

        Rails.logger.info "StatsCan population sync completed: #{results.inspect}"
      rescue => e
        sync_job.update!(
          status: 'failed',
          error_details: e.message,
          end_time: Time.current
        )
        raise
      end
    end
  end
end

# app/services/statscan/population_fetcher.rb
module Statscan
  class PopulationFetcher
    BASE_URL = 'https://www150.statcan.gc.ca/t1/wds/rest'

    def initialize(product_id)
      @product_id = product_id
    end

    def fetch_latest(years)
      vectors = province_vectors  # Mapping of SGC codes to vector IDs
      results = { fetched: 0, inserted: 0, updated: 0, errors: 0 }

      vectors.each do |sgc_code, vector_id|
        begin
          data = fetch_vector_data(vector_id, years)

          data.each do |point|
            pop = Statscan::Population.find_or_initialize_by(
              sgc_code: sgc_code,
              ref_date: point['refPer'],
              frequency: 'annual',
              age_group: nil,
              gender: nil
            )

            pop.assign_attributes(
              population: point['value'].to_i,
              vector_id: vector_id,
              statscan_update_date: Time.current
            )

            if pop.new_record?
              pop.save!
              results[:inserted] += 1
            elsif pop.changed?
              pop.save!
              results[:updated] += 1
            end

            results[:fetched] += 1
          end
        rescue => e
          Rails.logger.error "Error fetching vector #{vector_id}: #{e.message}"
          results[:errors] += 1
        end
      end

      results
    end

    private

    def fetch_vector_data(vector_id, periods)
      conn = Faraday.new(url: BASE_URL) do |f|
        f.request :json
        f.response :json
        f.adapter Faraday.default_adapter
      end

      response = conn.post('getDataFromVectorsAndLatestNPeriods') do |req|
        req.body = [{ vectorId: vector_id, latestN: periods }]
      end

      if response.success? && response.body['status'] == 'SUCCESS'
        response.body.dig('object', 0, 'vectorDataPoint') || []
      else
        raise "StatsCan API error: #{response.body}"
      end
    end

    def province_vectors
      {
        '35' => 'v466670',  # Ontario
        '59' => 'v466730',  # BC
        '48' => 'v466710',  # Alberta
        # Add more provinces...
      }
    end
  end
end
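
fetch_vector_data relies on the envelope shape shown in its success branch. The extraction logic, isolated so it can be exercised against a canned payload (the sample values below are made up; the field names follow the shape assumed by PopulationFetcher, not a verified copy of the WDS spec):

```ruby
require 'json'

# Extract data points from a getDataFromVectorsAndLatestNPeriods response
# body, mirroring the dig('object', 0, 'vectorDataPoint') in
# PopulationFetcher. Raises on a non-SUCCESS status.
def extract_data_points(body)
  parsed = body.is_a?(String) ? JSON.parse(body) : body
  raise "StatsCan API error: #{parsed}" unless parsed['status'] == 'SUCCESS'

  parsed.dig('object', 0, 'vectorDataPoint') || []
end
```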

Scheduled Jobs (Cron)

# config/initializers/sidekiq_cron.rb
require 'sidekiq/cron/job'

Sidekiq::Cron::Job.create(
  name: 'Sync StatsCan Population - Monthly',
  cron: '0 2 1 * *',  # 2 AM on 1st of each month
  class: 'Statscan::SyncPopulationJob'
)

Sidekiq::Cron::Job.create(
  name: 'Sync StatsCan GDP - Quarterly',
  cron: '0 3 1 */3 *',  # 3 AM on the 1st of Jan/Apr/Jul/Oct (every third month)
  class: 'Statscan::SyncGdpJob'
)

Sidekiq::Cron::Job.create(
  name: 'Refresh Sankey Materialized View - Daily',
  cron: '0 4 * * *',  # 4 AM daily
  class: 'RefreshSankeyMaterializedViewJob'
)

Notification Jobs

# app/jobs/data_change_notification_job.rb
class DataChangeNotificationJob < ApplicationJob
  queue_as :notifications

  def perform(entity_type, entity_id, event_type)
    entity = find_entity(entity_type, entity_id)
    subscriptions = NotificationSubscription.active
                      .where(entity_type: [entity_type, 'all'])
                      .where('? = ANY(event_types)', event_type)

    subscriptions.find_each do |subscription|
      next unless subscription.should_notify?(entity)

      case subscription.notification_method
      when 'email'
        send_email_notification(subscription, entity, event_type)
      when 'webhook'
        send_webhook_notification(subscription, entity, event_type)
      end
    end
  end

  private

  def send_email_notification(subscription, entity, event_type)
    log = NotificationLog.create!(
      subscription: subscription,
      entity_type: entity.class.name,
      entity_id: entity.id,
      event_type: event_type,
      change_summary: build_change_summary(entity),
      status: 'pending'
    )

    begin
      DataChangeMailer.notify(subscription.user, entity, event_type).deliver_now
      log.update!(status: 'sent', sent_at: Time.current)
    rescue => e
      log.update!(status: 'failed', error_message: e.message)
      raise
    end
  end

  def find_entity(type, id)
    type.camelize.constantize.find(id)  # 'financial_year' -> FinancialYear
  end

  def build_change_summary(entity)
    # Get changes from audit log
    audit = DataChangeAudit.where(table_name: entity.class.table_name, record_id: entity.id)
                           .order(created_at: :desc).first

    return {} unless audit

    {
      changes: audit.new_values.except('updated_at').select { |k, v|
        audit.old_values[k] != v
      }
    }
  end
end
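
build_change_summary reduces an audit row to just the fields that actually changed. The diff itself, exercised on plain hashes standing in for the JSONB columns:

```ruby
# Diff new_values against old_values, dropping updated_at -- the same
# selection build_change_summary performs on the audit record.
def changed_fields(old_values, new_values)
  new_values.except('updated_at').select { |k, v| old_values[k] != v }
end
```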

# app/mailers/data_change_mailer.rb
class DataChangeMailer < ApplicationMailer
  def notify(user, entity, event_type)
    @user = user
    @entity = entity
    @event_type = event_type
    @changes_url = "#{ENV['FRONTEND_URL']}/#{entity.class.table_name}/#{entity.id}"

    mail(
      to: user.email,
      subject: "Data Updated: #{entity_name(entity)}"
    )
  end

  helper_method :entity_name  # expose to the mailer view below

  private

  def entity_name(entity)
    case entity
    when FinancialYear
      "#{entity.jurisdiction.name} #{entity.year}"
    when Department
      entity.name
    else
      entity.to_s
    end
  end
end

Email Template

<!-- app/views/data_change_mailer/notify.html.erb -->
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    body { font-family: Arial, sans-serif; line-height: 1.6; }
    .container { max-width: 600px; margin: 0 auto; padding: 20px; }
    .header { background: #2563eb; color: white; padding: 20px; }
    .content { padding: 20px; background: #f9fafb; }
    .button { display: inline-block; background: #2563eb; color: white;
              padding: 12px 24px; text-decoration: none; border-radius: 4px; }
  </style>
</head>
<body>
  <div class="container">
    <div class="header">
      <h1>CanadaSpends Data Update</h1>
    </div>
    <div class="content">
      <p>Hello <%= @user.email %>,</p>

      <p>
        Data has been updated for <strong><%= entity_name(@entity) %></strong>.
      </p>

      <p>Event: <strong><%= @event_type.titleize %></strong></p>

      <p>
        <a href="<%= @changes_url %>" class="button">View Changes</a>
      </p>

      <p>
        <small>
          You're receiving this because you subscribed to notifications for this data.
          <a href="<%= ENV['FRONTEND_URL'] %>/profile/notifications">Manage subscriptions</a>
        </small>
      </p>
    </div>
  </div>
</body>
</html>

Migration Strategy

Phase 1: AWS Infrastructure Setup (Week 1)

  1. Launch EC2 Instance

    • Launch t3.medium Ubuntu 22.04 instance
    • Allocate and attach Elastic IP
    • Configure security group (ports 80, 443, 22)
    • SSH in and update system: sudo apt update && sudo apt upgrade -y
  2. Install Software Stack

    # Install Ruby 3.2
    sudo apt install -y ruby-full build-essential git
    
    # Install PostgreSQL client
    sudo apt install -y postgresql-client
    
    # Install Redis locally
    sudo apt install -y redis-server
    sudo systemctl enable redis-server
    
    # Install Nginx
    sudo apt install -y nginx
    
    # Install Certbot for Let's Encrypt SSL
    sudo apt install -y certbot python3-certbot-nginx
  3. Launch RDS PostgreSQL

    • Launch db.t3.small PostgreSQL 15 instance
    • Enable public access in RDS settings
    • Configure security group to allow port 5432 only from the EC2 instance's security group (avoid 0.0.0.0/0)
    • Create initial database canadaspends
    • Create admin user and read-only user
    • Enable automated backups (7-day retention)
  4. Configure DNS

    • Point your domain (e.g., api.canadaspends.ca) to Elastic IP
    • Wait for DNS propagation
  5. Set up SSL

    sudo certbot --nginx -d api.canadaspends.ca
    # Auto-renews via cron

Phase 2: Rails Application Setup (Week 2)

  1. Initialize Rails Project

    rails new canadaspends-api --api --database=postgresql
    cd canadaspends-api
  2. Database Migrations

    • Create all table migrations (jurisdictions, financial_years, etc.)
    • Create StatsCan tables
    • Create notification tables
    • Add indexes, triggers, views
  3. Configure Environment

    # .env.production
    DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/canadaspends
    REDIS_URL=redis://elasticache-endpoint:6379/0
    AWS_REGION=us-east-1
    SES_SMTP_ADDRESS=email-smtp.us-east-1.amazonaws.com
    FRONTEND_URL=https://www.canadaspends.ca

Phase 3: Data Migration (Week 2-3)

  1. Create Migration Scripts

    # lib/tasks/migrate_json_data.rake
    namespace :data do
      desc "Migrate JSON files to PostgreSQL"
      task migrate: :environment do
        DataMigrator.new.migrate_all
      end
    end
    
    # app/services/data_migrator.rb
    class DataMigrator
      def migrate_all
        migrate_jurisdictions
        migrate_financial_years
        migrate_departments
        migrate_spending_items
        validate_migration
      end
    
      def migrate_jurisdictions
        Dir.glob('data/provincial/*/').each do |path|
          slug = File.basename(path)
          # Parse summary.json, create Jurisdiction record
        end
      end
    
      # ... (detailed implementation)
    end
  2. Flatten Hierarchical Data

    def migrate_spending_items(department, spending_data, parent = nil, level = 0)
      spending_data['children']&.each_with_index do |item, index|
        spending_item = department.spending_items.create!(
          parent: parent,
          name: item['name'],
          amount: item['amount'],
          level: level,
          sort_order: index
        )
    
        # Recursively process children
        if item['children'].present?
          migrate_spending_items(department, item, spending_item, level + 1)
        end
      end
    end
  3. Validate Data Integrity

    • Compare totals from JSON vs. database
    • Check all relationships exist
    • Verify no data loss
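
The recursive flattening in step 2 can be exercised against a plain nested hash before running it against ActiveRecord. A sketch that collects rows instead of calling create! (flatten_spending and the row shape are illustrative):

```ruby
# Walk the JSON spending tree depth-first, emitting one flat row per node
# with level and sort_order -- the same shape migrate_spending_items writes.
def flatten_spending(node, parent_name = nil, level = 0, rows = [])
  (node['children'] || []).each_with_index do |item, index|
    rows << { name: item['name'], amount: item['amount'],
              parent: parent_name, level: level, sort_order: index }
    flatten_spending(item, item['name'], level + 1, rows)
  end
  rows
end
```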

Phase 4: StatsCan Integration (Week 3)

  1. Populate Geographic Reference Data

    • Load SGC 2021 codes into statscan_geography
    • Create jurisdiction mappings
  2. Initial StatsCan Data Sync

    • Run population sync for last 10 years
    • Run GDP sync
    • Run government spending sync (for validation)
  3. Test Scheduled Jobs

    • Verify Sidekiq cron jobs work
    • Check data freshness logic

Phase 5: GraphQL API Development (Week 4)

  1. Implement GraphQL Schema

    • All query types
    • Mutations for admin operations
    • Optimize with graphql-batch
  2. Add Authentication Middleware

    # app/graphql/canadaspends_schema.rb
    class CanadaSpendsSchema < GraphQL::Schema
      mutation(Types::MutationType)
      query(Types::QueryType)
    
      def self.unauthorized_object(error)
        raise GraphQL::ExecutionError, "Unauthorized"
      end
    end
    
    # app/controllers/graphql_controller.rb
    class GraphqlController < ApplicationController
      def execute
        context = {
          current_user: current_user
        }
    
        result = CanadaSpendsSchema.execute(
          params[:query],
          variables: params[:variables],
          context: context
        )
    
        render json: result
      end
    
      private
    
      def current_user
        token = request.headers['Authorization']&.split(' ')&.last
        return nil unless token
    
        # Verify JWT or session
        User.find_by(id: decode_token(token))
      end
    end

Phase 6: Deployment & Configuration (Week 5)

  1. Deploy Rails to EC2

    • Use Capistrano for git-based deployment
    • Configure Nginx reverse proxy (already set up with Certbot)
    • Configure Puma as systemd service
    • Set up Sidekiq as systemd service
  2. Nginx Configuration

    # /etc/nginx/sites-available/canadaspends-api
    upstream puma {
      server unix:///home/deploy/canadaspends-api/shared/tmp/sockets/puma.sock;
    }
    
    server {
      listen 80;
      server_name api.canadaspends.ca;
      return 301 https://$server_name$request_uri;
    }
    
    server {
      listen 443 ssl http2;
      server_name api.canadaspends.ca;
    
      ssl_certificate /etc/letsencrypt/live/api.canadaspends.ca/fullchain.pem;
      ssl_certificate_key /etc/letsencrypt/live/api.canadaspends.ca/privkey.pem;
    
      root /home/deploy/canadaspends-api/current/public;
    
      location / {
        proxy_pass http://puma;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
      }
    }
  3. Health Monitoring

    • Add /health endpoint in Rails
    • Monitor with CloudWatch (optional)
    • Or use external monitoring (UptimeRobot, Pingdom)

Phase 7: Next.js Frontend Updates (Week 5-6)

  1. Install Apollo Client

    pnpm add @apollo/client graphql
  2. Replace Data Fetching

    // src/lib/apollo-client.ts
    import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client'
    
    export const apolloClient = new ApolloClient({
      link: new HttpLink({
        uri: process.env.NEXT_PUBLIC_GRAPHQL_URL,
      }),
      cache: new InMemoryCache(),
    })
    
    // src/app/[lang]/(main)/[jurisdiction]/page.tsx
    import { gql } from '@apollo/client'
    import { apolloClient } from '@/lib/apollo-client'
    
    const GET_JURISDICTION = gql`
      query GetJurisdiction($slug: String!) {
        jurisdiction(slug: $slug) {
          id
          name
          population
          latestYear {
            totalSpending
            departments { id name totalSpending }
          }
        }
      }
    `
    
    export default async function JurisdictionPage({ params }) {
      const { jurisdiction } = await params  // params is async in Next.js 15
      const { data } = await apolloClient.query({
        query: GET_JURISDICTION,
        variables: { slug: jurisdiction },
      })
    
      return <JurisdictionView data={data.jurisdiction} />
    }

Phase 8: Admin Dashboard (Week 6)

  1. Build Admin UI

    • Option A: Rails Admin gem (quick)
    • Option B: Next.js admin routes + GraphQL mutations (flexible)
    • Features: CRUD for all entities, data import, user management
  2. Notification Subscription UI

    • User profile page
    • Subscribe/unsubscribe to entity changes
    • Email preferences

Phase 9: Testing & Optimization (Week 7)

  1. Performance Testing

    • Load test GraphQL endpoint (k6, Artillery)
    • Optimize slow queries
    • Add database indexes as needed
  2. Caching Implementation

    • Rails.cache for expensive queries
    • HTTP caching headers
    • CDN configuration for GraphQL (if applicable)
  3. Monitoring Setup

    • CloudWatch dashboards
    • Error tracking (Sentry, Rollbar)
    • Uptime monitoring

Phase 10: Launch & Monitoring (Week 8)

  1. Final Testing

    • Full QA testing on EC2 instance
    • Test all GraphQL queries
    • Verify StatsCan sync jobs running
    • Test email notifications
  2. Production Launch

    • Ensure DNS is pointed correctly
    • Update Next.js env vars (NEXT_PUBLIC_GRAPHQL_URL)
    • Deploy Next.js frontend with new API endpoint
    • Monitor for errors
  3. Post-Launch Monitoring

    • Monitor EC2 CPU/memory usage
    • Monitor RDS connections and queries
    • Check Sidekiq job success rates
    • Review email delivery stats
    • Gather user feedback
    • Optimize slow queries

Timeline Estimate

| Phase | Duration | Tasks |
|-------|----------|-------|
| Phase 1: AWS Setup | 1 week | EC2 launch, RDS setup, software install, SSL |
| Phase 2: Rails App | 1 week | Init project, migrations, models |
| Phase 3: Data Migration | 1-2 weeks | Scripts, flatten hierarchies, validate |
| Phase 4: StatsCan Integration | 1 week | Sync jobs, populate data |
| Phase 5: GraphQL API | 1 week | Schema, resolvers, auth |
| Phase 6: Deployment | 1 week | Capistrano, Nginx, systemd services |
| Phase 7: Frontend Updates | 1-2 weeks | Apollo Client, replace data layer |
| Phase 8: Admin Dashboard | 1 week | CRUD UI, subscriptions |
| Phase 9: Testing | 1 week | Performance, optimization |
| Phase 10: Launch | 1 week | Final testing, production deploy |
| **Total** | **10-12 weeks (2.5-3 months)** | |

Cost-Benefit Analysis

Costs

Development:

  • Developer time: 10-12 weeks at ~$100/hr = $40,000-48,000
  • Or internal team allocation

Infrastructure (Monthly):

  • AWS: $60/month ($36 with Reserved Instances)
  • Email (Resend/Postmark): ~$0-5/month (free tier or minimal)
  • Total: ~$60-65/month

Annual Infrastructure:

  • ~$720/year (on-demand)
  • ~$432/year (Reserved Instances)

Risks & Mitigation

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Migration data loss | High | Medium | Thorough validation, keep JSON backups, staged rollout |
| AWS costs exceed budget | Medium | Low | Use Reserved Instances, monitor with CloudWatch Alarms, set billing alerts |
| StatsCan API rate limits | Medium | Medium | Respect 25 req/sec limit, implement exponential backoff, cache aggressively |
| GraphQL N+1 queries | High | Medium | Use graphql-batch, database query monitoring, add indexes |
| Security breach | High | Low | Devise + Pundit auth, security groups, regular updates, penetration testing |
| Email deliverability | Low | Medium | Use AWS SES verified domain, DKIM/SPF records, monitor bounce rates |
| Timeline overruns | Medium | Medium | Prioritize MVP features, defer nice-to-haves, agile sprints |
