Database Migration: JSON Files → PostgreSQL (AWS RDS) + Ruby on Rails GraphQL API + StatsCan Integration
Overview
Migrate CanadaSpends from file-based JSON data storage to a professional full-stack architecture:
Infrastructure:
- Database: PostgreSQL 15 on AWS RDS (Multi-AZ)
- Backend API: Ruby on Rails 7.1 with GraphQL
- Frontend: Next.js 15 (existing, updated to consume GraphQL)
- Compute: AWS EC2 with Auto Scaling
- Background Jobs: Sidekiq + Redis (ElastiCache)
- Load Balancing: Application Load Balancer (ALB)
- Emails: transactional email for notifications (AWS SES or a third-party provider; see the Email Service section below)
New Capabilities:
- ✅ Real-time data updates without rebuilding the site
- ✅ GraphQL API for flexible data querying
- ✅ Statistics Canada (StatsCan) integration for population, GDP, census data
- ✅ Email notification system when data is updated/corrected
- ✅ Admin dashboard for data management
- ✅ Automated StatsCan data syncing (Sidekiq cron jobs)
- ✅ Better scalability and data integrity
- ✅ Professional production-grade architecture
Current Architecture Analysis
Data Storage
- Format: JSON/CSV files in the /data directory
- Structure: Hierarchical (Province → Year → Department → Programs → Line Items)
- Size: ~28 departments per province, multiple years, deeply nested spending data
- Flow: Build-time file reading (fs.readFileSync()) → Static Site Generation (SSG)
Current Data Entities
- Jurisdictions (provinces/municipalities)
- Financial Years (year-specific totals and metadata)
- Departments/Ministries (per jurisdiction-year)
- Spending Items (hierarchical tree, 3-5 levels deep)
- Revenue Items (hierarchical breakdown)
- Editorial Content (markdown descriptions, methodology, credits)
Current Tech Stack
- Next.js 15 (App Router) + React 19 + TypeScript
- Data visualization: D3.js (Sankey) + Recharts
- No current database or API layer
- Server Components with file system reads
Limitations
- ❌ Data updates require full site rebuild
- ❌ No ability to query or filter data dynamically
- ❌ No population/economic context (per-capita calculations return null)
- ❌ Manual data entry process
- ❌ No notifications when data changes
- ❌ Doesn't scale with data growth
Architecture Decision: AWS RDS + Ruby on Rails + GraphQL
Why This Stack?
PostgreSQL on AWS RDS
- ✅ Fully managed, production-grade PostgreSQL 15
- ✅ Automatic backups, point-in-time recovery (PITR)
- ✅ Multi-AZ deployment for 99.95% uptime SLA
- ✅ Scalable (start db.t3.medium, scale to db.m5.large+)
- ✅ VPC isolation for security
- ✅ No vendor lock-in (standard Postgres, can export/migrate)
- ✅ Read replicas for heavy analytics
- ✅ Excellent for complex queries (recursive CTEs, window functions, JSONB)
Ruby on Rails 7.1
- ✅ Mature, battle-tested web framework (18+ years)
- ✅ Convention over configuration → rapid development
- ✅ ActiveRecord ORM → elegant database queries
- ✅ Built-in authentication (Devise gem)
- ✅ Background jobs (ActiveJob + Sidekiq)
- ✅ Large ecosystem of gems
- ✅ Strong MVC patterns
- ✅ Easy deployment (Capistrano, Docker)
GraphQL API (graphql-ruby gem)
- ✅ Flexible querying (clients request exactly what they need)
- ✅ Perfect for hierarchical spending data (nested queries)
- ✅ Single endpoint (vs. dozens of REST routes)
- ✅ Type-safe schema with introspection
- ✅ Excellent Next.js integration (Apollo Client, urql)
- ✅ Reduces over-fetching and under-fetching
- ✅ Built-in query batching and caching
AWS EC2 + Auto Scaling
- ✅ Full control over application server
- ✅ Auto Scaling Groups scale with traffic
- ✅ Load Balancer distributes traffic
- ✅ Standard deployment target
- ✅ Can run Rails + Sidekiq workers
Sidekiq (Background Jobs)
- ✅ Efficient multi-threaded job processing
- ✅ Perfect for StatsCan API syncing (scheduled jobs)
- ✅ Email notifications via ActionMailer
- ✅ Retry logic, job monitoring
- ✅ Web UI for job inspection
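The scheduled StatsCan syncs mentioned above can be declared with sidekiq-cron; a sketch of an initializer, where the job class names and cron cadences are placeholder assumptions, not decided values:

```ruby
# config/initializers/sidekiq.rb (sketch). Job class names such as
# Statscan::PopulationSyncJob are hypothetical placeholders.
require 'sidekiq'
require 'sidekiq-cron'

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch('REDIS_URL', 'redis://localhost:6379/0') }

  Sidekiq::Cron::Job.load_from_hash(
    'statscan_population_sync' => {
      'cron'  => '0 6 1 * *',    # 06:00 on the 1st of each month
      'class' => 'Statscan::PopulationSyncJob'
    },
    'statscan_gdp_sync' => {
      'cron'  => '0 6 1 */3 *',  # every third month
      'class' => 'Statscan::GdpSyncJob'
    }
  )
end
```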
AWS SES (Email)
- ✅ Transactional email service
- ✅ High deliverability rates
- ✅ Pay-as-you-go ($0.10 per 1,000 emails)
- ✅ SMTP integration with ActionMailer
PostgreSQL Advantages
- JSONB columns: Store nested spending trees efficiently
- Recursive CTEs: Query hierarchical data elegantly
- Materialized views: Cache complex aggregations (Sankey data)
- Full-text search: Search spending items (future feature)
- Triggers & functions: Auto-update totals, send notifications
- Window functions: Advanced analytics (year-over-year growth, rankings)
- PostGIS extension: Geographic analysis (future)
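As a concrete picture of what the recursive-CTE advantage buys: the rollup below is what application code would otherwise do after fetching the whole adjacency list, while a `WITH RECURSIVE` query performs the same aggregation inside Postgres in a single round-trip (data here is illustrative):

```ruby
# In-memory rollup over an adjacency list (id, parent_id, amount): the same
# aggregation a recursive CTE can perform inside Postgres in one query.
Item = Struct.new(:id, :parent_id, :amount)

def subtree_total(items, root_id)
  node = items.find { |i| i.id == root_id }
  children = items.select { |i| i.parent_id == root_id }
  node.amount.to_f + children.sum { |c| subtree_total(items, c.id) }
end

items = [
  Item.new(1, nil, 0),    # Health (rollup node)
  Item.new(2, 1, 50.0),   # Hospitals
  Item.new(3, 1, 20.0),   # Public health
  Item.new(4, 2, 5.0)     # Hospital capital projects
]

subtree_total(items, 1)  # => 75.0
```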
AWS Infrastructure Architecture (Simplified)
High-Level Diagram
┌──────────────────────────────────────────────────────────┐
│ Internet │
└────────────────────┬─────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
▼ ▼
┌───────────────────┐ ┌─────────────────────────┐
│ Next.js App │ │ Single EC2 Instance │
│ (Vercel/CDN) │ │ (your-domain.com) │
│ │ │ │
│ - SSR/SSG │ │ Rails 7 + GraphQL │
│ - Apollo Client │───┤ + Sidekiq (same box) │
│ │ │ + Nginx/Puma │
└───────────────────┘ │ + Redis (local) │
└──────────┬──────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌─────────┐ ┌────────────┐
│ RDS │ │ Redis │ │ StatsCan │
│ Postgres │ │ (local) │ │ API │
│ │ │ OR │ │ (External) │
│ Public │ │ElastiC- │ └────────────┘
│ Access │ │ ache │
└──────────┘ └─────────┘
Infrastructure Components
Single EC2 Instance (All-in-One)
- Instance Type: t3.medium (2 vCPU, 4GB RAM) or t3.large (2 vCPU, 8GB RAM)
- OS: Ubuntu 22.04 LTS
- Software Stack:
- Ruby 3.2 + Rails 7.1
- GraphQL API (graphql-ruby)
- Nginx (reverse proxy, SSL termination with Let's Encrypt)
- Puma (Rails app server)
- Sidekiq (background jobs)
- Redis (local install or in-memory)
- Public IP: Elastic IP for stable DNS pointing
- Storage: 50GB SSD (expand as needed)
- Deployment: Git-based (Capistrano) or Docker
RDS PostgreSQL (Public Access)
- Instance Type: db.t3.small (2 vCPU, 2GB RAM) to start; scale to db.t3.medium if needed
- Storage: 50GB GP3 SSD, auto-scaling to 200GB
- Multi-AZ: Optional (adds cost, but provides failover)
- Backup: Automated daily snapshots, 7-day retention
- Version: PostgreSQL 15.x
- Public Access: YES - accessible from internet (read-only users)
- Encryption: At rest (default AWS encryption)
- Security:
- PostgreSQL firewall rules (allow specific IPs or 0.0.0.0/0 with strong passwords)
- Separate read-only user for public queries
- Admin user only for EC2 instance
Redis (Local or AWS)
Option A: Local Redis on EC2 (Simplest, $0)
- Install Redis via apt
- Use for Rails cache + Sidekiq job queue
- Persists to disk
Option B: AWS ElastiCache (Managed) (~$15/month)
- cache.t4g.micro (0.5GB)
- If you want managed Redis
Recommendation: Start with local Redis on EC2
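Pointing Rails at the local instance is one line of configuration; a sketch, where the `REDIS_URL` fallback value is an assumption:

```ruby
# config/environments/production.rb (excerpt): the Rails cache uses the
# local Redis install. The fallback URL is an assumption for this setup.
config.cache_store = :redis_cache_store, {
  url: ENV.fetch('REDIS_URL', 'redis://localhost:6379/1')
}
```

Sidekiq 7 reads `REDIS_URL` by default, so the job queue needs no extra wiring to share the same local instance.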
Email Service (NOT AWS SES)
Options:
- Resend (resend.com) - 3,000 emails/month free
- Postmark - 100 emails/month free, then $1.50/1K
- SendGrid - 100 emails/day free
- Mailgun - 5,000 emails/month free for 3 months
- Self-hosted SMTP (Gmail, etc.) - Use with caution
Recommendation: Resend or Postmark for transactional emails
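Any of these providers plugs into ActionMailer over SMTP; a sketch using Resend-style SMTP settings (verify host, port, and authentication against the provider's current documentation before relying on them):

```ruby
# config/environments/production.rb (excerpt). The SMTP host, username, and
# port follow Resend's published settings at the time of writing; treat them
# as assumptions and confirm against the provider's docs.
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  address:              'smtp.resend.com',
  port:                 587,
  user_name:            'resend',
  password:             ENV['RESEND_API_KEY'],
  authentication:       :plain,
  enable_starttls_auto: true
}
```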
DNS (Managed Externally)
- You manage DNS elsewhere (Cloudflare, Namecheap, etc.)
- Point api.canadaspends.ca → EC2 Elastic IP
- SSL via Let's Encrypt (free, auto-renewing with Certbot)
CloudWatch (Optional)
- Basic EC2 monitoring included free
- Optional: Detailed monitoring, custom metrics (~$5/month)
Security Configuration (Simplified)
EC2 Security Group:
Inbound:
- HTTP (80) from 0.0.0.0/0 # Redirect to HTTPS
- HTTPS (443) from 0.0.0.0/0 # GraphQL API
- SSH (22) from YOUR_IP # Admin access only
Outbound:
- All traffic (0.0.0.0/0) # For StatsCan API, package updates, etc.
RDS Security Group:
Inbound:
- PostgreSQL (5432) from 0.0.0.0/0 # Public read access
OR
- PostgreSQL (5432) from EC2 security group only (if you don't want fully public)
Outbound:
- None required
Database Users:
-- Admin user (EC2 only, full access). Note: RDS does not permit true SUPERUSER;
-- the master user is granted rds_superuser instead.
CREATE USER admin WITH PASSWORD 'strong_password' CREATEDB CREATEROLE;
-- Public read-only user (for external queries)
CREATE USER public_reader WITH PASSWORD 'public_password';
GRANT CONNECT ON DATABASE canadaspends TO public_reader;
GRANT USAGE ON SCHEMA public TO public_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO public_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO public_reader;
IAM Roles:
- EC2 Role: CloudWatch logs (optional), S3 for backups (future)
Estimated AWS Costs (Monthly, USD)
| Service | Configuration | Monthly Cost |
|---|---|---|
| EC2 (Single instance) | t3.medium (2 vCPU, 4GB RAM) | ~$30 |
| RDS PostgreSQL | db.t3.small, 50GB, Single-AZ | ~$25 |
| Elastic IP | 1 address (free when attached) | $0 |
| Data Transfer | ~20GB egress (modest usage) | ~$2 |
| Snapshots/Backups | ~10GB RDS backups | ~$1 |
| Email (Resend/Postmark) | Free tier or ~$1-5 | ~$2 |
| CloudWatch (optional) | Basic monitoring | $0-$2 |
| Total | | ~$60/month |
Optional Add-ons:
- RDS Multi-AZ: +$25/month (high availability failover)
- Larger instances: t3.large EC2 (+$30), db.t3.medium RDS (+$25)
- ElastiCache Redis: cache.t4g.micro (+$15/month)
Cost Optimization:
- Use Reserved Instances for EC2 & RDS → ~40% savings (total drops to ~$36/month)
- Start small: Single-AZ RDS, t3.small/medium instances
- Scale up later: Easy to upgrade instance types as traffic grows
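The estimates above can be sanity-checked with quick arithmetic (these are the table's rough figures, not quoted AWS prices):

```ruby
# Rough monthly total from the table's estimates (USD); not quoted AWS prices.
costs = {
  ec2_t3_medium: 30,
  rds_t3_small:  25,
  elastic_ip:     0,
  data_transfer:  2,
  backups:        1,
  email:          2
}

total    = costs.values.sum     # => 60
reserved = (total * 0.6).round  # ~40% reserved-instance savings => 36
```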
Database Schema
Core CanadaSpends Tables
jurisdictions
CREATE TABLE jurisdictions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(100) UNIQUE NOT NULL, -- 'ontario', 'toronto'
name VARCHAR(200) NOT NULL, -- 'Ontario', 'Toronto'
type VARCHAR(20) NOT NULL, -- 'provincial', 'municipal'
province VARCHAR(100), -- For municipalities
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_jurisdictions_slug ON jurisdictions(slug);
CREATE INDEX idx_jurisdictions_type ON jurisdictions(type);
financial_years
CREATE TABLE financial_years (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
jurisdiction_id UUID NOT NULL REFERENCES jurisdictions(id) ON DELETE CASCADE,
year VARCHAR(20) NOT NULL, -- '2023-24'
total_spending DECIMAL(15,2), -- In billions
total_revenue DECIMAL(15,2),
total_employees INTEGER,
net_debt DECIMAL(15,2),
total_debt DECIMAL(15,2),
debt_interest DECIMAL(15,2),
source_url TEXT,
methodology TEXT, -- Markdown
credits TEXT, -- Markdown
generated_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(jurisdiction_id, year)
);
CREATE INDEX idx_financial_years_jurisdiction ON financial_years(jurisdiction_id);
CREATE INDEX idx_financial_years_year ON financial_years(year);
sources
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
label VARCHAR(500) NOT NULL,
url TEXT NOT NULL,
scope VARCHAR(200),
sort_order INTEGER,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_sources_financial_year ON sources(financial_year_id);
departments
CREATE TABLE departments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
slug VARCHAR(200) NOT NULL,
name VARCHAR(500) NOT NULL,
total_spending DECIMAL(15,2),
percentage DECIMAL(5,2),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(financial_year_id, slug)
);
CREATE INDEX idx_departments_year ON departments(financial_year_id);
CREATE INDEX idx_departments_slug ON departments(slug);
spending_items (Adjacency List for Hierarchy)
CREATE TABLE spending_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
department_id UUID NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
parent_id UUID REFERENCES spending_items(id) ON DELETE CASCADE, -- NULL for root
name TEXT NOT NULL,
amount DECIMAL(15,2),
level INTEGER NOT NULL, -- 0=root, 1=program, 2=sub-program, etc.
sort_order INTEGER,
path TEXT[], -- Materialized path: ['health', 'hospitals', 'capital']
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_spending_items_department ON spending_items(department_id);
CREATE INDEX idx_spending_items_parent ON spending_items(parent_id);
CREATE INDEX idx_spending_items_path ON spending_items USING GIN(path);
revenue_items
CREATE TABLE revenue_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
financial_year_id UUID NOT NULL REFERENCES financial_years(id) ON DELETE CASCADE,
parent_id UUID REFERENCES revenue_items(id) ON DELETE CASCADE,
name TEXT NOT NULL,
amount DECIMAL(15,2),
level INTEGER NOT NULL,
sort_order INTEGER,
path TEXT[],
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_revenue_items_year ON revenue_items(financial_year_id);
CREATE INDEX idx_revenue_items_parent ON revenue_items(parent_id);
CREATE INDEX idx_revenue_items_path ON revenue_items USING GIN(path);
department_content (Editorial CMS)
CREATE TABLE department_content (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
department_id UUID UNIQUE NOT NULL REFERENCES departments(id) ON DELETE CASCADE,
intro_text TEXT,
description_text TEXT,
role_text TEXT,
programs_heading VARCHAR(200),
programs_description TEXT,
leadership_heading VARCHAR(200),
leadership_description TEXT,
priorities_heading VARCHAR(200),
priorities_description TEXT,
agencies_heading VARCHAR(200),
agencies_description TEXT,
budget_projections_text TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_department_content_dept ON department_content(department_id);
Statistics Canada Integration Tables
statscan_geography
CREATE TABLE statscan_geography (
id SERIAL PRIMARY KEY,
sgc_code VARCHAR(7) UNIQUE NOT NULL, -- Standard Geographical Classification
geo_level VARCHAR(20) NOT NULL, -- 'canada', 'province', 'division', 'subdivision'
parent_sgc_code VARCHAR(7),
official_name VARCHAR(200) NOT NULL,
name_en VARCHAR(200),
name_fr VARCHAR(200),
geo_type VARCHAR(50), -- 'City', 'Town', 'Municipality'
province_code VARCHAR(2),
census_division_code VARCHAR(4),
latitude DECIMAL(10,7),
longitude DECIMAL(10,7),
vintage INTEGER, -- Census year (2021, 2016)
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (parent_sgc_code) REFERENCES statscan_geography(sgc_code)
);
CREATE INDEX idx_statscan_geo_province ON statscan_geography(province_code);
CREATE INDEX idx_statscan_geo_level ON statscan_geography(geo_level);
CREATE INDEX idx_statscan_geo_parent ON statscan_geography(parent_sgc_code);
statscan_population
CREATE TABLE statscan_population (
id SERIAL PRIMARY KEY,
sgc_code VARCHAR(7) NOT NULL,
ref_date DATE NOT NULL, -- July 1 for annual, quarter end for quarterly
frequency VARCHAR(10) NOT NULL, -- 'annual', 'quarterly'
population BIGINT NOT NULL,
age_group VARCHAR(20), -- NULL for total, '0-4', '5-9', etc.
gender VARCHAR(10), -- NULL for total, 'Male', 'Female'
vector_id VARCHAR(15), -- StatsCan vector ID
statscan_update_date TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);
-- Expressions are not allowed in table-level UNIQUE constraints; use a unique index
CREATE UNIQUE INDEX idx_statscan_pop_unique ON statscan_population
(sgc_code, ref_date, frequency, COALESCE(age_group, ''), COALESCE(gender, ''));
CREATE INDEX idx_statscan_pop_geo_date ON statscan_population(sgc_code, ref_date);
CREATE INDEX idx_statscan_pop_vector ON statscan_population(vector_id);
statscan_gdp
CREATE TABLE statscan_gdp (
id SERIAL PRIMARY KEY,
sgc_code VARCHAR(7) NOT NULL, -- Province/territory only
year INTEGER NOT NULL,
price_basis VARCHAR(30) NOT NULL, -- 'current', 'constant_2017', 'chained_2017'
indicator VARCHAR(100) NOT NULL, -- 'GDP at market prices', 'Final consumption'
value_millions DECIMAL(15,2), -- Always in millions
vector_id VARCHAR(15),
statscan_update_date TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
UNIQUE(sgc_code, year, price_basis, indicator)
);
CREATE INDEX idx_statscan_gdp_geo_year ON statscan_gdp(sgc_code, year);
CREATE INDEX idx_statscan_gdp_indicator ON statscan_gdp(indicator);
statscan_govt_spending
CREATE TABLE statscan_govt_spending (
id SERIAL PRIMARY KEY,
sgc_code VARCHAR(7) NOT NULL,
year INTEGER NOT NULL,
govt_level VARCHAR(30), -- 'Provincial', 'Consolidated PTLG', 'Local'
ccofog_code VARCHAR(20), -- Canadian Classification of Functions of Government
function_name VARCHAR(200), -- 'Health', 'Education', 'Social protection'
amount_millions DECIMAL(15,2),
per_capita DECIMAL(10,2), -- Pre-calculated by StatsCan
vector_id VARCHAR(15),
statscan_update_date TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE,
UNIQUE(sgc_code, year, govt_level, ccofog_code)
);
CREATE INDEX idx_statscan_govt_spend_geo ON statscan_govt_spending(sgc_code, year);
CREATE INDEX idx_statscan_govt_spend_function ON statscan_govt_spending(function_name);
statscan_census_demographics
CREATE TABLE statscan_census_demographics (
id SERIAL PRIMARY KEY,
sgc_code VARCHAR(7) NOT NULL,
census_year INTEGER NOT NULL, -- 2021, 2016, 2011
indicator VARCHAR(100) NOT NULL, -- 'Median household income', 'Employment rate'
category VARCHAR(100), -- Additional breakdown
value DECIMAL(15,2),
value_text VARCHAR(500), -- For non-numeric values
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE CASCADE
);
-- Expressions are not allowed in table-level UNIQUE constraints; use a unique index
CREATE UNIQUE INDEX idx_statscan_census_unique ON statscan_census_demographics
(sgc_code, census_year, indicator, COALESCE(category, ''));
CREATE INDEX idx_statscan_census_geo_year ON statscan_census_demographics(sgc_code, census_year);
CREATE INDEX idx_statscan_census_indicator ON statscan_census_demographics(indicator);
statscan_vectors
CREATE TABLE statscan_vectors (
vector_id VARCHAR(15) PRIMARY KEY,
product_id VARCHAR(10) NOT NULL,
coordinate VARCHAR(500),
frequency VARCHAR(20),
scalar_factor_code VARCHAR(2),
scalar_description VARCHAR(50),
uom VARCHAR(100), -- Unit of measure
description TEXT,
last_update_date TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_statscan_vector_product ON statscan_vectors(product_id);
statscan_products
CREATE TABLE statscan_products (
product_id VARCHAR(10) PRIMARY KEY,
title_en TEXT,
title_fr TEXT,
subject_code VARCHAR(4),
frequency VARCHAR(20),
start_date DATE,
end_date DATE,
url VARCHAR(500),
last_update_date TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
jurisdiction_statscan_mapping
CREATE TABLE jurisdiction_statscan_mapping (
id SERIAL PRIMARY KEY,
jurisdiction_slug VARCHAR(100) UNIQUE NOT NULL,
sgc_code VARCHAR(7) NOT NULL,
jurisdiction_type VARCHAR(20), -- 'provincial', 'municipal', 'federal'
province_code VARCHAR(2),
confidence VARCHAR(20), -- 'exact', 'fuzzy', 'manual'
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (sgc_code) REFERENCES statscan_geography(sgc_code) ON DELETE RESTRICT
);
CREATE INDEX idx_jurisdiction_mapping_slug ON jurisdiction_statscan_mapping(jurisdiction_slug);
-- Example mappings
INSERT INTO jurisdiction_statscan_mapping (jurisdiction_slug, sgc_code, jurisdiction_type, province_code, confidence) VALUES
('ontario', '35', 'provincial', '35', 'exact'),
('toronto', '3520001', 'municipal', '35', 'exact'),
('vancouver', '5915022', 'municipal', '59', 'exact'),
('british-columbia', '59', 'provincial', '59', 'exact'),
('alberta', '48', 'provincial', '48', 'exact'),
('edmonton', '4811061', 'municipal', '48', 'exact');
Notification System Tables
users (Devise Authentication)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
encrypted_password VARCHAR(255) NOT NULL,
reset_password_token VARCHAR(255) UNIQUE,
reset_password_sent_at TIMESTAMPTZ,
remember_created_at TIMESTAMPTZ,
sign_in_count INTEGER DEFAULT 0,
current_sign_in_at TIMESTAMPTZ,
last_sign_in_at TIMESTAMPTZ,
current_sign_in_ip INET,
last_sign_in_ip INET,
role VARCHAR(50) DEFAULT 'viewer', -- 'admin', 'editor', 'viewer'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_reset_token ON users(reset_password_token);
notification_subscriptions
CREATE TABLE notification_subscriptions (
id SERIAL PRIMARY KEY,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
entity_type VARCHAR(50) NOT NULL, -- 'jurisdiction', 'department', 'all'
entity_id UUID, -- NULL for 'all'
event_types VARCHAR(50)[] NOT NULL, -- ['data_updated', 'data_corrected']
notification_method VARCHAR(20) DEFAULT 'email', -- 'email', 'webhook'
webhook_url VARCHAR(500),
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_notif_subs_user ON notification_subscriptions(user_id);
CREATE INDEX idx_notif_subs_entity ON notification_subscriptions(entity_type, entity_id);
CREATE INDEX idx_notif_subs_active ON notification_subscriptions(is_active) WHERE is_active = true;
notification_logs
CREATE TABLE notification_logs (
id SERIAL PRIMARY KEY,
subscription_id INTEGER NOT NULL REFERENCES notification_subscriptions(id) ON DELETE CASCADE,
entity_type VARCHAR(50) NOT NULL,
entity_id UUID,
event_type VARCHAR(50) NOT NULL,
change_summary JSONB, -- Details of what changed
sent_at TIMESTAMPTZ,
status VARCHAR(20), -- 'sent', 'failed', 'pending'
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_notif_logs_subscription ON notification_logs(subscription_id);
CREATE INDEX idx_notif_logs_created ON notification_logs(created_at DESC);
CREATE INDEX idx_notif_logs_status ON notification_logs(status);
data_change_audit
CREATE TABLE data_change_audit (
id SERIAL PRIMARY KEY,
table_name VARCHAR(100) NOT NULL,
record_id UUID NOT NULL,
action VARCHAR(20) NOT NULL, -- 'insert', 'update', 'delete'
changed_by UUID REFERENCES users(id), -- NULL for automated/system
old_values JSONB,
new_values JSONB,
change_reason TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_table_record ON data_change_audit(table_name, record_id);
CREATE INDEX idx_audit_created ON data_change_audit(created_at DESC);
CREATE INDEX idx_audit_user ON data_change_audit(changed_by);
data_sync_jobs
CREATE TABLE data_sync_jobs (
id SERIAL PRIMARY KEY,
sync_type VARCHAR(50) NOT NULL, -- 'population', 'gdp', 'census', 'govt_spending'
product_id VARCHAR(10),
start_time TIMESTAMPTZ NOT NULL,
end_time TIMESTAMPTZ,
records_fetched INTEGER,
records_inserted INTEGER,
records_updated INTEGER,
errors INTEGER,
error_details TEXT,
status VARCHAR(20), -- 'running', 'success', 'failed', 'partial'
statscan_release_date DATE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_sync_jobs_type_time ON data_sync_jobs(sync_type, start_time DESC);
CREATE INDEX idx_sync_jobs_status ON data_sync_jobs(status);
Database Views & Materialized Views
v_jurisdiction_enriched (Real-time View)
CREATE VIEW v_jurisdiction_enriched AS
SELECT
j.id,
j.slug,
j.name,
j.type,
fy.id AS financial_year_id,
fy.year,
fy.total_spending AS spending_billions,
fy.total_revenue AS revenue_billions,
g.sgc_code,
g.province_code,
g.geo_level,
p.population,
p.ref_date AS population_date,
-- Per capita spending
CASE
WHEN p.population > 0 THEN (fy.total_spending * 1000000000) / p.population
ELSE NULL
END AS spending_per_capita,
-- GDP data
gdp.value_millions AS gdp_millions,
-- Spending as % of GDP
CASE
WHEN gdp.value_millions > 0 THEN (fy.total_spending * 1000) / gdp.value_millions * 100
ELSE NULL
END AS spending_pct_gdp
FROM jurisdictions j
LEFT JOIN financial_years fy ON j.id = fy.jurisdiction_id
LEFT JOIN jurisdiction_statscan_mapping m ON j.slug = m.jurisdiction_slug
LEFT JOIN statscan_geography g ON m.sgc_code = g.sgc_code
-- Get most recent annual population for this jurisdiction/year
LEFT JOIN LATERAL (
SELECT population, ref_date
FROM statscan_population
WHERE sgc_code = g.sgc_code
AND frequency = 'annual'
AND age_group IS NULL
AND gender IS NULL
ORDER BY ABS(EXTRACT(YEAR FROM ref_date) - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
LIMIT 1
) p ON true
-- Get GDP for matching year
LEFT JOIN LATERAL (
SELECT value_millions
FROM statscan_gdp
WHERE sgc_code = g.sgc_code
AND price_basis = 'current'
AND indicator = 'GDP at market prices'
ORDER BY ABS(year - CAST(SUBSTRING(fy.year, 1, 4) AS INTEGER))
LIMIT 1
) gdp ON true;
mv_sankey_data (Materialized View - Cached)
CREATE MATERIALIZED VIEW mv_sankey_data AS
SELECT
fy.id AS financial_year_id,
j.slug AS jurisdiction_slug,
jsonb_build_object(
'total', fy.total_spending,
'spending', fy.total_spending,
'revenue', fy.total_revenue,
'spending_data', (
SELECT jsonb_build_object(
'name', 'Spending',
'children', COALESCE(jsonb_agg(dept_tree.tree ORDER BY dept_tree.sort_order), '[]'::jsonb)
)
FROM (
SELECT
d.total_spending * 100.0 / fy.total_spending AS sort_order,
jsonb_build_object(
'name', d.name,
'amount', d.total_spending,
'children', (
SELECT COALESCE(jsonb_agg(
jsonb_build_object(
'name', si.name,
'amount', si.amount
) ORDER BY si.sort_order
), '[]'::jsonb)
FROM spending_items si
WHERE si.department_id = d.id AND si.parent_id IS NULL
)
) AS tree
FROM departments d
WHERE d.financial_year_id = fy.id
) dept_tree
),
'revenue_data', (
SELECT jsonb_build_object(
'name', 'Revenue',
'children', COALESCE(jsonb_agg(
jsonb_build_object(
'name', ri.name,
'amount', ri.amount
) ORDER BY ri.sort_order
), '[]'::jsonb)
)
FROM revenue_items ri
WHERE ri.financial_year_id = fy.id AND ri.parent_id IS NULL
)
) AS sankey_json
FROM financial_years fy
JOIN jurisdictions j ON fy.jurisdiction_id = j.id;
CREATE UNIQUE INDEX idx_mv_sankey_fy ON mv_sankey_data(financial_year_id);
CREATE INDEX idx_mv_sankey_slug ON mv_sankey_data(jurisdiction_slug);
-- Refresh command (run after data changes):
-- REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sankey_data;
Database Triggers & Functions
Auto-update Department Totals
CREATE OR REPLACE FUNCTION update_department_total()
RETURNS TRIGGER AS $$
BEGIN
UPDATE departments
SET
total_spending = (
SELECT COALESCE(SUM(amount), 0)
FROM spending_items
WHERE department_id = COALESCE(NEW.department_id, OLD.department_id)
AND parent_id IS NULL
),
updated_at = NOW()
WHERE id = COALESCE(NEW.department_id, OLD.department_id);
RETURN NULL; -- return value is ignored for AFTER triggers; NULL also covers DELETE
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_update_dept_total
AFTER INSERT OR UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION update_department_total();
Audit Trail Trigger
CREATE OR REPLACE FUNCTION log_data_change()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'UPDATE' THEN
INSERT INTO data_change_audit (table_name, record_id, action, old_values, new_values, created_at)
VALUES (
TG_TABLE_NAME,
OLD.id,
'update',
to_jsonb(OLD),
to_jsonb(NEW),
NOW()
);
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO data_change_audit (table_name, record_id, action, old_values, created_at)
VALUES (
TG_TABLE_NAME,
OLD.id,
'delete',
to_jsonb(OLD),
NOW()
);
END IF;
RETURN NULL; -- return value is ignored for AFTER triggers; NULL also covers DELETE
END;
$$ LANGUAGE plpgsql;
-- Apply to critical tables
CREATE TRIGGER trg_audit_financial_years
AFTER UPDATE OR DELETE ON financial_years
FOR EACH ROW EXECUTE FUNCTION log_data_change();
CREATE TRIGGER trg_audit_departments
AFTER UPDATE OR DELETE ON departments
FOR EACH ROW EXECUTE FUNCTION log_data_change();
CREATE TRIGGER trg_audit_spending_items
AFTER UPDATE OR DELETE ON spending_items
FOR EACH ROW EXECUTE FUNCTION log_data_change();
Notification Trigger
CREATE OR REPLACE FUNCTION queue_notification()
RETURNS TRIGGER AS $$
BEGIN
-- Notify Sidekiq via pg_notify
PERFORM pg_notify(
'data_changed',
json_build_object(
'table', TG_TABLE_NAME,
'record_id', NEW.id,
'action', TG_OP,
'timestamp', NOW()
)::text
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_notify_financial_years
AFTER UPDATE ON financial_years
FOR EACH ROW EXECUTE FUNCTION queue_notification();
CREATE TRIGGER trg_notify_departments
AFTER UPDATE ON departments
FOR EACH ROW EXECUTE FUNCTION queue_notification();
Ruby on Rails Application Architecture
Rails Stack
Gemfile (key gems):
# Gemfile
source 'https://rubygems.org'
ruby '3.2.2'
# Core
gem 'rails', '~> 7.1.0'
gem 'pg', '~> 1.5'
gem 'puma', '~> 6.0'
# GraphQL
gem 'graphql', '~> 2.2'
gem 'graphql-batch' # N+1 query optimization
# Authentication & Authorization
gem 'devise', '~> 4.9'
gem 'pundit', '~> 2.3'
# Background Jobs
gem 'sidekiq', '~> 7.2'
gem 'sidekiq-cron', '~> 1.12' # Scheduled jobs
# Redis
gem 'redis', '~> 5.0'
gem 'hiredis-client' # C driver for redis-client; redis-rb 5 no longer uses the hiredis gem
# HTTP Client (for StatsCan API)
gem 'faraday', '~> 2.8'
gem 'faraday-retry'
# Performance
gem 'rack-cors' # CORS for Next.js
gem 'bootsnap', require: false
# Development/Test
group :development, :test do
gem 'rspec-rails'
gem 'factory_bot_rails'
gem 'faker'
gem 'pry-rails'
end
Authentication (Devise)
User Model:
# app/models/user.rb
class User < ApplicationRecord
devise :database_authenticatable, :registerable,
:recoverable, :rememberable, :validatable
enum role: { viewer: 'viewer', editor: 'editor', admin: 'admin' } # role is a VARCHAR column, so use a string-backed enum
has_many :notification_subscriptions, dependent: :destroy
def can_edit?
editor? || admin?
end
def can_admin?
admin?
end
end
Authorization (Pundit)
# app/policies/financial_year_policy.rb
class FinancialYearPolicy < ApplicationPolicy
def index?
true # Public read
end
def show?
true # Public read
end
def update?
user&.can_edit?
end
def create?
user&.can_admin?
end
def destroy?
user&.can_admin?
end
end
ActiveRecord Models
# app/models/jurisdiction.rb
class Jurisdiction < ApplicationRecord
self.inheritance_column = nil # the `type` column stores jurisdiction type, not STI
has_many :financial_years, dependent: :destroy
has_one :statscan_mapping, class_name: 'JurisdictionStatscanMapping',
primary_key: :slug, foreign_key: :jurisdiction_slug
validates :slug, presence: true, uniqueness: true
validates :name, presence: true
validates :type, inclusion: { in: %w[provincial municipal] }
def latest_year
financial_years.order(year: :desc).first
end
def population
statscan_mapping&.latest_population
end
def spending_per_capita
return nil unless latest_year && population
(latest_year.total_spending * 1_000_000_000) / population
end
end
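The `spending_per_capita` math hinges on a unit conversion (totals are stored in billions of dollars); a standalone check of the arithmetic with illustrative, Ontario-scale figures:

```ruby
# total_billions is in billions of dollars, population in persons.
# The figures in the example are illustrative, not sourced data.
def per_capita(total_billions, population)
  return nil unless total_billions && population&.positive?
  (total_billions * 1_000_000_000.0) / population
end

per_capita(204.3, 15_500_000).round(2)  # => 13180.65
```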
# app/models/financial_year.rb
class FinancialYear < ApplicationRecord
belongs_to :jurisdiction
has_many :departments, dependent: :destroy
has_many :sources, dependent: :destroy
has_many :revenue_items, dependent: :destroy
validates :year, presence: true, uniqueness: { scope: :jurisdiction_id }
after_update :refresh_sankey_cache, if: :saved_change_to_total_spending?
after_update :send_notifications
def sankey_data
Rails.cache.fetch("sankey_#{id}", expires_in: 1.hour) do
SankeyDataBuilder.new(self).build
end
end
private
def refresh_sankey_cache
RefreshSankeyMaterializedViewJob.perform_async
end
def send_notifications
DataChangeNotificationJob.perform_async('financial_year', id, 'updated')
end
end
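`SankeyDataBuilder` is referenced above but not defined anywhere in this plan; a minimal plain-Ruby sketch of the shape it might produce, mirroring the JSON emitted by `mv_sankey_data` (the class name, input format, and output keys are all assumptions):

```ruby
# Hypothetical sketch of the builder the cache block calls. Input is a simplified
# array of department hashes rather than an ActiveRecord object.
class SankeyDataBuilder
  def initialize(departments)
    @departments = departments # [{ name:, amount:, children: [...] }, ...]
  end

  def build
    { name: 'Spending', children: @departments.map { |d| node(d) } }
  end

  private

  def node(dept)
    { name: dept[:name], amount: dept[:amount],
      children: (dept[:children] || []).map { |c| node(c) } }
  end
end

tree = SankeyDataBuilder.new([
  { name: 'Health', amount: 75.0,
    children: [{ name: 'Hospitals', amount: 55.0 }] }
]).build
```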
# app/models/department.rb
class Department < ApplicationRecord
belongs_to :financial_year
has_many :spending_items, dependent: :destroy
has_one :content, class_name: 'DepartmentContent', dependent: :destroy
validates :slug, presence: true, uniqueness: { scope: :financial_year_id }
def root_spending_items
spending_items.where(parent_id: nil).order(:sort_order)
end
end
# app/models/spending_item.rb
class SpendingItem < ApplicationRecord
belongs_to :department
belongs_to :parent, class_name: 'SpendingItem', optional: true
has_many :children, class_name: 'SpendingItem', foreign_key: :parent_id
before_save :set_level_and_path
private
def set_level_and_path
if parent
self.level = parent.level + 1
self.path = parent.path + [name.parameterize]
else
self.level = 0
self.path = [name.parameterize]
end
end
end
StatsCan Models
# app/models/statscan/geography.rb
module Statscan
class Geography < ApplicationRecord
self.table_name = 'statscan_geography'
has_many :population_records, class_name: 'Statscan::Population', foreign_key: :sgc_code, primary_key: :sgc_code
has_many :gdp_records, class_name: 'Statscan::Gdp', foreign_key: :sgc_code, primary_key: :sgc_code
def latest_population(year: Date.current.year)
population_records.where(frequency: 'annual', age_group: nil, gender: nil)
.order(Arel.sql("ABS(EXTRACT(YEAR FROM ref_date) - #{year.to_i})"))
.first&.population
end
end
end
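The `ORDER BY ABS(... - year)` trick in `latest_population` selects the record whose reference year is closest to the requested one; the same selection in plain Ruby, with illustrative data:

```ruby
# Pick the population record whose reference year is closest to the target,
# mirroring the SQL ABS(year - target) ordering above. Sample data is made up.
records = [
  { ref_year: 2019, population: 14_545_000 },
  { ref_year: 2021, population: 14_826_000 },
  { ref_year: 2024, population: 16_000_000 },
]
target = 2022
closest = records.min_by { |r| (r[:ref_year] - target).abs }
closest[:ref_year] # => 2021
```

Note that on a distance tie the SQL ordering is nondeterministic without a secondary sort key, so adding a tiebreaker (e.g. latest year first) is worth considering.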
# app/models/statscan/population.rb
module Statscan
class Population < ApplicationRecord
self.table_name = 'statscan_population'
belongs_to :geography, class_name: 'Statscan::Geography', foreign_key: :sgc_code, primary_key: :sgc_code
end
end
GraphQL API Schema
Schema Definition
# app/graphql/types/query_type.rb
module Types
class QueryType < Types::BaseObject
# Jurisdictions
field :jurisdictions, [Types::JurisdictionType], null: false,
description: "List all jurisdictions (provinces and municipalities)"
field :jurisdiction, Types::JurisdictionType, null: false do
argument :slug, String, required: true
end
# Financial Years
field :financial_year, Types::FinancialYearType, null: false do
argument :jurisdiction_slug, String, required: true
argument :year, String, required: false
description "Get financial year data. If year is not provided, returns latest year."
end
# Departments
field :department, Types::DepartmentType, null: false do
argument :jurisdiction_slug, String, required: true
argument :department_slug, String, required: true
argument :year, String, required: false
end
# StatsCan Data
field :population_data, [Types::PopulationDataType], null: false do
argument :sgc_code, String, required: true
argument :years, Integer, required: false, default_value: 5
end
# Resolvers
def jurisdictions
Jurisdiction.all.order(:name)
end
def jurisdiction(slug:)
Jurisdiction.find_by!(slug: slug)
end
def financial_year(jurisdiction_slug:, year: nil)
jurisdiction = Jurisdiction.find_by!(slug: jurisdiction_slug)
query = jurisdiction.financial_years
if year
query.find_by!(year: year)
else
query.order(year: :desc).first!
end
end
def department(jurisdiction_slug:, department_slug:, year: nil)
fy = financial_year(jurisdiction_slug: jurisdiction_slug, year: year)
fy.departments.find_by!(slug: department_slug)
end
def population_data(sgc_code:, years:)
Statscan::Population.where(sgc_code: sgc_code, frequency: 'annual', age_group: nil, gender: nil)
.order(ref_date: :desc)
.limit(years)
end
end
end
GraphQL Types
# app/graphql/types/jurisdiction_type.rb
module Types
class JurisdictionType < Types::BaseObject
field :id, ID, null: false
field :slug, String, null: false
field :name, String, null: false
field :type, String, null: false
field :province, String, null: true
# Associations
field :financial_years, [Types::FinancialYearType], null: false
field :latest_year, Types::FinancialYearType, null: true
# StatsCan Enrichment
field :sgc_code, String, null: true
field :population, Integer, null: true
field :spending_per_capita, Float, null: true
def latest_year
object.latest_year
end
def sgc_code
object.statscan_mapping&.sgc_code
end
def population
BatchLoader::GraphQL.for(object.id).batch do |ids, loader|
Jurisdiction.includes(statscan_mapping: :geography).where(id: ids).each do |j|
pop = j.statscan_mapping&.geography&.latest_population
loader.call(j.id, pop)
end
end
end
def spending_per_capita
object.spending_per_capita
end
end
end
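The BatchLoader field above exists to avoid one population query per jurisdiction row. A minimal plain-Ruby sketch of the batching idea — collect all queued keys, resolve them with one bulk lookup, fan results back out:

```ruby
# All queued jurisdiction ids are resolved with a single bulk lookup instead
# of one query each; ids with no mapping resolve to nil. Data is illustrative.
queued_ids = [1, 2, 3]
populations_by_id = { 1 => 15_608_000, 2 => 5_646_000 } # one bulk query result

results = queued_ids.to_h { |id| [id, populations_by_id[id]] }
results # => {1=>15608000, 2=>5646000, 3=>nil}
```

This is the shape of work BatchLoader defers until the GraphQL executor has queued every id in the response.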
# app/graphql/types/financial_year_type.rb
module Types
class FinancialYearType < Types::BaseObject
field :id, ID, null: false
field :year, String, null: false
field :total_spending, Float, null: false
field :total_revenue, Float, null: true
field :total_employees, Integer, null: true
field :net_debt, Float, null: true
field :total_debt, Float, null: true
field :debt_interest, Float, null: true
field :methodology, String, null: true
field :credits, String, null: true
field :departments, [Types::DepartmentType], null: false
field :sources, [Types::SourceType], null: false
field :sankey_data, GraphQL::Types::JSON, null: true
def sankey_data
object.sankey_data
end
end
end
# app/graphql/types/department_type.rb
module Types
class DepartmentType < Types::BaseObject
field :id, ID, null: false
field :slug, String, null: false
field :name, String, null: false
field :total_spending, Float, null: false
field :percentage, Float, null: true
field :spending_items, [Types::SpendingItemType], null: false do
argument :parent_id, ID, required: false
end
field :content, Types::DepartmentContentType, null: true
def spending_items(parent_id: nil)
if parent_id
object.spending_items.where(parent_id: parent_id).order(:sort_order)
else
object.root_spending_items
end
end
def content
object.content
end
end
end
# app/graphql/types/spending_item_type.rb
module Types
class SpendingItemType < Types::BaseObject
field :id, ID, null: false
field :name, String, null: false
field :amount, Float, null: true
field :level, Integer, null: false
field :parent_id, ID, null: true
# Recursive children
field :children, [Types::SpendingItemType], null: false
def children
BatchLoader::GraphQL.for(object.id).batch(default_value: []) do |ids, loader|
SpendingItem.where(parent_id: ids).order(:sort_order).each do |item|
loader.call(item.parent_id) { |memo| memo << item }
end
end
end
end
end
Mutations
# app/graphql/types/mutation_type.rb
module Types
class MutationType < Types::BaseObject
field :update_financial_year, mutation: Mutations::UpdateFinancialYear
field :create_department, mutation: Mutations::CreateDepartment
field :subscribe_to_notifications, mutation: Mutations::SubscribeToNotifications
end
end
# app/graphql/mutations/update_financial_year.rb
module Mutations
class UpdateFinancialYear < BaseMutation
argument :id, ID, required: true
argument :total_spending, Float, required: false
argument :total_revenue, Float, required: false
argument :net_debt, Float, required: false
field :financial_year, Types::FinancialYearType, null: true
field :errors, [String], null: false
def resolve(id:, **attributes)
financial_year = FinancialYear.find(id)
# Authorization
raise Pundit::NotAuthorizedError unless FinancialYearPolicy.new(context[:current_user], financial_year).update?
if financial_year.update(attributes)
{ financial_year: financial_year, errors: [] }
else
{ financial_year: nil, errors: financial_year.errors.full_messages }
end
end
end
end
Example GraphQL Queries (Client-Side)
# Get jurisdiction with latest year
query GetJurisdiction($slug: String!) {
jurisdiction(slug: $slug) {
id
name
slug
type
population
spendingPerCapita
latestYear {
year
totalSpending
totalRevenue
netDebt
departments {
id
slug
name
totalSpending
percentage
}
sankeyData
}
}
}
# Get department with nested spending (3 levels)
query GetDepartment($jurisdictionSlug: String!, $departmentSlug: String!) {
department(
jurisdictionSlug: $jurisdictionSlug
departmentSlug: $departmentSlug
) {
id
name
totalSpending
spendingItems {
id
name
amount
children {
id
name
amount
children {
id
name
amount
}
}
}
content {
introText
descriptionText
programsDescription
}
}
}
# Get population trends
query GetPopulationTrends($sgcCode: String!, $years: Int!) {
populationData(sgcCode: $sgcCode, years: $years) {
refDate
population
}
}
# Update financial year (admin only)
mutation UpdateFinancialYear($id: ID!, $totalSpending: Float!) {
updateFinancialYear(id: $id, totalSpending: $totalSpending) {
financialYear {
id
totalSpending
}
errors
}
}
Sidekiq Background Jobs
StatsCan Data Sync Jobs
# app/jobs/statscan/sync_population_job.rb
module Statscan
class SyncPopulationJob < ApplicationJob
queue_as :statscan_sync
def perform(product_id = '1710000501', years = 10)
sync_job = DataSyncJob.create!(
sync_type: 'population',
product_id: product_id,
start_time: Time.current,
status: 'running'
)
begin
fetcher = Statscan::PopulationFetcher.new(product_id)
results = fetcher.fetch_latest(years)
sync_job.update!(
records_fetched: results[:fetched],
records_inserted: results[:inserted],
records_updated: results[:updated],
errors: results[:errors],
status: results[:errors].zero? ? 'success' : 'partial',
end_time: Time.current
)
Rails.logger.info "StatsCan population sync completed: #{results.inspect}"
rescue => e
sync_job.update!(
status: 'failed',
error_details: e.message,
end_time: Time.current
)
raise
end
end
end
end
# app/services/statscan/population_fetcher.rb
module Statscan
class PopulationFetcher
BASE_URL = 'https://www150.statcan.gc.ca/t1/wds/rest'
def initialize(product_id)
@product_id = product_id
end
def fetch_latest(years)
vectors = province_vectors # Mapping of SGC codes to vector IDs
results = { fetched: 0, inserted: 0, updated: 0, errors: 0 }
vectors.each do |sgc_code, vector_id|
begin
data = fetch_vector_data(vector_id, years)
data.each do |point|
pop = Statscan::Population.find_or_initialize_by(
sgc_code: sgc_code,
ref_date: point['refPer'],
frequency: 'annual',
age_group: nil,
gender: nil
)
pop.assign_attributes(
population: point['value'].to_i,
vector_id: vector_id,
statscan_update_date: Time.current
)
if pop.new_record?
pop.save!
results[:inserted] += 1
elsif pop.changed?
pop.save!
results[:updated] += 1
end
results[:fetched] += 1
end
rescue => e
Rails.logger.error "Error fetching vector #{vector_id}: #{e.message}"
results[:errors] += 1
end
end
results
end
private
def fetch_vector_data(vector_id, periods)
conn = Faraday.new(url: BASE_URL) do |f|
f.request :json
f.response :json
f.adapter Faraday.default_adapter
end
response = conn.post('getDataFromVectorsAndLatestNPeriods') do |req|
req.body = [{ vectorId: vector_id, latestN: periods }]
end
# The WDS endpoint returns a JSON array with one element per requested vector
result = response.success? ? Array(response.body).first : nil
if result && result['status'] == 'SUCCESS'
  result.dig('object', 'vectorDataPoint') || []
else
  raise "StatsCan API error: #{response.body}"
end
end
def province_vectors
{
'35' => 'v466670', # Ontario
'59' => 'v466730', # BC
'48' => 'v466710', # Alberta
# Add more provinces...
}
end
end
end
Scheduled Jobs (Cron)
# config/initializers/sidekiq_cron.rb
require 'sidekiq/cron/job'
Sidekiq::Cron::Job.create(
name: 'Sync StatsCan Population - Monthly',
cron: '0 2 1 * *', # 2 AM on 1st of each month
class: 'Statscan::SyncPopulationJob'
)
Sidekiq::Cron::Job.create(
name: 'Sync StatsCan GDP - Quarterly',
cron: '0 3 1 */3 *', # 3 AM on 1st of every 3 months
class: 'Statscan::SyncGdpJob'
)
Sidekiq::Cron::Job.create(
name: 'Refresh Sankey Materialized View - Daily',
cron: '0 4 * * *', # 4 AM daily
class: 'RefreshSankeyMaterializedViewJob'
)
Notification Jobs
# app/jobs/data_change_notification_job.rb
class DataChangeNotificationJob < ApplicationJob
queue_as :notifications
def perform(entity_type, entity_id, event_type)
entity = find_entity(entity_type, entity_id)
subscriptions = NotificationSubscription.active
.where(entity_type: [entity_type, 'all'])
.where('? = ANY(event_types)', event_type)
subscriptions.find_each do |subscription|
next unless subscription.should_notify?(entity)
case subscription.notification_method
when 'email'
send_email_notification(subscription, entity, event_type)
when 'webhook'
send_webhook_notification(subscription, entity, event_type)
end
end
end
private
def send_email_notification(subscription, entity, event_type)
log = NotificationLog.create!(
subscription: subscription,
entity_type: entity.class.name,
entity_id: entity.id,
event_type: event_type,
change_summary: build_change_summary(entity),
status: 'pending'
)
begin
DataChangeMailer.notify(subscription.user, entity, event_type).deliver_now
log.update!(status: 'sent', sent_at: Time.current)
rescue => e
log.update!(status: 'failed', error_message: e.message)
raise
end
end
def find_entity(type, id)
  # entity_type arrives underscored ('financial_year'), so camelize before constantize
  type.camelize.constantize.find(id)
end
def build_change_summary(entity)
# Get changes from audit log
audit = DataChangeAudit.where(table_name: entity.class.table_name, record_id: entity.id)
.order(created_at: :desc).first
return {} unless audit
{
changes: audit.new_values.except('updated_at').select { |k, v|
audit.old_values[k] != v
}
}
end
end
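`build_change_summary` boils down to a hash diff between the audit's snapshots. Isolated from ActiveRecord (requires Ruby 3.0+ for `Hash#except`; values are made up):

```ruby
# Keep only the keys whose values actually changed, ignoring the timestamp,
# exactly as build_change_summary does with the audit's old/new snapshots.
old_values = { 'total_spending' => 87.2, 'updated_at' => '2025-01-01' }
new_values = { 'total_spending' => 88.1, 'updated_at' => '2025-02-01' }

changes = new_values.except('updated_at').select { |k, v| old_values[k] != v }
changes # => {"total_spending"=>88.1}
```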
# app/mailers/data_change_mailer.rb
class DataChangeMailer < ApplicationMailer
  helper_method :entity_name # make entity_name available to the mail view below
def notify(user, entity, event_type)
@user = user
@entity = entity
@event_type = event_type
@changes_url = "#{ENV['FRONTEND_URL']}/#{entity.class.table_name}/#{entity.id}"
mail(
to: user.email,
subject: "Data Updated: #{entity_name(entity)}"
)
end
private
def entity_name(entity)
case entity
when FinancialYear
"#{entity.jurisdiction.name} #{entity.year}"
when Department
entity.name
else
entity.to_s
end
end
end
Email Template
<!-- app/views/data_change_mailer/notify.html.erb -->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
body { font-family: Arial, sans-serif; line-height: 1.6; }
.container { max-width: 600px; margin: 0 auto; padding: 20px; }
.header { background: #2563eb; color: white; padding: 20px; }
.content { padding: 20px; background: #f9fafb; }
.button { display: inline-block; background: #2563eb; color: white;
padding: 12px 24px; text-decoration: none; border-radius: 4px; }
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>CanadaSpends Data Update</h1>
</div>
<div class="content">
<p>Hello <%= @user.email %>,</p>
<p>
Data has been updated for <strong><%= entity_name(@entity) %></strong>.
</p>
<p>Event: <strong><%= @event_type.titleize %></strong></p>
<p>
<a href="<%= @changes_url %>" class="button">View Changes</a>
</p>
<p>
<small>
You're receiving this because you subscribed to notifications for this data.
<a href="<%= ENV['FRONTEND_URL'] %>/profile/notifications">Manage subscriptions</a>
</small>
</p>
</div>
</div>
</body>
</html>
Migration Strategy
Phase 1: AWS Infrastructure Setup (Week 1)
- Launch EC2 Instance
  - Launch t3.medium Ubuntu 22.04 instance
  - Allocate and attach Elastic IP
  - Configure security group (ports 80, 443, 22)
  - SSH in and update system: sudo apt update && sudo apt upgrade -y
- Install Software Stack
  # Install Ruby 3.2
  sudo apt install -y ruby-full build-essential git
  # Install PostgreSQL client
  sudo apt install -y postgresql-client
  # Install Redis locally
  sudo apt install -y redis-server
  sudo systemctl enable redis-server
  # Install Nginx
  sudo apt install -y nginx
  # Install Certbot for Let's Encrypt SSL
  sudo apt install -y certbot python3-certbot-nginx
- Launch RDS PostgreSQL
  - Launch db.t3.small PostgreSQL 15 instance
  - Enable public access in RDS settings
  - Configure security group (port 5432 from 0.0.0.0/0)
  - Create initial database canadaspends
  - Create admin user and read-only user
  - Enable automated backups (7-day retention)
- Configure DNS
  - Point your domain (e.g., api.canadaspends.ca) to the Elastic IP
  - Wait for DNS propagation
- Set up SSL
  sudo certbot --nginx -d api.canadaspends.ca # Auto-renews via cron
Phase 2: Rails Application Setup (Week 2)
- Initialize Rails Project
  rails new canadaspends-api --api --database=postgresql
  cd canadaspends-api
- Database Migrations
  - Create all table migrations (jurisdictions, financial_years, etc.)
  - Create StatsCan tables
  - Create notification tables
  - Add indexes, triggers, views
- Configure Environment
  # .env.production
  DATABASE_URL=postgresql://user:pass@rds-endpoint:5432/canadaspends
  REDIS_URL=redis://elasticache-endpoint:6379/0
  AWS_REGION=us-east-1
  SES_SMTP_ADDRESS=email-smtp.us-east-1.amazonaws.com
  FRONTEND_URL=https://www.canadaspends.ca
Phase 3: Data Migration (Week 2-3)
- Create Migration Scripts
  # lib/tasks/migrate_json_data.rake
  namespace :data do
    desc "Migrate JSON files to PostgreSQL"
    task migrate: :environment do
      DataMigrator.new.migrate_all
    end
  end

  # app/services/data_migrator.rb
  class DataMigrator
    def migrate_all
      migrate_jurisdictions
      migrate_financial_years
      migrate_departments
      migrate_spending_items
      validate_migration
    end

    def migrate_jurisdictions
      Dir.glob('data/provincial/*/').each do |path|
        slug = File.basename(path)
        # Parse summary.json, create Jurisdiction record
      end
    end

    # ... (detailed implementation)
  end
- Flatten Hierarchical Data
  def migrate_spending_items(department, spending_data, parent = nil, level = 0)
    spending_data['children']&.each_with_index do |item, index|
      spending_item = department.spending_items.create!(
        parent: parent,
        name: item['name'],
        amount: item['amount'],
        level: level,
        sort_order: index
      )
      # Recursively process children
      if item['children'].present?
        migrate_spending_items(department, item, spending_item, level + 1)
      end
    end
  end
- Validate Data Integrity
  - Compare totals from JSON vs. database
  - Check all relationships exist
  - Verify no data loss
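The recursive flattening in this phase can be exercised on plain hashes before pointing it at ActiveRecord. A sketch with hypothetical department names and made-up amounts:

```ruby
# Walk a nested { 'children' => [...] } tree depth-first, emitting one flat
# row per node with its level and parent, as the migration does.
def flatten_items(node, parent: nil, level: 0, out: [])
  (node['children'] || []).each do |child|
    row = { name: child['name'], amount: child['amount'], level: level, parent: parent }
    out << row
    flatten_items(child, parent: row[:name], level: level + 1, out: out)
  end
  out
end

tree = { 'children' => [
  { 'name' => 'Hospitals', 'amount' => 30.0,
    'children' => [{ 'name' => 'Acute care', 'amount' => 21.5 }] }
] }
flatten_items(tree).map { |r| [r[:name], r[:level]] }
# => [["Hospitals", 0], ["Acute care", 1]]
```

Comparing the sum of leaf amounts before and after a run like this is a cheap integrity check for the validation step.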
Phase 4: StatsCan Integration (Week 3)
- Populate Geographic Reference Data
  - Load SGC 2021 codes into statscan_geography
  - Create jurisdiction mappings
- Initial StatsCan Data Sync
  - Run population sync for last 10 years
  - Run GDP sync
  - Run government spending sync (for validation)
- Test Scheduled Jobs
  - Verify Sidekiq cron jobs work
  - Check data freshness logic
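The "data freshness logic" is not specified above; one possible sketch, assuming a `stale?` helper keyed to the monthly sync cadence (the name and the 35-day threshold are assumptions, not part of the plan):

```ruby
require 'time'

# Hypothetical freshness check: a record is stale if it was never synced or
# its last sync predates the cadence; 35 days leaves slack around month
# boundaries for a monthly job.
def stale?(last_synced_at, max_age_days: 35, now: Time.now)
  last_synced_at.nil? || (now - last_synced_at) > max_age_days * 86_400
end

now = Time.parse('2025-03-10 00:00:00 UTC')
stale?(Time.parse('2025-01-01 00:00:00 UTC'), now: now) # => true
stale?(Time.parse('2025-02-20 00:00:00 UTC'), now: now) # => false
```

A check like this could gate an alert when a scheduled sync silently stops running.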
Phase 5: GraphQL API Development (Week 4)
- Implement GraphQL Schema
  - All query types
  - Mutations for admin operations
  - Optimize with graphql-batch
- Add Authentication Middleware
  # app/graphql/canadaspends_schema.rb
  class CanadaSpendsSchema < GraphQL::Schema
    mutation(Types::MutationType)
    query(Types::QueryType)

    def self.unauthorized_object(error)
      raise GraphQL::ExecutionError, "Unauthorized"
    end
  end

  # app/controllers/graphql_controller.rb
  class GraphqlController < ApplicationController
    def execute
      context = { current_user: current_user }
      result = CanadaSpendsSchema.execute(
        params[:query],
        variables: params[:variables],
        context: context
      )
      render json: result
    end

    private

    def current_user
      token = request.headers['Authorization']&.split(' ')&.last
      return nil unless token

      # Verify JWT or session
      User.find_by(id: decode_token(token))
    end
  end
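`decode_token` is referenced in `current_user` but never defined. A stdlib-only sketch of one way it could work for HS256 tokens — in production the `jwt` gem is the likelier choice, since it also validates `exp` and the `alg` header; `SECRET` and the `id` claim are assumptions:

```ruby
require 'openssl'
require 'base64'
require 'json'

SECRET = 'change-me'.freeze # would come from Rails credentials in practice

def b64url(bytes)
  Base64.urlsafe_encode64(bytes, padding: false)
end

def issue_token(payload)
  header = b64url({ alg: 'HS256', typ: 'JWT' }.to_json)
  body   = b64url(payload.to_json)
  sig    = b64url(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  "#{header}.#{body}.#{sig}"
end

# Returns the user id from a valid token, nil for a malformed or tampered one.
def decode_token(token)
  header, body, sig = token.split('.')
  expected = b64url(OpenSSL::HMAC.digest('SHA256', SECRET, "#{header}.#{body}"))
  return nil unless sig && OpenSSL.secure_compare(sig, expected)

  JSON.parse(Base64.urlsafe_decode64(body))['id']
end

decode_token(issue_token(id: 42)) # => 42
decode_token('x.y.z')             # => nil
```

`OpenSSL.secure_compare` (Ruby 3.0+) avoids leaking signature prefixes through timing differences.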
Phase 6: Deployment & Configuration (Week 5)
- Deploy Rails to EC2
  - Use Capistrano for git-based deployment
  - Configure Nginx reverse proxy (already set up with Certbot)
  - Configure Puma as systemd service
  - Set up Sidekiq as systemd service
- Nginx Configuration
  # /etc/nginx/sites-available/canadaspends-api
  upstream puma {
    server unix:///home/deploy/canadaspends-api/shared/tmp/sockets/puma.sock;
  }

  server {
    listen 80;
    server_name api.canadaspends.ca;
    return 301 https://$server_name$request_uri;
  }

  server {
    listen 443 ssl http2;
    server_name api.canadaspends.ca;

    ssl_certificate /etc/letsencrypt/live/api.canadaspends.ca/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.canadaspends.ca/privkey.pem;

    root /home/deploy/canadaspends-api/current/public;

    location / {
      proxy_pass http://puma;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    }
  }
- Health Monitoring
  - Add /health endpoint in Rails
  - Monitor with CloudWatch (optional)
  - Or use external monitoring (UptimeRobot, Pingdom)
Phase 7: Next.js Frontend Updates (Week 5-6)
- Install Apollo Client
  pnpm add @apollo/client graphql
- Replace Data Fetching
  // src/lib/apollo-client.ts
  import { ApolloClient, InMemoryCache, HttpLink } from '@apollo/client'

  export const apolloClient = new ApolloClient({
    link: new HttpLink({
      uri: process.env.NEXT_PUBLIC_GRAPHQL_URL,
    }),
    cache: new InMemoryCache(),
  })

  // src/app/[lang]/(main)/[jurisdiction]/page.tsx
  import { gql } from '@apollo/client'
  import { apolloClient } from '@/lib/apollo-client'

  const GET_JURISDICTION = gql`
    query GetJurisdiction($slug: String!) {
      jurisdiction(slug: $slug) {
        id
        name
        population
        latestYear {
          totalSpending
          departments {
            id
            name
            totalSpending
          }
        }
      }
    }
  `

  export default async function JurisdictionPage({ params }) {
    const { data } = await apolloClient.query({
      query: GET_JURISDICTION,
      variables: { slug: params.jurisdiction },
    })
    return <JurisdictionView data={data.jurisdiction} />
  }
Phase 8: Admin Dashboard (Week 6)
- Build Admin UI
  - Option A: Rails Admin gem (quick)
  - Option B: Next.js admin routes + GraphQL mutations (flexible)
  - Features: CRUD for all entities, data import, user management
- Notification Subscription UI
  - User profile page
  - Subscribe/unsubscribe to entity changes
  - Email preferences
Phase 9: Testing & Optimization (Week 7)
- Performance Testing
  - Load test GraphQL endpoint (k6, Artillery)
  - Optimize slow queries
  - Add database indexes as needed
- Caching Implementation
  - Rails.cache for expensive queries
  - HTTP caching headers
  - CDN configuration for GraphQL (if applicable)
- Monitoring Setup
  - CloudWatch dashboards
  - Error tracking (Sentry, Rollbar)
  - Uptime monitoring
Phase 10: Launch & Monitoring (Week 8)
- Final Testing
  - Full QA testing on EC2 instance
  - Test all GraphQL queries
  - Verify StatsCan sync jobs running
  - Test email notifications
- Production Launch
  - Ensure DNS is pointed correctly
  - Update Next.js env vars (NEXT_PUBLIC_GRAPHQL_URL)
  - Deploy Next.js frontend with new API endpoint
  - Monitor for errors
- Post-Launch Monitoring
  - Monitor EC2 CPU/memory usage
  - Monitor RDS connections and queries
  - Check Sidekiq job success rates
  - Review email delivery stats
  - Gather user feedback
  - Optimize slow queries
Timeline Estimate
| Phase | Duration | Tasks |
|---|---|---|
| Phase 1: AWS Setup | 1 week | EC2 launch, RDS setup, software install, SSL |
| Phase 2: Rails App | 1 week | Init project, migrations, models |
| Phase 3: Data Migration | 1-2 weeks | Scripts, flatten hierarchies, validate |
| Phase 4: StatsCan Integration | 1 week | Sync jobs, populate data |
| Phase 5: GraphQL API | 1 week | Schema, resolvers, auth |
| Phase 6: Deployment | 1 week | Capistrano, Nginx, systemd services |
| Phase 7: Frontend Updates | 1-2 weeks | Apollo Client, replace data layer |
| Phase 8: Admin Dashboard | 1 week | CRUD UI, subscriptions |
| Phase 9: Testing | 1 week | Performance, optimization |
| Phase 10: Launch | 1 week | Final testing, production deploy |
| Total | 10-12 weeks | (2.5-3 months) |
Cost-Benefit Analysis
Costs
Development:
- Developer time: 10-12 weeks at ~$100/hr = $40,000-48,000
- Or internal team allocation
Infrastructure (Monthly):
- AWS: ~$60/month (~$36 with Reserved Instances)
- Total: ~$60/month
Annual Infrastructure:
- ~$720/year (on-demand)
- ~$432/year (Reserved Instances)
Risks & Mitigation
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Migration data loss | High | Medium | Thorough validation, keep JSON backups, staged rollout |
| AWS costs exceed budget | Medium | Low | Use Reserved Instances, monitor with CloudWatch Alarms, set billing alerts |
| StatsCan API rate limits | Medium | Medium | Respect 25 req/sec limit, implement exponential backoff, cache aggressively |
| GraphQL N+1 queries | High | Medium | Use graphql-batch, database query monitoring, add indexes |
| Security breach | High | Low | Devise + Pundit auth, security groups, regular updates, penetration testing |
| Email deliverability | Low | Medium | Use AWS SES verified domain, DKIM/SPF records, monitor bounce rates |
| Timeline overruns | Medium | Medium | Prioritize MVP features, defer nice-to-haves, agile sprints |