As Lead Analytics Engineer, you'll build the data infrastructure that powers product decisions, customer insights, and business intelligence across Riot. This case study simulates a real project you'd tackle in your first 90 days.
You've just extracted data from our Stripe account. The Sales and Product teams need reliable reporting on customer segments, revenue metrics, and our Organizations feature performance. You need to build clean, reusable data models that will serve as the foundation for our analytics.
We're providing two CSV files that represent our Stripe data
stripe_customers.csvstripe_subscriptions.csv
Data Dictionary:
metadata_organization_id: If present, this customer's Workspace belongs to an Organization (our enterprise feature allowing one parent entity to manage multiple workspaces)- Multiple customers can share the same
organization_id(representing different Workspaces under one parent Organization) metadata_workspace_id: Links to our product database (each Stripe customer = one RIOT Workspace)mrr: Monthly Recurring Revenue in USDmodule: Which RIOT product module (Awareness, Simulation, Slash, Sonar, Studio, Smishing, Vishing)status: active, canceled, past_due, trialing- One customer can have multiple subscriptions (different modules)
Create a dbt-style data model to support analytics
Consider: How would someone use these tables to answer business questions about revenue, customer segments, and Organizations adoption?
Using the raw CSVs as source tables, write SQL to answer these business questions:
Query A - Customer Segmentation:
Calculate total MRR by customer segment as of today:
- Organization customers (has organization_id)
- Standalone customers (no organization_id)
- Multi-module customers (2+ active subscriptions to different modules)
Show: segment_name, customer_count, total_mrr, avg_mrr_per_customerQuery B - Module Performance:
For each module, calculate:
- Number of active subscriptions
- Total MRR from active subscriptions
- Average MRR per subscription
- Churn rate (% of all-time subscriptions that have been canceled)
Order by total MRR descendingQuery C - Organization Analysis:
For each organization, show:
- Organization ID
- Number of workspaces (distinct customers in that org)
- Total MRR across all workspaces
- Comma-separated list of unique modules in use across the org
- Earliest workspace creation date (when did this org first join RIOT?)
Only include organizations with 2+ workspacesWhat dbt tests would you add to ensure data quality?
List at least 6-8 tests across your models. For each test, specify:
- Model/table name
- Column(s) being tested
- Test type (unique, not_null, relationships, accepted_values, custom, etc.)
- Why this test matters