Skip to content

tryriot/take-home-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Lead Analytics Engineer Case Study

Your Mission

As Lead Analytics Engineer, you'll build the data infrastructure that powers product decisions, customer insights, and business intelligence across Riot. This case study simulates a real project you'd tackle in your first 90 days.

The Exercise: Stripe Data Modeling

Scenario

You've just extracted data from our Stripe account. The Sales and Product teams need reliable reporting on customer segments, revenue metrics, and our Organizations feature performance. You need to build clean, reusable data models that will serve as the foundation for our analytics.

The Data

We're providing two CSV files that represent our Stripe data

  1. stripe_customers.csv
  2. stripe_subscriptions.csv

Data Dictionary:

  • metadata_organization_id: If present, this customer's Workspace belongs to an Organization (our enterprise feature allowing one parent entity to manage multiple workspaces)
  • Multiple customers can share the same organization_id (representing different Workspaces under one parent Organization)
  • metadata_workspace_id: Links to our product database (each Stripe customer = one RIOT Workspace)
  • mrr: Monthly Recurring Revenue in USD
  • module: Which RIOT product module (Awareness, Simulation, Slash, Sonar, Studio, Smishing, Vishing)
  • status: active, canceled, past_due, trialing
  • One customer can have multiple subscriptions (different modules)

Your Task

Part 1: Design the dimensional model

Create a dbt-style data model to support analytics

Consider: How would someone use these tables to answer business questions about revenue, customer segments, and Organizations adoption?

Part 2: Write SQL queries

Using the raw CSVs as source tables, write SQL to answer these business questions:

Query A - Customer Segmentation:

Calculate total MRR by customer segment as of today:
- Organization customers (has organization_id)
- Standalone customers (no organization_id)
- Multi-module customers (2+ active subscriptions to different modules)

Show: segment_name, customer_count, total_mrr, avg_mrr_per_customer

Query B - Module Performance:

For each module, calculate:
- Number of active subscriptions
- Total MRR from active subscriptions
- Average MRR per subscription
- Churn rate (% of all-time subscriptions that have been canceled)

Order by total MRR descending

Query C - Organization Analysis:

For each organization, show:
- Organization ID
- Number of workspaces (distinct customers in that org)
- Total MRR across all workspaces
- Comma-separated list of unique modules in use across the org
- Earliest workspace creation date (when did this org first join RIOT?)

Only include organizations with 2+ workspaces

Part 3: Data Quality

What dbt tests would you add to ensure data quality?

List at least 6-8 tests across your models. For each test, specify:

  • Model/table name
  • Column(s) being tested
  • Test type (unique, not_null, relationships, accepted_values, custom, etc.)
  • Why this test matters

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published