HMM User Segmentation

In the following notebook I use a Hidden Markov Model to segment users. If you have similar data, I hope this can be of help!

HMM User Segmentation

This is a V1 analysis notebook exploring a proof of concept for segmenting users into behavioral states using a Hidden Markov Model. The actual data, credentials, schema names, and any company-specific identifiers have been redacted — this is shared as an example of how I approach a problem like this from scratch.

What It Does

The notebook uses a Gaussian Hidden Markov Model (HMM) to segment users into latent behavioral states based on their weekly activity. The idea is that instead of giving each user a single static label, the HMM lets you see how users move between states over time — which is a lot more useful for understanding engagement, churn risk, and lifecycle patterns.

The pipeline:

Data pull — Samples ~60,000 users from a transactional database via parameterized SQL.
Feature engineering — Builds behavioral features at the ride/session level: day-of-week patterns, cancellations, payment mode, platform (iOS/Android), rolling price averages, time on platform.
Weekly aggregation — Reshapes into a (user, week) panel and forward-fills signals across inactive weeks so sequences stay continuous.
Scaling — StandardScaler on continuous features; binary features passed through as-is.
HMM fitting — Fits a GaussianHMM (diagonal covariance) via hmmlearn, model selection tested across 2–13 states.
State assignment — Predicts state sequences per user and pulls out each user's dominant state across their history.
Visualization — Transition matrix heatmap, state mean profiles, segment distributions.

Why HMM

Personas are used to construct fictional characters that, based on data, can represent the different user types that might use our service. There are many classification or categorization methods that range from theory based (grouping from an idea or notion) to purely data driven groupings.

We have rich, time series data hence it makes sense for us to use both time and features in any modeling efforts. "Personas" are often conceptualized as static and perpetual: "She's a weekend-partier"; "He's an occasional commuter," but the reality is everybody has some degree of 'weekend partier-ness' and 'commuter-ness' in them, can switch between them, AND can have the propensity to evolve over time. For example, as a user matures they can transition from a weekend partier to a commuter - age comes for us all. Our thinking, and therefore our model, should be able to accommodate this evolution.

To address our problem statement and accommodate customer evolution, time, and available features, I turned to a Hidden Markov Model (HMM).

In (very) short, an HMM uses observable signals or data to estimate “hidden” or observable states. In the case with our data, we ideally want to understand if a user is going to be in a position where they would be likely to take a ride in a given week. If we have this information we can try to positively affect the likelihood of ordering a ride with coupons or communication, as well as try to encourage a user to be in this “potential-ride-ordering-state” more often.

This “state” however, is not something we can observe directly. What we do instead is use the data listed above to try and predict what state a user will be in based on this information, as well as, their current state. We label the states below to make them more concrete.

What's Redacted

Database credentials, hostnames, schema and table names
Country/city identifiers (replaced with COUNTRY_A, COUNTRY_B, etc.)
Product-specific column names and business logic

Redacted sections are marked with # REDACTED: comments throughout.

Stack

Tool	Purpose
`pandas` / `numpy`	Data manipulation
`hmmlearn`	Gaussian HMM
`sklearn`	Feature scaling
`matplotlib` / `seaborn`	Visualization
`psycopg2`	Database connection
`graphviz`	Transition graph

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
HMM.ipynb		HMM.ipynb
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HMM User Segmentation

What It Does

Why HMM

What's Redacted

Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

HMM User Segmentation

What It Does

Why HMM

What's Redacted

Stack

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages