Case Study

Vitality × Spotify Rewards
Product Analytics

Modelling an opt-in wellness-music partnership to measure whether music engagement incentives drive physical activity improvements and member retention.

Product Analytics · Rewards & Retention · 12-Week Pilot Design · Python · Power BI · Synthetic Dataset
12 wk · Pilot window (4 pre · 8 post)
+14 pts · Activity score delta, engaged vs baseline
22 pt · Retention gap at week 8
+41% · Spotify streams post-activation

Do rewards change behaviour, or attract people who are already active?

Rewards programmes that integrate third-party lifestyle products face a core measurement challenge. Without a structured pilot and bias controls, any observed improvement in activity or retention cannot be cleanly attributed to the reward itself. This case study builds the analytics layer to evaluate that question for a hypothetical Vitality-style programme integrating Spotify Africa.


12-Week Pilot Structure

Four weeks of pre-activation baseline data followed by eight weeks of post-activation measurement, with an opt-in cohort compared against a non-engaged control group throughout.

Weeks 1–4 · Pre-activation
Week 5 · Reward activation — members opt in to Spotify benefit
Weeks 5–12 · Post-activation (measurement window)
Pre Period
Baseline capture
Activity scores, Spotify engagement, and cohort equivalence tested before any reward exposure.
Post Period (Primary Window)
Behavioural measurement
Weekly activity events, stream counts, and monthly retention tracked. Engaged cohort compared against both baseline and control group.
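As a minimal sketch of how the pilot window is encoded (table and column names are assumed here, not taken from the real pipeline), each member-week row can be labelled pre- or post-activation:

```python
import pandas as pd

ACTIVATION_WEEK = 5  # reward goes live at the start of week 5

# Toy member-week frame; the real pipeline generates this synthetically
member_week = pd.DataFrame({
    "member_id": [1, 1, 1, 2, 2, 2],
    "week":      [4, 5, 12, 4, 5, 12],
})

# Label each row relative to the activation boundary
member_week["period"] = member_week["week"].apply(
    lambda w: "pre" if w < ACTIVATION_WEEK else "post"
)

print(member_week)
```

Everything downstream (activity deltas, stream counts, retention) then aggregates over this `period` flag.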

Activity Score — Pre vs Post Activation

Average weekly activity score by cohort. Engaged members improved from their own baseline; the control group showed minimal change.

Weekly Activity Score (avg, out of 100)
Engaged cohort n=214 · Control cohort n=186 · Synthetic data

Cohort      Pre-activation   Post-activation
Engaged     63               77
Control     49               51
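The pre/post comparison can be reproduced with a simple pivot; the values are hard-coded here to match the chart, whereas the pipeline would derive them from member_week_pilot.csv:

```python
import pandas as pd

# Cohort-level average weekly activity scores (from the chart above)
scores = pd.DataFrame({
    "cohort":    ["Engaged", "Engaged", "Control", "Control"],
    "period":    ["pre", "post", "pre", "post"],
    "avg_score": [63, 77, 49, 51],
})

# One row per cohort, one column per period, plus the delta
pivot = scores.pivot(index="cohort", columns="period", values="avg_score")
pivot["delta"] = pivot["post"] - pivot["pre"]
print(pivot)
```

The +14-point engaged delta against a +2-point control delta is the headline contrast, subject to the selection-bias caveat below.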
Selection Bias — Documented Confound

Opt-in members entered the pilot with a baseline activity score of 63 vs 49 for the control group — a 14-point gap before the reward activated. Members who chose to participate were already more active.

The post-activation improvement in the engaged cohort (+14 points) cannot be cleanly attributed to the reward. A randomised assignment design would be required to establish causal direction.

Baseline Equivalence Test

Cohort characteristics at week 1 (before any reward exposure). A well-designed experiment would show comparable baselines; these cohorts do not.

Metric (week 1)            Engaged Cohort (Opt-In)   Control Cohort (Non-Opt-In)
Avg activity score         63.2                      49.1
Weekly exercise events     4.1                       2.6
Spotify streams / week     38                        21
Prior 90-day retention     81%                       64%
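One way to quantify how far apart the baselines sit is a standardised mean difference (Cohen's d) on member-level week-1 activity scores. The member-level distributions below are illustrative draws, not the real synthetic data; only the cohort means and sizes come from the table above:

```python
import numpy as np

def cohens_d(a, b):
    """Standardised mean difference between two samples, pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)
    ) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative week-1 scores: means/sizes from the table, SD assumed
rng = np.random.default_rng(7)
engaged = rng.normal(63.2, 12, size=214)
control = rng.normal(49.1, 12, size=186)

d = cohens_d(engaged, control)
print(f"Cohen's d ~ {d:.2f}")  # well above the 0.8 'large effect' threshold
```

A d of roughly 1 or more at baseline is strong evidence that the cohorts are not exchangeable, which is exactly the documented confound.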

Cohort Retention at Week 8

Monthly retention tracked by engagement tier. The engaged cohort retained at a materially higher rate by the 8-week mark — though baseline differences mean this should be read as directional, not causal.

8-Week Retention Rate (% of cohort still active)
Engaged 74% · Control 52%

Spotify Streams — Pre vs Post (engaged cohort · avg weekly streams)
Pre (wk 1–4) 38 · Post (wk 5–12) 54
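The week-8 retention rate can be read off a member-level table as the share of each cohort still active at week 8. The frame and the "last active week" definition below are assumptions for illustration, not the pipeline's exact schema:

```python
import pandas as pd

# Toy member table: last week in which each member logged any activity
member_week = pd.DataFrame({
    "member_id": [1, 2, 3, 4, 5, 6],
    "cohort":    ["Engaged", "Engaged", "Engaged",
                  "Control", "Control", "Control"],
    "week":      [8, 8, 3, 8, 2, 1],
})

# A member counts as retained if still active at week 8 or later
flagged = member_week.assign(active=member_week["week"] >= 8)
retention = flagged.groupby("cohort")["active"].mean().mul(100).round(0)
print(retention)
```

In the pilot this calculation over the full cohorts yields the 74% vs 52% gap shown above.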

Warehouse-Style Star Schema

Designed for Power BI with clearly separated dimension and fact layers. All tables are generated synthetically via the Python pipeline.

Dimensions: dim_member · dim_week · dim_month · dim_reward · dim_content_category
Facts: fact_activity_weekly · fact_spotify_weekly · fact_campaign_exposure · fact_reward_events · fact_retention_monthly
Analysis-ready outputs: member_week_pilot.csv and member_summary.csv
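Conceptually, member_week_pilot.csv is a join of the weekly fact tables onto the member dimension. A minimal sketch, with column names assumed rather than taken from the real schema:

```python
import pandas as pd

# Toy versions of the star-schema tables
dim_member = pd.DataFrame({
    "member_id": [1, 2],
    "cohort":    ["Engaged", "Control"],
})
fact_activity_weekly = pd.DataFrame({
    "member_id": [1, 1, 2], "week": [1, 2, 1],
    "activity_score": [60, 65, 48],
})
fact_spotify_weekly = pd.DataFrame({
    "member_id": [1, 1, 2], "week": [1, 2, 1],
    "streams": [35, 40, 20],
})

# Join facts on the member-week grain, then attach member attributes
member_week_pilot = (
    fact_activity_weekly
    .merge(fact_spotify_weekly, on=["member_id", "week"], how="left")
    .merge(dim_member, on="member_id", how="left")
)
member_week_pilot.to_csv("member_week_pilot.csv", index=False)
print(member_week_pilot)
```

Keeping the facts at a consistent member-week grain is what makes this single left-join chain sufficient for the Power BI model.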

What the Analysis Suggests

Tier the reward structure
A single reward tier treats all members equally. Tiering by engagement level would sharpen impact and reduce cost-per-active-member for lower-engaged segments.
Retention lever, not activation lever
The reward performs best for members already engaged. Inactive segments require a different intervention — the music reward alone is unlikely to shift their behavioural baseline.
Randomised design next cycle
Opt-in selection bias limits causal claims. Randomly assigning reward eligibility in a future pilot would allow attributable measurement rather than directional correlation.
Document limitations explicitly
Surfacing the bias risk in reporting — rather than presenting headline numbers without context — is what makes the analysis credible for product and partnerships decisions.
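The randomised-design recommendation could start as simple coin-flip eligibility at enrolment (a sketch only; names are assumed, the draw is seeded for reproducibility, and a real design would stratify on baseline activity):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Enrolled members for the next pilot cycle
members = pd.DataFrame({"member_id": range(1, 401)})

# Assign reward eligibility at random instead of relying on opt-in
members["arm"] = rng.choice(["treatment", "control"], size=len(members))

print(members["arm"].value_counts())
```

Because assignment no longer depends on member choice, any post-activation gap between arms becomes attributable to the reward rather than to selection.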

Tools & Libraries

Python · pandas · numpy · matplotlib · seaborn · Jupyter Notebooks · Power BI