End-to-End User Journey Analysis in a Fragmented SPA & Cross-Device Environment

Role: Senior Digital Data Analyst

Note

The dashboard reported a 42% conversion rate.
After tracking audit and journey reconstruction, the reliable number was 31% — a gap driven by measurement issues, not user behavior alone.

Key Results

Conversion corrected: 42% → 31%
28% of paid sessions misattributed (UTM loss)
19% of users who reached pricing looped 3+ times
44% lower conversion for users with 3+ pricing loops vs. a linear journey

The Problem

A subscription-based digital product — web SPA + mobile app (iOS/Android) — was reporting a 42% conversion rate from account creation to paid subscription. The metric looked stable. Stakeholders were not alarmed.

But the business was still missing its growth targets. Support tickets mentioned confusion. The product team noticed high drop-off on the pricing page but couldn't explain it. Marketing saw paid campaigns performing well on last-click attribution but suspected something was off further down the funnel.

The ask: figure out what's actually happening between account creation and subscription.

Business Context

Product: NovaPlus — a SaaS platform with a freemium-to-paid conversion model
Platforms: Web (React SPA), iOS app, Android app
Data stack: GA4-style event tracking → BigQuery → Looker
Funnel stages: Registration → Pricing viewed → Plan compared → Payment started → Subscription started
Reported conversion (dashboard): 42% (account created → subscription started)
Stakeholders: Head of Product, Head of Growth, Head of Data

What This Project Demonstrates

This is not a clean-data exercise. The entire point is to work through the mess:

Reconstructing real user journeys from raw, inconsistent event data
Identifying and quantifying tracking gaps before drawing any conclusions
Comparing dashboard metrics against event-level reality
Detecting behavioral patterns (loops, exits, cross-device handoffs) that aggregated funnels hide
Connecting quantitative signals to qualitative evidence
Translating analysis into concrete, prioritized business recommendations

Repository Structure

├── README.md
├── requirements.txt
├── .gitignore
├── LICENSE
├── docs/
│   ├── 01_problem_framing.md          # Business question, stakeholder context, analysis scope
│   ├── 02_tracking_audit.md           # Data quality checks: completeness, accuracy, consistency
│   ├── 03_analysis_methodology.md     # Mixed-method approach and theoretical frameworks
│   ├── 04_quantitative_analysis.md    # Funnel, journeys, loops, platform comparison, Markov, survival
│   ├── 05_qualitative_analysis.md     # Session replays, support tickets, UX friction coding
│   ├── 06_business_recommendations.md # Prioritized actions by tracking / product / marketing
│   └── 07_executive_summary.md        # One-page senior summary
├── sql/
│   ├── 01_event_quality_audit.sql     # Missing events, null rates, duplicate detection
│   ├── 02_session_reconstruction.sql  # Session-level journey rebuild from raw events
│   ├── 03_funnel_baseline.sql         # Dashboard funnel vs. reconstructed funnel
│   ├── 04_journey_sequences.sql       # Event path sequences per user
│   ├── 05_loop_detection.sql          # Pricing loop identification and loop counts
│   ├── 06_web_vs_app_comparison.sql   # Platform-level funnel and behavior comparison
│   ├── 07_utm_loss_analysis.sql       # UTM parameter persistence across SPA navigation
│   └── 08_markov_transition_matrix.sql # Transition probabilities between journey states
├── python/
│   ├── generate_synthetic_events.py   # Generates sample_events.csv (10,000 users)
│   ├── journey_analysis.py            # Session reconstruction, loop detection, funnel metrics
│   ├── markov_analysis.py             # Markov transition matrix and friction state analysis
│   ├── survival_analysis.py           # Time-to-conversion by platform and traffic source
│   └── visualizations.py              # Generates all 5 charts → saves to images/
├── notebooks/
│   └── user_journey_analysis_walkthrough.ipynb  # Step-by-step analysis walkthrough
├── data/
│   ├── sample_events.csv              # Synthetic event data (generated by Python script)
│   └── data_dictionary.md             # Field definitions and accepted values
├── images/
│   ├── funnel_comparison.png
│   ├── pricing_loops.png
│   ├── web_vs_app_conversion.png
│   ├── utm_loss.png
│   └── time_to_conversion.png
└── outputs/
    ├── executive_summary.md           # One-page summary
    ├── key_findings.md                # Numbered findings with supporting metrics
    └── recommendations.md             # Prioritized action plan

Setup

git clone https://github.com/maissabounar/cross-device-journey-analysis.git
cd cross-device-journey-analysis

pip install -r requirements.txt

python python/generate_synthetic_events.py   # generates data/sample_events.csv
python python/journey_analysis.py
python python/markov_analysis.py
python python/survival_analysis.py
python python/visualizations.py             # saves charts to images/

To explore interactively:

jupyter notebook notebooks/user_journey_analysis_walkthrough.ipynb

Methodology

Phase 1: Tracking Audit
Before any analysis, assess whether the data is trustworthy. Check for missing events by platform, duplicate sessions, null user IDs, and UTM drop-off. This step alone changed the headline conversion number from 42% to 31%.
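As a rough illustration of what such an audit can look like, the sketch below runs three of the checks on a handful of hand-written rows. It is not the code in sql/01_event_quality_audit.sql or python/journey_analysis.py. The field names (event_id, user_id, platform, event_name), the account_created event name, and the use of duplicate event IDs as a stand-in for duplicate detection are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows. Field names are assumptions for this sketch,
# not the schema documented in data/data_dictionary.md.
events = [
    {"event_id": "e1", "user_id": "u1", "platform": "web", "event_name": "account_created"},
    {"event_id": "e2", "user_id": "u1", "platform": "web", "event_name": "subscription_started"},
    {"event_id": "e2", "user_id": "u1", "platform": "web", "event_name": "subscription_started"},
    {"event_id": "e3", "user_id": None, "platform": "android", "event_name": "payment_started"},
    {"event_id": "e4", "user_id": "u2", "platform": "android", "event_name": "account_created"},
]

# Events every platform is expected to emit (illustrative subset of the funnel).
EXPECTED_EVENTS = {"account_created", "payment_started", "subscription_started"}


def audit(rows):
    """Return the null user ID rate, duplicated event IDs, and missing events per platform."""
    null_rate = sum(r["user_id"] is None for r in rows) / len(rows)

    id_counts = Counter(r["event_id"] for r in rows)
    duplicate_ids = sorted(eid for eid, n in id_counts.items() if n > 1)

    seen = defaultdict(set)
    for r in rows:
        seen[r["platform"]].add(r["event_name"])
    missing = {p: sorted(EXPECTED_EVENTS - names) for p, names in seen.items()}

    return {
        "null_user_id_rate": null_rate,
        "duplicate_event_ids": duplicate_ids,
        "missing_by_platform": missing,
    }


report = audit(events)
print(report)
```

Running checks like these before computing any funnel metric makes it possible to say how much of a headline number depends on rows that should not be trusted.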

Phase 2: Quantitative Analysis
Reconstruct sessions from raw events. Build event path sequences. Detect behavioral loops. Compare platforms. Measure UTM persistence. Apply Markov chain analysis to identify high-friction transitions. Run survival analysis on time-to-conversion.
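Two of these steps, session reconstruction and the Markov transition matrix, can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the repository's implementation: the 30-minute inactivity timeout, the tuple layout of the events, the event names account_created and pricing_viewed, and the artificial "exit" state appended to each session are all choices made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout, not taken from the repo

# Hypothetical raw events: (user_id, ISO timestamp, event_name)
raw_events = [
    ("u1", "2024-01-01T10:00:00", "account_created"),
    ("u1", "2024-01-01T10:05:00", "pricing_viewed"),
    ("u1", "2024-01-01T12:00:00", "pricing_viewed"),
    ("u1", "2024-01-01T12:02:00", "payment_started"),
    ("u2", "2024-01-01T09:00:00", "account_created"),
    ("u2", "2024-01-01T09:01:00", "pricing_viewed"),
]


def build_sessions(rows):
    """Group each user's events into sessions, splitting on gaps longer than SESSION_GAP."""
    by_user = defaultdict(list)
    for user_id, ts, name in rows:
        by_user[user_id].append((datetime.fromisoformat(ts), name))

    sessions = []
    for user_id, evs in by_user.items():
        evs.sort()  # chronological order within the user
        current = [evs[0][1]]
        for (prev_ts, _), (ts, name) in zip(evs, evs[1:]):
            if ts - prev_ts > SESSION_GAP:
                sessions.append((user_id, current))
                current = []
            current.append(name)
        sessions.append((user_id, current))
    return sessions


def transition_matrix(paths):
    """Estimate P(next state | current state) from consecutive pairs in each path."""
    counts = defaultdict(Counter)
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


sessions = build_sessions(raw_events)
# Append an explicit "exit" state so that abandoning a session shows up as a transition.
matrix = transition_matrix([path + ["exit"] for _, path in sessions])
print(matrix["pricing_viewed"])
```

In this toy data, two of the three sessions that reach pricing_viewed end there, so the pricing_viewed row puts most of its probability on "exit". A row like that is the kind of high-friction transition the Markov step is meant to surface.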

Phase 3: Qualitative Validation
Layer in session replay tags, support ticket themes, and UX friction codes. The numbers tell you where things break. The qualitative evidence tells you why.

Phase 4: Synthesis and Recommendations
Bring quant signals and qualitative evidence together into a coherent narrative. Prioritize recommendations by effort-to-impact. Distinguish tracking problems from product problems from marketing problems.

Key Findings

Finding	Metric
Reliable funnel conversion (after data quality fixes)	31% (dashboard reported 42%)
Android missing `subscription_started` events	~18% of Android subscriptions untracked
UTM parameters lost after SPA route change	28% of paid traffic sessions
Users who looped on pricing 3+ times	19% of users who reached pricing
Conversion rate: 3+ pricing loops vs. linear journey	44% lower conversion
Paid traffic vs. organic: time-to-conversion	Paid: faster (median 2.1 days) / Organic: slower (4.8 days)
Paid traffic 90-day retention signal	Weaker than organic
Support tickets mentioning plan confusion	34% of all friction-related tickets

Important

The primary issue uncovered in this analysis was not product performance.
It was measurement bias introduced by tracking gaps (missing events, UTM loss, platform inconsistencies).

Decision Impact

The findings from this analysis changed what the business was about to do — and stopped one expensive mistake.

Before this analysis:

Conversion reported at 42%. Growth targets and budgets set against this figure.
Product team planning a full funnel redesign to address the conversion "problem."
Paid campaigns considered high-performing based on last-click ROAS.
Android underperformance versus iOS attributed to audience quality differences.

After:

What changed	Consequence
Conversion corrected from 42% to 31%	Growth targets need recalibration. The gap is a measurement problem, not a product problem — which changes the scope and urgency of the product roadmap entirely.
28% of paid sessions were misattributed as direct	Reported paid ROAS was unreliable. Media spend decisions based on current data should be paused until attribution is corrected.
Android underreporting identified as a tracking bug	The Android "audience quality gap" was a measurement gap. Adding a server-side fallback for subscription_started will likely close most of the iOS/Android conversion difference.
Pricing loop root cause confirmed as confusion, not deliberation	A targeted fix (rewrite plan descriptions, fix tooltip bug, fix price toggle) is sufficient. A full funnel redesign is not needed. Scope and cost reduced significantly.
Payment drop-off is trust-driven, not intent-driven	Low-cost trust signals should be tested before any checkout flow restructuring. Users reaching the payment form have already decided — they are abandoning for a different reason.
Organic traffic undervalued in the attribution model	Organic attribution was more accurate than paid. Organic users show stronger downstream engagement. The channel mix decision was being made on biased data.

The single highest-priority action — fixing UTM persistence in the SPA — requires roughly half a day of engineering work. It corrects the most consequential measurement error in the stack.

Visuals

Charts generated from data/sample_events.csv by python/visualizations.py.

Dashboard vs. Reconstructed Funnel

The 11-percentage-point gap between the reported (42%) and reconstructed (31%) conversion rates is attributed to measurement error rather than to a change in user behavior.
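To make the idea of a reported versus reconstructed rate concrete, here is a small sketch. The README does not state how the dashboard computed its figure, so the event-count ratio below is purely an assumed example of a formula that duplicates can inflate. The sample rows and the account_created event name are hypothetical.

```python
# Hypothetical (user_id, event_name) rows. u1 fires subscription_started twice.
events = [
    ("u1", "account_created"),
    ("u2", "account_created"),
    ("u3", "account_created"),
    ("u4", "account_created"),
    ("u1", "subscription_started"),
    ("u1", "subscription_started"),  # duplicate event for the same user
    ("u2", "subscription_started"),
]


def event_count_conversion(rows, start, goal):
    """Naive rate: goal events divided by start events (sensitive to duplicates)."""
    starts = sum(1 for _, name in rows if name == start)
    goals = sum(1 for _, name in rows if name == goal)
    return goals / starts


def user_level_conversion(rows, start, goal):
    """Reconstructed rate: distinct users who reached the goal among users who started."""
    started = {user for user, name in rows if name == start}
    converted = {user for user, name in rows if name == goal} & started
    return len(converted) / len(started)


naive = event_count_conversion(events, "account_created", "subscription_started")
rebuilt = user_level_conversion(events, "account_created", "subscription_started")
print(f"event-count rate: {naive:.2f}, user-level rate: {rebuilt:.2f}")
```

Here one duplicated event is enough to move the rate from 0.50 to 0.75, which is why the two numbers are worth computing side by side.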

Conversion Rate by Pricing Loop Count

The conversion cliff appears at 3+ loops. Users looping more than twice are not deciding — they are stuck.
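A loop count like this needs an explicit definition. The sketch below assumes one possible definition: a loop is a return to the pricing page after the user has moved on to a different event. The definition used in sql/05_loop_detection.sql may differ, and the paths and the pricing_viewed event name are hypothetical.

```python
from collections import Counter

# Hypothetical ordered event paths per user.
paths = {
    "u1": ["account_created", "pricing_viewed", "plan_compared",
           "payment_started", "subscription_started"],
    "u2": ["account_created", "pricing_viewed", "plan_compared", "pricing_viewed",
           "plan_compared", "pricing_viewed", "plan_compared", "pricing_viewed"],
    "u3": ["account_created", "pricing_viewed", "plan_compared", "pricing_viewed",
           "payment_started", "subscription_started"],
    "u4": ["account_created", "pricing_viewed"],
}


def count_pricing_loops(path, pricing_event="pricing_viewed"):
    """Count returns to pricing: distinct pricing visits minus the first one."""
    visits = 0
    prev = None
    for name in path:
        # Consecutive pricing events count as a single visit.
        if name == pricing_event and prev != pricing_event:
            visits += 1
        prev = name
    return max(visits - 1, 0)


def conversion_by_loop_bucket(user_paths):
    """Conversion rate for users with 0-2 loops vs. 3+ loops."""
    totals, converted = Counter(), Counter()
    for path in user_paths.values():
        bucket = "3+" if count_pricing_loops(path) >= 3 else "0-2"
        totals[bucket] += 1
        converted[bucket] += int("subscription_started" in path)
    return {bucket: converted[bucket] / totals[bucket] for bucket in totals}


rates = conversion_by_loop_bucket(paths)
print(rates)
```

The 3+ threshold mirrors the bucket used in the findings. With a different loop definition the counts would shift, so the definition belongs next to any chart built on it.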

Funnel Conversion by Platform

iOS shows the cleanest funnel. Android's subscription step gap reflects the client-side tracking bug.

UTM Loss Rate by Traffic Source

Paid sessions on web lose attribution after SPA navigation. Organic is unaffected.
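One way to measure this kind of loss is to compare the UTM value on a session's landing hit with the values on its later hits. The sketch below does that on three hypothetical sessions, then shows an analysis-side repair that carries the landing value forward. The repair is a different technique from the client-side sessionStorage fix recommended in this README. The field names and the google_ads value are assumptions.

```python
# Hypothetical sessions: ordered hits with the UTM source seen on each hit.
sessions = {
    "s1": [{"page": "/landing", "utm_source": "google_ads"},
           {"page": "/pricing", "utm_source": None}],          # UTM lost after route change
    "s2": [{"page": "/landing", "utm_source": "google_ads"},
           {"page": "/pricing", "utm_source": "google_ads"}],  # UTM preserved
    "s3": [{"page": "/home", "utm_source": None},
           {"page": "/pricing", "utm_source": None}],          # untagged session
}


def lost_utm(hits):
    """True when the landing hit carried a UTM source but a later hit did not."""
    landing = hits[0]["utm_source"]
    return landing is not None and any(h["utm_source"] is None for h in hits[1:])


def utm_loss_rate(all_sessions):
    """Share of UTM-tagged sessions that lose the tag on a later hit."""
    tagged = [hits for hits in all_sessions.values() if hits[0]["utm_source"] is not None]
    return sum(lost_utm(hits) for hits in tagged) / len(tagged)


def backfill_utm(hits):
    """Carry the landing UTM source forward onto later hits in the same session."""
    landing = hits[0]["utm_source"]
    return [{**h, "utm_source": h["utm_source"] or landing} for h in hits]


loss_rate = utm_loss_rate(sessions)
repaired = {sid: backfill_utm(hits) for sid, hits in sessions.items()}
print(loss_rate, utm_loss_rate(repaired))
```

Backfilling only repairs sessions whose landing hit was tagged correctly, so it complements rather than replaces a fix at the point of collection.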

Time to Paid Subscription — Paid vs. Organic

Paid converts faster early; organic converges to a similar or higher rate by day 30.
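Curves like these come from a Kaplan-Meier estimate, in which "survival" means "has not subscribed yet" and users who never subscribe within the observation window are treated as censored. The sketch below implements the estimator by hand in plain Python so it runs without extra dependencies. It is not python/survival_analysis.py, and the durations are made up.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t with at least one observed conversion.

    durations: days from account creation to conversion or to end of observation.
    observed:  True if the user converted at that time, False if censored.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = sum(1 for d, o in pairs if d == t and o)
        leaving = sum(1 for d, _ in pairs if d == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # converted and censored users both leave the risk set
        i += leaving
    return curve


def median_time(curve):
    """First time at which S(t) drops to 0.5 or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort: five users, two of them censored.
durations = [1, 2, 2, 5, 7]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: not yet converted = {s:.2f}, converted = {1 - s:.2f}")
print("median time to conversion:", median_time(curve))
```

Fitting one curve per segment, for example paid and organic, and comparing the medians gives the kind of comparison reported in the findings table.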

Business Recommendations

Tracking (fix first — these corrupt every other metric)

1. Fix Android subscription_started: implement a server-side fallback using transaction ID
2. Fix UTM persistence on SPA route changes: store UTMs in sessionStorage on first load
3. Normalize Android event names: plan_compare → plan_compared, payment_initiated → payment_started
4. Add weekly data quality alerts: null rate, event count deviation, UTM persistence rate
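Until the Android fixes ship, the same two corrections can be approximated at analysis time. The sketch below is a minimal illustration, not the repository's code. The alias map comes from the event names listed above, while the transaction records, the transaction_id field on events, and the source flag are hypothetical.

```python
# Alias map taken from the event names listed in the recommendations.
EVENT_ALIASES = {
    "plan_compare": "plan_compared",
    "payment_initiated": "payment_started",
}


def normalize(event):
    """Return a copy of the event with its name mapped to the canonical form."""
    name = event["event_name"]
    return {**event, "event_name": EVENT_ALIASES.get(name, name)}


def add_server_fallback(client_events, transactions):
    """Append a subscription_started event for each transaction with no client-side event."""
    tracked = {
        e.get("transaction_id")
        for e in client_events
        if e["event_name"] == "subscription_started"
    }
    recovered = [
        {
            "user_id": t["user_id"],
            "event_name": "subscription_started",
            "transaction_id": t["transaction_id"],
            "source": "server_fallback",  # hypothetical flag to keep recovered rows auditable
        }
        for t in transactions
        if t["transaction_id"] not in tracked
    ]
    return client_events + recovered


# Hypothetical inputs: u2 paid (transaction t2) but the client never sent the event.
client_events = [
    {"user_id": "u1", "event_name": "subscription_started", "transaction_id": "t1"},
    {"user_id": "u2", "event_name": "payment_initiated"},
]
transactions = [
    {"transaction_id": "t1", "user_id": "u1"},
    {"transaction_id": "t2", "user_id": "u2"},
]

cleaned = [normalize(e) for e in client_events]
complete = add_server_fallback(cleaned, transactions)
print(complete)
```

Keying the fallback on transaction ID avoids double counting when the client event does arrive, since a tracked transaction is never added a second time.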

Product (reduce friction)

5. Redesign the plan comparison page: rewrite benefit descriptions, fix tooltip bug, fix price toggle, reduce comparison table from 23 to 8 rows
6. Add trust signals to the payment page: SSL badge, cancellation policy, money-back statement
7. Improve payment error messages: map processor error codes to plain-language user messages

Marketing (fix attribution first, then re-evaluate)

8. Pause paid spend optimization decisions until UTM fix is live and 4 weeks of clean data are available
9. Invest in organic evaluation-phase content: FAQs, plan comparison guides, use case examples

Skills Demonstrated

Technical

BigQuery Standard SQL: CTEs, window functions, session reconstruction, Markov matrices
Python: pandas, event simulation, Markov chains, Kaplan-Meier survival analysis
GA4 / event-based tracking architecture
SPA tracking issues: UTM persistence, history API edge cases, cross-device stitching

Analytical

Data quality auditing before analysis (completeness, accuracy, consistency, timeliness)
Funnel analysis: reported vs. reconstructed
Behavioral sequence analysis and loop detection
Markov chain transition probability analysis
Survival analysis for time-to-event segmentation
Mixed-method triangulation: quant signals + qualitative evidence + business context

Strategic

Stakeholder-level problem framing
Translating technical findings into product, engineering, and marketing actions
Prioritizing recommendations by evidence strength and implementation cost
Preventing expensive decisions based on incorrect data

How to Read This Project

If you have 5 minutes: Read outputs/executive_summary.md and outputs/key_findings.md.

If you want the full story: Start with docs/01_problem_framing.md, then follow the numbered docs in order.

If you want to explore interactively: Open notebooks/user_journey_analysis_walkthrough.ipynb.

If you want to see the SQL: All 8 queries are in sql/, written in BigQuery Standard SQL and ready to run against project.analytics.raw_events.

Data Limitations

All data in this project is fully synthetic, generated by python/generate_synthetic_events.py with a fixed random seed for reproducibility.

The data is designed to demonstrate methodology, not to represent any real product or business. Specifically:

Behavioral distributions (loop rates, conversion rates, platform split) approximate realistic patterns but are not measurements of any real system
The qualitative evidence described in docs/05_qualitative_analysis.md — session replays, support tickets — is illustrative, not drawn from real observations
The 42% → 31% conversion gap is a constructed scenario designed to demonstrate how tracking audits expose measurement errors
Survival analysis results reflect the data generation parameters, not real user behavior

The SQL, Python code, and analytical methodology are production-applicable. The same approach can be applied to real GA4-style event data with minimal modification.
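For readers unfamiliar with seeded generation, the sketch below shows why a fixed random seed makes synthetic data reproducible. It is a toy generator, not python/generate_synthetic_events.py. The seed value, the per-step continuation probability, and the event names other than plan_compared, payment_started, and subscription_started are assumptions.

```python
import random

# Funnel stages as event names. The first two names are assumed for this sketch.
FUNNEL = [
    "account_created",
    "pricing_viewed",
    "plan_compared",
    "payment_started",
    "subscription_started",
]


def generate_paths(n_users, seed=42, step_prob=0.7):
    """Generate one funnel path per user; each user advances with probability step_prob."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    paths = {}
    for i in range(n_users):
        path = [FUNNEL[0]]
        for stage in FUNNEL[1:]:
            if rng.random() > step_prob:
                break  # user drops out before reaching this stage
            path.append(stage)
        paths[f"u{i}"] = path
    return paths


first_run = generate_paths(100, seed=7)
second_run = generate_paths(100, seed=7)
print(first_run == second_run)
```

Because the generator is seeded, every metric derived from the data is a function of the seed and the generation parameters, which is the sense in which the results above reflect the scenario rather than real behavior.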

Optional Next Steps

A/B test design for the plan comparison page redesign
Predictive churn model on the subscription cohort
Automated data quality monitoring with dbt tests
Attribution model comparison: last-click vs. data-driven (post UTM fix)
