Role: Senior Digital Data Analyst
Note
The dashboard reported a 42% conversion rate.
After a tracking audit and journey reconstruction, the reliable number was 31%. The gap was driven by measurement issues, not user behavior alone.
- Conversion corrected: 42% → 31%
- 28% of paid sessions misattributed (UTM loss)
- 19% of users stuck in pricing loops
- 44% lower conversion for users with 3+ pricing loops
A subscription-based digital product — web SPA + mobile app (iOS/Android) — was reporting a 42% conversion rate from account creation to paid subscription. The metric looked stable. Stakeholders were not alarmed.
But the business was still missing its growth targets. Support tickets mentioned confusion. The product team noticed high drop-off on the pricing page but couldn't explain it. Marketing saw paid campaigns performing well on last-click attribution but suspected something was off further down the funnel.
The ask: figure out what's actually happening between account creation and subscription.
- Product: NovaPlus — a SaaS platform with a freemium-to-paid conversion model
- Platforms: Web (React SPA), iOS app, Android app
- Data stack: GA4-style event tracking → BigQuery → Looker
- Funnel stages: Registration → Pricing viewed → Plan compared → Payment started → Subscription started
- Reported conversion (dashboard): 42% (account created → subscription started)
- Stakeholders: Head of Product, Head of Growth, Head of Data
This is not a clean-data exercise. The entire point is to work through the mess:
- Reconstructing real user journeys from raw, inconsistent event data
- Identifying and quantifying tracking gaps before drawing any conclusions
- Comparing dashboard metrics against event-level reality
- Detecting behavioral patterns (loops, exits, cross-device handoffs) that aggregated funnels hide
- Connecting quantitative signals to qualitative evidence
- Translating analysis into concrete, prioritized business recommendations
├── README.md
├── requirements.txt
├── .gitignore
├── LICENSE
├── docs/
│ ├── 01_problem_framing.md # Business question, stakeholder context, analysis scope
│ ├── 02_tracking_audit.md # Data quality checks: completeness, accuracy, consistency
│ ├── 03_analysis_methodology.md # Mixed-method approach and theoretical frameworks
│ ├── 04_quantitative_analysis.md # Funnel, journeys, loops, platform comparison, Markov, survival
│ ├── 05_qualitative_analysis.md # Session replays, support tickets, UX friction coding
│ ├── 06_business_recommendations.md # Prioritized actions by tracking / product / marketing
│ └── 07_executive_summary.md # One-page senior summary
├── sql/
│ ├── 01_event_quality_audit.sql # Missing events, null rates, duplicate detection
│ ├── 02_session_reconstruction.sql # Session-level journey rebuild from raw events
│ ├── 03_funnel_baseline.sql # Dashboard funnel vs. reconstructed funnel
│ ├── 04_journey_sequences.sql # Event path sequences per user
│ ├── 05_loop_detection.sql # Pricing loop identification and loop counts
│ ├── 06_web_vs_app_comparison.sql # Platform-level funnel and behavior comparison
│ ├── 07_utm_loss_analysis.sql # UTM parameter persistence across SPA navigation
│ └── 08_markov_transition_matrix.sql # Transition probabilities between journey states
├── python/
│ ├── generate_synthetic_events.py # Generates sample_events.csv (10,000 users)
│ ├── journey_analysis.py # Session reconstruction, loop detection, funnel metrics
│ ├── markov_analysis.py # Markov transition matrix and friction state analysis
│ ├── survival_analysis.py # Time-to-conversion by platform and traffic source
│ └── visualizations.py # Generates all 5 charts → saves to /images/
├── notebooks/
│ └── user_journey_analysis_walkthrough.ipynb # Step-by-step analysis walkthrough
├── data/
│ ├── sample_events.csv # Synthetic event data (generated by Python script)
│ └── data_dictionary.md # Field definitions and accepted values
├── images/
│ ├── funnel_comparison.png
│ ├── pricing_loops.png
│ ├── web_vs_app_conversion.png
│ ├── utm_loss.png
│ └── time_to_conversion.png
└── outputs/
  ├── executive_summary.md # One-page summary
  ├── key_findings.md # Numbered findings with supporting metrics
  └── recommendations.md # Prioritized action plan
git clone https://github.com/maissabounar/cross-device-journey-analysis.git
cd cross-device-journey-analysis
pip install -r requirements.txt
python python/generate_synthetic_events.py # generates data/sample_events.csv
python python/journey_analysis.py
python python/markov_analysis.py
python python/survival_analysis.py
python python/visualizations.py # saves charts to images/
To explore interactively:
jupyter notebook notebooks/user_journey_analysis_walkthrough.ipynb
Phase 1: Tracking Audit
Before any analysis, assess whether the data is trustworthy. Check for missing events by platform, duplicate sessions, null user IDs, and UTM drop-off. This step alone changed the headline conversion number from 42% to 31%.
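The audit checks can be illustrated with a minimal, standard-library sketch. It is not the code in sql/01_event_quality_audit.sql; the field names (`event_id`, `user_id`, `platform`, `event_name`) and the sample rows are assumptions made for illustration, and duplicates are counted per event ID rather than per session.

```python
from collections import Counter


def audit_events(events):
    """Return basic quality metrics for a list of event dicts.

    Assumed (hypothetical) fields: event_id, user_id, platform, event_name.
    """
    total = len(events)
    # Events that cannot be tied to a user.
    null_user_ids = sum(1 for e in events if not e.get("user_id"))
    # Every repeat of an event_id beyond the first counts as one duplicate.
    id_counts = Counter(e["event_id"] for e in events)
    duplicates = sum(n - 1 for n in id_counts.values() if n > 1)
    # Event volume per (platform, event_name); a missing key is a tracking gap.
    by_platform = Counter((e["platform"], e["event_name"]) for e in events)
    return {
        "null_user_id_rate": null_user_ids / total if total else 0.0,
        "duplicate_events": duplicates,
        "events_by_platform": dict(by_platform),
    }


# Illustrative rows only, not taken from data/sample_events.csv.
sample = [
    {"event_id": "e1", "user_id": "u1", "platform": "web", "event_name": "subscription_started"},
    {"event_id": "e1", "user_id": "u1", "platform": "web", "event_name": "subscription_started"},
    {"event_id": "e2", "user_id": None, "platform": "android", "event_name": "payment_started"},
    {"event_id": "e3", "user_id": "u2", "platform": "ios", "event_name": "subscription_started"},
]
report = audit_events(sample)
```

In this toy sample, Android has a `payment_started` event but no `subscription_started` key at all, which is the kind of platform gap the audit is meant to surface before any funnel is computed.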
Phase 2: Quantitative Analysis
Reconstruct sessions from raw events. Build event path sequences. Detect behavioral loops. Compare platforms. Measure UTM persistence. Apply Markov chain analysis to identify high-friction transitions. Run survival analysis on time-to-conversion.
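As a rough illustration of the Markov step, the sketch below estimates first-order transition probabilities from ordered journey paths. It is not the logic in python/markov_analysis.py; the state names `registration`, `pricing_viewed`, and `exit` and the four sample journeys are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Estimate P(next state | current state) from ordered state lists."""
    counts = defaultdict(Counter)
    for path in journeys:
        # Count each consecutive (current, next) pair in the path.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Illustrative journeys with hypothetical state names.
journeys = [
    ["registration", "pricing_viewed", "plan_compared", "payment_started", "subscription_started"],
    ["registration", "pricing_viewed", "plan_compared", "pricing_viewed", "exit"],
    ["registration", "pricing_viewed", "exit"],
    ["registration", "pricing_viewed", "plan_compared", "payment_started", "exit"],
]
matrix = transition_probabilities(journeys)
```

A high probability on a backward transition such as `plan_compared` to `pricing_viewed`, or on any transition into `exit`, is one way to flag a friction state for closer inspection.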
Phase 3: Qualitative Validation
Layer in session replay tags, support ticket themes, and UX friction codes. The numbers tell you where things break. The qualitative evidence tells you why.
Phase 4: Synthesis and Recommendations
Bring quant signals and qualitative evidence together into a coherent narrative. Prioritize recommendations by effort-to-impact. Distinguish tracking problems from product problems from marketing problems.
| Finding | Metric |
|---|---|
| Reliable funnel conversion (after data quality fixes) | 31% (dashboard reported 42%) |
| Android missing subscription_started events | ~18% of Android subscriptions untracked |
| UTM parameters lost after SPA route change | 28% of paid traffic sessions |
| Users who looped on pricing 3+ times | 19% of total users reaching pricing |
| Conversion rate: 3+ pricing loops vs. linear journey | 44% lower conversion |
| Paid traffic vs. organic: time-to-conversion | Paid: faster (median 2.1 days) / Organic: slower (4.8 days) |
| Paid traffic 90-day retention signal | Weaker than organic |
| Support tickets mentioning plan confusion | 34% of all friction-related tickets |
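The two pricing-loop rows in the table can be reproduced in spirit with the sketch below. The loop definition is an assumption, since the table does not spell it out: a loop is counted as each return to a hypothetical `pricing_viewed` event after the first visit, with consecutive repeats collapsed. The comparison group here is every user below the threshold, which is broader than a strictly linear journey.

```python
def count_pricing_loops(path, pricing_event="pricing_viewed"):
    """Count returns to the pricing step after the first visit."""
    visits = 0
    previous = None
    for event in path:
        # A new visit starts only when the previous event was not pricing.
        if event == pricing_event and previous != pricing_event:
            visits += 1
        previous = event
    return max(visits - 1, 0)


def loop_summary(paths_by_user, threshold=3,
                 pricing_event="pricing_viewed", goal="subscription_started"):
    """Share of pricing visitors who loop, and conversion for both groups."""
    reached = [p for p in paths_by_user.values() if pricing_event in p]
    looping = [p for p in reached if count_pricing_loops(p, pricing_event) >= threshold]
    others = [p for p in reached if count_pricing_loops(p, pricing_event) < threshold]

    def conversion(paths):
        return sum(goal in p for p in paths) / len(paths) if paths else 0.0

    return {
        "share_looping": len(looping) / len(reached) if reached else 0.0,
        "looping_conversion": conversion(looping),
        "other_conversion": conversion(others),
    }


# Illustrative paths only.
paths_by_user = {
    "u1": ["registration", "pricing_viewed", "plan_compared",
           "payment_started", "subscription_started"],
    "u2": ["registration", "pricing_viewed", "plan_compared", "pricing_viewed",
           "plan_compared", "pricing_viewed", "plan_compared", "pricing_viewed"],
    "u3": ["registration", "pricing_viewed", "pricing_viewed", "plan_compared"],
    "u4": ["registration"],
}
summary = loop_summary(paths_by_user)
```

Users who never reach pricing (`u4` here) are excluded from the denominator, matching the "of total users reaching pricing" wording in the table.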
Important
The primary issue uncovered in this analysis was not product performance.
It was measurement bias introduced by tracking gaps (missing events, UTM loss, platform inconsistencies).
The findings from this analysis changed what the business was about to do — and stopped one expensive mistake.
Before this analysis:
- Conversion reported at 42%. Growth targets and budgets set against this figure.
- Product team planning a full funnel redesign to address the conversion "problem."
- Paid campaigns considered high-performing based on last-click ROAS.
- Android underperformance versus iOS attributed to audience quality differences.
After:
| What changed | Consequence |
|---|---|
| Conversion corrected from 42% to 31% | Growth targets need recalibration. The gap is a measurement problem, not a product problem — which changes the scope and urgency of the product roadmap entirely. |
| 28% of paid sessions were misattributed as direct | Paid ROAS was inflated. Media spend decisions based on current data should be paused until attribution is corrected. |
| Android underreporting identified as a tracking bug | The Android "audience quality gap" was a measurement gap. Adding a server-side fallback for subscription_started will likely close most of the iOS/Android conversion difference. |
| Pricing loop root cause confirmed as confusion, not deliberation | A targeted fix (rewrite plan descriptions, fix tooltip bug, fix price toggle) is sufficient. A full funnel redesign is not needed. Scope and cost reduced significantly. |
| Payment drop-off is trust-driven, not intent-driven | Low-cost trust signals should be tested before any checkout flow restructuring. Users reaching the payment form have already decided — they are abandoning for a different reason. |
| Organic traffic undervalued in the attribution model | Organic attribution was more accurate than paid. Organic users show stronger downstream engagement. The channel mix decision was being made on biased data. |
The single highest-priority action — fixing UTM persistence in the SPA — requires roughly half a day of engineering work. It corrects the most consequential measurement error in the stack.
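The engineering fix itself lives in the SPA, but the size of the problem can be estimated on the analysis side. The sketch below is one possible approach under stated assumptions: each session is an ordered list of event dicts with a hypothetical `utm_source` field, a session counts as "lost" if it lands with a UTM and any later event lacks one, and a forward fill is used to repair historical data.

```python
def utm_loss_rate(sessions):
    """Share of UTM-tagged sessions that lose the UTM on a later event."""
    tagged = [s for s in sessions if s and s[0].get("utm_source")]
    lost = [s for s in tagged if any(not e.get("utm_source") for e in s[1:])]
    return len(lost) / len(tagged) if tagged else 0.0


def forward_fill_utm(session_events):
    """Return copies of the events with utm_source carried forward."""
    filled = []
    last_utm = None
    for event in session_events:
        # Keep the event's own UTM if present, else reuse the last seen one.
        utm = event.get("utm_source") or last_utm
        last_utm = utm
        filled.append({**event, "utm_source": utm})
    return filled


# Illustrative sessions; page paths and sources are made up.
sessions = [
    [{"page": "/landing", "utm_source": "google_ads"},
     {"page": "/pricing", "utm_source": None}],
    [{"page": "/landing", "utm_source": "newsletter"},
     {"page": "/pricing", "utm_source": "newsletter"}],
    [{"page": "/", "utm_source": None}],
]
loss = utm_loss_rate(sessions)
repaired = forward_fill_utm(sessions[0])
```

The forward fill returns new dicts rather than mutating the input, so the raw events stay available for comparing attribution before and after the repair.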
Charts generated from data/sample_events.csv by python/visualizations.py.
The 11-point gap between the reported and cleaned conversion rate is entirely a measurement problem.
The conversion cliff appears at 3+ loops. Users looping more than twice are not deciding — they are stuck.
iOS shows the cleanest funnel. Android's subscription step gap reflects the client-side tracking bug.
Paid sessions on web lose attribution after SPA navigation. Organic is unaffected.
Paid converts faster early; organic converges to a similar or higher rate by day 30.
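For readers unfamiliar with how a time-to-conversion curve is built, here is a minimal Kaplan-Meier estimator in plain Python. It is a sketch of the general technique, not the implementation in python/survival_analysis.py, and the durations below are invented. "Survival" here means "has not converted yet", so cumulative conversion is one minus the survival value.

```python
def kaplan_meier(durations, converted):
    """Return [(time, survival)] at each time where a conversion occurs.

    durations: days until conversion, or until observation ended.
    converted: True if the user converted at that time, False if censored.
    """
    observations = sorted(zip(durations, converted))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        events = 0   # conversions at time t
        leaving = 0  # conversions plus censored users at time t
        while i < len(observations) and observations[i][0] == t:
            events += int(observations[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Illustrative data: five users, two of them censored (not yet converted).
durations = [1, 2, 2, 5, 7]
converted = [True, True, False, True, False]
curve = kaplan_meier(durations, converted)
cumulative_conversion = [(t, 1 - s) for t, s in curve]
```

Running the estimator separately per segment (for example paid versus organic) gives one curve per segment, and comparing the curves over the same time window shows whether one group converts earlier or simply converts more.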
Tracking (fix first — these corrupt every other metric)
1. Fix Android subscription_started: implement server-side fallback using transaction ID
2. Fix UTM persistence on SPA route changes: store UTMs in sessionStorage on first load
3. Normalize Android event names: plan_compare → plan_compared, payment_initiated → payment_started
4. Add weekly data quality alerts: null rate, event count deviation, UTM persistence rate
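Until the Android client is updated, the event-name normalization can also be applied at analysis time. The sketch below uses the two mappings listed above; the `platform` and `event_name` field names and the helper itself are assumptions for illustration.

```python
# Android alias -> canonical event name (the two mappings named above).
ANDROID_EVENT_ALIASES = {
    "plan_compare": "plan_compared",
    "payment_initiated": "payment_started",
}


def normalize_event_name(event):
    """Return a copy of the event with Android aliases mapped to canonical names."""
    name = event["event_name"]
    if event.get("platform") == "android":
        name = ANDROID_EVENT_ALIASES.get(name, name)
    return {**event, "event_name": name}


# Illustrative events only.
raw = [
    {"platform": "android", "event_name": "plan_compare"},
    {"platform": "android", "event_name": "payment_initiated"},
    {"platform": "ios", "event_name": "plan_compared"},
]
normalized = [normalize_event_name(e) for e in raw]
```

Restricting the mapping to Android keeps the helper from silently rewriting an unexpected name on another platform, which would hide a new tracking inconsistency instead of exposing it.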
Product (reduce friction)
5. Redesign the plan comparison page: rewrite benefit descriptions, fix tooltip bug, fix price toggle, reduce comparison table from 23 to 8 rows
6. Add trust signals to the payment page: SSL badge, cancellation policy, money-back statement
7. Improve payment error messages: map processor error codes to plain-language user messages
Marketing (fix attribution first, then re-evaluate)
8. Pause paid spend optimization decisions until UTM fix is live and 4 weeks of clean data are available
9. Invest in organic evaluation-phase content: FAQs, plan comparison guides, use case examples
Technical
- BigQuery Standard SQL: CTEs, window functions, session reconstruction, Markov matrices
- Python: pandas, event simulation, Markov chains, Kaplan-Meier survival analysis
- GA4 / event-based tracking architecture
- SPA tracking issues: UTM persistence, history API edge cases, cross-device stitching
Analytical
- Data quality auditing before analysis (completeness, accuracy, consistency, timeliness)
- Funnel analysis: reported vs. reconstructed
- Behavioral sequence analysis and loop detection
- Markov chain transition probability analysis
- Survival analysis for time-to-event segmentation
- Mixed-method triangulation: quant signals + qualitative evidence + business context
Strategic
- Stakeholder-level problem framing
- Translating technical findings into product, engineering, and marketing actions
- Prioritizing recommendations by evidence strength and implementation cost
- Preventing expensive decisions based on incorrect data
If you have 5 minutes: Read outputs/executive_summary.md and outputs/key_findings.md.
If you want the full story: Start with docs/01_problem_framing.md, then follow the numbered docs in order.
If you want to explore interactively: Open notebooks/user_journey_analysis_walkthrough.ipynb.
If you want to see the SQL: All 8 queries are in sql/, BigQuery Standard SQL, ready to run against project.analytics.raw_events.
All data in this project is fully synthetic, generated by python/generate_synthetic_events.py with a fixed random seed for reproducibility.
The data is designed to demonstrate methodology, not to represent any real product or business. Specifically:
- Behavioral distributions (loop rates, conversion rates, platform split) approximate realistic patterns but are not measurements of any real system
- The qualitative evidence described in docs/05_qualitative_analysis.md (session replays, support tickets) is illustrative, not drawn from real observations
- The 42% → 31% conversion gap is a constructed scenario designed to demonstrate how tracking audits expose measurement errors
- Survival analysis results reflect the data generation parameters, not real user behavior
The SQL, Python code, and analytical methodology are production-applicable. The same approach can be applied to real GA4-style event data with minimal modification.
- A/B test design for the plan comparison page redesign
- Predictive churn model on the subscription cohort
- Automated data quality monitoring with dbt tests
- Attribution model comparison: last-click vs. data-driven (post UTM fix)




