Product teams at banking apps face a critical challenge: prioritizing which features to build and bugs to fix from thousands of user reviews. With limited engineering resources, teams need to identify high-impact opportunities that will maximize return on investment (ROI) in terms of user satisfaction, retention, and app ratings.
Current Pain Points:
- Volume overload: Banking apps receive 25,000+ reviews annually across multiple platforms
- Subjective prioritization: Feature requests are often prioritized based on gut feeling or loudest voices, not data
- Unclear ROI: No systematic way to measure which fixes will have the biggest impact on user satisfaction
- Delayed insights: Manual review analysis is slow, causing teams to miss time-sensitive issues
Business Impact:
- A 0.5 star rating increase can improve conversion rates by 20-30% (industry benchmark)
- Fixing high-frequency negative issues can reduce churn by 10-15%
- Addressing the "right" complaints can increase App Store visibility and organic downloads
Goal: Build a data-driven system to analyze app reviews and predict which product roadmap decisions will deliver the highest ROI based on:
- Issue frequency - How many users are affected?
- Sentiment intensity - How unhappy/happy are users?
- Impact correlation - Which issues correlate with low ratings and churn?
This project extracts 21+ features from banking app reviews (crashes, login issues, transfer problems, etc.) to identify and prioritize high-impact product improvements.
Phase 1 Complete: Feature extraction, EDA, and baseline modeling finished
Current Phase: Building ROI prediction model to prioritize product improvements
1. Feature Extraction (21 features)
- Pattern matching: 10 issue categories (crash, login, mobile deposit, performance, frustration, etc.)
- Sentiment analysis: 4 VADER scores (compound, positive, negative, neutral)
- Text statistics: 7 metrics (length, word count, caps ratio, punctuation)
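A minimal sketch of the pattern-matching and text-statistics extraction described above. The category keywords and feature names here are illustrative, not the project's exact lists, and the four VADER scores would be appended to the same dict:

```python
import re

# Illustrative keyword patterns for a few of the 10 issue categories
ISSUE_PATTERNS = {
    "has_crash": re.compile(r"\b(crash\w*|freez\w*|force.?clos\w*)\b", re.I),
    "has_login": re.compile(r"\b(log.?in|sign.?in|password|face.?id)\b", re.I),
    "has_transfer": re.compile(r"\b(transfer|zelle|wire)\b", re.I),
}

def extract_features(review: str) -> dict:
    """Binary issue flags plus simple text statistics for one review."""
    words = review.split()
    features = {name: int(bool(p.search(review))) for name, p in ISSUE_PATTERNS.items()}
    features.update({
        "length": len(review),
        "word_count": len(words),
        # Share of alphabetic characters written in caps ("shouting" signal)
        "caps_ratio": sum(c.isupper() for c in review) / max(sum(c.isalpha() for c in review), 1),
        "exclamation_count": review.count("!"),
    })
    return features

feats = extract_features("App CRASHES every time I log in!!")
```

The same function runs over every review to build the feature matrix the models below consume.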
2. Exploratory Data Analysis
- Analyzed 25,000 reviews from 5 banking apps (Chase, Citi, BofA, Capital One, Wells Fargo)
- Key findings:
- Top issues: Crashes (24%), Login (22%), Frustration (24%)
- Largest negative impact: Frustration (impact score: 997)
- Sentiment vs star rating correlation: 0.85+
- Identified data quality issues (VADER sarcasm detection, pattern overlap)
3. Baseline Classification Models
- Goal: Predict star ratings (1-5) from extracted features
- Models tested: Logistic Regression vs Random Forest
- Best model: Logistic Regression with class_weight='balanced'
- Accuracy: 50%
- Macro F1: 0.40
- Beats the naive baseline (35%) by 15 percentage points, a 43% relative improvement
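A sketch of the baseline setup on synthetic data (the real pipeline fits on the 21 extracted features; the class proportions below only mimic the imbalance described later):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the 21-feature matrix and 1-5 star labels
X = rng.normal(size=(1000, 21))
y = rng.choice([1, 2, 3, 4, 5], size=1000, p=[0.35, 0.1, 0.1, 0.1, 0.35])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' reweights each class by the inverse of its frequency,
# so the model cannot simply ignore the rare 2-4 star classes
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
```

Macro F1 averages the per-class F1 scores with equal weight, which is why it is the right headline metric here: it punishes a model that only predicts 1 and 5 stars.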
What Worked:
- Sentiment features most predictive (34-35% total importance)
- Feature engineering captured meaningful signals
- Logistic Regression outperformed Random Forest on macro F1 despite lower accuracy
- Model learned ordinal relationships (predicts adjacent ratings when wrong)
What Didn't Work:
- Only 50% accuracy - not production-ready for exact rating prediction
- Cannot reliably predict middle ratings (2-4 stars) due to class imbalance
- Random Forest achieved higher accuracy (61%) but only by predicting 1 and 5 stars
- VADER sentiment analysis fails on sarcasm, mixed reviews, context
Error Analysis (363 big errors analyzed):
- 60% of errors: Model predicts too high (actual 1-star → predicted 4-5 stars)
- Root cause: VADER scores misleadingly positive
- Example: "After 2 years of fighting this bank" → sentiment +0.97
- 30% of errors: Model predicts too low (actual 5-star → predicted 1-2 stars)
- Root cause: Mentions problems in passing even when overall positive
- 10% of errors: Middle ratings confused
1. Class Imbalance (Primary Bottleneck)
- Classes 2-4 only represent 9-11% of data each
- Model defaults to predicting majority classes (1 and 5)
- F1-scores for middle ratings: 0.18-0.24 (very poor)
2. Sentiment Analysis Quality
- VADER cannot detect sarcasm: "Not the best app!" → +0.89 sentiment
- Mixed reviews averaged: "works great BUT crashes" → positive score
- Context blindness: "I changed banks" in review about NEW bank → neutral
3. Feature Quality
- has_satisfaction captures both "satisfied" and "NOT satisfied"
- Pattern features don't distinguish "had crash" vs "has crash"
- No n-grams or contextual features
4. Evaluation Limitations
- Single train/test split (no cross-validation)
- No hyperparameter tuning
- Default model parameters used
Immediate Improvements (Low Effort, High Impact):
1. Fix pattern matching
- Split has_satisfaction into positive vs negative mentions
- Use fixed-width negative lookbehinds (Python's re module rejects variable-width alternatives like (?<!not |un|dis)): (?<!not )(?<!un)(?<!dis)satisf
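A quick sketch of the corrected pattern. Splitting the negated prefixes into separate fixed-width lookbehind assertions keeps Python's re module happy; the prefix list is illustrative and a real fix would cover more negators ("never", "n't", etc.):

```python
import re

# Each lookbehind is fixed-width, so re accepts the pattern
POSITIVE_SATISFACTION = re.compile(r"(?<!not )(?<!un)(?<!dis)satisf", re.I)

assert POSITIVE_SATISFACTION.search("Very satisfied with the update")
assert not POSITIVE_SATISFACTION.search("I am not satisfied")
assert not POSITIVE_SATISFACTION.search("Unsatisfied and dissatisfied users")
```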
2. Address class imbalance
- SMOTE (Synthetic Minority Over-sampling)
- More aggressive class weights: {1: 1, 2: 5, 3: 5, 4: 5, 5: 1}
3. Cross-validation
- 5-fold stratified CV to validate results
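A sketch of that validation step on synthetic stand-in data. Stratified folds keep the rare 2-4 star classes represented in every split, so the fold scores are comparable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))          # stand-in for the 21 extracted features
y = rng.choice([1, 2, 3, 4, 5], size=500, p=[0.35, 0.1, 0.1, 0.1, 0.35])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Score on macro F1, the metric the single-split baseline was judged on
scores = cross_val_score(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    X, y, cv=cv, scoring="f1_macro",
)
```

The spread of the five scores (not just their mean) shows whether the single-split 0.40 macro F1 was representative or lucky.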
Future Improvements (Higher Effort, Higher Impact):
4. Better sentiment analysis
- Replace VADER with BERT or RoBERTa fine-tuned on app reviews
- Aspect-based sentiment (crash sentiment vs overall sentiment)
- Could fix the ~60% of big errors caused by misleadingly positive VADER scores
5. Ordinal regression
- Treat ratings as ordered (1 < 2 < 3 < 4 < 5) instead of independent classes
- Penalize adjacent errors less
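One common way to sketch this is the Frank-and-Hall decomposition: train one binary classifier per threshold P(rating > k) and recover class probabilities from adjacent differences. This is an illustrative implementation, not the project's chosen method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class OrdinalRating:
    """Ordinal model via cumulative binary classifiers P(rating > k)."""

    def __init__(self, labels=(1, 2, 3, 4, 5)):
        self.labels = labels
        self.models = {}

    def fit(self, X, y):
        for k in self.labels[:-1]:  # thresholds 1..4
            self.models[k] = LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int))
        return self

    def predict(self, X):
        # P(y > k) for each threshold; adjacent differences give P(y == k).
        # Sketch only: independent classifiers are not forced to be monotone
        # across thresholds, so a production version would calibrate them.
        gt = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models.values()])
        probs = np.column_stack([1 - gt[:, 0], gt[:, :-1] - gt[:, 1:], gt[:, -1]])
        return np.array(self.labels)[probs.argmax(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 21))
y = rng.choice([1, 2, 3, 4, 5], size=600, p=[0.35, 0.1, 0.1, 0.1, 0.35])
preds = OrdinalRating().fit(X, y).predict(X)
```

Because each binary task is "above or below a cutoff", errors naturally concentrate on adjacent ratings, which matches how the baseline already behaves when it is wrong.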
6. Advanced features
- TF-IDF (top 100-500 words)
- N-grams for phrases ("face id", "customer service")
- Sentence-level sentiment
- App version, device type metadata
7. Better models
- XGBoost/LightGBM for better class imbalance handling
- Ensemble methods
ROI Prediction Model (Current Focus):
- Pivot from exact rating prediction to ROI prediction
- Use feature importance + frequency to identify high-impact improvements
- Answer: "If we fix X, how much will ratings improve?"
Sections below to be completed...