mjyang00001/Roi-predictor

Product Roadmap ROI Predictor

Business Problem

Product teams at banking apps face a critical challenge: prioritizing which features to build and bugs to fix from thousands of user reviews. With limited engineering resources, teams need to identify high-impact opportunities that will maximize return on investment (ROI) in terms of user satisfaction, retention, and app ratings.

Current Pain Points:

  • Volume overload: Banking apps receive 25,000+ reviews annually across multiple platforms
  • Subjective prioritization: Feature requests are often prioritized based on gut feeling or loudest voices, not data
  • Unclear ROI: No systematic way to measure which fixes will have the biggest impact on user satisfaction
  • Delayed insights: Manual review analysis is slow, causing teams to miss time-sensitive issues

Business Impact:

  • A 0.5-star rating increase can improve conversion rates by 20-30% (industry benchmark)
  • Fixing high-frequency negative issues can reduce churn by 10-15%
  • Addressing the "right" complaints can increase App Store visibility and organic downloads

Goal: Build a data-driven system to analyze app reviews and predict which product roadmap decisions will deliver the highest ROI based on:

  1. Issue frequency - How many users are affected?
  2. Sentiment intensity - How unhappy/happy are users?
  3. Impact correlation - Which issues correlate with low ratings and churn?

This project extracts 21+ features from banking app reviews (crashes, login issues, transfer problems, etc.) to identify and prioritize high-impact product improvements.


Project Status

Phase 1 Complete: Feature extraction, EDA, and baseline modeling finished

Current Phase: Building ROI prediction model to prioritize product improvements


Current Progress

Completed Work

1. Feature Extraction (21 features)

  • Pattern matching: 10 issue categories (crash, login, mobile deposit, performance, frustration, etc.)
  • Sentiment analysis: 4 VADER scores (compound, positive, negative, neutral)
  • Text statistics: 7 metrics (length, word count, caps ratio, punctuation)
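
The extraction code itself isn't shown in this README; as a rough sketch (the issue names and regexes below are illustrative, not the project's actual patterns, and the 4 VADER scores would come from the separate vaderSentiment package), the pattern-matching and text-statistics features might look like:

```python
import re

# Illustrative issue patterns -- the project's real 10 categories and regexes may differ
ISSUE_PATTERNS = {
    "has_crash": re.compile(r"\b(crash\w*|freez\w*|force ?clos\w*)\b", re.I),
    "has_login": re.compile(r"\b(log ?in|sign ?in|password|face ?id)\b", re.I),
    "has_deposit": re.compile(r"\b(mobile deposit|deposit a check)\b", re.I),
}

def extract_features(review: str) -> dict:
    """Binary issue flags plus simple text statistics for one review."""
    feats = {name: int(bool(pat.search(review))) for name, pat in ISSUE_PATTERNS.items()}
    words = review.split()
    letters = [c for c in review if c.isalpha()]
    feats["char_len"] = len(review)
    feats["word_count"] = len(words)
    # Caps ratio over letters only, so punctuation doesn't dilute it
    feats["caps_ratio"] = (sum(c.isupper() for c in letters) / len(letters)) if letters else 0.0
    feats["exclamations"] = review.count("!")
    return feats
```

A shouty review like "App CRASHES every time I log in!!" would flip both the crash and login flags and score a high caps ratio.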

2. Exploratory Data Analysis

  • Analyzed 25,000 reviews from 5 banking apps (Chase, Citi, BofA, Capital One, Wells Fargo)
  • Key findings:
    • Top issues: Crashes (24%), Login (22%), Frustration (24%)
    • Highest negative ROI: Frustration (impact score: 997)
    • Sentiment vs star rating correlation: 0.85+
    • Identified data quality issues (VADER sarcasm detection, pattern overlap)

3. Baseline Classification Models

  • Goal: Predict star ratings (1-5) from extracted features
  • Models tested: Logistic Regression vs Random Forest
  • Best model: Logistic Regression with class_weight='balanced'
    • Accuracy: 50%
    • Macro F1: 0.40
    • Beats the naive majority-class baseline (35%) by 15 points, a 43% relative improvement
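
A minimal sketch of this baseline setup, using synthetic data in place of the real review features (the project's actual feature matrix and labels are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted features: a sentiment-like score drives the rating
n = 2000
sentiment = rng.uniform(-1, 1, size=n)
noise = rng.normal(0, 0.3, size=n)
# Imbalanced ordinal target: clipping piles mass onto 1 and 5, like the real review data
stars = np.clip(np.round(3 + 2 * (sentiment + noise)), 1, 5).astype(int)
X = np.column_stack([sentiment, rng.normal(size=n)])  # one informative, one noise feature

X_tr, X_te, y_tr, y_te = train_test_split(X, stars, stratify=stars, random_state=0)

# class_weight='balanced' reweights classes inversely to frequency,
# which helps the rare middle ratings (2-4)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
macro_f1 = f1_score(y_te, clf.predict(X_te), average="macro")
print(f"macro F1: {macro_f1:.2f}")
```

Macro F1 averages per-class F1 without frequency weighting, which is why it is the fairer metric here: a model that only ever predicts 1 and 5 stars can post high accuracy but a poor macro F1.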

Key Findings

What Worked:

  • Sentiment features most predictive (34-35% total importance)
  • Feature engineering captured meaningful signals
  • Logistic Regression outperformed Random Forest on macro F1 despite lower accuracy
  • Model learned ordinal relationships (predicts adjacent ratings when wrong)

What Didn't Work:

  • Only 50% accuracy - not production-ready for exact rating prediction
  • Cannot reliably predict middle ratings (2-4 stars) due to class imbalance
  • Random Forest achieved higher accuracy (61%) but only by predicting 1 and 5 stars
  • VADER sentiment analysis fails on sarcasm, mixed reviews, context

Error Analysis (363 big errors analyzed):

  • 60% of errors: Model predicts too high (actual 1-star → predicted 4-5 stars)
    • Root cause: VADER scores misleadingly positive
    • Example: "After 2 years of fighting this bank" → sentiment +0.97
  • 30% of errors: Model predicts too low (actual 5-star → predicted 1-2 stars)
    • Root cause: Mentions problems in passing even when overall positive
  • 10% of errors: Middle ratings confused

Where Models Need Improvement

1. Class Imbalance (Primary Bottleneck)

  • Classes 2-4 only represent 9-11% of data each
  • Model defaults to predicting majority classes (1 and 5)
  • F1-scores for middle ratings: 0.18-0.24 (very poor)

2. Sentiment Analysis Quality

  • VADER cannot detect sarcasm: "Not the best app!" → +0.89 sentiment
  • Mixed reviews averaged: "works great BUT crashes" → positive score
  • Context blindness: "I changed banks" in a review about the NEW bank → neutral

3. Feature Quality

  • has_satisfaction captures both "satisfied" and "NOT satisfied"
  • Pattern features don't distinguish "had crash" vs "has crash"
  • No n-grams or contextual features

4. Evaluation Limitations

  • Single train/test split (no cross-validation)
  • No hyperparameter tuning
  • Default model parameters used

Possible Next Attempts

Immediate Improvements (Low Effort, High Impact):

  1. Fix pattern matching

    • Split has_satisfaction into positive vs negative mentions
    • Use negative lookbehinds (Python's re requires fixed-width lookbehinds, so one per prefix): (?<!not )(?<!un)(?<!dis)satisf
  2. Address class imbalance

    • SMOTE (Synthetic Minority Over-sampling)
    • More aggressive class weights: {1: 1, 2: 5, 3: 5, 4: 5, 5: 1}
  3. Cross-validation

    • 5-fold stratified CV to validate results
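
The has_satisfaction fix in item 1 can be sketched as follows. Note that Python's re module rejects a variable-width pattern like (?<!not |un|dis), so each prefix needs its own fixed-width lookbehind:

```python
import re

# Fixed-width negative lookbehinds: Python's re module requires each
# lookbehind group to match a constant length, so one group per prefix
POSITIVE_SATISFACTION = re.compile(r"(?<!not )(?<!un)(?<!dis)satisf", re.I)
NEGATIVE_SATISFACTION = re.compile(r"(?:not satisf|unsatisf|dissatisf)", re.I)

def satisfaction_flags(review: str) -> dict:
    """Split the old has_satisfaction flag into positive vs negative mentions."""
    return {
        "has_satisfaction_pos": int(bool(POSITIVE_SATISFACTION.search(review))),
        "has_satisfaction_neg": int(bool(NEGATIVE_SATISFACTION.search(review))),
    }
```

With this split, "Very satisfied with this app" and "I am NOT satisfied" land in different features instead of both setting the same flag.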

Future Improvements (Higher Effort, Higher Impact):

  4. Better sentiment analysis

    • Replace VADER with BERT or RoBERTa fine-tuned on app reviews
    • Aspect-based sentiment (crash sentiment vs overall sentiment)
    • Would address the 60% of big errors caused by misleadingly positive VADER scores
  5. Ordinal regression

    • Treat ratings as ordered (1 < 2 < 3 < 4 < 5) instead of independent classes
    • Penalize adjacent errors less
  6. Advanced features

    • TF-IDF (top 100-500 words)
    • N-grams for phrases ("face id", "customer service")
    • Sentence-level sentiment
    • App version, device type metadata
  7. Better models

    • XGBoost/LightGBM for better class imbalance handling
    • Ensemble methods

ROI Prediction Model (Current Focus):

  • Pivot from exact rating prediction to ROI prediction
  • Use feature importance + frequency to identify high-impact improvements
  • Answer: "If we fix X, how much will ratings improve?"
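
The project's actual impact-score formula isn't given in this README; one hypothetical scoring sketch that combines the three ingredients from the Goal section (frequency, sentiment intensity, rating impact) might be:

```python
def issue_roi_score(reviews: list[dict], issue_flag: str, app_avg_rating: float) -> float:
    """Hypothetical ROI score: frequency x avg negativity x avg rating shortfall.

    Each review dict is assumed to carry the issue flag (0/1), a VADER-style
    compound 'sentiment' in [-1, 1], and a 1-5 star 'rating'.
    """
    hits = [r for r in reviews if r[issue_flag]]
    if not hits:
        return 0.0
    frequency = len(hits)
    avg_negativity = sum(max(0.0, -r["sentiment"]) for r in hits) / len(hits)
    avg_shortfall = max(0.0, app_avg_rating - sum(r["rating"] for r in hits) / len(hits))
    return frequency * avg_negativity * avg_shortfall

reviews = [
    {"has_crash": 1, "sentiment": -0.8, "rating": 1},
    {"has_crash": 1, "sentiment": -0.6, "rating": 2},
    {"has_crash": 0, "sentiment": 0.9, "rating": 5},
]
print(issue_roi_score(reviews, "has_crash", app_avg_rating=4.0))  # 2 * 0.7 * 2.5 = 3.5
```

Ranking issues by such a score (instead of predicting exact ratings) directly answers "which fix moves the rating most", which is the pivot described above.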

Table of Contents


Sections below to be completed...
