📱 Predicting Teen Mental Health Crisis via Phone Usage

Project Type: Classification (Random Forest) | Client: Health Insurance Provider

🎯 Executive Summary

The Problem: Teen mental health crises are rising and expensive. A single "Crisis Episode" (e.g., ER visit, inpatient care) costs our insurer client $2,673. The current industry standard—flagging teens who use phones >5 hours/day—is reactive and fails to detect 34% of high-risk cases.

The Solution: We built a Random Forest Classifier to predict "High-Risk" teens based on behavioral patterns (not just volume). Our model is optimized for Recall (Safety), prioritizing the detection of at-risk teens over avoiding false alarms.

The Impact:

Recall Lift: Improved detection rate from 66% (Status Quo) → 92% (Our Model).
Financial ROI: On a test set of 600 teens, our model captured $377,512 in net value over the baseline by preventing costly crisis episodes via early intervention ($200/teen).
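
As a rough illustration of how the recall figures above are computed, the sketch below applies the status-quo 5-hour rule to a handful of made-up records and measures recall as TP / (TP + FN). The function names `five_hour_rule` and `recall` and the toy values are assumptions for illustration only; they are not taken from the project code or dataset.

```python
def five_hour_rule(daily_usage_hours):
    """Status-quo baseline: flag a teen when daily phone use exceeds 5 hours."""
    return [hours > 5 for hours in daily_usage_hours]


def recall(actual, flagged):
    """Share of truly high-risk teens that were flagged: TP / (TP + FN)."""
    true_pos = sum(1 for a, f in zip(actual, flagged) if a and f)
    false_neg = sum(1 for a, f in zip(actual, flagged) if a and not f)
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Made-up toy data, NOT drawn from the project dataset.
usage = [6.5, 3.0, 7.2, 4.8, 2.1, 5.5]
high_risk = [True, True, False, True, False, True]

flags = five_hour_rule(usage)
print(f"Baseline recall: {recall(high_risk, flags):.2f}")  # Baseline recall: 0.50
```

In this toy sample, two of the four high-risk teens use their phones less than 5 hours a day, so a volume-only rule misses them.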

👥 The Team (TMP 422 - Data & Decision Analytics)

| Name | Role | Responsibility |
| --- | --- | --- |
| Jeitī Trujillo | Project Manager | Strategy, ROI analysis, stakeholder alignment, deck ownership. |
| Octavio Gzain | Technical Lead | Architecture decisions, GitHub repo management, code review. |
| Aishwarya Mundada | Lead Modeler | Algorithm selection (Random Forest), hyperparameter tuning. |
| Chinmaiye Gandhi | Data Engineer | Data cleaning, pipeline construction, feature engineering. |
| Maya Parra | Business Analyst | Unit economics, business value proposition, storytelling. |
| Mohammad Ameen | Quality Assurance | Error analysis, bias auditing, assumption testing. |

🔄 Project Lifecycle (CRISP-DM Methodology)

1. Business Understanding

Objective: Minimize the total cost of teen mental health claims.
Success Metric: Net Value = (True Positives * Savings) - (False Positives * Cost).
Constraint: False Negatives (missed crises) are 13x more expensive than False Positives.
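
The success metric above can be sketched in a few lines. The dollar figures come from the Executive Summary; everything else is an assumption for illustration: the function name `net_value`, the idea that each true positive saves the crisis cost minus the intervention cost, and the counts passed in (which are hypothetical, not the project's confusion matrix).

```python
CRISIS_COST = 2673        # cost of one crisis episode (from the summary)
INTERVENTION_COST = 200   # cost of one early intervention (from the summary)


def net_value(true_positives, false_positives, savings_per_tp, cost_per_fp):
    """Net Value = (True Positives * Savings) - (False Positives * Cost)."""
    return true_positives * savings_per_tp - false_positives * cost_per_fp


# Assumption: an intervention fully prevents the crisis, so each true
# positive saves the crisis cost minus the cost of the intervention.
savings = CRISIS_COST - INTERVENTION_COST  # 2473

# Hypothetical counts, not the project's results.
value = net_value(true_positives=100, false_positives=40,
                  savings_per_tp=savings, cost_per_fp=INTERVENTION_COST)
print(value)  # 239300

# Ratio of a missed crisis to a false alarm, which appears to be the
# source of the "13x" constraint.
print(round(CRISIS_COST / INTERVENTION_COST, 1))  # 13.4
```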

2. Data Understanding & Preparation (HW 3)

Dataset: 3,000 rows of teen behavioral data (Usage, Sleep, Grades, Social).
Audit Results: Zero missing values; detected outliers in Phone_Checks_Per_Day (Max: 150+).
Feature Engineering:
- Usage_to_Sleep_Ratio: Measures sleep displacement.
- App_Diversity: (Phone Checks / Apps Used Daily) – A proxy for "compulsive switching."
- Leakage Control: Explicitly removed Addiction_Level (target) from the training set.
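
The feature engineering steps above might look roughly like the following for a single record. This is a sketch under stated assumptions, not the project's pipeline: the column name `Apps_Used_Daily` and the exact definition of `Usage_to_Sleep_Ratio` (daily usage hours divided by sleep hours) are guesses, and `add_features` is a hypothetical helper.

```python
def add_features(row):
    """Return a copy of one record with the engineered features added
    and the leakage column removed."""
    out = dict(row)
    sleep = float(row["Sleep_Hours"])
    apps = float(row["Apps_Used_Daily"])  # assumed column name

    # Assumed definition: hours on the phone per hour of sleep.
    # A zero denominator yields None instead of raising.
    out["Usage_to_Sleep_Ratio"] = (
        float(row["Daily_Usage_Hours"]) / sleep if sleep else None
    )
    # Phone checks per app used, a proxy for compulsive switching.
    out["App_Diversity"] = (
        float(row["Phone_Checks_Per_Day"]) / apps if apps else None
    )
    # Leakage control: drop the target-related column before training.
    out.pop("Addiction_Level", None)
    return out


# Made-up record for illustration only.
record = {
    "Daily_Usage_Hours": 6.0,
    "Sleep_Hours": 8.0,
    "Phone_Checks_Per_Day": 90,
    "Apps_Used_Daily": 15,
    "Addiction_Level": 7.5,
}
features = add_features(record)
print(features["Usage_to_Sleep_Ratio"], features["App_Diversity"])  # 0.75 6.0
```

Working on a copy keeps the original record intact, so the target column is still available for labelling after the predictors are built.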

3. Modeling & Evaluation (HW 4)

Model: Random Forest Classifier (chosen for non-linear relationships and outlier robustness).
Tuning Strategy: Applied Threshold Tuning. We lowered the decision boundary from 0.50 → 0.40 to create a "Defensive" model.
Performance:
- **Baseline (5-Hour Rule):**
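
To make the threshold tuning step concrete, here is a minimal sketch of how lowering the decision boundary from 0.50 to 0.40 changes the labels. The scores are made up and the helper `classify` is hypothetical; in a scikit-learn workflow the scores would typically be the positive-class column returned by the classifier's `predict_proba` method.

```python
def classify(probabilities, threshold=0.50):
    """Label a teen high-risk (1) when the predicted probability
    meets or exceeds the threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up high-risk probabilities, not model output from this project.
scores = [0.15, 0.42, 0.48, 0.55, 0.91]

default_labels = classify(scores)            # threshold 0.50
defensive_labels = classify(scores, 0.40)    # threshold 0.40

print(default_labels)    # [0, 0, 0, 1, 1]
print(defensive_labels)  # [0, 1, 1, 1, 1]
```

The two borderline cases (0.42 and 0.48) are only flagged at the lower threshold. That is the trade the "Defensive" model makes: more teens flagged, so fewer missed crises at the price of more false alarms.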

📊 The Data

Source: Kaggle: Teen Phone Addiction & Mental Health
Size: ~3,000 Records

Key Features:

Predictors (Usage): Daily_Usage_Hours, Social_Interactions, Screen_Time_Before_Bed, Phone_Checks_Per_Day.
Predictors (Context): Sleep_Hours, Academic_Performance, Parental_Control.
Target Variables: Anxiety_Level (Scale 0-10), Depression_Level (Scale 0-10).

🛠️ Tech Stack & Setup

This project uses Python 3 (3.8 or newer) and standard data science libraries.

Installation:

# Clone the repo
git clone https://github.com/amriikk/smartphone-teens.git

# Install requirements (once file is created)
pip install -r requirements.txt

---

## How to Run the Data Quality Report

This guide will help you run the data quality analysis script, even if you're not familiar with programming.

### What You'll Need

- A computer with macOS, Windows, or Linux
- Python 3.8 or newer installed on your computer

### Step-by-Step Instructions

#### 1. Open the Terminal (or Command Prompt)

**On Mac:**
- Press `Cmd + Space`, type "Terminal", and press Enter

**On Windows:**
- Press `Windows key`, type "Command Prompt" or "PowerShell", and press Enter

#### 2. Navigate to the Project Folder

Type the following command and press Enter (replace the path with where you saved this project):

```bash
cd /path/to/smartphone-teens
```

For example, if you downloaded it to your Documents folder:

- **Mac:** `cd ~/Documents/smartphone-teens`
- **Windows:** `cd C:\Users\YourName\Documents\smartphone-teens`

#### 3. Set Up the Environment (First Time Only)

Run these commands one at a time:

Mac/Linux:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

#### 4. Run the Data Quality Report

Make sure your virtual environment is activated (you should see `(venv)` at the start of your command line), then run:

Mac/Linux:

python data_quality_report.py teen_phone_addiction_dataset.csv

Windows:

python data_quality_report.py teen_phone_addiction_dataset.csv

#### 5. View the Report

After the script finishes, you'll see a summary in the terminal. A detailed HTML report will also be created in the same folder, named:

teen_phone_addiction_dataset_quality_report.html

To open the report: Double-click the HTML file, and it will open in your web browser.

### Running on Your Own Data

To analyze a different CSV file, just replace the filename:

python data_quality_report.py your_data_file.csv

### Troubleshooting

| Problem | Solution |
| --- | --- |
| `command not found: python3` | Python is not installed. Download it from python.org. |
| `No module named 'pandas'` | Run `pip install -r requirements.txt` again. |
| `No such file or directory` | Make sure you're in the correct folder and the CSV file exists. |

### What the Report Shows

The data quality report analyzes your dataset for:

- Missing Values - Cells with no data
- Outliers - Unusual values that might be errors
- Duplicates - Repeated rows
- Data Types - What kind of data each column contains
- Categorical Values - All unique values for text/category columns
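
For readers curious what two of these checks involve, the sketch below counts missing cells and duplicate rows using only the Python standard library. It is an illustration under assumptions, not the code in `data_quality_report.py`: the function `quality_summary` is hypothetical and the sample CSV is made up.

```python
import csv
import io
from collections import Counter


def quality_summary(csv_text):
    """Count rows, missing cells per column, and duplicate rows in CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []

    # A cell counts as missing when it is absent, empty, or whitespace.
    missing = {
        col: sum(1 for r in rows if not (r[col] or "").strip())
        for col in columns
    }

    # A duplicate is every repeat of a row after its first occurrence.
    row_counts = Counter(tuple(r.values()) for r in rows)
    duplicates = sum(n - 1 for n in row_counts.values() if n > 1)

    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Made-up sample: one empty Sleep_Hours cell and one repeated row.
sample = """Sleep_Hours,Daily_Usage_Hours
7.5,4.0
,6.5
7.5,4.0
"""
report = quality_summary(sample)
print(report)
# {'rows': 3, 'missing': {'Sleep_Hours': 1, 'Daily_Usage_Hours': 0}, 'duplicates': 1}
```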
