Real-Time Fraud Detection System
A production-ready machine learning system for detecting credit card fraud in real time, with sub-100ms inference latency. Built with XGBoost, FastAPI, and Streamlit.
Financial institutions lose billions annually to credit card fraud, but aggressive fraud detection creates friction for legitimate customers. This system addresses three critical challenges:
- Speed: Real-time detection with <100ms inference latency
- Accuracy: High precision to minimize false positives (customer friction)
- Business Impact: Optimized threshold balancing fraud prevention vs customer experience
Key Results:
- Fraud Detection Rate: 89.3%
- False Alarm Rate: 0.18%
- Estimated Monthly Savings: $450K+ (fraud prevented, net of false-positive costs)
- Average Inference Time: 12ms
Key Features:
- Production-Ready ML Pipeline: End-to-end workflow from data generation to deployment
- Real-Time API: FastAPI endpoint with <100ms response time
- Business-Optimized Threshold: Cost-function optimization (fraud loss vs customer friction)
- Interactive Dashboard: Streamlit app for monitoring and exploration
- Model Monitoring: Track drift, performance metrics, and feature importance
- Explainable AI: SHAP values for model interpretability
fraud-detection-system/
│
├── data/
│ └── generate_data.py # Synthetic data generation
│
├── models/
│ └── train_model.py # Model training and evaluation
│
├── api/
│ └── api.py # FastAPI service
│
├── dashboard/
│ └── dashboard.py # Streamlit monitoring dashboard
│
├── notebooks/
│ └── exploratory_analysis.ipynb # EDA and experimentation
│
├── tests/
│ ├── test_model.py # Model unit tests
│ └── test_api.py # API integration tests
│
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── docker-compose.yml # Multi-service orchestration
├── README.md # This file
└── .gitignore # Git ignore rules
- Python 3.9+
- pip or conda
# Clone the repository
git clone https://github.com/yourusername/fraud-detection-system.git
cd fraud-detection-system
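# Create and activate a virtual environment (the name "venv" is only an example)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies (pip shown; a conda environment works as well)
pip install -r requirements.txt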
requirements.txt
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
xgboost>=2.0.0
fastapi>=0.104.0
uvicorn>=0.24.0
streamlit>=1.28.0
plotly>=5.17.0
pydantic>=2.4.0
joblib>=1.3.0
python-multipart>=0.0.6
python data/generate_data.py
This creates credit_card_transactions.csv with 500K transactions (~2% fraud rate).
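To give a feel for what a generator like this might do, here is a minimal sketch that draws labeled synthetic transactions at roughly a 2% fraud rate. It is not the contents of generate_data.py: the function name, the category list, and the amount and distance distributions are all assumptions chosen for illustration.

```python
import random


def generate_transactions(n, fraud_rate=0.02, seed=42):
    """Yield n synthetic transactions as dicts with an is_fraud label."""
    rng = random.Random(seed)
    categories = ["online", "grocery", "travel", "retail"]  # hypothetical categories
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption: fraudulent transactions skew larger and farther from home.
        amount = rng.lognormvariate(5.5 if is_fraud else 4.0, 1.0)
        distance = rng.expovariate(1 / 800 if is_fraud else 1 / 15)
        yield {
            "transaction_id": f"TXN_{i:08d}",
            "amount": round(amount, 2),
            "merchant_category": rng.choice(categories),
            "distance_from_home": round(distance, 1),
            "transaction_hour": rng.randrange(24),
            "is_fraud": int(is_fraud),
        }


rows = list(generate_transactions(50_000))
fraud_share = sum(r["is_fraud"] for r in rows) / len(rows)
print(f"fraud rate: {fraud_share:.2%}")
```

Seeding the generator keeps the dataset reproducible, which makes later model comparisons easier to interpret.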
python models/train_model.py
Output includes:
- Trained model saved to fraud_detection_model.pkl
- Feature importance plot
- Performance metrics and business impact analysis
cd api
uvicorn api:app --reload --port 8000
API documentation available at: http://localhost:8000/docs
# Run from the repository root, in a separate terminal from the API
streamlit run dashboard/dashboard.py
Dashboard opens at: http://localhost:8501
| Metric | Value |
|---|---|
| Precision | 83.2% |
| Recall | 89.3% |
| F1-Score | 86.1% |
| ROC-AUC | 0.967 |
| PR-AUC | 0.894 |
| Metric | Value |
|---|---|
| Fraud Prevented | $534,000 |
| Missed Fraud Cost | $57,500 |
| False Alarm Cost | $2,300 |
| Net Benefit | $474,200 |
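A business-optimized threshold of the kind this README describes can be found by sweeping candidate thresholds and pricing each error type. The sketch below assumes the per-error costs quoted for the model ($500 per missed fraud, $5 per false alarm). The function names and the toy validation data are hypothetical, and the figures in the table above are not reproduced by it.

```python
def total_cost(y_true, scores, threshold, fn_cost=500.0, fp_cost=5.0):
    """Dollar cost of errors when scores >= threshold are flagged as fraud."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1 and not flagged:
            cost += fn_cost  # missed fraud
        elif label == 0 and flagged:
            cost += fp_cost  # false alarm on a legitimate transaction
    return cost


def best_threshold(y_true, scores, candidates=None, **costs):
    """Return the candidate threshold with the lowest total cost."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, **costs))


# Toy validation data, made up for illustration.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.30, 0.60, 0.80, 0.95]
t = best_threshold(y_true, scores)
print(t, total_cost(y_true, scores, t))
```

Because a missed fraud is priced at 100 times a false alarm, the sweep tends to settle on a threshold well below 0.5 and accepts a few extra false alarms to avoid a single miss.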
| | Predicted Legit | Predicted Fraud |
|---|---|---|
| Actual Legit | 98,234 | 178 |
| Actual Fraud | 1,067 | 8,953 |
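The rate metrics can be derived directly from these four counts. The helper below is a small sketch with a hypothetical name. Applied to the matrix as printed, it gives a recall near 89.4% and a false alarm rate near 0.18%, in line with the headline figures. Precision computed this way comes out near 98%, higher than the 83.2% in the metrics table, which suggests the table and the matrix may come from different evaluation runs or thresholds.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Derive common rate metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


m = confusion_metrics(tn=98_234, fp=178, fn=1_067, tp=8_953)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```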
POST /predict
Request Body:
{
  "transaction_id": "TXN_00123456",
  "customer_id": 5432,
  "amount": 250.00,
  "merchant_category": "online",
  "is_online": 1,
  "is_international": 0,
  "distance_from_home": 5.2,
  "transaction_hour": 14,
  "day_of_week": 2,
  "txn_count_1h": 0,
  "txn_count_24h": 2,
  "amount_sum_24h": 150.50,
  "customer_avg_amount": 85.30
}
Response:
{
  "transaction_id": "TXN_00123456",
  "is_fraud": 0,
  "fraud_probability": 0.1234,
  "risk_level": "low",
  "inference_time_ms": 11.23,
  "timestamp": "2024-12-10T15:30:45.123456"
}
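For readers wondering how a response of this shape might be assembled, here is a sketch using only the standard library. The field names come from the example response above. The risk-level cut-offs (0.3 and 0.7), the 0.5 decision threshold, and both function names are assumptions, not values taken from api.py.

```python
import time
from datetime import datetime


def risk_level(probability, medium=0.3, high=0.7):
    """Map a fraud probability to a coarse label (cut-offs are assumed)."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def build_response(transaction_id, probability, started, threshold=0.5):
    """Assemble a /predict-style response; `started` is a time.perf_counter() value."""
    return {
        "transaction_id": transaction_id,
        "is_fraud": int(probability >= threshold),
        "fraud_probability": round(probability, 4),
        "risk_level": risk_level(probability),
        "inference_time_ms": round((time.perf_counter() - started) * 1000, 2),
        "timestamp": datetime.now().isoformat(),
    }


started = time.perf_counter()
resp = build_response("TXN_00123456", 0.1234, started)
print(resp)
```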
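import requests

# Example transaction with several risk signals: international,
# far from home, placed at 3 a.m., with high recent velocity.
transaction = {
    "transaction_id": "TXN_TEST_001",
    "customer_id": 1234,
    "amount": 500.00,
    "merchant_category": "online",
    "is_online": 1,
    "is_international": 1,
    "distance_from_home": 2500,
    "transaction_hour": 3,
    "day_of_week": 1,
    "txn_count_1h": 3,
    "txn_count_24h": 8,
    "amount_sum_24h": 1200.50,
    "customer_avg_amount": 75.00,
}

# Send the transaction to the locally running API.
response = requests.post(
    "http://localhost:8000/predict",
    json=transaction,
    timeout=5,
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())

The Streamlit dashboard provides four main views: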
- Overview: Key metrics, time series, and category analysis
- Make Prediction: Interactive form for testing individual transactions
- Model Performance: Confusion matrix, ROC curves, and performance metrics
- Data Explorer: Filter and download transaction data
Behavioral Features:
- Transaction velocity (1h, 24h windows)
- Amount deviation from customer baseline
- Distance from home location
- Temporal patterns (hour, day of week)
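The velocity and baseline features above can be computed from a customer's recent transaction history. The following is a minimal sketch, not the project's actual preprocessing. The output keys reuse field names from the API request body, while the function name, the `amount_ratio` feature, and the history format are assumptions.

```python
from bisect import bisect_left


def velocity_features(history, now, amount, avg_amount):
    """Compute windowed counts and sums for one customer.

    history: list of (timestamp_seconds, amount) sorted by time, all before `now`.
    """
    times = [t for t, _ in history]
    start_1h = bisect_left(times, now - 3600)    # first transaction in the last hour
    start_24h = bisect_left(times, now - 86400)  # first transaction in the last day
    return {
        "txn_count_1h": len(history) - start_1h,
        "txn_count_24h": len(history) - start_24h,
        "amount_sum_24h": sum(a for _, a in history[start_24h:]),
        # Ratio to the customer's baseline; 1.0 means a typical amount.
        "amount_ratio": amount / avg_amount if avg_amount else 0.0,
    }


history = [(0, 50.0), (80_000, 100.5), (85_000, 50.0)]
features = velocity_features(history, now=90_000, amount=250.0, avg_amount=85.3)
print(features)
```

Keeping the history sorted lets each window boundary be found with a binary search rather than a full scan, which matters when the lookup sits on the request path.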
Categorical Features:
- Merchant category (one-hot encoded)
- Transaction type (online vs in-person)
- Geographic scope (domestic vs international)
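One-hot encoding for a single transaction could look like the sketch below. The category list is hypothetical (only "online" appears in the examples in this README), and the `merchant_` prefix is an assumed naming scheme.

```python
MERCHANT_CATEGORIES = ["grocery", "online", "retail", "travel"]  # hypothetical list


def encode_categoricals(txn):
    """Turn the categorical fields of one transaction into 0/1 features."""
    features = {
        f"merchant_{c}": int(txn["merchant_category"] == c)
        for c in MERCHANT_CATEGORIES
    }
    features["is_online"] = int(txn["is_online"])
    features["is_international"] = int(txn["is_international"])
    return features


encoded = encode_categoricals(
    {"merchant_category": "online", "is_online": 1, "is_international": 0}
)
print(encoded)
```

Fixing the category list up front keeps the feature columns in the same order at training and inference time. A category the list does not contain encodes as all zeros.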
- Algorithm: XGBoost Gradient Boosting
- Optimization: Custom cost function (false negative cost: $500, false positive cost: $5)
- Class Imbalance: Handled with XGBoost's scale_pos_weight parameter
- Hyperparameters:
  - max_depth: 6
  - learning_rate: 0.1
  - n_estimators: 200
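As a rough illustration of how these settings fit together, the sketch below builds a parameter dictionary from training labels. Setting scale_pos_weight to the ratio of negative to positive examples is a common heuristic, and whether train_model.py uses exactly this ratio is an assumption. The function name is hypothetical.

```python
def training_params(labels):
    """Build XGBoost-style parameters, including a class-imbalance weight."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return {
        "max_depth": 6,
        "learning_rate": 0.1,
        "n_estimators": 200,
        # Common heuristic: ratio of negative to positive examples.
        "scale_pos_weight": negatives / positives,
    }


labels = [1] * 2 + [0] * 98  # toy labels with a 2% fraud rate
params = training_params(labels)
print(params)
```

With xgboost installed, a dictionary like this could be unpacked into the scikit-learn wrapper as `xgboost.XGBClassifier(**params)`.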
Scalability:
- Stateless API design for horizontal scaling
- Model artifact loading at startup (not per request)
- Feature preprocessing optimized for single-transaction inference
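Loading the model once rather than on every request can be sketched with a cached loader. This is an illustration, not the code in api.py: `get_model` is a hypothetical name, and `pickle` stands in for whatever serializer the training script used (joblib is in the requirements list).

```python
import pickle
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model(path="fraud_detection_model.pkl"):
    """Load the model artifact on first use and reuse it afterwards."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_one(features, path="fraud_detection_model.pkl"):
    """Score one transaction; only the first call pays the disk-read cost."""
    model = get_model(path)
    return model.predict_proba([features])[0][1]
```

In a FastAPI service, the same effect is often achieved by loading the artifact in a startup or lifespan handler so that the first request does not pay the loading cost.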
Monitoring:
- Inference latency tracking
- Prediction distribution monitoring
- Feature drift detection ready
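One common way to quantify feature drift is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in live traffic. The sketch below is an assumption about how drift detection could be added, not a description of existing code. The function name and the equal-width binning scheme are illustrative choices.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = list(range(100))
shifted = [v + 30 for v in reference]
print(psi(reference, reference), psi(reference, shifted))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable alert levels depend on the feature.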
Testing:
- Unit tests for model components
- Integration tests for API endpoints
- Performance benchmarks
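A performance benchmark for the sub-100ms target might time repeated predictions and check a high percentile instead of the mean. The sketch below uses a stand-in predict function. The helper names and the choice of p95 as the pass criterion are assumptions.

```python
import statistics
import time


def latency_percentiles(predict, payload, runs=200):
    """Time repeated calls to predict(payload); return p50 and p95 in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(payload)
        samples.append((time.perf_counter() - start) * 1000)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94]}


def test_latency_budget():
    # Stand-in for the real model call; swap in the actual predict function.
    fake_predict = lambda txn: sum(txn.values())
    stats = latency_percentiles(fake_predict, {"amount": 250.0, "txn_count_24h": 2})
    assert stats["p95"] < 100  # the README's sub-100ms target
    return stats


print(test_latency_budget())
```

Checking a tail percentile catches occasional slow requests that an average would hide.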
- Add model retraining pipeline with drift detection
- Implement A/B testing framework
- Add more sophisticated feature engineering (network analysis, sequence patterns)
- Deploy to cloud (AWS Lambda or Google Cloud Run)
- Deep learning model (LSTM for sequence modeling)
- Graph-based fraud detection (transaction networks)
- Real-time feature store integration
- Multi-model ensemble approach
This project is licensed under the MIT License; see the license file for details.
This is a portfolio project. For questions or collaboration:
Michael Gurule
- Dataset inspiration from IEEE-CIS Fraud Detection Competition
- FastAPI and Streamlit communities for excellent documentation
- XGBoost team for the powerful ML framework
BUILT BY
Data Scientist | Machine Learning Engineer