Welcome to my data portfolio! This page summarizes my projects.
- Finance and Risk Analytics
- Marketing and Retail Analytics
- Web and Social Media Analytics
- Supply Chain and Logistics Analytics
- Time Series Forecasting
- Data Visualization in Tableau
- Predicted generator failures for a wind energy company to help reduce machinery maintenance costs. The model predicted failures in wind turbine generators at a rate of ~87% +/- 2%, based on 40 factors (variables) in a ciphered (blind) dataset.
- Determined the most important factors (variables) influencing the prediction of failures. The top 5 accounted for nearly 37% of the relative importance.
- Built 21 classification models using 7 different machine learning algorithms, including tree-based models (Decision Tree, Random Forest) and boosted models (XGBoost). Compared the models across combinations of upsampling and downsampling.
- Optimized the model so that as many generator failures as possible were predicted correctly (i.e., to minimize false negatives). Used recall as the scorer in cross-validation and hyperparameter tuning.
- Aimed to reduce overall machinery maintenance costs by accounting for the costs of inspection, repair, and replacement.
- Skills & Tools Covered: Upsampling and downsampling, Regularization, Hyperparameter tuning
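Why recall was the metric of choice can be illustrated with a small sketch. This is not the project code: the labels are made up, the unit costs are hypothetical, and the mapping of outcomes to costs (caught failure = repair, missed failure = replacement, false alarm = inspection) is my assumption about how such costs would typically be assigned.

```python
def recall(y_true, y_pred):
    """Share of actual failures (label 1) that the model flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        return 0.0
    return true_pos / (true_pos + false_neg)


def maintenance_cost(y_true, y_pred, repair=1.0, replacement=3.0, inspection=0.5):
    """Total cost under hypothetical unit costs (assumed, not from the project)."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            cost += repair        # failure caught in time: repair
        elif t == 1 and p == 0:
            cost += replacement   # failure missed (false negative): replacement
        elif t == 0 and p == 1:
            cost += inspection    # false alarm: inspection only
    return cost


# Illustrative labels only: 1 = generator failure, 0 = no failure.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(recall(y_true, y_pred))            # 0.75
print(maintenance_cost(y_true, y_pred))  # 6.5
```

Under these assumed costs a missed failure is the most expensive outcome, which is why a scorer that penalizes false negatives fits the problem.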
- Built a diversified portfolio by analyzing and clustering stocks based on financial attributes.
- Analyzed stock data for 340 companies and grouped the stocks into 5 clusters based on 15 attributes, using both K-means and hierarchical clustering.
- Identified similarities and differences across attributes (price, volatility, industry sector, and typical financial indicators) by performing EDA, scaling the data, and correlating features.
- Chose the optimal number of groups, compared the results of both methods, and shared insights about the characteristics of each group, which helped develop an optimized portfolio for clients with custom risk-return profiles.
- Skills & Tools Covered: EDA, K-means Clustering, Hierarchical Clustering, Cluster Profiling
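The scale-then-cluster step can be sketched with the standard library alone. This is a minimal illustration of K-means only (hierarchical clustering is not shown), not the project code: the two attributes and their values are invented, and the helper names `standardize` and `kmeans` are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column so no single attribute dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Invented (price, volatility) pairs for six stocks.
rows = [[20.0, 0.9], [200.0, 0.2], [25.0, 1.0],
        [210.0, 0.25], [22.0, 0.95], [190.0, 0.3]]

labels, centroids = kmeans(standardize(rows), k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Seeding with the first k points keeps the sketch deterministic; a real analysis would normally use a library implementation with better initialization.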
- Used A/B testing to identify the effectiveness of the new landing page of an online news portal.
- Tested statistical hypotheses to determine which activities of the user groups (control/treatment) had a statistically significant relationship with the customer conversion rate. Compared users by preferred language and time spent on the page.
- The new landing page increased the conversion rate from 43% to 67%.
- Skills & Tools Covered: Hypothesis Testing, A/B Testing, Data Visualization, Statistical Inference
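A comparison of conversion rates between control and treatment groups can be sketched as a one-sided two-proportion z-test. The choice of this particular test is an assumption for illustration, and the counts below are hypothetical, not the project's data.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """One-sided test of H0: p_b <= p_a against H1: p_b > p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 40 of 100 control users vs. 60 of 100 treatment users converted.
z, p_value = two_proportion_z_test(40, 100, 60, 100)
print(round(z, 3))      # 2.828
print(p_value < 0.05)   # True
```

With a p-value below the chosen significance level (0.05 here), the null hypothesis that the new page converts no better than the old one would be rejected.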
- Built a dynamic pricing model for a used and refurbished devices seller using linear regression and identified key factors.
- Analyzed the dataset, built a linear regression model for resale price, and identified the key factors that significantly influenced the price prediction.
- The model explained ~84% of the variation in the data and predicted the normalized used price within +/- 4.5% (MAPE on the test data). Compared MAPE vs MAE (+/- 0.18). Charted test vs. training and compared the adjusted R-squared of 84.2% vs. 83.5%.
- Skills & Tools Covered: EDA, Linear Regression, Linear Regression assumptions, Business insights and recommendations
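The three metrics quoted above (MAE, MAPE, adjusted R-squared) are defined as follows. This is a minimal sketch using their standard formulas; the values are invented and unrelated to the project's results.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r2(y_true, y_pred, n_features):
    """R-squared penalized for the number of predictors in the model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


# Invented values for illustration only.
y_true = [2.0, 4.0, 5.0, 8.0, 10.0]
y_pred = [2.5, 3.5, 5.0, 8.0, 11.0]

print(round(mae(y_true, y_pred), 3))             # 0.4
print(round(mape(y_true, y_pred), 3))            # 9.5
print(round(adjusted_r2(y_true, y_pred, 1), 3))  # 0.951
```

MAE reports the error in the target's own units while MAPE reports it as a percentage, which is why the two are worth reading side by side.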
- Determined the cancellation status of hotel bookings and identified its driving factors using classification models.
- Analyzed hotel data to find which factors had a strong influence on booking cancellations, built a model that predicted in advance which bookings were most likely to be canceled, and helped formulate profitable policies for cancellations and refunds.
- Skills & Tools Covered: EDA, Data Pre-processing, Logistic regression, Multicollinearity, Finding optimal threshold using AUC-ROC curve, Decision trees, Pruning
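Picking a classification threshold from the ROC curve can be sketched as below. The selection rule used here, maximizing TPR minus FPR (Youden's J), is one common choice and an assumption on my part; the labels and scores are invented, and the helper names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score used as a cutoff.

    Assumes both classes are present in y_true.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((thr, fp / neg, tp / pos))
    return points


def best_threshold(y_true, scores):
    """Threshold with the largest gap between TPR and FPR."""
    return max(roc_points(y_true, scores), key=lambda pt: pt[2] - pt[1])[0]


# Invented data: 1 = booking canceled, scores = predicted cancellation probability.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.5, 0.55, 0.6, 0.7, 0.3, 0.9, 0.45, 0.4]

print(best_threshold(y_true, scores))  # 0.45
```

In this toy data a cutoff of 0.45 catches every cancellation at the price of one false alarm, which beats the default 0.5 cutoff on this criterion.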
- Performed data analysis and provided actionable insights for a food aggregator company to help improve the business. Performed customer order analysis on data from the customer online portal using Python.
- Skills & Tools Covered: Exploratory Data Analysis (Variable Identification, Univariate Analysis, Bivariate Analysis), Python
- Analyzed data on visa applicants and built a predictive model to facilitate the visa approval process by identifying the factors that most significantly influenced whether an application was approved or denied.
- Skills & Tools Covered: EDA, Data Preprocessing, Customer Profiling, Bagging Classifiers (Bagging and Random Forest), Boosting Classifiers (AdaBoost, Gradient Boosting, XGBoost), Stacking Classifier, Hyperparameter Tuning using GridSearchCV, Business insights