cnstuart1/portfolio

🗺 Joe's Portfolio

Welcome to my data portfolio! This page summarizes my projects.

Technical Skills: Python, SQL, Tableau, GIS Enterprise and WebGIS, Full Stack Data Science

Using Analytics

  • Finance and Risk Analytics
  • Marketing and Retail Analytics
  • Web and Social Media Analytics
  • Supply Chain and Logistics Analytics
  • Time Series Forecasting
  • Data Visualization in Tableau

Data Science Projects

(1) Classification Predictive Maintenance | Renewable Energy

  • Predicted the failure of generators for a wind energy company to help reduce machinery maintenance costs. The model predicted the occurrence of failures in wind turbine generators ~87% +/- 2% of the time, based on 40 factors (variables) in a ciphered (blind) dataset.
  • Determined the most important factors (variables) influencing the prediction of failures. The top 5 accounted for nearly 37% of the relative importance.
  • Built 21 classification models using 7 different machine learning algorithms, including tree-based models (Decision Trees, Random Forest) and boosted models (XGBoost). Compared the models across upsampling and downsampling of the training data.
  • Optimized the model so that as many generator failures as possible were predicted correctly (i.e., minimized false negatives). Used recall as the scorer in cross-validation and hyperparameter tuning.
  • Reduced overall machinery maintenance costs by considering the costs of inspection, repair, and replacement.
    • Skills & Tools Covered: Up and downsampling, Regularization, Hyperparameter tuning
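A minimal sketch of the ideas above, using only the Python standard library rather than the project's actual code. The cost figures, the mapping of outcomes to costs (caught failure = repair, missed failure = replacement, false alarm = inspection), and all function names are assumptions made for illustration.

```python
import random

def recall(y_true, y_pred):
    """Share of actual failures (label 1) that the model flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def maintenance_cost(y_true, y_pred, repair=15, replacement=40, inspection=5):
    """Hypothetical cost model (all unit costs are made up):
    caught failure -> repair, missed failure -> replacement,
    false alarm -> inspection, correct 'no failure' -> no cost."""
    cost = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            cost += repair
        elif t == 1 and p == 0:
            cost += replacement
        elif t == 0 and p == 1:
            cost += inspection
    return cost

def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes have the
    same size. Assumes binary labels and at least one row per class."""
    rng = random.Random(seed)
    pos = [r for r, l in zip(rows, labels) if l == 1]
    neg = [r for r, l in zip(rows, labels) if l == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    new_rows = majority + minority + extra
    new_labels = ([1 - minority_label] * len(majority)
                  + [minority_label] * (len(minority) + len(extra)))
    return new_rows, new_labels

# Toy labels: 1 = generator failure, 0 = no failure.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print("recall:", recall(y_true, y_pred))
print("cost:", maintenance_cost(y_true, y_pred))

balanced_rows, balanced_labels = upsample_minority([[1], [2], [3], [4]], [1, 0, 0, 0])
print("class counts:", balanced_labels.count(0), balanced_labels.count(1))
```

Because a missed failure is assumed to cost far more than an inspection, a model tuned for recall can lower total cost even if it raises a few extra false alarms.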

(2) Clustering Stock Analysis | Finance

  • Built a diversified portfolio by analyzing and clustering stocks based on financial attributes.
  • Analyzed stock data for 340 companies and grouped the stocks into 5 clusters based on 15 attributes, using both K-Means and Hierarchical clustering techniques.
  • Identified similar and dissimilar characteristics within attribute data points (price, volatility, industry sector, and typical financial indicators) by performing EDA, scaling the data, and correlating features.
  • Chose the optimal number of groups, compared the results of both clustering methods, and shared insights about the characteristics of each group, which helped develop an optimized portfolio for clients with custom risk-return profiles.
    • Skills & Tools Covered: EDA, K-Means Clustering, Hierarchical Clustering, Cluster Profiling
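The scale-then-cluster-then-profile workflow can be sketched in plain Python as follows. This is an illustrative toy, not the project's code: the six two-attribute rows, the choice of two clusters, the fixed starting centroids, and the function names are all assumptions, and only K-Means is shown (hierarchical clustering is omitted).

```python
import math
import statistics

def standardize(rows):
    """Z-score each column so no single attribute dominates the distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm starting from the given centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centroids.append([statistics.fmean(c) for c in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def profile(rows, labels):
    """Cluster profiling: per-cluster mean of each original (unscaled) attribute."""
    result = {}
    for k in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == k]
        result[k] = [statistics.fmean(c) for c in zip(*members)]
    return result

# Hypothetical attributes per stock: (volatility, a valuation ratio).
stocks = [[0.5, 10], [0.6, 12], [0.55, 11], [2.0, 40], [2.2, 42], [2.1, 38]]
scaled = standardize(stocks)
labels, _ = kmeans(scaled, centroids=[scaled[0], scaled[3]])
print(labels)
print(profile(stocks, labels))
```

Profiling on the unscaled values keeps the cluster summaries in the original units, which is easier to interpret than z-scores.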

(3) Business Statistics | Tech Website Optimization

  • Used A/B testing to identify the effectiveness of the new landing page of an online news portal.
  • Tested statistical hypotheses to determine which activities of the user groups (control/treatment) were statistically relevant to the customer conversion rate. Compared user activity by preferred language and time spent on the page.
  • The new landing page increased the conversion rate from 43% to 67%.
    • Skills & Tools Covered: Hypothesis Testing, A/B Testing, Data Visualization, Statistical Inference
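One common way to test a difference in conversion rates is a one-sided two-proportion z-test, sketched below. The write-up does not state the group sizes or the exact test used, so the 100 users per group and the function name are assumptions; only the 43% and 67% rates come from the project summary.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """One-sided z-test of H0: p_b <= p_a against H1: p_b > p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of equal proportions.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical group sizes (100 each); rates of 43% and 67% as reported.
z, p_value = two_proportion_ztest(conv_a=43, n_a=100, conv_b=67, n_b=100)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.5f}")
```

With smaller groups the same rates would give a smaller z statistic, so the conclusion depends on the real sample sizes.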

(4) Linear Regression | Retail

  • Built a dynamic pricing model for a used and refurbished devices seller using linear regression and identified key factors.
  • Analyzed dataset, built a linear regression model for resale price, and identified key factors significantly influencing price prediction.
  • The model explained ~84% of the variation in the data and predicted the normalized used price within +/- 4.5% (MAPE on the test data). Compared MAPE vs MAE (+/- 0.18). Charted test vs. training and compared the adjusted R-squared of 84.2% vs. 83.5%.
    • Skills & Tools Covered: EDA, Linear Regression, Linear Regression assumptions, Business insights and recommendations
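For reference, the metrics quoted above (MAE, MAPE, and adjusted R-squared) can be computed as in this small sketch. The four sample values and the function names are made up for illustration and are not taken from the project's dataset.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Toy values, purely illustrative.
y_true = [2.0, 4.0, 5.0, 8.0]
y_pred = [2.2, 3.8, 5.5, 7.6]
r2 = r_squared(y_true, y_pred)
print("MAE:", mae(y_true, y_pred))
print("MAPE (%):", mape(y_true, y_pred))
print("R2:", r2)
print("Adjusted R2:", adjusted_r_squared(r2, n_samples=len(y_true), n_features=1))
```

MAE is reported in the units of the target while MAPE is a percentage, which is why the two figures are not directly comparable in magnitude.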

(5) Classification | Consumer Discretionary

  • Determined the cancellation status of hotel bookings and identified its driving factors using classification models.
  • Analyzed hotel data to find which factors had a strong influence on booking cancellations, built a predictive model that predicted in advance which bookings were most likely to be canceled, and helped formulate profitable policies for cancellations and refunds.
    • Skills & Tools Covered: EDA, Data Pre-processing, Logistic regression, Multicollinearity, Finding optimal threshold using AUC-ROC curve, Decision trees, Pruning
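One way to pick a classification threshold from the ROC curve is to maximize Youden's J statistic (true positive rate minus false positive rate). The sketch below assumes that criterion; the project may have used a different rule. The labels, scores, and function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, tpr, fpr) for each distinct score, where a case
    is predicted positive when its score >= threshold.
    Assumes both classes are present in y_true."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((thr, tp / pos, fp / neg))
    return points

def best_threshold(y_true, scores):
    """Threshold that maximizes Youden's J = tpr - fpr."""
    thr, _, _ = max(roc_points(y_true, scores), key=lambda pt: pt[1] - pt[2])
    return thr

# Hypothetical data: 1 = booking canceled, scores = predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8]
print("chosen threshold:", best_threshold(y_true, scores))
```

Here the chosen threshold falls below the default 0.5, trading one false alarm for catching every cancellation in the toy data.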

(6) Data Mining Analysis | Transportation

  • Performed data analysis and provided actionable insights for a food aggregator company to help improve the business. Performed customer order analysis on data from the customer online portal using Python.
    • Skills & Tools Covered: Exploratory Data Analysis, (Variable Identification, Univariate analysis, Bi-Variate analysis), Python

(7) Ensemble Algorithms | Techniques

  • Analyzed the data of visa applicants and built a predictive model to facilitate the visa approval process by identifying the factors that most significantly influence whether an application is approved or denied.
    • Skills & Tools Covered: EDA, Data Preprocessing, Customer Profiling, Bagging Classifiers (Bagging and Random Forest), Boosting Classifier (AdaBoost, Gradient Boosting, XGBoost), Stacking Classifier, Hyperparameter Tuning using GridSearchCV, Business insights

End
