
πŸ“‚ Pavel Grigoryev - Data Analysis Portfolio


A collection of data analysis projects demonstrating end-to-end analytics capabilities and business impact.

πŸ“‘ Contents

  • πŸ§‘‍πŸ’» About me
  • πŸ› οΈ Languages and Tools
  • 🎯 Skills
  • 😎 Awesome Data Analysis
  • 🧩 Building Startup Analytics
  • 🌊 Deep Sales Analysis of Olist E-Commerce
  • 🌐 WWI Data Pipeline and Dashboard
  • πŸš€ Frameon - Advanced Analytics for Pandas
  • πŸ“œ License

πŸ§‘‍πŸ’» About me

  • I hold a university degree in a technical field.
  • I specialize in data analysis, with a focus on enabling informed decision-making.
  • By extracting insights from complex datasets, I help organizations make data-driven decisions that drive business growth and improvement.

πŸ› οΈ Languages and Tools

  • Languages & Databases: Python, SQL (PostgreSQL, MySQL, ClickHouse), NoSQL (MongoDB).
  • Data Analysis & Visualization:
    • Libraries: Pandas, NumPy, SciPy, Statsmodels, Pingouin, Plotly, Matplotlib, Seaborn.
    • Tools & Frameworks: Dash, Power BI, Tableau, Redash, DataLens, Superset.
  • Big Data & Distributed Computing: Apache Spark, Apache Airflow.
  • Machine Learning & AI: Scikit-learn, MLlib.
  • Time Series Forecasting: Facebook Prophet, Uber Orbit.
  • Natural Language Processing: NLTK, SpaCy, TextBlob.
  • Web Scraping: BeautifulSoup, Selenium, Scrapy.
  • DevOps: Linux, Git, Docker.
  • IDEs: VS Code, Google Colab, Jupyter Notebook, Zeppelin, PyCharm.

⬆ back to contents


🎯 Skills

  • Deep data analysis:
    • Preprocessing, cleaning, and identifying patterns using visualization to support decision-making.
  • Writing complex SQL queries:
    • Working with nested queries, window functions, CASE expressions, and WITH (CTE) clauses for data extraction and analysis (see the sketch after this list).
  • Understanding product strategy:
    • Knowledge of product development and improvement principles, including analyzing user needs and formulating recommendations for product growth.
  • Product metrics analysis:
    • LTV, retention rate (RR), conversion rate (CR), ARPU, ARPPU, MAU, DAU, and other key performance indicators.
  • Conducting A/B testing:
    • Analyzing results using statistical methods to evaluate the effectiveness of changes.
  • Cohort analysis and RFM segmentation:
    • Identifying user behavior patterns to optimize marketing strategies.
  • End-to-End Data Pipelines:
    • Building automated ETL processes from databases to dashboards with Airflow orchestration.
  • Data visualization and dashboard development:
    • Creating interactive reports in Tableau, Redash, Power BI, and other tools for presenting analytics.
  • Web scraping:
    • Experience in extracting data from websites using tools and libraries such as BeautifulSoup, Scrapy, and Selenium for information gathering and data analysis.
  • Working with big data:
    • Experience with tools and technologies for processing large volumes of data (e.g., Hadoop, Spark).
  • Machine Learning Applications:
    • Capable of building and applying machine learning models for data analysis tasks, including forecasting, classification, and clustering, to uncover deeper insights and enhance decision-making processes.
  • Business and Metric Forecasting:
    • Building and interpreting time series forecasts for key business metrics using libraries like Uber Orbit and Facebook Prophet for intuitive, robust forecasting to support strategic planning and goal-setting.
  • Working with APIs:
    • Integrating and extracting data from various sources via APIs.
  • Process Automation:
    • Automating data workflows and routine tasks using Linux scripting, Apache Airflow, and other DevOps tools.
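
As a concrete illustration of the SQL skill above, here is a minimal sketch run from Python with pandas and SQLAlchemy, combining a WITH (CTE) clause, a window function, and a CASE expression. The connection string, the orders table, and its columns are hypothetical.

```python
# Minimal sketch: CTE + window function + CASE, executed from Python.
# Connection string, table, and column names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/shop")

query = """
WITH monthly AS (
    SELECT customer_id,
           date_trunc('month', created_at) AS month,
           SUM(amount)                     AS revenue
    FROM orders
    GROUP BY customer_id, date_trunc('month', created_at)
)
SELECT customer_id,
       month,
       revenue,
       -- running total per customer via a window function
       SUM(revenue) OVER (PARTITION BY customer_id ORDER BY month) AS running_revenue,
       CASE WHEN revenue >= 1000 THEN 'high' ELSE 'regular' END    AS tier
FROM monthly
ORDER BY customer_id, month;
"""

df = pd.read_sql(query, engine)
```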

⬆ back to contents


😎 Awesome Data Analysis

500+ curated resources for data analysis and data science: tools, libraries, roadmaps, cheatsheets, and interview guides.

View Repository

Key Methods:

  • Knowledge Management & Information Architecture:
    • Systematic content curation, resource classification, and learning path development
  • Research & Critical Thinking:
    • Technical content evaluation, accuracy validation, and relevance assessment
  • Content Strategy & Curation:
    • Quality control implementation, information synthesis, and accessibility optimization

Project Description:

  • A curated knowledge hub demonstrating a systematic approach to data analysis, reflecting expertise in structuring complex information and evaluating technical content.

Project Goal:

  • To create a comprehensive, well-organized resource collection that facilitates learning and professional development in data analysis and data science.

Key Achievements:

  • Systematized 500+ resources into logical learning paths and competency areas
  • Implemented rigorous quality control by selecting materials based on accuracy and relevance
  • Optimized information architecture for quick navigation and knowledge discovery
  • Enhanced accessibility through web version development
  • Synthesized fragmented knowledge into unified, actionable framework

Business Impact:

  • Established a trusted reference platform that accelerates the learning curve for data professionals and demonstrates expertise in information architecture and knowledge management.

⬆ back to contents


🧩 Building Startup Analytics

Building an analytics process for a startup: infrastructure, dashboards, A/B testing, forecasting, automated reports, and anomaly detection.

View Repository

Stack:

  • Data & DB: Python, Pandas, ClickHouse
  • Viz & BI: Superset, Yandex DataLens, Plotly
  • ML & Stats: Statsmodels, SciPy, Pingouin, Uber Orbit
  • Automation: Apache Airflow, Telegram API

Key Methods:

  • Data Infrastructure Design:
    • Star schema modeling, ETL pipeline development, and data quality validation
  • Product Analytics:
    • Retention analysis, cohort analysis, and engagement metrics tracking
  • Business Intelligence:
    • Real-time dashboard design, KPI definition, and self-service reporting implementation
  • Statistical Hypothesis Testing:
    • A/A and A/B test analysis, sample size calculation, and statistical power analysis
  • Time Series Forecasting:
    • Bayesian structural models, trend/seasonality decomposition, and model validation
  • Anomaly Detection:
    • MAD-based outlier detection, alert threshold optimization, and real-time monitoring (see the sketch after this list)
  • Automation Engineering:
    • DAG orchestration, API integration, and scheduled reporting systems
  • Monte Carlo Simulation:
    • Statistical power estimation and sample size determination through simulation
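
A minimal sketch of the MAD-based detection named above: the rolling window, the 3-deviation threshold, and the toy series are illustrative assumptions, not the project's actual parameters.

```python
import numpy as np
import pandas as pd

def mad_anomalies(series: pd.Series, window: int = 24, threshold: float = 3.0) -> pd.Series:
    """Flag points deviating from the rolling median by more than `threshold` robust deviations."""
    rolling_median = series.rolling(window, center=True, min_periods=1).median()
    abs_dev = (series - rolling_median).abs()
    mad = abs_dev.rolling(window, center=True, min_periods=1).median()
    # 1.4826 scales MAD to be comparable to a standard deviation for normal data
    score = abs_dev / (1.4826 * mad).replace(0, np.nan)
    return score > threshold

# Toy usage: an hourly metric with one injected spike
ts = pd.Series(np.random.default_rng(0).poisson(100, 200).astype(float))
ts.iloc[120] = 300
print(ts[mad_anomalies(ts)])  # should surface the spike at index 120
```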

Project Description:

  • This project demonstrates the implementation of a complete product analytics system for an early-stage startup that has developed an application merging a messenger with a personalized news feed.
  • In this ecosystem, users can browse and interact with posts (views, likes) while simultaneously communicating with each other through direct messages.
  • The core challenge was to build the entire analytical infrastructure from scratch to understand user behavior across both features and enable data-driven decision-making.

Project Goal:

  • To build a complete analytics infrastructure from scratch, enabling data-driven product decisions through automated reporting, experimentation, and monitoring.

Key Achievements:

  • Built scalable data infrastructure with optimized analytical database in ClickHouse
  • Designed interactive dashboards for real-time monitoring of user engagement and retention
  • Implemented rigorous A/B testing pipeline with statistical validation framework
  • Developed forecasting models for server load prediction and capacity planning
  • Created automated reporting system with daily Telegram delivery to stakeholders
  • Established real-time anomaly detection for proactive issue resolution

Business Impact:

  • Enabled data-driven product decisions and reduced manual reporting overhead through a comprehensive analytics ecosystem.

⬆ back to contents


🌊 Deep Sales Analysis of Olist E-Commerce

Comprehensive analysis of Brazilian e-commerce data, uncovering key insights and actionable business recommendations.

View Repository

Stack:

  • Data Analysis: Python, Pandas, NumPy
  • Visualization: Plotly, Tableau
  • Statistics & ML: Statsmodels, SciPy, Scikit-learn, Pingouin
  • NLP & Text Processing: NLTK, TextBlob

Key Methods:

  • Exploratory Data Analysis (EDA):
    • Statistical summaries, missing value analysis, and outlier detection
  • Data Preprocessing:
    • Feature engineering, missing value handling, and creation of new metrics and dimensions
  • Time Series Analysis:
    • Revenue/order trends, seasonality decomposition
  • RFM Segmentation:
    • Customer value clustering (Recency, Frequency, Monetary); see the sketch after this list
  • Clustering:
    • Scikit-learn-based customer behavior segmentation
  • Geospatial Analysis:
    • Sales heatmaps and delivery performance by region
  • NLP Sentiment Analysis:
    • Review text processing with NLTK and TextBlob
  • Statistical Testing:
    • Correlation analysis and hypothesis testing
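
A minimal sketch of the quartile-based RFM scoring referenced above; the orders schema (customer_id, order_date, amount) is hypothetical.

```python
import pandas as pd

def rfm_scores(orders: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Score each customer 1-4 on Recency, Frequency, and Monetary value."""
    rfm = orders.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    # Lower recency is better, so its quartile labels are reversed
    rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1]).astype(int)
    # rank(method="first") breaks ties so qcut gets unique bin edges
    rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)
    rfm["M"] = pd.qcut(rfm["monetary"], 4, labels=[1, 2, 3, 4]).astype(int)
    rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
    return rfm
```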

Project Description:

  • Comprehensive analysis of the Brazilian e-commerce platform Olist, identifying growth opportunities and operational improvements through data-driven insights.

Project Goal:

  • To perform deep-dive analysis identifying growth opportunities, operational improvements, and customer behavior patterns.

Key Achievements:

  • Conducted time-series analysis of sales dynamics, seasonality, and trend decomposition
  • Implemented anomaly detection in orders, payments, and delivery times
  • Developed customer profiling through RFM segmentation and clustering analysis
  • Performed cohort analysis to track customer retention and lifetime value (LTV)
  • Processed customer reviews using NLP for sentiment analysis and insights
  • Validated business hypotheses through statistical testing
  • Delivered strategic recommendations for logistics optimization and sales growth

Business Impact:

  • Provided data-backed insights to optimize logistics, enhance customer retention strategies, and drive revenue growth through targeted improvements.

⬆ back to contents


🌐 WWI Data Pipeline and Dashboard

End-to-end data pipeline and interactive dashboard for Wide World Importers.

View Repository

Stack:

  • Data & Databases: Python, SQL, PostgreSQL, SQLAlchemy, dblink
  • Analytics & BI: Yandex DataLens
  • Automation: Apache Airflow

Key Methods:

  • Database Management:
    • PostgreSQL with OLTP-to-OLAP transformation
  • ETL Pipeline Development:
    • Automated data extraction, transformation, and loading processes
  • Data Warehouse Design:
    • Star schema implementation for analytical queries
  • SQL Optimization:
    • Complex queries, materialized views, and index optimization
  • Business Intelligence:
    • Interactive dashboard development in Yandex DataLens
  • Automation:
    • Airflow DAG design for daily data pipeline execution (see the sketch after this list)
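
A minimal sketch of the daily orchestration described above, in Airflow 2 style; the DAG id and task callables are placeholders, not the project's actual pipeline steps.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; the real steps extract from the OLTP source,
# build the star schema, and refresh the data mart.
def extract():
    print("extract from OLTP")

def transform():
    print("build star schema tables")

def load():
    print("refresh data mart")

with DAG(
    dag_id="wwi_daily_etl",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```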

Project Description:

  • End-to-end data pipeline and business intelligence solution for the global distributor Wide World Importers.

Project Goal:

  • To transform siloed operational data into a unified business intelligence platform supporting sales, procurement, and logistics decisions.

Key Achievements:

  • Built automated ETL pipeline transforming OLTP data into optimized star schema data mart
  • Designed and implemented interactive dashboard for sales, logistics, and customer analytics
  • Developed daily automated data updates with Airflow DAG orchestration
  • Created comprehensive business intelligence platform with specialized dashboards
  • Enabled cross-departmental data-driven decision making

Business Impact:

  • Reduced manual reporting time and provided a single source of truth for monitoring business performance across departments.

⬆ back to contents


πŸš€ Frameon - Advanced Analytics for Pandas

Frameon extends pandas DataFrame with analysis methods while keeping all original functionality intact.

View Repository

Stack:

  • Core Technologies: Python, Pandas, NumPy
  • Statistics & ML: Statsmodels, Scikit-learn, SciPy, Pingouin
  • Visualization: Plotly
  • NLP & Text: TextBlob
  • Documentation: Sphinx

Key Methods:

  • Package Development:
    • End-to-end Python package creation and distribution workflow
  • Software Engineering:
    • Object-oriented programming and pandas extension development (see the sketch after this list)
  • Testing & Quality:
    • Automated testing with GitHub Actions and code quality enforcement
  • Documentation:
    • Comprehensive documentation generation with Sphinx
  • Visualization:
    • Automated chart generation and interactive plotting
  • Machine Learning:
    • Feature analysis and model evaluation techniques
  • Text Processing:
    • NLP methods for text analysis and sentiment detection
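
Frameon's actual API is not reproduced here; this sketch only shows the standard pandas accessor mechanism that such an extension can build on. The `explore` accessor and its method are invented for illustration.

```python
import pandas as pd

# Hypothetical accessor illustrating the pandas extension mechanism;
# not Frameon's real API.
@pd.api.extensions.register_dataframe_accessor("explore")
class ExploreAccessor:
    def __init__(self, df: pd.DataFrame):
        self._df = df

    def quality_report(self) -> pd.DataFrame:
        """Per-column dtype, missing-value share, and cardinality."""
        return pd.DataFrame({
            "dtype": self._df.dtypes.astype(str),
            "missing_pct": self._df.isna().mean().round(3),
            "n_unique": self._df.nunique(),
        })

# Native pandas behavior stays intact; the accessor is purely additive.
df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", "y"]})
print(df.explore.quality_report())
```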

Project Description:

  • A powerful pandas extension that enhances DataFrames with production-ready analytics while maintaining native functionality.

Project Goal:

  • To create a comprehensive analytics toolkit that streamlines exploratory analysis and statistical workflows within the pandas ecosystem.

Key Achievements:

  • Seamlessly integrates exploratory analysis, statistical testing, and visualization into pandas workflows
  • Provides instant insights through automated data profiling and quality checks
  • Enables cohort analysis with flexible periodization and metric customization
  • Offers built-in statistical methods (bootstrap, effect sizes, group comparisons)
  • Generates interactive visualizations with single-command access
  • Supports both DataFrame-level and column-specific analysis
  • Maintains full backward compatibility with native pandas functionality

Business Impact:

  • Accelerates data analysis workflows and standardizes analytical methodologies across teams and projects.

⬆ back to contents


πŸ“œ License

This project is shared under the MIT License.
