A collection of data analysis projects demonstrating end-to-end analytics capabilities and business impact.
- About me
- Languages and Tools
- Skills
- Awesome Data Analysis
- Building Startup Analytics
- Deep Sales Analysis of Olist E-Commerce
- WWI Data Pipeline and Dashboard
- Frameon - Advanced Analytics for Pandas
- License
- I hold a university degree in a technical field.
- I specialize in data analysis, with a focus on enabling informed decision-making.
- By extracting insights from complex datasets, I help organizations make data-driven decisions that drive business growth and improvement.
- Programming Languages: Python, SQL (PostgreSQL, MySQL, ClickHouse), NoSQL (MongoDB).
- Data Analysis & Visualization:
- Libraries: Pandas, NumPy, SciPy, Statsmodels, Pingouin, Plotly, Matplotlib, Seaborn.
- Tools & Frameworks: Dash, Power BI, Tableau, Redash, DataLens, Superset.
- Big Data & Distributed Computing: Apache Spark, Apache Airflow.
- Machine Learning & AI: Scikit-learn, MLlib.
- Time Series Forecasting: Facebook Prophet, Uber Orbit.
- Natural Language Processing: NLTK, SpaCy, TextBlob.
- Web Scraping: BeautifulSoup, Selenium, Scrapy.
- DevOps: Linux, Git, Docker.
- IDEs: VS Code, Google Colab, Jupyter Notebook, Zeppelin, PyCharm.
- Deep data analysis:
- Preprocessing and cleaning data, and identifying patterns through visualization to support decision-making.
- Writing complex SQL queries:
- Working with nested queries, window functions, and CASE and WITH statements for data extraction and analysis (see the SQL sketch after this list).
- Understanding product strategy:
- Knowledge of product development and improvement principles, including analyzing user needs and formulating recommendations for product growth.
- Product metrics analysis:
- LTV, RR, CR, ARPU, ARPPU, MAU, DAU, and other key performance indicators.
- Conducting A/B testing:
- Analyzing results using statistical methods to evaluate the effectiveness of changes.
- Cohort analysis and RFM segmentation:
- Identifying user behavior patterns to optimize marketing strategies.
- End-to-End Data Pipelines:
- Building automated ETL processes from databases to dashboards with Airflow orchestration.
- Data visualization and dashboard development:
- Creating interactive reports in Tableau, Redash, Power BI, and other tools for presenting analytics.
- Web scraping:
- Extracting data from websites using tools such as BeautifulSoup, Scrapy, and Selenium for information gathering and analysis.
- Working with big data:
- Experience with tools and technologies for processing large volumes of data (e.g., Hadoop, Spark).
- Machine Learning Applications:
- Building and applying machine learning models for forecasting, classification, and clustering to uncover deeper insights and support decision-making.
- Business and Metric Forecasting:
- Building and interpreting time series forecasts for key business metrics with libraries such as Uber Orbit and Facebook Prophet to support strategic planning and goal-setting.
- Working with APIs:
- Integrating and extracting data from various sources via APIs.
- Process Automation:
- Automating data workflows and routine tasks using Linux scripting, Apache Airflow, and other DevOps tools.
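A minimal, self-contained sketch of the SQL patterns mentioned above (a WITH CTE, a window function, and CASE), run here against an in-memory SQLite database with made-up data so it executes anywhere; the same constructs carry over to PostgreSQL, MySQL, and ClickHouse:

```python
import sqlite3

# Tiny in-memory orders table with made-up rows (requires SQLite >= 3.25
# for window function support, which ships with modern Python builds).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 120.0), (1, '2024-02-10', 80.0),
        (2, '2024-01-20', 200.0), (2, '2024-03-02', 40.0),
        (3, '2024-02-14', 60.0);
""")

# WITH defines a CTE, ROW_NUMBER() is a window function partitioned per
# user, and CASE buckets each first order by size.
query = """
    WITH ranked AS (
        SELECT user_id, order_date, amount,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY order_date) AS rn
        FROM orders
    )
    SELECT user_id, order_date, amount,
           CASE WHEN amount >= 100 THEN 'large' ELSE 'small' END AS order_size
    FROM ranked
    WHERE rn = 1  -- first order per user
"""
for row in conn.execute(query):
    print(row)
```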
500+ curated resources for data analysis and data science: tools, libraries, roadmaps, cheatsheets, and interview guides.
Key Methods:
- Knowledge Management & Information Architecture:
- Systematic content curation, resource classification, and learning path development
- Research & Critical Thinking:
- Technical content evaluation, accuracy validation, and relevance assessment
- Content Strategy & Curation:
- Quality control implementation, information synthesis, and accessibility optimization
Project Description:
- A curated knowledge hub demonstrating a systematic approach to data analysis, reflecting expertise in structuring complex information and evaluating technical content.
Project Goal:
- To create a comprehensive, well-organized resource collection that facilitates learning and professional development in data analysis and data science.
Key Achievements:
- Systematized 500+ resources into logical learning paths and competency areas
- Implemented rigorous quality control by selecting materials based on accuracy and relevance
- Optimized information architecture for quick navigation and knowledge discovery
- Enhanced accessibility through web version development
- Synthesized fragmented knowledge into a unified, actionable framework
Business Impact:
- Established a trusted reference platform that shortens the learning curve for data professionals and demonstrates expertise in information architecture and knowledge management.
Building an analytics process for a startup: infrastructure, dashboards, A/B testing, forecasting, automated reports, and anomaly detection.
Stack:
- Data & DB: Python, Pandas, ClickHouse
- Viz & BI: Superset, Yandex DataLens, Plotly
- ML & Stats: Statsmodels, SciPy, Pingouin, Uber Orbit
- Automation: Apache Airflow, Telegram API
Key Methods:
- Data Infrastructure Design:
- Star schema modeling, ETL pipeline development, and data quality validation
- Product Analytics:
- Retention analysis, cohort analysis, and engagement metrics tracking
- Business Intelligence:
- Real-time dashboard design, KPI definition, and self-service reporting implementation
- Statistical Hypothesis Testing:
- A/A and A/B test analysis, sample size calculation, and statistical power analysis
- Time Series Forecasting:
- Bayesian structural models, trend/seasonality decomposition, and model validation
- Anomaly Detection:
- MAD-based outlier detection, alert threshold optimization, and real-time monitoring (see the sketch after this list)
- Automation Engineering:
- DAG orchestration, API integration, and scheduled reporting systems
- Monte Carlo Simulation:
- Statistical power estimation and sample size determination through simulation (see the sketch after this list)
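Two of the methods above, sketched with made-up numbers rather than the project's actual data. First, Monte Carlo power estimation for an A/B test: simulate many experiments at a candidate sample size and measure how often a real uplift is detected (the function and its parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def estimated_power(n, p_control=0.10, p_variant=0.12, alpha=0.05, sims=2000):
    """Estimate A/B test power for n users per group via simulation."""
    rejections = 0
    for _ in range(sims):
        control = rng.binomial(n, p_control)
        variant = rng.binomial(n, p_variant)
        # 2x2 contingency test on converted vs. not converted per group.
        table = [[control, n - control], [variant, n - variant]]
        _, p_value, _, _ = stats.chi2_contingency(table)
        rejections += p_value < alpha
    return rejections / sims

# Roughly 0.7 for a 10% -> 12% conversion lift at 3,000 users per arm.
print(estimated_power(3000))
```

And a sketch of MAD-based outlier detection, the robust alternative to plain z-scores used for alerting (the threshold and the metric values are hypothetical):

```python
import numpy as np

def mad_outliers(values, threshold=3.0):
    """Flag points whose MAD-based robust z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    # 0.6745 makes the score comparable to a standard z-score under
    # normality; guard against MAD == 0 on constant data.
    robust_z = 0.6745 * (values - median) / (mad if mad else 1.0)
    return np.abs(robust_z) > threshold

daily_active_users = [1010, 998, 1005, 1021, 430, 1012]  # made-up metric
print(mad_outliers(daily_active_users))  # only the 430 dip is flagged
```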
Project Description:
- This project demonstrates the implementation of a complete product analytics system for an early-stage startup that has developed an application merging a messenger with a personalized news feed.
- In this ecosystem, users can browse and interact with posts (views, likes) while simultaneously communicating with each other through direct messages.
- The core challenge was to build the entire analytical infrastructure from scratch to understand user behavior across both features and enable data-driven decision-making.
Project Goal:
- To build a complete analytics infrastructure from scratch, enabling data-driven product decisions through automated reporting, experimentation, and monitoring.
Key Achievements:
- Built scalable data infrastructure with optimized analytical database in ClickHouse
- Designed interactive dashboards for real-time monitoring of user engagement and retention
- Implemented rigorous A/B testing pipeline with statistical validation framework
- Developed forecasting models for server load prediction and capacity planning
- Created automated reporting system with daily Telegram delivery to stakeholders
- Established real-time anomaly detection for proactive issue resolution
Business Impact:
- Enabled data-driven product decisions and reduced manual reporting overhead through a comprehensive analytics ecosystem.
Comprehensive analysis of Brazilian e-commerce data, uncovering key insights and actionable business recommendations.
Stack:
- Data Analysis: Python, Pandas, NumPy
- Visualization: Plotly, Tableau
- Statistics & ML: Statsmodels, SciPy, Scikit-learn, Pingouin
- NLP & Text Processing: NLTK, TextBlob
Key Methods:
- Exploratory Data Analysis (EDA):
- Statistical summaries, missing value analysis, and outlier detection
- Data Preprocessing:
- Feature engineering, missing value handling, and creation of new metrics and dimensions
- Time Series Analysis:
- Revenue/order trends, seasonality decomposition
- RFM Segmentation:
- Customer value segmentation by Recency, Frequency, and Monetary value (see the sketch after this list)
- Clustering:
- Scikit-learn-based customer behavior segmentation
- Geospatial Analysis:
- Sales heatmaps and delivery performance by region
- NLP Sentiment Analysis:
- Review text processing with NLTK and TextBlob (see the sketch after this list)
- Statistical Testing:
- Correlation analysis and hypothesis testing
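An illustrative RFM scoring sketch on made-up orders; the project's actual binning may differ (e.g. quintiles rather than the terciles used here):

```python
import pandas as pd

# Made-up order log: one row per order.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2018-01-05", "2018-06-01", "2018-03-10",
        "2018-05-20", "2018-06-15", "2018-07-01", "2018-02-02"]),
    "amount": [50, 70, 200, 30, 45, 60, 110],
})

snapshot = orders["order_date"].max() + pd.Timedelta(days=1)
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Score each dimension 1-3 with quantile bins; recency is inverted because
# a smaller gap since the last order is better. Ranking frequency first
# avoids duplicate bin edges when many customers share an order count.
rfm["R"] = pd.qcut(rfm["recency"], 3, labels=[3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 3, labels=[1, 2, 3]).astype(int)
rfm["segment"] = rfm[["R", "F", "M"]].astype(str).agg("".join, axis=1)
print(rfm)
```

And a sentiment scoring sketch; note that Olist reviews are in Portuguese, so the real pipeline needs translation or language-specific tooling before TextBlob's English-only default analyzer applies:

```python
from textblob import TextBlob

# Hypothetical review, already translated to English.
review = TextBlob("Excellent product, but the delivery was very late.")
print(review.sentiment.polarity)      # overall tone in [-1, 1]
print(review.sentiment.subjectivity)  # 0 = objective, 1 = subjective
```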
Project Description:
- Comprehensive analysis of Brazilian e-commerce platform Olist, identifying growth opportunities and operational improvements through data-driven insights.
Project Goal:
- To perform a deep-dive analysis identifying growth opportunities, operational improvements, and customer behavior patterns.
Key Achievements:
- Conducted time-series analysis of sales dynamics, seasonality, and trend decomposition
- Implemented anomaly detection in orders, payments, and delivery times
- Developed customer profiling through RFM segmentation and clustering analysis
- Performed cohort analysis to track customer retention and lifetime value (LTV)
- Processed customer reviews using NLP for sentiment analysis and insights
- Validated business hypotheses through statistical testing
- Delivered strategic recommendations for logistics optimization and sales growth
Business Impact:
- Provided data-backed insights to optimize logistics, enhance customer retention strategies, and drive revenue growth through targeted improvements.
End-to-end data pipeline and interactive dashboard for Wide World Importers.
Stack:
- Data & Databases: Python, SQL, PostgreSQL, SQLAlchemy, dblink
- Analytics & BI: Yandex DataLens
- Automation: Airflow
Key Methods:
- Database Management:
- PostgreSQL with OLTP-to-OLAP transformation
- ETL Pipeline Development:
- Automated data extraction, transformation, and loading processes
- Data Warehouse Design:
- Star schema implementation for analytical queries
- SQL Optimization:
- Complex queries, materialized views, and index optimization
- Business Intelligence:
- Interactive dashboard development in Yandex DataLens
- Automation:
- Airflow DAG design for daily data pipeline execution (see the sketch after this list)
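A minimal sketch of the daily DAG's shape, with task bodies stubbed out; the dag_id, task names, and the Airflow 2.4+ `schedule` argument are assumptions for illustration, not the project's actual code:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_orders(**context):
    """Pull the previous day's OLTP data (stub)."""
    ...

def load_star_schema(**context):
    """Transform and upsert facts/dimensions into the data mart (stub)."""
    ...

with DAG(
    dag_id="wwi_daily_etl",            # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # `schedule_interval` on Airflow < 2.4
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders",
                             python_callable=extract_orders)
    load = PythonOperator(task_id="load_star_schema",
                          python_callable=load_star_schema)
    extract >> load  # load only runs after a successful extract
```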
Project Description:
- End-to-end data pipeline and business intelligence solution for global distributor Wide World Importers.
Project Goal:
- To transform siloed operational data into a unified business intelligence platform supporting sales, procurement, and logistics decisions.
Key Achievements:
- Built automated ETL pipeline transforming OLTP data into optimized star schema data mart
- Designed and implemented interactive dashboard for sales, logistics, and customer analytics
- Developed daily automated data updates with Airflow DAG orchestration
- Created comprehensive business intelligence platform with specialized dashboards
- Enabled cross-departmental data-driven decision making
Business Impact:
- Reduced manual reporting time and provided a single source of truth for business performance monitoring across departments.
Frameon extends pandas DataFrame with analysis methods while keeping all original functionality intact.
Stack:
- Core Technologies: Python, Pandas, NumPy
- Statistics & ML: Statsmodels, Scikit-learn, SciPy, Pingouin
- Visualization: Plotly
- NLP & Text: TextBlob
- Documentation: Sphinx
Key Methods:
- Package Development:
- End-to-end Python package creation and distribution workflow
- Software Engineering:
- Object-oriented programming and pandas extension development (see the sketch after this list)
- Testing & Quality:
- Automated testing with GitHub Actions and code quality enforcement
- Documentation:
- Comprehensive documentation generation with Sphinx
- Visualization:
- Automated chart generation and interactive plotting
- Machine Learning:
- Feature analysis and model evaluation techniques
- Text Processing:
- NLP methods for text analysis and sentiment detection
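The extension mechanism itself is pandas' registered-accessor API; below is a minimal sketch of it (the accessor name and method are invented for illustration and are not Frameon's actual interface):

```python
import pandas as pd

# A registered accessor adds a namespace of new methods to every DataFrame
# without touching any native pandas functionality.
@pd.api.extensions.register_dataframe_accessor("explore")  # hypothetical name
class ExploreAccessor:
    def __init__(self, df: pd.DataFrame):
        self._df = df

    def missing_report(self) -> pd.Series:
        """Share of missing values per column, sorted descending."""
        return self._df.isna().mean().sort_values(ascending=False)

df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", None]})
print(df.explore.missing_report())  # new method in its own namespace
print(df.describe())                # native pandas still works unchanged
```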
Project Description:
- A powerful pandas extension that enhances DataFrames with production-ready analytics while maintaining native functionality.
Project Goal:
- To create a comprehensive analytics toolkit that streamlines exploratory analysis and statistical workflows within the pandas ecosystem.
Key Achievements:
- Seamlessly integrates exploratory analysis, statistical testing, and visualization into pandas workflows
- Provides instant insights through automated data profiling and quality checks
- Enables cohort analysis with flexible periodization and metric customization
- Offers built-in statistical methods (bootstrap, effect sizes, group comparisons)
- Generates interactive visualizations with single-command access
- Supports both DataFrame-level and column-specific analysis
- Maintains full backward compatibility with native pandas functionality
Business Impact:
- Accelerates data analysis workflows and standardizes analytical methodologies across teams and projects.
This project is shared under the MIT License.