This project explores the Red Wine Quality dataset using a combination of statistical analysis, machine learning, and data visualization. The objective is to uncover the key factors that influence wine quality and to develop predictive models capable of estimating wine quality based on physicochemical attributes.
The main file in this repository is:
Red_Wine_Quality.ipynb: a Google Colab notebook containing all stages of the analysis, from data loading and preprocessing to modeling and evaluation.
The dataset used is publicly available on Kaggle:
Red Wine Quality Dataset (UCI)
It consists of 1,599 samples of red Vinho Verde wine from Portugal. Each sample includes 11 physicochemical properties (e.g., acidity, residual sugar, pH) and a quality score assigned by wine tasters on a scale of 0 to 10.
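As a rough illustration of the data loading step, the sketch below reads a CSV into a feature table and a target column. The column names follow the published dataset; the helper `load_wine` is hypothetical, and the two inline rows are placeholders standing in for the real file, whose name and separator depend on where it was downloaded from (assumed here to be comma-separated).

```python
import io

import pandas as pd

# Column names as published with the red wine quality file.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def load_wine(source, sep=","):
    """Hypothetical helper: read the CSV and check the expected columns."""
    df = pd.read_csv(source, sep=sep)
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df[FEATURES + [TARGET]]


# Placeholder rows for illustration only; in the notebook, pass the path
# of the downloaded CSV instead of this in-memory buffer.
sample_csv = io.StringIO(
    ",".join(FEATURES + [TARGET]) + "\n"
    "7.4,0.70,0.00,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5\n"
    "7.8,0.88,0.00,2.6,0.098,25,67,0.9968,3.20,0.68,9.8,5\n"
)

wine = load_wine(sample_csv)
X, y = wine[FEATURES], wine[TARGET]
print(X.shape, y.tolist())
```

The original UCI distribution of this file uses semicolons, so `sep=";"` may be needed depending on the source.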
- Platform: Google Colab
- Language: Python 3
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow/Keras
- Supervised Learning:
  - Decision Trees & Random Forests (Ensemble Methods)
  - Support Vector Machines (SVM)
  - Artificial Neural Networks (ANN)
- Unsupervised Learning:
  - K-Means Clustering for pattern discovery and grouping
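The methods listed above could be compared along the following lines. This is a minimal sketch, not the notebook's code: it uses synthetic data from `make_classification` in place of the wine table, scikit-learn's `MLPClassifier` stands in for the Keras network, and the hyperparameters and the choice of three clusters are arbitrary assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine table: 11 features, 3 quality bands.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=5, n_classes=3, random_state=0
)

# SVM and the neural network are scale-sensitive, so they get a scaler.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Unsupervised step: group the standardized samples into 3 clusters.
X_scaled = StandardScaler().fit_transform(X)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
```

Putting the scaler inside each pipeline keeps it fitted on the training folds only, which avoids leaking information from the held-out fold into the score.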
- Identify which chemical properties have the greatest impact on wine quality.
- Compare the effectiveness of various machine learning algorithms.
- Explore the potential of clustering to segment wines based on shared characteristics.
- Provide actionable insights that can inform wine production and quality control.
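One common way to approach the first objective is to rank features by a fitted random forest's impurity-based importances. The sketch below assumes that approach and runs on synthetic data: the column names `prop_0` to `prop_10` are hypothetical placeholders, and the signal is deliberately planted in `prop_3`, so the ranking says nothing about real wine chemistry.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical placeholder names for the 11 physicochemical properties.
names = [f"prop_{i}" for i in range(11)]
X = pd.DataFrame(rng.normal(size=(400, 11)), columns=names)

# Synthetic binary target driven almost entirely by prop_3 (planted signal).
y = (3.0 * X["prop_3"] + rng.normal(scale=0.5, size=400) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranked = pd.Series(forest.feature_importances_, index=names).sort_values(
    ascending=False
)
print(ranked.head(3))
```

Impurity-based importances can be skewed by correlated or high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a reasonable cross-check.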