Project Title: Image Compression using Principal Component Analysis (PCA) on the MNIST Dataset
Objective:
To apply Principal Component Analysis (PCA) on the MNIST handwritten digit dataset to effectively compress image size while preserving crucial information.
Methodology:
-
Data Acquisition: Acquire the MNIST dataset, consisting of 60,000 grayscale images of handwritten digits (0-9). Each image is represented by a 28x28 pixel matrix, resulting in a total of 784 features per image.
-
Data Preprocessing: Standardize the pixel intensities to ensure a mean of zero and a standard deviation of one for each feature. This facilitates PCA in identifying the most significant patterns in the data.
-
Principal Component Analysis (PCA): Implement PCA on the standardized data, projecting it onto a lower-dimensional subspace while maintaining the maximum variance in the original dataset.
-
Image Compression: Reconstruct the compressed images from the reduced set of principal components. Evaluate the trade-off between compression rate and image quality.
Technologies Utilized:
-
Python: Programming language for data manipulation and algorithm implementation.
-
MongoDB: NoSQL database for storing and retrieving the MNIST dataset.
-
Matplotlib: Python library for data visualization and generating plots.
-
NumPy: Numerical computing library for efficient array operations.
-
Pandas: Data analysis and manipulation library for processing and handling tabular data.
-
scikit-learn: Machine learning library, specifically utilizing the PCA module for dimensionality reduction.
-
Keras: Deep learning library for neural network modeling.