This project implements a two-layer neural network from scratch (using only NumPy) for classifying handwritten digits from the MNIST dataset. The workflow includes data loading, preprocessing, model implementation, training, evaluation, and visualization of results.
## Contents

- Overview
- Dataset
- Model Architecture
- Training Details
- Results
- How to Run
- Limitations and Improvements
## Overview

This repository demonstrates a simple feedforward neural network (multi-layer perceptron) built from scratch to recognize digits (0-9) from grayscale 28x28 pixel images. The implementation covers the entire pipeline, including manual forward and backward propagation, weight updates, and performance evaluation, without relying on high-level machine learning libraries.
## Dataset

- Source: [Kaggle Digit Recognizer](https://www.kaggle.com/c/digit-recognizer)
- Format: CSV, with each row holding a label and a flattened 28x28 image (784 pixel values).
- Preprocessing: Pixel values are normalized to [0, 1], and the data is split into training and validation sets (see the loading sketch below).
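A minimal loading sketch, assuming `train.csv` uses the Kaggle layout (label in the first column, 784 pixel columns after it); the variable names are illustrative, and the 1000-sample split mirrors the one described under Training Details:

```python
import numpy as np

# Load the Kaggle CSV: first column is the label, the remaining 784 are pixels.
data = np.loadtxt("train.csv", delimiter=",", skiprows=1)
np.random.shuffle(data)                      # shuffle before splitting

labels = data[:, 0].astype(int)
pixels = data[:, 1:] / 255.0                 # normalize pixel values to [0, 1]

# Hold out the first 1000 samples for validation, train on the rest.
X_val,   y_val   = pixels[:1000], labels[:1000]
X_train, y_train = pixels[1000:], labels[1000:]
```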
## Model Architecture

| Layer          | Details                     |
|----------------|-----------------------------|
| Input          | 784 neurons (28x28 pixels)  |
| Hidden Layer 1 | 64 neurons, ReLU activation |
| Hidden Layer 2 | 64 neurons, ReLU activation |
| Output         | 10 neurons, Softmax         |
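A sketch of how this architecture might be initialized and run forward in NumPy; the layer sizes come from the table above, while the He-style initialization and variable names are illustrative choices, not necessarily the notebook's:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 64, 64, 10]                    # input, two hidden layers, output

# He-style initialization, a common choice for ReLU layers.
W = [rng.normal(0, np.sqrt(2 / m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    """Forward pass: 784 -> 64 (ReLU) -> 64 (ReLU) -> 10 (softmax)."""
    h1 = relu(X @ W[0] + b[0])
    h2 = relu(h1 @ W[1] + b[1])
    return softmax(h2 @ W[2] + b[2])
```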
## Training Details

- Loss Function: Categorical Cross-Entropy
- Optimizer: Mini-batch Gradient Descent (see the training-loop sketch after this list)
- Epochs: 1000
- Validation Split: First 1000 samples held out for validation
- Learning Rate and Batch Size: Set in the notebook
- Shuffling: Data is shuffled before training
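A sketch of how these settings combine into a training loop, building on the loading and forward-pass sketches above (it reuses `W`, `b`, `relu`, `softmax`, `X_train`, and `y_train`); the learning rate and batch size shown are placeholders, since the notebook sets its own:

```python
lr, batch_size, epochs = 0.1, 64, 1000       # placeholder hyperparameters

def one_hot(y, classes=10):
    out = np.zeros((y.size, classes))
    out[np.arange(y.size), y] = 1.0
    return out

for epoch in range(epochs):
    order = np.random.permutation(len(X_train))   # reshuffle each epoch
    for i in range(0, len(order), batch_size):
        idx = order[i:i + batch_size]
        X, Y = X_train[idx], one_hot(y_train[idx])

        # Forward pass, keeping hidden activations for backprop.
        h1 = relu(X @ W[0] + b[0])
        h2 = relu(h1 @ W[1] + b[1])
        p = softmax(h2 @ W[2] + b[2])

        # Backward pass: softmax + cross-entropy yields (p - Y) directly.
        d3 = (p - Y) / len(idx)
        d2 = (d3 @ W[2].T) * (h2 > 0)        # ReLU derivative as a mask
        d1 = (d2 @ W[1].T) * (h1 > 0)

        # Gradient-descent weight updates.
        for Wk, bk, dk, a in zip(W, b, [d1, d2, d3], [X, h1, h2]):
            Wk -= lr * (a.T @ dk)
            bk -= lr * dk.sum(axis=0)
```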
## Results

- Initial Validation Accuracy: 79.9%
- Final Validation Accuracy: 97.7% (see the evaluation sketch below)
- Training Loss: Decreased smoothly to near zero
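How a validation accuracy like the one reported could be computed, reusing `forward`, `X_val`, and `y_val` from the earlier sketches:

```python
def accuracy(X, y):
    # Predicted class is the argmax of the softmax output.
    preds = forward(X).argmax(axis=1)
    return (preds == y).mean()

print(f"Validation accuracy: {accuracy(X_val, y_val):.1%}")
```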
## How to Run

1. Clone the repository and open the notebook:

   ```bash
   git clone [email protected]:siddddd17/mnist-nn-from-scratch.git
   cd mnist-nn-from-scratch
   jupyter notebook 2-layer-nueral-network-from-scratch.ipynb
   ```

2. Download the dataset from [Kaggle Digit Recognizer](https://www.kaggle.com/c/digit-recognizer) and place `train.csv` in the notebook directory.

3. Run all cells in the notebook.
## Limitations and Improvements

**Current Model:**

- Simple architecture, suitable for educational purposes.
- Achieves strong accuracy for a basic neural network.

**Potential Improvements:**

- Add convolutional layers (CNN) for better spatial feature extraction.
- Implement regularization such as dropout or batch normalization (a dropout sketch follows this list).
- Experiment with optimizers such as Adam or RMSprop.
- Apply data augmentation for robustness.
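As one example of the regularization idea above, a hypothetical inverted-dropout layer in NumPy that could be applied to the hidden activations during training; the 0.5 rate is an arbitrary illustration, not a value from the notebook:

```python
import numpy as np

def dropout(h, rate=0.5, training=True):
    """Inverted dropout: zero activations at random during training,
    rescaling the survivors so the expected activation is unchanged
    and no adjustment is needed at inference time."""
    if not training or rate == 0.0:
        return h
    mask = (np.random.rand(*h.shape) >= rate) / (1.0 - rate)
    return h * mask
```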