A hands-on, problem-based learning curriculum for implementing GPT-2 from scratch using PyTorch. Learn transformer architecture through 12 progressive problems that build on each other.
By completing this learning path, you will:
- ✅ Understand transformer architecture deeply
- ✅ Implement GPT-2 from scratch in PyTorch
- ✅ Master attention mechanisms (self-attention, multi-head, causal masking)
- ✅ Learn layer normalization and residual connections
- ✅ Load and use pretrained GPT-2 weights from HuggingFace (a short example follows this list)
- ✅ Gain practical PyTorch implementation experience
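As a preview of that last objective, here is roughly how the pretrained weights you will eventually map onto your own modules can be fetched and inspected (a minimal sketch using the HuggingFace `transformers` library; the loading code you write in problem 12 is more involved):

```python
from transformers import GPT2LMHeadModel

# Download the smallest (124M-parameter) GPT-2 checkpoint from the HuggingFace Hub
hf_model = GPT2LMHeadModel.from_pretrained("gpt2")

# Inspect a few parameter names and shapes you will later map onto your own modules
for name, param in list(hf_model.state_dict().items())[:5]:
    print(name, tuple(param.shape))
```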
The curriculum consists of 12 progressive problems:
| # | Problem | Difficulty | Status |
|---|---|---|---|
| 1 | Token & Position Embeddings | ⭐ Easy | ✅ Complete |
| 2 | Attention Basics | ⭐⭐ Medium | ✅ Complete |
| 3 | Scaled Dot-Product Attention | ⭐⭐ Medium | ✅ Complete |
| 4 | Multi-Head Attention | ⭐⭐⭐ Hard | ✅ Complete |
| 5 | Causal Masking | ⭐⭐ Medium | ✅ Complete |
| 6 | Feedforward Network | ⭐ Easy | ✅ Complete |
| 7 | Layer Normalization & Residuals | ⭐⭐ Medium | ✅ Complete |
| 8 | Complete Transformer Block | ⭐⭐⭐ Hard | ✅ Complete |
| 9 | GPT-2 Configuration | ⭐ Easy | ✅ Complete |
| 10 | Full GPT-2 Model Assembly | ⭐⭐⭐ Hard | ✅ Complete |
| 11 | Weight Initialization | ⭐⭐ Medium | ✅ Complete |
| 12 | Loading Pretrained Weights | ⭐⭐⭐⭐ Very Hard | ✅ Complete |
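Problems 2-5 build up the attention mechanism piece by piece. As a rough preview (a minimal sketch, not the interface the problems ask you to implement), scaled dot-product attention with a causal mask boils down to:

```python
import math
import torch

def causal_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_head)
    seq_len, d_head = q.size(-2), q.size(-1)
    # Similarity scores, scaled to keep the softmax well-behaved (problem 3)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)   # (batch, seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i (problem 5)
    mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                      # (batch, seq_len, d_head)
```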
- Python 3.11 or higher
- uv package manager
- Install uv if you haven't already:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd learn-gpt2
uv sync --all-extras
```

This will install all dependencies, including PyTorch, transformers, matplotlib, seaborn, jupyter, and pytest.
Each problem directory contains:
```
problems/XX-problem-name/
├── README.md         # Learning objectives, concepts, and hints
├── problem.py        # Skeleton code with TODOs for you to implement
├── solution.py       # Complete working solution (don't peek!)
├── test_*.py         # Unit tests to validate your implementation
└── notebook.ipynb    # Interactive notebook for exploration
```
For each problem, follow this workflow:
```bash
# Navigate to the problem directory
cd problems/01-embeddings

# Read the README to understand concepts
cat README.md
```

The README contains:
- Learning objectives
- Background concepts
- Your task description
- Implementation hints
- Common pitfalls
- Resources and references
Edit `problem.py` and implement the TODOs; a short sketch of the typical structure follows the tips below.

Tips:
- Read all the hints in the docstrings
- Start with the `__init__` method, then `forward`
- Print shapes to debug: `print(f"Shape: {tensor.shape}")`
- Don't worry about getting it perfect - iterate!
Run the tests to validate your solution:
```bash
# Run tests for the current problem
uv run pytest test_*.py -v

# Run with more details
uv run pytest test_*.py -v -s
```

Tests will show you:
- ✅ What's working correctly
- ❌ What needs to be fixed
- Helpful error messages
Use the Jupyter notebook to visualize and experiment:
```bash
# Start Jupyter (from project root)
uv run jupyter lab

# Or start from the problem directory
cd problems/01-embeddings
uv run jupyter notebook notebook.ipynb
```

Important: When the notebook opens, make sure to select the "Python (learn-gpt2)" kernel from the kernel selector in the top-right corner.
The notebooks include:
- Interactive visualizations
- Step-by-step execution
- Experimentation with different parameters
- Visual understanding of concepts
If you get stuck or want to compare your implementation:
```bash
# View the solution (try not to peek too early!)
cat solution.py

# Or run tests against the solution to see expected behavior
uv run pytest test_*.py -v
```

Once all tests pass, move to the next problem:

```bash
cd ../02-attention-basics
```

Each problem builds on the previous ones, so it's important to complete them in order.
```bash
# Test a specific problem
cd problems/01-embeddings
uv run pytest test_embeddings.py -v

# Test with coverage
uv run pytest test_embeddings.py -v --cov=. --cov-report=term-missing
```

From the project root:

```bash
# Run all problem tests
uv run pytest problems/ -v

# Run tests in parallel (faster)
uv run pytest problems/ -v -n auto

# Test the complete reference implementation
uv run pytest tests/ -v
```

All problems include interactive Jupyter notebooks for exploration and visualization.
```bash
# Start JupyterLab (recommended)
uv run jupyter lab

# Or start Jupyter Notebook
uv run jupyter notebook
```

- Navigate to the problem directory (e.g., `problems/01-embeddings/`)
- Open `notebook.ipynb`
- Select the "Python (learn-gpt2)" kernel from the top-right corner
- Run cells with `Shift+Enter`
The notebooks include:
- 📊 Visualizations of attention patterns, embeddings, and activations
- 🔬 Interactive experiments with different parameters
- 📈 Plots and heatmaps to understand what's happening (see the sketch after this list)
- ✅ Verification cells to test your implementation
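As an illustration, a plotting cell for an attention heatmap can be as simple as the following (a minimal sketch using random data; the notebooks' actual cells differ):

```python
import matplotlib.pyplot as plt
import torch

# Stand-in for one head's attention weights of shape (seq_len, seq_len)
weights = torch.softmax(torch.randn(8, 8), dim=-1)

plt.imshow(weights.numpy(), cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("key position")
plt.ylabel("query position")
plt.title("Attention pattern (random example)")
plt.show()
```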
Format code with ruff:

```bash
uv run ruff format .
```

Check code quality:

```bash
uv run ruff check .
```

```
learn-gpt2/
├── problems/              # Progressive learning problems
│   ├── 01-embeddings/
│   ├── 02-attention-basics/
│   ├── ...
│   └── 12-pretrained-loading/
├── pyproject.toml         # Project dependencies
└── README.md              # This file
```
- Don't skip problems - Each builds on previous concepts
- Read the hints - Docstrings contain valuable implementation tips
- Start simple - Get basic functionality working, then refine
- Use the notebooks - Visual feedback helps understanding
- Compare with tests - Tests show expected behavior
- Try before peeking - Attempt implementation before checking solution
- Experiment - Modify parameters, try different approaches
- Read the papers - Links provided in each README
- Optimize - Think about efficiency and best practices
- Implement variations - Try different architectures (e.g., post-norm instead of pre-norm; see the sketch after this list)
- Profile performance - Use PyTorch profiler to optimize
- Add features - Implement additional functionality
- Contribute - Submit improvements or additional problems
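For instance, the pre-norm ordering GPT-2 uses and the post-norm ordering from the original Transformer differ only in where layer normalization sits relative to the residual connection (a schematic sketch; `attn`, `mlp`, `ln1`, and `ln2` stand in for the block's sub-modules):

```python
# Pre-norm (GPT-2): normalize the input of each sub-layer, then add the residual
def pre_norm_block(x, attn, mlp, ln1, ln2):
    x = x + attn(ln1(x))
    x = x + mlp(ln2(x))
    return x

# Post-norm (original Transformer): add the residual first, then normalize
def post_norm_block(x, attn, mlp, ln1, ln2):
    x = ln1(x + attn(x))
    x = ln2(x + mlp(x))
    return x
```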
If you see `ModuleNotFoundError` in Jupyter notebooks:
- Make sure you've run `uv sync --all-extras`
- Select the correct kernel: "Python (learn-gpt2)" in the top-right corner of the notebook
- Restart the kernel: Kernel → Restart Kernel
If tests are failing:
- Make sure you've implemented all TODOs in `problem.py`
- Check that tensor shapes match expected dimensions
- Print intermediate shapes: `print(f"x.shape: {x.shape}")` (or assert them; see the sketch below)
- Read the test error messages carefully - they often point to the issue
If `uv` is not found, install it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then restart your terminal.
- Attention Is All You Need - Original Transformer paper
- Language Models are Unsupervised Multitask Learners - GPT-2 paper
- The Illustrated Transformer - Visual guide
- The Annotated Transformer - Line-by-line implementation
- The Annotated GPT-2 - GPT-2 walkthrough
- Let's reproduce GPT-2 - Andrej Karpathy's video tutorial
Contributions are welcome! Whether it's:
- Fixing typos or improving explanations
- Adding more tests or examples
- Creating new problems or variations
- Improving visualizations in notebooks
Please open an issue or submit a pull request.
This project is a learning resource and is not intended for production use.
- Andrej Karpathy for his inspiring YouTube channel and "Neural Networks: Zero to Hero" course
- OpenAI for GPT-2
- HuggingFace for the transformers library
- The PyTorch team
- All the excellent transformer tutorials and papers that made this possible
Happy learning! 🚀 If you find this helpful, please give it a ⭐!