A hands-on, problem-based learning curriculum for implementing GPT-2 from scratch using PyTorch. Learn transformer architecture through 12 progressive problems that build on each other.
By completing this learning path, you will:
- ✅ Understand transformer architecture deeply
- ✅ Implement GPT-2 from scratch in PyTorch
- ✅ Master attention mechanisms (self-attention, multi-head, causal masking)
- ✅ Learn layer normalization and residual connections
- ✅ Load and use pretrained GPT-2 weights from HuggingFace (a short example follows this list)
- ✅ Gain practical PyTorch implementation experience
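As a preview of that last objective, here is roughly how the pretrained weights you will eventually map onto your own modules can be fetched and inspected (a minimal sketch using the HuggingFace `transformers` library; the loading code you write in problem 12 is more involved):

```python
from transformers import GPT2LMHeadModel

# Download the smallest (124M-parameter) GPT-2 checkpoint from the HuggingFace Hub
hf_model = GPT2LMHeadModel.from_pretrained("gpt2")

# Inspect a few parameter names and shapes you will later map onto your own modules
for name, param in list(hf_model.state_dict().items())[:5]:
    print(name, tuple(param.shape))
```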
The curriculum consists of 12 progressive problems:
| # | Problem | Difficulty | Status |
|---|---|---|---|
| 1 | Token & Position Embeddings | ⭐ Easy | ✅ Complete |
| 2 | Attention Basics | ⭐⭐ Medium | ✅ Complete |
| 3 | Scaled Dot-Product Attention | ⭐⭐ Medium | ✅ Complete |
| 4 | Multi-Head Attention | ⭐⭐⭐ Hard | ✅ Complete |
| 5 | Causal Masking | ⭐⭐ Medium | ✅ Complete |
| 6 | Feedforward Network | ⭐ Easy | ✅ Complete |
| 7 | Layer Normalization & Residuals | ⭐⭐ Medium | ✅ Complete |
| 8 | Complete Transformer Block | ⭐⭐⭐ Hard | ✅ Complete |
| 9 | GPT-2 Configuration | ⭐ Easy | ✅ Complete |
| 10 | Full GPT-2 Model Assembly | ⭐⭐⭐ Hard | ✅ Complete |
| 11 | Weight Initialization | ⭐⭐ Medium | ✅ Complete |
| 12 | Loading Pretrained Weights | ⭐⭐⭐⭐ Very Hard | ✅ Complete |
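Problems 2-5 build up the attention mechanism piece by piece. As a rough preview (a minimal sketch, not the interface the problems ask you to implement), scaled dot-product attention with a causal mask boils down to:

```python
import math
import torch

def causal_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_head)
    seq_len, d_head = q.size(-2), q.size(-1)
    # Similarity scores, scaled to keep the softmax well-behaved (problem 3)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)   # (batch, seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i (problem 5)
    mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                      # (batch, seq_len, d_head)
```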
- Python 3.11 or higher
- uv package manager
- Install uv if you haven't already:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd learn-gpt2
uv sync --all-extras
```

This will install all dependencies, including PyTorch, transformers, matplotlib, seaborn, jupyter, and pytest.
Each problem directory contains:
```
problems/XX-problem-name/
├── README.md         # Learning objectives, concepts, and hints
├── problem.py        # Skeleton code with TODOs for you to implement
├── solution.py       # Complete working solution (don't peek!)
├── test_*.py         # Unit tests to validate your implementation
└── notebook.ipynb    # Interactive notebook for exploration
```
For each problem, follow this workflow:
```bash
# Navigate to the problem directory
cd problems/01-embeddings

# Read the README to understand concepts
cat README.md
```

The README contains:
- Learning objectives
- Background concepts
- Your task description
- Implementation hints
- Common pitfalls
- Resources and references
Edit `problem.py` and implement the TODOs; a short sketch of the typical structure follows the tips below.

Tips:
- Read all the hints in the docstrings
- Start with the `__init__` method, then `forward`
- Print shapes to debug: `print(f"Shape: {tensor.shape}")`
- Don't worry about getting it perfect - iterate!
Run the tests to validate your solution:
```bash
# Run tests for the current problem
uv run pytest test_*.py -v

# Run with more details
uv run pytest test_*.py -v -s
```

Tests will show you:
- ✅ What's working correctly
- ❌ What needs to be fixed
- Helpful error messages
Use the Jupyter notebook to visualize and experiment:
```bash
# Start Jupyter (from project root)
uv run jupyter lab

# Or start from the problem directory
cd problems/01-embeddings
uv run jupyter notebook notebook.ipynb
```

Important: When the notebook opens, make sure to select the "Python (learn-gpt2)" kernel from the kernel selector in the top-right corner.
The notebooks include:
- Interactive visualizations
- Step-by-step execution
- Experimentation with different parameters
- Visual understanding of concepts
If you get stuck or want to compare your implementation:
```bash
# View the solution (try not to peek too early!)
cat solution.py

# Or run tests against the solution to see expected behavior
uv run pytest test_*.py -v
```

Once all tests pass, move to the next problem:

```bash
cd ../02-attention-basics
```

Each problem builds on the previous ones, so it's important to complete them in order.
```bash
# Test a specific problem
cd problems/01-embeddings
uv run pytest test_embeddings.py -v

# Test with coverage
uv run pytest test_embeddings.py -v --cov=. --cov-report=term-missing
```

From the project root:

```bash
# Run all problem tests
uv run pytest problems/ -v

# Run tests in parallel (faster)
uv run pytest problems/ -v -n auto

# Test the complete reference implementation
uv run pytest tests/ -v
```

All problems include interactive Jupyter notebooks for exploration and visualization.
```bash
# Start JupyterLab (recommended)
uv run jupyter lab

# Or start Jupyter Notebook
uv run jupyter notebook
```

- Navigate to the problem directory (e.g., `problems/01-embeddings/`)
- Open `notebook.ipynb`
- Select the "Python (learn-gpt2)" kernel from the top-right corner
- Run cells with `Shift+Enter`
The notebooks include:
- 📊 Visualizations of attention patterns, embeddings, and activations
- 🔬 Interactive experiments with different parameters
- 📈 Plots and heatmaps to understand what's happening (see the sketch after this list)
- ✅ Verification cells to test your implementation
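As an illustration, a plotting cell for an attention heatmap can be as simple as the following (a minimal sketch using random data; the notebooks' actual cells differ):

```python
import matplotlib.pyplot as plt
import torch

# Stand-in for one head's attention weights of shape (seq_len, seq_len)
weights = torch.softmax(torch.randn(8, 8), dim=-1)

plt.imshow(weights.numpy(), cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("key position")
plt.ylabel("query position")
plt.title("Attention pattern (random example)")
plt.show()
```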
Format code with ruff:

```bash
uv run ruff format .
```

Check code quality:

```bash
uv run ruff check .
```

```
learn-gpt2/
├── problems/              # Progressive learning problems
│   ├── 01-embeddings/
│   ├── 02-attention-basics/
│   ├── ...
│   └── 12-pretrained-loading/
├── pyproject.toml         # Project dependencies
└── README.md              # This file
```
- Don't skip problems - Each builds on previous concepts
- Read the hints - Docstrings contain valuable implementation tips
- Start simple - Get basic functionality working, then refine
- Use the notebooks - Visual feedback helps understanding
- Compare with tests - Tests show expected behavior
- Try before peeking - Attempt implementation before checking solution
- Experiment - Modify parameters, try different approaches
- Read the papers - Links provided in each README
- Optimize - Think about efficiency and best practices
- Implement variations - Try different architectures (e.g., post-norm instead of pre-norm; see the sketch after this list)
- Profile performance - Use PyTorch profiler to optimize
- Add features - Implement additional functionality
- Contribute - Submit improvements or additional problems
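For instance, the pre-norm ordering GPT-2 uses and the post-norm ordering from the original Transformer differ only in where layer normalization sits relative to the residual connection (a schematic sketch; `attn`, `mlp`, `ln1`, and `ln2` stand in for the block's sub-modules):

```python
# Pre-norm (GPT-2): normalize the input of each sub-layer, then add the residual
def pre_norm_block(x, attn, mlp, ln1, ln2):
    x = x + attn(ln1(x))
    x = x + mlp(ln2(x))
    return x

# Post-norm (original Transformer): add the residual first, then normalize
def post_norm_block(x, attn, mlp, ln1, ln2):
    x = ln1(x + attn(x))
    x = ln2(x + mlp(x))
    return x
```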
If you see `ModuleNotFoundError` in Jupyter notebooks:
- Make sure you've run `uv sync --all-extras`
- Select the correct kernel: "Python (learn-gpt2)" in the top-right corner of the notebook
- Restart the kernel: Kernel → Restart Kernel
If tests are failing:
- Make sure you've implemented all TODOs in `problem.py`
- Check that tensor shapes match expected dimensions
- Print intermediate shapes: `print(f"x.shape: {x.shape}")` (or assert them; see the sketch below)
- Read the test error messages carefully - they often point to the issue
If `uv` is not found, install it:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then restart your terminal.
- Attention Is All You Need - Original Transformer paper
- Language Models are Unsupervised Multitask Learners - GPT-2 paper
- The Illustrated Transformer - Visual guide
- The Annotated Transformer - Line-by-line implementation
- The Annotated GPT-2 - GPT-2 walkthrough
- Let's reproduce GPT-2 - Andrej Karpathy's video tutorial
Contributions are welcome! Whether it's:
- Fixing typos or improving explanations
- Adding more tests or examples
- Creating new problems or variations
- Improving visualizations in notebooks
Please open an issue or submit a pull request.
This project is a learning resource and is not intended for production use.
- Andrej Karpathy for his inspiring YouTube channel and "Neural Networks: Zero to Hero" course
- OpenAI for GPT-2
- HuggingFace for the transformers library
- The PyTorch team
- All the excellent transformer tutorials and papers that made this possible
Happy learning! 🚀 If you find this helpful, please give it a ⭐!