7 changes: 7 additions & 0 deletions 2026-04-ai-engineer/.claude/settings.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{
  "permissions": {
    "allow": [
      "Bash(uv run:*)"
    ]
  }
}
3 changes: 3 additions & 0 deletions 2026-04-ai-engineer/.gitignore
@@ -0,0 +1,3 @@
mps/
mps.tar.gz
download.py
1 change: 1 addition & 0 deletions 2026-04-ai-engineer/.python-version
@@ -0,0 +1 @@
3.14
38 changes: 38 additions & 0 deletions 2026-04-ai-engineer/Makefile
@@ -0,0 +1,38 @@
.DEFAULT_GOAL := main

.PHONY: .uv
.uv: ## Check that uv is installed
	@uv --version || echo 'Please install uv: https://docs.astral.sh/uv/getting-started/installation/'

.PHONY: install
install: .uv ## Install python dependencies
	# Sync the environment with the project's dependencies
	uv sync

.PHONY: format
format: ## Format Python code
	uv run ruff format
	uv run ruff check --fix --fix-only

.PHONY: lint
lint: ## Lint Python code with ruff
	uv run ruff format --check
	uv run ruff check
	uv run basedpyright

.PHONY: main
main: format lint ## run formatting and linting

# (must stay last!)
.PHONY: help
help: ## Show this help (usage: make help)
	@echo "Usage: make [recipe]"
	@echo "Recipes:"
	@awk '/^[a-zA-Z0-9_-]+:.*?##/ { \
	    helpMessage = match($$0, /## (.*)/); \
	    if (helpMessage) { \
	        recipe = $$1; \
	        sub(/:/, "", recipe); \
	        printf "  \033[36mmake %-20s\033[0m %s\n", recipe, substr($$0, RSTART + 3, RLENGTH); \
	    } \
	}' $(MAKEFILE_LIST)
53 changes: 53 additions & 0 deletions 2026-04-ai-engineer/README.md
@@ -0,0 +1,53 @@
# Prompt Optimization with GEPA and Cached Wikipedia Pages

This example demonstrates automated prompt optimization with GEPA (Genetic-Pareto), built on
pydantic-ai and pydantic-evals. The task data is a local cache of Wikipedia pages for all 650 UK MPs.

## Overview

The example shows how to:

- Generate a golden evaluation dataset from cached MP pages with a strong model
- Load evaluation cases from JSON instead of hand-authoring them in code
- Evaluate either all political relatives or just ancestor/parent-generation relatives
- Build a GEPA adapter that integrates pydantic-evals with GEPA
- Resolve agent instructions from a Logfire managed variable
- Use a local Logfire variable provider so GEPA can optimize the same runtime-configurable value
- Run automated prompt optimization that improves based on evaluation feedback
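
As a rough illustration of the "all relatives vs. ancestors only" choice above, the filtering step might look like the sketch below. The `Relative` class, the `ANCESTOR_RELATIONS` set, and the relation labels are hypothetical stand-ins for this illustration; the actual schema and filter live in `task.py` and may differ.

```python
from dataclasses import dataclass

# Hypothetical relation labels; the real schema in task.py may use others.
ANCESTOR_RELATIONS = {"parent", "grandparent", "uncle", "aunt"}


@dataclass(frozen=True)
class Relative:
    name: str
    relation: str  # e.g. "parent", "sibling", "spouse"


def filter_relatives(relatives: list[Relative], focus: str) -> list[Relative]:
    """Return every relative, or only ancestor/parent-generation ones."""
    if focus == "all":
        return list(relatives)
    if focus == "ancestors":
        return [r for r in relatives if r.relation in ANCESTOR_RELATIONS]
    raise ValueError(f"unknown focus: {focus!r}")


# Placeholder golden data for one MP (names are not real people).
golden = [
    Relative("A. Example", "parent"),
    Relative("B. Example", "sibling"),
    Relative("C. Example", "grandparent"),
]
ancestors_only = filter_relatives(golden, "ancestors")
```

Filtering at evaluation time, rather than at generation time, is what lets one golden file serve both focus modes.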

## Running the Example

```bash
# Sync dependencies
uv sync

# Generate a golden dataset for the first 100 MPs
uv run -m main generate-cases --limit 100 --model openai:gpt-5

# Evaluate ancestor-only extraction on the test split
uv run -m main eval --split test --focus ancestors --prompt-style initial

# Compare initial vs expert prompts on the same task/model
uv run -m main compare --split test --focus ancestors

# Run optimization using train/val splits from the generated file
uv run -m main optimize --train-split train --val-split val --focus ancestors --max-calls 50
```

## Files

- `task.py` - Extraction schema, page preprocessing, relation filtering, and agent definition
- `cases.py` - Golden dataset generation and JSON persistence
- `evals.py` - Dataset loading and evaluation metrics
- `adapter.py` - GEPA adapter that bridges pydantic-evals with GEPA
- `main.py` - CLI entry point for case generation, evaluation, prompt comparison, and optimization

## Notes

- The MP pages are read from the local `mps/` archive, not from live Wikipedia requests.
- `generate-cases` is resumable. Re-run it with a higher `--limit`, `--all`, or a different `--output`.
- The generated file stores the full set of political relatives. Evaluation can then filter to
`--focus ancestors` without regenerating the golden data.
- The CLI configures Logfire with a local managed-variable provider. The agent reads its
instructions from `relations_instructions`, so the same code path can use a remote managed
variable in a deployed server and a local provider during GEPA optimization.