Cold-start promise ingestion for Wordeeds: a Python service that pulls candidate political promises from official sources (whitehouse.gov briefings and remarks today; press galleries, the Congressional Record, and others to follow), runs LLM-based extraction, and sends the surviving claims to the Spring app's `POST /api/claim/ingest` endpoint.
Multiple sources can run in a single pass (`SOURCES=whitehouse,foo,bar`). Each source is isolated, so one failing fetch will not stop the others.
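The isolation described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual runner: `parse_sources`, `run_pass`, and the registry shape (a dict of zero-argument callables) are hypothetical names, and it uses the standard `logging` module rather than `structlog` to stay dependency-free.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("wordeeds_ingest.sketch")

# Assumption: a source is any zero-argument callable returning candidate texts.
SourceFn = Callable[[], Iterable[str]]


def parse_sources(raw: str) -> list[str]:
    """Split a SOURCES value such as "whitehouse,foo,bar" into clean names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def run_pass(
    names: list[str], registry: dict[str, SourceFn]
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Run each named source on its own; return (results, failures)."""
    results: dict[str, list[str]] = {}
    failures: dict[str, str] = {}
    for name in names:
        try:
            # An unknown name raises KeyError here and is recorded like any
            # other failure instead of aborting the pass.
            results[name] = list(registry[name]())
        except Exception as exc:
            logger.warning("source %s failed: %s", name, exc)
            failures[name] = repr(exc)
    return results, failures
```

Catching the exception per source, rather than around the whole loop, is what keeps a broken fetch in one source from affecting the rest of the pass.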
All ingested claims are attributed to the seeded `system@wordeeds.internal` user on the Spring side.
Scaffolded end-to-end with one real source (`whitehouse`). Full design lives in the sibling repo at `wordeeds-api/docs/cold-start-ingestion.md`.
| name | what it pulls | env knobs |
|---|---|---|
| `whitehouse` | whitehouse.gov press briefings, speeches & remarks, statements | `WHITEHOUSE_BASE_URL`, `WHITEHOUSE_SECTIONS`, `WHITEHOUSE_MAX_PER_SECTION` |
| `hansard` | UK Parliament (JSON API) + Canada (HTML). AU opt-in. | `HANSARD_JURISDICTIONS`, `HANSARD_MAX_PER_INDEX`, `HANSARD_UK_SEARCH_TERM`, `HANSARD_<UK\|CA\|AU>_BASE_URL`, `HANSARD_<UK\|CA\|AU>_INDEX_PATHS`, `HANSARD_<UK\|CA\|AU>_ARTICLE_PREFIXES` |
Add new sources by dropping a module in `src/wordeeds_ingest/sources/` and registering it in `sources/__init__.py`.
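One way such a registration step might look is sketched below. The actual contents of `sources/__init__.py` are not shown here, so the `REGISTRY` dict, the `register` decorator, and the `press_gallery` source are all hypothetical and only illustrate the pattern.

```python
from typing import Callable, Iterable

# Hypothetical registry; the real sources/__init__.py may be organised differently.
REGISTRY: dict[str, Callable[[], Iterable[str]]] = {}


def register(name: str):
    """Return a decorator that records a source's fetch function under `name`."""
    def decorator(fn: Callable[[], Iterable[str]]):
        if name in REGISTRY:
            raise ValueError(f"source {name!r} is already registered")
        REGISTRY[name] = fn
        return fn
    return decorator


# What a new module (say, a hypothetical sources/press_gallery.py) might contain:
@register("press_gallery")
def fetch_press_gallery() -> Iterable[str]:
    # A real source would fetch and clean documents here; this returns a stub.
    return ["Example statement text"]
```

With a registry like this, the name used in `SOURCES` would map directly to the key passed to `register`.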
- Target: `POST {SPRING_BASE_URL}/api/claim/ingest`
- Auth: static API key in the `X-Ingest-Key` header (env: `INGEST_API_KEY`, shared with the Spring app)
- Body: `{ politicianId, claim, context, claimedAt, deadline?, videoUrl?, verifiedNewsUrl? }`; see `wordeeds-api/src/main/java/com/wordeeds/api/controller/IngestController.java` for the canonical DTO.
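The contract above could be exercised with something like the following sketch. The function names are hypothetical, the field types (for example ISO-8601 strings for `claimedAt` and `deadline`) are assumptions, and omitting unset optional fields rather than sending `null` is a guess; `IngestController.java` remains the authority on the DTO.

```python
def build_claim_body(politician_id, claim, context, claimed_at,
                     deadline=None, video_url=None, verified_news_url=None) -> dict:
    """Assemble the JSON body, leaving out optional fields that are unset."""
    body = {
        "politicianId": politician_id,
        "claim": claim,
        "context": context,
        "claimedAt": claimed_at,  # assumed to be an ISO-8601 string
    }
    optional = {
        "deadline": deadline,
        "videoUrl": video_url,
        "verifiedNewsUrl": verified_news_url,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def build_request(base_url: str, api_key: str) -> tuple[str, dict]:
    """Return the target URL and auth headers for the ingest endpoint."""
    url = f"{base_url.rstrip('/')}/api/claim/ingest"
    headers = {"X-Ingest-Key": api_key}
    return url, headers


def post_claim(base_url: str, api_key: str, body: dict, timeout: float = 10.0) -> int:
    """Send one claim and return the HTTP status code; raises on 4xx/5xx."""
    import httpx  # third-party; imported lazily so the builders work without it

    url, headers = build_request(base_url, api_key)
    response = httpx.post(url, headers=headers, json=body, timeout=timeout)
    response.raise_for_status()
    return response.status_code
```

In use, `base_url` and `api_key` would come from `SPRING_BASE_URL` and `INGEST_API_KEY` respectively.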
- Python 3.12+, `uv` for env/deps
- `httpx`, `pydantic` + `pydantic-settings`, `feedparser`, `trafilatura`, `anthropic`, `structlog`
- `pytest` + `respx` for tests