mehdone/wordeeds-ingest

wordeeds-ingest

Cold-start promise ingestion for Wordeeds: a Python service that pulls candidate political promises from official sources (currently whitehouse.gov briefings and remarks; press galleries, the Congressional Record, and others to follow), runs LLM-based extraction, and sends the claims that survive extraction to the Spring app's POST /api/claim/ingest endpoint.

Multiple sources can run in a single pass (SOURCES=whitehouse,foo,bar). Each source is isolated, so one failing fetch will not stop the others.
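The isolation described above could look roughly like the following. This is a minimal sketch, not the service's actual runner: `parse_sources`, `run_sources`, and the "fetcher is a zero-argument callable" shape are all hypothetical names and assumptions made for illustration.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("wordeeds_ingest")

# Assumption: a fetcher is any zero-argument callable returning candidate texts.
Fetcher = Callable[[], Iterable[str]]


def parse_sources(raw: str) -> list[str]:
    """Split a SOURCES value such as "whitehouse,foo,bar" into clean names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def run_sources(
    names: list[str], registry: dict[str, Fetcher]
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Run each named source; a failure is recorded and the loop continues."""
    results: dict[str, list[str]] = {}
    failures: dict[str, str] = {}
    for name in names:
        try:
            # An unknown name raises KeyError here and is treated as a failure too.
            results[name] = list(registry[name]())
        except Exception as exc:
            logger.warning("source %s failed: %r", name, exc)
            failures[name] = repr(exc)
    return results, failures
```

Catching the exception inside the loop, per source, is what keeps one bad fetch from aborting the whole pass; the failures map lets the caller report what went wrong afterwards.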

All ingested claims are attributed to the seeded system@wordeeds.internal user on the Spring side.

Status

Scaffolded end-to-end with one real source (whitehouse). Full design lives in the sibling repo at wordeeds-api/docs/cold-start-ingestion.md.

Sources

  • whitehouse: pulls whitehouse.gov press briefings, speeches & remarks, and statements. Env knobs: WHITEHOUSE_BASE_URL, WHITEHOUSE_SECTIONS, WHITEHOUSE_MAX_PER_SECTION
  • hansard: pulls UK Parliament (JSON API) and Canada (HTML); Australia (AU) is opt-in. Env knobs: HANSARD_JURISDICTIONS, HANSARD_MAX_PER_INDEX, HANSARD_UK_SEARCH_TERM, HANSARD_<UK|CA|AU>_BASE_URL, HANSARD_<UK|CA|AU>_INDEX_PATHS, HANSARD_<UK|CA|AU>_ARTICLE_PREFIXES

Add new sources by dropping a module in src/wordeeds_ingest/sources/ and registering it in sources/__init__.py.
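One common way to implement this kind of registration is a name-to-fetcher mapping populated by a decorator. The sketch below is an assumption about the general pattern only; the real interface in sources/__init__.py may differ, and `SOURCES`, `register`, and `fetch_example` are hypothetical names.

```python
from typing import Callable, Iterable

# Hypothetical registry, standing in for whatever sources/__init__.py exposes.
SOURCES: dict[str, Callable[[], Iterable[str]]] = {}


def register(name: str):
    """Return a decorator that records a fetch function under `name`."""

    def decorator(fetch: Callable[[], Iterable[str]]):
        if name in SOURCES:
            raise ValueError(f"source {name!r} is already registered")
        SOURCES[name] = fetch
        return fetch

    return decorator


# A new source module would then only need to define and decorate its fetcher.
@register("example")
def fetch_example() -> Iterable[str]:
    yield "We will publish the report by March."
```

Rejecting duplicate names at registration time surfaces a naming clash when the module is imported, before any ingestion pass starts.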

Contract

  • Target: POST {SPRING_BASE_URL}/api/claim/ingest
  • Auth: static API key in X-Ingest-Key header (env: INGEST_API_KEY, shared with the Spring app)
  • Body: { politicianId, claim, context, claimedAt, deadline?, videoUrl?, verifiedNewsUrl? } — see wordeeds-api/src/main/java/com/wordeeds/api/controller/IngestController.java for the canonical DTO.
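As an illustration of the contract above, the request can be assembled as follows. This is a sketch under assumptions: it uses the standard library so it runs without dependencies (the planned stack lists httpx), `build_ingest_request` is a hypothetical helper, and field types and date formats are not specified here, so the canonical DTO remains the authority.

```python
import json
import urllib.request

# Fields the contract lists without a trailing "?".
REQUIRED_FIELDS = ("politicianId", "claim", "context", "claimedAt")


def build_ingest_request(
    base_url: str, api_key: str, claim: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request for one claim."""
    missing = [f for f in REQUIRED_FIELDS if claim.get(f) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Assumption: optional fields left as None are omitted from the body.
    body = {key: value for key, value in claim.items() if value is not None}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/claim/ingest",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Ingest-Key": api_key,  # value of INGEST_API_KEY
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Validating required fields before the call avoids a round trip for payloads the endpoint would presumably reject.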

Planned stack

  • Python 3.12+, uv for env/deps
  • httpx, pydantic + pydantic-settings, feedparser, trafilatura, anthropic, structlog
  • pytest + respx for tests
