Sync: Extraction API change (from #10653) #122

@RapierCraft

Ecosystem Sync

Source: RapierCraftStudios/AlterLab @ main
PR: #10653
Commits:

4fdd09c Merge pull request #10653 from RapierCraftStudios/staging
45a20eb Merge pull request #10655 from RapierCraftStudios/fix/ci-tests-recovery-openapi
b410f35 fix(ci): update recovery test assertions and add Stripe stubs to OpenAPI export
2c4f1a7 Merge pull request #10654 from RapierCraftStudios/fix/ci-staging-isort-openapi
cff897a fix(ci): resolve isort violation and OpenAPI export path for CI compliance
07a19df Merge pull request #10643 from RapierCraftStudios/milestone/api-docs-auto-sync
e5738d8 merge: resolve import conflict in scrape schema
ae63c9f Merge pull request #10639 from RapierCraftStudios/fix/scan-concurrency-cap
6d16b2e fix(recovery): add concurrency cap to cancel-refund sweep (#10572)
b3f1d10 Merge pull request #10631 from RapierCraftStudios/fix/refund-sanity-check
d5b06aa Merge pull request #10630 from RapierCraftStudios/feat/ci-spec-export
9802184 fix(recovery): add upper-bound sanity check on refund amount from Redis (#10571)
1eb7e55 feat(ci): add docs validation and spec export to test-web job (#10559)
7191407 feat(ci): export OpenAPI spec before web Docker build (#10559)
729ae30 feat(ci): add shell wrapper for OpenAPI spec export (#10559)
c74c08c Merge pull request #10627 from RapierCraftStudios/fix/cancel-ttl-refresh
65bb85e fix(crawl): refresh metadata TTL at cancel start to prevent credit stranding (#10575)
894fa90 Merge pull request #10623 from RapierCraftStudios/fix/crawlhistory-sweep-columns
e922fe2 Merge pull request #10621 from RapierCraftStudios/feat/sdk-sync-openapi
03c98d5 fix(recovery): add missing credits_used/completed_pages to sweep CrawlHistory update (#10569)
3c23a43 Merge pull request #10614 from RapierCraftStudios/feat/x-internal-markers
61c7ef5 feat(ci): update ecosystem-sync to use OpenAPI spec for SDK drift (#10558)
bbc175e feat(sync): add SDK drift detection script against OpenAPI spec (#10558)
9c036bc feat(schemas): add x-internal markers to business-sensitive fields (#10557)
776bc2a Merge pull request #10613 from RapierCraftStudios/feat/makefile-sync-validate
77ec1cd feat(make): add sync-docs and validate-docs targets (#10555)
e21c2ce Merge pull request #10612 from RapierCraftStudios/fix/blocked-headers-diverge
256f1b6 Merge pull request #10611 from RapierCraftStudios/fix/patchright-counter-success
480417b Merge pull request #10610 from RapierCraftStudios/fix/recovery-previous-status
05d91cb fix(schemas): import canonical BLOCKED_HEADERS from scrape.py (#10570)
af6a1cc fix(scraper): count patchright fallback successes not attempts (#10574)
487099b fix(recovery): capture previous_status before mutation (#10568)
f0df7fd Merge pull request #10609 from RapierCraftStudios/feat/enrich-decorators
f83a8b4 Merge pull request #10608 from RapierCraftStudios/feat/doc-undocumented-endpoints
093f972 feat(api): enrich FastAPI decorators with OpenAPI metadata (#10551)
e78f01d Merge pull request #10607 from RapierCraftStudios/fix/active-crawls-slot-leak
bea7b39 docs(api): document undocumented public endpoints (#10556)
00977ac Merge pull request #10594 from RapierCraftStudios/fix/blocked-launch-args
8a29fd5 Merge pull request #10596 from RapierCraftStudios/fix/double-db-session-sweep
1d9a0de fix(recovery): release active_crawls slot when crawl metadata expires (#10566)
128ef6d Merge pull request #10595 from RapierCraftStudios/fix/js-blocklist-bypass
a625489 Merge pull request #10593 from RapierCraftStudios/fix/c1-injection-chars
4d695d6 Merge pull request #10592 from RapierCraftStudios/feat/validate-docs
9862047 fix(recovery): use single DB session for refund and history update (#10567)
23b0efb fix(scraper): filter dangerous Chromium flags from launch_options args (#10565)
7cfd0f8 fix(shared): block String.fromCharCode/Reflect.apply/Reflect.construct/Proxy JS bypass vectors (#10564)
164c5f9 feat(scripts): add docs-vs-OpenAPI-spec consistency validator (#10553)
6810e5f fix(shared): add C1 control chars to worker _INJECTION_CHARS (#10563)
0e840cc Merge pull request #10591 from RapierCraftStudios/feat/serve-generated-spec
8f080fc feat(web): serve generated OpenAPI spec instead of hand-written object (#10550)
b198d17 Merge pull request #10577 from RapierCraftStudios/feat/openapi-export-script
a07099b feat(scripts): add OpenAPI export script with admin/business data filtering (#10549)
019abea Merge pull request #10561 from RapierCraftStudios/feat/admin-include-in-schema
2a049ab Merge pull request #10562 from RapierCraftStudios/feat/doc-inaccuracy-fixes
af86b00 feat(api): add include_in_schema=False to all admin and internal routers (#10552)
f892e6b fix(docs): correct timeout default, formats list, and webhook path (#10554)

Detected by: ecosystem-sync.yml workflow


What to Update in MCP Server

  1. Extract tool — Update schema in src/tools/extract.ts
  2. Types — New profiles or schema types in src/types.ts

Files to modify

  • src/tools/extract.ts
  • src/types.ts
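
The endpoint description in the diff below names the supported output formats and a few extraction profiles. A minimal sketch of how these might be mirrored in src/types.ts follows. All identifiers here (`EXTRACT_FORMATS`, `ExtractFormat`, `KNOWN_EXTRACT_PROFILES`, `ExtractProfile`, `isExtractFormat`) are hypothetical and not taken from the MCP server. The profile list in the source ends with "etc.", so only the three named profiles are listed and the type is left open.

```typescript
// Hypothetical sketch for src/types.ts; names are illustrative only.
// Add `export` to each declaration when placing this in a module.

// Formats listed in the endpoint description in the diff.
const EXTRACT_FORMATS = ["text", "json", "json_v2", "html", "markdown", "rag"] as const;
type ExtractFormat = (typeof EXTRACT_FORMATS)[number];

// Profiles named in the description. The source list ends with "etc.",
// so the type stays open to other strings rather than guessing the rest.
const KNOWN_EXTRACT_PROFILES = ["product", "article", "job_posting"] as const;
type ExtractProfile = (typeof KNOWN_EXTRACT_PROFILES)[number] | (string & {});

// Runtime guard so the tool can reject unknown formats before calling the API.
function isExtractFormat(value: string): value is ExtractFormat {
  return EXTRACT_FORMATS.some((format) => format === value);
}

// Example of a profile value outside the known list still type-checking.
const customProfile: ExtractProfile = "recipe"; // hypothetical profile name
```

Keeping the format list as a `const` tuple lets the union type and the runtime check share one source of truth, which may make future sync issues easier to apply.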

API Changes (Diff)

services/api/app/routers/extract.py

diff --git a/services/api/app/routers/extract.py b/services/api/app/routers/extract.py
index 3045574..91b9f66 100644
--- a/services/api/app/routers/extract.py
+++ b/services/api/app/routers/extract.py
@@ -431,7 +431,24 @@ def _wrap_as_html(content: str, content_type: str) -> str:
 
 
 @router.post(
-    "", response_model=ExtractResponse, dependencies=[Depends(require_scopes_optimized(["write"]))]
+    "",
+    response_model=ExtractResponse,
+    summary="Extract structured data from content",
+    description=(
+        "Extract structured data from raw HTML, text, or markdown without "
+        "scraping. Supports multiple extraction pipelines: format conversion "
+        "(text, json, json_v2, html, markdown, rag), schema-based filtering, "
+        "LLM extraction via prompt, and profile-based extraction (product, "
+        "article, job_posting, etc.).\n\n"
+        "**Cost**: 1 credit per extraction request."
+    ),
+    response_description="Extracted content in the requested format(s)",
+    responses={
+        200: {"description": "Extraction completed successfully"},
+        402: {"description": "Insufficient credits"},
+        422: {"description": "Invalid content or extraction parameters"},
+    },
+    dependencies=[Depends(require_scopes_optimized(["write"]))],
 )
 async def extract_content(
     request: ExtractRequest,

Source Files Changed

  • services/api/app/routers/extract.py
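
The diff documents three response statuses for the endpoint (200, 402, 422). As a rough sketch, the extract tool could translate them into readable messages as shown here. The function name `describeExtractStatus` and the fallback wording are assumptions; only the status codes and their descriptions come from the diff.

```typescript
// Hypothetical helper for src/tools/extract.ts.
// Status codes and descriptions are copied from the `responses` block in the diff;
// the function itself and the default branch are illustrative assumptions.
function describeExtractStatus(status: number): string {
  switch (status) {
    case 200:
      return "Extraction completed successfully";
    case 402:
      return "Insufficient credits";
    case 422:
      return "Invalid content or extraction parameters";
    default:
      return `Unexpected response status ${status}`;
  }
}
```

For example, `describeExtractStatus(402)` returns `"Insufficient credits"`, which the tool could surface to the caller instead of a bare status code.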

Acceptance Criteria

  • Parameter/endpoint parity with AlterLab API
  • TypeScript types updated
  • Build passes (npm run build)
  • Tested against local AlterLab instance
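
One way to check the parity criterion is to compare the names the API documents against the names the MCP tool accepts. The sketch below is a generic comparison, not the project's actual validation script; `ParityReport` and `compareNames` are hypothetical, and the tool-side list in the usage example is made up for illustration.

```typescript
// Hypothetical parity check: which names does the API list that the tool lacks,
// and which does the tool accept that the API does not list?
interface ParityReport {
  missing: string[]; // present in the API, absent from the tool
  extra: string[];   // present in the tool, absent from the API
}

function compareNames(apiNames: readonly string[], toolNames: readonly string[]): ParityReport {
  const missing = apiNames.filter((name) => toolNames.indexOf(name) === -1);
  const extra = toolNames.filter((name) => apiNames.indexOf(name) === -1);
  return { missing, extra };
}

// Formats from the endpoint description in the diff.
const apiFormats = ["text", "json", "json_v2", "html", "markdown", "rag"];
// Illustrative tool-side list (an assumption, not the real schema).
const toolFormats = ["text", "json", "html", "markdown"];

const report = compareNames(apiFormats, toolFormats);
// report.missing is ["json_v2", "rag"]; report.extra is []
```

An empty `missing` and `extra` would indicate the two lists agree; the same comparison could be applied to profile names or request parameters.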

Metadata

Assignees: No one assigned
Labels: P2 (Medium priority), sync (Sync with AlterLab API changes)
Status: Todo
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests