@MUDASSIR-75

Description

This PR adds a new SupadataLoader to langchain_community.document_loaders that integrates with the Supadata Web & Video Data API.

Key points:

  • New SupadataLoader dataclass in libs/community/langchain_community/document_loaders/supadata.py.
  • Supports two operations:
    • operation="transcript" – fetches media transcripts with options:
      • lang (preferred language)
      • text (plain text vs structured)
      • mode ("native" | "auto" | "generate")
    • operation="metadata" – fetches structured media metadata.
  • API key handling:
    • Accepts api_key argument, or falls back to the SUPADATA_API_KEY environment variable.
  • Runtime dependency on supadata is optional:
    • The loader imports the Supadata SDK lazily inside _get_client.
    • A clear ImportError is raised with an install hint if supadata is not installed.
  • Returns Document objects with:
    • page_content as transcript text or JSON-serialized metadata.
    • metadata including source, supadata_operation, and optional lang / mode.
  • For long-running transcripts, handles job-based responses by returning a Document with an empty page_content and a job_id stored in metadata.
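The transcript handling described above can be sketched roughly as follows. This is an illustrative sketch, not the loader's actual code: the `Document` dataclass is a stand-in for `langchain_core.documents.Document` so the example stays self-contained, `build_transcript_document` is a hypothetical helper name, and the `content` / `job_id` attributes on the result object are assumed from the test descriptions in this PR.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Stand-in for langchain_core.documents.Document (assumption for this sketch)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_transcript_document(
    url: str,
    result: Any,
    lang: Optional[str] = None,
    mode: Optional[str] = None,
) -> Document:
    """Hypothetical helper: map a transcript result onto a Document."""
    metadata: Dict[str, Any] = {"source": url, "supadata_operation": "transcript"}
    # lang and mode are optional and only recorded when the caller set them.
    if lang is not None:
        metadata["lang"] = lang
    if mode is not None:
        metadata["mode"] = mode

    # Job-style result: no transcript yet, so keep page_content empty
    # and expose the job_id so the caller can poll for it later.
    job_id = getattr(result, "job_id", None)
    if job_id is not None:
        metadata["job_id"] = job_id
        return Document(page_content="", metadata=metadata)

    # Immediate result: the transcript text becomes page_content.
    return Document(page_content=str(getattr(result, "content", "")), metadata=metadata)


immediate = build_transcript_document(
    "https://example.com/video", SimpleNamespace(content="hello"), lang="en"
)
pending = build_transcript_document(
    "https://example.com/video", SimpleNamespace(job_id="abc"), mode="generate"
)
```

Returning an empty Document for a pending job keeps the return type uniform, at the cost of requiring callers to check `metadata` for a `job_id` before using `page_content`.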

Tests:

  • Adds libs/community/tests/unit_tests/document_loaders/test_supadata_loader.py.
  • Tests patch the Supadata client and do not hit the real Supadata API.
  • Verified that:
    • metadata operation calls Supadata.metadata with the expected URL and params.
    • transcript operation handles immediate transcript results (with content).
    • transcript operation handles job-style results (with job_id).
    • API key resolution prefers explicit api_key over SUPADATA_API_KEY.
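For reviewers, the testing approach listed above might look something like the sketch below. It does not reproduce the test file in this PR: `resolve_api_key` and `load_metadata` are hypothetical helpers written for illustration, and the `client.metadata(url=...)` call shape is an assumption rather than the confirmed Supadata SDK signature. The client is a `MagicMock`, so nothing reaches the real service.

```python
import json
import os
from typing import Any, Optional
from unittest import mock


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument first, then SUPADATA_API_KEY."""
    key = api_key or os.environ.get("SUPADATA_API_KEY")
    if not key:
        raise ValueError("Provide api_key or set SUPADATA_API_KEY.")
    return key


def load_metadata(client: Any, url: str) -> str:
    """Hypothetical helper: fetch metadata and JSON-serialize it."""
    # The keyword argument name `url` is an assumption for this sketch.
    result = client.metadata(url=url)
    return json.dumps(result, sort_keys=True)


def test_explicit_key_wins() -> None:
    # patch.dict restores os.environ when the block exits.
    with mock.patch.dict(os.environ, {"SUPADATA_API_KEY": "from-env"}):
        assert resolve_api_key("explicit") == "explicit"
        assert resolve_api_key() == "from-env"


def test_metadata_uses_patched_client() -> None:
    # A MagicMock stands in for the Supadata client; no network call is made.
    client = mock.MagicMock()
    client.metadata.return_value = {"title": "Example"}
    text = load_metadata(client, "https://example.com/video")
    client.metadata.assert_called_once_with(url="https://example.com/video")
    assert json.loads(text) == {"title": "Example"}
```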

Issue

N/A – this is a new integration; not linked to an existing issue.

Dependencies

  • No new hard dependencies added to pyproject.toml.
  • At runtime, users must install the official Supadata SDK:
    • pip install supadata
  • Unit tests use a patched supadata module and do not require the real service.
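A minimal sketch of the optional-dependency pattern, assuming a generic helper rather than the loader's actual `_get_client` body: `import_optional` is a hypothetical name, and the error message wording is illustrative.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module on first use and fail with an install hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message, keeping the original error as the cause.
        raise ImportError(
            f"Could not import '{module_name}'. Install it with `{install_hint}`."
        ) from exc
```

A loader would call this only when a client is first needed, for example `import_optional("supadata", "pip install supadata")`, so importing the loader module itself never requires the SDK.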

Testing

From the repository root:

cd libs/community/tests/unit_tests/document_loaders
uv run pytest test_supadata_loader.py

@MUDASSIR-75
Author

Hi @mdrxy, could you please approve the workflows for this PR? Thank you!

@MUDASSIR-75 force-pushed the feature/supadata-loader branch from d7b8d49 to 1bf9a6e on December 4, 2025 05:32
@MUDASSIR-75
Author

Hi @ccurme,

I submitted this PR, which adds a SupadataLoader integration. The unit tests pass locally, but the CI workflows are awaiting maintainer approval to run.

Could you please:

  1. Approve the workflows so they can run
  2. Review the PR when you have a chance

This PR adds a new document loader for the Supadata Web & Video Data API with full tests and documentation.

Thank you!
