[Solutions] Crawl4ai integration #319

@jstanden

See: https://docs.crawl4ai.com/core/docker-deployment/

Crawl4AI is an API for crawling websites and converting their HTML to Markdown for LLM/AI use. It can be self-hosted locally in Docker.

The API authenticates with a bearer token, so in Cerb this is a Bearer Token connected account type. The token is defined in an environment variable.
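For testing the token outside Cerb, here is a minimal Python sketch that reads it from the environment and builds the request headers. The variable name `CRAWL4AI_API_TOKEN` and the helper `auth_headers` are assumptions for illustration; check the variable name against the deployment docs linked above.

```python
import os


def auth_headers(env=os.environ, var="CRAWL4AI_API_TOKEN"):
    """Build bearer-token headers from an environment mapping.

    `var` is an assumed variable name, not confirmed by this issue.
    """
    token = env.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dict instead of the real process environment.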

We need a solutions integration page, social logo, and connected service package in Cerb.

Crawling is asynchronous: starting a crawl task via the API only queues it, so you have to follow up with a second request to check for task completion.

Code examples:

start:
  http.request/crawl:
    output: http_response
    inputs:
      method: POST
      url: http://host.docker.internal:11235/crawl
      headers:
        Content-Type: application/json
        Authorization: Bearer s3cr3t
      body:
        urls@list:
          https://cerb.ai/
        priority: 10
        css_selector: div.page-content
        #browser_config:
        #  headless@bool: yes
        #crawler_config:
        #  stream@bool: no
    on_success:
    on_error:

Follow up on the task:

start:
  set:
    task_id: dac1e293-812b-472f-8ab4-8bf275e87da3
  
  http.request/task:
    output: http_response
    inputs:
      method: GET
      url: http://host.docker.internal:11235/task/{{task_id}}
      headers:
        Authorization: Bearer s3cr3t
    on_success:
      set:
        json_response@json: {{http_response.body}}
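The two requests above can be chained into a start-then-poll loop. This is a rough Python sketch for trying that flow outside Cerb, not the integration itself. The `task_id` response field, the `status` field, and its `completed`/`failed` values are assumptions about the API's responses. `call` and `crawl_and_wait` are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://host.docker.internal:11235"  # host used in the examples above


def call(method, path, token, payload=None):
    """Send one JSON request with a bearer token and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl_and_wait(payload, token, request=call, interval=2.0,
                   max_polls=30, sleep=time.sleep):
    """Start a crawl, then poll the task endpoint until it finishes."""
    started = request("POST", "/crawl", token, payload)
    task_id = started["task_id"]  # assumed response field
    for _ in range(max_polls):
        task = request("GET", f"/task/{task_id}", token)
        # Assumed terminal status values; adjust to the real API.
        if task.get("status") in ("completed", "failed"):
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A call would look like `crawl_and_wait({"urls": ["https://cerb.ai/"], "priority": 10, "css_selector": "div.page-content"}, token)`. The `request` and `sleep` parameters are injectable so the loop can be exercised without a running container.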
