cobibean/ThirdwebDocsScraper

Thirdweb Documentation Scraper

A tool that scrapes the Thirdweb TypeScript API documentation, converts it to Markdown, and organizes it into a structured local repository.

Features

  • Web Scraping: Traverses the Thirdweb documentation site to extract content
  • Markdown Conversion: Converts HTML content to clean, well-formatted Markdown
  • Intelligent Categorization: Organizes documentation into meaningful categories:
    • UI Components
    • React Hooks
    • Core Functions
    • Advanced Topics
  • Index Generation: Creates navigation indexes for each category
  • Content Cleaning: Removes unnecessary boilerplate and formats code blocks
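The content-cleaning step can be pictured with a minimal sketch. This is not the code in markdown_cleaner.py. The function name, the boilerplate patterns, and the blank-line rule are all assumptions made for illustration:

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Hypothetical boilerplate lines; the real cleaner's patterns may differ.
BOILERPLATE_PATTERNS = [
    re.compile(r"^\s*Skip to content\s*$", re.IGNORECASE),
    re.compile(r"^\s*Edit this page\s*$", re.IGNORECASE),
]


def clean_markdown(text: str) -> str:
    """Drop boilerplate lines and collapse blank-line runs outside code fences."""
    out = []
    in_fence = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Toggle fence state and normalize the fence line's indentation.
            in_fence = not in_fence
            out.append(stripped)
            continue
        if in_fence:
            out.append(line)  # keep code verbatim
            continue
        if any(p.match(line) for p in BOILERPLATE_PATTERNS):
            continue
        if stripped == "" and out and out[-1] == "":
            continue  # collapse consecutive blank lines
        out.append(line.rstrip())
    return "\n".join(out).strip() + "\n"
```

Lines inside code fences are passed through untouched, so indentation and blank lines in examples survive the cleanup.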

Project Components

  • Improved Scraper (improved_scraper.py): Main scraper with enhanced functionality
  • Reorganization Tool (reorganize_docs.py): Sorts and categorizes documentation files
  • Markdown Cleaner (markdown_cleaner.py): Cleans and formats scraped Markdown files

Requirements

  • Python 3.x
  • Required libraries listed in requirements.txt

Setup and Usage

Setup Environment

./setup_venv.sh

Run Full Documentation Pipeline

For the complete process (scraping, cleaning, and organizing):

./run_improved_scraper.sh

Run Only Reorganization

If you already have scraped documentation and want to reorganize it:

python reorganize_docs.py

Directory Structure

The scraped content is organized as follows:

thirdweb_typescript_docs/
├── UI Components/
│   ├── 00_index.md
│   ├── Component1.md
│   └── ...
├── React Hooks/
│   ├── 00_index.md
│   ├── Hook1.md
│   └── ...
├── Core Functions/
│   ├── 00_index.md
│   ├── Function1.md
│   └── ...
└── Advanced Topics/
    ├── 00_index.md
    ├── Topic1.md
    └── ...
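
As a rough illustration of how each 00_index.md could be produced from this layout, here is a minimal sketch. The helper names and the index format (a heading followed by a link list) are assumptions, not the project's actual output:

```python
from pathlib import Path

INDEX_NAME = "00_index.md"


def build_index(category: str, filenames: list[str]) -> str:
    """Return index Markdown for one category: a heading plus one link per page."""
    pages = sorted(
        name for name in filenames
        if name.endswith(".md") and name != INDEX_NAME
    )
    lines = [f"# {category}", ""]
    lines += [f"- [{Path(name).stem}]({name})" for name in pages]
    return "\n".join(lines) + "\n"


def write_indexes(root: Path) -> None:
    """Write a 00_index.md into every category directory under root."""
    for category_dir in sorted(d for d in root.iterdir() if d.is_dir()):
        names = [p.name for p in category_dir.iterdir() if p.is_file()]
        index_text = build_index(category_dir.name, names)
        (category_dir / INDEX_NAME).write_text(index_text, encoding="utf-8")
```

Calling write_indexes(Path("thirdweb_typescript_docs")) would regenerate the index in each category folder. The 00_ prefix keeps the index sorted ahead of the other files in a directory listing.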

Additional Resources

  • ScraperBuildGuide.md: Detailed guide for building similar documentation scrapers
  • reorganize_docs.py: Script for categorizing documentation based on content patterns
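
Pattern-based categorization of the kind described above might look like the following sketch. The rules here are hypothetical and chosen only to illustrate the idea. The patterns actually used by reorganize_docs.py are not documented in this README:

```python
import re


def categorize(title: str, body: str) -> str:
    """Assign a page to one of the four categories using simple, assumed rules."""
    # Assumption: hook pages are titled like useSomething.
    if re.match(r"use[A-Z]", title):
        return "React Hooks"
    # Assumption: component pages are capitalized and show the title as a JSX tag.
    if title[:1].isupper() and f"<{title}" in body:
        return "UI Components"
    # Assumption: function pages are titled with a single camelCase identifier.
    if re.fullmatch(r"[a-z][A-Za-z0-9]*", title):
        return "Core Functions"
    # Everything else falls through to the catch-all category.
    return "Advanced Topics"
```

For example, categorize("useExample", "") yields "React Hooks", while a page titled "Migration Guide" falls through to "Advanced Topics". The rules are checked in order, so a hook title never reaches the camelCase check for Core Functions.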

Why This Project

This project helps developers maintain an up-to-date local copy of Thirdweb documentation for:

  1. Offline access
  2. Training AI models on the Thirdweb TypeScript SDK
  3. Creating customized knowledge bases
  4. Enhancing developer workflows with searchable documentation
