Thirdweb Documentation Scraper

A tool that scrapes the Thirdweb TypeScript API documentation, cleans it, and organizes it into a structured local Markdown repository.

Features

Web Scraping: Traverses the Thirdweb documentation site to extract content
Markdown Conversion: Converts HTML content to clean, well-formatted Markdown
Intelligent Categorization: Organizes documentation into meaningful categories:
- UI Components
- React Hooks
- Core Functions
- Advanced Topics
Index Generation: Creates navigation indexes for each category
Content Cleaning: Removes unnecessary boilerplate and formats code blocks
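
The categorization step above can be sketched roughly as follows. This is a minimal illustration, not the logic actually used in reorganize_docs.py: the function name `categorize` and the naming heuristics (hooks start with `use`, components are PascalCase names that appear as a JSX tag in the page body) are assumptions made for the example.

```python
import re

# Assumed heuristics; the real rules in reorganize_docs.py may differ.
HOOK_NAME = re.compile(r"^use[A-Z]\w*$")
FUNCTION_NAME = re.compile(r"^[a-z]\w*$")
PASCAL_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def categorize(title: str, body: str) -> str:
    """Return one of the four category folder names for a scraped page."""
    # Hooks are checked first because "useX" would also match FUNCTION_NAME.
    if HOOK_NAME.match(title):
        return "React Hooks"
    if FUNCTION_NAME.match(title):
        return "Core Functions"
    # A PascalCase title used as a JSX tag in the body is treated as a component.
    if PASCAL_NAME.match(title) and f"<{title}" in body:
        return "UI Components"
    # Everything else (guides, multi-word titles) falls through to this bucket.
    return "Advanced Topics"
```

For example, `categorize("useActiveAccount", "")` lands in React Hooks, while a multi-word guide title falls through to Advanced Topics.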

Project Components

Improved Scraper (improved_scraper.py): Main scraper with enhanced functionality
Reorganization Tool (reorganize_docs.py): Sorts and categorizes documentation files
Markdown Cleaner (markdown_cleaner.py): Cleans and formats scraped Markdown files
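
To give a feel for what a cleaning pass might involve, here is a small sketch under stated assumptions. The `BOILERPLATE` strings and the `clean_markdown` function are hypothetical and are not taken from markdown_cleaner.py; the real cleaner may remove different text and format code blocks differently.

```python
# Hypothetical boilerplate lines; the real cleaner may target other text.
BOILERPLATE = {"Skip to main content", "Edit this page", "Was this page helpful?"}


def clean_markdown(text: str) -> str:
    """Drop boilerplate lines, tidy code fences, and collapse blank lines."""
    out = []
    in_code = False
    for raw in text.splitlines():
        line = raw.rstrip()  # trailing whitespace is never meaningful here
        if line.lstrip().startswith("```"):
            # Fence lines toggle code mode and are moved to column zero.
            in_code = not in_code
            out.append(line.lstrip())
            continue
        if not in_code:
            if line.strip() in BOILERPLATE:
                continue
            # Outside code blocks, keep at most one consecutive blank line.
            if line == "" and out and out[-1] == "":
                continue
        out.append(line)
    return "\n".join(out).strip() + "\n"
```

Blank lines inside fenced code are left untouched so that code samples keep their original spacing.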

Requirements

Python 3.x
Required libraries listed in requirements.txt

Setup and Usage

Setup Environment

./setup_venv.sh

Run Full Documentation Pipeline

For the complete process (scraping, cleaning, and organizing):

./run_improved_scraper.sh

Run Only Reorganization

If you already have scraped documentation and want to reorganize it:

python reorganize_docs.py

Directory Structure

The scraped content is organized as follows:

thirdweb_typescript_docs/
├── UI Components/
│   ├── 00_index.md
│   ├── Component1.md
│   └── ...
├── React Hooks/
│   ├── 00_index.md
│   ├── Hook1.md
│   └── ...
├── Core Functions/
│   ├── 00_index.md
│   ├── Function1.md
│   └── ...
└── Advanced Topics/
    ├── 00_index.md
    ├── Topic1.md
    └── ...
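
As an illustration of how the 00_index.md files in this layout could be produced, the sketch below writes one index per category folder. The function names `build_index` and `build_all_indexes` and the index format (a heading plus a bullet list of links) are assumptions for the example, not the format the project necessarily generates.

```python
from pathlib import Path


def build_index(category_dir: Path) -> Path:
    """Write 00_index.md listing every other Markdown file in the folder."""
    pages = sorted(
        p for p in category_dir.glob("*.md") if p.name != "00_index.md"
    )
    lines = [f"# {category_dir.name}", ""]
    for page in pages:
        # Link text is the file stem; spaces in file names are percent-encoded.
        lines.append(f"- [{page.stem}]({page.name.replace(' ', '%20')})")
    index_path = category_dir / "00_index.md"
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path


def build_all_indexes(root: Path) -> list[Path]:
    """Build an index for each category folder directly under root."""
    return [build_index(d) for d in sorted(root.iterdir()) if d.is_dir()]
```

Calling `build_all_indexes(Path("thirdweb_typescript_docs"))` would regenerate every index in place; running it again yields the same files because the index itself is excluded from the listing.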

Additional Resources

DocScraperGuide.md: Detailed guide for building similar documentation scrapers
reorganize_docs.py: Script for categorizing documentation based on content patterns

Why This Project

This project helps developers maintain an up-to-date local copy of Thirdweb documentation for:

Offline access
Training AI models on the Thirdweb TypeScript SDK
Creating customized knowledge bases
Enhancing developer workflows with searchable documentation

Repository: cobibean/ThirdwebDocsScraper
