
Conversation

@tsbhangu
Contributor

Summary

  • Add database schema and migration for websites table
  • Add API endpoints for website indexing
  • Integrate with IndexSourceDb for source tracking
  • Add Turbopuffer sync for vector search

Details

This PR builds on the website crawler infrastructure (#4656) to add the API and database layer:

Database:

  • New websites table migration
  • WebsiteDb model with full metadata support
  • Integration with IndexSourceDb for job tracking

API Endpoints:

  • POST /sources/website/{domain}/index - Start website crawling
  • GET /sources/website/{domain}/status - Check crawl job status
  • GET /sources/website/{domain}/{website_id} - Get specific page
  • GET /sources/website/{domain} - List all indexed pages
  • POST /sources/website/{domain}/reindex - Re-crawl website
  • DELETE /sources/website/{domain}/delete - Delete specific website
  • DELETE /sources/website/{domain}/delete-all - Delete all websites
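
The route surface above can be sketched as a small dispatch table. This is an illustrative stdlib sketch, not the PR's actual FastAPI wiring; the handler names (`start_crawl`, `crawl_status`, etc.) are hypothetical:

```python
import re

# Route templates from the PR description; handler names are illustrative.
# Order matters: literal segments like /status must precede {website_id}.
ROUTES = [
    ("POST",   "/sources/website/{domain}/index",        "start_crawl"),
    ("GET",    "/sources/website/{domain}/status",       "crawl_status"),
    ("GET",    "/sources/website/{domain}/{website_id}", "get_page"),
    ("GET",    "/sources/website/{domain}",              "list_pages"),
    ("POST",   "/sources/website/{domain}/reindex",      "reindex"),
    ("DELETE", "/sources/website/{domain}/delete",       "delete_one"),
    ("DELETE", "/sources/website/{domain}/delete-all",   "delete_all"),
]

def resolve(method: str, path: str):
    """Return (handler, path_params) for the first matching template."""
    for m, template, handler in ROUTES:
        if m != method:
            continue
        # Turn {name} placeholders into named regex groups that stop at "/".
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template) + "$"
        match = re.match(pattern, path)
        if match:
            return handler, match.groupdict()
    return None, {}
```

Listing `/status` before `/{website_id}` is what keeps the literal routes from being swallowed by the parameterized one.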

Features:

  • Background job processing for crawling
  • Real-time status tracking
  • Automatic Turbopuffer sync for search
  • Proper error handling and rollback

Dependencies

Test plan

  • Database migrations verified
  • Background jobs tested with real crawling

@tsbhangu tsbhangu requested a review from eyw520 as a code owner October 31, 2025 23:16
@vercel
Contributor

vercel bot commented Oct 31, 2025

The latest updates on your projects.

Project                   Status    Updated (UTC)
dev.ferndocs.com          Error     Nov 5, 2025 4:47am
fern-dashboard            Ready     Nov 5, 2025 4:47am
fern-dashboard-dev        Ready     Nov 5, 2025 4:47am
ferndocs.com              Ready     Nov 5, 2025 4:47am
preview.ferndocs.com      Error     Nov 5, 2025 4:47am
prod-assets.ferndocs.com  Error     Nov 5, 2025 4:47am
prod.ferndocs.com         Error     Nov 5, 2025 4:47am

1 Skipped Deployment

Project                   Status    Updated (UTC)
fern-platform             Ignored   Nov 5, 2025 4:47am

@tsbhangu tsbhangu force-pushed the tanvir/website-database-api-routes branch from 01a17ba to 9925581 Compare October 31, 2025 23:20
from fai.utils.website.crawler import DocumentationCrawler
from fai.utils.website.extractor import ContentExtractor
from fai.utils.website.models import DocumentChunk
from fai.utils.website.jobs import crawl_website_job
Member

We use direct imports throughout the codebase; let's remove the `__init__.py` file.

Contributor Author

Got it!

    port=8080,
    server_header=False,
-   reload=VARIABLES.IS_LOCAL,
+   reload=False,
Member

I'd rather keep the local reload behavior - thoughts?

Contributor Author

Yup, sorry! That had sneaked in from testing.



class ReindexWebsiteRequest(BaseModel):
base_url: str = Field(description="The base URL to re-crawl (will delete old pages and re-index)")
Member

Noticing this reuses the previous config. What if we wanted to update settings like the chunk overlap?

Contributor Author

That's fair. We can add them all as optional arguments and override the initial config with the values someone specifies.

tsbhangu and others added 9 commits November 4, 2025 21:38
- Add database migration for websites table
- Add WebsiteDb model and API types
- Add API endpoints for website indexing (/sources/website/{domain}/...)
- Add integration with IndexSourceDb for job tracking
- Add Turbopuffer sync functionality for vector search
- Update OpenAPI spec with new endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove obvious comments that don't add value
- Keep comments that explain non-obvious logic (e.g., status determination, fresh DB sessions)
- Extract crawl_website_job to utils/website/jobs.py for better separation
- Add WebsiteCrawlConfig domain model with default values
- Implement selective sync functions for websites (sync_websites_to_tpuf, sync_websites_to_query_index)
- Track website IDs during crawl for incremental syncing
- Update delete operations to use selective deletion
- Add comprehensive test suite (12 route tests + 10 sync tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…ation

- Fixed af951c45da91 to reference 1a06a4d351f9 instead of missing 2d743e49aaa1
- Created merge migration to combine two branches from initial schema
- Regenerated websites table migration with proper revision chain
- Migration chain: 1a06a4d351f9 -> [af951c45da91, 62afaf912daa] -> 7440621afbb0 -> 8e63cf285ea3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
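
The revision chain from that commit message can be sanity-checked mechanically. This sketch mirrors alembic's `down_revision` convention (a merge revision has a tuple of parents); the revision IDs come from the commit message above, and the helper functions are illustrative:

```python
# Each revision maps to its down_revision(s); None marks the base.
CHAIN = {
    "1a06a4d351f9": None,                              # initial schema
    "af951c45da91": "1a06a4d351f9",                    # re-pointed branch
    "62afaf912daa": "1a06a4d351f9",                    # sibling branch
    "7440621afbb0": ("af951c45da91", "62afaf912daa"),  # merge migration
    "8e63cf285ea3": "7440621afbb0",                    # websites table
}

def parents(rev: str) -> set[str]:
    down = CHAIN[rev]
    if down is None:
        return set()
    return set(down) if isinstance(down, tuple) else {down}

def reachable_base(rev: str) -> bool:
    """True if every path from rev terminates at the base revision."""
    ps = parents(rev)
    if not ps:
        return CHAIN[rev] is None
    return all(reachable_base(p) for p in ps)
```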