Question: What is the best way to use the website crawler in a workspace? #605

@azaylamba

Description

The website crawler is a great feature and can be used to ingest webpages into a workspace. I'm wondering what the best way is to update the workspace once some of those webpages have new content after the initial crawl.

I believe we would need to crawl the website again; would that result in duplicate documents in the workspace and the vector database? How can we avoid the duplication and update the workspace with the refreshed pages?

Should we create a new workspace and crawl the website again? That doesn't seem scalable when the website content changes frequently.

What is the best approach in this situation? For reference, the sketch below shows the kind of idempotent re-ingestion I have in mind.
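To make the question concrete, here is a minimal sketch of a dedupe-on-re-crawl pattern: each page is keyed by a deterministic ID derived from its URL, and a content hash is stored so unchanged pages can be skipped. The `store` object (with `get`, `delete`, and `add` methods) is a hypothetical wrapper around the workspace's vector database, not an actual API of this project; the real client's equivalents would be substituted.

```python
import hashlib


def doc_id_for(url: str) -> str:
    """Deterministic document ID from the page URL, so a re-crawl
    of the same page always maps to the same record."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def content_hash(text: str) -> str:
    """Hash of the page body, used to detect unchanged pages."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_page(store, url: str, text: str) -> str:
    """Re-ingest a crawled page without creating duplicates.

    `store` is a hypothetical vector-store wrapper exposing
    get(id) -> record or None, delete(id), and add(id, text, metadata).
    """
    doc_id = doc_id_for(url)
    new_hash = content_hash(text)

    existing = store.get(doc_id)
    if existing is not None:
        if existing["metadata"].get("content_hash") == new_hash:
            return "unchanged"  # identical content: skip re-embedding
        store.delete(doc_id)    # remove stale chunks/embeddings first
        store.add(doc_id, text, {"url": url, "content_hash": new_hash})
        return "updated"

    store.add(doc_id, text, {"url": url, "content_hash": new_hash})
    return "inserted"
```

With something like this, re-running the crawler on the whole site would update changed pages in place and leave the rest untouched, instead of appending duplicates or requiring a fresh workspace. Is there a built-in mechanism along these lines, or does it have to be handled outside the crawler?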
