This is a demonstration application that uses the ScreenshotOne API to render full-page screenshots of web pages, apply OCR to them, and search for given patterns.
It uses the following technologies:
Check out more examples in the ScreenshotOne examples repository.
You provide a URL as a CLI argument, and the application will:
- Take a full-page screenshot of the given URL;
- Split the screenshot into multiple parts;
- Apply OCR to each part;
- Search for the given patterns in the OCRed parts by asking AI whether the pattern is present in each part;
- Get the HTML content of the page;
- Parse the links and navigate to the internal ones;
- Repeat the process from the first step for each new page until it finds a match for the given patterns in an OCRed part of a page;
- Print the results.
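The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository: `part_boxes`, `internal_links`, and `research` are hypothetical names, the breadth-first visiting order is an assumption, and the screenshot, OCR, and AI check are represented by a caller-supplied `page_matches` function rather than real API calls.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


def part_boxes(width, height, part_height):
    """Crop boxes (left, top, right, bottom) that tile a tall screenshot top to bottom."""
    return [
        (0, top, width, min(top + part_height, height))
        for top in range(0, height, part_height)
    ]


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute, de-duplicated links that stay on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" suffixes.
        absolute = urldefrag(urljoin(page_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in seen:
                seen.add(absolute)
                links.append(absolute)
    return links


def research(start_url, max_pages, page_matches, fetch_html):
    """Visit pages breadth-first until page_matches(url) is true or max_pages is reached.

    page_matches and fetch_html are hypothetical callables: the first stands in
    for screenshot + split + OCR + AI check, the second for the HTML download.
    """
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        if page_matches(url):
            return url, visited
        for link in internal_links(url, fetch_html(url)):
            if link not in visited:
                queue.append(link)
    return None, visited
```

Passing the page check and the HTML fetch in as functions keeps the crawl loop testable without network access or API keys.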
The code was written with the help of Cursor as specified in the instructions.
- Clone the repository:
git clone https://github.com/screenshotone/examples.git
- Go to the examples/python/vision-researcher directory:
cd examples/python/vision-researcher
- Install the dependencies:
pip install -r requirements.txt
- Create a .env file and set the following environment variables:
SCREENSHOTONE_API_KEY=your_screenshotone_api_key
OPENAI_API_KEY=your_openai_api_key
- Run the application:
python vision_researcher.py <url> <prompt> <max pages>
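As a rough sketch of how a script with this command line might read its inputs, the snippet below parses the three positional arguments with `argparse` and checks that the two API keys are set. The function names `parse_args` and `require_env` are hypothetical, and the snippet assumes the values from the .env file have already been loaded into the process environment (for example by a package such as python-dotenv); the actual script may handle this differently.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the positional arguments: <url> <prompt> <max pages>."""
    parser = argparse.ArgumentParser(prog="vision_researcher.py")
    parser.add_argument("url", help="page to start from")
    parser.add_argument("prompt", help="question to ask about each OCRed part")
    parser.add_argument("max_pages", type=int, help="maximum number of pages to visit")
    return parser.parse_args(argv)


def require_env(name, environ=None):
    """Return a non-empty environment variable or exit with a clear message."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise SystemExit(f"Missing environment variable: {name}")
    return value


if __name__ == "__main__":
    args = parse_args()
    screenshotone_key = require_env("SCREENSHOTONE_API_KEY")
    openai_key = require_env("OPENAI_API_KEY")
    print(f"Searching up to {args.max_pages} pages starting at {args.url}")
```

Failing early on a missing key avoids spending a screenshot request before discovering that the OCR or AI step cannot run.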
For example, to search for the content containing "testimonials" on the ScreenshotOne website:
python vision_researcher.py https://screenshotone.com "Does the website page or the website page parts contain testimonials?" 5
The results will be printed in the console: