An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.
The goal of this repo is to provide the simplest implementation of a deep research agent, i.e. an agent that can refine its research direction over time and deep-dive into a topic. The aim is to keep the repo under 500 lines of code so it is easy to understand and build on top of.
If you like this project, please consider starring it and giving me a follow on X/Twitter. This project is sponsored by Aomni.
```mermaid
flowchart TB
    subgraph Input
        Q[User Query]
        B[Breadth Parameter]
        D[Depth Parameter]
    end

    DR[Deep Research] -->
    SQ[SERP Queries] -->
    PR[Process Results]

    subgraph Results[Results]
        direction TB
        NL((Learnings))
        ND((Directions))
    end

    PR --> NL
    PR --> ND

    DP{depth > 0?}
    RD["Next Direction:
    - Prior Goals
    - New Questions
    - Learnings"]

    MR[Markdown Report]

    %% Main Flow
    Q & B & D --> DR

    %% Results to Decision
    NL & ND --> DP

    %% Circular Flow
    DP -->|Yes| RD
    RD -->|New Context| DR

    %% Final Output
    DP -->|No| MR

    %% Styling
    classDef input fill:#7bed9f,stroke:#2ed573,color:black
    classDef process fill:#70a1ff,stroke:#1e90ff,color:black
    classDef recursive fill:#ffa502,stroke:#ff7f50,color:black
    classDef output fill:#ff4757,stroke:#ff6b81,color:black
    classDef results fill:#a8e6cf,stroke:#3b7a57,color:black

    class Q,B,D input
    class DR,SQ,PR process
    class DP,RD recursive
    class MR output
    class NL,ND results
```
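The recursive flow in the diagram can be sketched in a few lines of TypeScript. This is a simplified illustration, not the repo's actual code: all names are hypothetical, and the stub search function stands in for the real Firecrawl + LLM processing step.

```typescript
// Hypothetical sketch of the recursive research loop shown above.
// Names and structure are illustrative, not the repo's actual API.
type ResearchResult = { learnings: string[]; directions: string[] };

function stubSearch(query: string): ResearchResult {
  // Stand-in for the real SERP query + result-processing step.
  return {
    learnings: [`learning about ${query}`],
    directions: [`follow-up on ${query}`],
  };
}

function deepResearch(query: string, breadth: number, depth: number): string[] {
  // Breadth controls how many queries are generated at this level.
  const queries = Array.from({ length: breadth }, (_, i) => `${query} #${i}`);
  let learnings: string[] = [];
  for (const q of queries) {
    const result = stubSearch(q);
    learnings = learnings.concat(result.learnings);
    if (depth > 0) {
      // Recurse on each new direction with reduced depth and breadth,
      // carrying the new context forward.
      for (const dir of result.directions) {
        learnings = learnings.concat(
          deepResearch(dir, Math.ceil(breadth / 2), depth - 1)
        );
      }
    }
  }
  return learnings;
}
```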
- Iterative Research: Performs deep research by iteratively generating search queries, processing results, and diving deeper based on findings
- Intelligent Query Generation: Uses LLMs to generate targeted search queries based on research goals and previous findings
- Depth & Breadth Control: Configurable parameters to control how wide (breadth) and deep (depth) the research goes
- Smart Follow-up: Generates follow-up questions to better understand research needs
- Comprehensive Reports: Produces detailed markdown reports with findings and sources
- Concurrent Processing: Handles multiple searches and result processing in parallel for efficiency
- Node.js environment
- API keys for:
- Firecrawl API (for web search and content extraction)
- One of the following AI providers:
- NVIDIA API (recommended - access to DeepSeek R1, Llama 3.1 405B, Nemotron 70B)
- Fireworks AI (for DeepSeek R1)
- OpenAI API (for GPT-4o-mini)
- Clone the repository
- Install dependencies:

```bash
npm install
```

- Set up environment variables in a `.env.local` file:

```bash
FIRECRAWL_KEY="your_firecrawl_key"
# If you want to use your self-hosted Firecrawl, add the following below:
# FIRECRAWL_BASE_URL="http://localhost:3002"

# NVIDIA API (build.nvidia.com) - Recommended
NVIDIA_API_KEY="your_nvidia_api_key"

# Alternative: OpenAI API (fallback)
OPENAI_KEY="your_openai_key"

# Alternative: Fireworks AI (for DeepSeek R1)
# FIREWORKS_KEY="your_fireworks_key"
```

The system automatically selects the best available model in this order:
- DeepSeek R1 (Fireworks) - if `FIREWORKS_KEY` is set
- DeepSeek R1 (NVIDIA) - if `NVIDIA_API_KEY` is set ⭐ Recommended
- Llama 3.1 405B (NVIDIA) - Most capable model
- Nemotron 70B (NVIDIA) - NVIDIA's research-optimized model
- Llama 3.1 70B (NVIDIA) - Strong general-purpose model
- GPT-4o-mini (OpenAI) - Fallback option
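The fallback order reduces to a simple priority check over environment variables. A minimal sketch, assuming only the top-level provider choice (the real selection logic also covers the Llama and Nemotron variants, and the function name here is hypothetical):

```typescript
// Hypothetical sketch of provider selection by env-var priority.
// Returns a human-readable provider label, not a real model ID.
function pickProvider(env: Record<string, string | undefined>): string {
  if (env.FIREWORKS_KEY) return "DeepSeek R1 (Fireworks)";
  if (env.NVIDIA_API_KEY) return "DeepSeek R1 (NVIDIA)";
  if (env.OPENAI_KEY) return "GPT-4o-mini (OpenAI)";
  throw new Error("No AI provider API key configured");
}
```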
To use a local LLM, comment out the other API keys and instead set `OPENAI_ENDPOINT` and `CUSTOM_MODEL`:

- Set `OPENAI_ENDPOINT` to the address of your local server (e.g. `"http://localhost:1234/v1"`)
- Set `CUSTOM_MODEL` to the name of the model loaded in your local server.
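For example, a `.env.local` for a local server on the default port mentioned above might look like this (the model name is a placeholder; use whatever model your server has loaded):

```bash
# .env.local - example local model setup (values are placeholders)
OPENAI_ENDPOINT="http://localhost:1234/v1"
CUSTOM_MODEL="your-local-model-name"
```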
NVIDIA's build.nvidia.com provides access to state-of-the-art models including:
- DeepSeek R1: Excellent reasoning capabilities, perfect for research
- Llama 3.1 405B: Most capable open-source model available
- Nemotron 70B: NVIDIA's research-optimized model
- Llama 3.1 70B: Strong general-purpose model
To get an API key:
- Visit build.nvidia.com
- Sign up for a free account
- Generate an API key
- Add it to your `.env.local` as `NVIDIA_API_KEY`
- Clone the repository
- Rename `.env.example` to `.env.local` and set your API keys
- Run `docker build -f Dockerfile`
- Run the Docker image:

```bash
docker compose up -d
```

- Execute `npm run docker` in the docker service:

```bash
docker exec -it deep-research npm run docker
```

Run the research assistant:

```bash
npm start
```

You'll be prompted to:
- Enter your research query
- Specify research breadth (recommended: 3-10, default: 4)
- Specify research depth (recommended: 1-5, default: 2)
- Answer follow-up questions to refine the research direction
The system will then:
- Generate and execute search queries
- Process and analyze search results
- Recursively explore deeper based on findings
- Generate a comprehensive markdown report
The final report will be saved as `report.md` or `answer.md` in your working directory, depending on which mode you selected.
If you have a paid version of Firecrawl or a local instance, feel free to increase the concurrency by setting the `CONCURRENCY_LIMIT` environment variable so the research runs faster.
If you are on the free version, you may sometimes run into rate-limit errors; you can reduce the limit to 1 (but the research will run a lot slower).
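The effect of such a cap can be illustrated with a minimal concurrency gate. This is only a sketch of the idea (the repo may well use a library such as p-limit internally; the names and structure here are not its actual code):

```typescript
// Minimal concurrency limiter: at most `limit` tasks run at once.
// Illustrative only; not the repo's actual implementation.
function makeLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a running task finishes and releases a slot.
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      active--;
      waiting.shift()?.(); // Wake the next queued task, if any.
    }
  };
}

// Usage sketch: cap parallel searches at CONCURRENCY_LIMIT (default 2).
const limitedRun = makeLimiter(Number(process.env.CONCURRENCY_LIMIT ?? 2));
```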
Deep research performs great on R1! You can access DeepSeek R1 through two providers:
```bash
NVIDIA_API_KEY="your_nvidia_api_key"
FIREWORKS_KEY="your_fireworks_api_key"
```

The system will automatically use Fireworks R1 if both keys are present (it has higher priority).
For other OpenAI-compatible APIs or local models, you can use these environment variables:
```bash
OPENAI_ENDPOINT="custom_endpoint"
CUSTOM_MODEL="custom_model"
```

These will take the highest priority if set.
- Initial Setup
  - Takes user query and research parameters (breadth & depth)
  - Generates follow-up questions to understand research needs better
- Deep Research Process
  - Generates multiple SERP queries based on research goals
  - Processes search results to extract key learnings
  - Generates follow-up research directions
- Recursive Exploration
  - If depth > 0, takes new research directions and continues exploration
  - Each iteration builds on previous learnings
  - Maintains context of research goals and findings
- Report Generation
  - Compiles all findings into a comprehensive markdown report
  - Includes all sources and references
  - Organizes information in a clear, readable format
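As a rough sketch of that final step, the accumulated learnings and sources map naturally onto a markdown document. The function name and layout below are hypothetical; the repo's actual report prompt and format differ:

```typescript
// Illustrative report assembly: join learnings and sources into markdown.
// Hypothetical sketch, not the repo's actual report-generation code.
function buildReport(query: string, learnings: string[], sources: string[]): string {
  return [
    `# Research Report: ${query}`,
    "",
    "## Key Learnings",
    ...learnings.map((l) => `- ${l}`),
    "",
    "## Sources",
    ...sources.map((s) => `- ${s}`),
  ].join("\n");
}
```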
MIT License - feel free to use and modify as needed.