| 1 | +--- |
| 2 | +name: gpt-researcher |
| 3 | +description: GPT Researcher is an autonomous deep research agent that conducts web and local research, producing detailed reports with citations. Use this skill when helping developers understand, extend, debug, or integrate with GPT Researcher - including adding features, understanding the architecture, working with the API, customizing research workflows, adding new retrievers, integrating MCP data sources, or troubleshooting research pipelines. |
| 4 | +--- |
| 5 | + |
| 6 | +# GPT Researcher Development Skill |
| 7 | + |
| 8 | +GPT Researcher is an LLM-based autonomous agent using a planner-executor-publisher pattern with parallelized agent work for speed and reliability. |
| 9 | + |
| 10 | +## Quick Start |
| 11 | + |
| 12 | +### Basic Python Usage |
| 13 | + |
| 14 | +```python |
| 15 | +from gpt_researcher import GPTResearcher |
| 16 | +import asyncio |
| 17 | + |
| 18 | +async def main(): |
| 19 | +    researcher = GPTResearcher( |
| 20 | +        query="What are the latest AI developments?", |
| 21 | +        report_type="research_report",  # or detailed_report, deep, outline_report |
| 22 | +        report_source="web",  # or local, hybrid |
| 23 | +    ) |
| 24 | +    await researcher.conduct_research() |
| 25 | +    report = await researcher.write_report() |
| 26 | +    print(report) |
| 27 | + |
| 28 | +asyncio.run(main()) |
| 29 | +``` |
| 30 | + |
| 31 | +### Run Servers |
| 32 | + |
| 33 | +```bash |
| 34 | +# Backend |
| 35 | +python -m uvicorn backend.server.server:app --reload --port 8000 |
| 36 | + |
| 37 | +# Frontend |
| 38 | +cd frontend/nextjs && npm install && npm run dev |
| 39 | +``` |
| 40 | + |
| 41 | +--- |
| 42 | + |
| 43 | +## Key File Locations |
| 44 | + |
| 45 | +| Need | Primary File | Key Classes | |
| 46 | +|------|--------------|-------------| |
| 47 | +| Main orchestrator | `gpt_researcher/agent.py` | `GPTResearcher` | |
| 48 | +| Research logic | `gpt_researcher/skills/researcher.py` | `ResearchConductor` | |
| 49 | +| Report writing | `gpt_researcher/skills/writer.py` | `ReportGenerator` | |
| 50 | +| All prompts | `gpt_researcher/prompts.py` | `PromptFamily` | |
| 51 | +| Configuration | `gpt_researcher/config/config.py` | `Config` | |
| 52 | +| Config defaults | `gpt_researcher/config/variables/default.py` | `DEFAULT_CONFIG` | |
| 53 | +| API server | `backend/server/app.py` | FastAPI `app` | |
| 54 | +| Search engines | `gpt_researcher/retrievers/` | Various retrievers | |
| 55 | + |
| 56 | +--- |
| 57 | + |
| 58 | +## Architecture Overview |
| 59 | + |
| 60 | +``` |
| 61 | +User Query → GPTResearcher.__init__() |
| 62 | +                  │ |
| 63 | +                  ▼ |
| 64 | +       choose_agent() → (agent_type, role_prompt) |
| 65 | +                  │ |
| 66 | +                  ▼ |
| 67 | +       ResearchConductor.conduct_research() |
| 68 | +         ├── plan_research() → sub_queries |
| 69 | +         ├── For each sub_query: |
| 70 | +         │     └── _process_sub_query() → context |
| 71 | +         └── Aggregate contexts |
| 72 | +                  │ |
| 73 | +                  ▼ |
| 74 | +       [Optional] ImageGenerator.plan_and_generate_images() |
| 75 | +                  │ |
| 76 | +                  ▼ |
| 77 | +       ReportGenerator.write_report() → Markdown report |
| 78 | +``` |
| 79 | + |
| 80 | +**For detailed architecture diagrams**: See [references/architecture.md](references/architecture.md) |
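
The plan → per-sub-query → aggregate stage of the flow above can be sketched with plain `asyncio`. This is a minimal illustration, not the project's code: `plan_research`, `process_sub_query`, and `conduct_research` here are hypothetical stand-ins that only mirror the names in the diagram, and the real methods call an LLM and retrievers.

```python
import asyncio


async def plan_research(query: str) -> list[str]:
    # Hypothetical planner: a real one would ask an LLM for sub-queries.
    return [f"{query} overview", f"{query} recent news", f"{query} criticism"]


async def process_sub_query(sub_query: str) -> str:
    # Hypothetical executor: stands in for search + scrape + summarize.
    await asyncio.sleep(0.01)  # simulate I/O-bound work
    return f"context for: {sub_query}"


async def conduct_research(query: str) -> list[str]:
    sub_queries = await plan_research(query)
    # Run all sub-queries concurrently; gather() returns results in input order.
    contexts = await asyncio.gather(*(process_sub_query(q) for q in sub_queries))
    return list(contexts)


contexts = asyncio.run(conduct_research("quantum computing"))
```

Because the sub-queries are I/O-bound, running them concurrently keeps total latency close to that of the slowest sub-query rather than the sum of all of them.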
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +## Core Patterns |
| 85 | + |
| 86 | +### Adding a New Feature (8-Step Pattern) |
| 87 | + |
| 88 | +1. **Config** → Add to `gpt_researcher/config/variables/default.py` |
| 89 | +2. **Provider** → Create in `gpt_researcher/llm_provider/my_feature/` |
| 90 | +3. **Skill** → Create in `gpt_researcher/skills/my_feature.py` |
| 91 | +4. **Agent** → Integrate in `gpt_researcher/agent.py` |
| 92 | +5. **Prompts** → Update `gpt_researcher/prompts.py` |
| 93 | +6. **WebSocket** → Events via `stream_output()` |
| 94 | +7. **Frontend** → Handle events in `useWebSocket.ts` |
| 95 | +8. **Docs** → Create `docs/docs/gpt-researcher/gptr/my_feature.md` |
| 96 | + |
| 97 | +**For complete feature addition guide with Image Generation case study**: See [references/adding-features.md](references/adding-features.md) |
| 98 | + |
| 99 | +### Adding a New Retriever |
| 100 | + |
| 101 | +```python |
| 102 | +# 1. Create: gpt_researcher/retrievers/my_retriever/my_retriever.py |
| 103 | +class MyRetriever: |
| 104 | +    def __init__(self, query: str, headers: dict = None): |
| 105 | +        self.query = query |
| 106 | + |
| 107 | +    async def search(self, max_results: int = 10) -> list[dict]: |
| 108 | +        # Return: [{"title": str, "href": str, "body": str}] |
| 109 | +        pass |
| 110 | + |
| 111 | +# 2. Register in gpt_researcher/actions/retriever.py (inside the existing match statement) |
| 112 | +case "my_retriever": |
| 113 | +    from gpt_researcher.retrievers.my_retriever import MyRetriever |
| 114 | +    return MyRetriever |
| 115 | + |
| 116 | +# 3. Export in gpt_researcher/retrievers/__init__.py |
| 117 | +``` |
| 118 | + |
| 119 | +**For complete retriever documentation**: See [references/retrievers.md](references/retrievers.md) |
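
A toy retriever that fills in the `search()` stub above may make the expected result shape concrete. This is a sketch under assumptions: `InMemoryRetriever` and its `DOCS` list are hypothetical, it searches a hard-coded list instead of calling a search API, and it is not registered with GPT Researcher.

```python
import asyncio
from typing import Optional


class InMemoryRetriever:
    """Hypothetical retriever that searches a fixed list of documents."""

    DOCS = [
        {"title": "Intro to agents", "href": "https://example.com/agents", "body": "Agents plan and act."},
        {"title": "Retriever basics", "href": "https://example.com/retrievers", "body": "Retrievers fetch search results."},
        {"title": "Agent memory", "href": "https://example.com/memory", "body": "Agents can store context."},
    ]

    def __init__(self, query: str, headers: Optional[dict] = None):
        self.query = query
        self.headers = headers or {}

    async def search(self, max_results: int = 10) -> list[dict]:
        # Case-insensitive substring match on title or body.
        needle = self.query.lower()
        hits = [
            doc for doc in self.DOCS
            if needle in doc["title"].lower() or needle in doc["body"].lower()
        ]
        # Each hit already has the {"title", "href", "body"} shape.
        return hits[:max_results]


results = asyncio.run(InMemoryRetriever("agents").search(max_results=1))
```

Returning an empty list when nothing matches, rather than raising, lets the caller treat "no results" as a normal outcome.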
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +## Configuration |
| 124 | + |
| 125 | +Config keys are **lowercased** when accessed: |
| 126 | + |
| 127 | +```python |
| 128 | +# In default.py: "SMART_LLM": "gpt-4o" |
| 129 | +# Access as: self.cfg.smart_llm # lowercase! |
| 130 | +``` |
| 131 | + |
| 132 | +Priority (highest to lowest): Environment Variables → JSON Config File → Default Values |
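
The priority order and the lowercased access can be sketched as follows. This is an illustration only, not the real `Config` class: `SketchConfig`, `DEFAULTS`, and the `SKETCH_MAX_RESULTS` key are hypothetical, and unlike a real loader this sketch does no type conversion, so values taken from the environment stay strings.

```python
import json
import os
from typing import Optional

# Hypothetical defaults; SMART_LLM mirrors the example key above.
DEFAULTS = {"SMART_LLM": "gpt-4o", "SKETCH_MAX_RESULTS": 5}


class SketchConfig:
    def __init__(self, config_path: Optional[str] = None):
        file_values = {}
        if config_path and os.path.exists(config_path):
            with open(config_path) as f:
                file_values = json.load(f)

        for key, default in DEFAULTS.items():
            # Environment variable wins, then the JSON file, then the default.
            value = os.environ.get(key, file_values.get(key, default))
            # Keys are stored under their lowercased name.
            setattr(self, key.lower(), value)


os.environ["SMART_LLM"] = "my-env-model"
cfg = SketchConfig()
```

With the environment variable set, `cfg.smart_llm` resolves to the environment value, while `cfg.sketch_max_results` falls back to the default.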
| 133 | + |
| 134 | +**For complete configuration reference**: See [references/config-reference.md](references/config-reference.md) |
| 135 | + |
| 136 | +--- |
| 137 | + |
| 138 | +## Common Integration Points |
| 139 | + |
| 140 | +### WebSocket Streaming |
| 141 | + |
| 142 | +```python |
| 143 | +class WebSocketHandler: |
| 144 | +    async def send_json(self, data): |
| 145 | +        print(f"[{data['type']}] {data.get('output', '')}") |
| 146 | + |
| 147 | +researcher = GPTResearcher(query="...", websocket=WebSocketHandler()) |
| 148 | +``` |
| 149 | + |
| 150 | +### MCP Data Sources |
| 151 | + |
| 152 | +```python |
| 153 | +researcher = GPTResearcher( |
| 154 | +    query="Open source AI projects", |
| 155 | +    mcp_configs=[{ |
| 156 | +        "name": "github", |
| 157 | +        "command": "npx", |
| 158 | +        "args": ["-y", "@modelcontextprotocol/server-github"], |
| 159 | +        "env": {"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")},  # requires `import os` |
| 160 | +    }], |
| 161 | +    mcp_strategy="deep",  # or "fast", "disabled" |
| 162 | +) |
| 163 | +``` |
| 164 | + |
| 165 | +**For MCP integration details**: See [references/mcp.md](references/mcp.md) |
| 166 | + |
| 167 | +### Deep Research Mode |
| 168 | + |
| 169 | +```python |
| 170 | +researcher = GPTResearcher( |
| 171 | +    query="Comprehensive analysis of quantum computing", |
| 172 | +    report_type="deep",  # Triggers recursive tree-like exploration |
| 173 | +) |
| 174 | +``` |
| 175 | + |
| 176 | +**For deep research configuration**: See [references/deep-research.md](references/deep-research.md) |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## Error Handling |
| 181 | + |
| 182 | +Always use graceful degradation in skills: |
| 183 | + |
| 184 | +```python |
| 185 | +async def execute(self, *args, **kwargs): |
| 186 | +    if not self.is_enabled(): |
| 187 | +        return []  # Don't crash |
| 188 | + |
| 189 | +    try: |
| 190 | +        result = await self.provider.execute(*args, **kwargs) |
| 191 | +        return result |
| 192 | +    except Exception as e: |
| 193 | +        await stream_output("logs", "error", f"⚠️ {e}", self.websocket) |
| 194 | +        return []  # Graceful degradation |
| 195 | +``` |
| 196 | + |
| 197 | +--- |
| 198 | + |
| 199 | +## Critical Gotchas |
| 200 | + |
| 201 | +| ❌ Mistake | ✅ Correct | |
| 202 | +|-----------|-----------| |
| 203 | +| `config.MY_VAR` | `config.my_var` (lowercased) | |
| 204 | +| Editing the pip-installed package | Install from source in editable mode: `pip install -e .` | |
| 205 | +| Forgetting async/await | `await` every research method (all are async) | |
| 206 | +| `websocket.send_json()` on None | Check `if websocket:` first | |
| 207 | +| Not registering retriever | Add to `retriever.py` match statement | |
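
The `websocket` gotcha can be guarded with a small helper. This is a sketch, assuming only that the handler exposes an async `send_json()` as in the WebSocket Streaming example; `safe_send` and `RecordingSocket` are hypothetical names, not part of GPT Researcher.

```python
import asyncio
from typing import Any, Optional


async def safe_send(websocket: Optional[Any], payload: dict) -> bool:
    """Send payload if a websocket is attached; return whether it was sent."""
    if websocket is None:
        return False  # No client attached: skip instead of raising AttributeError
    await websocket.send_json(payload)
    return True


class RecordingSocket:
    """Hypothetical stand-in for a real websocket; records what was sent."""

    def __init__(self):
        self.sent = []

    async def send_json(self, data):
        self.sent.append(data)


sock = RecordingSocket()
sent_with_socket = asyncio.run(safe_send(sock, {"type": "logs", "output": "hi"}))
sent_without_socket = asyncio.run(safe_send(None, {"type": "logs", "output": "hi"}))
```

The same code path then works for both server runs (with a client attached) and script runs (with `websocket=None`).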
| 208 | + |
| 209 | +--- |
| 210 | + |
| 211 | +## Reference Documentation |
| 212 | + |
| 213 | +| Topic | File | |
| 214 | +|-------|------| |
| 215 | +| System architecture & diagrams | [references/architecture.md](references/architecture.md) | |
| 216 | +| Core components & signatures | [references/components.md](references/components.md) | |
| 217 | +| Research flow & data flow | [references/flows.md](references/flows.md) | |
| 218 | +| Prompt system | [references/prompts.md](references/prompts.md) | |
| 219 | +| Retriever system | [references/retrievers.md](references/retrievers.md) | |
| 220 | +| MCP integration | [references/mcp.md](references/mcp.md) | |
| 221 | +| Deep research mode | [references/deep-research.md](references/deep-research.md) | |
| 222 | +| Multi-agent system | [references/multi-agents.md](references/multi-agents.md) | |
| 223 | +| Adding features guide | [references/adding-features.md](references/adding-features.md) | |
| 224 | +| Advanced patterns | [references/advanced-patterns.md](references/advanced-patterns.md) | |
| 225 | +| REST & WebSocket API | [references/api-reference.md](references/api-reference.md) | |
| 226 | +| Configuration variables | [references/config-reference.md](references/config-reference.md) | |