docs/troubleshooting/serper-403-error.md (new file, +205 lines)
# SERPER API 403 Forbidden Error - Troubleshooting Guide

## What is the Error?

The error you're seeing is:
```
error="403, message='Forbidden', url='https://google.serper.dev/search'"
Serper API request failed: 403, message='Forbidden'
```

A **403 Forbidden** HTTP status code from the Serper API means your request was rejected due to authentication/authorization issues.

## Root Causes

### 1. **Invalid or Missing API Key** (Most Common)
- The `SERPER_API_KEY` environment variable is either:
  - Not set
  - Set to an invalid or expired key
  - Set to a value with extra whitespace or formatting issues
  - Set to a value containing a typo

### 2. **API Key Permissions**
- The API key doesn't have permission to access the Serper API
- The key might be for a different Serper service/endpoint

### 3. **Account/Billing Issues**
- Your Serper account might be:
  - Suspended
  - Over its quota/limit
  - Blocked by a billing issue (e.g., a failed payment)
  - Expired

### 4. **API Key Format Issues**
- The key might be malformed
- Missing characters
- Wrong key type (e.g., using a test key in production)

## How to Diagnose

### Step 1: Verify API Key is Set

Check if the environment variable is set:

**Linux/Mac:**
```bash
echo $SERPER_API_KEY
```

**Windows (PowerShell):**
```powershell
$env:SERPER_API_KEY
```

**Windows (CMD):**
```cmd
echo %SERPER_API_KEY%
```

### Step 2: Test the API Key Directly

Test your Serper API key using curl:

```bash
curl -X POST https://google.serper.dev/search \
-H "X-API-KEY: YOUR_API_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{"q": "test query"}'
```

If you get a 403, the key is invalid (or your credits are exhausted); a 200 means the key works.

### Step 3: Check Serper Dashboard

1. Go to [Serper.dev Dashboard](https://serper.dev/dashboard)
2. Log in to your account
3. Check:
- API key status
- Usage/quota limits
- Billing status
- Account status

## Solutions

### Solution 1: Get a Valid Serper API Key

1. **Sign up for Serper:**
- Visit [serper.dev](https://serper.dev)
- Create an account or log in

2. **Get your API key:**
- Go to Dashboard → API Keys
- Copy your API key

3. **Set the environment variable:**

**Linux/Mac:**
```bash
export SERPER_API_KEY="your_api_key_here"
```

**Windows (PowerShell):**
```powershell
$env:SERPER_API_KEY = "your_api_key_here"
```

**Windows (CMD):**
```cmd
set SERPER_API_KEY=your_api_key_here
```

**For Hugging Face Spaces:**
- Go to your Space settings
- Add `SERPER_API_KEY` as a secret/environment variable

4. **Verify it's set correctly:**
```bash
# Check for whitespace issues
echo "|$SERPER_API_KEY|" # Should show key between pipes with no extra spaces
```
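The same checks can be scripted. A minimal sketch, assuming only the `SERPER_API_KEY` variable name from this guide; the helper name and the strip-and-warn behavior are illustrative, not part of the project:

```python
import os


def get_serper_key() -> str:
    """Return a cleaned SERPER_API_KEY, raising if it is missing or empty.

    Illustrative helper: only the environment variable name comes from
    this guide; the validation rules are an assumption.
    """
    raw = os.environ.get("SERPER_API_KEY")
    if raw is None:
        raise RuntimeError("SERPER_API_KEY is not set")
    key = raw.strip()
    if not key:
        raise RuntimeError("SERPER_API_KEY is empty or whitespace-only")
    if key != raw:
        # Whitespace in the key is a common cause of 403s
        print("warning: SERPER_API_KEY had surrounding whitespace; stripped it")
    return key
```

Running this at startup surfaces the whitespace and missing-key cases before any search request is made.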

### Solution 2: Use a Different Search Provider (Temporary Fix)

If you can't fix the Serper API key immediately, switch to DuckDuckGo (free, no API key required):

**Set environment variable:**
```bash
export WEB_SEARCH_PROVIDER="duckduckgo"
```

Or in your `.env` file:
```env
WEB_SEARCH_PROVIDER=duckduckgo
```

**Note:** DuckDuckGo provides lower quality results (snippets only, no full content scraping) but will work without an API key.

### Solution 3: Check Account Status

1. Log into [Serper Dashboard](https://serper.dev/dashboard)
2. Verify:
- Account is active
- No billing issues
- Quota not exceeded
- API key is not revoked

### Solution 4: Regenerate API Key

If your key might be compromised or invalid:

1. Go to Serper Dashboard
2. Revoke the old key
3. Generate a new API key
4. Update your environment variable

## Code Improvements (Optional)

The current code doesn't handle 403 errors specifically. Here's what could be improved:

### Current Behavior
- 403 errors are treated as generic `SearchError`
- The system retries 3 times, which is wasteful: a 403 won't fix itself
- No automatic fallback to other providers

### Recommended Improvements

1. **Detect 403 specifically** and treat as configuration error (not transient)
2. **Disable Serper** after 403 and fall back to DuckDuckGo
3. **Log clearer error messages** with troubleshooting hints
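Improvements 1 and 2 can be sketched as follows. The class and method names (`FallbackSearch`, `.search()`) are hypothetical, not the project's actual API:

```python
class SearchConfigError(Exception):
    """Non-transient failure (e.g. a 403 from a bad key): retrying won't help."""


class FallbackSearch:
    """Try a primary provider; on a configuration error, disable it and
    use the fallback provider for the rest of the session."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.primary_disabled = False

    def search(self, query: str) -> list[str]:
        if not self.primary_disabled:
            try:
                return self.primary.search(query)
            except SearchConfigError:
                # e.g. Serper returned 403: stop calling it this session
                self.primary_disabled = True
        return self.fallback.search(query)
```

With Serper as the primary and DuckDuckGo as the fallback, one 403 disables Serper for the session instead of triggering three pointless retries per query.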

## Quick Fix for Your Current Issue

**Immediate workaround:**

1. **Remove or comment out SERPER_API_KEY:**
```bash
unset SERPER_API_KEY
```

2. **Set provider to DuckDuckGo:**
```bash
export WEB_SEARCH_PROVIDER="duckduckgo"
```

3. **Restart your application**

This will use DuckDuckGo instead of Serper, allowing your research to continue (though with lower quality results).

## Prevention

1. **Validate API keys at startup** - Check if key works before using it
2. **Use environment variable validation** - Ensure key format is correct
3. **Monitor API usage** - Set up alerts for quota limits
4. **Have fallback providers** - Always have DuckDuckGo as backup
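Points 1, 2, and 4 can be combined into a single startup check. A sketch, assuming only the environment variable names used in this guide; the function name and messages are illustrative:

```python
import os


def startup_config_problems() -> list[str]:
    """Collect search-config problems at startup (illustrative sketch;
    the checks and messages are assumptions, not the project's code)."""
    problems: list[str] = []
    provider = os.environ.get("WEB_SEARCH_PROVIDER", "auto")
    key = os.environ.get("SERPER_API_KEY", "")
    if provider in ("auto", "serper") and not key.strip():
        problems.append("SERPER_API_KEY missing; Serper searches will 403")
    if key and key != key.strip():
        problems.append("SERPER_API_KEY has surrounding whitespace")
    return problems
```

Logging (or refusing to start on) a non-empty problem list catches bad configuration before the first search fails.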

## Summary

- **403 Forbidden** = Invalid/missing API key or account issues
- **Fix:** Get valid Serper API key from [serper.dev](https://serper.dev)
- **Workaround:** Use `WEB_SEARCH_PROVIDER=duckduckgo` for free search
- **Test:** Use curl to verify your API key works
- **Check:** Serper dashboard for account/billing status




docs/troubleshooting/serper-free-tier-optimization.md (new file, +143 lines)
# Serper Free Tier Optimization

## Problem

Free Serper API keys have limited credits:
- **2,500 credits** (one-time, expire after 6 months)
- **100 requests/second** rate limit
- Each successful API query consumes 1 credit (or 2 if requesting >10 results)

When credits are exhausted, Serper returns **403 Forbidden** errors. The application was treating all 403 errors as invalid keys and failing immediately, without retries.

## Solution

We've implemented several optimizations to better handle free tier quotas:

### 1. Proper Rate Limiting

**Before:** 10 requests/second (below free tier limit but not optimized)

**After:** 90 requests/second (safely under 100/second free tier limit)

```python
# src/tools/rate_limiter.py
def get_serper_limiter(api_key: str | None = None) -> RateLimiter:
# Free tier: 90/second (safely under 100/second limit)
return RateLimiterFactory.get("serper", "90/second")
```

This ensures:
- Stays safely under 100 requests/second free tier limit
- Allows high throughput when needed
- Credits are the limiting factor (2,500 total), not rate

### 2. Jitter for Request Spreading

Added random jitter (0-1 second) after acquiring rate limit permission:

```python
# src/tools/rate_limiter.py
async def acquire(self, wait: bool = True, jitter: bool = False) -> bool:
if self._limiter.hit(self._rate_limit, self._identity):
if jitter:
# Add 0-1 second random jitter
jitter_seconds = random.uniform(0, 1.0)
await asyncio.sleep(jitter_seconds)
return True
```

**Benefits:**
- Prevents a "thundering herd": multiple parallel requests hitting at once
- Spreads load slightly to avoid bursts
- Minimal delay (max 1 second)

### 3. Retry Logic for Credit Exhaustion

**Before:** 403 errors → `ConfigurationError` → No retries

**After:** 403 errors → `RateLimitError` → Retry with exponential backoff

```python
# src/tools/vendored/serper_client.py
if response.status == 403:
# Treat as credit exhaustion (retryable) not invalid key
raise RateLimitError("Serper API credits may be exhausted...")
```

```python
# src/tools/serper_web_search.py
@retry(
    stop=stop_after_attempt(5),  # 5 attempts total (4 retries)
wait=wait_random_exponential(
multiplier=2, min=5, max=120, exp_base=2
), # 5s to 120s backoff with jitter
)
```

**Retry Schedule:**
- Attempt 1: Immediate
- Attempt 2: Wait 5-10s (with jitter)
- Attempt 3: Wait 10-20s (with jitter)
- Attempt 4: Wait 20-40s (with jitter)
- Attempt 5: Wait 40-80s (with jitter)

### 4. Better Error Messages

The system now distinguishes between:
- **Invalid API key** (ConfigurationError) - No retries, immediate failure
- **Quota exhaustion** (RateLimitError) - Retries with backoff
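A standalone sketch of this classification logic; the exception names mirror those quoted above, but the `key_present` heuristic is an assumption, not the project's actual code:

```python
class ConfigurationError(Exception):
    """Invalid or missing key: fail fast, no retries."""


class RateLimitError(Exception):
    """Quota/credit exhaustion or rate limiting: retry with backoff."""


def raise_for_serper_status(status: int, key_present: bool) -> None:
    """Map a Serper HTTP status to the right exception class (sketch)."""
    if status == 403 and not key_present:
        raise ConfigurationError("SERPER_API_KEY is not set")
    if status in (403, 429):
        raise RateLimitError("Serper credits may be exhausted or rate limited")
```

The retry decorator then only retries `RateLimitError`, so a missing key still fails immediately while a plausibly-exhausted quota gets the backoff schedule.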

## Usage

The optimizations are **automatic** - no configuration needed. The system will:

1. **Rate limit** to 90 requests/second (under the free tier cap)
2. **Add jitter** to spread requests
3. **Retry on 403** with exponential backoff
4. **Log clearly** when quota is exhausted

## Monitoring

Watch for these log messages:

```
Serper API returned 403 Forbidden
May be quota exhaustion (free tier) or invalid key
Retrying with backoff...
```

If you see repeated 403 errors even after retries, your credits are likely exhausted (they are one-time, not renewed daily or monthly).

## Recommendations for Free Tier

1. **Monitor usage** at [serper.dev/dashboard](https://serper.dev/dashboard)
2. **Use DuckDuckGo fallback** when quota exhausted:
```bash
export WEB_SEARCH_PROVIDER=duckduckgo
```
3. **Consider paid tier** if you need more requests
4. **Batch requests** - the rate limiter will automatically space them out

## Configuration

To adjust rate limiting (not recommended for free tier):

```python
# In src/tools/rate_limiter.py
# Change from the current free-tier setting:
return RateLimiterFactory.get("serper", "90/second")

# To, e.g., a more conservative limit:
return RateLimiterFactory.get("serper", "10/second")
```

## Summary

✅ **Proper rate limiting** (90 req/s, under 100/s limit)
✅ **Jitter** to spread requests (0-1s)
✅ **Retry logic** for credit exhaustion
✅ **Better error handling**
✅ **Automatic** - no config needed

**Note:** Free tier provides 2,500 credits (one-time, expire after 6 months). Monitor usage at [serper.dev/dashboard](https://serper.dev/dashboard). Once credits are exhausted, you'll need to upgrade to a paid plan or use DuckDuckGo fallback.

src/app.py (+19 lines)

@@ -81,12 +81,31 @@ def configure_orchestrator(
Returns:
Tuple of (orchestrator, backend_info_string)
"""
from src.tools.clinicaltrials import ClinicalTrialsTool
from src.tools.europepmc import EuropePMCTool
from src.tools.neo4j_search import Neo4jSearchTool
from src.tools.pubmed import PubMedTool
from src.tools.search_handler import SearchHandler
from src.tools.web_search_factory import create_web_search_tool

# Create search handler with tools
tools = []

# Add biomedical search tools (always available, no API keys required)
tools.append(PubMedTool())
logger.info("PubMed tool added to search handler")

tools.append(ClinicalTrialsTool())
logger.info("ClinicalTrials tool added to search handler")

tools.append(EuropePMCTool())
logger.info("EuropePMC tool added to search handler")

# Add Neo4j knowledge graph search tool (if Neo4j is configured)
neo4j_tool = Neo4jSearchTool()
tools.append(neo4j_tool)
logger.info("Neo4j search tool added to search handler")

# Add web search tool
web_search_tool = create_web_search_tool(provider=web_search_provider or "auto")
if web_search_tool:
src/orchestrator/research_flow.py (1 change)

@@ -522,7 +522,7 @@ def _get_rag_service(self) -> LlamaIndexRAGService | None:
         """
         if self._rag_service is None:
             try:
-                self._rag_service = get_rag_service()
+                self._rag_service = get_rag_service(oauth_token=self.oauth_token)
                 self.logger.info("RAG service initialized for research flow")
             except (ConfigurationError, ImportError) as e:
                 self.logger.warning(