Initial Checks
- I confirm that I'm using the latest version of Pydantic AI
- I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
Description
Summary
The FallbackModel documentation doesn't mention that underlying provider SDKs (like OpenAI SDK) (might) have built-in retry logic that can significantly delay or prevent the fallback model from being triggered. This led to unexpected behavior where rate limit errors (429) would retry for up to 60 seconds before the fallback activated, rather than immediately switching models.
Current Documentation
The current documentation states:
"By default, the FallbackModel only moves on to the next model if the current model raises a ModelHTTPError. You can customize this behavior by passing a custom `fallback_on` argument to the `FallbackModel` constructor."
And shows this example:

```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.openai import OpenAIChatModel

openai_model = OpenAIChatModel('gpt-4o')
anthropic_model = AnthropicModel('claude-3-5-sonnet-latest')
fallback_model = FallbackModel(openai_model, anthropic_model)

agent = Agent(fallback_model)
response = agent.run_sync('What is the capital of France?')
```

The Problem
When following this example, users may experience:
- Rate limit errors (429) that retry for extended periods instead of immediately falling back to the secondary model
- Console output showing retry attempts: `{"event": "Retrying request to /chat/completions in 60.000000 seconds"}`
- Delayed fallback activation due to the OpenAI SDK's default `max_retries=2` behavior
This happens because:
- The OpenAI SDK has `DEFAULT_MAX_RETRIES = 2` built in
- On 429 errors, it respects the `Retry-After` header (up to 60 seconds)
- These retries happen before the `FallbackModel` ever sees the error
- The `FallbackModel` only activates after all SDK-level retries are exhausted
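This layering can be illustrated with a small, self-contained simulation. It is a toy model of the interaction, not pydantic-ai code; names like `make_sdk_call` and `run_with_fallback` are invented for illustration:

```python
class RateLimited(Exception):
    """Stands in for an HTTP 429 surfacing from the SDK."""

def make_sdk_call(name, max_retries, log):
    """Simulated provider SDK: every request hits a 429, and the SDK
    retries internally before the error ever escapes."""
    def call():
        for attempt in range(max_retries + 1):
            log.append((name, attempt))  # one network request per attempt
        raise RateLimited(name)  # surfaced only after retries are exhausted
    return call

def make_ok_call(name, log):
    """Simulated healthy provider that succeeds on the first request."""
    def call():
        log.append((name, 0))
        return f'{name} ok'
    return call

def run_with_fallback(calls):
    """Toy stand-in for FallbackModel: try each model in order."""
    for call in calls:
        try:
            return call()
        except RateLimited:
            continue
    raise RateLimited('all models exhausted')

log = []
result = run_with_fallback([
    make_sdk_call('primary', max_retries=2, log=log),  # OpenAI SDK default
    make_ok_call('fallback', log),
])
# the primary is attempted 3 times before the fallback ever runs
```

In the real case each of those primary attempts can also wait out a `Retry-After` of up to 60 seconds, which is why the fallback appears "stuck".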
Expected Behavior
Users expect that when using FallbackModel, rate limit errors will immediately trigger a fallback to the secondary model rather than waiting through multiple retry attempts.
Solution
The issue can be resolved by configuring the provider client to disable retries (not an elegant solution, but it gets the job done):
```python
import openai

from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.azure import AzureProvider

# Create an Azure OpenAI client with retries disabled
openai_client = openai.AsyncAzureOpenAI(
    api_key=settings.OPENAI_API_KEY,
    azure_endpoint=settings.OPENAI_API_BASE,
    api_version=settings.OPENAI_API_VERSION,
    max_retries=0,  # Critical: disable SDK-level retries
)

openai_model = OpenAIChatModel(
    'gpt-4o',
    provider=AzureProvider(openai_client=openai_client),
)
anthropic_model = AnthropicModel('claude-3-5-sonnet-latest')
fallback_model = FallbackModel(openai_model, anthropic_model)
```

Suggested Pydantic-AI Docs Improvements
I suggest adding a section to the FallbackModel documentation that covers:
- Provider SDK Retry Behavior: mention that provider SDKs often have built-in retry logic
- Disabling Retries for Immediate Fallback: show how to configure `max_retries=0` for common providers
- Rate Limit Handling: explain that rate limits may cause retries for extended periods if not configured properly
- Best Practices: recommend disabling provider-level retries when using `FallbackModel` to ensure immediate fallback (if that's the expected behavior)
Related
This affects all provider integrations that wrap SDKs with built-in retry logic, not just OpenAI. Consider adding similar guidance for other providers in their respective documentation pages.
References
Example Code
Python, Pydantic AI & LLM client version
"pydantic-ai==1.1.0"
"pydantic==2.12.3"