
FallbackModel and Provider/Client SDK Retry Behavior might be conflicting (no hint in pydantic-ai docs) #3267

@LysanderKie

Description

Summary

The FallbackModel documentation doesn't mention that underlying provider SDKs (such as the OpenAI SDK) may have built-in retry logic that can significantly delay or prevent the fallback model from being triggered. This leads to unexpected behavior: rate limit errors (429) can be retried for up to 60 seconds before the fallback activates, rather than the FallbackModel immediately switching models.

Current Documentation

The current documentation states:

"By default, the FallbackModel only moves on to the next model if the current model raises a ModelHTTPError. You can customize this behavior by passing a custom fallback_on argument to the FallbackModel constructor."

And shows this example:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.openai import OpenAIChatModel

openai_model = OpenAIChatModel('gpt-4o')
anthropic_model = AnthropicModel('claude-3-5-sonnet-latest')
fallback_model = FallbackModel(openai_model, anthropic_model)

agent = Agent(fallback_model)
response = agent.run_sync('What is the capital of France?')
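For context, the `fallback_on` argument mentioned in the docs can take a predicate. A minimal sketch of a predicate that falls back only on rate limits might look like this (using a stand-in exception class so the sketch runs without pydantic-ai; the real `pydantic_ai.exceptions.ModelHTTPError` also carries a `status_code` attribute):

```python
class ModelHTTPError(Exception):
    """Stand-in for pydantic_ai.exceptions.ModelHTTPError."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def fallback_on_rate_limit(exc: Exception) -> bool:
    """Trigger fallback only for rate-limit (429) responses."""
    return isinstance(exc, ModelHTTPError) and exc.status_code == 429


# With the real library this would be passed as:
#   FallbackModel(openai_model, anthropic_model, fallback_on=fallback_on_rate_limit)
print(fallback_on_rate_limit(ModelHTTPError(429)))  # True
print(fallback_on_rate_limit(ModelHTTPError(500)))  # False
```

Note that even with such a predicate, the predicate only runs once the exception reaches pydantic-ai, which is exactly what SDK-level retries delay.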

The Problem

When following this example, users may experience:

  1. Rate limit errors (429) retry for extended periods instead of immediately falling back to the secondary model
  2. Console output showing retry attempts: {"event": "Retrying request to /chat/completions in 60.000000 seconds"}
  3. Delayed fallback activation due to the OpenAI SDK's default max_retries=2 behavior

This happens because:

  • The OpenAI SDK has DEFAULT_MAX_RETRIES = 2 built-in
  • On 429 errors, it respects the Retry-After header (up to 60 seconds)
  • These retries happen before the FallbackModel ever sees the error
  • The FallbackModel only activates after all SDK-level retries are exhausted

Expected Behavior

Users expect that when using FallbackModel, a rate limit error immediately triggers fallback to the secondary model rather than waiting through multiple retry attempts.

Solution

The issue can be worked around by configuring the provider client to disable retries (not an elegant solution, but it gets the job done):

import openai

from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.models.fallback import FallbackModel
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.azure import AzureProvider

# Create an Azure OpenAI client with SDK-level retries disabled
openai_client = openai.AsyncAzureOpenAI(
    api_key=settings.OPENAI_API_KEY,
    azure_endpoint=settings.OPENAI_API_BASE,
    api_version=settings.OPENAI_API_VERSION,
    max_retries=0,  # Critical: disable SDK-level retries
)

openai_model = OpenAIChatModel(
    'gpt-4o',
    provider=AzureProvider(openai_client=openai_client),
)

anthropic_model = AnthropicModel('claude-3-5-sonnet-latest')
fallback_model = FallbackModel(openai_model, anthropic_model)

Suggested Pydantic-AI Docs Improvements

I suggest adding a section to the FallbackModel documentation that covers:

  1. Provider SDK Retry Behavior: Mention that provider SDKs often have built-in retry logic
  2. Disabling Retries for Immediate Fallback: Show how to configure max_retries=0 for common providers
  3. Rate Limit Handling: Explain that rate limits may retry for extended periods if not configured properly
  4. Best Practices: Recommend disabling provider-level retries when using FallbackModel to ensure immediate fallback (if that's the expected behavior).

Related

This affects all provider integrations that wrap SDKs with built-in retry logic, not just OpenAI. Consider adding similar guidance for other providers in their respective documentation pages.

Python, Pydantic AI & LLM client version

"pydantic-ai==1.1.0"
"pydantic==2.12.3"
