Refactored retry config into _retry.py and added support for exponential backoff and Retry-After header #871


Open
jonathanedey wants to merge 4 commits into fcm-http2

Conversation

@jonathanedey jonathanedey (Contributor) commented Apr 14, 2025

Adds more retry logic support

This PR covers:

  • Moving and refactoring HttpxRetryTransport to _retry.py
  • Adding new class HttpxRetry to manage retry state
  • Adding support for setting some retry configurations on HttpxRetryTransport creation
  • Performing exponential backoff before request retries
  • Respecting Retry-After headers from request responses
  • Unit tests for HttpxRetry and HttpxRetryTransport
  • Unit tests for handling HTTPX request errors
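
For context, here is a minimal sketch of the overall flow these changes describe: send the request, decide whether the outcome is retryable, record the attempt, back off, and try again. HttpxRetry and HttpxRetryTransport are the classes named in this PR, but the function below, its signature, and helpers like is_retryable_response() and is_exhausted() are illustrative assumptions rather than the final implementation:

import time
import httpx

def dispatch_with_retry(client: httpx.Client, request: httpx.Request, retry) -> httpx.Response:
    # `retry` is assumed to be an HttpxRetry-like object holding the retry state.
    while True:
        response = error = None
        try:
            response = client.send(request)
        except httpx.RequestError as exc:
            error = exc
        # A non-retryable outcome ends the loop immediately.
        if response is not None and not retry.is_retryable_response(response):
            return response
        if error is not None and not retry.is_retryable_error(error):
            raise error
        # is_exhausted() is a hypothetical helper checking the remaining retry budget.
        if retry.is_exhausted():
            break
        retry.increment(request, response, error)
        # Exponential backoff, or the server's Retry-After value when present.
        time.sleep(retry.get_backoff_time())
    if response is not None:
        return response
    if error is not None:
        raise error
    raise AssertionError('dispatch_with_retry() ended with no response or exception')

In the PR itself this logic lives on HttpxRetryTransport (see _dispatch_with_retry in the quoted diff below), so it runs transparently for every request sent through the transport rather than as a standalone function.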

@jonathanedey jonathanedey added the release:stage Stage a release candidate label Apr 14, 2025

@lahirumaramba lahirumaramba (Member) left a comment

Thanks Jonathan! Overall looking good! Added a few comments!

class HttpxRetry:
"""HTTPX based retry config"""
# TODO: Decide
# urllib3.Retry ignores the status_forcelist when respecting Retry-After header

Member:
Does this mean urllib3.Retry retries other status codes (that are not in status_forcelist) when they have Retry-After headers?

Contributor Author:
I think I worded that comment poorly so you can disregard it.

What I wanted to point out is that in urllib3.Retry there are no default status codes that get retried when status_forcelist is not set and respect_retry_after_header is not enabled. The RETRY_AFTER_STATUS_CODES are only applied when respect_retry_after_header is set.

I decided to do the same in this implementation but still wanted to highlight it because the urllib3 docs were misleading and implied these are used as status_forcelist defaults.
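
To make that choice concrete, here is a rough sketch of the retryability check described above. The constant mirrors urllib3's RETRY_AFTER_STATUS_CODES (413, 429, 503); the function and attribute names are assumptions for illustration, not the PR's exact code.

import httpx

# Only consulted when Retry-After headers are respected; never used as
# implicit status_forcelist defaults (the behaviour chosen here, as in urllib3).
RETRY_AFTER_STATUS_CODES = frozenset([413, 429, 503])

def is_retryable_response(retry, response: httpx.Response) -> bool:
    # Explicitly configured status codes are always retryable.
    if retry.status_forcelist and response.status_code in retry.status_forcelist:
        return True
    # Otherwise retry only when we respect Retry-After, the header is present,
    # and the status code is in the Retry-After set.
    return bool(
        retry.respect_retry_after_header
        and 'Retry-After' in response.headers
        and response.status_code in RETRY_AFTER_STATUS_CODES
    )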

return False

# Placeholder for exception retrying
def is_retryable_error(self, error: Exception):

Member:
It is best not to leave placeholders in PRs if we can. If this is currently a no-op, we can remove it and add it back when we need it.

Contributor Author:
Sounds good, removed!

self.history.append((request, response, error))


# TODO: Remove comments

Member:
You can remove these, or add a TODO comment noting what it doesn't cover for the future. I'd prefer we use a bug/issue to track these instead of TODO comments when possible.

Contributor Author:
Done. These TODOs were mainly markers so I wouldn't miss removing some thought-process comments before the final merge.

Moving the missing-feature notes to bugs.


def __init__(
self,
status: int = 10,

Member:
Would it make sense to call this max_retries instead?

Contributor Author (May 14, 2025):
In the current state, yes, because status retries are the only retries we have. This was more of a future-proofing decision for when we add error retries, where we would have total, status, and error counters.

Similar to the placeholder comment, I think we can use max_retries (and retries_left internally, since we decrease this value) and add the other options later.
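
Under that plan, the constructor would look roughly like this. The defaults shown are illustrative: the count of 10 appears in the quoted diff, and the backoff values match those used in the tests; everything else is an assumption sketched for illustration, not the PR's final code.

class HttpxRetry:
    def __init__(
        self,
        max_retries: int = 10,                 # renamed from `status`
        status_forcelist=None,
        backoff_factor: float = 0.5,
        backoff_max: float = 10.0,
        respect_retry_after_header: bool = True,
    ) -> None:
        # Internal countdown, decremented by increment() on each retried attempt.
        self.retries_left = max_retries
        self.status_forcelist = status_forcelist
        self.backoff_factor = backoff_factor
        self.backoff_max = backoff_max
        self.respect_retry_after_header = respect_retry_after_header
        self.history = []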

response, error = None, None

try:
logger.debug('Sending request in _dispatch_with_retry(): %r', request)

Member:
Do we plan to keep these logs in the production code?

Contributor Author:
I would want to keep these for future debugging. This would be helpful to catch issues as we iterate. Wdyt?

if error and not retry.is_retryable_error(error):
raise error

retry.increment(request, response)

Member:
Should we pass the error here? retry.increment(request, response, error)

Contributor Author:
Yes, good catch! Fixed

return response
if error:
raise error
raise Exception('_dispatch_with_retry() ended with no response or exception')

Member:
Let's narrow this down to a more specific exception type; RuntimeError or AssertionError might work better here.

Contributor Author:
Let's go with AssertionError since this case should never be reached if the logic above is correct.


logger = logging.getLogger(__name__)

Member:
Is there a way to set the log level in production code?

Contributor Author:
Developers should be able to set the logging level in their production apps by using the following code:

import logging

logging.basicConfig()
firebase_admin_logger = logging.getLogger('firebase_admin')
firebase_admin_logger.setLevel(logging.DEBUG)

# Simulate attempt 6 completed
retry.increment(self._TEST_REQUEST, response)
# History len 6, attempt 7 -> 0.5*(2^4) = 10.0
assert retry.get_backoff_time() == pytest.approx(10.0)

Member:
Should this be 0.5 * (2^5) = 16.0, clamped at 10 due to backoff_max? If that's the case, let's clarify it here :)

Contributor Author:
Yeah that should be the correct calculation, fixed!
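
For clarity, this is the calculation under discussion, as a hedged sketch of what get_backoff_time() computes (the method would live on HttpxRetry; the exact attribute names are assumptions):

def get_backoff_time(self) -> float:
    # backoff_factor * 2^(attempts - 1), clamped to backoff_max.
    attempts = len(self.history)
    backoff = self.backoff_factor * (2 ** (attempts - 1))
    return min(self.backoff_max, backoff)

With backoff_factor=0.5 and backoff_max=10.0, six recorded attempts give 0.5 * 2^5 = 16.0, which is then clamped to the 10.0 the test asserts.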

self.status_forcelist = status_forcelist
self.backoff_factor = backoff_factor
self.backoff_max = backoff_max
self.raise_on_status = raise_on_status

Member:
If raise_on_status is unused we should just remove it

Contributor Author:
Done!
