Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2642,7 +2642,7 @@ def create_gzip_decoder(
return CompositeRawDecoder.by_headers(
[({"Content-Encoding", "Content-Type"}, _compressed_response_types, gzip_parser)],
stream_response=True,
fallback_parser=gzip_parser.inner_parser,
fallback_parser=gzip_parser,
)
Comment on lines 2642 to 2646
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find GzipParser class and its parse method
rg -n 'class GzipParser' -A 30 --type py

Repository: airbytehq/airbyte-python-cdk

Length of output: 3227


🏁 Script executed:

# Find CompositeRawDecoder class and by_headers method
rg -n 'class CompositeRawDecoder' -A 50 --type py

Repository: airbytehq/airbyte-python-cdk

Length of output: 5560


🏁 Script executed:

# Check the by_headers implementation more thoroughly
rg -n 'def by_headers' -B 5 -A 40 --type py

Repository: airbytehq/airbyte-python-cdk

Length of output: 5320


🏁 Script executed:

cat -n airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py | sed -n '34,60p'

Repository: airbytehq/airbyte-python-cdk

Length of output: 1183


Fix looks correct — but found a pre-existing issue in GzipParser worth addressing.

The change properly ensures that when GzipDecoder is explicitly configured, decompression is always attempted regardless of response headers. Logic is sound: fallback is now the full gzip_parser instead of gzip_parser.inner_parser.

One thing flagged during verification:

GzipParser.parse() has a docstring-implementation mismatch. The docstring claims "If the data is not gzipped, reset the pointer and pass the data to the inner parser as is" (lines 38-39 in composite_raw_decoder.py), but the actual implementation (lines 45-46) just calls gzip.GzipFile(fileobj=data, mode="rb") with no error handling or pointer reset. When non-gzip data hits this, it will raise OSError (BadGzipFile) on first read, not gracefully fallback.

This is pre-existing, but your change makes it more likely to surface: with fallback_parser=gzip_parser, non-gzip data now goes through GzipParser.parse() instead of bypassing it. If a user configures GzipDecoder against an API returning plain (non-gzip) responses, they'll hit this hard crash.

Could you either:

  • Fix GzipParser.parse() to actually reset the pointer and pass plain data through (matching the docstring), or
  • Update the docstring to reflect that it will raise OSError on non-gzip input?

Pagination support is fine: CompositeRawDecoder.by_headers() returns cls(fallback_parser, ...) which sets self.parser = fallback_parser, so the decoder's parser property will correctly expose gzip_parser with its inner_parser. The pagination check should pass as expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` around
lines 2642 - 2646, GzipParser.parse has a docstring that promises graceful
fallback to the inner parser for non-gzipped data but the implementation
directly wraps the input in gzip.GzipFile without handling BadGzipFile; change
GzipParser.parse to try reading/decompressing via gzip.GzipFile (or a small read
to trigger format validation) inside a try/except catching OSError/BadGzipFile,
and on exception reset the input pointer (call data.seek(0) or reopen the
original bytes stream) then call and return self.inner_parser.parse(data) so
non-gzip payloads are passed through as documented; keep raising other
unexpected exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the docstring-implementation mismatch in GzipParser.parse() — that's a real pre-existing issue.

However, I'm keeping this PR scoped to the minimal fix (changing the fallback parser) rather than expanding into GzipParser changes, for a few reasons:

  1. The scenario this PR fixes: the user explicitly selects GzipDecoder because the API always returns gzip — so the fallback hitting GzipParser is correct behavior and will decompress successfully.
  2. The edge case you describe (API conditionally returning non-gzip while GzipDecoder is selected) is a separate concern that existed before this PR — the previous by_headers path already used GzipParser when headers matched, so a response without gzip encoding was already handled by the fallback (which before just skipped decompression silently, potentially producing garbage).
  3. Fixing GzipParser.parse() to gracefully handle non-gzip input is a good improvement but would change behavior for all GzipParser consumers and warrants its own review.

I'd suggest either fixing the GzipParser docstring or implementing the graceful fallback as a follow-up. Happy to defer to the human reviewer on whether they'd prefer it bundled here.


Devin session


@staticmethod
Expand Down
Loading