Fix broken companies/scroll pagination #69151
Jacobo Blasco (iacobus) wants to merge 2 commits into airbytehq:master
The companies/scroll API actually provides the same token in every page, so the previous mechanism of stopping pagination when the same token was observed was always limiting results to 2 pages (200 records). Instead, rely on receiving an empty `data` array, which is what occurs when the previous page was the last one. Doing that actually exposed a new issue. Every page now is requested with exactly the same URL. It appears that the HttpRequester instance is created with use_cache=True, so starting on page 3, every response is returned from cache, with always the same records from page 2, which creates an infinite loop due to never receiving an empty array. To fix this, this commit creates a NoCacheRequester that forces use_cache=False for this particular endpoint.
Force-pushed from 3245546 to cfb4034.
Hi team Airbyte,

Please have a look at this fix for the currently broken [...]

The fix is tested end to end on a local installation, plus using Docker to execute the source in isolation.

My guess is that the infinite-pagination behavior addressed by turning off the HttpRequester cache is related to #58638. I'm not sure if other Intercom APIs behave like this companies/scroll. This is the one that we perceived as obviously broken in our Cloud production usage of Airbyte.

Happy to support getting this across the line as soon as possible since it's blocking one of our product features.
Jacobo Blasco (@iacobus) hi, thank you for your contribution. Could you please sign our Contributor License Agreement so that we can start reviewing your PR? We will run integration tests after this. Also, please take a look at this failure.
Force-pushed from 6499272 to a9b09f0.
/run-connector-tests
Hey Danylo Jablonski (@DanyloGL), got a new issue here. Starting on November 13th, 2025, the [...]

Strangely, this PR's fix still works under Connector Builder, but fails in a real sync. I can't tell at this point whether my workflow running local Airbyte is somehow preventing this from working, or whether the connector is completely broken for everybody due to that signature change. Please let me know.

I'll try to see how to patch it in the meantime so that it resumes working.
OK, I have a working patch for the new problem. Editing [...]

Please advise on next steps. Should I add a 1-argument-only version of this patch to this PR?
Alfredo Garcia (@agarctfi) any chance that we can somehow move forward here? The integration is broken right now for the companies.
nilzzzzzz Apologies for the delay in getting this over the finish line.

We've since pushed these updates to help solve this issue within the CDK: [...]

It is being tested in progressive rollouts, and we found another follow-up issue that needs to be solved before it is released to everyone. It is intermittent and happens when concurrency causes the [...]

Eventually, the syncs do finish, but ideally, we can account for this, which is the follow-up work that we are finishing up. The work for that is being done in these PRs: [...]

From my understanding, we expect to have this work completed sometime early/mid next week.

If you are an Airbyte Cloud user, we can also add you to the progressive rollout so your syncs benefit from these changes sooner. If you do, please create a support ticket asking to be added to the [...]

If you are on OSS, you should also be able to edit the Docker image tag directly under Workspace Settings -> Sources and change it to [...]
Thank you for taking the time to give such a detailed explanation and a reasonable workaround! Highly appreciated!
What
The companies sync is currently only syncing as many as 200 results due to broken pagination. The previous code made the incorrect assumption that each page would return a different token, but the behavior is the opposite.
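The 200-record ceiling can be reproduced with a minimal sketch. This is not the connector's code: `fake_scroll`, the `scroll_param` field name, the page size of 100, and the dataset size are assumptions made for illustration. It only models an endpoint that returns the same token on every page, paired with a stop-on-repeated-token rule.

```python
from typing import Dict, Optional

PAGE_SIZE = 100          # assumed page size (two pages = 200 records)
TOTAL_COMPANIES = 450    # hypothetical dataset size


def fake_scroll(call_index: int) -> Dict:
    """Hypothetical stand-in for companies/scroll.

    Returns the same token on every page and an empty `data` list
    once the dataset is exhausted.
    """
    start = call_index * PAGE_SIZE
    stop = min(start + PAGE_SIZE, TOTAL_COMPANIES)
    return {
        "data": [{"id": i} for i in range(start, stop)],
        "scroll_param": "same-token",
    }


def sync_stop_on_repeated_token() -> int:
    """Old stop rule: end pagination as soon as a token repeats."""
    seen_token: Optional[str] = None
    records = 0
    call_index = 0
    while True:
        page = fake_scroll(call_index)
        records += len(page["data"])
        if page["scroll_param"] == seen_token:
            break  # the token always repeats on page 2
        seen_token = page["scroll_param"]
        call_index += 1
    return records


print(sync_stop_on_repeated_token())  # 200, although 450 records exist
```

Because the token is identical from the first page onward, the repeat check fires on the second page regardless of how many records remain.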
How
The companies/scroll API actually provides the same token in every page, so the previous mechanism of stopping pagination when the same token was observed was always limiting results to 2 pages (200 records). Instead, rely on receiving an empty `data` array, which is what occurs when the previous page was the last one.

Doing that actually exposed a new issue. Every page is now requested with exactly the same URL. It appears that the HttpRequester instance is created with use_cache=True, so starting on page 3, every response is returned from cache, always with the same records from page 2, which creates an infinite loop because an empty array is never received. To fix this, this commit creates a NoCacheRequester that forces use_cache=False for this particular endpoint.
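The interaction between the empty-`data` stop rule and a URL-keyed response cache can be sketched as follows. This is an illustrative model, not the CDK's HttpRequester or the PR's NoCacheRequester: `FakeScrollServer`, `CachingClient`, the URL shape, and the `max_pages` guard are all hypothetical, and only the `use_cache` flag name comes from the description above.

```python
from typing import Dict, Tuple

PAGE_SIZE = 100
TOTAL_COMPANIES = 250  # hypothetical dataset size


class FakeScrollServer:
    """Hypothetical server that advances a scroll cursor on every real request."""

    def __init__(self) -> None:
        self.cursor = 0
        self.hits = 0

    def get(self, url: str) -> Dict:
        self.hits += 1
        stop = min(self.cursor + PAGE_SIZE, TOTAL_COMPANIES)
        data = [{"id": i} for i in range(self.cursor, stop)]
        self.cursor = stop
        return {"data": data, "scroll_param": "same-token"}


class CachingClient:
    """Hypothetical client whose cache is keyed by URL only."""

    def __init__(self, server: FakeScrollServer, use_cache: bool) -> None:
        self.server = server
        self.use_cache = use_cache
        self._cache: Dict[str, Dict] = {}

    def get(self, url: str) -> Dict:
        if self.use_cache and url in self._cache:
            return self._cache[url]  # stale page, server cursor never moves
        response = self.server.get(url)
        if self.use_cache:
            self._cache[url] = response
        return response


def sync(client: CachingClient, max_pages: int = 10) -> Tuple[int, bool]:
    """Paginate until an empty `data` list; max_pages guards against looping."""
    url = "/companies/scroll"
    records = 0
    for _ in range(max_pages):
        page = client.get(url)
        if not page["data"]:
            return records, True  # clean termination
        records += len(page["data"])
        # The token never changes, so this URL is identical from page 2 on.
        url = f"/companies/scroll?scroll_param={page['scroll_param']}"
    return records, False  # gave up: the empty page never arrived


print(sync(CachingClient(FakeScrollServer(), use_cache=True)))   # (1000, False)
print(sync(CachingClient(FakeScrollServer(), use_cache=False)))  # (250, True)
```

With caching on, only two requests reach the server and page 2 is replayed until the guard trips; with caching off, the fourth request returns the empty page and the loop ends with the correct count.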
I tested these changes running Airbyte locally (using `abctl`), plus executing the source in isolation via a Docker command.
Review guide
The description above seems sufficient to understand the problem and the solution. The Intercom docs describe the behavior of the scroll parameter well too.
User Impact
This fixes the companies sync, which is otherwise limited to 200 results (broken).
Can this PR be safely reverted and rolled back?