Skip to content

Comments

Fix broken companies/scroll pagination#69151

Open
Jacobo Blasco (iacobus) wants to merge 2 commits intoairbytehq:masterfrom
iacobus:bugfix/intercom-companies-scroll
Open

Fix broken companies/scroll pagination#69151
Jacobo Blasco (iacobus) wants to merge 2 commits intoairbytehq:masterfrom
iacobus:bugfix/intercom-companies-scroll

Conversation

@iacobus
Copy link

What

The companies sync is currently only syncing as many as 200 results due to broken pagination. The previous code made the incorrect assumption that each page would return a different token, but the behavior is the opposite.

How

The companies/scroll API actually provides the same token in every page, so the previous mechanism of stopping pagination when the same token was observed was always limiting results to 2 pages (200 records). Instead, rely on receiving an empty data array, which is what occurs when the previous page was the last one.

Doing that actually exposed a new issue. Every page now is requested with exactly the same URL. It appears that the HttpRequester instance is created with use_cache=True, so starting on page 3, every response is returned from cache, with always the same records from page 2, which creates an infinite loop due to never receiving an empty array. To fix this, this commit creates a NoCacheRequester that forces use_cache=False for this particular endpoint.

I tested these changes running Airbyte locally (using abctl), plus executing the source in isolation via Docker command.

Review guide

The description above seems sufficient to understand the problem and the solution. The Intercom docs describe the behavior of the scroll parameter well too.

User Impact

This fixes the companies sync, which is otherwise limited to 200 results (broken).

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

@CLAassistant
Copy link

CLAassistant commented Nov 3, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2025

👋 Welcome to Airbyte!

Thank you for your contribution from iacobus/airbyte! We're excited to have you in the Airbyte community.

Helpful Resources

PR Slash Commands

As needed or by request, Airbyte Maintainers can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
  • /run-connector-tests - Runs connector tests.
  • /run-cat-tests - Runs CAT tests.
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).

If you have any questions, feel free to ask in the PR comments or join our Slack community.

Tips for Working with CI

  1. Pre-Release Checks. Please pay attention to these, as they contain standard checks on the metadata.yaml file, docs requirements, etc. If you need help resolving a pre-release check, please ask a maintainer.
    • Note: If you are creating a new connector, please be sure to replace the default logo.svg file with a suitable icon.
  2. Connector CI Tests. Some failures here may be expected if your tests require credentials. Please review these results to ensure (1) unit tests are passing, if applicable, and (2) integration tests pass to the degree possible and expected.
  3. (Optional.) BYO Connector Credentials for tests in your fork. You can optionally set up your fork with BYO credentials for your connector. This can significantly speed up your review, ensuring your changes are fully tested before the maintainers begin their review.

📝 Edit this welcome message.

The companies/scroll API actually provides the same token
in every page, so the previous mechanism of stopping pagination
when the same token was observed was always limiting results to
2 pages (200 records). Instead, rely on receiving an empty `data`
array, which is what occurs when the previous page was the last one.

Doing that actually exposed a new issue. Every page now is requested
with exactly the same URL. It appears that the HttpRequester instance
is created with use_cache=True, so starting on page 3, every response
is returned from cache, with always the same records from page 2, which
creates an infinite loop due to never receiving an empty array. To fix
this, this commit creates a NoCacheRequester that forces use_cache=False
for this particular endpoint.
@iacobus
Copy link
Author

Hi team Airbyte,

Please have a look at this fix for the currently broken companies sync in source-intercom. I'm honestly not sure what's going on with the integration tests, nor I feel confident tweaking versions.

The fix is tested end to end on a local installation, plus using Docker to execute the source in isolation.

My guess is that the infinite-pagination behavior addressed by turning off HttpRequester cache is related to #58638.

I'm not sure if other Intercom APIs behave like this companies/scroll. This is the one that we perceived as obviously broken in our Cloud production usage of Airbyte.

Happy to support getting this across the line as soon as possible since it's blocking one of our product features.

@DanyloGL
Copy link
Collaborator

Jacobo Blasco (@iacobus) hi, thank you for your contribution. Could you please sign our Contributor License Agreement before we can start reviewing your PR? We will run Integration tests after this

Also please take a look at this failure.
❌ Failed - Connector Version Increment Check: The dockerImageTag in metadata.yaml was not incremented. Master version is 0.13.13, current version is 0.13.13.

@iacobus Jacobo Blasco (iacobus) force-pushed the bugfix/intercom-companies-scroll branch from 6499272 to a9b09f0 Compare November 10, 2025 09:33
@DanyloGL
Copy link
Collaborator

Danylo Jablonski (DanyloGL) commented Nov 10, 2025

/run-connector-tests

Connector CI Tests Started

These tests will leverage Airbyte's integration test credentials.

Check job output.
✅ Connector CI Tests job completed successfully. See logs for details.

@DanyloGL Danylo Jablonski (DanyloGL) moved this from Waiting CLA to Waiting Eng Team in 🧑‍🏭 Community Pull Requests Nov 10, 2025
@iacobus
Copy link
Author

Jacobo Blasco (iacobus) commented Nov 17, 2025

Hey Danylo Jablonski (@DanyloGL), got a new issue here.

Starting on November 13th, 2025, the companies sync has been failing with a new error. I'm almost certain the error is caused by this diff that changed the signature of a SimpleRetriever method, while the IntercomScrollRetriever still uses the old signature.

Strangely, this PR's fix still works under Connector Builder, but fails in a real sync.

I can't tell at this point if my workflow running local Airbyte is somehow preventing this to work, or if the connector is completely broken for everybody due to that signature change. Please let me know? I'll try to see how to patch it in the meantime so that it resumes working.

@iacobus
Copy link
Author

Jacobo Blasco (iacobus) commented Nov 17, 2025

OK, I have a working patch for the new problem. Editing IntercomScrollRetriever to pass an empty stream_state works. This patch is a shim supporting both versions, doing it this way so that ConnectorBuilder also works (guessing it's stuck with the old CDK).

    def _read_pages(
        self,
        records_generator_fn: Callable[[Optional[requests.Response]], Iterable[Record]],
        *args,
        # stream_state: Mapping[str, Any],
        # stream_slice: StreamSlice,
    ) -> Iterable[Record]:
        """
        Reads pages with pagination and reset handling using _next_page_token.
        """
        
        # SHIM
        if len(args) == 2:
            #  old CDK
            stream_state, stream_slice = args
        elif len(args) == 1:
            # new CDK
            (stream_slice,) = args
            stream_state = {}
        else:
            raise TypeError(
                f"Unexpected call signature to _read_pages: args={args}"
            )
         # END OF SHIM
            
        # original method

Please advise with next steps. Should I add a 1-argument-only version of this patch to this PR?

@nilzzzzzz
Copy link

Alfredo Garcia (@agarctfi) any chance that we can somehow move forward here? The integration is broken right now for the companies.

@agarctfi
Copy link
Contributor

nilzzzzzz Apologies for the delay in getting this over the finish line. We've since pushed these updates to help solve this issue within the CDK:
#70335
airbytehq/airbyte-python-cdk#864

It is being tested in progressive rollouts, and we found another follow-up issue that needs to be solved before it is released to everyone. It is intermittent and happens when concurrency causes the comapnies stream and company_segments stream to run in parallel. Users could get a 400 error since the Intercom API only allows one scroll request at a given time.

Eventually, the syncs do finish, but ideally, we can account for this, which is the follow-up work that we are finishing up.

The work for that is being done in these PRs:
airbytehq/airbyte-python-cdk#870
#71141

From my understanding, we expect to have this work completed sometime early/mid next week.

If you are an Airbyte Cloud user, we can also add you to the progressive rollout so your syncs benefit from these changes sooner. If you do, please create a support ticket asking to be added to the 0.13.16-rc.1 rollout, or feel free to comment your workspace ID here, and we can get that done fairly quickly.

If you are on OSS, you should also be able to edit the Docker image tag directly under Workspace Settings -> Sources and change it to 0.13.16-rc.1. This will pin you to the progressive rollout version that should sync all the records, with the caveat that it may occasionally fail due to the follow-up issue I outlined above.

@nilzzzzzz
Copy link

nilzzzzzz Apologies for the delay in getting this over the finish line. We've since pushed these updates to help solve this issue within the CDK: #70335 airbytehq/airbyte-python-cdk#864

It is being tested in progressive rollouts, and we found another follow-up issue that needs to be solved before it is released to everyone. It is intermittent and happens when concurrency causes the comapnies stream and company_segments stream to run in parallel. Users could get a 400 error since the Intercom API only allows one scroll request at a given time.

Eventually, the syncs do finish, but ideally, we can account for this, which is the follow-up work that we are finishing up.

The work for that is being done in these PRs: airbytehq/airbyte-python-cdk#870 #71141

From my understanding, we expect to have this work completed sometime early/mid next week.

If you are an Airbyte Cloud user, we can also add you to the progressive rollout so your syncs benefit from these changes sooner. If you do, please create a support ticket asking to be added to the 0.13.16-rc.1 rollout, or feel free to comment your workspace ID here, and we can get that done fairly quickly.

If you are on OSS, you should also be able to edit the Docker image tag directly under Workspace Settings -> Sources and change it to 0.13.16-rc.1. This will pin you to the progressive rollout version that should sync all the records, with the caveat that it may occasionally fail due to the follow-up issue I outlined above.

Thank you for taking the time to give such a detailed explanation and a reasonable workaround! Highly appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Waiting Eng Team

Development

Successfully merging this pull request may close these issues.

6 participants