Skip to content

fix: cancel timed out requests #65

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 27, 2025
Merged

fix: cancel timed out requests #65

merged 6 commits into from
Mar 27, 2025

Conversation

MQ37
Copy link
Contributor

@MQ37 MQ37 commented Mar 26, 2025

Since we cannot remove request that are already being crawled we just return dataset id on timeout so the user knows where to check for results - as we discussed.

closes #31

@MQ37 MQ37 requested a review from jirispilka March 26, 2025 10:35
@jirispilka
Copy link
Collaborator

jirispilka commented Mar 27, 2025

On second thought, this goes against the idea of standby mode — it's meant to be request/response. No one will check the dataset, and that's not the goal.

It would be better to fix the issue by terminating the running request and returning empty results.
I can't image a use-case that someone would use it this way.

Can you?

@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

On second thought, this goes against the idea of standby mode — it's meant to be request/response. No one will check the dataset, and that's not the goal.

It would be better to fix the issue by terminating the running request and returning empty results. I can't image a use-case that someone would use it this way.

Can you?

Agree, not many people will check the dataset and they will consume mainly the response itself. Will check if there is a simple way to skip or prevent the request from crawling.

@MQ37 MQ37 changed the title fix: return dataset it on request timeout fix: cancel timed out requests Mar 27, 2025
@MQ37 MQ37 requested a review from matyascimbulka March 27, 2025 09:40
@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

@jirispilka Changed the implementation to cancel the requests of timed out response based on the discussion apify/crawlee#1215. This way the request content is not handled, they are only added to the dataset with status failed with reason timed out.

Copy link
Collaborator

@matyascimbulka matyascimbulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree that in standby mode the dataset is irrelevant. Thank you the solutions looks good. I just have small comments.

The biggest issue for me is that the bounded array doesn't survive migrations which could potentially leave some requests running. @jirispilka Do you think this could be an issue?

And there are 2 lint errors.

@matyascimbulka
Copy link
Collaborator

@MQ37 I'm looking at the code again and I'm wondering if we can't just check if responseData in responses.ts has the responseId key when handling the request in the handler. That way we wouldn't need to manage another new array and it would survive migrations.

@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

@MQ37 I'm looking at the code again and I'm wondering if we can't just check if responseData in responses.ts has the responseId key when handling the request in the handler. That way we wouldn't need to manage another new array and it would survive migrations.

This might be simpler but we would still need to handle the migration, right? Or the response data is handled on migration?

@matyascimbulka
Copy link
Collaborator

I think migrations would work as expected. The responseData map would be initiated empty therefore any resurrected requests wouldn't find their responseId as its key. The missing key would signal the handler to set the request.noRetry = true; and throw the error.

@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

I think migrations would work as expected. The responseData map would be initiated empty therefore any resurrected requests wouldn't find their responseId as its key. The missing key would signal the handler to set the request.noRetry = true; and throw the error.

I guess the responseData approach would be simpler, thank you for suggestion 👍

What actually happens to the user request to the /search endpoint when the Actor migrates. Does Apify act as a persist the connection that is then handed over to the new Actor container or is it closed? If it is closed we actually do not need to handle this case.

@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

I think migrations would work as expected. The responseData map would be initiated empty therefore any resurrected requests wouldn't find their responseId as its key. The missing key would signal the handler to set the request.noRetry = true; and throw the error.

I guess the responseData approach would be simpler, thank you for suggestion 👍

What actually happens to the user request to the /search endpoint when the Actor migrates. Does Apify act as a persist the connection that is then handed over to the new Actor container or is it closed? If it is closed we actually do not need to handle this case.

Ahh, I see. We just sent Actor is migrating please try again and cut the connection.

@MQ37
Copy link
Contributor Author

MQ37 commented Mar 27, 2025

@matyascimbulka refactored based on your suggestion and the implementation is much simpler, thank you 👍

@MQ37 MQ37 requested a review from matyascimbulka March 27, 2025 15:47
Copy link
Collaborator

@matyascimbulka matyascimbulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome. I'm happy to help. This looks good.

Copy link
Collaborator

@jirispilka jirispilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MQ37 @matyascimbulka thank you guys!

@MQ37 please just update the CHANGELOG.md and we are good to go.

@MQ37 MQ37 merged commit d9eddc7 into master Mar 27, 2025
1 check passed
MQ37 added a commit that referenced this pull request Mar 27, 2025
MQ37 added a commit that referenced this pull request Mar 27, 2025
MQ37 added a commit that referenced this pull request Mar 27, 2025
MQ37 added a commit that referenced this pull request Mar 27, 2025
* Revert "chore: Revert "fix: cancel timed out requests (#65)" (#67)"

This reverts commit 7d66686.

* only cancel requests for standby actors

* Update CHANGELOG.md

* Update CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove pending requests from Playwright request queue after timeout
3 participants