fix: cancel timed out requests #65
Conversation
On second thought, this goes against the idea of standby mode; it's meant to be request/response. No one will check the dataset, and that's not the goal. It would be better to fix the issue by terminating the running request and returning empty results. Can you do that?
Agreed, not many people will check the dataset; they will mainly consume the response itself. I'll check whether there is a simple way to skip the request or prevent it from being crawled.
@jirispilka I changed the implementation to cancel the requests of timed-out responses, based on the discussion in apify/crawlee#1215. This way the request content is not handled; the requests are only added to the dataset with status `failed` and reason `timed out`.
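For readers following along, here is a minimal sketch of that idea, not the PR's actual code: the names `timedOutRequestIds`, `markTimedOut`, and `handleRequest` are illustrative assumptions. Once a response times out, the matching request is flagged, its content is skipped, and only a failed record with reason `timed out` is pushed to the dataset via `Actor.pushData`.

```typescript
import { Actor } from 'apify';

// Illustrative only: request IDs whose standby response has already timed out.
const timedOutRequestIds = new Set<string>();

// Called when the response for a request times out before crawling finishes.
export function markTimedOut(requestId: string): void {
    timedOutRequestIds.add(requestId);
}

// Inside the crawler's request handler: for timed-out requests, skip the page
// content entirely and only record a failed entry in the dataset.
export async function handleRequest(requestId: string, url: string): Promise<void> {
    if (timedOutRequestIds.has(requestId)) {
        await Actor.pushData({ url, status: 'failed', reason: 'timed out' });
        return;
    }
    // ... normal handling of the page content would go here ...
}
```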
I also agree that in standby mode the dataset is irrelevant. Thank you, the solution looks good. I just have a few small comments.
The biggest issue for me is that the bounded array doesn't survive migrations, which could potentially leave some requests running. @jirispilka Do you think this could be an issue?
And there are 2 lint errors.
@MQ37 I'm looking at the code again and I'm wondering if we can't just check if…
This might be simpler, but we would still need to handle the migration, right? Or is the response data handled on migration?
I think migrations would work as expected. The…
I guess the… What actually happens to the user request to the…?
Ahh, I see. We just send "Actor is migrating, please try again" and cut the connection.
@matyascimbulka I refactored based on your suggestion and the implementation is much simpler, thank you 👍
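For context, a minimal sketch of how a standby server could refuse new work while migrating; the 503 status, the `ACTOR_STANDBY_PORT` variable, and the handler body are assumptions for illustration, not this Actor's actual code. It relies on the `migrating` event that the Apify SDK emits via `Actor.on`.

```typescript
import http from 'node:http';
import { Actor } from 'apify';

await Actor.init();

// A plain Node HTTP server standing in for the Actor's standby endpoint.
// The port source is an assumption; adjust to however the Actor reads it.
const port = Number(process.env.ACTOR_STANDBY_PORT ?? 8080);

let migrating = false;

// The platform emits 'migrating' shortly before the Actor is moved to another
// server; after that, in-flight client connections will be cut anyway.
Actor.on('migrating', () => {
    migrating = true;
});

const server = http.createServer((req, res) => {
    if (migrating) {
        // Reject new work instead of starting a crawl that cannot finish here.
        res.writeHead(503, { 'Content-Type': 'text/plain' });
        res.end('Actor is migrating, please try again.');
        return;
    }
    // ... normal request/response handling would go here ...
    res.end('ok');
});

server.listen(port);
```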
You're welcome. I'm happy to help. This looks good.
@MQ37 @matyascimbulka thank you guys!
@MQ37 please just update the CHANGELOG.md and we are good to go.
Since we cannot remove requests that are already being crawled, we just return the dataset ID on timeout so the user knows where to check for results, as we discussed.
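A rough sketch of that timeout behavior, under assumed names (`respondWithResultsOrDatasetId`, `runCrawl`, `timeoutMillis` are hypothetical, not this PR's code): the crawl races against a timer, and on timeout the handler responds with the default dataset ID from `Actor.getEnv()` instead of the items.

```typescript
import { Actor } from 'apify';

// Hypothetical helper: run the crawl, but if it exceeds `timeoutMillis`,
// respond with the default dataset ID so the caller knows where the
// finished results will eventually be stored.
export async function respondWithResultsOrDatasetId(
    runCrawl: () => Promise<unknown[]>,
    timeoutMillis: number,
): Promise<unknown> {
    const { defaultDatasetId } = Actor.getEnv();

    const timeout = new Promise<{ timedOut: true }>((resolve) => {
        setTimeout(() => resolve({ timedOut: true }), timeoutMillis);
    });

    const crawl = runCrawl().then((items) => ({ timedOut: false as const, items }));
    const outcome = await Promise.race([crawl, timeout]);

    if (!outcome.timedOut) {
        return { results: outcome.items };
    }
    // The requests keep crawling in the background; point the user at the dataset.
    return {
        message: 'Request timed out, results will be stored in the dataset.',
        datasetId: defaultDatasetId,
    };
}
```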
closes #31