core(driver): recover from PROTOCOL_TIMEOUT in void calls #11508

Closed
wants to merge 1 commit into from

patrickhulce
Collaborator

Summary
A radical change, but one that might solve most of our PROTOCOL_TIMEOUT problems. Basically when we are sending Chrome a command but we don't care about the result, recover from any protocol timeouts and just keep going if Chrome still seems to be responding. Tradeoff here is that we don't know for sure whether Chrome just didn't reply to us and did what we asked it to or if it really failed to do what we asked and future results will be unexpected. Given the type of methods that frequently appear in #6512, I suspect getting a report would still work and be much preferred to nonsense fatal errors, but I could be wrong and this might not be worth it.

Thoughts?
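
To make the idea concrete, here is a minimal sketch of the recovery logic described above. It is not the code in this PR: `ProtocolTimeoutError`, `withTimeout`, `sendVoidCommand`, and the `connection` object are hypothetical names, and using `Browser.getVersion` as the "is Chrome still responding?" probe is an assumption.

```javascript
// Hypothetical names throughout; this is an illustration, not Lighthouse's driver API.
class ProtocolTimeoutError extends Error {
  constructor(method) {
    super(`PROTOCOL_TIMEOUT (Method: ${method})`);
    this.code = 'PROTOCOL_TIMEOUT';
  }
}

// Reject with a PROTOCOL_TIMEOUT error if `promise` does not settle within `timeoutMs`.
function withTimeout(promise, method, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new ProtocolTimeoutError(method)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Send a command whose result we ignore. On timeout, probe Chrome with a cheap
// command; if the probe answers, swallow the timeout and keep going.
async function sendVoidCommand(connection, method, params, timeoutMs = 30000) {
  try {
    await withTimeout(connection.send(method, params), method, timeoutMs);
  } catch (err) {
    if (err.code !== 'PROTOCOL_TIMEOUT') throw err;
    // Assumption: Browser.getVersion stands in for the liveness check.
    // If the probe also fails, surface the original timeout error.
    await withTimeout(connection.send('Browser.getVersion'), 'Browser.getVersion', timeoutMs)
      .catch(() => { throw err; });
  }
}
```

Note that a successful probe only shows Chrome is alive. It does not tell us whether the original command was actually applied, which is exactly the tradeoff described above.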

Related Issues/PRs
fixes #6512

@patrickhulce changed the title from "core(driver): recover from PROTOCOL_TIMEOUT in void" to "core(driver): recover from PROTOCOL_TIMEOUT in void calls" on Oct 1, 2020
* @description Error message explaining that Chrome has stopped responding to protocol messages.
* @example {Network.enable} protocolMethod
* */
chromeNotResponding: 'Waiting for DevTools protocol response has exceeded the allotted time. (Method: {protocolMethod})',
Collaborator Author

We could tweak this, obviously. I have half a mind to just change both of these to say "Chrome stopped responding, Lighthouse couldn't really do anything about it".

@brendankenny
Member

Tradeoff here is that we don't know for sure whether Chrome just didn't reply to us and did what we asked it to or if it really failed to do what we asked and future results will be unexpected

Seems like we might need to do a big run to get an idea of what happens? What's the usual rate of PROTOCOL_TIMEOUT?

@brendankenny
Member

What's the usual rate of PROTOCOL_TIMEOUT?

It's 0.106% in the last HTTP Archive run, but that run might be better behaved than one with much more random URLs and a variety of Chrome installs.

@connorjclark
Collaborator

My hypothesis is that the particular command that hangs is often immaterial, and that the entire connection is fubar. I'd like to see an analysis of how many commands past the first hung void call actually get a response. Can you add that as a counter somehow?

@patrickhulce
Collaborator Author

It's 0.106% in the last HTTP Archive run, but that run might be better behaved than one with much more random URLs and a variety of Chrome installs.

The problem I personally experience is that I never get it from the CLI, but I get it in nearly 30% of runs in DevTools. With that lens, I think it's going to be really difficult to collect the data we need to figure this out :(

I'd like to see an analysis of how many commands past the first hung void call actually get a response. Can you add that as a counter somehow?

Yeah we could do that in driver and after X timeouts just fatally throw immediately. I wanted to try to track this with Sentry but the reasoning above for DevTools makes that tricky :/
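
A rough sketch of what that counter could look like, assuming the driver calls it on every response and every timeout. `TimeoutTracker`, its method names, and the default threshold are all hypothetical, not existing Lighthouse code.

```javascript
// Hypothetical tracker; names and the default threshold are assumptions for illustration.
class TimeoutTracker {
  constructor(maxTimeouts = 3) {
    this.maxTimeouts = maxTimeouts;
    this.timeouts = 0; // commands that timed out so far
    this.responsesAfterFirstTimeout = 0; // commands answered after the first hang
  }

  // Call whenever a command receives a response.
  recordResponse() {
    if (this.timeouts > 0) this.responsesAfterFirstTimeout++;
  }

  // Call whenever a command times out; throws once the budget is used up.
  recordTimeout(method) {
    this.timeouts++;
    if (this.timeouts >= this.maxTimeouts) {
      const err = new Error(`PROTOCOL_TIMEOUT (Method: ${method})`);
      err.code = 'PROTOCOL_TIMEOUT';
      throw err;
    }
  }
}
```

Reporting `responsesAfterFirstTimeout` alongside `timeouts` would give the data point asked for above: whether anything still gets a response once the first void call has hung.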

@connorjclark
Collaborator

You might have some luck writing an audit runner wrapped in a layout test. It'd be hacky, but it could automate running Lighthouse in DevTools.

@patrickhulce
Collaborator Author

Sounds like we need a harness to run DevTools in GCP at scale to move this forward, and I'm not sure I have the bandwidth to take that on right now. I'm going to close and unassign, but this should be linked up if anyone else wants to take this up in the future :)

Successfully merging this pull request may close these issues.

☂️ PROTOCOL_TIMEOUT
5 participants