Retry "org.hbase.async.RemoteException: Call queue is full on" RPCs #135
Comments
Hi, do you know what the plan is for fixing this?
Is there a workaround or anything?
@manolama, can you glance at this PR? Does it match your plan for this issue? These changes don't solve the issue for us yet; they do catch these exceptions, so presumably we need further handling to retry them.
I am getting this error using Cloudera CDH 5.7.2. I have been able to work around it for some Get requests by increasing the call queue size. My typical use case is querying 500k-1m random Gets (out of a total of 25m rows) stored on 9 region servers hosting 100 pre-split regions. 1m row keys roughly equate to 8GB of data. Are there any other workarounds or advice for dealing with the call-queue-full issue?
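For readers hitting the same wall, these are the region-server settings that bound the call queue. A minimal hbase-site.xml sketch; the values shown are illustrative, not recommendations, and defaults are as of HBase 1.x:

```xml
<!-- hbase-site.xml on each region server -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- RPC handler threads; default is 30 -->
  <value>60</value>
</property>
<property>
  <name>hbase.ipc.server.max.callqueue.length</name>
  <!-- Calls queued before "Call queue is full" is thrown;
       default is handler.count * 10 -->
  <value>600</value>
</property>
```

Raising the queue length trades memory and latency for fewer rejections; it does not fix a region server that is simply overloaded.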
All I keep doing is resizing/splitting regions until they are smaller and smaller. I have 40 region servers and currently 258 regions.
Our workaround (on the same CDH 5.7.x) was to set tsd.core.meta.enable_realtime_ts = false.
@vitaliyf from my reading, the setting tsd.core.meta.enable_realtime_ts is an OpenTSDB option and has no effect on AsyncHBase itself.
I seem to have lost track here; what is the status? Apache HBase 1.3 was released this January, so I would think this issue is resolved? The proper behavior should be to not bail out but to retry on that kind of exception, while avoiding clearing the location cache, since it's likely a temporary overload and not a permanent failure.
@mikhail-antonov This can still happen in 1.3; it simply has to do with a region server being unable to handle the request load. We can add code to AsyncHBase that would buffer and retry requests with a delay, but that only makes sense for buffered writes. For reads, it makes more sense to fail the RPC and let the application figure out what to do, I think. @dsimmie You're correct, that has no effect on AsyncHBase.
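Until AsyncHBase retries these internally, an application can wrap its own calls in a retry-with-backoff loop. A minimal, self-contained sketch of that pattern using a generic `Callable` and a hypothetical `callWithRetry` helper (this is not asynchbase's actual API; a real version would match only the call-queue-full exception rather than any `Exception`):

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

  // Hypothetical helper: retries a call that may fail with a transient
  // "Call queue is full" style exception, sleeping with exponential
  // backoff between attempts, and rethrows the last failure if all
  // attempts are exhausted.
  public static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                                    long baseDelayMs) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
        // Backoff doubles each attempt: base, 2*base, 4*base, ...
        Thread.sleep(baseDelayMs << attempt);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    // Demo: a call that fails twice before succeeding.
    final int[] tries = {0};
    String result = callWithRetry(() -> {
      if (++tries[0] < 3) throw new RuntimeException("Call queue is full");
      return "ok";
    }, 5, 10);
    System.out.println(result + " after " + tries[0] + " attempts");
  }
}
```

For buffered writes this loop could live behind the client's flush path; for reads, as noted above, surfacing the failure to the application is usually the better choice.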
Thanks @manolama for the update. How does the native client behave with GetRequests? Doesn't it retry, as for PutRequest? Is someone working on this bug? We are using asynchbase outside OpenTSDB and are highly affected by this.
@stannie42 Not too sure yet regarding the native client, but we just upgraded internally to 1.3 and faced the issue when the HBase config changed and merged the read and write queues. We're separating them again, and if that solves it I'd suggest you try it as well.
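The read/write queue split mentioned above is controlled by one region-server setting. A sketch, assuming HBase 1.x defaults (the value shown is illustrative):

```xml
<property>
  <name>hbase.ipc.server.callqueue.read.ratio</name>
  <!-- 0 means a single shared queue for reads and writes (the merged
       default); any value above 0 splits the call queues so a flood of
       writes cannot fill the queue used by reads, and vice versa. -->
  <value>0.5</value>
</property>
```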
HBase 1.x and later return an exception when the call queue is full. The native client will retry these calls as if it was a recoverable exception. AsyncHBase should do the same.