Druid pods taking more time to come up when security enabled #301
Comments
@AdheipSingh: Can you please provide any info on this, if you already know?
Sorry, I missed this.
Thanks for the reply @AdheipSingh, and sorry for not responding. We were testing this and found that when we define all node types as StatefulSets, as you said, all pods are deployed in parallel. But let's say in my case we are having Now when we install, we see that along with the 2 StatefulSets only one Deployment comes up, which stops the other Deployments from coming up. Can you please check this behavior, if you have not already? Any suggestions would be a great help! Stack trace
I see this issue intermittently after doing a scratch install multiple times. But I always see the above stack trace, not sure why :(
@bharathkuppala https://github.com/druid-io/druid-operator/blob/master/controllers/druid/handler.go#L182 Whether Deployments or StatefulSets, in any combination, all pods should come up in parallel for the first installation. Can you tell me the generation number in the CR metadata? If it is > 1 and rollingDeploy is set to true in your CR, upgrades and changes will happen incrementally. Use
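To make the rule in the comment above concrete, here is a minimal Go sketch of the decision as described in this thread. It is not the operator's actual code: `rolloutMode` and its return values are hypothetical names chosen for illustration, and the real logic lives in the linked handler.go.

```go
package main

import "fmt"

// rolloutMode is a hypothetical helper, not taken from druid-operator.
// It encodes the rule described in this thread: on the first installation
// (generation 1) all node types come up in parallel; on later generations
// changes roll out incrementally, but only when rollingDeploy is true.
func rolloutMode(generation int64, rollingDeploy bool) string {
	if generation > 1 && rollingDeploy {
		return "incremental"
	}
	return "parallel"
}

func main() {
	fmt.Println(rolloutMode(1, true))  // first install
	fmt.Println(rolloutMode(2, true))  // upgrade with rollingDeploy enabled
	fmt.Println(rolloutMode(2, false)) // upgrade with rollingDeploy disabled
}
```

If a fresh install behaves like the "incremental" branch, checking the generation number in the CR metadata is a reasonable first step.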
@AdheipSingh: Yes, like you said, there is no problem during an upgrade: with generation number > 1 it follows the prescribed order. But somehow I see that with podManagementPolicy set to parallel, both Historical and MM are deployed, then the broker comes up, stopping the other deployments.
We are trying out our Druid cluster on OpenShift. Do we have to check something on the OpenShift side to troubleshoot this behavior? druid-operator/controllers/druid/handler.go Line 680 in 5abef01
@bharathkuppala IMHO it should not arise; I will try to re-create this issue. Just to double-check, did you switch between sts and deployments in Kind?
@AdheipSingh: Yes, I did switch, but I still see this issue.
druid-tsdb-apache-druid-historicals-0 0/1 Pending 0 0s
This is what is happening in my current setup: only once the Historical, MM and Broker pods are up are the coordinator and routers triggered.
Are you adding any node selectors? Are your coordinators/routers getting scheduled on different nodes?
@AdheipSingh: No, we are not using any node selectors for scheduling the pods. We have rollingDeploy set to true because we want our pods to be upgraded in the prescribed order. Yes, I see in the druid-operator code that we are not checking generation, so pods should come up in parallel. I can share the druid-operator logs so we have a complete trace.
@bharathkuppala What's your podManagementPolicy? Try setting it to
@AdheipSingh: Yes, it is parallel by default for sts, right?
Hello All,
druid-operator: v0.0.7
druid: v0.22.1
We are trying to enable basic security for Druid and found that it takes approximately 16 minutes for the entire cluster to come up. Looking at the logs, we found multiple retries against the coordinator while trying to fetch user data.
Logs from middleManager:
{"timeMillis":1653992838211,"thread":"main","level":"WARN","loggerName":"org.apache.druid.java.util.common.RetryUtils","message":"Retrying (1 of 9) in 727ms.","thrown":{"commonElementCount":0,"localizedMessage":"No content to map due to end-of-input\n at [Source: (byte[])\"\"; line: -1, column: 0]","message":"No content to map due to end-of-input\n at [Source: (byte[])\"\"; line: -1, column: 0]","name":"com.fasterxml.jackson.databind.exc.MismatchedInputException","extendedStackTrace":[{"class":"com.fasterxml.jackson.databind.exc.MismatchedInputException","method":"from","file":"MismatchedInputException.java","line":59,"exact":false,"location":"jackson-databind-2.10.5.1.jar","version":"2.10.5.1"},{"class":"com.fasterxml.jackson.databind.ObjectMapper","method":"_initForReading","file":"ObjectMapper.java","line":4360,"exact":false,"location":"jackson-databind-2.10.5.1.jar","version":"2.10.5.1"},{"class":"com.fasterxml.jackson.databind.ObjectMapper","method":"_readMapAndClose","file":"ObjectMapper.java","line":4205,"exact":false,"location":"jackson-databind-2.10.5.1.jar","version":"2.10.5.1"},{"class":"com.fasterxml.jackson.databind.ObjectMapper","method":"readValue","file":"ObjectMapper.java","line":3292,"exact":false,"location":"jackson-databind-2.10.5.1.jar","version":"2.10.5.1"},{"class":"org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager","method":"tryFetchUserMapsFromCoordinator","file":"CoordinatorPollingBasicAuthorizerCacheManager.java","line":400,"exact":false,"location":"?","version":"?"},{"class":"org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager","method":"lambda$fetchUserAndRoleMapFromCoordinator$4","file":"CoordinatorPollingBasicAuthorizerCacheManager.java","line":330,"exact":false,"location":"?","version":"?"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":129,"exact":true,"location":"druid-core-0.2
2.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":81,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":163,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":153,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager","method":"fetchUserAndRoleMapFromCoordinator","file":"CoordinatorPollingBasicAuthorizerCacheManager.java","line":328,"exact":true,"location":"druid-basic-security-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager","method":"initUserMaps","file":"CoordinatorPollingBasicAuthorizerCacheManager.java","line":457,"exact":true,"location":"druid-basic-security-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager","method":"start","file":"CoordinatorPollingBasicAuthorizerCacheManager.java","line":116,"exact":true,"location":"druid-basic-security-0.22.1.jar","version":"0.22.1"},{"class":"sun.reflect.NativeMethodAccessorImpl","method":"invoke0","file":"NativeMethodAccessorImpl.java","line":-2,"exact":false,"location":"?","version":"1.8.0_275"},{"class":"sun.reflect.NativeMethodAccessorImpl","method":"invoke","file":"NativeMethodAccessorImpl.java","line":62,"exact":false,"location":"?","version":"1.8.0_275"},{"class":"sun.reflect.DelegatingMethodAccessorImpl","method":"invoke","file":"DelegatingMethodAccessorImpl.java","line":43,"exact":false,"location":"?","version":"1.8.0_275"},{"class":"java.lang.reflect.Method","method":"invoke","file":
"Method.java","line":498,"exact":false,"location":"?","version":"1.8.0_275"},{"class":"org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler","method":"start","file":"Lifecycle.java","line":446,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.lifecycle.Lifecycle","method":"start","file":"Lifecycle.java","line":341,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.guice.LifecycleModule$2","method":"start","file":"LifecycleModule.java","line":143,"exact":true,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.cli.GuiceRunnable","method":"initLifecycle","file":"GuiceRunnable.java","line":115,"exact":true,"location":"druid-services-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.cli.ServerRunnable","method":"run","file":"ServerRunnable.java","line":63,"exact":true,"location":"druid-services-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.cli.Main","method":"main","file":"Main.java","line":113,"exact":true,"location":"druid-services-0.22.1.jar","version":"0.22.1"}]},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":1,"threadPriority":5}
In the above log, it is calling a method that fetches the user and role maps from the coordinator, and it retries up to 9 times.
Does druid-operator impose any ordering on a Druid installation, such that brokers, historicals and middleManagers come up first during a scratch installation, followed by the coordinator and router?
If so, will the pods fail while looking for the coordinator?
Any help on this?