Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] Define a general Helm chart configuration approach #6123

Open
3 of 4 tasks
dnskr opened this issue Mar 3, 2024 · 12 comments
Open
3 of 4 tasks

[Improvement] Define a general Helm chart configuration approach #6123

dnskr opened this issue Mar 3, 2024 · 12 comments

Comments

@dnskr
Copy link
Contributor

dnskr commented Mar 3, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

There is no clear common approach at the moment on how to configure Kyuubi deployment if the Helm chart is used.
I would like to discuss requirements, limitations and different options to choose one approach to follow and support it in the Kyuubi Helm chart configuration. The problem has been mentioned and discussed in multiple issues and PRs, so the idea is to collect all opinions in one place and make the decision.

Configuration system of the Apache Kyuubi
The configuration system allows to configure values using the following options (ordered from low to high prio):

  1. [static] Kyuubi configuration files
  2. [static] Hadoop configuration files
  3. [static] Engine (Spark, Flink, Trino etc) configuration files
  4. [runtime] Environment variables
  5. [runtime] JDBC Connection URLs
  6. [runtime] SET Statements

Runtime options JDBC Connection URLs and SET Statements
Can be skipped in the discussion, because they used only when Kyuubi is up and running.

Runtime option Environment variables
Configured by {{ .Values.env }} and {{ .Values.envFrom }} value properties.
The Helm chart users can specify environment variables to provide necessary configuration values with low effort if needed. The properties also allow to use provided (existing) ConfigMaps and Secrets as the sources of environment variables, for instance:

env:
  - name: ENV_VALUE
    value: env-value
  - name: ENV_FROM_CONFIGMAP_KEY
    valueFrom:
      configMapKeyRef:
        name: env-configmap
        key: env-key

envFrom:
  - configMapRef:
      name: all-env-configmap

Static options
Represented by configuration files which should be located in each Kyuubi container in specific paths.
In general case, the easiest way to provide files into Kubernetes pod (container) is to mount ConfigMap or Secret to a specific path.

[IN PROGRESS] Requirements
Note: this section is in progress and subject to discuss.

  1. Ability to create ConfigMaps under the hood from value properties of value.yaml file.
    Secrets should never be created and managed by Helm chart because of security consideration!
  2. Ability to specify existing (created outside the chart) ConfigMaps and Secrets by resource name as a reference.
  3. Ability to provide multiple existing ConfigMaps and Secrets with priority order.
    Multiple ConfigMaps and Secrets might have key duplicates, so the implementation should clearly resolve the collision by merging keys in priority order.
  4. Ability to mix ConfigMaps managed by the chart with ConfigMaps and Secrets provided by user with priority order.
    The issue with key duplicates should be clearly resolved. ConfigMaps managed by the chart should have the lowest prio.
  5. The approach should work for Helm and GitOps tools like ArgoCD, Flux etc.
  6. Easy way to specify one or many configuration files as Helm values, i.e. properties in value.yaml file.
    Some configuration files might be huge and complex, so the idea is to prevent identation issues in values.yaml file.
  7. Easy way to create ConfigMaps and Secrets from one or many configuration files.
    Users might have a lot of xml, properties and other files, so the idea is to help users to create ConfigMap and Secret resources in a simple way.

How should we improve?

[IN PROGRESS] Approach
Note: this section is in progress and subject to discuss.

  1. Group configuration file properties in values.yaml by system like Kyuubi, Hadoop, Spark, Trino etc.
kyuubiConfDir: /opt/kyuubi/conf
kyuubiConf:
  ...

sparkConfDir: /opt/spark/conf
sparkConf:
  ...
  1. Use files property to specify various files.
    Users can define files with any file name. Each entity within files property used as a key/value pair in the corresponding ConfigMap.
sparkConf:
  files:
    'spark-env.sh': |
      #!/usr/bin/env bash
      export SPARK_LOG_DIR=/opt/spark/logs
    'spark-defaults.conf': |
      spark.submit.deployMode=cluster
  1. Use from property to specify list of existing ConfigMaps and Secrets to be mounted to the configuration path of Kyuubi container.
sparkConf:
  from:
    - configMap:
        name: my-spark-confs
    - secret:
        name: my-sensetive-spark-confs
    - secret:
        name: my-sensetive-spark-confs-2
        items:
          - key: secretKey
            path: filename.xml

The implementation idea is to use Projected Volumes with core/v1/SecretProjection and core/v1/ConfigMapProjection entities.
Also it will allow to merge ConfigMap created from files property with the entities from from property.

  1. Move xxxConfDir property to xxxConf property.
sparkConf:
  dir: /opt/spark/conf
  1. Configuration example for Spark
sparkConf:
  dir: /opt/spark/conf
  files:
    'spark-env.sh': |
      #!/usr/bin/env bash
      export SPARK_LOG_DIR=/opt/spark/logs
    'spark-defaults.conf': |
      spark.submit.deployMode=cluster
  from:
    - configMap:
        name: my-spark-confs
    - secret:
        name: my-sensetive-spark-confs
    - secret:
        name: my-sensetive-spark-confs-2
        items:
          - key: secretKey
            path: filename.xml
  1. Provide documentation with examples on how to set file content as a property when installing the chart, see Helm docs.
helm install kyuubi charts/kyuubi --set-file kyuubiConf.log4j2=kyuubi/conf/log4j2.xml.template
  1. Provide documentation with examples on how to create ConfigMap from file or directory, see Kubernetes docs .
kubectl create configmap my-spark-configs --from-file=prod/spark-configs/

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.
@dnskr
Copy link
Contributor Author

dnskr commented Apr 21, 2024

@pan3793 @zwangsheng @chgl Could you please have a look at it?
I would like to implement it (see point 5 with conf example), but it would be great to have your feedback and comments before.

@chgl
Copy link

chgl commented Apr 22, 2024

Looks good to me!

@chgl
Copy link

chgl commented May 10, 2024

maybe just one minor detail: make sure all changed configs contribute to the checksum annotation for the statefulset so kyuubi is automatically restarted: https://github.com/apache/kyuubi/blob/master/charts/kyuubi/templates/kyuubi-statefulset.yaml#L41

@dnskr
Copy link
Contributor Author

dnskr commented May 29, 2024

@chgl I'm afraid it is impossible to calculate and keep updated checksum for entities in sparkConf.from list, i.e. for provided ConfigMaps and Secrets. We probably can use lookup function to find checksum value during chart release or update, but users can change those ConfigMaps and Secrets in between. Do you know any good (ideally standard) solution to solve the problem?

@chgl
Copy link

chgl commented May 30, 2024

@dnskr , you're right of course! I hadn't considered the from secrets/configmaps. In that case I believe https://github.com/stakater/Reloader is still the best way to go (would require making the annotations for the StatefulSet https://github.com/apache/kyuubi/blob/master/charts/kyuubi/templates/kyuubi-statefulset.yaml#L40 configurable, though).

@sudohainguyen
Copy link
Contributor

quick question, is restarting kyuubi when spark default config updated a must?
I believe spark config can be updated in runtime, once the configmap is updated, upcoming sessions will be affected

@dnskr
Copy link
Contributor Author

dnskr commented Jun 27, 2024

quick question, is restarting kyuubi when spark default config updated a must? I believe spark config can be updated in runtime, once the configmap is updated, upcoming sessions will be affected

@sudohainguyen Very good question. In the current implementation there is no Kyuubi auto restart on Spark config changes, but I assume it might be needed to stop running Spark immediately on some config changes.

@sudohainguyen
Copy link
Contributor

yeah I think it depends on how users expect it
it would be perfect if we can make it as a configurable option

@dnskr
Copy link
Contributor Author

dnskr commented Jul 2, 2024

Created draft implementation #6521

@dnskr
Copy link
Contributor Author

dnskr commented Jul 2, 2024

Suggested approach and the draft implementation has an unsolved problem described below.

Some properties affect both helm chart and the application. For instance, .Values.server.rest.port is used in Kubernetes resource definitions (Pod and Service ports) and as kyuubi.frontend.rest.bind.port value in kyuubi-defaults.conf file. So, kyuubi.frontend.rest.bind.port value should be in sync with .Values.server.rest.port property if the chart user provides custom (overrides default) kyuubi-defaults.conf file.

Ideally, it would be great to move such properties outside the default kyuubi-defaults.conf file and keep the file empty by default. I.e. the solution is to set kyuubi.frontend.rest.bind.port in another way, for example using hardcoded environment variable or cli argument, but as far as I know, Kyuubi does not provide such capabilities at the momement.
I mean we cannot set kyuubi.frontend.rest.bind.port as KYUUBI_FRONTEND_REST_BIND_PORT env variable or cli argument --conf kyuubi.frontend.rest.bind.port= 10099 similar to Spark.

Any ideas?

@pan3793
Copy link
Member

pan3793 commented Jul 4, 2024

@dnskr Thanks for continuously working on this area. Carry my idea from the Slack channel - overriding configurations via --conf is preferred over env vars.

@dnskr
Copy link
Contributor Author

dnskr commented Jul 4, 2024

@dnskr Thanks for continuously working on this area. Carry my idea from the Slack channel - overriding configurations via --conf is preferred over env vars.

@pan3793 I created improvement proposal based on your suggestion: Provide Kyuubi configuration properties with cli options.
I'll try to implement it, but also I'll be glad if someone takes the opportunity to implement it because I'm not sure I have enough context at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants