[DSIP-63][k8s] Support User-customized K8s YAML Task #16478

Closed
2 of 3 tasks
Tracked by #14102
Mighten opened this issue Aug 18, 2024 · 13 comments
@Mighten (Contributor) commented Aug 18, 2024

Action List: Extension of operations for the k8s YAML task:

Search before asking

  • I had searched in the DSIP and found no similar DSIP.

Motivation

Supporting user-customized K8s YAML tasks has the following benefits:

  • Flexibility: Unlike the existing K8s low-code job with limited functionality, YAML tasks give users the flexibility to define sophisticated task instances in DolphinScheduler, similar to what the custom JSON template provides in the DataX task.

  • Workflow Customization: Users can integrate operational and maintenance processes into DolphinScheduler using YAML for complex workflows.

  • Configuration Requirements: The current K8s low-code job does not meet users' in-depth needs, particularly for tasks involving multiple pods or specific configurations like environment variables and tolerations; in contrast, K8s YAML tasks do.

In short, by enabling user-customized YAML tasks, DolphinScheduler can better support a wide range of Kubernetes-based workflows and operational requirements.

Design Detail

2.1 Design Overview

The following is a Swimlane Diagram showing how this k8s YAML task is embedded into Apache DolphinScheduler:

Figure 2-1(1). Design Overview

  1. The user opens a Web page to edit and save a K8s YAML workflow.
  2. The UI provides an editor for the user to input YAML in Custom Template mode.
  3. The API Server encapsulates the command and hands it over to the Master.
  4. The Master splits the workflow DAG and dispatches tasks to the Worker.
  5. The Worker picks the appropriate task executor and operation. E.g., for a k8s Pod YAML, the Worker picks the YAML Task Executor, and then picks the Pod Operation.
  6. The Worker reports status to the Master.
  7. The user reviews the k8s YAML task log in the Task Instance window.

2.2 Frontend Design

The frontend adds support for user-customized k8s YAML tasks while remaining compatible with the original k8s low-code jobs.

Figure 2-2(1). Frontend Design

  1. The Web UI layout

    When the user switches on the Custom Template, the low-code k8s job fields should be hidden and the YAML editor should appear (or vice versa), similar to the JSON Custom Template in the DataX plugin.

    This feature, as shown in Figure 2-2(1), is implemented using the Vue components' span property, controlled by reactive variables (such as yamlEditorSpan) in the file dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-k8s.ts.

  2. The Request body

    When the user switches to Custom Template mode, the request body should include only YAML-related fields (customConfig and yamlContent), and all previously hidden fields should not be sent.

    This feature is implemented using the taskParams in the file dolphinscheduler-ui/src/views/projects/task/components/node/format-data.ts.

  3. i18n/locales

    Apache DolphinScheduler is international software and should support multiple languages.

    The text on the Web UI is retrieved from variables defined in the file dolphinscheduler-ui/src/locales/{en_US, zh_CN}/project.ts. For user-customized k8s YAML tasks, there are three key variables to consider:

    • k8s_custom_template: the label for the switch to enable user-customized k8s YAML tasks.
    • k8s_yaml_template: the label for the text editor used to input user YAML.
    • k8s_yaml_empty_tips: the warning message displayed when a user tries to submit empty YAML.

    This feature is implemented by invoking t('project.node.${variable_name}') (such as t('project.node.k8s_yaml_template')) in the file dolphinscheduler-ui/src/views/projects/task/components/node/fields/use-k8s.ts.

2.3 Backend Design

The backend design describes how the worker executes user-customized k8s YAML tasks. Figure 2-3(1) shows how user-customized k8s YAML Pod tasks relate to the original k8s low-code jobs.

Figure 2-3(1). Backend Design Overview

After the worker checks the parameters, K8sYamlTaskExecutor is loaded for the current user-customized k8s YAML Pod task. Once the YAML is parsed into HasMetadata, its kind field is used to assign abstractK8sOperation as K8sPodOperation for executing the YAML Pod task.
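
For reference, here is a minimal sketch of the parse step, assuming the fabric8 kubernetes-client that DolphinScheduler already uses; the class YamlKindProbe and its method are illustrative placeholders, not the DSIP's actual code:

    import io.fabric8.kubernetes.api.model.HasMetadata;
    import io.fabric8.kubernetes.client.utils.Serialization;

    public class YamlKindProbe {

        // Parse user YAML into a generic HasMetadata; the executor then uses
        // the kind field to pick the matching operation handler.
        public static String resolveKind(String yamlContent) {
            HasMetadata metadata = Serialization.unmarshal(yamlContent);
            return metadata.getKind(); // e.g. "Pod" or "ConfigMap"
        }

        public static void main(String[] args) {
            String yaml = "apiVersion: v1\n"
                    + "kind: Pod\n"
                    + "metadata:\n"
                    + "  name: demo\n";
            System.out.println(resolveKind(yaml)); // prints "Pod"
        }
    }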

  1. K8s Task Executors

    Figure 2-3(2). K8s Task Executors

    Three k8s task executors are involved, as shown in Figure 2-3(2):

    • AbstractK8sTaskExecutor is an abstract class that represents a k8s task executor.
    • K8sTaskExecutor is a concrete class that extends AbstractK8sTaskExecutor to represent the low-code k8s job executor.
    • K8sYamlTaskExecutor is a concrete class that extends AbstractK8sTaskExecutor to represent the user-customized k8s YAML task executor.
  2. K8s Operation Handlers

    Figure 2-3(3). K8s Operation Handlers

    Two operation handlers are involved, as shown in Figure 2-3(3); a combined sketch of both hierarchies follows this list:

    • AbstractK8sOperation is an interface representing all k8s resource operations.
    • K8sPodOperation is a concrete class that implements AbstractK8sOperation to handle Pod operations.
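
To make the relationship concrete, here is a skeletal sketch of how the two hierarchies could fit together. The method names (run, createOrReplace) are assumptions for illustration; the fabric8 calls are real, but the actual DolphinScheduler signatures may differ:

    import io.fabric8.kubernetes.api.model.HasMetadata;
    import io.fabric8.kubernetes.client.KubernetesClient;

    // Executor side: one abstract base, a low-code subclass, a YAML subclass.
    abstract class AbstractK8sTaskExecutor {
        abstract void run() throws Exception;
    }

    class K8sTaskExecutor extends AbstractK8sTaskExecutor {
        @Override
        void run() { /* build a Job from the low-code form fields */ }
    }

    class K8sYamlTaskExecutor extends AbstractK8sTaskExecutor {
        private AbstractK8sOperation abstractK8sOperation; // chosen by kind

        @Override
        void run() { /* parse YAML, pick the operation, delegate to it */ }
    }

    // Operation side: one interface, with Pod as the first concrete handler.
    interface AbstractK8sOperation {
        void createOrReplace(HasMetadata resource);
    }

    class K8sPodOperation implements AbstractK8sOperation {
        private final KubernetesClient client;

        K8sPodOperation(KubernetesClient client) {
            this.client = client;
        }

        @Override
        public void createOrReplace(HasMetadata resource) {
            client.resource(resource).createOrReplace();
        }
    }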

2.4 Usecase Design

A typical use case for a k8s YAML task includes editing YAML, bringing the workflow online, and starting the workflow, just as with k8s low-code jobs, except that the user switches on the Custom Template option to fill in YAML.

Figure 2-4(1). Usecase Design

  1. The user edits a k8s YAML node in a workflow.
  2. If the Custom Template is activated and the YAML content is not blank, the user may bring the whole workflow online.
  3. If the workflow is online, the user may start the workflow and review the logs generated during its execution.

Compatibility, Deprecation, and Migration Plan

3.1 Compatibility Plan

The user-customized k8s YAML feature requires only customConfig to be activated. By default, its value is 0, which applies to the existing k8s low-code jobs.

The remainder of this section will demonstrate the flexibility and compatibility of this design by using the example of introducing ConfigMap support:

    this.k8sYamlType = K8sYamlType.valueOf(this.metadata.getKind());
    generateOperation();

After parsing with YamlUtils::load, the kind field acquired by this.metadata.getKind() will be ConfigMap. Then, this.k8sYamlType is determined and used to generate the corresponding operation:

    private void generateOperation() {
        switch (k8sYamlType) {
            case Pod:
                abstractK8sOperation = new K8sPodOperation(k8sUtils.getClient());
                break;
            case ConfigMap:
                abstractK8sOperation = new K8sConfigmapsOperation(k8sUtils.getClient());
                break;
            default:
                throw new TaskException(
                        String.format("K8sYamlTaskExecutor does not support type %s", k8sYamlType.name()));
        }
    }

Consequently, generateOperation() will set this.abstractK8sOperation to a new instance of K8sConfigmapsOperation. Next, we can implement K8sConfigmapsOperation to handle the ConfigMap operations.
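
A minimal sketch of that handler, assuming the hypothetical AbstractK8sOperation.createOrReplace contract from the sketch in section 2.3 and the fabric8 client:

    import io.fabric8.kubernetes.api.model.ConfigMap;
    import io.fabric8.kubernetes.api.model.HasMetadata;
    import io.fabric8.kubernetes.client.KubernetesClient;

    public class K8sConfigmapsOperation implements AbstractK8sOperation {

        private final KubernetesClient client;

        public K8sConfigmapsOperation(KubernetesClient client) {
            this.client = client;
        }

        @Override
        public void createOrReplace(HasMetadata resource) {
            // resource was parsed from user YAML whose kind is "ConfigMap"
            ConfigMap configMap = (ConfigMap) resource;
            client.configMaps().resource(configMap).createOrReplace();
        }
    }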

3.2 Deprecation Plan

N/A for now, waiting for community opinions.

3.3 Migration Plan

N/A for now, waiting for community opinions.

Test Plan

4.1 Overview

The user-customized k8s YAML task feature allows users to submit YAML tasks to k8s, including Pod, ConfigMap, and other resources.

This test plan aims to ensure that the feature functions as expected and meets user requirements.

4.2 Scope

  1. YAML Pod
| Test Case # | Name | Action | Expectation |
|---|---|---|---|
| 1 | UI Display | Edit YAML, save, and reopen | The YAML content stays up-to-date. |
| 2 | UI Validation | Try to submit empty YAML | The UI modal dialog intercepts the empty YAML. |
| 3 | Online Workflow | Save the workflow and bring it online | The user successfully brings the workflow online. |
| 4 | Dryrun Workflow | Run the workflow in dry-run mode | The Master successfully dry-runs this task. |
| 5 | Test Workflow | Run the workflow in test mode | The Worker successfully tests this task. |
| 6 | Run Workflow | Run the workflow | The Worker successfully runs this task. |

Code of Conduct

  • I agree to follow this project's Code of Conduct

Mighten added the DSIP and Waiting for reply labels on Aug 18, 2024
SbloodyS removed the Waiting for reply label on Aug 18, 2024
@SbloodyS (Member) commented

cc @Gallardot @ruanwenjun

@fuchanghai (Member) commented

@caishunfeng pls help to add this issue to #14102

SbloodyS mentioned this issue on Aug 18, 2024
SbloodyS changed the title from [DSIP][k8s] Support User-customized K8s YAML Task to [DSIP-63][k8s] Support User-customized K8s YAML Task on Aug 18, 2024
@SbloodyS (Member) commented

> @caishunfeng pls help to add this issue to #14102

Done. You're also a DS Committer and have permission to add to it.

@Gallardot (Member) commented Aug 18, 2024

Before discussing this DSIP, I hope everyone can reach a basic consensus. Supporting customization can indeed meet more demand scenarios, but excessive customization can bring more problems.

I see in the design that it supports users directly creating Pods and ConfigMaps, and even supports creating multiple Pods.

Regarding the support for configmap, I have some questions:

  1. Why support configmap? For the same workflow, does it create a configmap for each task instance? Is the content of the configmap different each time?
    If it is the same, why create it each time? As a configuration resource in k8s, shouldn't a configmap be static? As a way to obtain configuration, besides configmap, should secret also be supported?
  2. Should the configmap be mounted to the pod as a file? If so, should PV and PVC be supported?
  3. If it is just to reference the configuration in the configmap, can it be directly referenced through env?

Regarding the support for pod, I have some questions:

  1. How is the name of the pod defined? How can different workflows in the same namespace ensure that pod names do not duplicate? This is also the case with configmaps.
  2. How is the lifecycle of the pod managed? Will DS delete it after the task ends? How to ensure that DS can definitely delete it?
  3. If the execution strategy of the workflow is parallel, how should the pod be handled?
  4. If multiple pods are created at the same time, are these pods related? Or is it just to run multiple pods concurrently? If it is concurrent, does it support Deployments? Does it support StatefulSets? Should DS manage them as a controller of k8s resources? I am afraid this is not what DS should do.
  5. Or more broadly, do you want to support the task of creating helm charts?
  6. How to retrieve the logs of a pod? How to retrieve the logs of multiple pods? If there are multiple containers in a pod, how to retrieve the logs of multiple containers?

If the issues are not adequately addressed, I am afraid I will vote -1 on this DSIP.

@fuchanghai (Member) commented Aug 18, 2024

  1. For each type, we can set a strategy: the first is to ignore the resource if it already exists, and the second is to delete it first and then add it, to meet various scenarios.
  2. Add labels to the pod according to the strategy type: if it is an ignore-if-exists strategy, use taskCode as the label; if it is a delete-first-then-add strategy, use taskInstanceId.
  3. Delete the pod according to the label, via ObjectMeta.setLabel (see the sketch below).
  4. We can also use the taskInstance to replace the user-defined pod name, via ObjectMeta.setName.
  5. Perhaps this issue can be targeted at a single pod, without considering multiple pods. In fact, if there are multiple pods in a node, we can give them a label with the value of processInstanceId+taskInstanceId, obtain the pods through the processInstance, and fetch their logs separately.
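
A rough fabric8 sketch of the label-based cleanup described above; the namespace, label key, and value are illustrative assumptions:

    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    public class PodCleanupSketch {
        public static void main(String[] args) {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                // Delete every pod this task instance created, whatever its
                // name, by selecting on the label written at creation time.
                client.pods()
                        .inNamespace("default")
                        .withLabel("dolphinscheduler/taskInstanceId", "4217")
                        .delete();
            }
        }
    }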

@Gallardot cc @EricGao888 @Mighten @SbloodyS WDYT?

@SbloodyS (Member) commented

Totally agreed with @Gallardot

From my personal perspective, since DS is a scheduling system, the current k8s task is mainly used to replace k8s cron-jobs, and we have no plans to support k8s deployment scheduling management since that maintenance would involve a huge amount of work. So we need to reach a basic consensus.

@fuchanghai (Member) commented Aug 19, 2024

This is indeed too big. At present, the most commonly used types in our company are ConfigMap and Pod; Deployments are only used with Flink. For SaaS-type products, ConfigMap changes are usually initiated by users or made when users modify their own configurations. In most cases, the Pod type is the most commonly used. We can open an issue only for Pods and first discuss how to complete the Pod type.
@Mighten cc @Gallardot @SbloodyS

@fuchanghai (Member) commented

For the scenario of a pod with multiple containers, I think it is necessary to divide the logs by container. When querying the logs, the frontend needs to pass the container name to check the logs of the specific container, and it needs a table to switch between containers to view the logs. This change is a bit much; I hope that this issue will only consider a single pod with a single container.

@Gallardot (Member) commented

> This is indeed too big. At present, the most commonly used types in our company are ConfigMap and Pod; Deployments are only used with Flink. For SaaS-type products, ConfigMap changes are usually initiated by users or made when users modify their own configurations. In most cases, the Pod type is the most commonly used. We can open an issue only for Pods and first discuss how to complete the Pod type.
>
> @Mighten cc @Gallardot @SbloodyS

I'm sorry, but I don't agree with this view. Pods are the most commonly used because they are the basic unit of a service workload. But they are also the least used directly, since only early versions of Kubernetes used bare pods; that's why more advanced workloads like Deployments and StatefulSets were introduced later. Managing the lifecycle of pods is an important task in Kubernetes, not just creating a pod.

@fuchanghai (Member) commented

Judging from the current low-code functions of the k8s task, it puts a pod in a Job-type task, which is not much different from a single-pod task.

@qingwli (Member) commented Sep 24, 2024

I agree with @Gallardot's thinking. And for what @fuchanghai said about supporting the pod level, a few questions:

    1. How do we limit users to creating only pod jobs? If a user wants to create something like a Deployment, do we need to parse the user's YAML and check it?
    2. I agree it's not just pod creation; it's more like pod management. For now, if a user starts a Spark job or another k8s job, we can add some limits to these pods. But user-defined YAML can bypass this policy, which can cause lots of chained questions.
    3. In which scenario does a user need to define pods? We have a k8s pod task now; if our k8s job can't support some functions, such as specific configurations like environment variables and tolerations, we can enhance it to support them.

Overall, vote -1 for this DSIP.

@davidzollo (Contributor) commented

My suggestion is to use a single pod with a single container; this scenario is suitable for rapid testing and development.

Regarding the concerns raised above, my responses are as follows:
1. How is the name of the Pod defined? How can different workflows in the same namespace ensure that Pod names do not duplicate?

Pod names must be unique within the same namespace in Kubernetes. DS can generate unique names programmatically using the Kubernetes Java Client API by appending identifiers like the workflow name, task ID, and a timestamp or UUID; this is not difficult.
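
A quick illustrative sketch of such a naming scheme; the exact format is an assumption, not an actual DS convention:

    import java.util.Locale;
    import java.util.UUID;

    public class PodNameSketch {

        // Build a unique, DNS-1123-safe pod name from task identifiers.
        static String podName(String workflowName, long taskInstanceId) {
            String base = workflowName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9-]", "-");
            String suffix = taskInstanceId + "-" + UUID.randomUUID().toString().substring(0, 8);
            // K8s object names are DNS-1123 labels, at most 63 characters.
            int max = 63 - suffix.length() - 1;
            return base.substring(0, Math.min(base.length(), max)) + "-" + suffix;
        }

        public static void main(String[] args) {
            System.out.println(podName("My Flink ETL", 4217)); // my-flink-etl-4217-xxxxxxxx
        }
    }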

2. How is the Pod lifecycle managed? Will DS delete it after the task ends? How to ensure that DS can definitely delete it?
DS can control the creation and deletion of Pods. After task completion, DS can delete the pods through the API, e.g. the deleteNamespacedPod method; retry mechanisms or manual cleanup calls can be used to guarantee deletion.

3. How are Pods handled if the workflow execution strategy is parallel?
For parallel execution, DS can create multiple Pods simultaneously. Each Pod runs independently, managed by unique configurations and labels. Each task can create its own Pod, ensuring resource separation and independent execution.

4. If multiple Pods are created simultaneously, are these Pods related, or do they just run concurrently? Does it support Deployments and StatefulSets? Should DS manage it as a controller of Kubernetes resources?
Multiple Pods created for parallel tasks are independent of each other. DS manages these Pods individually using the Kubernetes API rather than acting as a controller. For more complex scenarios like continuously running services (Deployments, StatefulSets), it is recommended to let Kubernetes native controllers manage these resources instead of DS; DS focuses on task scheduling.

5. Does DS support tasks for creating Helm Charts?
DS doesn’t natively include task types for directly deploying Helm Charts.

6. How to retrieve the logs of a Pod? How to retrieve logs of multiple Pods? If a Pod contains multiple containers, how to retrieve the logs of multiple containers?
DS can retrieve logs by calling the API of K8s.
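
For reference, a fabric8-based sketch of per-pod and per-container log retrieval; the namespace, pod, and container names are illustrative:

    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    public class PodLogSketch {
        public static void main(String[] args) {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                // Logs of a single-container pod.
                String log = client.pods()
                        .inNamespace("default")
                        .withName("ds-task-4217")
                        .getLog();

                // For a multi-container pod, select each container explicitly.
                String sidecarLog = client.pods()
                        .inNamespace("default")
                        .withName("ds-task-4217")
                        .inContainer("sidecar")
                        .getLog();

                System.out.println(log);
                System.out.println(sidecarLog);
            }
        }
    }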

@SbloodyS (Member) commented

Closing, as there are no plans to do this.
