JupyterHub Deployments Using GitOps Tools (FluxCD/ArgoCD) #3396

jash2105 · 2024-04-12T14:58:07Z

Hello JupyterHub team,

I've been exploring the current documentation and setup processes for JupyterHub on Kubernetes, primarily managed through Helm. This setup works well for basic deployments, but I've noticed a potential gap for large-scale, enterprise-grade deployments.

Many enterprise data science and engineering teams might prefer integrating JupyterHub with existing GitOps workflows, typically managed via FluxCD or ArgoCD, rather than directly using Helm for every change. This approach leverages their existing CI/CD pipelines and enhances maintainability and scalability.

Given this, I propose expanding the documentation to include detailed guidance on integrating JupyterHub with FluxCD and ArgoCD. This enhancement will:

- Provide step-by-step instructions on setting up JupyterHub using FluxCD/ArgoCD for resource and configuration reconciliation.
- Include practical configurations for a multi-user, highly available JupyterHub environment suitable for enterprise-level deployment, especially those requiring substantial GPU resources.
- Offer comprehensive debugging documentation to assist teams in quickly resolving issues.

I believe these additions will significantly streamline the setup process for large teams and institutions, reducing the overhead associated with integrating JupyterHub into large-scale infrastructure.

I am eager to contribute by drafting the documentation and configuration examples. Before proceeding, I'd like to gather feedback on this idea and any specific requirements or suggestions the community or maintainers might have.

Looking forward to your thoughts and hoping to contribute effectively to this amazing project!

welcome · 2024-04-12T14:58:09Z

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

jash2105 · 2024-04-13T03:02:35Z

Hey @consideRatio, can I work on this and submit a PR? I think this would greatly benefit the community. Let me know what you think!

consideRatio · 2024-04-13T09:34:18Z

Hey @jash2105, thank you for investing your time in this project and the JupyterHub ecosystem of open-source software!!

> Provide step-by-step instructions on setting up JupyterHub using FluxCD/ArgoCD for resource and configuration reconciliation.

🎉 I think it would be great to provide docs to complement existing docs with details that enable readers to deploy the helm chart with FluxCD or ArgoCD in dedicated sections.

I suspect it makes sense to have separate pages for FluxCD and ArgoCD, but if the two require very similar steps and share more content than they differ, they could live on the same page.

Note that we have some past discussions of relevance about ArgoCD, for example:

This helm chart uses the lookup function in its templates, which requires template rendering to be done against the k8s api-server, but tools like ArgoCD may render the templates in isolation beforehand. This was clarified in a comment on "Automatic secret generation triggers constant redeploy (ArgoCD)" #2887, where adjustments like those described in another comment on #2887 could be needed.

I'm not sure where to put the docs, but maybe as "Installing JupyterHub with ArgoCD" under "Setup JupyterHub". Alternatively, perhaps a section in the administration section about adjusting the deployment to be deployed with ArgoCD instead of helm?

> Include practical configurations for a multi-user, highly available JupyterHub environment suitable for enterprise-level deployment, especially those requiring substantial GPU resources.

I'd appreciate it if you focused on, for example, ArgoCD and/or FluxCD initially. GPU support is a complicated topic, so if documentation is to improve with regard to GPUs, I'd like such a contribution to be isolated and focused, without coupling to other pieces. This makes review easier, and that generally helps PRs get merged.

If there are GPU-related notes specific to ArgoCD, I suggest considering those separately as well, since a less complicated contribution that helps people deploy with ArgoCD without GPUs is valuable by itself.

> Offer comprehensive debugging documentation to assist teams in quickly resolving issues.

There are some general debugging docs. If there are specific ArgoCD debugging details, they can be part of an ArgoCD section - but otherwise I think we should try to build on the general debugging docs.

Btw, if you write about ArgoCD, for example, try to be aware of what ArgoCD already documents. The more we can link out to their docs to explain something, the easier our docs are to maintain long term as ArgoCD makes changes etc.

jash2105 · 2024-04-13T13:22:05Z

Absolutely, your overview is very thorough. Here’s my proposed timeline for the documentation process:

1. Initial Documentation: I plan to start with FluxCD, focusing initially on a straightforward installation guide that covers the basic setup without any custom configurations. This will include detailed steps on how to bootstrap a cluster using GitHub or GitLab with Flux, followed by a basic Helm chart installation. The goal is to establish a minimal viable setup with the necessary pods and services, along with some preliminary debugging steps.
2. Review and Iteration: Once the initial documentation is complete, I’ll submit it for review. Based on the feedback, I can make any necessary revisions.
3. Subsequent Documentation: Continuing from there, I'll create additional pull requests to gradually expand our documentation. This will include guides on customizing resources, integrating GPU support, and replicating the setup with Argo.

Does this sequence of steps fit well with our overall strategy? Please let me know if there are any adjustments you’d like me to consider or if there are specific areas you think we should prioritize.

manics · 2024-04-13T13:46:12Z

ArgoCD focuses on GitOps-style deployments. What do you think about having the instructions, scripts and manifests in your own repository, and linking to them from the Z2JH docs? One challenge with having all docs in a single repo is that it's not possible to automatically test them. It can also be a pain for people to copy and paste code, so things can easily get out of sync.

What might be particularly nice in a standalone repo is to have live manifests, and perhaps you could even deploy your own ArgoCD cluster in GitHub CI, and deploy the Z2JH config?

consideRatio · 2024-04-13T13:53:05Z

Thank you @jash2105 for planning this so clearly!

I didn't expect the "bootstrap" part of "bootstrap a cluster using GitHub or GitLab with Flux", but I may have misunderstood you. I expected something like "how to deploy the jupyterhub chart with Flux" under the assumption that Flux is already used to deploy things into an existing cluster. Maybe a GitHub repository needs to be set up for this, but not a cluster using Flux?

I'm trying to ensure the scope of what is to be documented is sufficiently related to deploying the jupyterhub chart, because anything introduced in this project, even if it's documentation, will require long-term attention in its maintenance. If we document too much beyond what's relevant to deploying the jupyterhub chart, the project takes on too much long-term maintenance burden.

I realize I can't guide this so clearly because I don't know Flux or Argo, but there should be a line drawn somewhere to focus on how to deploy this chart with Flux/Argo, as compared to how to work with Flux/Argo in general.

jash2105 · 2024-04-13T14:00:04Z

@consideRatio, I agree with your assessment. Starting with bootstrapping a cluster might indeed be excessive and could shift the focus too heavily onto Flux or similar CD tools. Instead, I propose initiating our efforts by deploying JupyterHub using Flux. This will be covered in my first PR. Subsequent updates can introduce enhancements such as custom deployment configurations, GPU resources, and eventually ArgoCD integration. Since I haven't set up ArgoCD on my cluster yet, we can prioritize Flux in the initial phase and then explore ArgoCD later on. Does this approach sound good to you? If so, you can expect a PR from me within the next few days or the coming week!

And to answer your question: yes, we are not setting up a cluster; we will just be setting up a git repository where we store all our manifests. If we make any changes there, the cluster will automatically recognize them and apply them to the existing deployment.
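As a rough illustration of what such a repository could contain, here is a minimal sketch of a Flux `HelmRepository` plus `HelmRelease` for the chart. It is not taken from the PR discussed in this thread. It assumes Flux is already installed in the cluster with its default `flux-system` namespace, and that the installed Flux version serves the `source.toolkit.fluxcd.io/v1` and `helm.toolkit.fluxcd.io/v2` APIs (older installations use beta versions of these APIs). The chart version, intervals, target namespace, and the single `values` entry are illustrative assumptions.

```yaml
# Where Flux should look for the chart.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: jupyterhub
  namespace: flux-system          # assumption: default Flux namespace
spec:
  interval: 1h                    # how often to refresh the chart index
  url: https://jupyterhub.github.io/helm-chart/
---
# The release Flux should reconcile from that repository.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: jupyterhub
  namespace: flux-system
spec:
  interval: 10m                   # how often to reconcile the release
  releaseName: jupyterhub
  targetNamespace: jupyterhub     # assumption: namespace to deploy into
  install:
    createNamespace: true         # create the target namespace if missing
  chart:
    spec:
      chart: jupyterhub
      version: "4.0.0"            # example pin; use the version you have tested
      sourceRef:
        kind: HelmRepository
        name: jupyterhub
        namespace: flux-system
  values:                         # same structure as the chart's values.yaml
    proxy:
      service:
        type: ClusterIP           # illustrative override only
```

Committing a change to `values` or `version` in Git is then what triggers the upgrade, rather than running `helm upgrade` by hand.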

jash2105 · 2024-04-13T14:05:32Z

@manics, are you suggesting that the documentation could potentially cause issues? I wouldn't expect that to be the case. Also, I agree with you about storing the plain manifests in a repository, whether it's mine or another. These manifests could serve as useful references. Moreover, having custom documentation alongside referring directly to the complete manifest could streamline the process, similar to how we handle the documentation and values.yaml file when deploying with Helm.

manics · 2024-04-13T14:11:54Z

I don't think it'll cause issues, it's more that I think from a maintainability perspective it may be easier to have a separate repo with docs, manifests, and potentially CI workflows combined.

I think it could be easier for readers too: it's a lot easier to tightly integrate manifests and docs in their own repo, since they won't be constrained by the existing docs layout. If someone wants to reproduce your steps they could just clone the repo, which isn't so practical if you have to clone the whole Z2JH repo and search through subdirectories.

jash2105 · 2024-04-13T20:33:46Z

@manics, I see your point about the issue requiring a fundamental restructuring of the repository. Given this, I propose continuing with the current PR. As we develop the GitOps documentation, if we formulate a plan by then, we could consider a comprehensive overhaul of the existing repositories. Does this sound like a viable approach to you?

jash2105 · 2024-04-28T07:33:51Z

#3407
@consideRatio @manics, I worked on a basic install config. Expect more PRs covering other GitOps tools and more configs in the coming weeks. Thanks!

DeepCowProductions · 2024-09-02T13:17:37Z

In case someone needs inspiration for an argocd App definition (should work out of the box):

---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: jupyterhub # name of the argocd object
  namespace: argocd # namespace where this manifest lives, not the app itself!
spec:
  project: jupyter # argocd project
  sources:
    # official helm chart source, values are self hosted
    - repoURL: https://jupyterhub.github.io/helm-chart/
      targetRevision: 4.0.0-0.dev.git.6717.h61ab1167  # helm chart version
      chart: jupyterhub
      helm:
        valueFiles: # supply values from some self hosted repo
        - $values/jupyterhub/helm/values.yaml # path inside self hosted repo
    # self referencing repo to inject values.yaml
    - repoURL: 'https://github.com/org/repo.git'
      targetRevision: main # git branch
      ref: values
    # extra yamls for additional resources such as an ingress definition
    - repoURL: 'https://github.com/org/repo.git'
      path: jupyterhub/k8s # path inside repo for other resources
      targetRevision: main # git branch
      directory: # all yaml files inside "jupyterhub/k8s"
        recurse: true
        include: "{*.yaml,*.yml}"
  destination:
    server: 'https://kubernetes.default.svc' # kubernetes cluster
    namespace: jupyterhub # deployment namespace for jupyterhub
  syncPolicy:
    syncOptions:
      - CreateNamespace=true # create destination kubernetes namespace
      - ServerSideApply=true # fix for metadata annotation being too long
    automated:
      selfHeal: true # auto sync and repair
      prune: true    # delete resources that are no longer defined in the sources
---

BenBo17 · 2025-02-10T08:26:29Z

@DeepCowProductions Thank you for your comment.
My JupyterHub deployments were in the OutOfSync state all the time because the values in the hub secret (e.g. auth_token) are newly generated each time ArgoCD runs helm template. I couldn't figure out how to fix that.

I did use the automated sync policy but that wasn't sufficient as ArgoCD interprets the new generation of e.g. auth tokens as changes in the destination and not in the source.

So spec.syncPolicy.automated.selfHeal=true did the trick.

seanturner026 · 2025-02-10T14:20:30Z

@BenBo17 have a look at spec.ignoreDifferences. You should be able to configure your ArgoCD Application to ignore those differences:

https://argo-cd.readthedocs.io/en/stable/user-guide/diffing/#application-level-configuration

BenBo17 · 2025-02-10T14:41:44Z

@seanturner026 I tried that by ignoring the following secret keys:

hub.config.ConfigurableHTTPProxy.auth_token
hub.config.JupyterHub.cookie_secret
hub.config.CryptKeeper.keys

But I would also have to ignore the checksum annotations of the deployment. With that, Jupyter pods wouldn't restart when the secret gets changed. Please correct me if I'm wrong here.

I think the selfHeal option might be the best solution at the moment, until ArgoCD is able to look up existing secrets when performing helm template.
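For reference, ignoring the three secret keys listed above might look roughly like the sketch below. It assumes the chart stores them as keys under `/data` in a Secret named `hub`; check the rendered manifests before relying on either assumption. As noted, this covers only the Secret, not the checksum annotations on the Deployments.

```yaml
spec:
  # project, sources and destination omitted for brevity
  ignoreDifferences:
    - group: ""                   # core API group, where Secret lives
      kind: Secret
      name: hub                   # assumption: name of the chart's hub Secret
      jsonPointers:
        # assumption: the keys live directly under /data
        - /data/hub.config.ConfigurableHTTPProxy.auth_token
        - /data/hub.config.JupyterHub.cookie_secret
        - /data/hub.config.CryptKeeper.keys
```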

seanturner026 · 2025-02-10T15:46:59Z

We're using something like this. The syntax is quite odd with the ~.

spec:
  # omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        # plus some other values
        - /spec/template/metadata/annotations/checksum~1auth-token

I think this is meant to work in concert with selfHeal, such that helm template isn't constantly causing ArgoCD to cycle Pods (as ArgoCD no longer cares about a difference in the auth-token).
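On the odd syntax: `~1` is the JSON Pointer (RFC 6901) escape for `/`, so the pointer above targets the annotation `checksum/auth-token`. A fuller sketch of how the pieces could fit together follows. It assumes the ArgoCD version in use supports the `RespectIgnoreDifferences=true` sync option; per the ArgoCD docs, `ignoreDifferences` by default only affects the diff and sync status, while a sync still applies the rendered value. The Application name and namespace are illustrative.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: jupyterhub                # illustrative name
  namespace: argocd               # assumption: namespace where ArgoCD runs
spec:
  # project, sources and destination omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        # "~1" escapes "/" in the annotation key "checksum/auth-token"
        - /spec/template/metadata/annotations/checksum~1auth-token
  syncPolicy:
    automated:
      selfHeal: true              # re-sync when the live state drifts
    syncOptions:
      # also leave the ignored fields untouched during sync
      - RespectIgnoreDifferences=true
```

The trade-off raised earlier in the thread still applies: once the checksum annotation is ignored, a real change to the secret no longer restarts the affected Pods automatically.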

jash2105 added the enhancement label Apr 12, 2024

consideRatio added documentation and removed enhancement labels Apr 13, 2024
