The primary goal of this project is to build a Chaos Engineering environment around the LitmusChaos platform. We try hard to provide a smooth development process, including GitOps-based deployment. Hence, we leverage flux, terraform, nix (using devenv as a nix flake) and kind (maybe k3s soon). nix is not required, but strongly recommended, as it should automatically provide you with the other tools, so you should not have to worry about how to install things with your package manager.
Whether you just want to kick the Chaos tires quickly or build a long-lasting Chaos environment: this might be a place to start.
Experimentation is a natural element of Chaos Engineering. However, it should be just as natural in Software Development in general. That is why you might encounter bits (such as Knative) with no strong Chaos Engineering relationship in this repo. Those are meant to be optional.
The default localhost cluster environment has very few requirements. It should work on many types of clusters. However, it is optimized to work with just enough resources to run the whole Chaos Stack and the resilient Sock Shop. It aims to make things easily accessible.
Various things I built upon had minor issues (mostly because they were outdated). At this time, the "fixes" live here because I wanted to move on quickly. I would be happy to contribute them back.
This repo is derived from flux-conductr. Have a look at it if you are after a similar experience focused on flux specifically.
- LitmusChaos platform
- This repo acts as a ChaosHub
- We serve the Sock Shop Microservices Demo Application as a scenario (defaulting to containerd experiments)
- Tightly integrated Prometheus Stack including Grafana provisioned for the Sock Shop Application
- Loki
- Istio Eventing/Serving/Tracing (zipkin)
- Cilium
- Knative
- Locust load testing (supporting the UI)
- Portal API usage examples
- Support for deployment in proxy/custom CA environments
- Flux-/Terraform Deployment
- Nix Dev Experience
- Doom (Opt-In/Next Gen)
Even though we are trying to cover most things declaratively, some random bits may be covered by make targets. Simply calling the default target
make
should output help hinting at what is covered.
You may also want to disable github actions to start.
Optional: Generate SSH deployment keys and add the public key to your repo
make gen-keys
make gh-add-deploy-key
There is a terraform + kind based bootstrap in tf.
cp sample.tfvars terraform.tfvars
# Set proper values in terraform.tfvars
make apply
This should spin up the Litmus server. Once it is up,
make open-app
should open it in your browser.
Alternatively, you can bootstrap or even upgrade an existing cluster (be sure the current kubecontext is set properly). Also, make sure flux --version shows the desired version.
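A quick preflight can make both checks explicit before bootstrapping. The following is only a sketch: the `preflight` function is a hypothetical helper that is not part of this repo, and it relies solely on `kubectl config current-context` and `flux --version`.

```shell
# Hypothetical helper (not shipped in this repo): report the active
# kubecontext and the flux CLI version, or name the first missing tool.
preflight() {
  for tool in kubectl flux; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      return 1
    fi
  done
  echo "kubecontext: $(kubectl config current-context)"
  echo "flux: $(flux --version)"
}

preflight || echo "preflight failed; fix the above before bootstrapping"
```

If the reported context or version is not what you expect, switch the kubecontext or adjust your flux installation first.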
./scripts/flux-bootstrap.sh
We aim at supporting environments requiring a proxy (including custom CA certificate chains) to access external services.
A proxy has to be introduced in various places. Many systems (including kind) support configuration via environment variables, namely HTTPS_PROXY, HTTP_PROXY and NO_PROXY.
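As a rough illustration, the variables could be exported in the shell that later runs the deployment. The proxy URL below is a placeholder, and the NO_PROXY entries are assumptions based on typical kind defaults for pod and service subnets; adjust both to your environment.

```shell
# Placeholder proxy endpoint (assumption); override by setting PROXY_URL.
PROXY_URL="${PROXY_URL:-http://proxy.example.internal:3128}"

export HTTP_PROXY="$PROXY_URL"
export HTTPS_PROXY="$PROXY_URL"

# Keep cluster-internal traffic off the proxy. The CIDRs are assumed
# kind defaults (service and pod subnets) and may differ in your cluster.
export NO_PROXY="localhost,127.0.0.1,10.96.0.0/16,10.244.0.0/16,.svc,.cluster.local"

# Show what will be inherited by child processes.
env | grep -E '^(HTTP_PROXY|HTTPS_PROXY|NO_PROXY)=' | sort
```

Exporting the variables only affects processes started from this shell, so run the deployment from the same session.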
For flux, we ship a local-proxy cluster adding that environment. Set this cluster in tf/terraform.tfvars to try it.
For litmus, we only ship a runtime patch at the moment.
Regarding custom certificates, we simply overlay the compiled file in the containers using a ConfigMap. By default, we assume we can generate it on the host executing the initial deployment:
make -n recreate-ca-res
make -n patch-litmus-ca-certs patch-litmus-proxy-env
should give you an idea how we patch a system.
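To illustrate the ConfigMap overlay idea independently of the make targets, here is a minimal sketch. It is not what the targets actually execute: the ConfigMap name `ca-certs`, the namespace `litmus` and the `find_ca_bundle` helper are hypothetical, and the candidate paths are merely common distribution locations for the compiled bundle.

```shell
# Hypothetical helper: print the first readable compiled CA bundle on this host.
find_ca_bundle() {
  for f in /etc/ssl/certs/ca-certificates.crt \
           /etc/pki/tls/certs/ca-bundle.crt \
           /etc/ssl/cert.pem; do
    if [ -r "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

CA_BUNDLE="$(find_ca_bundle || true)"
if [ -n "$CA_BUNDLE" ] && command -v kubectl >/dev/null 2>&1; then
  # Render the ConfigMap locally without applying it.
  # "ca-certs" and "litmus" are assumed names, not taken from this repo.
  kubectl create configmap ca-certs --namespace litmus \
    --from-file=ca-certificates.crt="$CA_BUNDLE" \
    --dry-run=client -o yaml
else
  echo "no CA bundle or kubectl found; nothing rendered"
fi
```

Mounting such a ConfigMap over the bundle path inside a container replaces the image's certificates with the host's compiled chain.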
The terraform module provides a mechanism to patch the coredns ConfigMap. This may come in handy when working with a proxy.
I use mitmproxy locally to try things out.
The local cluster uses metallb to provide a loadbalancer. It binds multiple services to a single IP using metallb.universe.tf/allow-shared-ip.
The following ports are used:
- 9091: Litmus Portal
- 9002: Litmus Server (for remote agents)
- 3000: Grafana
- 9411: Zipkin (Mesh/Tracing)
- 20001: Kiali (Mesh/Istio)
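Since all services share one IP, the endpoints can be derived from that address and the port list above. The sketch below assumes plain http and uses a hypothetical `print_endpoints` helper; the IP lookup simply takes the first load balancer address reported by `kubectl`.

```shell
# Hypothetical helper: print one URL per service for a given IP.
# Ports come from the list above; the http scheme is an assumption.
print_endpoints() {
  ip="$1"
  for entry in "9091:Litmus Portal" "9002:Litmus Server" "3000:Grafana" \
               "9411:Zipkin" "20001:Kiali"; do
    port="${entry%%:*}"
    name="${entry#*:}"
    printf '%-14s http://%s:%s\n' "$name" "$ip" "$port"
  done
}

# Use the first load balancer IP found in the cluster, if kubectl is available.
LB_IP=""
if command -v kubectl >/dev/null 2>&1; then
  LB_IP="$(kubectl get svc --all-namespaces \
    -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}' 2>/dev/null \
    | tr ' ' '\n' | sort -u | head -n 1)"
fi
print_endpoints "${LB_IP:-<load-balancer-ip>}"
```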
Acting as a ChaosHub, this repo serves the sock-shop scenario/workflow.
Grafana : admin / prom-operator.
Litmus : admin / litmus.
- There are TODO tags in the code
- Leverage kustomize with remote repos/resources in workflow (litmuschaos/k8s:latest does not yet have git)
- Leverage Istio for failure injection?
- This repo can act as a ChaosHub - add it during setup
- Add first class support for mitmproxy (ship deployment)
- Add first class support for remote agent?
- Try GitOps scenarios?
- Manifests Naming
- Fix annoying terraform plan yaml_incluster
- Add knative-serving/eventing/dns (using nip.io?)
- Add mongodb/prometheus convenience (e.g. auth) targets to Makefile
- Test drive 3.0-beta
- disk-fill does not yet play with containerd?
- Catch up cron scheduled sock-shop workflow
- Introduce PrometheusRule Sock-Shop alerts
- Recover chaos "enabled" in Sock Shop Dashboard
- Introduce istio based tracing
- Introduce deas/calendar_monkey? ;)
- Use NodePort instead of LoadBalancer locally (just like we do it in flux-conductr)
- Some experiments from litmus-go appear to rely on /var/run/docker.sock which does not exist with containerd based environments (see)
- Knative deployment straight from github deployment not possible
- knative challenging, should probably merge kustomize.toolkit.fluxcd.io/substitute: disabled via kustomize. Other things need tweaks to upstream yaml to play with GitOps ("... configured" / Managed fields)
- Istio Ingress appears to have an image pulling issue, so it takes a while to come up
- litmus helmrelease removal should remove default agent?
- https://docs.cilium.io/en/stable/network/istio/
- https://knative.dev/docs/install/installing-istio/#installing-istio
- Deploy knative straight from github? like flux-monitoring.yaml?
- Running Knative with Istio in a Kind Cluster (old!)
- Install Knative using quickstart