For Turkish documentation, see: README.tr.md
This repository provides a production-grade, scalable, and observable Kafka platform using modern Platform Engineering practices. Infrastructure automation, secure Kafka cluster setup, observability, REST API management, and distributed Kafka Connect operations are fully automated end-to-end. All components are designed for reliability, maintainability, and self-service management.
You are expected to design and manage a Kafka infrastructure focused on high availability, observability, automation, and reproducibility for production environments. All operations, tests, and scenarios are performed on a single central Kafka cluster.
Duration: 7 days
Scenario:
- A single Kafka cluster is set up as the foundation for the entire platform.
- All code must be production quality, well-documented, and version-controlled.
- Best practices for infrastructure automation, containerization, and monitoring are applied.
The following diagram shows all components of the Trendyol Kafka Platform architecture:
```mermaid
graph TB
    subgraph AUTO["Automation"]
        GHA[GitHub Actions]
        TF[Terraform]
        ANS[Ansible]
    end
    subgraph CLUSTER["Kafka Cluster"]
        subgraph BROKERS["Brokers (4 nodes)"]
            B1[Broker 1 - AZ1]
            B2[Broker 2 - AZ1]
            B3[Broker 3 - AZ2]
            B4[Broker 4 - AZ2]
        end
        subgraph CONTROLLERS["Controllers (3 nodes)"]
            C1[Controller 1 - AZ1]
            C2[Controller 2 - AZ2]
            C3[Controller 3 - AZ3]
        end
        SSL[SSL/TLS + SASL/SCRAM]
        SSL -.secure.-> BROKERS
        SSL -.secure.-> CONTROLLERS
        BROKERS <-.cluster.-> CONTROLLERS
    end
    subgraph SERVICES["Services"]
        CN1[Kafka Connect 1<br/>AZ1]
        CN2[Kafka Connect 2<br/>AZ1]
        API[REST API<br/>JWT Protected]
        MON[Monitoring<br/>Prometheus + Grafana]
    end
    GHA --> TF
    GHA --> ANS
    TF --> CLUSTER
    TF --> SERVICES
    ANS --> CLUSTER
    ANS --> SERVICES
    CN1 --> SSL
    CN2 --> SSL
    API --> SSL
    MON -.metrics.-> CLUSTER
    style BROKERS fill:#E3F2FD,stroke:#1976D2,stroke-width:3px
    style CONTROLLERS fill:#F3E5F5,stroke:#7B1FA2,stroke-width:3px
    style B1 fill:#4A90E2,stroke:#333,stroke-width:2px
    style B2 fill:#4A90E2,stroke:#333,stroke-width:2px
    style B3 fill:#4A90E2,stroke:#333,stroke-width:2px
    style B4 fill:#4A90E2,stroke:#333,stroke-width:2px
    style C1 fill:#7B68EE,stroke:#333,stroke-width:2px
    style C2 fill:#7B68EE,stroke:#333,stroke-width:2px
    style C3 fill:#7B68EE,stroke:#333,stroke-width:2px
    style CN1 fill:#FF8C42,stroke:#333,stroke-width:2px
    style CN2 fill:#FF8C42,stroke:#333,stroke-width:2px
    style API fill:#F39C12,stroke:#333,stroke-width:2px
    style MON fill:#50C878,stroke:#333,stroke-width:2px
    style SSL fill:#C0392B,stroke:#333,stroke-width:3px
    style TF fill:#7B42BC,stroke:#333,stroke-width:2px
    style ANS fill:#E74C3C,stroke:#333,stroke-width:2px
    style GHA fill:#2C3E50,stroke:#333,stroke-width:2px
```
- Modular, reusable, and scalable AWS infrastructure.
- Modules: network, compute, security.
- Resources:
  - Kafka Broker: 4 nodes (2 in AZ1, 2 in AZ2)
  - Kafka Controller: 3 nodes (1 each in AZ1, AZ2, AZ3)
  - Kafka Connect Cluster: 2 nodes (both in AZ1)
  - Observability Node: 1 node (AZ1)
- Automated Kafka broker and controller setup with the Confluent Platform Ansible Collection.
- Production-grade security: SSL/TLS encryption and SASL/SCRAM authentication (see the client sketch after this list).
- Rack awareness and multi-AZ distribution: nodes are spread across physically separate availability zones, so the cluster stays available and resists data loss if a zone fails.
- JMX metrics enabled for monitoring.
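As a concrete illustration of the security setup above, here is a minimal client sketch, assuming the confluent-kafka Python library (the same AdminClient the REST API section mentions). The broker addresses, SCRAM mechanism variant, credentials, and CA path are hypothetical placeholders, not values from this repository:

```python
# Minimal sketch: connecting to the secured cluster with SASL/SCRAM over TLS.
from confluent_kafka.admin import AdminClient

conf = {
    "bootstrap.servers": "broker-1:9093,broker-2:9093",  # hypothetical addresses
    "security.protocol": "SASL_SSL",                     # TLS transport + SASL auth
    "sasl.mechanism": "SCRAM-SHA-512",                   # could also be SCRAM-SHA-256
    "sasl.username": "kafka-admin",                      # hypothetical credential
    "sasl.password": "change-me",                        # load from a secret store in practice
    "ssl.ca.location": "/etc/kafka/ssl/ca.pem",          # CA that signed the broker certs
}

admin = AdminClient(conf)
md = admin.list_topics(timeout=10)
# List the brokers that answered; with rack awareness enabled, partition
# replicas are placed across the AZs these brokers live in.
for broker in md.brokers.values():
    print(broker.id, broker.host, broker.port)
```

Any producer, consumer, or Connect worker talking to this cluster would carry the same `security.protocol`, `sasl.*`, and `ssl.*` settings.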
- Prometheus: Collects metrics from all Kafka and Connect nodes via JMX exporters and node_exporter (a query sketch follows this list).
- Alertmanager: Centralized alert management for infrastructure and Kafka events.
- Grafana: Pre-built dashboards for Kafka Broker, Controller, and Connect.
- Node Exporter: Installed on all nodes for system metrics.
- All observability components are automatically installed and managed with Ansible.
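To show how the collected metrics can be consumed programmatically, the sketch below queries Prometheus's HTTP API for a standard Kafka health signal. The Prometheus address and the exact metric name (which depends on the JMX exporter's relabeling rules) are assumptions:

```python
# Minimal sketch: asking Prometheus for under-replicated partitions per broker.
import requests

PROMETHEUS = "http://observability-node:9090"  # hypothetical address

resp = requests.get(
    f"{PROMETHEUS}/api/v1/query",
    # Metric name is an assumption; check your JMX exporter rules for the real one.
    params={"query": "kafka_server_replicamanager_underreplicatedpartitions"},
    timeout=5,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    value = result["value"][1]  # value is a [timestamp, value] pair
    print(f"{instance}: {value} under-replicated partitions")
```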
- FastAPI-based REST API that uses the Kafka AdminClient for cluster management (a sketch follows this list).
- Endpoints for broker, topic, consumer group management, and topic configuration.
- All API operations are protected with JWT.
- Dockerized, runs on the Observability node.
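A minimal sketch of what such an endpoint can look like, assuming FastAPI, PyJWT, and confluent-kafka; the route, secret handling, and client configuration are illustrative, not this repository's actual code:

```python
# Minimal sketch: a JWT-protected topic-listing endpoint.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from confluent_kafka.admin import AdminClient

app = FastAPI()
bearer = HTTPBearer()
JWT_SECRET = "change-me"  # hypothetical; load from env/secrets in practice

# Plus the SASL/SSL settings shown earlier when talking to the secured cluster.
admin = AdminClient({"bootstrap.servers": "broker-1:9093"})

def verify_token(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    """Reject requests whose bearer token is missing, expired, or forged."""
    try:
        return jwt.decode(creds.credentials, JWT_SECRET, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")

@app.get("/topics")
def list_topics(claims: dict = Depends(verify_token)) -> list[str]:
    # AdminClient.list_topics returns cluster metadata; keep only topic names.
    return sorted(admin.list_topics(timeout=10).topics.keys())
```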
- Distributed Kafka Connect cluster setup with Docker Compose.
- Secure integration with the main Kafka cluster.
- HTTP Source Connector fetches data from the REST API and streams it to Kafka topics.
- Full connector lifecycle management via the Kafka Connect REST API, as sketched below.
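The lifecycle calls are plain HTTP against the Connect workers. A minimal sketch follows; the worker address, the connector class (which depends on the HTTP source plugin installed), and its settings are assumptions:

```python
# Minimal sketch: create/update, inspect, and (optionally) remove a connector.
import requests

CONNECT = "http://connect-1:8083"  # hypothetical worker address
NAME = "http-source-demo"

config = {
    # Example HTTP source plugin class; substitute the one actually installed.
    "connector.class": "com.github.castorm.kafka.connect.http.HttpSourceConnector",
    "tasks.max": "1",
    "http.request.url": "http://rest-api:8000/data",  # hypothetical source URL
    "kafka.topic": "http-source-data",
}

# PUT /connectors/{name}/config creates the connector or updates it in place.
requests.put(f"{CONNECT}/connectors/{NAME}/config", json=config, timeout=10).raise_for_status()

# Inspect connector and task state; pause/resume or delete as needed.
print(requests.get(f"{CONNECT}/connectors/{NAME}/status", timeout=10).json())
# requests.put(f"{CONNECT}/connectors/{NAME}/pause", timeout=10)
# requests.delete(f"{CONNECT}/connectors/{NAME}", timeout=10)
```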
- Single Cluster Architecture: All modules operate on the same Kafka cluster.
- Infrastructure as Code: All infrastructure is managed with Terraform.
- Automated Deployment: All setup and configuration steps are automated with Ansible and GitHub Actions.
- Production-Grade Security: End-to-end encryption, strong authentication, and best practices.
- Comprehensive Observability: Metrics, dashboards, and alerts for all critical components.
- Self-Service Platform: REST API and automation scripts for easy management and extensibility.
- terraform/ — Infrastructure code and modules
- ansible_kafka_cluster/ — Ansible playbooks and inventory scripts for Kafka cluster
- kafka_connect/ — Kafka Connect cluster setup and connector management scripts
- kafka_rest_api/ — FastAPI-based REST API for Kafka management
- observability/ — Prometheus, Alertmanager, Grafana, and monitoring automation
- Configure GitHub Secrets:
  - To ensure all automation (CI/CD, Terraform, Ansible, etc.) works seamlessly, add all required secrets and credentials to the GitHub Secrets section of your repository.
  - For example: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, any required SSH private keys, DockerHub tokens, etc.
  - To add secrets, go to your GitHub repo > Settings > Secrets and variables > Actions > New repository secret.
  - Missing or incorrect secrets will cause automated deployment and teardown operations to fail.

- Provision Infrastructure:
  - Set up your AWS credentials and apply the modules in terraform/.
- Deploy Kafka Cluster:
  - Use the playbooks in ansible_kafka_cluster/ to set up brokers and controllers.
- Set Up Observability:
  - Deploy Prometheus, Alertmanager, and Grafana using the scripts in observability/.
- Deploy REST API:
  - Run the FastAPI service in kafka_rest_api/ with Docker.
- Deploy Kafka Connect:
  - Start the Connect cluster and manage connectors with Docker Compose in kafka_connect/.
- To deploy all infrastructure and application components with a single command, use the deploy-all.yaml workflow. It provides fully automated deployment with no manual changes required: all modules are brought up in order, respecting their dependencies.

- To safely tear down the cluster and all resources, use destroy.yaml or the relevant destroy workflow. This safely terminates the Kafka cluster and all related infrastructure.

- Workflow runs can be monitored at: https://github.com/AhmetFurkanDEMIR/Trendyol-Data-Streaming-Case-Study/actions
Note: You do not need to make any manual changes to files or configurations during these operations. The entire process is automatic and idempotent.
Amazon CloudWatch is a centralized observability service used to monitor AWS resources, collect logs, generate alarms, and trigger automated actions.
In this project, Amazon CloudWatch was used to centrally collect system and Kafka logs located under the /var/log directory, store them securely, and make them available for aggregated search and analysis.
This was implemented by installing the CloudWatch Agent automatically during the instance initialization phase of the Terraform step; the agent is configured via a configuration file to monitor logs under the /var/log directory.
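Once the agent ships logs, they can be read back with the standard AWS SDK. Here is a minimal sketch using boto3, where the region and log group name are assumptions that depend on the agent configuration:

```python
# Minimal sketch: pulling error lines back out of CloudWatch Logs.
import boto3

logs = boto3.client("logs", region_name="eu-west-1")  # hypothetical region

resp = logs.filter_log_events(
    logGroupName="/var/log/kafka/server.log",  # hypothetical log group
    filterPattern="ERROR",                     # only error-level lines
    limit=20,
)
for event in resp["events"]:
    print(event["timestamp"], event["message"].rstrip())
```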
Each module has its own README file containing setup, configuration, and troubleshooting instructions:
- Terraform Module README.md
- Ansible Kafka Cluster Module README.md
- Kafka Connect Module README.md
- Kafka REST API Module README.md
- Observability Module README.md
All design decisions, operational procedures, and assumptions are explained in the relevant directories.


