Kafka's Storage‐Compute Separation Architecture: Offloading Storage to Ceph

KamiWan edited this page May 21, 2024 · 6 revisions

Background

Redesigning streaming systems around cloud object storage, such as S3, has become an industry consensus. In recent years, numerous object-storage-based innovations have emerged within the Apache Kafka ecosystem, including AutoMQ's shared storage architecture based on EBS and S3, Confluent's tiered storage, Warpstream's direct-write-to-S3 architecture, and Redpanda's shadow indexing. These storage architectures not only significantly reduce costs by migrating data to distributed object storage like S3, but also simplify the architecture of Kafka streaming systems and enhance their elasticity.

Our project, AutoMQ, uses a shared storage architecture based on S3 and EBS, which has proven to offer exceptional cost-effectiveness and elasticity in cloud environments. In private data centers, this storage architecture can likewise produce Kafka streaming systems with low latency, low cost, high throughput, and strong elasticity. Ceph can serve not only as low-latency block storage but also as a cost-effective object storage service. If you have deployed Ceph in your environment, this tutorial will guide you through building a Kafka streaming system in your private data center that offloads storage to Ceph, achieving an optimal balance of latency, cost, and resilience.

Tip: AutoMQ is a cloud-native fork of Kafka that reinvents Kafka's storage layer with a shared storage architecture. You can therefore regard AutoMQ simply as an enhanced Kafka streaming system.


AutoMQ Runs on the Ceph Storage System

AutoMQ utilizes EBS and S3 for storage, while Ceph supports both POSIX and S3 access protocols, making it an ideal storage backend for AutoMQ. Below is a guide for deploying AutoMQ on Ceph.

Prerequisites

Configuring WAL

  1. For guidance on mounting raw devices on Linux hosts, refer to the Ceph official RBD documentation: https://docs.ceph.com/en/latest/rbd/

  2. In this example, the raw device path is /dev/vdb.

  3. AutoMQ stores WAL data on a raw device at a specified path. Configure it with the startup parameter --override s3.wal.path=/dev/vdb.
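The RBD steps above can be sketched as follows. The pool name, image name, and size (automq-pool, automq-wal, 20G) are illustrative assumptions, and on your host the mapped device may appear as /dev/rbd0 rather than /dev/vdb; substitute the path that `rbd map` actually reports.

```shell
# Create a pool and an RBD image to back the WAL
# (pool/image names and size are assumptions for this sketch)
ceph osd pool create automq-pool
rbd pool init automq-pool
rbd create automq-pool/automq-wal --size 20G

# Map the image on the AutoMQ host; this prints the raw device path
sudo rbd map automq-pool/automq-wal

# Then point AutoMQ's WAL at the mapped raw device at startup, e.g.:
#   --override s3.wal.path=/dev/vdb
```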

Configure the S3 URL

Create a Ceph User

radosgw-admin user create --uid="automq" --display-name="automq"

By default, a user created this way has the full permissions AutoMQ requires. To grant reduced permissions, consult the Ceph official documentation for customized settings. The command above produces output like the following:


{
    "user_id": "automq",
    "display_name": "automq",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "automq",
            "access_key": "X1J0E1EC3KZMQUZCVHED",
            "secret_key": "Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

Create a Ceph Bucket

  1. Set environment variables with the Access Key and Secret Key required by the AWS CLI.

     export AWS_ACCESS_KEY_ID=X1J0E1EC3KZMQUZCVHED
     export AWS_SECRET_ACCESS_KEY=Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD

  2. Use the AWS CLI to create an S3 bucket.

     aws s3api create-bucket --bucket automq-data --endpoint=http://127.0.0.1:80
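The deployment parameters below also reference an automq-ops bucket for operational data, which can be created the same way; `head-bucket` then confirms both buckets exist (it exits non-zero if a bucket is missing). The bucket names follow this example's conventions.

```shell
# Create the ops bucket referenced later by --s3-ops-bucket
aws s3api create-bucket --bucket automq-ops --endpoint=http://127.0.0.1:80

# Verify both buckets are reachable on the RGW endpoint
aws s3api head-bucket --bucket automq-data --endpoint=http://127.0.0.1:80
aws s3api head-bucket --bucket automq-ops --endpoint=http://127.0.0.1:80
```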

Deploy AutoMQ

Below are the essential parameters needed to generate an S3 URL:

| Parameter Name | Default Value in This Example | Description |
| --- | --- | --- |
| --s3-access-key | X1J0E1EC3KZMQUZCVHED | Replace with the access key of the Ceph user you created |
| --s3-secret-key | Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD | Replace with the secret key of the Ceph user you created |
| --s3-region | us-west-2 | This parameter has no effect with Ceph; it can be set to any value, such as us-west-2 |
| --s3-endpoint | http://127.0.0.1:80 | The address served by Ceph's S3-compatible component, RGW. If there are multiple machines, it is recommended to use a load balancer to consolidate them behind one address |
| --s3-data-bucket | automq-data | - |
| --s3-ops-bucket | automq-ops | - |

Having configured the WAL and the S3 URL, you can now proceed to deploy AutoMQ. Please follow the detailed guidelines provided in Cluster Deployment on Linux.