# Analytics Ingest Lambda

This repo contains the code for an AWS Lambda function that receives JSON events via HTTP POST and publishes them to a Kafka topic, to be eventually ingested into a data lake. The function is written in Rust using the AWS Lambda Rust Runtime, and is meant to be deployed on the `provided.al2023` OS-only runtime.
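At a high level, each invocation parses the POST body as JSON and produces it to the configured topic. Below is a minimal sketch of that shape, assuming the `lambda_http` and `rdkafka` crates; the handler structure, status codes, and error handling are illustrative, not the repo's actual code:

```rust
use std::time::Duration;

use lambda_http::{run, service_fn, Body, Error, Request, Response};
use rdkafka::config::ClientConfig;
use rdkafka::producer::{FutureProducer, FutureRecord};

async fn handler(
    producer: &FutureProducer,
    topic: &str,
    event: Request,
) -> Result<Response<Body>, Error> {
    // Treat the POST body as an opaque JSON payload; just check that it parses.
    let body = std::str::from_utf8(event.body().as_ref())?;
    serde_json::from_str::<serde_json::Value>(body)?;

    // Produce to Kafka, allowing up to 5s for the record to be enqueued.
    producer
        .send(
            FutureRecord::<(), _>::to(topic).payload(body),
            Duration::from_secs(5),
        )
        .await
        .map_err(|(err, _msg)| err)?;

    Ok(Response::builder().status(200).body(Body::Empty)?)
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    let brokers = std::env::var("KAFKA_BROKERS")?;
    let topic = std::env::var("KAFKA_TOPIC")?;

    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", &brokers)
        .create()?;

    // The producer is cheap to clone (it is reference-counted internally),
    // so each invocation gets its own handle.
    run(service_fn(move |event| {
        let producer = producer.clone();
        let topic = topic.clone();
        async move { handler(&producer, &topic, event).await }
    }))
    .await
}
```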

This was released as part of the "Real-time data lakes with the LOAD stack" blog post, which shows how to use this as a component, along with Arroyo and DuckDB, to build a simple and cost-effective near-real-time data lake.

## Building

Building this lambda is somewhat complex, as it uses the librdkafka C library under the hood. We've provided a Dockerfile to make the build process easier, particularly on non-Linux systems.

To build the zip file, use the provided build script:

```sh
./build.sh
```

which will produce a `lambda.zip` file in the current directory.

## Configuration

This function relies on several pieces of configuration, read from AWS SSM Parameter Store (for secrets) and environment variables (for everything else); a sketch of how these options might be wired together follows the table:

| Configuration option | Required? | Description |
| --- | --- | --- |
| `KAFKA_BROKERS` | Required | The Kafka brokers to connect to, as a comma-separated list of broker addresses. |
| `KAFKA_TOPIC` | Required | The Kafka topic to which messages will be published. |
| `KAFKA_USERNAME_PARAM` | Optional | The SSM parameter name containing the Kafka username for SASL authentication. |
| `KAFKA_PASSWORD_PARAM` | Optional | The SSM parameter name containing the Kafka password for SASL authentication. |
| `SECURITY_PROTOCOL` | Optional | The `security.protocol` value used to construct the Kafka producer. |
| `SASL_MECHANISMS` | Optional | The `sasl.mechanisms` value used to construct the Kafka producer. |
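
As a rough illustration of how this configuration might be assembled, here is a sketch assuming the `aws-sdk-ssm` and `rdkafka` crates; the `ssm_param` helper and the overall structure are hypothetical, not the repo's actual code:

```rust
use lambda_http::Error;
use rdkafka::config::ClientConfig;
use rdkafka::producer::FutureProducer;

/// Fetch a decrypted SecureString parameter from SSM Parameter Store.
/// (Hypothetical helper for illustration.)
async fn ssm_param(client: &aws_sdk_ssm::Client, name: &str) -> Result<String, Error> {
    let out = client
        .get_parameter()
        .name(name)
        .with_decryption(true)
        .send()
        .await?;
    Ok(out
        .parameter()
        .and_then(|p| p.value())
        .ok_or("SSM parameter has no value")?
        .to_string())
}

async fn build_producer() -> Result<FutureProducer, Error> {
    let mut cfg = ClientConfig::new();
    cfg.set("bootstrap.servers", std::env::var("KAFKA_BROKERS")?);

    // Optional settings, passed straight through to librdkafka.
    if let Ok(proto) = std::env::var("SECURITY_PROTOCOL") {
        cfg.set("security.protocol", proto);
    }
    if let Ok(mechs) = std::env::var("SASL_MECHANISMS") {
        cfg.set("sasl.mechanisms", mechs);
    }

    // Credentials come from SSM, keyed by the *_PARAM environment variables.
    if let (Ok(user_param), Ok(pass_param)) = (
        std::env::var("KAFKA_USERNAME_PARAM"),
        std::env::var("KAFKA_PASSWORD_PARAM"),
    ) {
        let aws_cfg = aws_config::load_from_env().await;
        let ssm = aws_sdk_ssm::Client::new(&aws_cfg);
        cfg.set("sasl.username", ssm_param(&ssm, &user_param).await?);
        cfg.set("sasl.password", ssm_param(&ssm, &pass_param).await?);
    }

    Ok(cfg.create()?)
}
```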

## Deploying

For a full guide to deploying the lambda, see the instructions here.