-
Download the most recent Kafka distribution (0.8.2.0)
-
Extract the tarball somewhere convenient:
- tar zxf kafka_2.10-0.8.2.0.tgz
-
Step into the directory 'kafka_2.10-0.8.2.0'
-
Start a ZooKeeper instance:
- bin/zookeeper-server-start.sh config/zookeeper.properties
-
Open a new terminal and start the first Kafka broker in the cluster:
- bin/kafka-server-start.sh config/server.properties
-
Open a new terminal window and make a copy of the configuration file for the second Kafka broker in the cluster:
- cp config/server.properties config/server-1.properties
-
Adjust the configuration for the second broker to avoid broker id, port, and log directory clashes:
- Change 'broker.id' to 1, 'port' to 9093, and 'log.dir' to '/tmp/kafka-logs-1' in config/server-1.properties
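The three edits can also be scripted. Here is a minimal sketch run against a stand-in config containing only the three affected keys (the real config/server.properties ships with many more settings):

```shell
# Stand-in for config/server.properties with only the three clashing keys
# (placeholder file -- the real config has many more entries).
mkdir -p config
printf 'broker.id=0\nport=9092\nlog.dir=/tmp/kafka-logs\n' > config/server.properties

# Derive the second broker's config by rewriting those three keys.
sed -e 's|^broker.id=0$|broker.id=1|' \
    -e 's|^port=9092$|port=9093|' \
    -e 's|^log.dir=/tmp/kafka-logs$|log.dir=/tmp/kafka-logs-1|' \
    config/server.properties > config/server-1.properties

cat config/server-1.properties
```

The same substitutions with 2, 9094, and kafka-logs-2 produce config/server-2.properties for the third broker.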
-
Start the second Kafka broker in the cluster:
- bin/kafka-server-start.sh config/server-1.properties
-
Open a new terminal window and make a copy of the configuration file for the third Kafka broker in the cluster:
- cp config/server.properties config/server-2.properties
-
Adjust the configuration for the third broker to avoid broker id, port, and log directory clashes:
- Change 'broker.id' to 2, 'port' to 9094, and 'log.dir' to '/tmp/kafka-logs-2' in config/server-2.properties
-
Start the third Kafka broker in the cluster:
- bin/kafka-server-start.sh config/server-2.properties
-
Open a new terminal window and create a new 'github' topic that will be used by producers and consumers of GitHub events:
- bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic github
-
Verify that the topic was created successfully:
-
- bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic github
Topic:github    PartitionCount:1    ReplicationFactor:3    Configs:
    Topic: github    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0
-
-
We should now have a Kafka cluster with three brokers that looks something like this:

            / broker-0
  zookeeper - broker-1
            \ broker-2
-
Check out the kafka-feeder repository from GitHub:
-
We'll be working with githubarchive.org data. Create a directory for the data and step into it:
- mkdir ~/githubdata && cd ~/githubdata
-
Download a dataset from githubarchive.org that represents one week's worth of events on all public repositories:
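One way to sketch this: the hourly archives follow the URL pattern documented on githubarchive.org ('http://data.githubarchive.org/YYYY-MM-DD-H.json.gz'), so the week's URLs can be generated and fed to a downloader. The dates below are placeholders; pick whichever week you want:

```shell
# Print the 168 hourly archive URLs for the first week of January 2015
# (URL pattern per githubarchive.org; the chosen week is arbitrary).
for day in 01 02 03 04 05 06 07; do
  for hour in $(seq 0 23); do
    echo "http://data.githubarchive.org/2015-01-$day-$hour.json.gz"
  done
done > urls.txt

wc -l < urls.txt   # 168 lines, one per hour of the week
```

Feeding urls.txt to 'xargs -n1 wget -q' performs the actual download.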
-
We'll receive one gzipped file per hour of public GitHub activity, i.e. 168 files. Extract them:
- ls -1 | xargs gunzip
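To see what the extraction step does without downloading the full dataset, here is a self-contained sketch that fakes one day of hourly archives (the file names and '{}' payloads are placeholders, not real githubarchive data):

```shell
# Fake one day's worth of hourly archives with placeholder content.
mkdir -p githubdata && cd githubdata
for hour in $(seq 0 23); do
  echo '{}' | gzip > "2015-01-01-$hour.json.gz"
done

# Same extraction as above: decompress every file in the directory.
ls -1 | xargs gunzip

ls -1 *.json | wc -l   # 24 files for one day; a full week yields 168
```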
-
Step into the kafka-feeder project directory and start sbt:
-
Run the KafkaFeeder, which will start publishing (producing) events to the 'github' topic:
- run "localhost:9092,localhost:9093,localhost:9094" "github" "/path/to/githubdata" "10000"
  (arguments: broker list, topic name, data directory, pause between files in milliseconds)
-
You should see output along the lines of:
Processing file '2015-01-01-0.json', pushing events to topic 'github'
Processed file 2015-01-01-0.json, sleeping for 10000 milliseconds ...
- Step into the kafka directory
- Consume the stream of events being pushed to the 'github' topic:
- bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic github --from-beginning
- Make sure that Kafka only binds to localhost
- Uncomment '#host.name=localhost' in config/server-*.properties
- Enable the deletion of topics by adding the following entry to config/server-*.properties
- delete.topic.enable=true
- Delete a topic from Kafka:
- bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic topic-name