Apache Flink Streaming Pipelines
To run this repo, the following components will need to be installed:
- Docker (required)
- Docker compose (required)
- Make (recommended) -- see below
-
On most Linux distributions and macOS,
make
is typically pre-installed by default. To check ifmake
is installed on your system, you can run themake --version
command in your terminal or command prompt. If it's installed, it will display the version information. -
Otherwise, you can try following the instructions below, or you can just copy+paste the commands from the
Makefile
into your terminal or command prompt and run manually.# On Ubuntu or Debian: sudo apt-get update sudo apt-get install build-essential # On CentOS or Fedora: sudo dnf install make # On macOS: xcode-select --install # On windows: choco install make # uses Chocolatey, https://chocolatey.org/install
-
Make sure you're in the pyflick
folder:
cd 06-streaming/pyflink
-
Build the Docker image and deploy the services in the
docker-compose.yml
file, including the PostgreSQL database and Flink cluster. This will (should) also create the sink table,processed_events
, where Flink will write the Kafka messages to.make up #// if you dont have make, you can run: # docker compose up --build --remove-orphans -d
⭐ Wait until the Flink UI is running at http://localhost:8081/ before proceeding to the next step. Note the first time you build the Docker image it can take anywhere from 5 to 30 minutes. Future builds should only take a few second, assuming you haven't deleted the image since.
ℹ️ After the image is built, Docker will automatically start up the job manager and task manager services. This will take a minute or so. Check the container logs in Docker desktop and when you see the line below, you know you're good to move onto the next step.
taskmanager Successful registration at resource manager akka.tcp://flink@jobmanager:6123/user/rpc/resourcemanager_* under registration id <id_number>
-
Now that the Flink cluster is up and running, it's time to finally run the PyFlink job! 😄
make job #// if you dont have make, you can run: # docker-compose exec jobmanager ./bin/flink run -py /opt/job/start_job.py -d
After about a minute, you should see a prompt that the job's been submitted (e.g.,
Job has been submitted with JobID <job_id_number>
). Now go back to the Flink UI to see the job running! 🎉 -
When you're done, you can stop and/or clean up the Docker resources by running the commands below.
make stop # to stop running services in docker compose make down # to stop and remove docker compose services make clean # to remove the docker container and dangling images
❕ Note the
/var/lib/postgresql/data
directory inside the PostgreSQL container is mounted to the./postgres-data
directory on your local machine. This means the data will persist across container restarts or removals, so even if you stop/remove the container, you won't lose any data written within the container.
ℹ️ To see all the make commands that're available and what they do, run:
make help
As of the time of writing this, the available commands are:
Usage:
make <target>
Targets:
help Show help with `make help`
db-init Builds and runs the PostgreSQL database service
build Builds the Flink base image with pyFlink and connectors installed
up Builds the base Docker image and starts Flink cluster
down Shuts down the Flink cluster
job Submit the Flink job
stop Stops all services in Docker compose
start Starts all services in Docker compose
clean Stops and removes the Docker container as well as images with tag `<none>`
psql Runs psql to query containerized postgreSQL database in CLI
postgres-die-mac Removes mounted postgres data dir on local machine (mac users) and in Docker
postgres-die-pc Removes mounted postgres data dir on local machine (PC users) and in Docker