This is the code repository for Apache Airflow Best Practices, published by Packt.
A practical guide to orchestrating data workflows with Apache Airflow
With a practical approach and detailed examples, this book covers the newest features of Apache Airflow 2.x and its potential for workflow orchestration, operational best practices, and data engineering.
This book covers the following exciting features:
- Explore the new features and improvements in Apache Airflow 2.0
- Design and build data pipelines using DAGs (see the short sketch after this list)
- Implement ETL pipelines, ML workflows, and other advanced use cases
- Develop and deploy custom plugins and UI extensions
- Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure
- Chart a path for scaling your environment over time
- Apply best practices for monitoring and maintaining Airflow
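For instance, the DAG-based pipelines mentioned above can be written with the TaskFlow API introduced in Airflow 2.x. Below is a minimal, hypothetical sketch (the DAG and task names are illustrative rather than taken from the book, and the `schedule` argument assumes Airflow 2.4 or later):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    """A hypothetical extract -> transform -> load pipeline."""

    @task
    def extract() -> list[int]:
        # Pull raw records from a source system (stubbed here).
        return [1, 2, 3]

    @task
    def transform(records: list[int]) -> list[int]:
        # Apply a trivial transformation to each record.
        return [r * 10 for r in records]

    @task
    def load(records: list[int]) -> None:
        # Persist the transformed records (stubbed as a log line).
        print(f"Loading {len(records)} records")

    load(transform(extract()))


example_etl()
```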
If you feel this book is for you, get your copy today!
All of the code is organized into folders. For example, chapter-04.
The code will look like the following:
```python
from airflow.plugins_manager import AirflowPlugin


class MetricsPlugin(AirflowPlugin):
    """Defining the plugin class"""

    name = "Metrics Dashboard Plugin"
    flask_blueprints = [metrics_blueprint]
    appbuilder_views = [{
        "name": "Dashboard", "category": "Metrics",
        "view": MetricsDashboardView()
    }]
```
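The snippet above references `metrics_blueprint` and `MetricsDashboardView`, which the chapter's full example defines. As a rough, hypothetical sketch of what such definitions can look like, Airflow plugin views are typically built on Flask and Flask-AppBuilder (the route and template name here are illustrative assumptions, not the book's exact code):

```python
from flask import Blueprint
from flask_appbuilder import BaseView as AppBuilderBaseView, expose

# Blueprint that serves the plugin's templates and static assets.
metrics_blueprint = Blueprint(
    "metrics_plugin",
    __name__,
    template_folder="templates",
    static_folder="static",
)


class MetricsDashboardView(AppBuilderBaseView):
    """Hypothetical Flask-AppBuilder view backing the Metrics > Dashboard menu entry."""

    default_view = "dashboard"

    @expose("/")
    def dashboard(self):
        # Render a template shipped with the plugin (template path is illustrative).
        return self.render_template("metrics_dashboard.html")
```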
Following is what you need for this book: This book is for data engineers, developers, IT professionals, and data scientists who want to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow’s potential and want to avoid common implementation pitfalls. Whether you’re new to data, an experienced professional, or a manager seeking insights, this guide will support you. A functional understanding of Python, some business experience, and basic DevOps skills are helpful. While prior experience with Airflow is not required, it is beneficial.
The code sources and examples in this book were primarily developed with the assumption that you have access to Docker and Docker Compose. We also assume that you have a passing familiarity with Python, Kubernetes, and Docker.
With the following software and hardware list you can run all code files present in the book.
Software required | OS required |
---|---|
Airflow 2.0+ | Windows, macOS, or Linux |
Python 3.9+ | Windows, macOS, or Linux |
Docker | Windows, macOS, or Linux |
Postgres | Windows, macOS, or Linux |
We use angreal to provide development environments and common project interactions. To install it, run `pip install angreal`; the following commands then become available from any folder in the project as subcommands of `angreal` (for example, to bring up the demo environment, run `angreal demo`):
```
demo       commands for controlling the demo environment
dev-setup  setup a development environment
help       Print this message or the help of the given subcommand(s)
test       commands for executing tests
```
Dylan Intorf is a solutions architect and data engineer with a BS in Computer Science from Arizona State University. He has 10+ years of experience in the software and data engineering space, delivering custom-tailored solutions to the tech, financial, and insurance industries.
Dylan Storey has a B.Sc. and an M.Sc. in Biology from California State University, Fresno, and a Ph.D. in Life Sciences from the University of Tennessee, Knoxville, where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience building, growing, and leading teams and solving problems in the development and operation of data products across a variety of scales and industries.
Kendrick van Doorn is an engineering and business leader with a background in software development and over 10 years of experience developing technology and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.