PacktPublishing/Apache-Airflow-Best-Practices

Apache Airflow Best Practices

This is the code repository for Apache Airflow Best Practices, published by Packt.

A practical guide to orchestrating data workflows with Apache Airflow

What is this book about?

With a practical approach and detailed examples, this book covers the newest features of Apache Airflow 2.x and its potential for workflow orchestration, operational best practices, and data engineering.

This book covers the following exciting features:

  • Explore the new features and improvements in Apache Airflow 2.0
  • Design and build data pipelines using DAGs
  • Implement ETL pipelines, ML workflows, and other advanced use cases
  • Develop and deploy custom plugins and UI extensions
  • Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure
  • Plan a path for scaling your environment over time
  • Apply best practices for monitoring and maintaining Airflow

If you feel this book is for you, get your copy today! https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders, for example, chapter-04.

The code will look like the following:

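from airflow.plugins_manager import AirflowPlugin

# metrics_blueprint and MetricsDashboardView are assumed to be defined
# elsewhere in the plugin code.
class MetricsPlugin(AirflowPlugin):
    """Defining the plugin class"""
    name = "Metrics Dashboard Plugin"
    # Flask blueprints registered with the Airflow webserver
    flask_blueprints = [metrics_blueprint]
    # Adds a "Dashboard" entry under the "Metrics" menu category
    appbuilder_views = [{
        "name": "Dashboard", "category": "Metrics",
        "view": MetricsDashboardView()
    }]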

Following is what you need for this book: This book is for data engineers, developers, IT professionals, and data scientists who want to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow’s potential and want to avoid common implementation pitfalls. Whether you’re new to data, an experienced professional, or a manager seeking insights, this guide will support you. A functional understanding of Python, some business experience, and basic DevOps skills are helpful. While prior experience with Airflow is not required, it is beneficial.

To get the most out of this book

The code and examples in this book were developed on the assumption that you have access to Docker and Docker Compose. We also assume a passing familiarity with Python, Kubernetes, and Docker.

With the following software and hardware, you can run all of the code files in the book.

Software and Hardware List

    Software required    OS required
    Airflow 2.0+         Windows, macOS, or Linux
    Python 3.9+          Windows, macOS, or Linux
    Docker               Windows, macOS, or Linux
    Postgres             Windows, macOS, or Linux
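
As a quick sanity check against the list above, the following is a minimal sketch that reports which requirements are visible on the local machine. It is not part of the book's code. The function name check_prerequisites is hypothetical, and the sketch assumes Airflow is installed in the current Python environment and that the Docker CLI is on your PATH. Postgres is not checked, on the assumption that it may run inside a container.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum Python version from the software list


def check_prerequisites():
    """Return a mapping of requirement name -> whether it was found locally."""
    return {
        "python>=3.9": sys.version_info >= MIN_PYTHON,
        # Looks for the Docker CLI on PATH; does not verify the daemon is running.
        "docker": shutil.which("docker") is not None,
        # Assumption: Airflow is installed in this Python environment. If you
        # only run Airflow inside containers, this will report False.
        "airflow": importlib.util.find_spec("airflow") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

A "missing" result is not necessarily a problem if you plan to run everything through Docker Compose.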

We used angreal to provide development environments and common interactions. To install it, run pip install angreal. Then, from any folder, the following commands are available as subcommands of angreal (for example, to run the demo environment, use angreal demo):

    demo         commands for controlling the demo environment
    dev-setup    setup a development environment
    help         Print this message or the help of the given subcommand(s)
    test         commands for executing tests
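
If you want to drive these commands from a Python script, the sketch below shows one way to do it. It is an illustration only: the helper names angreal_argv and run_angreal are hypothetical, the subcommand names are copied from the listing above, and it assumes angreal has already been installed with pip install angreal.

```python
import shutil
import subprocess

# Subcommand names copied from the listing above.
KNOWN_SUBCOMMANDS = ("demo", "dev-setup", "help", "test")


def angreal_argv(subcommand, *extra_args):
    """Build the argument list for an angreal invocation."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown angreal subcommand: {subcommand!r}")
    return ["angreal", subcommand, *extra_args]


def run_angreal(subcommand, *extra_args):
    """Run angreal with the given subcommand and return its exit code."""
    if shutil.which("angreal") is None:
        raise FileNotFoundError(
            "angreal is not on PATH; install it with `pip install angreal`"
        )
    return subprocess.run(angreal_argv(subcommand, *extra_args)).returncode
```

For example, run_angreal("demo") is equivalent to typing angreal demo in a shell.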

Get to Know the Authors

Dylan Intorf is a solutions architect and data engineer with a BS in Computer Science from Arizona State University. He has 10+ years of experience in the software and data engineering space, delivering custom-tailored solutions to the tech, financial, and insurance industries.

Dylan Storey has a B.Sc. and M.Sc. in Biology from California State University, Fresno, and a Ph.D. in Life Sciences from the University of Tennessee, Knoxville, where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience in building, growing, and leading teams, and in solving problems in developing and operating data products at a variety of scales and across industries.

Kendrick van Doorn is an engineering and business leader with a background in software development and over 10 years of experience developing tech and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.
