Skip to content

Latest commit

 

History

History
 
 

04-analytics-engineering

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Week 4: Analytics Engineering

Goal: Transforming the data loaded in DWH into Analytical Views developing a dbt project.

Prerequisites

By this stage of the course you should have already:

  • A running warehouse (BigQuery or postgres)
  • A set of running pipelines ingesting the project dataset (week 3 completed)
  • The following datasets ingested from the course Datasets list:
    • Yellow taxi data - Years 2019 and 2020
    • Green taxi data - Years 2019 and 2020
    • fhv data - Year 2019.

Note:

  • A quick hack has been shared to load that data quicker, check instructions in week3/extras
  • If you recieve an error stating "Permission denied while globbing file pattern." when attemting to run fact_trips.sql this Video may be helpful in resolving the issue

Setting up your environment

Setting up dbt for using BigQuery (Alternative A - preferred)

  1. Open a free developer dbt cloud account followingthis link
  2. Following these instructions to connect to your BigQuery instance. More detailed instructions in dbt_cloud_setup.md

Optional: If you feel more comfortable developing locally you could use a local installation of dbt core. You can follow the official dbt documentation or follow the dbt core with BigQuery on Docker guide to setup dbt locally on docker. You will need to install the latest version with the BigQuery adapter (dbt-bigquery).

Setting up dbt for using Postgres locally (Alternative B)

As an alternative to the cloud, that require to have a cloud database, you will be able to run the project installing dbt locally. You can follow the official dbt documentation or use a docker image from oficial dbt repo. You will need to install the latest version with the postgres adapter (dbt-postgres). After local installation you will have to set up the connection to PG in the profiles.yml, you can find the templates here

Content

Introduction to analytics engineering

  • What is analytics engineering?
  • ETL vs ELT
  • Data modeling concepts (fact and dim tables)

🎥 Video

What is dbt?

  • Intro to dbt

🎥 Video

Starting a dbt project

Alternative A: Using BigQuery + dbt cloud

  • Starting a new project with dbt init (dbt cloud and core)
  • dbt cloud setup
  • project.yml

🎥 Video

Alternative B: Using Postgres + dbt core (locally)

  • Starting a new project with dbt init (dbt cloud and core)
  • dbt core local setup
  • profiles.yml
  • project.yml

🎥 Video

dbt models

  • Anatomy of a dbt model: written code vs compiled Sources
  • Materialisations: table, view, incremental, ephemeral
  • Seeds, sources and ref
  • Jinja and Macros
  • Packages
  • Variables

🎥 Video

Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice

Testing and documenting dbt models

  • Tests
  • Documentation

🎥 Video

Note: This video is shown entirely on dbt cloud IDE but the same steps can be followed locally on the IDE of your choice

Deployment

Alternative A: Using BigQuery + dbt cloud

  • Deployment: development environment vs production
  • dbt cloud: scheduler, sources and hosted documentation

🎥 Video

Alternative B: Using Postgres + dbt core (locally)

  • Deployment: development environment vs production
  • dbt cloud: scheduler, sources and hosted documentation

🎥 Video

Visualising the transformed data

🎥 Google data studio Video 🎥 Metabase Video

Advanced concepts

Community notes

Did you take notes? You can share them here.

Useful links