This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Multi-platform ETL pipeline system that extracts marketing/analytics data from 5 advertising platforms and loads into ClickHouse. Built on Python 3.12 with dlt as the core ETL framework.
Data Sources: Facebook Ads, Google Ads, Google Analytics 4, Google Play Console, App Store Connect, eSIM Analytics API
Data Destinations: ClickHouse travel (primary), ClickHouse esim_db (esim project), PostgreSQL (replication source)
# Pattern: python main.py <platform> <group>
python main.py facebook d1
python main.py google g1
python main.py google_analytics ga1
python main.py google_play gp1
python main.py app_store as1
python main.py pg dashboard
# eSIM pipelines (destination: clickhouse_esim → esim_db)
python main.py esim default
python main.py esim_facebook e1pip install -r requirements.txtdocker-compose up -d # Start ClickHouse + PostgreSQL
docker-compose --profile metabase up -d # Include MetabaseSet environment variables before running:
FB_BACKFILL_DAYS=90 python main.py facebook d1
GA4_BACKFILL_DAYS=30 python main.py google_analytics ga1
ESIM_FB_BACKFILL_DAYS=90 python main.py esim_facebook e1main.py → dispatches to pipelines/<platform>/<platform>_pipeline.py based on CLI args.
Each platform follows the same pattern:
- Pipeline runner (
pipelines/<platform>/<platform>_pipeline.py) — reads group config, creates dlt pipeline, executes sources - Sources (
pipelines/<platform>/sources.py) — dlt resource definitions that transform API data into tables - Group config (
secrets/<platform>.json) — JSON files defining account groups with credentials/tokens
facebook_ads/— Custom Facebook source with insights flattening, async job handling, timeout/retry logic, action metrics decompositiongoogle_analytics/— Custom GA4 source with helper utilitiesgoogle_ads/— Google Ads client integration
utils.py— Shared helpers:get_for_group()(loads group config with account_ids),load_config()(loads group config without account_ids assumption), date normalization, logging setupfacebook_ads/helpers.py— Insights async job polling with exponential backoff, action metrics flatteningfacebook_ads/settings.py— Field definitions, insight breakdowns, action type selections
Accounts are organized into named groups (d1, m4, d2, d1a, d1b, d1c, etc.). Each group:
- Has its own credentials in
secrets/<platform>.json - Maintains isolated dlt pipeline state (pipeline name includes group suffix)
- Runs independently in GitHub Actions
- Merge disposition with primary keys for upsert behavior (most pipelines)
- Replace disposition for full-refresh tables
- dlt handles ClickHouse schema creation and evolution automatically
- Dataset/table naming uses configurable separators
- Sequential batch processing for Facebook (3 workflow batches: daily-facebook-1/2/3)
- Configurable delays between accounts (
FB_ACCOUNT_DELAY_SECONDS, default 600s) - Facebook Insights uses async jobs with configurable timeout (
FB_INSIGHTS_JOB_TIMEOUT_SECONDS) and retry (FB_INSIGHTS_MAX_RETRIES) with exponential backoff - Max parallel: 1 per batch in GitHub Actions
.github/workflows/_reusable-etl.yml— Shared job template all pipelines use- Daily schedules (UTC): Google Ads 23:30 (5:00 IST, runs before Facebook), Facebook batches A/B/C 0:00/1:00/2:00 (5:30/6:30/7:30 IST), Subscription Google 2:50 (8:20 IST), Subscription Facebook 3:00 (8:30 IST), GA4 4:30, Google Play 5:10, App Store 6:10, eSIM Facebook 8:30
- Hourly: eSIM Analytics API every hour at :15
backfill.yml— Manual dispatch workflow for historical data loads- Secrets are base64-encoded in GitHub (DLT_SECRETS, FB_GROUPS, GOOGLE_GROUPS, etc.)
.dlt/secrets.toml— dlt credentials (local dev).dlt/config.toml— dlt runtime settingssecrets/*.json— Per-platform account group definitions.env— Production connection strings (ClickHouse, PostgreSQL)- Environment variables control backfill days, timeouts, delays, and retry counts
The esim project uses a separate ClickHouse database (esim_db) via the clickhouse_esim dlt destination.
pipelines/esim/— Analytics Export API pipeline. Manifest-driven: fetches dataset config (watermark fields, strategies, endpoints) from the backend's/internal/analytics/exports/manifest/at runtime. Column types are auto-detected by dlt from the API response data. Config insecrets/esim.json.pipelines/esim_facebook/— Facebook Ads pipeline for esim. Same pattern aspipelines/facebook/but targetsclickhouse_esim. Config insecrets/esim_facebook.json. UsesESIM_FB_*env vars.
pipelines/pg/ handles logical replication from remote PostgreSQL sources into ClickHouse, used for dashboard and travel datasets. This is distinct from the API-based pipelines.