Build a custom data-catalog in minutes
- CatalogBuilder is a simple tool to generate & deploy a documentation website for your data assets.
- It enables anyone at your company to quickly find the trusted data they are looking for.
There are many open-source projects (amundsen, open-metadata, datahub, metacat, atlas) for building such a catalog in-house. But because they offer a lot of advanced features, they are hard to deploy and manage if you're not a tech expert. They can be even harder to customize.
dbt docs is great for generating a documentation website on top of your dbt assets, but it:
- focuses on dbt only (while you may be interested in other sources and metadata)
- is very hard to customize (unless you're an Angular expert)
- can be slow.
👉 CatalogBuilder aims at offering a lightweight alternative to generate a documentation website on top of your data assets. It focuses on read-only data discovery and:
- ✔️ can be easily customized and deployed by people with little technical background
- ✔️ can therefore be adapted to the very specific needs of your company
- ✔️ is fast and lightweight
- ✔️ is built on top of the popular mkdocs-material Python library, which is used by millions of developers to deploy their documentation (FastAPI, for example).
`catalog` is the CLI (command-line interface) of CatalogBuilder, used to generate, show & deploy the documentation.
pip install catalog-builder
catalog download dbt_gitlab_data_team
To get started, let's download a catalog configuration example from the GitHub repo and play with it. The above command will download the `catalogs/dbt_gitlab_data_team` folder to your machine.
You will find in the folder:
- `assets file`: a file containing the list of the assets you want to put in your documentation. It can be a parquet file named `assets.parquet` or a JSON Lines file named `assets.jsonl`. Each asset in the file must have the following fields:
  - `asset_type`: for example `table`.
  - `documentation_path`: the path of the asset page in the generated documentation. For example `dataset_name/table_name`.
  - `data`: a dict of attributes used to generate the documentation. For example `{"name": "foo"}`.
- `generate_assets_file.py`: the Python script used to (re)generate the `assets file`.
- `requirements.txt`: the Python requirements needed by `generate_assets_file.py`.
- `templates`: a folder which includes a Jinja-template markdown file for each `asset_type`. These templates are used to generate a markdown documentation file for each asset.
- `source_docs`: a folder which includes files to include as-is in the documentation.
- `mkdocs.yml`: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.
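To make the expected shape of the assets file concrete, here is a minimal sketch that serializes a list of assets to JSON Lines text and checks the three required fields. The helper `to_jsonl`, the example assets, and the `my_catalog` path in the comment are hypothetical illustrations, not part of CatalogBuilder.

```python
import json

# The three fields every asset must carry, as listed above.
REQUIRED_FIELDS = ("asset_type", "documentation_path", "data")


def to_jsonl(assets):
    """Serialize assets as JSON Lines: one JSON object per line."""
    lines = []
    for asset in assets:
        missing = [field for field in REQUIRED_FIELDS if field not in asset]
        if missing:
            raise ValueError(f"asset is missing required fields: {missing}")
        lines.append(json.dumps(asset))
    return "\n".join(lines) + "\n"


# Hypothetical example assets.
example_assets = [
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/table_name",
        "data": {"name": "foo"},
    },
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/other_table",
        "data": {"name": "bar"},
    },
]

jsonl_text = to_jsonl(example_assets)
# To save it (hypothetical catalog folder):
# pathlib.Path("catalogs/my_catalog/assets.jsonl").write_text(jsonl_text)
```

Validating the fields at write time keeps malformed assets out of the file, so problems surface before the build step rather than during it.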
catalog build dbt_gitlab_data_team
- For each asset of the `assets file`, the Jinja template of its `asset_type` is rendered using the asset `data` to generate a markdown file, which is written into `catalogs/dbt_gitlab_data_team/docs/` at `documentation_path`.
- All files in `catalogs/dbt_gitlab_data_team/source_docs/` are copied into `catalogs/dbt_gitlab_data_team/docs/`.
- Mkdocs then builds the documentation website from the markdown files into `catalogs/dbt_gitlab_data_team/site` (using the `mkdocs.yml` configuration file).
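The rendering step of the build can be sketched as follows. This is not CatalogBuilder's implementation: to stay dependency-free it uses `string.Template` from the standard library as a stand-in for Jinja, and the template text, the `render_asset` helper, and the `.md` suffix on the output path are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from string import Template

# Hypothetical templates, one per asset_type (stand-in for the Jinja
# templates stored in the `templates` folder).
TEMPLATES = {
    "table": Template("# $name\n\n$description\n"),
}


def render_asset(asset, docs_dir="docs"):
    """Return (output_path, markdown) for one asset."""
    template = TEMPLATES[asset["asset_type"]]
    # Fill the template with the asset's `data` attributes.
    markdown = template.substitute(asset["data"])
    # Assumption: the page is written at documentation_path plus ".md".
    output_path = PurePosixPath(docs_dir) / (asset["documentation_path"] + ".md")
    return str(output_path), markdown


asset = {
    "asset_type": "table",
    "documentation_path": "dataset_name/table_name",
    "data": {"name": "foo", "description": "An example table."},
}
path, markdown = render_asset(asset)
```

Keeping one template per `asset_type` means that adding a new kind of asset only requires a new template, not a change to the rendering logic.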
catalog serve dbt_gitlab_data_team
You can now see the generated documentation website at http://localhost:8000.
A. To deploy on GitHub pages:
catalog deploy github-pages dbt_gitlab_data_team
Mkdocs will deploy the site on GitHub Pages (this only works if you are in a GitHub repository).
B. To deploy on Google Cloud Storage Bucket:
catalog deploy gcs dbt_gitlab_data_team
Mkdocs will copy all the files in `catalogs/dbt_gitlab_data_team/site` to the bucket defined by the `site_url` value of `catalogs/dbt_gitlab_data_team/mkdocs.yml`. For instance, if the site url is `http://catalogs.unytics.io/dbt_gitlab_data_team/`, it will copy all files under `catalogs/dbt_gitlab_data_team/site` to `gs://catalogs.unytics.io/dbt_gitlab_data_team/`.
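The mapping from `site_url` to the bucket destination described above can be sketched in a few lines. The helper name `site_url_to_gcs_uri` is hypothetical, and the actual CatalogBuilder code may derive the destination differently.

```python
from urllib.parse import urlparse


def site_url_to_gcs_uri(site_url):
    """Map an http(s) site URL to a gs:// URI: host becomes the bucket, path is kept."""
    parsed = urlparse(site_url)
    if not parsed.netloc:
        raise ValueError(f"site_url has no host: {site_url!r}")
    return f"gs://{parsed.netloc}{parsed.path}"


destination = site_url_to_gcs_uri("http://catalogs.unytics.io/dbt_gitlab_data_team/")
```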
C. To deploy elsewhere:
You can follow these instructions from mkdocs.
To generate a documentation website for your own dbt project, do the following:
- Change directory to your dbt project directory.
- Download the `catalogs/dbt` documentation example by running `catalog download dbt`.
- Run `dbt docs generate` to compute `target/manifest.json` and `target/catalog.json`.
- Generate the assets file by running `python catalogs/dbt/generate_assets_file.py`. The script will parse `target/manifest.json` and `target/catalog.json` to generate the `assets file` in the expected format.
- Run `catalog serve dbt` to build the website and show it locally.
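To give an idea of what the manifest-parsing step involves, here is a rough sketch that turns dbt manifest nodes into assets in the expected format. It is not the actual `generate_assets_file.py`: the helper `manifest_to_assets`, the choice to map every dbt model to the `table` asset type, and the attributes kept in `data` are assumptions, and the manifest below is a tiny hand-written stand-in for `target/manifest.json`.

```python
def manifest_to_assets(manifest):
    """Build one asset per dbt model found in the manifest's `nodes` mapping."""
    assets = []
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, snapshots, etc. (assumption: only models are documented).
        if node.get("resource_type") != "model":
            continue
        assets.append({
            "asset_type": "table",  # assumption: every model maps to "table"
            "documentation_path": f"{node['schema']}/{node['name']}",
            "data": {
                "name": node["name"],
                "description": node.get("description", ""),
                "columns": sorted(node.get("columns", {})),
            },
        })
    return assets


# Hand-written stand-in for json.load(open("target/manifest.json")).
manifest = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "schema": "analytics",
            "name": "orders",
            "description": "One row per order.",
            "columns": {"order_id": {}, "amount": {}},
        },
        "test.my_project.not_null_orders": {
            "resource_type": "test",
            "schema": "analytics",
            "name": "not_null_orders",
        },
    }
}
assets = manifest_to_assets(manifest)
```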
Join our Slack if you have a question, need help getting started, want to report a bug or suggest improvements, or simply want to have a chat 🙂.
Any contribution is more than welcome 🤗!
- Add a ⭐ on the repo to show your support
- Join our Slack and talk with us
- Open an issue to report a bug or suggest improvements
- Open a PR!