Metadata Guardian is a Python package that provides an easy way to protect your data sources by searching their metadata. By applying data rules, it detects the data you want to protect. Its multi-regex matching is implemented in Rust, which makes it blazing fast.
Read more in this article.
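To illustrate what multi-regex matching over metadata means, here is a naive pure-Python sketch. Each data rule is a named regex, and every rule is tested against every column name or comment. The rule names and patterns are hypothetical, and Python's `re` module stands in for the package's Rust engine, so this shows the concept only, not Metadata Guardian's implementation.

```python
import re
from typing import Dict, List

# Hypothetical data rules: rule name -> regex pattern.
# These are NOT Metadata Guardian's built-in rules; they only illustrate the idea.
RULES: Dict[str, str] = {
    "email": r"\b(e_?mail|email_address)\b",
    "phone": r"\b(phone|mobile)(_number)?\b",
    "birth_date": r"\b(birth_?date|dob)\b",
}

# Compile every pattern once, case-insensitively, so "DOB" matches as well as "dob".
COMPILED = {name: re.compile(pattern, re.IGNORECASE) for name, pattern in RULES.items()}


def scan_metadata(values: List[str]) -> Dict[str, List[str]]:
    """Return, for each metadata string, the names of the rules that match it."""
    findings: Dict[str, List[str]] = {}
    for value in values:
        # Test every rule against the value and keep the names of those that match.
        matched = [name for name, regex in COMPILED.items() if regex.search(value)]
        if matched:
            findings[value] = matched
    return findings


if __name__ == "__main__":
    columns = ["user_id", "email", "phone_number", "DOB", "comment: customer mobile"]
    print(scan_metadata(columns))
```

A compiled regex set (as a Rust engine can provide) checks all patterns in a single pass over each string, whereas this sketch loops over the rules one by one; the result, the set of rules matching each string, is the same.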
```sh
# Install all the data sources
pip install 'metadata_guardian[all]'

# Install one or more data sources from the list
pip install 'metadata_guardian[snowflake,avro,aws,gcp,deltalake,kafka_schema_registry,mysql]'
```
With available Data Rules:
```python
from metadata_guardian import (
    AvailableCategory,
    ColumnScanner,
    DataRules,
)
from metadata_guardian.source import MySQLSource

# Connection settings for the MySQL instance to scan.
source = MySQLSource(
    user="root",
    password="12345678",
    host="localhost",
)

# Load the built-in PII data rules and build a column scanner from them.
data_rules = DataRules.from_available_category(category=AvailableCategory.PII)
column_scanner = ColumnScanner(data_rules=data_rules)

# Scan the metadata of the "sequelmovie" database, including column comments.
with source:
    report = column_scanner.scan_external(
        source,
        database_name="sequelmovie",
        include_comment=True,
    )
    report.to_console()
```
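The example above hardcodes the MySQL password. A hypothetical helper can read the same `user`, `password` and `host` arguments from environment variables instead. The variable names `MYSQL_USER`, `MYSQL_PASSWORD` and `MYSQL_HOST` are assumptions for this sketch; Metadata Guardian does not read them on its own.

```python
import os
from typing import Dict


def mysql_settings_from_env() -> Dict[str, str]:
    """Build MySQLSource keyword arguments from hypothetical environment variables."""
    return {
        # Fall back to the same defaults as the example when unset.
        "user": os.environ.get("MYSQL_USER", "root"),
        # No default for the password: a KeyError signals it is missing.
        "password": os.environ["MYSQL_PASSWORD"],
        "host": os.environ.get("MYSQL_HOST", "localhost"),
    }
```

Usage: `source = MySQLSource(**mysql_settings_from_env())`.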
With custom Data Rules:
```python
from metadata_guardian import (
    ColumnScanner,
    DataRule,
    DataRules,
)
from metadata_guardian.source import MySQLSource

source = MySQLSource(
    user="root",
    password="12345678",
    host="localhost",
)

# Define a custom rule. The raw string keeps "\b" as a regex word boundary
# instead of Python's backspace escape.
category = "example"
data_rule = DataRule(
    rule_name="example_rule_name",
    regex_pattern=r"\b(test|example)\b",
    documentation="example_test",
)

# Group the custom rules under a new category.
data_rules = DataRules.from_new_category(category=category, data_rules=[data_rule])

# progression_bar_disabled=False keeps the progress bar visible during the scan.
column_scanner = ColumnScanner(
    data_rules=data_rules, progression_bar_disabled=False
)

with source:
    report = column_scanner.scan_external(
        source,
        database_name="sequelmovie",
        include_comment=True,
    )
    report.to_console()
```
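The custom rule's pattern, `\b(test|example)\b`, matches `test` or `example` only as whole words. A quick way to check a pattern before scanning is to try it with Python's `re` module; this is an assumption of convenience, since the package's Rust regex engine may differ in edge cases. Note the `r` prefix: in a regular Python string, `\b` is the backspace character, not a word boundary.

```python
import re

# Raw string: the regex engine receives "\b" (word boundary), not "\x08".
EXAMPLE_PATTERN = re.compile(r"\b(test|example)\b")

# Without the r prefix, Python turns "\b" into the backspace character.
assert "\b" == "\x08"


def matches_example_rule(text: str) -> bool:
    """Return True if the text contains "test" or "example" as a whole word."""
    return EXAMPLE_PATTERN.search(text) is not None


for name in ["example", "test column", "example_id", "contest", "my example"]:
    print(name, matches_example_rule(name))
```

Because `_` counts as a word character, `example_id` does not match; likewise `contest` does not match `test`. Widen the pattern if underscore-separated column names should be caught.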
The documentation is hosted here: https://fvaleye.github.io/metadata-guardian/python/