Feature/add dq tools #172

harismanazir · 2025-11-12T08:10:49Z

Summary

Adds data quality (DQ) rule creation, supporting column-level, table-level, and custom SQL rules.

Changes

Added create_dq_rules_tool to create single or bulk data quality rules
Added DQ-related models: DQRuleType, DQAlertPriority, DQThresholdCompareOperator, DQDimension, DQThresholdUnit, DQRuleConditionType, DQRuleCondition, DQRuleSpecification
Implemented DQ rule creation logic with PyAtlan's native DataQualityRule methods
Added validation for rule specifications with fail-fast error handling
Added tool to README.md with restriction examples
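
A minimal sketch of what fail-fast validation of a rule specification could look like. RuleSpec, TABLE_LEVEL, and validate_spec are hypothetical names used only for illustration; the field names follow the keys in the Usage examples, and the actual DQRuleSpecification model in this PR may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the PR's DQRuleSpecification model.
@dataclass
class RuleSpec:
    rule_type: str
    asset_qualified_name: str
    threshold_value: Optional[float] = None
    column_qualified_name: Optional[str] = None
    custom_sql: Optional[str] = None


# Assumed grouping, for illustration only.
TABLE_LEVEL = {"Row Count"}


def validate_spec(spec: RuleSpec) -> None:
    """Raise ValueError on the first problem found (fail fast)."""
    if spec.threshold_value is None:
        raise ValueError("threshold_value is required for all rule types")
    if spec.rule_type == "Custom SQL":
        if not spec.custom_sql:
            raise ValueError("custom_sql is required for Custom SQL rules")
        return
    if spec.rule_type not in TABLE_LEVEL and not spec.column_qualified_name:
        raise ValueError(
            f"column_qualified_name is required for {spec.rule_type!r} rules"
        )
```

Raising on the first failed check keeps an invalid specification from reaching the rule creation call at all.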

Usage

# Row Count rule
create_dq_rules_tool({
    "rule_type": "Row Count",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "threshold_compare_operator": "GREATER_THAN_EQUAL",
    "threshold_value": 100,
    "alert_priority": "URGENT",
    "description": "Ensure table has at least 100 rows"
})

# Null Count rule with threshold
create_dq_rules_tool({
    "rule_type": "Null Count",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/EMAIL",
    "threshold_compare_operator": "LESS_THAN_EQUAL",
    "threshold_value": 5,
    "alert_priority": "URGENT",
    "description": "Email column should have minimal nulls"
})

# Custom SQL rule
create_dq_rules_tool({
    "rule_type": "Custom SQL",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "rule_name": "Revenue Consistency Check",
    "custom_sql": "SELECT COUNT(*) FROM TABLE WHERE revenue < 0",
    "threshold_compare_operator": "EQUAL",
    "threshold_value": 0,
    "alert_priority": "URGENT",
    "dimension": "CONSISTENCY",
    "description": "Ensure revenue values are within expected range"
})

# Bulk creation
create_dq_rules_tool([
    {
        "rule_type": "Null Count",
        "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
        "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/EMAIL",
        "threshold_compare_operator": "LESS_THAN_EQUAL",
        "threshold_value": 5,
        "alert_priority": "URGENT",
        "description": "Email validation"
    },
    {
        "rule_type": "Duplicate Count",
        "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
        "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/USER_ID",
        "threshold_compare_operator": "EQUAL",
        "threshold_value": 0,
        "alert_priority": "URGENT",
        "description": "Ensure unique user IDs"
    }
])

Tested

Creating DQ rule using MCP tool

Hk669 · 2025-11-12T08:17:45Z

modelcontextprotocol/server.py



-mcp = FastMCP("Atlan MCP Server", dependencies=["pyatlan", "fastmcp"])
+mcp = FastMCP("Atlan MCP Server")


why did we remove the dependencies?

Reason:
Removed the dependencies parameter because it is deprecated as of FastMCP 2.11.4 and will be removed in a future version. It now triggers a deprecation warning and is not needed, since dependencies are already managed via pyproject.toml or fastmcp.json. Dropping it keeps the server free of that warning.

DeprecationWarning:
The 'dependencies' parameter is deprecated as of FastMCP 2.11.4 and will be removed in a future version. Please specify dependencies in a fastmcp.json configuration file instead:

{ "entrypoint": "your_server.py", "environment": { "dependencies": ["pyatlan", "fastmcp"] } }
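
For reference, a small sketch that writes the fragment quoted in the warning to a fastmcp.json file using only the standard library. The write_config helper is hypothetical, and the entrypoint value is the placeholder from the warning text, not this repository's actual entrypoint.

```python
import json
from pathlib import Path

# Config fragment copied from the deprecation warning quoted above.
CONFIG = {
    "entrypoint": "your_server.py",  # placeholder taken from the warning text
    "environment": {"dependencies": ["pyatlan", "fastmcp"]},
}


def write_config(path: Path) -> None:
    """Write the fragment as pretty-printed JSON (hypothetical helper)."""
    path.write_text(json.dumps(CONFIG, indent=2) + "\n")
```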

Hk669 · 2025-11-12T08:18:30Z

modelcontextprotocol/server.py

+    Examples:
+        # 1. Column-level: Null Count with threshold
+        rule = create_dq_rules_tool({
+            "rule_type": "Null Count",
+            "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",


Can you reduce the number of examples? You can merge a few. As written, this will consume a lot of tokens every single time.

Key Improvements:

Token Reduction: ~60% (from ~1200 to ~480 tokens)

5 examples instead of 10 - Merged similar patterns

Inline variations - Used comments to show alternative values (e.g., # or "Regex", "Valid Values")

Shortened examples - Used ... for repeated qualified names in bulk example

Compact reference tables - Removed verbose descriptions from enum sections
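
The quoted reduction follows directly from the two approximate token counts above. A short check, using a hypothetical helper name:

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage saved when going from `before` to `after` tokens."""
    return 100 * (before - after) / before


# Approximate counts quoted in the reply: ~1200 tokens before, ~480 after.
print(reduction_pct(1200, 480))  # 60.0
```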

Hk669 · 2025-11-12T08:19:35Z

modelcontextprotocol/server.py

+    Supported Rule Types:
+        Completeness: "Null Count", "Null Percentage", "Blank Count", "Blank Percentage"
+        Statistical: "Min Value", "Max Value", "Average", "Standard Deviation"
+        Uniqueness: "Unique Count", "Duplicate Count"
+        Validity: "Regex", "String Length", "Valid Values"
+        Timeliness: "Freshness"
+        Volume: "Row Count"
+        Custom: "Custom SQL"
+
+    Valid Alert Priority Levels:
+        - "LOW": Low priority alert
+        - "NORMAL": Normal priority alert (default)
+        - "URGENT": Urgent priority alert
+
+    Threshold Operators:
+        "EQUAL", "GREATER_THAN", "GREATER_THAN_EQUAL",
+        "LESS_THAN", "LESS_THAN_EQUAL", "BETWEEN"
+
+    Threshold Units (for Freshness rules):
+        "DAYS", "HOURS", "MINUTES"
+
+    Data Quality Dimensions (for Custom SQL):
+        "COMPLETENESS", "VALIDITY", "UNIQUENESS", "TIMELINESS",
+        "VOLUME", "ACCURACY", "CONSISTENCY"
+
+    Rule Condition Types:
+        String Length: "STRING_LENGTH_EQUALS", "STRING_LENGTH_BETWEEN",
+                      "STRING_LENGTH_GREATER_THAN", "STRING_LENGTH_GREATER_THAN_EQUALS",
+                      "STRING_LENGTH_LESS_THAN", "STRING_LENGTH_LESS_THAN_EQUALS"
+        Regex: "REGEX_MATCH", "REGEX_NOT_MATCH"
+        Valid Values: "IN_LIST", "NOT_IN_LIST"


These can go at the start, and you can break this into multiple params rather than a single rules param, which can cause a lot of issues with multiple MCP clients.

Tested rule creation with current nested JSON structure (rules: Union[Dict, List[Dict]]) across three non-reasoning models:

GPT-4.1: ✅ Pass

Claude Sonnet 4 (non-reasoning): ✅ Pass

Composer 1: ❌ Fail (unable to generate valid JSON even with provided template)

Conclusion: Current structure works for most non-reasoning models (2/3). Maintaining bulk creation capability outweighs optimizing for Composer 1's specific limitation.
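
A minimal sketch of how a rules: Union[Dict, List[Dict]] parameter could be normalized so that single and bulk creation share one code path. normalize_rules is a hypothetical helper, not necessarily how the tool handles its input.

```python
from typing import Any, Dict, List, Union

RulesInput = Union[Dict[str, Any], List[Dict[str, Any]]]


def normalize_rules(rules: RulesInput) -> List[Dict[str, Any]]:
    """Return a list of rule dicts for either a single dict or a list of dicts."""
    if isinstance(rules, dict):
        return [rules]
    if isinstance(rules, list) and all(isinstance(r, dict) for r in rules):
        return list(rules)
    raise TypeError("rules must be a dict or a list of dicts")
```

With this shape, a client that can only produce a single flat object still works, and bulk callers pass a list.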

Hk669 · 2025-11-12T08:20:15Z

modelcontextprotocol/server.py

+        return {
+            "created_count": 0,
+            "created_rules": [],
+            "errors": [f"Parameter parsing error: {str(e)}"],


error should be a string, and remove the plural part.

Keeping the current implementation with "errors": [...] (plural, as a list) because it supports bulk operations, where multiple errors can occur.
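
A sketch of why a list fits the bulk case, assuming each rule is attempted independently and failures are collected per rule. create_rules_bulk and create_one are hypothetical; the result keys mirror the return value shown in the diff above.

```python
from typing import Any, Callable, Dict, List


def create_rules_bulk(
    specs: List[Dict[str, Any]],
    create_one: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Try every spec and collect one error message per failed rule."""
    created: List[Any] = []
    errors: List[str] = []
    for index, spec in enumerate(specs):
        try:
            created.append(create_one(spec))
        except Exception as exc:  # keep going so later rules are still attempted
            errors.append(f"Rule {index}: {exc}")
    return {
        "created_count": len(created),
        "created_rules": created,
        "errors": errors,
    }
```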

Hk669 · 2025-11-12T08:21:49Z

modelcontextprotocol/tools/models.py

+
+
+class DQRuleType(str, Enum):
+    """Enum for supported data quality rule types."""
+
+    # Completeness checks
+    NULL_COUNT = "Null Count"
+    NULL_PERCENTAGE = "Null Percentage"
+    BLANK_COUNT = "Blank Count"


Can we reduce the number of specifications, if any of them are not required to create/set a rule for assets?

The model is already well-structured, with most fields marked as optional to provide flexibility without adding unnecessary complexity. If there's anything I may have overlooked, please feel free to point it out.

Hk669 · 2025-11-12T08:22:23Z

modelcontextprotocol/uv.lock

@@ -1,5 +1,5 @@
 version = 1
-revision = 2
+revision = 3


We don't need this change, do we? We have not bumped any dependency, right?

Context:
The current uv.lock file in main uses revision 2 format. When I run uv sync with uv 0.9.7, it automatically upgrades to revision 3, which also updates some dependencies.

Impact:

Lock file shows as modified after any uv sync

Dependency versions differ slightly (e.g., anyio 4.9.0 → 4.11.0)


Hk669 · 2025-11-21T09:10:40Z

modelcontextprotocol/tools/dq_rules.py

+COLUMN_LEVEL_RULES = {
+    DQRuleType.NULL_COUNT,
+    DQRuleType.NULL_PERCENTAGE,
+    DQRuleType.BLANK_COUNT,
+    DQRuleType.BLANK_PERCENTAGE,
+    DQRuleType.MIN_VALUE,
+    DQRuleType.MAX_VALUE,
+    DQRuleType.AVERAGE,
+    DQRuleType.STANDARD_DEVIATION,
+    DQRuleType.UNIQUE_COUNT,
+    DQRuleType.DUPLICATE_COUNT,
+    DQRuleType.REGEX,
+    DQRuleType.STRING_LENGTH,
+    DQRuleType.VALID_VALUES,
+    DQRuleType.FRESHNESS,
+}
+
+# Rule types that work at table level
+TABLE_LEVEL_RULES = {
+    DQRuleType.ROW_COUNT,
+}
+
+# Rule types that support conditions
+CONDITIONAL_RULES = {
+    DQRuleType.STRING_LENGTH,
+    DQRuleType.REGEX,
+    DQRuleType.VALID_VALUES,
+}


Can we make a single method to validate these? We don't need these dicts.

Add a method to DQRuleType to pick up the right rule.
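
One way the suggestion could look, as a sketch only. It uses a local enum with a subset of the rule types from the docstring; the method names level and supports_conditions, and the string labels they return, are hypothetical.

```python
from enum import Enum


class DQRuleType(str, Enum):
    """Illustrative subset of the rule types listed in the docstring."""

    NULL_COUNT = "Null Count"
    REGEX = "Regex"
    STRING_LENGTH = "String Length"
    VALID_VALUES = "Valid Values"
    ROW_COUNT = "Row Count"
    CUSTOM_SQL = "Custom SQL"

    def level(self) -> str:
        """Return "custom", "table", or "column" (hypothetical labels)."""
        if self is DQRuleType.CUSTOM_SQL:
            return "custom"
        if self is DQRuleType.ROW_COUNT:
            return "table"
        return "column"

    def supports_conditions(self) -> bool:
        """True for the rule types the diff lists under CONDITIONAL_RULES."""
        return self in (
            DQRuleType.STRING_LENGTH,
            DQRuleType.REGEX,
            DQRuleType.VALID_VALUES,
        )
```

Keeping the classification on the enum means a new rule type is categorized in one place instead of in several module-level collections.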

Hk669 · 2025-11-21T09:11:56Z

modelcontextprotocol/tools/dq_rules.py

+                if spec.rule_type == DQRuleType.CUSTOM_SQL:
+                    rule = _create_custom_sql_rule(spec, client)
+                elif spec.rule_type in TABLE_LEVEL_RULES:
+                    rule = _create_table_level_rule(spec, client)
+                elif spec.rule_type in COLUMN_LEVEL_RULES:
+                    rule = _create_column_level_rule(spec, client)


With the method suggested above, it will be easy to validate directly here.

Hk669 · 2025-11-21T09:14:00Z

modelcontextprotocol/tools/dq_rules.py

+def _create_table_level_rule(spec: DQRuleSpecification, client) -> DataQualityRule:
+    """
+    Create a table-level data quality rule.
+


Can all these helper functions be parameter-driven logic (an abstraction) in a single method? There seems to be a lot of duplicate code across them.
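
A rough sketch of the parameter-driven shape being asked for, assuming the three helpers differ mainly in which extra fields they require. All names here (SHARED_KEYS, EXTRA_KEYS, rule_level, build_rule_payload) are hypothetical, and the PyAtlan call that would consume the payload is deliberately left out.

```python
from typing import Any, Dict, Tuple

# Keys every rule needs, following the Usage examples in the description.
SHARED_KEYS: Tuple[str, ...] = (
    "rule_type",
    "asset_qualified_name",
    "threshold_compare_operator",
    "threshold_value",
)
# Extra keys per level (assumed for illustration).
EXTRA_KEYS: Dict[str, Tuple[str, ...]] = {
    "table": (),
    "column": ("column_qualified_name",),
    "custom": ("rule_name", "custom_sql", "dimension"),
}
OPTIONAL_KEYS: Tuple[str, ...] = ("alert_priority", "description")


def rule_level(rule_type: str) -> str:
    """Classify a rule type string as custom, table, or column level."""
    if rule_type == "Custom SQL":
        return "custom"
    if rule_type == "Row Count":
        return "table"
    return "column"


def build_rule_payload(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Single code path: the level only changes which keys are required."""
    level = rule_level(spec.get("rule_type", ""))
    required = SHARED_KEYS + EXTRA_KEYS[level]
    missing = [key for key in required if key not in spec]
    if missing:
        raise ValueError(f"missing fields for {level}-level rule: {missing}")
    payload = {key: spec[key] for key in required}
    payload.update({key: spec[key] for key in OPTIONAL_KEYS if key in spec})
    payload["level"] = level
    return payload
```

The per-level differences live in the EXTRA_KEYS table, so the three near-identical helpers collapse into one function plus data.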

harismanazir added 6 commits November 10, 2025 00:29

version incompatibility issue

133e44a

add create DQ rule tool

26730fe

fix(dq-rules): enforce threshold_value for all rule types

655d868

apply auto-fixes from pre-commit hooks

c4e80e7

docs(readme): add data quality rules tool documentation

25bbd16

chore(models): remove comments

0a4b237

harismanazir requested review from Hk669 and firecast as code owners November 12, 2025 08:10

harismanazir assigned harismanazir and unassigned harismanazir Nov 12, 2025

Hk669 requested changes Nov 12, 2025


harismanazir added 7 commits November 13, 2025 13:24

Revert deprecation warning changes

543f87c

Merge remote-tracking branch 'upstream' into feature/add-dq-tools

8c60e51

pre-commit hook changes

0113ab0

refactor(dq): remove 6 duplicate enums, use PyAtlan enums directly

e169055

[server.py] : optimize create_dq_rules_tool docstring to reduce token…

cf60f80

… usage by 60%

Merge remote-tracking branch 'upstream' into feature/add-dq-tools

41250d9

revert uv.lock file changes

69ad196

harismanazir requested a review from Hk669 November 20, 2025 16:38

Hk669 requested changes Nov 21, 2025




3 participants
