Skip to content

Conversation

@harismanazir
Copy link

Summary

Adds comprehensive data quality rule creation functionality supporting rule types including column-level, table-level, and custom SQL rules.

Changes

  • Added create_dq_rules_tool to create single or bulk data quality rules
  • Added DQ-related models: DQRuleType, DQAlertPriority, DQThresholdCompareOperator, DQDimension, DQThresholdUnit, DQRuleConditionType, DQRuleCondition, DQRuleSpecification
  • Implemented DQ rule creation logic with PyAtlan's native DataQualityRule methods
  • Added validation for rule specifications with fail-fast error handling
  • Added tool to README.md with restriction examples

Usage

# Row Count rule
create_dq_rules_tool({
    "rule_type": "Row Count",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "threshold_compare_operator": "GREATER_THAN_EQUAL",
    "threshold_value": 100,
    "alert_priority": "URGENT",
    "description": "Ensure table has at least 100 rows"
})

# Null Count rule with threshold
create_dq_rules_tool({
    "rule_type": "Null Count",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/EMAIL",
    "threshold_compare_operator": "LESS_THAN_EQUAL",
    "threshold_value": 5,
    "alert_priority": "URGENT",
    "description": "Email column should have minimal nulls"
})

# Custom SQL rule
create_dq_rules_tool({
    "rule_type": "Custom SQL",
    "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
    "rule_name": "Revenue Consistency Check",
    "custom_sql": "SELECT COUNT(*) FROM TABLE WHERE revenue < 0",
    "threshold_compare_operator": "EQUAL",
    "threshold_value": 0,
    "alert_priority": "URGENT",
    "dimension": "CONSISTENCY",
    "description": "Ensure revenue values are within expected range"
})

# Bulk creation
create_dq_rules_tool([
    {
        "rule_type": "Null Count",
        "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
        "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/EMAIL",
        "threshold_compare_operator": "LESS_THAN_EQUAL",
        "threshold_value": 5,
        "alert_priority": "URGENT",
        "description": "Email validation"
    },
    {
        "rule_type": "Duplicate Count",
        "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
        "column_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE/USER_ID",
        "threshold_compare_operator": "EQUAL",
        "threshold_value": 0,
        "alert_priority": "URGENT",
        "description": "Ensure unique user IDs"
    }
])

Tested

Creating DQ rule using MCP tool
image
image



mcp = FastMCP("Atlan MCP Server", dependencies=["pyatlan", "fastmcp"])
mcp = FastMCP("Atlan MCP Server")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we remove the dependencies?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Reason :
Removed the dependencies param since it’s deprecated in FastMCP 2.11.4+ and will be removed soon. It now triggers a deprecation warning and isn’t needed — dependencies are already managed via pyproject.toml or fastmcp.json. This keeps the code clean, future-proof, and warning-free.

- DeprecationWarning:
The 'dependencies' parameter is deprecated as of FastMCP 2.11.4 and will be removed in a future version. Please specify dependencies in a fastmcp.json configuration file instead:

{
  "entrypoint": "your_server.py",
  "environment": {
    "dependencies": ["pyatlan", "fastmcp"]
  }
}

Comment on lines 931 to 935
Examples:
# 1. Column-level: Null Count with threshold
rule = create_dq_rules_tool({
"rule_type": "Null Count",
"asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reduce the number of examples? you can merge a few, but this will consume a lot of tokens every single time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key Improvements:

Token Reduction: ~60% (from ~1200 to ~480 tokens)

  • 5 examples instead of 10 - Merged similar patterns
  • Inline variations - Used comments to show alternative values (e.g., # or "Regex", "Valid Values")
  • Shortened examples - Used ... for repeated qualified names in bulk example
  • Compact reference tables - Removed verbose descriptions from enum sections

Comment on lines 1066 to 1096
Supported Rule Types:
Completeness: "Null Count", "Null Percentage", "Blank Count", "Blank Percentage"
Statistical: "Min Value", "Max Value", "Average", "Standard Deviation"
Uniqueness: "Unique Count", "Duplicate Count"
Validity: "Regex", "String Length", "Valid Values"
Timeliness: "Freshness"
Volume: "Row Count"
Custom: "Custom SQL"
Valid Alert Priority Levels:
- "LOW": Low priority alert
- "NORMAL": Normal priority alert (default)
- "URGENT": Urgent priority alert
Threshold Operators:
"EQUAL", "GREATER_THAN", "GREATER_THAN_EQUAL",
"LESS_THAN", "LESS_THAN_EQUAL", "BETWEEN"
Threshold Units (for Freshness rules):
"DAYS", "HOURS", "MINUTES"
Data Quality Dimensions (for Custom SQL):
"COMPLETENESS", "VALIDITY", "UNIQUENESS", "TIMELINESS",
"VOLUME", "ACCURACY", "CONSISTENCY"
Rule Condition Types:
String Length: "STRING_LENGTH_EQUALS", "STRING_LENGTH_BETWEEN",
"STRING_LENGTH_GREATER_THAN", "STRING_LENGTH_GREATER_THAN_EQUALS",
"STRING_LENGTH_LESS_THAN", "STRING_LENGTH_LESS_THAN_EQUALS"
Regex: "REGEX_MATCH", "REGEX_NOT_MATCH"
Valid Values: "IN_LIST", "NOT_IN_LIST"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can go in the start, and you can break this into multiple params not just rules as a single param which an cause a lot of issues with multiple MCP clients.

Copy link
Author

@harismanazir harismanazir Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested rule creation with current nested JSON structure (rules: Union[Dict, List[Dict]]) across three non-reasoning models:

  • GPT-4.1: ✅ Pass
  • Claude Sonnet 4 (non-reasoning): ✅ Pass
  • Composer 1: ❌ Fail (unable to generate valid JSON even with provided template)

Conclusion: Current structure works for most non-reasoning models (2/3). Maintaining bulk creation capability outweighs optimizing for Composer 1's specific limitation.

return {
"created_count": 0,
"created_rules": [],
"errors": [f"Parameter parsing error: {str(e)}"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error should be a string, and remove the plural part.

Copy link
Author

@harismanazir harismanazir Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the current implementation with "errors": [...] (plural, as a list) because:

  • It supports bulk operations where multiple errors can occur.

Comment on lines +77 to +85


class DQRuleType(str, Enum):
"""Enum for supported data quality rule types."""

# Completeness checks
NULL_COUNT = "Null Count"
NULL_PERCENTAGE = "Null Percentage"
BLANK_COUNT = "Blank Count"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reduce the number specifications? if anything is not required to create/set a rule for assets.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model is already well-structured, with most fields marked as optional to provide flexibility without adding unnecessary complexity. If there's anything I may have overlooked, please feel free to point it out.

@@ -1,5 +1,5 @@
version = 1
revision = 2
revision = 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont need this change? we have not bumped any dependency right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context:
The current uv.lock file in main uses revision 2 format. When I run uv sync with uv 0.9.7, it automatically upgrades to revision 3, which also updates some dependencies.

Impact:

  • Lock file shows as modified after any uv sync
  • Dependency versions differ slightly (e.g., anyio 4.9.0 → 4.11.0)

@harismanazir harismanazir requested a review from Hk669 November 20, 2025 16:38
Comment on lines +31 to +58
COLUMN_LEVEL_RULES = {
DQRuleType.NULL_COUNT,
DQRuleType.NULL_PERCENTAGE,
DQRuleType.BLANK_COUNT,
DQRuleType.BLANK_PERCENTAGE,
DQRuleType.MIN_VALUE,
DQRuleType.MAX_VALUE,
DQRuleType.AVERAGE,
DQRuleType.STANDARD_DEVIATION,
DQRuleType.UNIQUE_COUNT,
DQRuleType.DUPLICATE_COUNT,
DQRuleType.REGEX,
DQRuleType.STRING_LENGTH,
DQRuleType.VALID_VALUES,
DQRuleType.FRESHNESS,
}

# Rule types that work at table level
TABLE_LEVEL_RULES = {
DQRuleType.ROW_COUNT,
}

# Rule types that support conditions
CONDITIONAL_RULES = {
DQRuleType.STRING_LENGTH,
DQRuleType.REGEX,
DQRuleType.VALID_VALUES,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make a single method to validate these, we dont need these dicts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a method in the DQRuleType, to pickup the right rule

Comment on lines +119 to +124
if spec.rule_type == DQRuleType.CUSTOM_SQL:
rule = _create_custom_sql_rule(spec, client)
elif spec.rule_type in TABLE_LEVEL_RULES:
rule = _create_table_level_rule(spec, client)
elif spec.rule_type in COLUMN_LEVEL_RULES:
rule = _create_column_level_rule(spec, client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the above changes to create a method will be easy here to validate directly

Comment on lines +263 to +266
def _create_table_level_rule(spec: DQRuleSpecification, client) -> DataQualityRule:
"""
Create a table-level data quality rule.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can all these helper functions be just parameter driven logic (abstraction) in a single method? seems like a lot of duplicate code across

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants