-
Notifications
You must be signed in to change notification settings - Fork 20
Feature/add dq tools #172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/add dq tools #172
Conversation
modelcontextprotocol/server.py
Outdated
|
|
||
|
|
||
| mcp = FastMCP("Atlan MCP Server", dependencies=["pyatlan", "fastmcp"]) | ||
| mcp = FastMCP("Atlan MCP Server") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we remove the dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-Reason :
Removed the dependencies param since it’s deprecated in FastMCP 2.11.4+ and will be removed soon. It now triggers a deprecation warning and isn’t needed — dependencies are already managed via pyproject.toml or fastmcp.json. This keeps the code clean, future-proof, and warning-free.
- DeprecationWarning:
The 'dependencies' parameter is deprecated as of FastMCP 2.11.4 and will be removed in a future version. Please specify dependencies in a fastmcp.json configuration file instead:
{
"entrypoint": "your_server.py",
"environment": {
"dependencies": ["pyatlan", "fastmcp"]
}
}
| Examples: | ||
| # 1. Column-level: Null Count with threshold | ||
| rule = create_dq_rules_tool({ | ||
| "rule_type": "Null Count", | ||
| "asset_qualified_name": "default/snowflake/123/DB/SCHEMA/TABLE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you reduce the number of examples? you can merge a few, but this will consume a lot of tokens every single time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key Improvements:
Token Reduction: ~60% (from ~1200 to ~480 tokens)
- 5 examples instead of 10 - Merged similar patterns
- Inline variations - Used comments to show alternative values (e.g., # or "Regex", "Valid Values")
- Shortened examples - Used ... for repeated qualified names in bulk example
- Compact reference tables - Removed verbose descriptions from enum sections
| Supported Rule Types: | ||
| Completeness: "Null Count", "Null Percentage", "Blank Count", "Blank Percentage" | ||
| Statistical: "Min Value", "Max Value", "Average", "Standard Deviation" | ||
| Uniqueness: "Unique Count", "Duplicate Count" | ||
| Validity: "Regex", "String Length", "Valid Values" | ||
| Timeliness: "Freshness" | ||
| Volume: "Row Count" | ||
| Custom: "Custom SQL" | ||
| Valid Alert Priority Levels: | ||
| - "LOW": Low priority alert | ||
| - "NORMAL": Normal priority alert (default) | ||
| - "URGENT": Urgent priority alert | ||
| Threshold Operators: | ||
| "EQUAL", "GREATER_THAN", "GREATER_THAN_EQUAL", | ||
| "LESS_THAN", "LESS_THAN_EQUAL", "BETWEEN" | ||
| Threshold Units (for Freshness rules): | ||
| "DAYS", "HOURS", "MINUTES" | ||
| Data Quality Dimensions (for Custom SQL): | ||
| "COMPLETENESS", "VALIDITY", "UNIQUENESS", "TIMELINESS", | ||
| "VOLUME", "ACCURACY", "CONSISTENCY" | ||
| Rule Condition Types: | ||
| String Length: "STRING_LENGTH_EQUALS", "STRING_LENGTH_BETWEEN", | ||
| "STRING_LENGTH_GREATER_THAN", "STRING_LENGTH_GREATER_THAN_EQUALS", | ||
| "STRING_LENGTH_LESS_THAN", "STRING_LENGTH_LESS_THAN_EQUALS" | ||
| Regex: "REGEX_MATCH", "REGEX_NOT_MATCH" | ||
| Valid Values: "IN_LIST", "NOT_IN_LIST" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can go in the start, and you can break this into multiple params not just rules as a single param which an cause a lot of issues with multiple MCP clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested rule creation with current nested JSON structure (rules: Union[Dict, List[Dict]]) across three non-reasoning models:
- GPT-4.1: ✅ Pass
- Claude Sonnet 4 (non-reasoning): ✅ Pass
- Composer 1: ❌ Fail (unable to generate valid JSON even with provided template)
Conclusion: Current structure works for most non-reasoning models (2/3). Maintaining bulk creation capability outweighs optimizing for Composer 1's specific limitation.
| return { | ||
| "created_count": 0, | ||
| "created_rules": [], | ||
| "errors": [f"Parameter parsing error: {str(e)}"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error should be a string, and remove the plural part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping the current implementation with "errors": [...] (plural, as a list) because:
- It supports bulk operations where multiple errors can occur.
|
|
||
|
|
||
| class DQRuleType(str, Enum): | ||
| """Enum for supported data quality rule types.""" | ||
|
|
||
| # Completeness checks | ||
| NULL_COUNT = "Null Count" | ||
| NULL_PERCENTAGE = "Null Percentage" | ||
| BLANK_COUNT = "Blank Count" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reduce the number specifications? if anything is not required to create/set a rule for assets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model is already well-structured, with most fields marked as optional to provide flexibility without adding unnecessary complexity. If there's anything I may have overlooked, please feel free to point it out.
modelcontextprotocol/uv.lock
Outdated
| @@ -1,5 +1,5 @@ | |||
| version = 1 | |||
| revision = 2 | |||
| revision = 3 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont need this change? we have not bumped any dependency right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context:
The current uv.lock file in main uses revision 2 format. When I run uv sync with uv 0.9.7, it automatically upgrades to revision 3, which also updates some dependencies.
Impact:
- Lock file shows as modified after any
uv sync - Dependency versions differ slightly (e.g., anyio 4.9.0 → 4.11.0)
| COLUMN_LEVEL_RULES = { | ||
| DQRuleType.NULL_COUNT, | ||
| DQRuleType.NULL_PERCENTAGE, | ||
| DQRuleType.BLANK_COUNT, | ||
| DQRuleType.BLANK_PERCENTAGE, | ||
| DQRuleType.MIN_VALUE, | ||
| DQRuleType.MAX_VALUE, | ||
| DQRuleType.AVERAGE, | ||
| DQRuleType.STANDARD_DEVIATION, | ||
| DQRuleType.UNIQUE_COUNT, | ||
| DQRuleType.DUPLICATE_COUNT, | ||
| DQRuleType.REGEX, | ||
| DQRuleType.STRING_LENGTH, | ||
| DQRuleType.VALID_VALUES, | ||
| DQRuleType.FRESHNESS, | ||
| } | ||
|
|
||
| # Rule types that work at table level | ||
| TABLE_LEVEL_RULES = { | ||
| DQRuleType.ROW_COUNT, | ||
| } | ||
|
|
||
| # Rule types that support conditions | ||
| CONDITIONAL_RULES = { | ||
| DQRuleType.STRING_LENGTH, | ||
| DQRuleType.REGEX, | ||
| DQRuleType.VALID_VALUES, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make a single method to validate these, we dont need these dicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a method in the DQRuleType, to pickup the right rule
| if spec.rule_type == DQRuleType.CUSTOM_SQL: | ||
| rule = _create_custom_sql_rule(spec, client) | ||
| elif spec.rule_type in TABLE_LEVEL_RULES: | ||
| rule = _create_table_level_rule(spec, client) | ||
| elif spec.rule_type in COLUMN_LEVEL_RULES: | ||
| rule = _create_column_level_rule(spec, client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above changes to create a method will be easy here to validate directly
| def _create_table_level_rule(spec: DQRuleSpecification, client) -> DataQualityRule: | ||
| """ | ||
| Create a table-level data quality rule. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can all these helper functions be just parameter driven logic (abstraction) in a single method? seems like a lot of duplicate code across
Summary
Adds comprehensive data quality rule creation functionality supporting rule types including column-level, table-level, and custom SQL rules.
Changes
create_dq_rules_toolto create single or bulk data quality rulesDQRuleType,DQAlertPriority,DQThresholdCompareOperator,DQDimension,DQThresholdUnit,DQRuleConditionType,DQRuleCondition,DQRuleSpecificationDataQualityRulemethodsUsage
Tested
Creating DQ rule using MCP tool

