docs: Added Database Engine Specifications to the Official Documentation Website #36248
base: master
Conversation
Commits:
- apache#36126) Co-authored-by: Claude <[email protected]>
- …porting to mdx format
- docs: Added Database Engine Specifications to the Official Documentation Website
Code Review Agent Run #7f070d
Actionable Suggestions - 5

superset/db_engine_specs/docs_lib.py - 1
- Incorrect feature reporting logic · Line 127-131

docs/docs/configuration/db_engine_specs.mdx - 3
- Wrong BigQuery credentials JSON example · Line 910
- Invalid dict syntax in SQLite example · Line 1120-1128
- Correct get_metrics documentation and grammar · Line 1201

docs/src/components/DatabaseTable.jsx - 1
- Unsafe type assumption in filter logic · Line 50-57
Additional Suggestions - 20
superset/db_engine_specs/docs_lib.py - 20
Boolean positional argument in function · Line 231
Consider using a keyword-only argument or dataclass for the `preserve_order` boolean parameter.

Code suggestion
```diff
@@ -225,7 +225,7 @@ def generate_focused_table(
     info: dict[str, dict[str, Any]],
     feature_keys: list[str],
     column_labels: list[str],
     filter_fn: Any = None,
     value_extractor: Any = None,
-    preserve_order: bool = False,
+    *,
+    preserve_order: bool = False,
```
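To see why the keyword-only form helps, here is a minimal, runnable sketch of the pattern; the simplified body and toy data are assumptions, not the real `generate_focused_table`:

```python
def generate_focused_table(info, feature_keys, column_labels, *, preserve_order=False):
    # Toy stand-in: only the flag handling matters for this illustration.
    names = list(info) if preserve_order else sorted(info)
    return [[name] for name in names]

dbs = {"sqlite": {}, "bigquery": {}}
print(generate_focused_table(dbs, [], []))                       # [['bigquery'], ['sqlite']]
print(generate_focused_table(dbs, [], [], preserve_order=True))  # [['sqlite'], ['bigquery']]
# generate_focused_table(dbs, [], [], True) would now raise TypeError,
# so call sites can no longer pass an opaque positional boolean.
```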
zip() without explicit strict parameter · Line 679
Add `strict=True` to the `zip()` call on line 679 to ensure both iterables have the same length.

Code suggestion
```diff
@@ -676,7 +676,7 @@
         "Operational & Advanced Features",
     ]
-    for title, table in zip(titles, tables):
+    for title, table in zip(titles, tables, strict=True):
```
Missing parameter documentation · Line 225
Add documentation for the `preserve_order` parameter in the docstring of `generate_focused_table`.

Code suggestion
```diff
@@ -233,6 +233,7 @@
         feature_keys: List of feature keys to extract from db_info
         column_labels: List of column header labels
         filter_fn: Optional function to filter databases (receives db_info dict)
         value_extractor: Optional function to extract value (receives db_info, key)
+        preserve_order: If True, preserve original order; otherwise sort by database name

     Returns:
```
Missing module-level docstring · Line 1
Add a module-level docstring at the top of the file to document the purpose and functionality of this module.

Code suggestion
```diff
@@ -17,6 +17,10 @@
 # under the License.
 from __future__ import annotations
+
+"""Documentation generation utilities for database engine specifications.
+
+This module provides functions to diagnose and document features of database engine specs.
+"""
 from typing import Any
```
Docstring summary should start first line · Line 85
Move the docstring summary to the first line of the docstring for the `has_custom_method` function.

Code suggestion
```diff
@@ -84,9 +84,8 @@
 def has_custom_method(spec: type[BaseEngineSpec], method: str) -> bool:
-    """
-    Check if a class has a custom implementation of a method.
+    """Check if a class has a custom implementation of a method.

     Since some classes don't inherit directly from ``BaseEngineSpec`` we need
     to check the attributes of the spec and the base class.
     """
```
Missing trailing comma in function · Line 95
Add a trailing comma after the last argument in the `return` statement on line 95.

Code suggestion
```diff
@@ -91,7 +91,7 @@
     return bool(
         getattr(spec, method, False)
         and getattr(BaseEngineSpec, method, False)
         and getattr(spec, method).__qualname__
-        != getattr(BaseEngineSpec, method).__qualname__
+        != getattr(BaseEngineSpec, method).__qualname__,
     )
```
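The `__qualname__` comparison that this return statement relies on is worth seeing in isolation. A minimal sketch of the override-detection idea, with toy classes standing in for the real engine specs:

```python
class Base:
    def epoch_to_dttm(self) -> str:
        return "from_unixtime({col})"

class WithOverride(Base):
    def epoch_to_dttm(self) -> str:  # custom implementation
        return "datetime({col}, 'unixepoch')"

class WithoutOverride(Base):
    pass

def has_custom_method(spec: type, method: str) -> bool:
    # An override's __qualname__ names the subclass ("WithOverride.epoch_to_dttm"),
    # while an inherited method keeps the base's ("Base.epoch_to_dttm").
    return bool(
        getattr(spec, method, False)
        and getattr(Base, method, False)
        and getattr(spec, method).__qualname__
        != getattr(Base, method).__qualname__,
    )

assert has_custom_method(WithOverride, "epoch_to_dttm") is True
assert has_custom_method(WithoutOverride, "epoch_to_dttm") is False
```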
Docstring summary should start first line · Line 100
Move the docstring summary to the first line for the `diagnose` function.

Code suggestion
```diff
@@ -99,8 +99,7 @@
 def diagnose(spec: type[BaseEngineSpec]) -> dict[str, Any]:
-    """
-    Run basic diagnostics on a given DB engine spec.
-    """
+    """Run basic diagnostics on a given DB engine spec."""
```
Missing trailing comma in function · Line 147
Add a trailing comma after `get_extra_table_metadata` on line 147.

Code suggestion
```diff
@@ -145,7 +145,7 @@
         "file_upload": spec.supports_file_upload,
         "get_extra_table_metadata": has_custom_method(
-            spec, "get_extra_table_metadata"
+            spec, "get_extra_table_metadata",
         ),
```
Missing trailing comma in function · Line 150
Add a trailing comma after `get_dbapi_exception_mapping` on line 150.

Code suggestion
```diff
@@ -148,7 +148,7 @@
         "dbapi_exception_mapping": has_custom_method(
-            spec, "get_dbapi_exception_mapping"
+            spec, "get_dbapi_exception_mapping",
         ),
```
Docstring summary should start first line · Line 207
Move the docstring summary to the first line for the `get_name` function.

Code suggestion
```diff
@@ -206,8 +206,7 @@
 def get_name(spec: type[BaseEngineSpec]) -> str:
-    """
-    Return a name for a given DB engine spec.
-    """
+    """Return a name for a given DB engine spec."""
```
Docstring summary should start first line · Line 214
Move the docstring summary to the first line for the `format_markdown_table` function.

Code suggestion
```diff
@@ -213,8 +213,7 @@
 def format_markdown_table(headers: list[str], rows: list[list[Any]]) -> str:
-    """
-    Format headers and rows into a markdown table.
-    """
+    """Format headers and rows into a markdown table."""
```
Use list comprehension instead of generator · Line 221
Pass a list comprehension instead of a generator expression to the `join()` call on line 221 for better readability and performance.

Code suggestion
```diff
@@ -219,7 +219,7 @@
     lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
     for row in rows:
-        lines.append("| " + " | ".join(str(col) for col in row) + " |")
+        lines.append("| " + " | ".join([str(col) for col in row]) + " |")
```
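For context, here is a runnable toy reconstruction of what `format_markdown_table` appears to do, pieced together from the fragments quoted in these diffs; treat it as a sketch, not the actual Superset code:

```python
def format_markdown_table(headers: list[str], rows: list[list[object]]) -> str:
    # Header row, separator row, then one pipe-delimited line per data row.
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
    for row in rows:
        lines.append("| " + " | ".join([str(col) for col in row]) + " |")
    return "\n".join(lines)

print(format_markdown_table(["Database", "Joins"], [["SQLite", True]]))
# | Database | Joins |
# | --- | --- |
# | SQLite | True |
```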
Docstring summary should start first line · Line 233
Move the docstring summary to the first line for the `generate_focused_table` function.

Code suggestion
```diff
@@ -232,8 +232,7 @@
 ) -> tuple[list[list[Any]], list[str]]:
-    """
-    Generate a focused table as a 2D list with databases as rows.
+    """Generate a focused table as a 2D list with databases as rows.
```
Missing blank line after docstring section · Line 243
Add a blank line after the "Returns" section in the docstring.

Code suggestion
```diff
@@ -242,6 +242,7 @@
     Returns:
         Tuple of (2D list representing the table, list of excluded database names)
+
     """
```
Use unpacking instead of concatenation · Line 260
Use `["Database", *column_labels]` instead of list concatenation on line 260.

Code suggestion
```diff
@@ -258,7 +258,7 @@
         return [], excluded_dbs

     # Build headers: Database + feature columns
-    headers = ["Database"] + column_labels
+    headers = ["Database", *column_labels]
```
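The two forms build the same list; unpacking simply avoids the throwaway `["Database"]` temporary that `+` creates. A one-line check, with assumed sample labels:

```python
column_labels = ["Joins", "Subqueries"]  # assumed sample labels
assert ["Database"] + column_labels == ["Database", *column_labels]
```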
Use unpacking instead of concatenation · Line 282
Use `[headers, *rows]` instead of list concatenation on line 282.

Code suggestion
```diff
@@ -279,7 +279,7 @@
         rows.append(row)

-    return [headers] + rows, excluded_dbs
+    return [headers, *rows], excluded_dbs
```
Docstring summary should start first line · Line 286
Move the docstring summary to the first line for the `calculate_support_level` function.

Code suggestion
```diff
@@ -285,8 +285,7 @@
 def calculate_support_level(db_info: dict[str, Any], feature_keys: list[str]) -> str:
-    """
-    Calculate support level for a group of features.
+    """Calculate support level for a group of features.
```
Unnecessary elif after return · Line 306
Replace the `elif` on line 306 with a plain `if`, since it follows a `return` statement.

Code suggestion
```diff
@@ -303,9 +303,9 @@
     total = len(feature_keys)
     if supported == 0:
         return "Not supported"
-    elif supported == total:
+    if supported == total:
         return "Supported"
-    else:
-        return "Partial"
+    return "Partial"
```
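Because every branch returns, the flattened version reads identically. A self-contained sketch with a deliberately simplified signature (counts instead of the real `db_info` dict):

```python
def calculate_support_level(supported: int, total: int) -> str:
    # After an unconditional return, `elif`/`else` add no information.
    if supported == 0:
        return "Not supported"
    if supported == total:
        return "Supported"
    return "Partial"

assert calculate_support_level(0, 3) == "Not supported"
assert calculate_support_level(3, 3) == "Supported"
assert calculate_support_level(1, 3) == "Partial"
```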
Docstring summary should start first line · Line 313
Move the docstring summary to the first line for the `generate_feature_tables` function.

Code suggestion
```diff
@@ -312,8 +312,7 @@
 def generate_feature_tables() -> list[list[list[Any]]]:
-    """
-    Generate multiple focused tables organized by feature categories.
+    """Generate multiple focused tables organized by feature categories.
```
Use unpacking instead of concatenation · Line 406
Use `[headers, *rows]` instead of list concatenation on line 406.

Code suggestion
```diff
@@ -403,7 +403,7 @@
         rows.append(row)
-    output_tables.append([headers] + rows)
+    output_tables.append([headers, *rows])
```
Review Details

Files reviewed - 3 · Commit Range: 8d73838..c03cae4
- docs/docs/configuration/db_engine_specs.mdx
- docs/src/components/DatabaseTable.jsx
- superset/db_engine_specs/docs_lib.py

Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
- /review - Manually triggers a full AI review.
- /pause - Pauses automatic reviews on this pull request.
- /resume - Resumes automatic reviews.
- /resolve - Marks all Bito-posted review comments as resolved.
- /abort - Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses the Default Agent. You can customize the agent settings here or contact your Bito workspace admin at [email protected].
| "alias_in_orderby": spec.allows_alias_in_orderby, | ||
| "time_groupby_inline": spec.time_groupby_inline, | ||
| "alias_to_source_column": not spec.allows_alias_to_source_column, | ||
| "order_by_not_in_select": spec.allows_hidden_orderby_agg, | ||
| "expressions_in_orderby": spec.allows_hidden_cc_in_orderby, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'alias_to_source_column' feature is incorrectly inverted with a 'not' operator, causing it to report as unsupported when the database spec actually allows using source columns when aliases overshadow them. This breaks accurate database capability diagnosis, which could mislead users about supported SQL features. Remove the 'not' to correctly reflect the spec's allows_alias_to_source_column attribute.
Code suggestion
Check the AI-generated fix before applying
| "alias_in_orderby": spec.allows_alias_in_orderby, | |
| "time_groupby_inline": spec.time_groupby_inline, | |
| "alias_to_source_column": not spec.allows_alias_to_source_column, | |
| "order_by_not_in_select": spec.allows_hidden_orderby_agg, | |
| "expressions_in_orderby": spec.allows_hidden_cc_in_orderby, | |
| "alias_in_orderby": spec.allows_alias_in_orderby, | |
| "time_groupby_inline": spec.time_groupby_inline, | |
| "alias_to_source_column": spec.allows_alias_to_source_column, | |
| "order_by_not_in_select": spec.allows_hidden_orderby_agg, | |
| "expressions_in_orderby": spec.allows_hidden_cc_in_orderby, |
docs/docs/configuration/db_engine_specs.mdx · Line 910

> The password is not the only piece of information where security is critical. For many databases (like BigQuery), sensitive information is stored in the credentials JSON payload. For example:
The JSON example for BigQuery credentials is incorrect. The encrypted_extra field stores a dictionary with a credentials_info key containing the service account JSON, but the example shows the service account JSON directly. This mismatch could confuse implementers and lead to incorrect masking logic, potentially exposing sensitive private keys in API responses. The example should wrap the service account details under credentials_info to match the actual data structure used by BigQueryEngineSpec.
Code suggestion
Check the AI-generated fix before applying
```diff
-{
-  "type": "service_account",
-  "project_id": "dbt-tutorial-347100",
-  "private_key_id": "4bc71f06990c864a590fad8b94be6a5904fc171f",
-  "private_key": "<SENSITIVE INFORMATION>",
-  "client_email": "[email protected]",
-  "client_id": "115666988796889519425",
-  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
-  "token_uri": "https://oauth2.googleapis.com/token",
-  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
-  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dbt-user-278%40dbt-tutorial-347100.iam.gserviceaccount.com"
-}
+{
+  "credentials_info": {
+    "type": "service_account",
+    "project_id": "dbt-tutorial-347100",
+    "private_key_id": "4bc71f06990c864a590fad8b94be6a5904fc171f",
+    "private_key": "<SENSITIVE INFORMATION>",
+    "client_email": "[email protected]",
+    "client_id": "115666988796889519425",
+    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
+    "token_uri": "https://oauth2.googleapis.com/token",
+    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
+    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dbt-user-278%40dbt-tutorial-347100.iam.gserviceaccount.com"
+  }
+}
```
Citations
- Rule Violated: dev-standard.mdc:67
- Rule Violated: AGENTS.md:75
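Whichever JSON shape the docs settle on, the masking logic has to mirror it. Below is a minimal sketch of masking the nested key, loosely modeled on what `BigQueryEngineSpec.mask_encrypted_extra` does; the mask constant and error handling here are assumptions:

```python
import json

PASSWORD_MASK = "X" * 10  # assumed stand-in for Superset's mask constant

def mask_encrypted_extra(encrypted_extra: str | None) -> str | None:
    # Hide the private key nested under credentials_info before the
    # payload is returned in API responses.
    if encrypted_extra is None:
        return encrypted_extra
    try:
        config = json.loads(encrypted_extra)
    except (TypeError, json.JSONDecodeError):
        return encrypted_extra
    try:
        config["credentials_info"]["private_key"] = PASSWORD_MASK
    except (KeyError, TypeError):
        pass  # payloads without the nested key pass through re-serialized
    return json.dumps(config)

payload = json.dumps({"credentials_info": {"type": "service_account", "private_key": "secret"}})
print(mask_encrypted_extra(payload))
# {"credentials_info": {"type": "service_account", "private_key": "XXXXXXXXXX"}}
```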
docs/docs/configuration/db_engine_specs.mdx · Line 1120-1128

```python
class SqliteEngineSpec(BaseEngineSpec):
    custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] =
        COLUMN_DOES_NOT_EXIST_REGEX: (
            __('We can\'t seem to resolve the column "%(column_name)s"'),
            SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
            {},
        ),
    }
```
The Python code example for SqliteEngineSpec custom_errors has invalid syntax. It attempts to define a dictionary with type annotation and inline assignment, which is not valid Python. This should be a proper dictionary assignment. Incorrect code examples in documentation can mislead developers and cause implementation errors.
Code suggestion
Check the AI-generated fix before applying
```diff
 class SqliteEngineSpec(BaseEngineSpec):
-    custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] =
+    custom_errors = {
         COLUMN_DOES_NOT_EXIST_REGEX: (
             __('We can\'t seem to resolve the column "%(column_name)s"'),
             SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
             {},
         ),
     }
```
Citations
- Rule Violated: dev-standard.mdc:67
- Rule Violated: AGENTS.md:75
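To show how the corrected mapping would be consumed, here is a runnable toy; the regex and the plain-string stand-ins for `SupersetErrorType` and the `__()` translation helper are assumptions, but the match-and-interpolate flow is the point:

```python
import re

# SQLite reports missing columns as "no such column: <name>" (assumed regex).
COLUMN_DOES_NOT_EXIST_REGEX = re.compile("no such column: (?P<column_name>.+)")

custom_errors = {
    COLUMN_DOES_NOT_EXIST_REGEX: (
        'We can\'t seem to resolve the column "%(column_name)s"',
        "COLUMN_DOES_NOT_EXIST_ERROR",  # stand-in for a SupersetErrorType member
        {},
    ),
}

def translate_error(driver_message: str):
    # Try each regex; on a hit, interpolate the named groups into the template.
    for regex, (template, _error_type, _extra) in custom_errors.items():
        if match := regex.search(driver_message):
            return template % match.groupdict()
    return None

print(translate_error("no such column: revenue"))
# We can't seem to resolve the column "revenue"
```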
docs/docs/configuration/db_engine_specs.mdx · Line 1201

> ### Get metrics on dataset creation
>
> When a physical dataset is first created, the `get_metrics` class method is called on the table. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connect to them it can automatically create those metrics when a dataset is added.
The documentation inaccurately states that the get_metrics class method is called on the table, but based on the codebase, it's actually invoked on the database engine spec via self.db_engine_spec.get_metrics(self, inspector, table) in superset/models/core.py. This misleading description could confuse developers implementing or extending DB engine specs. Additionally, there's a grammar error where 'connect' should be 'connects'. The fix corrects both issues to ensure accurate developer guidance.
Code suggestion
Check the AI-generated fix before applying
```diff
-When a physical dataset is first created, the `get_metrics` class method is called on the table. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connect to them it can automatically create those metrics when a dataset is added.
+When a physical dataset is first created, the `get_metrics` class method is called on the database engine spec. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connects to them it can automatically create those metrics when a dataset is added.
```
Citations
- Rule Violated: AGENTS.md:67
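For readers extending a spec, a hedged sketch of such an override follows; the stub base class and the hard-coded metric are assumptions for illustration, not Superset's actual classes:

```python
from typing import Any

class BaseEngineSpec:  # minimal stub standing in for Superset's real base class
    @classmethod
    def get_metrics(cls, database: Any, inspector: Any, table: Any) -> list[dict[str, Any]]:
        # The real base implementation returns the COUNT(*) metric.
        return [{"metric_name": "count", "expression": "COUNT(*)"}]

class SemanticLayerEngineSpec(BaseEngineSpec):
    @classmethod
    def get_metrics(cls, database: Any, inspector: Any, table: Any) -> list[dict[str, Any]]:
        # Extend the base metrics with ones defined by the semantic layer
        # (hard-coded here purely for illustration).
        metrics = super().get_metrics(database, inspector, table)
        metrics.append({"metric_name": "total_revenue", "expression": "SUM(revenue)"})
        return metrics

print([m["metric_name"] for m in SemanticLayerEngineSpec.get_metrics(None, None, None)])
# ['count', 'total_revenue']
```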
docs/src/components/DatabaseTable.jsx · Line 50-57

```jsx
const filteredItems = useMemo(
  () =>
    rows.filter(
      row =>
        row[0] && row[0].toLowerCase().includes(filter.toLowerCase()),
    ),
  [rows, filter],
);
```
The filtering logic calls toLowerCase() on row[0] without checking if it's a string, which will throw a TypeError if row[0] is not a string (e.g., number or null). This can cause runtime crashes when filtering database tables with non-string first columns. Add a type check to ensure row[0] is a string before calling string methods.
Code suggestion
Check the AI-generated fix before applying
```diff
 const filteredItems = useMemo(
   () =>
     rows.filter(
       row =>
-        row[0] && row[0].toLowerCase().includes(filter.toLowerCase()),
+        row[0] && typeof row[0] === 'string' && row[0].toLowerCase().includes(filter.toLowerCase()),
     ),
   [rows, filter],
 );
```
superset/db_engine_specs/docs_lib.py (new file)

```diff
@@ -0,0 +1,689 @@
+# Licensed to the Apache Software Foundation (ASF) under one
```
Is this pretty much a duplicate of the script that's building today's markdown file? I'm not sure if we should modify that one, or just add this new one, and remove the old one after this PR merges.
Also, should this perhaps live in the lower-level /scripts folder, or somewhere in the /docs directory?
Hi, yeah, this is pretty much a copy of it, but it generates the table in a slightly different format. I added this just to keep the script backward compatible and to keep /superset/db_engine_specs/README.md maintained as well. I don't think we can move it into the /docs directory, as it is bundled with the superset directory.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #36248      +/-   ##
===========================================
+ Coverage        0   67.96%   +67.96%
===========================================
  Files           0      633      +633
  Lines           0    46597    +46597
  Branches        0     5057     +5057
===========================================
+ Hits            0    31671    +31671
- Misses          0    13664    +13664
- Partials        0     1262     +1262
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Superset uses Git pre-commit hooks courtesy of pre-commit. To install, run the following:

A series of checks will now run when you make a git commit.

Alternatively, it is possible to run the checks by invoking pre-commit manually:
SUMMARY
This PR was made as part of a CSCD01 course project. We made this change specifically to add the Database Engine Specifications to the main documentation website for easier access by developers. The files added are as follows:

- `docs/docs/configuration/db_engine_specs.mdx`: the main MDX file added to display the engine specifications on the documentation website.
- `docs/src/components/DatabaseTable.jsx`: a component used in `db_engine_specs.mdx` to add filters to the tables, allowing easier lookup since the tables are very large.
- `superset/db_engine_specs/docs_lib.py`: generates the new table format compatible with the MDX file.

We can extend the `docs_lib.py` file by making slight modifications to the MDX file. That would allow us to automate generating these tables instantly, if that is acceptable.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Go to `/docs/configuration/db_engine_specs` on the documentation website.

ADDITIONAL INFORMATION