
Conversation

@shubhmgrg

SUMMARY

This PR was made as part of the CSCD01 course project. It adds the Database Engine Specifications to the main documentation website so that developers can access them more easily. The files added are as follows:

  • docs/docs/configuration/db_engine_specs.mdx: The main MDX file that displays the engine specifications on the documentation website.
  • docs/src/components/DatabaseTable.jsx: A component used in db_engine_specs.mdx that adds filters to the tables; because the tables are very large, filtering makes lookup much easier.
  • superset/db_engine_specs/docs_lib.py: Generates the new table format compatible with the MDX file.

With slight modifications to the MDX file, we could extend docs_lib.py to regenerate these tables automatically, if that is acceptable.
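As a rough sketch of what that automation could produce (the `format_markdown_table` helper below mirrors the one described later in the review; the sample feature data is made up purely for illustration):

```python
from typing import Any


def format_markdown_table(headers: list[str], rows: list[list[Any]]) -> str:
    """Render headers and rows as a GitHub-flavored markdown table."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
    for row in rows:
        lines.append("| " + " | ".join(str(col) for col in row) + " |")
    return "\n".join(lines)


# Hypothetical per-database feature info, keyed by display name.
info = {
    "SQLite": {"time_grains": True, "file_upload": True},
    "Presto": {"time_grains": True, "file_upload": False},
}
headers = ["Database", "Time grains", "File upload"]
rows = [
    [name, feats["time_grains"], feats["file_upload"]]
    for name, feats in sorted(info.items())
]
print(format_markdown_table(headers, rows))
```

The real docs_lib.py would derive `info` from the engine specs themselves; this only shows the table-rendering step.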

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Go to /docs/configuration/db_engine_specs in the documentation website.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the doc Namespace | Anything related to documentation label Nov 25, 2025
Contributor

@bito-code-review bito-code-review bot left a comment


Code Review Agent Run #7f070d

Actionable Suggestions - 5
  • superset/db_engine_specs/docs_lib.py - 1
  • docs/docs/configuration/db_engine_specs.mdx - 3
  • docs/src/components/DatabaseTable.jsx - 1
    • Unsafe type assumption in filter logic · Line 50-57
Additional Suggestions - 20
  • superset/db_engine_specs/docs_lib.py - 20
    • Boolean positional argument in function · Line 231-231
      Consider using a keyword-only argument or dataclass for the `preserve_order` boolean parameter.
      Code suggestion
       @@ -225,7 +225,7 @@
        def generate_focused_table(
            info: dict[str, dict[str, Any]],
            feature_keys: list[str],
            column_labels: list[str],
            filter_fn: Any = None,
            value_extractor: Any = None,
      -    preserve_order: bool = False,
      +    *,
      +    preserve_order: bool = False,
    • zip() without explicit strict parameter · Line 679-679
      Add `strict=True` parameter to `zip()` on line 679 to ensure both iterables have the same length.
      Code suggestion
       @@ -676,7 +676,7 @@
                "Operational & Advanced Features",
            ]
       
      -    for title, table in zip(titles, tables):
      +    for title, table in zip(titles, tables, strict=True):
    • Missing parameter documentation · Line 225-225
      Add documentation for the `preserve_order` parameter in the docstring of `generate_focused_table`.
      Code suggestion
       @@ -233,6 +233,7 @@
                feature_keys: List of feature keys to extract from db_info
                column_labels: List of column header labels
                filter_fn: Optional function to filter databases (receives db_info dict)
                value_extractor: Optional function to extract value (receives db_info, key)
      +        preserve_order: If True, preserve original order; otherwise sort by database name
       
            Returns:
    • Missing module-level docstring · Line 1-1
      Add a module-level docstring at the top of the file to document the purpose and functionality of this module.
      Code suggestion
       @@ -17,6 +17,10 @@
        # under the License.
       
        from __future__ import annotations
      +
      +"""Documentation generation utilities for database engine specifications.
      +
      +This module provides functions to diagnose and document features of database engine specs.
      +"""
        from typing import Any
    • Docstring summary should start first line · Line 85-85
      Move the docstring summary to the first line of the docstring for `has_custom_method` function.
      Code suggestion
       @@ -84,9 +84,8 @@
        def has_custom_method(spec: type[BaseEngineSpec], method: str) -> bool:
      -    """
      -    Check if a class has a custom implementation of a method.
      +    """Check if a class has a custom implementation of a method.
       
            Since some classes don't inherit directly from ``BaseEngineSpec`` we need
            to check the attributes of the spec and the base class.
            """
    • Missing trailing comma in function · Line 95-95
      Add a trailing comma after the last argument in the `return` statement on line 95.
      Code suggestion
       @@ -91,7 +91,7 @@
            return bool(
                getattr(spec, method, False)
                and getattr(BaseEngineSpec, method, False)
                and getattr(spec, method).__qualname__
      -        != getattr(BaseEngineSpec, method).__qualname__
      +        != getattr(BaseEngineSpec, method).__qualname__,
            )
    • Docstring summary should start first line · Line 100-100
      Move the docstring summary to the first line for `diagnose` function.
      Code suggestion
       @@ -99,8 +99,7 @@
        def diagnose(spec: type[BaseEngineSpec]) -> dict[str, Any]:
      -    """
      -    Run basic diagnostics on a given DB engine spec.
      -    """
      +    """Run basic diagnostics on a given DB engine spec."""
    • Missing trailing comma in function · Line 147-147
      Add a trailing comma after `get_extra_table_metadata` on line 147.
      Code suggestion
       @@ -145,7 +145,7 @@
                    "file_upload": spec.supports_file_upload,
                    "get_extra_table_metadata": has_custom_method(
      -                spec, "get_extra_table_metadata"
      +                spec, "get_extra_table_metadata",
                    ),
    • Missing trailing comma in function · Line 150-150
      Add a trailing comma after `get_dbapi_exception_mapping` on line 150.
      Code suggestion
       @@ -148,7 +148,7 @@
                    "dbapi_exception_mapping": has_custom_method(
      -                spec, "get_dbapi_exception_mapping"
      +                spec, "get_dbapi_exception_mapping",
                    ),
    • Docstring summary should start first line · Line 207-207
      Move the docstring summary to the first line for `get_name` function.
      Code suggestion
       @@ -206,8 +206,7 @@
        def get_name(spec: type[BaseEngineSpec]) -> str:
      -    """
      -    Return a name for a given DB engine spec.
      -    """
      +    """Return a name for a given DB engine spec."""
    • Docstring summary should start first line · Line 214-214
      Move the docstring summary to the first line for `format_markdown_table` function.
      Code suggestion
       @@ -213,8 +213,7 @@
        def format_markdown_table(headers: list[str], rows: list[list[Any]]) -> str:
      -    """
      -    Format headers and rows into a markdown table.
      -    """
      +    """Format headers and rows into a markdown table."""
    • Use list comprehension instead of loop · Line 221-221
      Replace the manual loop on line 221 with a list comprehension for better readability and performance.
      Code suggestion
       @@ -219,7 +219,7 @@
            lines.append("| " + " | ".join(["---"] * len(headers)) + " |")
            for row in rows:
      -        lines.append("| " + " | ".join(str(col) for col in row) + " |")
      +        lines.append("| " + " | ".join([str(col) for col in row]) + " |")
    • Docstring summary should start first line · Line 233-233
      Move the docstring summary to the first line for `generate_focused_table` function.
      Code suggestion
       @@ -232,8 +232,7 @@
        ) -> tuple[list[list[Any]], list[str]]:
      -    """
      -    Generate a focused table as a 2D list with databases as rows.
      +    """Generate a focused table as a 2D list with databases as rows.
    • Missing blank line after docstring section · Line 243-243
      Add a blank line after the "Returns" section in the docstring.
      Code suggestion
       @@ -242,6 +242,7 @@
            Returns:
                Tuple of (2D list representing the table, list of excluded database names)
      +
            """
    • Use unpacking instead of concatenation · Line 260-260
      Use `["Database", *column_labels]` instead of list concatenation on line 260.
      Code suggestion
       @@ -258,7 +258,7 @@
                return [], excluded_dbs
       
            # Build headers: Database + feature columns
      -    headers = ["Database"] + column_labels
      +    headers = ["Database", *column_labels]
    • Use unpacking instead of concatenation · Line 282-282
      Use `[headers, *rows]` instead of list concatenation on line 282.
      Code suggestion
       @@ -279,7 +279,7 @@
       
                rows.append(row)
       
      -    return [headers] + rows, excluded_dbs
      +    return [headers, *rows], excluded_dbs
    • Docstring summary should start first line · Line 286-286
      Move the docstring summary to the first line for `calculate_support_level` function.
      Code suggestion
       @@ -285,8 +285,7 @@
        def calculate_support_level(db_info: dict[str, Any], feature_keys: list[str]) -> str:
      -    """
      -    Calculate support level for a group of features.
      +    """Calculate support level for a group of features.
    • Unnecessary elif after return · Line 306-306
      Remove the `elif` on line 306 and use `if` instead, as it follows a `return` statement.
      Code suggestion
       @@ -303,9 +303,9 @@
            total = len(feature_keys)
            if supported == 0:
                return "Not supported"
      -    elif supported == total:
      +    if supported == total:
                return "Supported"
      -    else:
      -        return "Partial"
      +    return "Partial"
    • Docstring summary should start first line · Line 313-313
      Move the docstring summary to the first line for `generate_feature_tables` function.
      Code suggestion
       @@ -312,8 +312,7 @@
        def generate_feature_tables() -> list[list[list[Any]]]:
      -    """
      -    Generate multiple focused tables organized by feature categories.
      +    """Generate multiple focused tables organized by feature categories.
    • Use unpacking instead of concatenation · Line 406-406
      Use `[headers, *rows]` instead of list concatenation on line 406.
      Code suggestion
       @@ -403,7 +403,7 @@
                rows.append(row)
      -    output_tables.append([headers] + rows)
      +    output_tables.append([headers, *rows])
Review Details
  • Files reviewed - 3 · Commit Range: 8d73838..c03cae4
    • docs/docs/configuration/db_engine_specs.mdx
    • docs/src/components/DatabaseTable.jsx
    • superset/db_engine_specs/docs_lib.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses the Default Agent. You can customize the agent settings here, or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito

Comment on lines +127 to +131
"alias_in_orderby": spec.allows_alias_in_orderby,
"time_groupby_inline": spec.time_groupby_inline,
"alias_to_source_column": not spec.allows_alias_to_source_column,
"order_by_not_in_select": spec.allows_hidden_orderby_agg,
"expressions_in_orderby": spec.allows_hidden_cc_in_orderby,
Contributor


Incorrect feature reporting logic

The 'alias_to_source_column' feature is incorrectly inverted with a 'not' operator, causing it to report as unsupported when the database spec actually allows using source columns when aliases overshadow them. This breaks accurate database capability diagnosis, which could mislead users about supported SQL features. Remove the 'not' to correctly reflect the spec's allows_alias_to_source_column attribute.

Code suggestion
Check the AI-generated fix before applying
Suggested change
"alias_in_orderby": spec.allows_alias_in_orderby,
"time_groupby_inline": spec.time_groupby_inline,
"alias_to_source_column": not spec.allows_alias_to_source_column,
"order_by_not_in_select": spec.allows_hidden_orderby_agg,
"expressions_in_orderby": spec.allows_hidden_cc_in_orderby,
"alias_in_orderby": spec.allows_alias_in_orderby,
"time_groupby_inline": spec.time_groupby_inline,
"alias_to_source_column": spec.allows_alias_to_source_column,
"order_by_not_in_select": spec.allows_hidden_orderby_agg,
"expressions_in_orderby": spec.allows_hidden_cc_in_orderby,

Code Review Run #7f070d




The password is not the only piece of information where security is critical. For many databases (like BigQuery), sensitive information is stored in the credentials JSON payload. For example:

Contributor


Wrong BigQuery credentials JSON example

The JSON example for BigQuery credentials is incorrect. The encrypted_extra field stores a dictionary with a credentials_info key containing the service account JSON, but the example shows the service account JSON directly. This mismatch could confuse implementers and lead to incorrect masking logic, potentially exposing sensitive private keys in API responses. The example should wrap the service account details under credentials_info to match the actual data structure used by BigQueryEngineSpec.

Code suggestion
Check the AI-generated fix before applying
 -{
 -  "type": "service_account",
 -  "project_id": "dbt-tutorial-347100",
 -  "private_key_id": "4bc71f06990c864a590fad8b94be6a5904fc171f",
 -  "private_key": "<SENSITIVE INFORMATION>",
 -  "client_email": "[email protected]",
 -  "client_id": "115666988796889519425",
 -  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
 -  "token_uri": "https://oauth2.googleapis.com/token",
 -  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
 -  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dbt-user-278%40dbt-tutorial-347100.iam.gserviceaccount.com"
 -}
 +{
 +  "credentials_info": {
 +    "type": "service_account",
 +    "project_id": "dbt-tutorial-347100",
 +    "private_key_id": "4bc71f06990c864a590fad8b94be6a5904fc171f",
 +    "private_key": "<SENSITIVE INFORMATION>",
 +    "client_email": "[email protected]",
 +    "client_id": "115666988796889519425",
 +    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
 +    "token_uri": "https://oauth2.googleapis.com/token",
 +    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
 +    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dbt-user-278%40dbt-tutorial-347100.iam.gserviceaccount.com"
 +  }
 +}
 
Citations

Code Review Run #7f070d



Comment on lines +1120 to +1128
class SqliteEngineSpec(BaseEngineSpec):

custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] =
COLUMN_DOES_NOT_EXIST_REGEX: (
__('We can\'t seem to resolve the column "%(column_name)s"'),
SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
{},
),
}
Contributor


Invalid dict syntax in SQLite example

The Python code example for SqliteEngineSpec custom_errors has invalid syntax. It attempts to define a dictionary with type annotation and inline assignment, which is not valid Python. This should be a proper dictionary assignment. Incorrect code examples in documentation can mislead developers and cause implementation errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
class SqliteEngineSpec(BaseEngineSpec):
custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] =
COLUMN_DOES_NOT_EXIST_REGEX: (
__('We can\'t seem to resolve the column "%(column_name)s"'),
SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
{},
),
}
class SqliteEngineSpec(BaseEngineSpec):
custom_errors = {
COLUMN_DOES_NOT_EXIST_REGEX: (
__('We can\'t seem to resolve the column "%(column_name)s"'),
SupersetErrorType.COLUMN_DOES_NOT_EXIST_ERROR,
{},
),
}
Citations

Code Review Run #7f070d




### Get metrics on dataset creation

When a physical dataset is first created, the `get_metrics` class method is called on the table. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connect to them it can automatically create those metrics when a dataset is added.
Contributor


Correct get_metrics documentation and grammar

The documentation inaccurately states that the get_metrics class method is called on the table, but based on the codebase, it's actually invoked on the database engine spec via self.db_engine_spec.get_metrics(self, inspector, table) in superset/models/core.py. This misleading description could confuse developers implementing or extending DB engine specs. Additionally, there's a grammar error where 'connect' should be 'connects'. The fix corrects both issues to ensure accurate developer guidance.

Code suggestion
Check the AI-generated fix before applying
Suggested change
When a physical dataset is first created, the `get_metrics` class method is called on the table. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connect to them it can automatically create those metrics when a dataset is added.
When a physical dataset is first created, the `get_metrics` class method is called on the database engine spec. The base implementation returns the `COUNT(*)` metric, but DB engine specs can override `get_metrics` to return other metrics. This method is useful for semantic layers that contain their own metrics definitions; when Superset connects to them it can automatically create those metrics when a dataset is added.
Citations

Code Review Run #7f070d


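To make the shape of such a `get_metrics` override concrete, here is a minimal, self-contained sketch (the `SemanticLayerSpec` class, the simplified signature, and the plain-dict metric entries are illustrative assumptions, not Superset's exact API):

```python
from typing import Any


class BaseEngineSpec:
    """Simplified stand-in for Superset's base engine spec."""

    @classmethod
    def get_metrics(
        cls, database: Any, inspector: Any, table: Any
    ) -> list[dict[str, Any]]:
        # Base behavior: every new physical dataset gets a COUNT(*) metric.
        return [
            {
                "metric_name": "count",
                "verbose_name": "COUNT(*)",
                "metric_type": "count",
                "expression": "COUNT(*)",
            }
        ]


class SemanticLayerSpec(BaseEngineSpec):
    """Hypothetical spec for a backend exposing its own metric definitions."""

    @classmethod
    def get_metrics(
        cls, database: Any, inspector: Any, table: Any
    ) -> list[dict[str, Any]]:
        # A real implementation would query the semantic layer here; a
        # hardcoded metric is returned only to illustrate the shape.
        return [
            {
                "metric_name": "total_revenue",
                "verbose_name": "Total revenue",
                "metric_type": "sum",
                "expression": "SUM(revenue)",
            }
        ]
```

When a dataset is added for this hypothetical backend, Superset would pick up `total_revenue` instead of only `COUNT(*)`.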

Comment on lines +50 to +57
const filteredItems = useMemo(
() =>
rows.filter(
row =>
row[0] && row[0].toLowerCase().includes(filter.toLowerCase()),
),
[rows, filter],
);
Contributor


Unsafe type assumption in filter logic

The filtering logic calls toLowerCase() on row[0] without checking if it's a string, which will throw a TypeError if row[0] is not a string (e.g., number or null). This can cause runtime crashes when filtering database tables with non-string first columns. Add a type check to ensure row[0] is a string before calling string methods.

Code suggestion
Check the AI-generated fix before applying
Suggested change
const filteredItems = useMemo(
() =>
rows.filter(
row =>
row[0] && row[0].toLowerCase().includes(filter.toLowerCase()),
),
[rows, filter],
);
const filteredItems = useMemo(
() =>
rows.filter(
row =>
row[0] && typeof row[0] === 'string' && row[0].toLowerCase().includes(filter.toLowerCase()),
),
[rows, filter],
);

Code Review Run #7f070d



@@ -0,0 +1,689 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member


Is this pretty much a duplicate of the script that's building today's markdown file? I'm not sure if we should modify that one, or just add this new one, and remove the old one after this PR merges.

Also, should this perhaps live in the lower level /scripts folder or somewhere in the /docs directory, perhaps?

Author


Hi, yeah, this is pretty much a copy of it, but it generates the table in a slightly different format. I added it to keep the script backward compatible and to keep /superset/db_engine_specs/README.md maintained as well. I don't think we can move it into the /docs directory, as it is bundled with the superset directory.

@codecov

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 12.92517% with 128 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.96%. Comparing base (ab8352e) to head (c03cae4).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/docs_lib.py 12.92% 128 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #36248       +/-   ##
===========================================
+ Coverage        0   67.96%   +67.96%     
===========================================
  Files           0      633      +633     
  Lines           0    46597    +46597     
  Branches        0     5057     +5057     
===========================================
+ Hits            0    31671    +31671     
- Misses          0    13664    +13664     
- Partials        0     1262     +1262     
Flag Coverage Δ
hive 43.70% <12.92%> (?)
mysql 67.08% <12.92%> (?)
postgres 67.13% <12.92%> (?)
presto 47.29% <12.92%> (?)
python 67.93% <12.92%> (?)
sqlite 66.75% <12.92%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@rusackas rusackas changed the title docs:Added Database Engine Specifications to the Official Documentation Website docs: Added Database Engine Specifications to the Official Documentation Website Nov 26, 2025
@rusackas
Member

Superset uses Git pre-commit hooks courtesy of pre-commit. To install them, run the following:

pip3 install -r requirements/development.txt
pre-commit install

A series of checks will now run when you make a git commit.

Alternatively, you can run pre-commit manually:

pre-commit run --all-files


Labels

doc Namespace | Anything related to documentation size/XXL
