AI #1564: Added transpile helper #1560
Open: mike-finopsorg wants to merge 2 commits into `working_draft` from `transpile_helper_script`
@@ -0,0 +1,70 @@

`````markdown
# Helper Scripts

This directory contains utility scripts that assist with maintaining and processing the FOCUS specification documentation.

## Requirements

- Python 3.7+
- Dependencies listed in `requirements.txt`

### Installation

Before using any helper scripts, install the required dependencies:

```bash
pip install -r requirements.txt
```

## Available Scripts

### sql_transpile.py

A Python script that finds SQL code blocks in Markdown files and transpiles them between different SQL dialects using SQLGlot.

#### Purpose

- Extract SQL code blocks from Markdown documentation
- Detect SQL dialects automatically or use provided hints
- Transpile SQL between different database dialects (BigQuery, Trino, T-SQL, etc.)
- Validate SQL syntax across different platforms

#### Usage

```bash
# Transpile all SQL blocks in markdown files to T-SQL
./sql_transpile.py ../*.md --to tsql

# List all SQL blocks without transpiling
./sql_transpile.py ../*.md --list

# Transpile with dialect preference for detection
./sql_transpile.py ../*.md --to bigquery --prefer trino

# Process specific files
./sql_transpile.py ../file1.md ../file2.md --to postgres
```

#### Options

- `files`: Markdown files or glob patterns to process
- `--to`: Target SQL dialect (default: ansi)
- `--prefer`: Preferred dialect for detection (can be used multiple times)
- `--list`: Only list SQL blocks found, don't transpile

#### Supported Dialects

BigQuery, Trino, Presto, DuckDB, MySQL, PostgreSQL, Snowflake, T-SQL, Spark, Hive, Redshift, SQLite, Oracle, and ANSI SQL.

#### Dialect Hints

You can specify a dialect hint in your SQL code blocks:

````markdown
```sql bigquery
SELECT * FROM dataset.table
```

```sql:trino
SELECT * FROM catalog.schema.table
```
````
`````
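The two hint forms documented in the README (`sql bigquery` and `sql:trino`) can be checked with a small stdlib-only sketch. The pattern and the function name `extract_sql_blocks` below are assumptions made for illustration, not the script's own code; the pattern accepts whitespace, a colon, or a hyphen between `sql` and the hint.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: `sql`, then an optional hint separated by space/tab, ':' or '-'.
FENCE = re.compile(
    r"```sql(?:[ \t:-]+([\w-]+))?[ \t]*\n(.*?)\n```",
    re.IGNORECASE | re.DOTALL,
)


def extract_sql_blocks(md_text: str) -> List[Tuple[Optional[str], str]]:
    """Return (hint, sql) pairs; hint is None when the fence carries no hint."""
    blocks = []
    for m in FENCE.finditer(md_text):
        hint = (m.group(1) or "").lower() or None
        blocks.append((hint, m.group(2).strip()))
    return blocks


sample = (
    "```sql bigquery\nSELECT * FROM dataset.table\n```\n\n"
    "```sql:trino\nSELECT * FROM catalog.schema.table\n```\n\n"
    "```sql\nSELECT 1\n```\n"
)

for hint, sql in extract_sql_blocks(sample):
    print(hint, "|", sql)
```

A test along these lines could guard the documented hint syntax against regressions in the fence pattern.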
@@ -0,0 +1 @@

    sqlglot>=20.0.0

140 changes: 140 additions & 0 deletions
specification/supported_features/helpers/sql_transpile.py
@@ -0,0 +1,140 @@

```python
#!/usr/bin/env python3
import argparse
import glob
import re
import sys
from typing import Optional, List, Tuple

import sqlglot
from sqlglot.errors import SqlglotError

# Prefer the engines you most commonly use by ordering them first.
# None stands for SQLGlot's default (ANSI-like) dialect.
CANDIDATE_DIALECTS: List[Optional[str]] = [
    "bigquery", "trino", "presto", "duckdb", "mysql", "postgres",
    "snowflake", "tsql", "spark", "hive", "redshift", "sqlite", "oracle", None
]

# Matches fenced code blocks like:
#   ```sql
#   ...code...
#   ```
# and also allows a dialect hint after `sql`, separated by whitespace,
# a colon, or a hyphen, e.g.:
#   ```sql trino
#   ```sql:bigquery
FENCE_RE = re.compile(
    r"```sql(?:[ \t:-]+([\w:-]+))?[ \t]*\n(.*?)\n```",
    re.IGNORECASE | re.DOTALL
)


def parse_fenced_sql(md_text: str) -> List[Tuple[int, Optional[str], str]]:
    """
    Return a list of (block_index, hint, sql_text) for each fenced ```sql block.
    'hint' is an optional dialect hint following 'sql' in the fence.
    """
    blocks = []
    for i, m in enumerate(FENCE_RE.finditer(md_text), start=1):
        raw_hint = m.group(1) or ""
        # Normalize hint: accept "trino", "sql:trino", "sql-trino", "trino:xyz"
        hint = raw_hint.strip().lower()
        hint = hint.replace("sql:", "").replace("sql-", "")
        hint = hint.split(":")[0]  # keep left-most token if colon-separated
        hint = hint or None
        sql_text = m.group(2).strip()
        blocks.append((i, hint, sql_text))
    return blocks


def detect_dialect(sql: str) -> Optional[str]:
    """Heuristically detect a dialect by attempting to parse with common dialects.

    Returns the first candidate that parses without error, so the order of
    CANDIDATE_DIALECTS decides the result for SQL that is valid in several dialects.
    """
    for d in CANDIDATE_DIALECTS:
        try:
            sqlglot.parse(sql, read=d)
            return d
        except SqlglotError:
            # Covers both ParseError and TokenError
            continue
    return None


def transpile_sql(sql: str, read: Optional[str], write: Optional[str]) -> str:
    if write is not None and write.lower() == "ansi":
        write = None  # SQLGlot uses None for ANSI
    if read is not None and read.lower() == "ansi":
        read = None
    parts = sqlglot.transpile(
        sql,
        read=read,  # None => let SQLGlot assume ANSI-ish
        write=write,
        pretty=True,
    )
    return ";\n".join(p.strip() for p in parts if p.strip())


def main():
    ap = argparse.ArgumentParser(
        description="Find ```sql blocks in Markdown, detect dialect, and transpile with SQLGlot."
    )
    ap.add_argument("files", nargs='+', help="Markdown files or glob patterns (e.g., 'docs/*.md' or file1.md file2.md)")
    ap.add_argument("--to", default="ansi", help="Target dialect (default: ansi)")
    ap.add_argument("--prefer", action="append", default=[],
                    help="Prefer this dialect in detection (can be given multiple times).")
    ap.add_argument("--list", action="store_true",
                    help="Only list sql blocks found, do not transpile.")
    args = ap.parse_args()

    # If user provided preferred dialects, move them to the front of the candidate list
    if args.prefer:
        preferred = [d.lower() for d in args.prefer]
        uniq = []
        for d in preferred + CANDIDATE_DIALECTS:
            if d not in uniq:
                uniq.append(d)
        CANDIDATE_DIALECTS[:] = uniq

    # Collect all files from patterns and individual files
    all_files = []
    for pattern_or_file in args.files:
        matched = glob.glob(pattern_or_file)
        if matched:
            all_files.extend(matched)
        else:
            # If no glob match, treat as individual file
            all_files.append(pattern_or_file)

    files = sorted(set(all_files))  # Remove duplicates and sort
    if not files:
        print("No files found.", file=sys.stderr)
        sys.exit(1)

    for path in files:
        with open(path, "r", encoding="utf-8") as f:
            md_text = f.read()

        blocks = parse_fenced_sql(md_text)

        print(f"=== {path} ===")
        if not blocks:
            print("(no ```sql code blocks found)\n")
            continue

        for idx, hint, sql in blocks:
            print(f"[block #{idx}]")
            if args.list:
                print(f"  hint: {hint or '(none)'}")
                print(f"  first line: {sql.splitlines()[0] if sql.splitlines() else ''}")
                continue
            if hint not in CANDIDATE_DIALECTS:
                hint = None  # ignore unrecognized hints
            read = hint or detect_dialect(sql)

            detected_label = read or "ansi (fallback)"
            print(f"Detected: {detected_label} -> {args.to}")
            try:
                print(transpile_sql(sql, read=read, write=args.to))
            except SqlglotError as e:
                print(f"ERROR: {e}", file=sys.stderr)
            print()  # spacing between blocks

        print()  # spacing between files


if __name__ == "__main__":
    main()
```
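The `--prefer` handling in `main()` reorders the module-level `CANDIDATE_DIALECTS` list in place. A minimal sketch of the same de-duplicating reorder as a pure function, which may be easier to unit-test, follows. The name `order_candidates` and the shortened `defaults` list are hypothetical and not part of this PR.

```python
from typing import List, Optional, Sequence


def order_candidates(
    preferred: Sequence[str],
    candidates: Sequence[Optional[str]],
) -> List[Optional[str]]:
    """Move preferred dialects to the front, keeping first occurrences only."""
    ordered: List[Optional[str]] = []
    for dialect in [p.lower() for p in preferred] + list(candidates):
        if dialect not in ordered:
            ordered.append(dialect)
    return ordered


# Shortened candidate list for illustration; None stands for the default dialect.
defaults = ["bigquery", "trino", "postgres", None]
print(order_candidates(["Trino", "postgres"], defaults))
# prints ['trino', 'postgres', 'bigquery', None]
```

Returning a new list leaves the defaults untouched, so repeated calls (for example in tests) do not affect one another.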
Something to consider: