- Replace README TODO links with real references and add a contributing link
- Fix mojibake in the SQL pseudo-column question and sync the full index link
- Add scripts and a GitHub Actions workflow to prevent regressions
Default repo checks now verify link targets exist without enforcing anchor matching. Set STRICT_ANCHORS=1 to enable strict heading anchor validation.
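The default/strict split described above could look roughly like the sketch below. This is not the actual `scripts/repo_checks.py`; the function names (`strict_anchors_enabled`, `slugify`, `check_link`) and the simplified slug rule are assumptions made for illustration. Only the `STRICT_ANCHORS=1` switch comes from the PR text.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional


def strict_anchors_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when STRICT_ANCHORS is set to "1"."""
    env = os.environ if env is None else env
    return env.get("STRICT_ANCHORS") == "1"


def slugify(heading: str) -> str:
    """Simplified, assumed approximation of a GitHub-style heading anchor."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # drop punctuation
    return text.replace(" ", "-")


def check_link(source: Path, target: str, strict: bool) -> Optional[str]:
    """Return a problem description, or None if the link looks fine."""
    path_part, _, anchor = target.partition("#")
    # A bare "#anchor" link points back into the source file itself.
    file_path = source.parent / path_part if path_part else source
    if not file_path.is_file():
        return f"missing file: {path_part}"
    if strict and anchor:
        headings = [
            line.lstrip("#").strip()
            for line in file_path.read_text(encoding="utf-8").splitlines()
            if line.startswith("#")
        ]
        if anchor not in {slugify(h) for h in headings}:
            return f"missing anchor: #{anchor} in {file_path.name}"
    return None
```

In this sketch the file-existence check always runs, while the heading comparison is skipped unless strict mode is on, so a renamed heading does not fail the default run.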
Introduce new topic pages (English Q&A) matching existing repo structure and add them to README and the full index.
Introduce four new practice-focused topics with English Q&A in the existing repo format and link them from README and the full index.
Add four new topic pages (English Q&A) in the existing repo format and link them from README and the full index.
This PR is being reviewed by Cursor Bugbot
def check_forbidden_text(problems: list[Problem]) -> None:
    paths = [README, FULL, *sorted(CONTENT_DIR.glob("*.md"))]
Duplicate file checking causes redundant problem reports
Low Severity
The check_forbidden_text function builds a paths list that includes FULL explicitly and also includes it again via CONTENT_DIR.glob("*.md"). Since FULL is defined as CONTENT_DIR / "full.md", the glob matches the same file, causing full.md to appear twice in the list. This results in full.md being checked twice and any issues in that file being reported as duplicates, which wastes the 200-problem output limit and could hide distinct issues in other files.
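One possible way to address the duplicate described above is to de-duplicate the path list while preserving order. The sketch below is an assumption about how that might be done, not the fix actually applied in the PR; `unique_paths` and `forbidden_text_paths` are hypothetical helpers, and the constant definitions mirror what the comment says about `FULL` and `CONTENT_DIR`.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed layout, following the review comment: FULL lives inside CONTENT_DIR.
CONTENT_DIR = Path("content")
README = Path("README.md")
FULL = CONTENT_DIR / "full.md"


def unique_paths(paths: Iterable[Path]) -> List[Path]:
    """Drop repeated paths, keeping the first occurrence and the original order."""
    seen = set()
    result = []
    for path in paths:
        key = Path(path).resolve()  # normalise so equivalent paths compare equal
        if key not in seen:
            seen.add(key)
            result.append(path)
    return result


def forbidden_text_paths() -> List[Path]:
    """Paths to scan: README, FULL, and every content page, each exactly once."""
    return unique_paths([README, FULL, *sorted(CONTENT_DIR.glob("*.md"))])
```

A simpler alternative would be to drop `FULL` from the explicit list, since the glob already covers it; the order-preserving filter just keeps the function robust if the constants change later.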
Fixes TODO links in README.md, corrects content encoding, and adds CI checks with helper scripts to ensure repository quality and maintainability.

This PR addresses immediate content quality issues (placeholder links, mojibake) and introduces automated checks via GitHub Actions and Python scripts (repo_checks.py, generate_full.py) to prevent future regressions, improve content consistency, and validate internal links. A CONTRIBUTING.md file is also added to guide future contributions.

Note
Expands coverage and hardens repo quality.
- content/dbt.md, iceberg.md, hudi.md, cdc.md, data-modeling.md, data-quality.md, observability.md, data-governance.md, cost-optimization.md, python.md, system-design.md; updates content/full.md to index them
- .github/workflows/repo-checks.yml to run Python 3.12 and execute scripts/repo_checks.py
- scripts/repo_checks.py (checks UTF-8/"mojibake", placeholder links, and validates internal anchors in content/full.md) and scripts/generate_full.py (builds content/full.md from headings)
- README.md with corrected links, added sections (Iceberg/Hudi/dbt/Theory topics), and useful resource links; adds CONTRIBUTING.md with local checks/guidelines
- Mojibake fix (pseudo-column) in content/sql.md and corresponding entries in content/full.md

Written by Cursor Bugbot for commit ffa29c0.
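For readers unfamiliar with the "mojibake" check mentioned in the summary, a minimal heuristic is sketched below. It is an illustration only and is not taken from `scripts/repo_checks.py`: the marker list and the `find_mojibake` name are assumptions, and a marker such as "Ã" can be legitimate in some languages, so a real check would likely need an allowlist.

```python
from typing import List, Tuple

# Assumed markers: byte sequences that commonly appear when UTF-8 text is
# decoded as Latin-1/Windows-1252, plus the Unicode replacement character.
MOJIBAKE_MARKERS = ("Ã", "â€", "\ufffd")


def find_mojibake(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) for each line containing a suspicious marker."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in MOJIBAKE_MARKERS:
            if marker in line:
                hits.append((lineno, marker))
                break  # one report per line is enough
    return hits
```

For example, "café" encoded as UTF-8 and wrongly decoded as Latin-1 contains "Ã", which this heuristic would flag on its line.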