Conversation

Member

@delsner delsner commented Sep 19, 2025

Motivation

Using \d in regular expressions for string columns can produce non-ASCII digits in sampled values. This is unexpected: \d is commonly understood as a metacharacter for ASCII digits only, i.e., [0-9].
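
As a minimal illustration of the problem, the snippet below uses Python's built-in re module as a stand-in; the regex engine behind sample() may differ, so this is only an assumption-laden sketch of the general behavior. For str patterns, re treats \d as any Unicode decimal digit unless the re.ASCII flag is set.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # "٣", a non-ASCII decimal digit

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r"\d", ARABIC_INDIC_THREE)

# An explicit character class restricts the match to ASCII digits.
ascii_match = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE)

print(unicode_match is not None, ascii_match is not None)
```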

Changes

  • Sanitize regular expressions passed to sample() by replacing \d with [0-9]
  • Add test cases for sampling ASCII digits for \d
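
The first change could look roughly like the sketch below. This is not the code from this PR: replace_digit_class is a hypothetical helper name, and processing escape sequences one at a time is just one assumed way to avoid rewriting an escaped backslash that happens to be followed by a literal "d". The sketch does not handle \d inside a character class such as [\d_].

```python
import re

# Match one escape sequence at a time (a backslash plus the next character).
# This way r"\\d" is consumed as the pair "\\" and the trailing "d" is
# left untouched.
_ESCAPE = re.compile(r"\\(.)", flags=re.DOTALL)


def replace_digit_class(pattern: str) -> str:
    """Hypothetical helper: rewrite each \\d escape as [0-9].

    Limitation (assumption of this sketch): \\d inside a character
    class, e.g. [\\d_], is not handled correctly.
    """

    def _substitute(match: re.Match) -> str:
        # Only the \d escape is rewritten; all other escapes are kept as-is.
        return "[0-9]" if match.group(1) == "d" else match.group(0)

    return _ESCAPE.sub(_substitute, pattern)


print(replace_digit_class(r"\d{3}-\d"))
```

A sanitized pattern no longer matches non-ASCII digits in engines where \d is Unicode-aware, while patterns without \d pass through unchanged.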

codecov bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (1049f81) to head (d898669).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #149   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           50        50           
  Lines         2901      2901           
=========================================
  Hits          2901      2901           

@delsner delsner marked this pull request as draft September 19, 2025 16:24
@delsner delsner changed the title from "test: Improve test coverage for sampling digits via regex" to "fix: Replace \d with [0-9] to avoid sampling non-ascii digits" Oct 17, 2025
@github-actions github-actions bot added the fix label Oct 17, 2025
@delsner delsner changed the title from "fix: Replace \d with [0-9] to avoid sampling non-ascii digits" to "fix: Replace \d with [0-9] in regexes to avoid sampling non-ascii digits" Oct 17, 2025
@delsner delsner marked this pull request as ready for review October 17, 2025 08:51
@delsner delsner removed the test label Oct 17, 2025
@delsner delsner requested a review from borchero October 17, 2025 09:44
Member

@borchero borchero left a comment

Thanks!

@delsner delsner merged commit 20c89f4 into main Oct 17, 2025
22 checks passed
@delsner delsner deleted the improve-regex-test-coverage branch October 17, 2025 11:40
2 participants