Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grep: add -o option #353

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

andrewliebenow
Copy link
Contributor

No description provided.

@andrewliebenow andrewliebenow force-pushed the grep-dash-o branch 3 times, most recently from 0c6ae8b to 86ffd89 Compare October 26, 2024 06:12
@andrewliebenow
Copy link
Contributor Author

Added an additional change to prevent grep from aborting/failing when the input contains non-UTF-8 bytes

@jgarzik
Copy link
Contributor

jgarzik commented Nov 3, 2024

Questions,

  1. I see that BSD and Linux grep both have -o. Is this option widely used?

  2. This is a significant change. Is there a performance impact, vs current code? A microbenchmark would be nice.

@andrewliebenow
Copy link
Contributor Author

I ran into a build script that was trying to use -o. I can't remember which program it was I was compiling, I'll try to find it.

Most of the changes are related to switching from UTF-8 types to raw bytes types. It's not too uncommon to want to use text processing tools on "mostly UTF-8 files" (in my case I was trying to search my shell history file, which somehow got some non-UTF-8 bytes in it).

Performance is mostly unchanged:

❯ hyperfine ' ./grep-after-pull-request -F adjective ./webster ' ' ./grep-before-pull-request -F adjective ./webster '
Benchmark 1:  ./grep-after-pull-request -F adjective ./webster 
  Time (mean ± σ):      96.5 ms ±   1.2 ms    [User: 88.4 ms, System: 7.1 ms]
  Range (min … max):    95.2 ms … 100.3 ms    30 runs
 
Benchmark 2:  ./grep-before-pull-request -F adjective ./webster 
  Time (mean ± σ):      94.7 ms ±   0.8 ms    [User: 87.1 ms, System: 6.8 ms]
  Range (min … max):    93.5 ms …  96.3 ms    30 runs
 
Summary
   ./grep-before-pull-request -F adjective ./webster  ran
    1.02 ± 0.02 times faster than  ./grep-after-pull-request -F adjective ./webster 
❯ hyperfine ' ./grep-after-pull-request adjective ./webster ' ' ./grep-before-pull-request adjective ./webster '                                                                 
Benchmark 1:  ./grep-after-pull-request adjective ./webster 
  Time (mean ± σ):     191.8 ms ±   3.9 ms    [User: 181.6 ms, System: 8.7 ms]
  Range (min … max):   183.3 ms … 195.6 ms    15 runs
 
Benchmark 2:  ./grep-before-pull-request adjective ./webster 
  Time (mean ± σ):     210.7 ms ±   3.5 ms    [User: 202.0 ms, System: 7.1 ms]
  Range (min … max):   206.6 ms … 220.4 ms    14 runs
 
Summary
   ./grep-after-pull-request adjective ./webster  ran
    1.10 ± 0.03 times faster than  ./grep-before-pull-request adjective ./webster 

"./webster" is "The 1913 Webster Unabridged Dictionary" from https://sun.aei.polsl.pl/~sdeor/index.php?page=silesia (just under 40 mebibytes).

@andrewliebenow
Copy link
Contributor Author

I couldn't find the program I had been trying to compile, but here's an interesting statistic:

I have a directory containing 42 Git repositories I've cloned. 4 of those repositories contain scripts that use grep -o:

https://github.com/hinto-janai/festival
https://github.com/landley/toybox
https://github.com/libjxl/libjxl
https://github.com/uutils/coreutils

@ghuls
Copy link

ghuls commented Feb 6, 2025

grep -o in combination with grep's regular expressions support is a quite useful combination:

# Get all https URLs.
grep -Eo 'https://.*' text.txt

# Get numbers.
grep -Eo '[0-9]+' text.txt

@jgarzik jgarzik added the enhancement New feature or request label Apr 2, 2025
@jgarzik jgarzik requested a review from Copilot April 2, 2025 21:26
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for the -o option in grep along with several test updates and improved error messages. Key changes include renaming the test file constant from BAD_INPUT_FILE to NONEXISTENT_FILE, updating error message formats to include the "grep:" prefix, and adding tests for the new -o option functionality.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
text/tests/grep/mod.rs Renamed file constants, updated error messages, and added tests for the -o option
text/Cargo.toml Added the memchr dependency
Comments suppressed due to low confidence (2)

text/tests/grep/mod.rs:35

  • The parameter name was updated from 'test_data' to 'pipe_to_stdin' to improve clarity. Ensure that any related documentation or comments are updated to reflect this change.
fn grep_test(

text/tests/grep/mod.rs:1415

  • Consider adding tests for scenarios where multiple matches occur within a single line when using the '-o' option to further ensure correct behavior.
fn test_dash_o() {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants