DOC: Document that str.match accepts a regular expression #61879

Open: wants to merge 1 commit into base: main

2 changes: 1 addition & 1 deletion pandas/core/strings/accessor.py
@@ -1374,7 +1374,7 @@ def match(self, pat: str, case: bool = True, flags: int = 0, na=lib.no_default):
Parameters
----------
pat : str
- Character sequence.
+ Character sequence or regular expression.
Member

By regular expression, do you mean a string that is interpreted as a regular expression, or a compiled regular expression object?

To avoid confusion: if the former, then probably no doc change is needed; if the latter, the type hints in the signature would also need to be updated, and some code changes would be required?

Member

I think he meant a compiled regular expression; this is how we are trying to type it in the stubs.
I believe we should align all the docs. Since it uses the functions of re under the hood, and those functions support re.Pattern, a compiled regular expression is also accepted at runtime.
Looking at the docs, it is a bit unclear what "regular expression" means; I would assume it is just a regular string of the form r"...".
So the question is: should we allow compiled regular expressions, given that they are supported at runtime?
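
To make the runtime point concrete, here is a minimal standard-library sketch. It assumes an element-wise helper that passes its pattern through re.compile; the helper match_each is hypothetical and is not the pandas implementation. It only shows why code built this way accepts both a pattern string and a compiled re.Pattern, and where the compiled form stops working.

```python
import re


def match_each(values, pat, flags=0):
    """Hypothetical element-wise match; NOT the pandas implementation."""
    # re.compile returns an already compiled pattern unchanged when flags == 0,
    # so a str and a re.Pattern both work here.
    regex = re.compile(pat, flags)
    return [regex.match(value) is not None for value in values]


values = ["apple", "banana", "avocado"]

# A plain string interpreted as a regular expression.
from_string = match_each(values, r"a")

# The same pattern, compiled ahead of time.
from_compiled = match_each(values, re.compile(r"a"))

# Combining a compiled pattern with non-zero flags is rejected by re itself.
flags_error = None
try:
    match_each(values, re.compile(r"a"), flags=re.IGNORECASE)
except ValueError as exc:
    flags_error = str(exc)
```

Both calls give the same result for these inputs, while the last call raises ValueError. So even where a compiled pattern happens to work at runtime, its interaction with arguments such as flags would need to be documented and tested before it is treated as supported.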

Member

The documentation is the official API. If the stubs have been updated to reflect the types that are accepted, then isn't this the tail wagging the dog? If we update the documentation, then we also need to update the type annotations in the code and ensure that the behavior is tested?

Member

I see your point, and you are correct; I think the confusion originally came from regular expression != compiled regex.
But then I went into the stubs, and it seems like we are testing for it:

def test_replace_compiled_regex_mixed_object():
    pat = re.compile(r"BAD_*")
    ser = Series(
        ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
    )
    result = Series(ser).str.replace(pat, "", regex=True)
    expected = Series(
        ["a", np.nan, "b", np.nan, np.nan, "foo", None, np.nan, np.nan], dtype=object
    )
    tm.assert_series_equal(result, expected)

So the question is what we mean by "regular expression": is it compiled or not? Once that is settled, we can:

  • clarify the docs
  • update the stubs accordingly, to allow or disallow re.Pattern[str]
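
For illustration only, this is roughly what allowing re.Pattern[str] could look like at the annotation level. The function normalize_pattern and its signature are hypothetical and are not taken from pandas or the stubs; the sketch only shows the union type and the one decision it forces, namely what to do with flags when the pattern is already compiled.

```python
from __future__ import annotations

import re


def normalize_pattern(pat: str | re.Pattern[str], flags: int = 0) -> re.Pattern[str]:
    """Hypothetical helper: accept either form and return a compiled pattern."""
    if isinstance(pat, re.Pattern):
        # A compiled pattern already carries its own flags, so extra flags
        # cannot be applied; this mirrors how re.compile behaves.
        if flags:
            raise ValueError("flags cannot be combined with a compiled pattern")
        return pat
    return re.compile(pat, flags)


compiled = re.compile(r"BAD_*")
same_object = normalize_pattern(compiled) is compiled
from_text = normalize_pattern(r"BAD_*", flags=re.IGNORECASE)
```

If compiled patterns are not meant to be supported, the annotation would stay as str and the isinstance branch would not exist.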

Please let us know @simonjayhawkins.

Author

It is true that I understood "regular expression" to mean both string and compiled patterns, but this PR is meant to bring the match method in line with the other str methods.

I can open another PR to clarify the docs and update the inline types if there is consensus.

case : bool, default True
If True, case sensitive.
flags : int, default 0 (no flags)