
BUG: Raise MergeError when suffixes result in duplicate column names … #61422


Open
wants to merge 2 commits into
base: main
from

Conversation

Farsidetfs

This PR addresses GH#61402 by ensuring that merge() raises a MergeError if the specified suffixes fail to eliminate column name collisions.

The suffix logic now explicitly checks for overlaps after applying suffixes and raises a clear error if duplicates remain.

Includes a test in test_merge.py to confirm that suffixes like ('_dup', '_dup') raise the expected error when merging conflicting column names.

Closes GH#61402.
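
For illustration, here is a minimal pure-Python sketch of the kind of check described above. It is not the pandas implementation: the helper names `apply_suffixes` and `suffix_collisions` are hypothetical, and it assumes each input column list is already free of duplicates and that join keys have been excluded.

```python
from collections import Counter


def apply_suffixes(left_cols, right_cols, suffixes=("_x", "_y")):
    """Append each side's suffix to the column names present on both sides."""
    lsuffix, rsuffix = suffixes
    overlap = set(left_cols) & set(right_cols)
    # A suffix of None is treated as "no suffix" in this sketch.
    new_left = [f"{c}{lsuffix or ''}" if c in overlap else c for c in left_cols]
    new_right = [f"{c}{rsuffix or ''}" if c in overlap else c for c in right_cols]
    return new_left, new_right


def suffix_collisions(left_cols, right_cols, suffixes=("_x", "_y")):
    """Return the set of names that still appear more than once after renaming."""
    new_left, new_right = apply_suffixes(left_cols, right_cols, suffixes)
    counts = Counter(new_left + new_right)
    return {name for name, n in counts.items() if n > 1}


# Identical suffixes cannot separate the overlapping column "a".
print(suffix_collisions(["a", "b"], ["a", "c"], ("_dup", "_dup")))
# A renamed column can also collide with a name that already exists on the other side.
print(suffix_collisions(["a"], ["a", "a_x"]))
```

Under those assumptions, a non-empty result is the situation in which this PR proposes raising MergeError.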

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch 2 times, most recently from fe71e1a to c9bfc5a Compare May 10, 2025 02:48
@Farsidetfs
Author

pre-commit.ci autofix

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from dc1bb47 to ad744df Compare May 10, 2025 05:54
@chilin0525
Contributor

Thanks for your contribution!

Just a quick note: you don't need to write GH#61402 in the PR description. Simply using #61402 is enough; GitHub will automatically link it 😀.
Also, since this PR addresses a bug, please make sure to:

  • Add a unit test that covers this case
  • Include an entry in the doc/source/whatsnew/vx.y.z.rst file to document your fix

For reference, you can check the contributing guidelines here: https://pandas.pydata.org/docs/development/contributing_codebase.html#documenting-your-code

@Farsidetfs
Author

Thanks for the pointers. I'll get those added in here soon. I'm trying to track down why the Unit Tests / Linux-32-bit (pull_request) job is failing. I didn't change anything that should have affected Series, so it's kind of weird.

I also can't get pytest to run normally in my dev environment yet, so I haven't been able to replicate the failure locally. So, still a little more work to do here.

@chilin0525
Contributor

@Farsidetfs I believe the CI failure is not related to your changes. It appears to be caused by the Cython version: pandas unit tests fail with cython==3.1.0. You may notice that the same test failures have occurred in several recent PRs as well. I've already addressed the issue in #61423.


@nikaltipar nikaltipar left a comment


Added a few comments from a bird's eye view. Thanks for the change.

Comment on lines 3064 to 3073
if len(left_collisions) > 0:
    raise MergeError(
        "Passing 'suffixes' which cause duplicate columns "
        f"{set(left_collisions)} is not allowed"
    )
if len(right_collisions) > 0:
    raise MergeError(
        "Passing 'suffixes' which cause duplicate columns "
        f"{set(right_collisions)} is not allowed"
    )


I would recommend you combine this into a common error to reduce repetition (bonus points for combining it with the pre-existing error just a few lines below)
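
One possible shape for the combined check, as a rough sketch rather than the code that ended up in the PR: the `MergeError` class below is a local stand-in so the snippet runs without pandas, and `raise_on_collisions` is a hypothetical helper name.

```python
class MergeError(ValueError):
    """Local stand-in for the pandas error type, so the sketch runs without pandas."""


def raise_on_collisions(left_collisions, right_collisions):
    """Raise one error covering the collisions found on either side."""
    # Union the two collections so every colliding name is reported once.
    collisions = set(left_collisions) | set(right_collisions)
    if collisions:
        raise MergeError(
            "Passing 'suffixes' which cause duplicate columns "
            f"{sorted(collisions)} is not allowed"
        )


raise_on_collisions([], [])  # no collisions, so nothing is raised

try:
    raise_on_collisions(["a_x"], ["a_x", "b_y"])
except MergeError as exc:
    print(exc)
```

Sorting the names keeps the message deterministic, which makes it easier to match in a test.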

# Check for duplicates created by suffixes
left_collisions = llabels.intersection(right.difference(to_rename))
right_collisions = rlabels.intersection(left.difference(to_rename))
if len(left_collisions) > 0:


Doesn't just `if not left_collisions.empty:` work? Same for a similar check below.

@@ -3058,6 +3058,20 @@ def renamer(x, suffix: str | None):
llabels = left._transform_index(lrenamer)
rlabels = right._transform_index(rrenamer)

# Check for duplicates created by suffixes


For new readers of this code, the comment might not be descriptive enough. Your code finds duplicate column names that applying the suffixes would create across the two dataframes, while the section just below your new code does a separate duplicate check (for would-be columns duplicated within the same dataframe).

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from ad744df to 9a40cd0 Compare May 12, 2025 20:45
@Farsidetfs
Author

pre-commit.ci autofix

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch 3 times, most recently from e9efd0d to 1476957 Compare May 12, 2025 22:35
@Farsidetfs
Author

pre-commit.ci autofix

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from cb23817 to 6b82c85 Compare May 13, 2025 00:02
@Farsidetfs
Author

pre-commit.ci autofix

@Farsidetfs Farsidetfs force-pushed the fix-merge-suffixes-61402 branch from 5aff565 to e62c70f Compare May 13, 2025 01:11
@Farsidetfs
Author

pre-commit.ci autofix

@Farsidetfs
Author

@nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two checks, with slight modifications to improve efficiency by using sets throughout rather than just converting at the end.

@nikaltipar

> @nikaltipar I think this should be ready now. Please let me know if I've missed anything. I took your advice and combined the two with slight modifications to improve efficiency using sets throughout rather than just convert at the end.

Thanks for taking care of that, @Farsidetfs! It looks good to me, no other comments from my side. Thanks for adding the unit tests, too!

@chilin0525
Contributor

@nikaltipar Could you rebase on the main branch to trigger CI again?

@nikaltipar

> @nikaltipar Could you rebase main branch to trigger CI again?

I am not able to; I'll have to wait for @Farsidetfs.


3 participants