bug(memtable): reusing same name for new memtable does not overwrite in REPL #10131

Closed
1 task done
IndexSeek opened this issue Sep 14, 2024 · 2 comments · Fixed by #10133
Labels
bug Incorrect behavior inside of ibis

@IndexSeek
Contributor

IndexSeek commented Sep 14, 2024

What happened?

Creating an ibis.memtable with the same name as a previously defined ibis.memtable returns the data from the first definition instead of the new data. What's particularly interesting is that this only seems to happen in a REPL environment.

How I came across this

I was in an IPython session attempting to create a table from an ibis.memtable created from a pandas DataFrame on the MSSQL backend. It was taking a little while, so I cancelled it and recreated a new memtable with the same name with only the first 100 rows to speed things up.

What's happening

>>> df
   a  b
0  1  a
1  2  b
2  3  c

>>> t = ibis.memtable(df, name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘

Now, I only want the first row in the pandas DataFrame. If I use df.head(1) in an effort to only grab the top row but keep the same memtable name, here's what happens:

>>> t = ibis.memtable(df.head(1), name="t")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘

But if we use a different name as an arg, here's what happens:

>>> t = ibis.memtable(df.head(1), name="t_1_row")
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
└───────┴────────┘
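
Since a fresh name sidesteps the stale data, one possible workaround on affected versions is to give every new memtable a distinct name. Below is a minimal sketch of that idea; `unique_memtable_name` is a hypothetical helper of my own, not part of ibis, and the `ibis.memtable` call is shown only in a comment.

```python
import itertools

# Session-wide counter; each call to the helper consumes the next integer.
_counter = itertools.count()


def unique_memtable_name(prefix="t"):
    """Hypothetical helper: return a name this helper has not returned before."""
    return f"{prefix}_{next(_counter)}"


first = unique_memtable_name("t")
second = unique_memtable_name("t")

# Intended use (assumes ibis and a pandas DataFrame `df` are available):
# t = ibis.memtable(df.head(1), name=unique_memtable_name("t"))
```

This only avoids the name collision; it does not change how the memtable is registered.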

Sample code/data to repro

import pandas as pd
from ibis.interactive import *

data = dict(
    a=[1, 2, 3],
    b=["a", "b", "c"],
)

df = pd.DataFrame(data)

What version of ibis are you using?

9.5.0

More specifically, up to the most recent commit on main at 12a235c.

What backend(s) are you using, if any?

None

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
IndexSeek added the bug (Incorrect behavior inside of ibis) label Sep 14, 2024
@cpcloud
Member

cpcloud commented Sep 15, 2024

Thanks for the report!

This is because the way we detect whether to re-register a memtable is based on its name alone, rather than the entire operation.

I can take a look at whether it's possible to make this work as expected.
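
To illustrate the name-only check described above, here is a small standalone sketch. It is not ibis's actual code: `MemTable`, `NameKeyedBackend`, and `OpKeyedBackend` are hypothetical stand-ins, and the second class is just one way a comparison on the whole operation might look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemTable:
    """Hypothetical stand-in for an in-memory table operation."""

    name: str
    rows: tuple


class NameKeyedBackend:
    """Registers a memtable only if its name has not been seen before."""

    def __init__(self):
        self._registered = {}

    def execute(self, op):
        # The name alone decides whether to register, so new data
        # under an existing name is silently ignored.
        if op.name not in self._registered:
            self._registered[op.name] = op.rows
        return self._registered[op.name]


class OpKeyedBackend:
    """Re-registers whenever the whole operation (name and data) differs."""

    def __init__(self):
        self._registered = {}

    def execute(self, op):
        # Compare the full operation, not just its name.
        if self._registered.get(op.name) != op:
            self._registered[op.name] = op
        return self._registered[op.name].rows


full = MemTable("t", ((1, "a"), (2, "b"), (3, "c")))
head = MemTable("t", ((1, "a"),))

stale = NameKeyedBackend()
stale.execute(full)
stale_result = stale.execute(head)  # still the original three rows

fresh = OpKeyedBackend()
fresh.execute(full)
fresh_result = fresh.execute(head)  # the single new row
```

The first backend mirrors the behavior in the report: the second `execute` call with the same name returns the originally registered rows.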

@IndexSeek
Contributor Author

You're welcome! I can't imagine too many users would be doing this sort of thing, but I did find it a little odd.

Thank you for the explanation and for looking into a fix.

github-actions bot added this to the 10.0 milestone Oct 8, 2024