Inconsistent typing for DataFrame.to_json #1179

Closed
sk- opened this issue Apr 4, 2025 · 1 comment · Fixed by #1183
Labels
IO JSON read_json, to_json, json_normalize

Comments

sk- commented Apr 4, 2025

Describe the bug
The pandas-stubs annotations for DataFrame.to_json reject binary buffers, even though the original pandas types accept them, both in the base class NDFrame and in the underlying json library.

This could be fixed by simply adding WriteBuffer[bytes] as a valid argument type. A better fix would probably be to accept a binary buffer only when compression is set, though I am not sure whether there are other cases in which a binary buffer is accepted without compression.

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
import io

import pandas as pd

buffer = io.BytesIO()

df = pd.DataFrame()
df.to_json(buffer, compression="gzip")

print(len(buffer.getvalue()))

Note that if we change the buffer to a StringIO, as the types suggest, we get the following runtime warning:

pandas_types.py:8: RuntimeWarning: compression has no effect when passing a non-binary object as input.
  df.to_json(buffer, compression="gzip")

and compression is disabled.

  2. Indicate which type checker you are using (mypy or pyright). Both
  3. Show the error message received from that type checker while checking your example.

Mypy

pandas_types.py:8: error: No overload variant of "to_json" of "DataFrame" matches argument types "BytesIO", "str"  [call-overload]
pandas_types.py:8: note: Possible overload variants:
pandas_types.py:8: note:     def to_json(self, path_or_buf: str | PathLike[str] | WriteBuffer[str], *, orient: Literal['records'], date_format: Literal['epoch', 'iso'] | None = ..., double_precision: int = ..., force_ascii: bool = ..., date_unit: Literal['s', 'ms', 'us', 'ns'] = ..., default_handler: Callable[[Any], str | float | bool | list[Any] | dict[Any, Any]] | None = ..., lines: Literal[True], compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd'] | dict[str, Any] | None = ..., index: bool = ..., indent: int | None = ..., mode: Literal['a']) -> None
pandas_types.py:8: note:     def to_json(self, path_or_buf: None = ..., *, orient: Literal['records'], date_format: Literal['epoch', 'iso'] | None = ..., double_precision: int = ..., force_ascii: bool = ..., date_unit: Literal['s', 'ms', 'us', 'ns'] = ..., default_handler: Callable[[Any], str | float | bool | list[Any] | dict[Any, Any]] | None = ..., lines: Literal[True], compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd'] | dict[str, Any] | None = ..., index: bool = ..., indent: int | None = ..., mode: Literal['a']) -> str
pandas_types.py:8: note:     def to_json(self, path_or_buf: None = ..., orient: Literal['split', 'records', 'index', 'columns', 'values', 'table'] | None = ..., date_format: Literal['epoch', 'iso'] | None = ..., double_precision: int = ..., force_ascii: bool = ..., date_unit: Literal['s', 'ms', 'us', 'ns'] = ..., default_handler: Callable[[Any], str | float | bool | list[Any] | dict[Any, Any]] | None = ..., lines: bool = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd'] | dict[str, Any] | None = ..., index: bool = ..., indent: int | None = ..., mode: Literal['w'] = ...) -> str
pandas_types.py:8: note:     def to_json(self, path_or_buf: str | PathLike[str] | WriteBuffer[str], orient: Literal['split', 'records', 'index', 'columns', 'values', 'table'] | None = ..., date_format: Literal['epoch', 'iso'] | None = ..., double_precision: int = ..., force_ascii: bool = ..., date_unit: Literal['s', 'ms', 'us', 'ns'] = ..., default_handler: Callable[[Any], str | float | bool | list[Any] | dict[Any, Any]] | None = ..., lines: bool = ..., compression: Literal['infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd'] | dict[str, Any] | None = ..., index: bool = ..., indent: int | None = ..., mode: Literal['w'] = ...) -> None
Found 1 error in 1 file (checked 1 source file)

Pyright

pandas_types.py:8:1 - error: No overloads for "to_json" match the provided arguments (reportCallIssue)
pandas_types.py:8:12 - error: Argument of type "BytesIO" cannot be assigned to parameter "path_or_buf" of type "FilePath | WriteBuffer[str]" in function "to_json"
    Type "BytesIO" is not assignable to type "FilePath | WriteBuffer[str]"
      "BytesIO" is not assignable to "str"
      "BytesIO" is incompatible with protocol "PathLike[str]"
        "__fspath__" is not present
      "BytesIO" is incompatible with protocol "WriteBuffer[str]"
        "write" is an incompatible type
          Type "(buffer: ReadableBuffer, /) -> int" is not assignable to type "(__b: AnyStr_con@WriteBuffer, /) -> Any"
            Parameter 1: type "AnyStr_con@WriteBuffer" is incompatible with type "ReadableBuffer"
    ... (reportArgumentType)
2 errors, 0 warnings, 0 informations
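
The protocol mismatch that pyright reports can be reproduced without pandas. The sketch below is an illustration only: `WriteBuffer` and `AnyStr_contra` are simplified stand-ins written for this example (not the definitions from pandas-stubs), and `write_text` is a hypothetical helper. It shows that a BytesIO cannot satisfy a `write(str)` protocol, which is the same incompatibility a type checker flags statically.

```python
import io
from typing import Any, Protocol, TypeVar

# Simplified stand-in for the stubs' type variable (assumption for this sketch).
AnyStr_contra = TypeVar("AnyStr_contra", str, bytes, contravariant=True)


class WriteBuffer(Protocol[AnyStr_contra]):
    """Anything with a write() method taking str or bytes."""

    def write(self, b: AnyStr_contra, /) -> Any: ...


def write_text(buf: WriteBuffer[str]) -> bool:
    """Hypothetical helper: return True if writing text to buf succeeds."""
    try:
        buf.write("{}")
    except TypeError:
        # BytesIO.write only accepts bytes-like objects, so text is refused.
        return False
    return True


text_ok = write_text(io.StringIO())
# A type checker rejects the next call for the same reason as in the report.
bytes_ok = write_text(io.BytesIO())  # type: ignore[arg-type]
print(text_ok, bytes_ok)
```
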

Please complete the following information:

  • OS: macOS
  • OS Version: 15.3.1
  • python version: 3.12.9
  • version of type checker: mypy 1.15.0 (compiled: yes), pyright 1.1.398
  • version of installed pandas-stubs: 2.2.3.250308

Collaborator

Dr-Irv commented Apr 4, 2025

Thanks for the report. PR with tests welcome.

I think there are a few things to do here:

  1. Add a new overload that accepts WriteBuffer[bytes] (I think a BytesIO will then match it; if not, use io.BufferedWriter) and make compression a required argument in that overload (if it is in fact required; I'm not sure about that based on the docs)
  2. Do this in both core/frame.pyi and core/series.pyi
  3. I think the docs for to_json() in pandas should be modified to change "file-like object implementing a write() function" to "stream-like object implementing a write() function" - so creating an issue in pandas would be useful there.
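
As a rough illustration of item 1, the overload layout might look like the sketch below. Everything in it is an assumption made for the example: `FrameSketch` is a toy class, `WriteBuffer` is a simplified stand-in for the stubs' protocol, and `Compression` is trimmed to a single literal. It is not the change that was merged, and whether compression really must be required for binary buffers is left open above.

```python
from __future__ import annotations

import gzip
import io
import json
from typing import Any, Literal, Protocol, TypeVar, overload

AnyStr_contra = TypeVar("AnyStr_contra", str, bytes, contravariant=True)


class WriteBuffer(Protocol[AnyStr_contra]):
    """Simplified stand-in for the stubs' WriteBuffer protocol."""

    def write(self, b: AnyStr_contra, /) -> Any: ...


Compression = Literal["gzip"]  # trimmed to one option for this sketch


class FrameSketch:
    """Toy class; only the overload layout matters here."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self._records = records

    # Text buffers: compression stays optional.
    @overload
    def to_json(
        self, path_or_buf: WriteBuffer[str], *, compression: Compression | None = ...
    ) -> None: ...

    # Binary buffers: compression is a required keyword argument.
    @overload
    def to_json(
        self, path_or_buf: WriteBuffer[bytes], *, compression: Compression
    ) -> None: ...

    def to_json(self, path_or_buf: Any, *, compression: Compression | None = None) -> None:
        text = json.dumps(self._records)
        if isinstance(path_or_buf, io.TextIOBase):
            # Text buffer: compression has no effect, so write plain text.
            path_or_buf.write(text)
            return
        data = text.encode("utf-8")
        if compression == "gzip":
            data = gzip.compress(data)
        path_or_buf.write(data)


buffer = io.BytesIO()
FrameSketch([{"a": 1}]).to_json(buffer, compression="gzip")
print(json.loads(gzip.decompress(buffer.getvalue())))
```

With this layout, a call such as `to_json(io.BytesIO())` without `compression` matches neither overload, so a type checker would flag it, while the BytesIO call from the report is accepted.
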
