DOC: io.rst description and code inconsistent, plus the description is for deprecated behaviour #60705

Open
1 task done
wjandrea opened this issue Jan 12, 2025 · 0 comments
Labels: Docs, Needs Triage (issue that has not been reviewed by a pandas team member)

wjandrea (Contributor) commented Jan 12, 2025

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/user_guide/io.html#reading-html-content

Read in the content of the file from the above URL and pass it to read_html as a string:

In [317]: html_str = """
   .....:          <table>
   .....:              <tr>
   .....:                  <th>A</th>
   .....:                  <th colspan="1">B</th>
   .....:                  <th rowspan="1">C</th>
   .....:              </tr>
   .....:              <tr>
   .....:                  <td>a</td>
   .....:                  <td>b</td>
   .....:                  <td>c</td>
   .....:              </tr>
   .....:          </table>
   .....:      """
   .....: 

In [318]: with open("tmp.html", "w") as f:
   .....:     f.write(html_str)
   .....: 

In [319]: df = pd.read_html("tmp.html")

In [320]: df[0]
Out[320]: 
   A  B  C
0  a  b  c

Documentation problems

Problem 1

The "above URL" is

url = 'https://www.sump.org/notes/request/' # HTTP request reflector

but the code never uses data from that URL; it builds its own HTML string instead.

Problem 2

"pass it to read_html as a string" is not what the code demonstrates: it writes the HTML to tmp.html and passes that file path to read_html.

Problem 3

read_html can take an HTML string, but that behaviour is deprecated, per its docs:

Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.
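
For reference, the non-deprecated form that the note describes might look like the sketch below. It reuses the same table as the docs example and wraps the string in io.StringIO. The variable name `tables` is only illustrative, and this assumes an HTML parser that read_html supports (such as lxml) is installed. It is not a proposed replacement for the docs text.

```python
from io import StringIO

import pandas as pd

# Same table as in the docs example quoted above.
html_str = """
<table>
    <tr>
        <th>A</th>
        <th colspan="1">B</th>
        <th rowspan="1">C</th>
    </tr>
    <tr>
        <td>a</td>
        <td>b</td>
        <td>c</td>
    </tr>
</table>
"""

# Wrap the literal HTML in StringIO instead of passing the raw string,
# as the deprecation note recommends.
tables = pd.read_html(StringIO(html_str))

# read_html returns a list of DataFrames, one per <table> found.
print(tables[0])
```

This avoids both the temporary file and the deprecated literal-string call, so the description and the code could be made to agree either way.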

Suggested fix for documentation

I'm not sure!

wjandrea added the Docs and Needs Triage labels on Jan 12, 2025