Disambiguate :type: as a directive option or field-list item #13242

Merged
merged 1 commit into from
Jan 15, 2025

Conversation

AA-Turner
Member

@AA-Turner AA-Turner commented Jan 14, 2025

Purpose

It is currently very confusing. We should make it less so.

Notably, @erlend-aasland's change failed to create a cross-reference, with the following rST:

 .. attribute:: TarInfo.mtime
-   :type: int | float
+   :type: int or float

This is because :type: is interpreted as the directive option, which runs through sphinx.domains.python._annotations._parse_annotation() and employs an ast unparser. The correct change would have been:

 .. attribute:: TarInfo.mtime
-   :type: int | float
+
+   :type: int or float

The extra blank line separates directive arguments and options from their content. In this context, :type: becomes a field list entry, which is then parsed with sphinx.domains.python._object.PyXrefMixin._delimiters_re, a regular expression that includes the ability to split on 'or'.


Note that I'm not suggesting that any of this makes sense, but at least we can improve the documentation as a first step...

References

Preview

https://sphinx--13242.org.readthedocs.build/en/13242/usage/domains/python.html#directive-option-py-attribute-type

https://sphinx--13242.org.readthedocs.build/en/13242/usage/domains/python.html#info-field-lists

cc @erlend-aasland @nedbat

A

@AA-Turner AA-Turner added this to the 8.2.0 milestone Jan 14, 2025
@AA-Turner AA-Turner merged commit 182c831 into sphinx-doc:master Jan 15, 2025
7 checks passed
@gvanrossum

So, int | float is "a valid Python expression" but int or float is not? That seems an odd definition of "Python expression" -- presumably I am missing something? Also this would seem to disallow things like file-like in Sphinx types?

@AA-Turner
Member Author

I'm not sure what the right term is, I meant that int or float is 'invalid' in this context because it is rejected by type checkers. Sphinx has two effective modes at the moment, a 'strict' mode where only "valid type expressions" are allowed, and a 'relaxed' mode where 'or' and :term:`file-like object` are parsed.

I posted a bit more on Discourse, as I think Sphinx should attempt to standardise on one or the other -- I was quite surprised to find this behaviour whilst debugging Erlend's original issue.

@AA-Turner AA-Turner deleted the doc/attribute-type branch January 15, 2025 15:55
@gvanrossum

gvanrossum commented Jan 15, 2025 via email

@AA-Turner
Member Author

AA-Turner commented Jan 21, 2025

Sorry for the slight delay.

I'm proposing to re-word the warning/admonition (#13256) to:

The valid syntax for the :type: directive option differs from the syntax for the :type: info field. The :type: directive option does not understand reStructuredText markup or the 'or' or 'of' keywords, meaning unions must use | and sequences must use square brackets, and roles such as :ref:`...` cannot be used.

Hopefully this more accurately describes the differences between the two options.

What is "runtime typing" mentioned in your Discourse post?

Type annotations as seen in Python programmes and that pass type checkers, as opposed to prose-like descriptions of valid types ("list[int] | None" vs "list of int or None").

A

@electric-coder

electric-coder commented Jan 21, 2025

@AA-Turner given the current 2025 resources the change should contain the exact term type expression with this link, first mention seems to be in PEP 747, to put into context:

About this specification

This document was created following the acceptance of PEP 729 to serve as a specification for the Python type system

Finding the exact taxonomy is a "gotcha" since it's not in the official docs; neither in the Data Model nor Expressions.

@gvanrossum

Sorry for the slight delay.

I'm proposing to re-word the warning/admonition (#13256) to:

Proposing in this PR (which was merged) or in new PR? Consider me dumb but I am not good at mind reading any more.

The valid syntax for the :type: directive option differs from the syntax for the :type: info field. The :type: directive option does not understand reStructuredText markup or the 'or' or 'of' keywords, meaning unions must use | and sequences must use square brackets, and roles such as :ref:`...` cannot be used.

Given that I've been writing TypeScript for the past few months (!), "sequences must use square brackets" sounds ambiguous. Can you just clarify that they should use list[...] or Sequence[...] etc.?

Hopefully this more accurately describes the differences between the two options.

What is "runtime typing" mentioned in your Discourse post?

Type annotations as seen in Python programmes and that pass type checkers, as opposed to prose-like descriptions of valid types ("list[int] | None" vs "list of int or None").

That is the opposite of runtime typing. :-) Those are statically checkable types (using mypy, pyright etc.). Runtime typing refers to the runtime type checks that the runtime does based on the actual (not inferred, nor user-declared) object types present at runtime like str, int and list.

As @electric-coder already pointed out, the phrase to use for that syntax is type expressions.

I am still unclear why Sphinx decided to make its own syntax for types -- is that feature so ancient that it predated PEP 484? Anyway, since the rest of the EB seems to prefer it, I am okay with using it, as long as it is supported everywhere. Otherwise we'd end up with two different syntaxes being used, depending on what kind of thing it describes, and that feels even more confusing (for end users and occasional documentation writers alike).

@AA-Turner
Member Author

Proposing in this PR (which was merged) or in new PR?

Ah, sorry -- the updated text is in PR #13256. I merged that a few hours ago to avoid having incorrect/imprecise text on the website, but happy to make further improvement if there's remaining imprecision.

I mainly follow typing developments through Discourse and PEPs so my vocabulary isn't great, I have ended up using "assignment expression" as I think this is the closest match between a defined/specified term and what Sphinx expects.

Given that I've been writing TypeScript for the past few months (!)

Wow!

"sequences must use square brackets" sounds ambiguous. Can you just clarify that they should use list[...] or Sequence[...] etc.?

Sounds good, I will update the text tomorrow.

That is the opposite of runtime typing. :-) Those are statically checkable types (using mypy, pyright etc.). Runtime typing refers to the runtime type checks that the runtime does based on the actual (not inferred, nor user-declared) object types present at runtime like str, int and list.

As @electric-coder already pointed out, the phrase to use for that syntax is type expressions.

I've used 'annotation expressions' in the update to the text, which indeed are of course static typing -- I was trying to distinguish between content in .rst files and .py and was using entirely the wrong terminology -- sorry for causing confusion here!

I am still unclear why Sphinx decided to make its own syntax for types -- is that feature so ancient that it predated PEP 484?

Sphinx's or/of syntax is from 2008, so a few years before 484! Another relic is that container types can be subscripted with brackets (eg dict(str, int)) which I only discovered due to these discussions!

Anyway, since the rest of the EB seems to prefer it, I am okay with using it, as long as it is supported everywhere. Otherwise we'd end up with two different syntaxes being used, depending on what kind of thing it describes, and that feels even more confusing (for end users and occasional documentation writers alike).

If I had a free choice, I would deprecate the 2008-style or/of syntax now with a view to removing it from Sphinx. I agree that having two syntaxes is confusing, even for experienced users.

Does the editorial board have a strong view either way? Would the EB prefer that we keep the prose-style type syntax?

@gvanrossum

I will ask the EB next time we meet. Last time we talked there was a lot of confusion over what was supported where and what could be easily added to the supported syntax.

A majority of the EB prefers the ‘or’ syntax; it’s the first time I hear about the ‘of’ syntax.

We have a bit more clarity now over what’s supported (two separate :type: thingies with different capabilities).

It would be nice to know:

  • How feasible it is to make both :type: thingies support ‘or’ (and I guess also ‘of’ and whatever else is there).
  • How feasible it would be to add support for markup like :ref: to the one that only supports type expressions (I.e., ‘|’).

Depending on which is easier we might decide differently.

@nedbat Anything I missed or mischaracterized? @Mariatta? @willingc?

@electric-coder

electric-coder commented Jan 24, 2025

A majority of the EB prefers the ‘or’ syntax

This is rooted in "tradition", if we randomly google top 20 Python libraries most are scientific/numerical and dominated by the old type syntax (int or str works for simple numerical stuff).

‘or’ (and I guess also ‘of’ and whatever else is there).

There's also a with, e.g. numpy.strings.islower

Parameters:

x : array_like, with StringDType, bytes_ or str_ dtype
out : ndarray, None, or tuple of ndarray and None, optional

example of and with e.g. plotly.express.line_ternary

line_dash_sequence (list of str)
line_dash_map (dict with str keys and str values (default {}))

Contrast with a more modern pytest_report_teststatus (much better for long combinations of complex nominal types):

:Return type:

TestShortLogReport | tuple[str, str, str | tuple[str, Mapping[str, bool]]]


The main issue is: what should be adopted by the official Python documentation?

@nedbat I think the initial argument that int or str is easier isn't true, modern type expressions only require [], | and ,. By adopting the old syntax you're pointing new users in the wrong direction, namely to the past instead of to the future.

@merwok
Contributor

merwok commented Jan 24, 2025

Yes int or str is easier, as it reads like english prose!

@electric-coder

electric-coder commented Jan 24, 2025

as it reads like english prose!

That is an old mentality incompatible with Python:

PEP 3131 - Rationale - 2007

Python code is written by many people in the world who are not familiar with the English language, or even well-acquainted with the Latin writing system.

In light of PEP 8 (the EB discussion at hand is about the standard library documentation):

Introduction - 2001

conventions for the Python code comprising the standard library

(...)

A style guide is about consistency.


An inventory of style guides and consistency (and their problems in taxonomy):

  1. three docstring styles:

    1.1. reStructuredText style (?) (first major problem: The Sphinx documentation has never mentioned a proper name for the old style - every time it's mentioned users don't know what to call it exactly. Looking at the Sphinx docs it's just dubbed the "Are you tired of writing docstrings that look like this" style. Is it officially reST style, infolist list, or something else? No one knows.)

    1.2. Google style
    1.3. Numpy style

  2. three type expression styles:

    2.1. PEP 484 type expressions (list[str | int]).

    2.2. or, of, with (second and third major problems: this style lacks a proper name/taxonomy, what to call it? And it lacks a PEP 20 - "one-- and preferably only one --obvious way to do it." style guide. If you put this into the stdlib, where are you going to refer readers to? - Because nowhere in the Numpy style guides is there a comprehensive overview synthesized.)

    2.3. list(str) or set(int) - recently a mixture of round bracket Subscription with prose emerged (in obvious contradiction of Python Grammar syntax). Look no further - list(str) is still included in both the classic Numpy style and Google style napoleon guides - these long standing examples should be sanitized ASAP as a stumbling stone for every beginner who ever wrote a typed Python docstring. As far as I can see they shouldn't even be mentioned going forward to minimize adding confusion. (Fourth and fifth major problems: is there a multi-language use case that warrants it? And, what's this type expression style's proper name?)


The upcoming EB decision needs foremost to be very clear on these fundamental problems. Because by not addressing them (hopefully with the community doing most of the hard work here) you are propagating preexisting historical issues that have gone unresolved thus far - and, what's worse, enshrining these ongoing problems into the standard lib docs.


And notice very carefully one thing: the prose type expressions have never been formalized as either mandatory or recommended by the numpydoc style. They are just an example that became practice (in the sense of a habit) and, predating PEP 484, continues propagating and proliferating - whereas PEP 484 type expressions are already uniform and unambiguously defined.

@gvanrossum

You are preaching to the choir here, I agree that type expressions ought to be the future (but I am having a hard time convincing the rest of the EB, who admittedly have more experience in what newbies can understand).

I do want to call out that IMO the EB's opinion was asked about the Python stdlib docs, which are never generated from docstrings, but written by hand to maximize clarity. The stdlib docstrings never use Sphinx syntax. So what Numpy and others who use typed docstrings do is of no concern here. We only care about the Sphinx "code" we write to generate nice cross-linked function signatures.

@AA-Turner
Member Author

I agree that type expressions ought to be the future (but I am having a hard time convincing the rest of the EB, who admittedly have more experience in what newbies can understand).

I believe at the moment that the pre PEP 484-style syntax isn't in use at all in Python's documentation. As such, I'd be hesitant to start introducing it, as the PEP 484 style is used in several places (search for ^ +:type).

How feasible it is to make both :type: thingies support ‘or’ (and I guess also ‘of’ and whatever else is there).

The PEP 484-style :type: parser uses ast.parse(), so quite easy. The only concern would be operator precedence if some future extension to the static typing syntax used comparisons, not, or and.

How feasible it would be to add support for markup like :ref: to the one that only supports type expressions (I.e., ‘|’).

More difficult, for the same reason (ast.parse(':ref:`file-like object` | None') is obviously a syntax error). However, the 2008-style parser just uses re.split() with a pattern, so it can't support Python's PEG grammar, so it isn't possible to just switch to that parser for everything.
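Both answers can be checked with the standard library alone. The snippet below is only an illustrative sketch (the helper name classify is made up here, and this is not Sphinx's code): it reports which top-level AST node ast.parse() produces for a few candidate :type: values, or that parsing failed.

```python
import ast


def classify(annotation: str) -> str:
    """Return the top-level AST node name, or 'SyntaxError' if parsing fails."""
    try:
        tree = ast.parse(annotation, mode="eval")
    except SyntaxError:
        return "SyntaxError"
    return type(tree.body).__name__


for text in (
    "list[int] | None",                # PEP 484-style union
    "int or float",                    # prose-style union, still a Python expression
    "list of int",                     # 'of' is not Python syntax
    ":ref:`file-like object` | None",  # rST role markup
):
    print(f"{text!r}: {classify(text)}")
```

Because "int or float" already parses (as a boolean operation), teaching an AST-based parser to accept 'or' is mostly a matter of handling one more node type, whereas role markup never reaches the AST stage at all.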


Taking the motivating example of path-like object, I wonder if it is the wrong solution to use a type annotation here? I would document the parameter possibly without a type, or with ..., and then describe it in prose.

If useful, I could try and find the time to attend the EB meeting in Feb, but I don't know how helpful that would be!

A

@gvanrossum

IMO the goal of having types in the documentation is not to have the exact type found in the code -- while str | list[str] is great, many practical examples are too convoluted, and better written out in words. Maybe we can come up with some convention where certain more complex types are written as blah_blah where that is recognized as either a glossary entry or something to be explained in the free-form part of the documentation. Still no :ref:, but it should be enough. For example, one that I imagine would be useful in many places might be stream_or_filename. Another example: in the asyncio stubs in typeshed I found this type: Callable[[Unpack[_Ts]], object] -- we definitely do not want to put that in the docs!

@gvanrossum

If useful, I could try and find the time to attend the EB meeting in Feb, but I don't know how helpful that would be!

I think that would be super useful! I'll check if the rest of the EB has any objections. It's Tue Feb 11 at 1:30pm US Pacific time. (Sorry, I know that's rather late for you.)

@electric-coder

electric-coder commented Jan 25, 2025

So what Numpy and others who use typed docstrings do is of no concern here.

An argument needs to be made: the first place a dev looks to for inspiration is the standard library. It sets the trends so a new type-prose-syntax adds to the learning curve instead of easing it - those beginner devs will find the proposed syntax to be a waste of their time (and they'll resent there's more than one such syntax to learn, because PEP 484 type expressions are also part of the standard library).

At least that's how I feel about it, having been a newbie more recently than the EB members.

@gvanrossum

So taking a step back, why does Sphinx need to parse the type expression at all? Why can’t it just treat it as regular text?

@electric-coder

electric-coder commented Jan 25, 2025

why does Sphinx need to parse the type expression at all?

It needs to be parsed for the link substitutions to be processed and resolved. So the authors can write an int in the .rst that gets transformed into <a href="https://docs.python.org/3/library/functions.html#int">int</a> for the readers to click thus making the type in doc completely unambiguous.

So that in the built docs you get this, with int and str each rendered as a link:

(int | str)

instead of this plain, unlinked text:

(int | str)

Up until three years ago the feature was still marked as "experimental" in Sphinx, but it's stable enough now that it's become the expected norm in the ecosystem's documentation. (Quite tricky because Python is dynamic and a lot of magic goes into resolving and linking the types).

@nedbat
Contributor

nedbat commented Jan 25, 2025

@electric-coder I appreciate your advocacy. These are not easy questions.

You quoted:

Python code is written by many people in the world who are not familiar with the English language,

We accommodate non-English speakers by translating the docs. It doesn't make sense to choose a typing style that avoids English when the vast majority of the page content is in English.

I agree with @gvanrossum here:

IMO the goal of having types in the documentation is not to have the exact type found in the code -- while str | list[str] is great, many practical examples are too convoluted, and better written out in words.

@electric-coder It's easy to champion int | str, but what will you do with Callable[[Unpack[_Ts]], object]? How should we document that type?

Maybe we can come up with some convention where certain more complex types are written as blah_blah where that is recognized as either a glossary entry or something to be explained in the free-form part of the documentation.

The goal of documentation is to explain and educate. Sometimes that is best accomplished with precision (using the exact Python types), but sometimes words work better. How can we let authors choose the approach that works best for the situation?

@electric-coder as an aside, you've characterized the English approach as old, outdated, and the past. This is unfair. There are reasons English sometimes works better in documentation than formal type syntax, which can get quite complex.

@gvanrossum

gvanrossum commented Jan 25, 2025

@electric-coder

It needs to be parsed for the link substitutions to be processed and resolved. So the authors can write an int in the .rst that gets transformed into <a href="https://docs.python.org/3/library/functions.html#int">int</a> for the readers to click thus making the type in doc completely unambiguous.

For that, all you need to do is tokenize, and for doc usage you don't need to do a very thorough job (no need to know that := is one token, just that abc_123 is one identifier).

But if you do have a full parser, you could write "file-like" and all is well, since string literals are valid in type expressions (normally meaning forward references, which if you squint enough, "file-like" is).
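The tokenize-only idea can be sketched in a few lines. This is an illustration under stated assumptions, not how Sphinx works: INVENTORY is a hypothetical name-to-URL table standing in for real cross-reference resolution, linkify is a made-up helper, and no HTML escaping is attempted.

```python
import re

# Hypothetical lookup table; a real build would resolve names through the
# documentation's cross-reference machinery instead of a hard-coded dict.
INVENTORY = {
    "int": "https://docs.python.org/3/library/functions.html#int",
    "str": "https://docs.python.org/3/library/stdtypes.html#str",
}

# One identifier, optionally dotted (e.g. "typing.Callable").
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def linkify(text: str) -> str:
    """Wrap every known identifier in a link; leave everything else untouched."""

    def replace(match: re.Match) -> str:
        name = match.group(0)
        url = INVENTORY.get(name)
        return f'<a href="{url}">{name}</a>' if url else name

    return IDENTIFIER.sub(replace, text)


print(linkify("int | path-like"))
```

With this approach "int | path-like" still gets a link on int, because unknown words and punctuation are simply passed through rather than rejected.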

@gvanrossum

gvanrossum commented Jan 25, 2025

@nedbat

@electric-coder It's easy to champion int | str, but what will you do with Callable[[Unpack[_Ts]], object]? How should we document that type?

A couple of choices:

  • Just Callable might be sufficient, with its signature explained in the text. (This was also a generic function, and I don't think we should use that notation in the docs!). We should auto-link Callable to a glossary entry or to its docs in typing.py.
  • The term callback-function which can be auto-linked to a glossary entry. (See my previous post for how we can render that if we choose type expressions.)
  • Some name or hyphenated phrase made up by the author of the current module/function's docs, to be explained in the text below.

In all cases the text has to explain the signature of the callback.

I'm pretty sure we can use this tactic for all types deemed too complex to appear in the docs.

EDIT

The goal of documentation is to explain and educate. Sometimes that is best accomplished with precision (using the exact Python types), but sometimes words work better. How can we let authors choose the approach that works best for the situation?

I think my proposal here can solve that by letting authors choose between str | list[str] and a brief phrase that describes it in words, in quotation marks.

@nedbat
Contributor

nedbat commented Jan 25, 2025

I think my proposal here can solve that by letting authors choose between str | list[str] and a brief phrase that describes it in words, in quotation marks.

It would be great to see this in action. Ideally the paragraph styling would be similar for the two choices.

@electric-coder

electric-coder commented Jan 26, 2025

but what will you do with Callable[[Unpack[_Ts]], object]?

Here's how a documentation writer can't do it: first they get hit by the double #11007, meaning neither typing.Unpack nor typing.TypeVarTuple link to their anchors in typing.py's doc. Then, Python's docs aren't written with autodoc, meaning you don't get the autodoc_type_aliases facility to at least try and alias your way out of it, and again without autodoc you're deprived of Docstring preprocessing... Next step would be using docutils (which, thanks to @AA-Turner's initiative, is being migrated to GitHub), but it still lacks a fully documented API. Thus, upwards of 95% of devs are left without any practical means to do it.


Conceptually there'd be several choices of explicitness, e.g. for call_soon:

    def call_soon(
        self, callback: Callable[[Unpack[_Ts]], object], *args: Unpack[_Ts], context: Context | None = None
    ) -> Handle: ...

Use an alias for _Ts linking directly to typing.TypeVarTuple

.. py:function:: call_soon(callback, *args, context) -> Handle

    :param callback: A :term:`term-callback` receiving a variadic generic.
    :type callback: typing.Callable[[typing.Unpack[\*_Ts]], object]

would look like:

Callable[Unpack[*_Ts], object]

or

Callable[[*_Ts], object]

The usual choice is that if you're explaining it in prose then omit :type callback: and just explain it in :param callback:. It's how default and sentinel arguments are often explained.

@gvanrossum

@electric-coder, you seem to be missing the point entirely. The rhetorical question was not asking how to get that complicated type expression to render and properly link to other doc entries. Ned meant that this is clearly not something we want to see in docs, so complaints about why rendering this is broken or suggestions on how it can be rendered fail to address the point entirely. And I also fail to understand why you keep bringing up autodoc etc. -- we have no choice in that matter.

What Ned asked for in the end is something that shows how you can describe a function argument or object attribute using something like

:type: str | path-like

I just did a little experiment with the TarInfo.mode attribute, and learned the following:

  • If I write :type: int | path-like it is rendered as int | path-like, with no link on int.
  • If I write :type: int | path_like it is rendered as int | path_like, with a link on int.
  • If I write :type: int | "path-like" it is rendered as int | 'path-like', with a link on int.

From this I conclude that if it's a valid type expression it is linkified, while if it's not a valid type expression (and apparently path-like is deemed invalid, not considered a subtraction, which is indeed not a valid type), it's rendered as found, without any linkifying.
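A minimal sketch that reproduces these three observations, assuming an all-or-nothing AST walk (this is a guess at the general shape, not Sphinx's actual implementation; linkable_names is a made-up helper and the code assumes Python 3.9+ for ast.unparse and the unwrapped Subscript slice):

```python
import ast


def linkable_names(annotation: str):
    """Return identifiers to cross-reference, or None if the text is unsupported."""
    try:
        tree = ast.parse(annotation, mode="eval")
    except SyntaxError:
        return None

    names = []

    def visit(node):
        if isinstance(node, ast.Name):
            names.append(node.id)
        elif isinstance(node, ast.Attribute):
            names.append(ast.unparse(node))  # dotted name such as typing.Callable
        elif isinstance(node, ast.Constant) and (
            node.value is None or isinstance(node.value, str)
        ):
            pass  # None or a quoted phrase: kept as text, nothing to link
        elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            visit(node.left)
            visit(node.right)
        elif isinstance(node, ast.Subscript):
            visit(node.value)
            visit(node.slice)
        elif isinstance(node, (ast.Tuple, ast.List)):
            for element in node.elts:
                visit(element)
        else:
            # e.g. the subtraction in "path-like": reject the whole expression
            raise ValueError(f"unsupported syntax: {type(node).__name__}")

    try:
        visit(tree.body)
    except ValueError:
        return None
    return names


print(linkable_names("int | path-like"))    # subtraction is rejected
print(linkable_names("int | path_like"))
print(linkable_names('int | "path-like"'))
```

Here "int | path-like" parses as int | (path - like), so a single unsupported node discards the links for the entire expression, which matches the loss of the link on int.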

I still don't understand why it needs to parse this before it can linkify it (I'd think tokenizing would be enough), but that seems to be what it does, and from a comment above I understand it's hard to get it changed.

@gvanrossum

Also, I grepped the Doc directory, and found that actually there is almost no use of :type: (which is used for attributes).

What I found:

  • 17 attributes in Doc/library/tarfile.rst
  • 1 use in Doc/library/enum.rst that appears to be an unrelated use of :type: -- source

So only one file out of 521 .rst files under Doc uses :type: for attributes properly.

The syntax for parameters is actually different, you write :type param_name: text where text may include markup like :class:`name` or :ref:`code object <code-objects>` etc. A good example is sqlite3.

As you can see, this uses str | None where str is auto-linked (the source markup is simply str | None), as well as path-like object linking to the glossary -- here the source markup is :term:`path-like object` . I have found little evidence of "parsing" this type -- it just seems to render what you type, interpreting markup correctly. (Well, there's a little bit of parsing. If you write str or or int, the int isn't auto-linked. But if you write str or or or int it is! This points to a regex-based "parser" all right. :-)
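The "or or" quirk is what a whitespace-delimited regex split would produce. The pattern below is a simplified stand-in chosen for illustration (it is not copied from Sphinx, and split_type is a made-up helper); it assumes Python 3.9+ for the list[str] annotation.

```python
import re

# Simplified stand-in for a delimiter pattern: split on "or" / "of" only when
# surrounded by whitespace, keeping the delimiters in the result.
DELIMITERS = re.compile(r"(\s+o[rf]\s+)")


def split_type(text: str) -> list[str]:
    """Split a prose-style type into delimiter and candidate-name pieces."""
    return [part for part in DELIMITERS.split(text) if part]


print(split_type("str or or int"))
print(split_type("str or or or int"))
```

In the first input the leading " or " consumes the space before the second "or", so "or int" survives as one piece that no longer looks like a bare name. In the second input the extra "or" restores a clean " or " delimiter in front of int, so int is isolated again.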

I think I have exhausted my fact-finding research here. We'll discuss our answer at the EB meeting on Feb 11.

@gvanrossum

Here are the docs for the two things in Sphinx:

@electric-coder

I just did a little experiment with the TarInfo.mode attribute, and learned the following:

If I write :type: int | path-like it is rendered as int | path-like, with no link on int.
If I write :type: int | path_like it is rendered as int | path_like, with a link on int.
If I write :type: int | "path-like" it is rendered as int | 'path-like', with a link on int.

From this I conclude that if it's a valid type expression it is linkified,

I don't think it's that simple (ping @picnixz), there's a thread centralizing over 50 linking bugs and the best available explanation is given in #9813. Most were surfaced by autodoc but a lot are caused by how imports/types are handled by the Sphinx machinery. I can't explain it further but last I heard this is something that will need to be fixed over the next couple of years.

@electric-coder

electric-coder commented Jan 26, 2025

It doesn't make sense to choose a typing style that avoids English when the vast majority of the page content is in English.

The only thing that has to be in English are exposed objects and keywords (those don't translate). The proposal goes further, by saying type expressions should also include English. But I have yet to see one clear fact in support of that claim (that's what bothers me - and others will think the same). Besides the obvious technical problems that enshrining a new prose-type-expression style in the Python docs raises for the ecosystem, there's a long history of... besides also a history of... so just saying "it's English" isn't self-evident for me, out of technical reasons and otherwise.

@gvanrossum

The EB has come to a conclusion, with AA-Turner's help. Suggestion to get back to this in the Discourse thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 12, 2025

5 participants