implement actualtext, artifact from recent graphicx #2684
base: master
6d07dc3 to 32635a0
|
ngpdf (which implements the derivation from pdf to html) uses a |
|
Thanks @u-fischer for the clarifications; that really helps! It's perhaps quite fortuitous that we're also dealing with a related issue (ACM's
Both these systems potentially supply material for two distinct use cases: (1) replacement text to be spoken in place of a given image or construct, or (2) descriptive text used when a reader asks for more information. Of course, that distinction will easily be confused, and the way that the material is best delivered to the reader will depend on the eventual medium or delivery device. The
And, although some of the documentation for these encourages using concise language free of markup, authors will do what they want, so I'm more inclined to store this data in XML elements rather than attributes; in particular, I'd want to avoid diving directly into aria and defer that until postprocessing (or XSLT), when the target format is dealt with. |
|
As much as I appreciate the broader discussion on related work (thank you for raising awareness!), I am a little worried we'll hurt the primary use case without good cause. Citing the graphicx documentation:
The ngpdf implementation seems to be in conflict with the actualtext description (to me), as HTML as a format uses images, very similarly to PDF. I think keeping the image and adding an accessible label fits more naturally. As to handling I am quite happy with Vincenzo's approach - there is a lot of work already done by the ARIA group which is clearly documented and standard. It will be great if we can use that without reinventing the full breadth of their spec with alternatives in LaTeXML's schema. We can also separately create XSLT rules mapping any attributes in the |
No, the graphicx documentation doesn't try to explain when to use the keys, it only says that the keys exist and that their value will be passed to PDF keys ( ngpdf then handles these keys and it does it as specified in the derivation algorithm. |
Well, the text I read exactly explains the intent of when to use the keys. What am I missing? I understand there are direct mappings for accessible PDF, but that isn't a supported target of LaTeXML. |
|
The description only says "ActualText text for accessibility uses", which doesn't say much unless you know what the PDF term The actual documentation about the intent of the keys is in latex-lab-graphic.pdf |
|
The key snippet from the description is "or as the text to use in formats that do not use the image." That is also extremely standard - text alternatives are needed when the reader doesn't consume the image for some reason or another. But most of latexml's target formats can use the underlying image asset. So they should, in my opinion. |
Yes, but that too is not very specific, as it doesn't say which formats are used. How you handle the keys is really up to you, but do not treat alternative text and actualtext as two words for the same thing. In PDF there is a clear distinction, and here PDF differs from HTML. Users do not expect to hear "graphic" if actualtext is used; see e.g. nvaccess/nvda#18843. |
|
Addressing several points.... Next, aria is indeed quite complex, and I wouldn't want to reinvent a whole alternative representation for it within LaTeXML. On the other hand, exactly because of that, if we've already cast the data into aria I have zero expectation that we'll be able to deconstruct it afterwards to reassemble into some other representation such as JATS. Thus I'd prefer a more abstract representation at the earlier, XML, stage. Experience with too early insertion of too much CSS would seem to have been a lesson. |
PDF is a major output format of LaTeX, and so it is unavoidable that there are keys and commands that target mostly PDF specific needs, e.g. pdftitle or pdflang from hyperref, or pdfversion and pdfstandard in For such keys other output formats will have to decide if they ignore them or map to something in their format.
well |
I completely understand & agree; it's unavoidable that there's going to be PDF-specific data in LaTeX. My point was that for some things (like accessibility), the data are not PDF-specific (though the implementation will be); in those cases, avoiding too much PDF specificity in the documentation would be good (for users, implementers, converters & LaTeX longevity in general).
Apparently authors are using it on arXiv; I'd be surprised if the maintainer hasn't got some plan for tying it into your pdf-accessibility machinery, if not already. But my reason for mentioning it here was purely from a LaTeXML POV. Its intention as an object "description" (rather than speech replacement) seems clear, but I'm trying to ensure that "descriptions", whatever package/binding they come from, are represented & treated consistently within LaTeXML. |
|
This generated more discussion than I expected! I see two sticking points:
My intention was to introduce initial support that can be merged in 0.9 so I will still campaign for merging as is, but I agree that the XML should preserve the info. I'd rather use additional attributes and/or children of the |
|
I'll take the view that the discussion only sounds more contentious than it really is. I'm not sure I understand @xworld21's first sticking point. Don't the two pieces of information we're discussing just map to aria's description (for alt and So for me, the sticking point is whether I can convince y'all that using elements for these two (which would need to be within the meta-class and be allowed content within an |
|
Changing to using an element to hold the accessible text, giving it an Reinventing a custom LaTeXML technique to do the pointing just doesn't strike me as helpful. We'll increase the learning curve and confuse anyone external about what LaTeXML is trying to do differently. I am quite happy we are using the |
    # Trickery to set @description EVEN IF empty string (constructor shorthand omits it)
    $node->setAttribute(description => ToString($actualtext // $alt // ''));
    # when both actualtext and alt are specified, use aria-description for alt
    $document->setAttribute($node, 'aria:description', ToString($alt)) if defined $alt && defined $actualtext;
Shouldn't this be aria:label?
Perhaps you were thrown by the awkward naming of ltx:graphics@description (as I was), which is a stand-in for html's img@alt, whose specification seems to shift over time.
The graphics keywords alt and actualtext are apparently only very subtly different; there's not (yet) a key for a detailed description, which is what aria:description should be (as I understand it).
Fix #2679 more or less as we discussed, but with more care about how the keys interact with each other:
@description is set to empty unless actualtext, alt are present
The latter might be slightly controversial. I played quickly with VoiceOver and it reads both alt and aria-description in sequence, with a proper pause (feels like "actualtext: alt"), which seems reasonable.
We could add a further heuristic that if actualtext and alt are equal, then just @description is enough.
I have added a test with all 8 combinations to clarify the intended semantics.
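For readers who want to check the intended semantics without running LaTeXML, the two attribute-setting lines from the diff can be modeled as a small pure function. This is only a sketch under stated assumptions, not the PR's code: `graphics_attributes` is a hypothetical name, `None` stands in for a key that was not given, and only the `alt`/`actualtext` interaction visible in the diff is covered. How `artifact` is handled is not shown in this thread, so it is left out.

```python
from typing import Dict, Optional


def graphics_attributes(alt: Optional[str] = None,
                        actualtext: Optional[str] = None) -> Dict[str, str]:
    """Model the two setAttribute calls from the diff (hypothetical helper)."""
    # Perl's `//` is defined-or, so an empty string still counts as "given".
    if actualtext is not None:
        description = actualtext
    elif alt is not None:
        description = alt
    else:
        description = ""
    # @description is always emitted, even when it is the empty string.
    attrs = {"description": description}
    # With both keys given, actualtext fills @description and alt
    # is carried separately in aria:description.
    if alt is not None and actualtext is not None:
        attrs["aria:description"] = alt
    return attrs


if __name__ == "__main__":
    # Walk the four alt/actualtext combinations covered by this sketch.
    for alt in (None, "A plot of x squared"):
        for actualtext in (None, "x^2"):
            print(repr(alt), repr(actualtext),
                  graphics_attributes(alt, actualtext))
```

The heuristic suggested above (dropping `aria:description` when `alt` and `actualtext` are equal) would be one extra comparison in the final `if`; it is not included here because the diff does not implement it.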