Adding support for "role" attributes for the DocBook reader #10665

Open
wants to merge 16 commits into
base: main
Changes from 8 commits
f361555
Adding support for "role" attributes for the DocBook reader
yanntrividic Feb 19, 2025
193fa21
Making lines shorter than 80 characters.
yanntrividic Feb 26, 2025
48f14ee
Wrapping elements with role attribute in a Div when needed
yanntrividic Mar 4, 2025
2ab3c3f
Merge branch 'jgm:main' into main
yanntrividic Mar 4, 2025
2af15a6
Modifying the approach following the advice in https://github.com/jgm…
yanntrividic Mar 6, 2025
578c9c1
Merge branch 'jgm:main' into main
yanntrividic Mar 6, 2025
f0827ee
When inlines are of type "emphasis", don't add "role" attributes
yanntrividic Mar 6, 2025
57d61da
Merge branch 'main' of https://github.com/yanntrividic/pandoc
yanntrividic Mar 6, 2025
007e0c8
Removing some code to avoid double execution of addPandocAttributes o…
yanntrividic Mar 9, 2025
f4109d6
Restoring the code from 2af15a6c425a16b5b59d117f43a6aab71b99f22e rega…
yanntrividic Mar 9, 2025
17d3ad2
Putting back again the discrimination between `emphasis` and other In…
yanntrividic Mar 10, 2025
a232161
Wrapping section in Div so that role attributes don't get propagated …
yanntrividic Mar 12, 2025
3703757
Attempt at solving parsing for emphasis elements (see https://github.…
yanntrividic Mar 12, 2025
06214b6
Got things mixed up, function addPandocAttributes added to the return…
yanntrividic Mar 12, 2025
e104e39
Removing attempt from 370375728f298a6f00340bb36eb5405214be35bd for a …
yanntrividic Mar 13, 2025
fa815ce
Headers were getting the role attributes, now only sections do!
yanntrividic Mar 14, 2025
32 changes: 20 additions & 12 deletions src/Text/Pandoc/Readers/DocBook.hs
@@ -44,7 +44,7 @@ import Text.Pandoc.Builder
 import Text.Pandoc.Class.PandocMonad (PandocMonad, report)
 import Text.Pandoc.Options
 import Text.Pandoc.Logging (LogMessage(..))
-import Text.Pandoc.Shared (safeRead, extractSpaces)
+import Text.Pandoc.Shared (safeRead, extractSpaces, addPandocAttributes)
 import Text.Pandoc.Sources (ToSources(..), sourcesToText)
 import Text.Pandoc.Transforms (headerShift)
 import Text.TeXMath (readMathML, writeTeX)
@@ -851,15 +851,19 @@ getBlocks :: PandocMonad m => Element -> DB m Blocks
 getBlocks e = mconcat <$>
   mapM parseBlock (elContent e)

+getRoleAttr :: Element -> [(Text, Text)] -- extract role attribute and add it to the attribute list
+getRoleAttr e = case attrValue "role" e of
+  "" -> []
+  r -> [("role", r)]
+
 parseBlock :: PandocMonad m => Content -> DB m Blocks
 parseBlock (Text (CData CDataRaw _ _)) = return mempty -- DOCTYPE
 parseBlock (Text (CData _ s _)) = if T.all isSpace s
                                      then return mempty
                                      else return $ plain $ trimInlines $ text s
 parseBlock (CRef x) = return $ plain $ str $ T.toUpper x
-parseBlock (Elem e) =
-  case qName (elName e) of
+parseBlock (Elem e) = do
+  parsedBlock <- case qName (elName e) of
         "toc" -> skip -- skip TOC, since in pandoc it's autogenerated
         "index" -> skip -- skip index, since page numbers meaningless
         "para" -> parseMixed para (elContent e)
@@ -973,6 +977,7 @@ parseBlock (Elem e) =
         "title" -> return mempty -- handled in parent element
         "subtitle" -> return mempty -- handled in parent element
         _ -> skip >> getBlocks e
+  return $ addPandocAttributes (getRoleAttr e) parsedBlock
   where skip = do
           let qn = qName $ elName e
           let name = if "pi-" `T.isPrefixOf` qn
@@ -1025,8 +1030,8 @@ parseBlock (Elem e) =
         parseTable = do
           let elId = attrValue "id" e
           let attrs = case attrValue "tabstyle" e of
-                        "" -> []
-                        x -> [("custom-style", x)]
+                        "" -> getRoleAttr e
+                        x -> ("custom-style", x) : getRoleAttr e
           let classes = T.words $ attrValue "class" e
           let isCaption x = named "title" x || named "caption" x
           capt <- case filterChild isCaption e of
@@ -1099,7 +1104,8 @@ parseBlock (Elem e) =
           modify $ \st -> st{ dbSectionLevel = n }
           b <- getBlocks e
           modify $ \st -> st{ dbSectionLevel = n - 1 }
-          return $ headerWith (elId, classes, maybeToList titleabbrevElAsAttr++attrs) n' headerText <> b
+          return $ headerWith (elId, classes, maybeToList titleabbrevElAsAttr++attrs++getRoleAttr e) n'
+            headerText <> b
         titleabbrevElAsAttr =
           case filterChild (named "titleabbrev") e `mplus`
                (filterChild (named "info") e >>=
@@ -1122,9 +1128,8 @@ parseBlock (Elem e) =
           b <- p
           case mbt of
             Nothing -> return b
-            Just t -> return $ divWith (attrValue "id" e,[],[])
-              (divWith ("", ["title"], []) (plain t) <> b)
-
+            Just t -> return $ divWith (attrValue "id" e, [], getRoleAttr e) -- Updated!
+              (divWith ("", ["title"], getRoleAttr e) (plain t) <> b)
         -- Admonitions are parsed into a div. Following other Docbook tools that output HTML,
         -- we parse the optional title as a div with the @title@ class, and give the
         -- block itself a class corresponding to the admonition name.
@@ -1134,7 +1139,7 @@ parseBlock (Elem e) =
           b <- getBlocks e
           let t = divWith ("", ["title"], []) (plain $ fromMaybe mempty mbt)
           -- we also attach the label as a class, so it can be styled properly
-          return $ divWith (attrValue "id" e,[label],[]) (t <> b)
+          return $ divWith (attrValue "id" e, [label], getRoleAttr e) (t <> b)

 toAlignment :: Element -> Alignment
 toAlignment c = case findAttr (unqual "align") c of
@@ -1206,8 +1211,8 @@ parseInline :: PandocMonad m => Content -> DB m Inlines
 parseInline (Text (CData _ s _)) = return $ text s
 parseInline (CRef ref) =
   return $ text $ fromMaybe (T.toUpper ref) $ lookupEntity ref
-parseInline (Elem e) =
-  case qName (elName e) of
+parseInline (Elem e) = do
+  parsedInline <- case qName (elName e) of
         "anchor" -> do
           return $ spanWith (attrValue "id" e, [], []) mempty
         "phrase" -> do
@@ -1329,6 +1334,9 @@ parseInline (Elem e) =
         -- <?asciidor-br?> to in handleInstructions, above.
         "pi-asciidoc-br" -> return linebreak
         _ -> skip >> innerInlines id
+  return $ case qName (elName e) of
+    "emphasis" -> parsedInline
Owner

why this special case for "emphasis"?

Author

@yanntrividic yanntrividic Mar 9, 2025

I tried something that failed, and I now need to find a way to handle this.

Currently, pandoc's DocBook reader already interprets role attributes on some inline elements: the role of an emphasis element selects between Strong, Strikeout, Underline, and Emph. None of these pandoc elements can carry attributes directly, so each needs a wrapper before attributes can be added. If we apply this updated code to:

<emphasis role="strong">word</emphasis>

We get:

Span
    ( ""
        , []
        , [ ( "wrapper" , "1" ) , ( "role" , "strong" ) ]
    )
    [ Strong [ Str "word" ] ]

This is not what we want. My previous attempt in f0827ee was meant to work around this issue, but with it no DocBook emphasis element gets any attributes, which is not what we want either.
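
The wrapping can be reproduced in a small, self-contained model. The sketch below uses simplified stand-ins for pandoc's `Attr` and `Inline` types (plain `String` instead of `Text`, only four constructors) and a hypothetical `addAttrs` function that imitates the behaviour reported above. It is an illustration under those assumptions, not pandoc's actual `addPandocAttributes`.

```haskell
-- Simplified stand-ins for pandoc's Attr and Inline types (assumption:
-- String instead of Text, and only the constructors needed here).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Hypothetical model of attribute attachment: a Span can take the
-- key-value pairs directly, while attribute-less inlines such as
-- Strong are first wrapped in a Span marked as a wrapper.
addAttrs :: [(String, String)] -> Inline -> Inline
addAttrs [] x = x
addAttrs kvs (Span (ident, classes, old) xs) =
  Span (ident, classes, old ++ kvs) xs
addAttrs kvs x = Span ("", [], ("wrapper", "1") : kvs) [x]

-- The problematic case: <emphasis role="strong">word</emphasis>
wrappedStrong :: Inline
wrappedStrong = addAttrs [("role", "strong")] (Strong [Str "word"])
```

In this model, `wrappedStrong` is a `Span` carrying both `("wrapper", "1")` and `("role", "strong")` around `Strong [Str "word"]`, the same shape as the output shown above: the role has been consumed to choose `Strong` and is then attached a second time.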

Owner

Oh, right. Well, you're just preventing a role attribute from going on something parsed from an emphasis tag, and that's fine given that we already handle the roles in another way.

Author

But then, wouldn't that mean that emphasis elements can't carry any role value other than bf, bold, strong, strikethrough, or underline? Maybe in that case the role attributes should be assigned to phrase elements... That's a compromise I'm willing to make, if you feel it makes sense.

Owner

If other roles are used, then in the part of the code that checks for role on emphasis and gives you Strong, Underline or whatever in response, you could also check for other roles and simply add them as attributes.
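
One possible reading of this suggestion, sketched on a simplified stand-in for pandoc's `Inline` type (plain `String` instead of `Text`, a reduced set of constructors). The function name `emphasisWithRole` and the choice of a `Span` around `Emph` for unrecognized roles are assumptions made for illustration, not code from the PR.

```haskell
-- Simplified stand-ins for pandoc's Attr and Inline types (assumption).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  | Strikeout [Inline]
  | Underline [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Hypothetical helper: the role picks the inline constructor when it is
-- one of the known values; any other non-empty role is kept as an
-- attribute on a Span around the default Emph.
emphasisWithRole :: String -> [Inline] -> Inline
emphasisWithRole role xs = case role of
  "bf"            -> Strong xs
  "bold"          -> Strong xs
  "strong"        -> Strong xs
  "strikethrough" -> Strikeout xs
  "underline"     -> Underline xs
  ""              -> Emph xs
  other           -> Span ("", [], [("role", other)]) [Emph xs]
```

With this shape the role is handled in exactly one place, so the generic attribute step can skip emphasis elements without losing unrecognized roles.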

Author

From what I understand from the documentation, I think that DocBook documents expect only one role attribute per element. To specify more precise roles, it is recommended to parameterize a pattern to do so, but you still only have one role attribute.

So, knowing this, I think that it shouldn't be possible to have a role on a Strong element, because that would imply that there are necessarily two role attributes on the DocBook element, which shouldn't be possible.

And in that case, we would only need to change the last line from this part:

"emphasis" -> case attrValue "role" e of
                             "bf"            -> innerInlines strong
                             "bold"          -> innerInlines strong
                             "strong"        -> innerInlines strong
                             "strikethrough" -> innerInlines strikeout
                             "underline"     -> innerInlines underline
                             _               -> innerInlines emph

Does it make sense like this? What do you think?

Contributor

@yanntrividic Not strictly true: DocBook allows multiple tokens in a role attribute, so e.g. <emphasis role="bold special"> is entirely valid. However, I think that is an extreme edge case, and I've never seen it used. I think it would be perfectly fine to process only the known role tokens for emphasis, as you describe.

Author

I see, you are right... Well I guess it would not be so much work to support this feature, but in that case we would support only one pattern, that is to say, space-separated values for role attributes. I think at this point, I would prefer leaving this for a filter to handle.
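
For reference, supporting the space-separated pattern would mostly come down to tokenizing the role value. The sketch below is a hypothetical helper (`splitRole` and `knownRoles` are names invented here) that separates the tokens the emphasis handling recognizes from the rest; it uses `String` rather than pandoc's `Text`.

```haskell
import Data.List (partition)

-- Role tokens that the emphasis handling already interprets.
knownRoles :: [String]
knownRoles = ["bf", "bold", "strong", "strikethrough", "underline"]

-- Hypothetical helper: split a role value on whitespace and separate
-- recognized tokens (first component) from unrecognized ones (second).
splitRole :: String -> ([String], [String])
splitRole = partition (`elem` knownRoles) . words
```

For the example above, `splitRole "bold special"` gives `(["bold"], ["special"])`. What to do with the second component is exactly the policy question that could be left to a filter.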

Author

@yanntrividic yanntrividic Mar 12, 2025

On another note, I've been trying to work on something to discriminate the last case from the others... Not sure how to do it properly though... Here is my attempt: 3703757.

I know there is a parsing error in the code, but as I said, I'm quite new to Haskell, and I am really not sure how I could make this kind of type assertion.

Basically, the thought process here is that if the element is an emphasis element and it has been parsed as an Emph element, then we can add a role attribute. But I'm really not sure what the right way to check the second condition would be. Any ideas?

Edit: Oops, it is better with this line corrected: 06214b6
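
One way to express the second condition is to pattern match on the parse result instead of asserting a type. The sketch below works on a plain list of a simplified stand-in `Inline` type; with pandoc's real types the same match would presumably be done on the list obtained from the `Inlines` value (for instance via `toList` from `Text.Pandoc.Builder`). The helper name `addRoleIfEmph` is hypothetical.

```haskell
-- Simplified stand-ins for pandoc's Attr and Inline types (assumption).
type Attr = (String, [String], [(String, String)])

data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Hypothetical helper: attach the role only when the parse result is a
-- single Emph, i.e. when the role was not already used to choose
-- Strong, Strikeout or Underline. Everything else is returned as is.
addRoleIfEmph :: String -> [Inline] -> [Inline]
addRoleIfEmph role [Emph xs]
  | not (null role) = [Span ("", [], [("role", role)]) [Emph xs]]
addRoleIfEmph _ parsed = parsed
```

The first equation only fires for a one-element list whose element was built with `Emph`; a `Strong` result or an empty role falls through to the second equation unchanged.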

+    _ -> addPandocAttributes (getRoleAttr e) parsedInline
   where skip = do
           let qn = qName $ elName e
           let name = if "pi-" `T.isPrefixOf` qn