Adding support for "role" attributes for the DocBook reader #10665
base: main
Commits: f361555, 193fa21, 48f14ee, 2ab3c3f, 2af15a6, 578c9c1, f0827ee, 57d61da, 007e0c8, f4109d6, 17d3ad2, a232161, 3703757, 06214b6, e104e39, fa815ce

    @@ -44,7 +44,7 @@ import Text.Pandoc.Builder
     import Text.Pandoc.Class.PandocMonad (PandocMonad, report)
     import Text.Pandoc.Options
     import Text.Pandoc.Logging (LogMessage(..))
    -import Text.Pandoc.Shared (safeRead, extractSpaces)
    +import Text.Pandoc.Shared (safeRead, extractSpaces, addPandocAttributes)
     import Text.Pandoc.Sources (ToSources(..), sourcesToText)
     import Text.Pandoc.Transforms (headerShift)
     import Text.TeXMath (readMathML, writeTeX)


    @@ -851,15 +851,19 @@ getBlocks :: PandocMonad m => Element -> DB m Blocks
     getBlocks e = mconcat <$>
       mapM parseBlock (elContent e)
     
    +getRoleAttr :: Element -> [(Text, Text)] -- extract role attribute and add it to the attribute list
    +getRoleAttr e = case attrValue "role" e of
    +  "" -> []
    +  r -> [("role", r)]
    +
     parseBlock :: PandocMonad m => Content -> DB m Blocks
     parseBlock (Text (CData CDataRaw _ _)) = return mempty -- DOCTYPE
     parseBlock (Text (CData _ s _)) = if T.all isSpace s
       then return mempty
       else return $ plain $ trimInlines $ text s
     parseBlock (CRef x) = return $ plain $ str $ T.toUpper x
    -parseBlock (Elem e) =
    -  case qName (elName e) of
    +parseBlock (Elem e) = do
    +  parsedBlock <- case qName (elName e) of
         "toc" -> skip -- skip TOC, since in pandoc it's autogenerated
         "index" -> skip -- skip index, since page numbers meaningless
         "para" -> parseMixed para (elContent e)

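The new `getRoleAttr` helper turns an optional `role` attribute into a key/value list that can be handed to `addPandocAttributes`. The sketch below reproduces the same logic outside pandoc so it can be run on its own. The `Element` record and the `attrValue` helper here are simplified, hypothetical stand-ins (plain `String` instead of `Text`), and the sketch assumes, as the `"" -> []` branch implies, that a missing attribute is reported as an empty string.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for an XML element (not pandoc's Element type):
-- just a tag name and its attributes.
data Element = Element
  { elName  :: String
  , elAttrs :: [(String, String)]
  }

-- Assumed behaviour of the reader's attrValue: "" when the attribute is absent.
attrValue :: String -> Element -> String
attrValue key e = fromMaybe "" (lookup key (elAttrs e))

-- Same shape as getRoleAttr in the diff: an absent or empty role
-- contributes no attributes, anything else becomes a ("role", value) pair.
getRoleAttr :: Element -> [(String, String)]
getRoleAttr e = case attrValue "role" e of
  "" -> []
  r  -> [("role", r)]
```

Because an empty list is returned when there is no role, passing the result to an attribute-adding function is a no-op for elements without one.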

    @@ -973,6 +977,7 @@ parseBlock (Elem e) =
         "title" -> return mempty -- handled in parent element
         "subtitle" -> return mempty -- handled in parent element
         _ -> skip >> getBlocks e
    +  return $ addPandocAttributes (getRoleAttr e) parsedBlock
       where skip = do
               let qn = qName $ elName e
               let name = if "pi-" `T.isPrefixOf` qn

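With this hunk, every block produced by `parseBlock` is passed through `addPandocAttributes` together with the element's role. As we understand pandoc's documentation for that function, it adds the pairs to an element that already has an attribute slot and otherwise wraps the element in a `Div` (for blocks) or a `Span` (for inlines). The toy model below illustrates that idea for a single block only; the `Block` type and `addAttrsToBlock` are hypothetical, and pandoc's actual handling of multi-block `Blocks` values is not modelled.

```haskell
-- Pandoc-style attribute triple: (identifier, classes, key/value pairs).
type Attr = (String, [String], [(String, String)])

nullAttr :: Attr
nullAttr = ("", [], [])

-- Hypothetical cut-down block type (not pandoc-types' Block).
data Block
  = Para String            -- no attribute slot
  | Header Int Attr String -- has an attribute slot
  | Div Attr [Block]       -- has an attribute slot
  deriving (Eq, Show)

-- Append key/value pairs to an existing attribute triple.
extendAttr :: [(String, String)] -> Attr -> Attr
extendAttr kvs (ident, classes, kvs0) = (ident, classes, kvs0 ++ kvs)

-- Toy version of "attach the attributes if possible, else wrap in a Div".
addAttrsToBlock :: [(String, String)] -> Block -> Block
addAttrsToBlock [] b = b
addAttrsToBlock kvs (Header n attr t) = Header n (extendAttr kvs attr) t
addAttrsToBlock kvs (Div attr bs) = Div (extendAttr kvs attr) bs
addAttrsToBlock kvs b = Div (extendAttr kvs nullAttr) [b]
```

For example, a `Para` with a role comes back wrapped in a `Div` carrying `("role", ...)`, while a `Header` keeps its identity and simply gains the pair.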

    @@ -1099,7 +1104,12 @@ parseBlock (Elem e) =
                 modify $ \st -> st{ dbSectionLevel = n }
                 b <- getBlocks e
                 modify $ \st -> st{ dbSectionLevel = n - 1 }
    -            return $ headerWith (elId, classes, maybeToList titleabbrevElAsAttr++attrs) n' headerText <> b
    +            let content = headerWith (elId, classes, maybeToList titleabbrevElAsAttr)
    +                            n' headerText <> b
    +            return $ case attrValue "role" e of
    +              "" -> content
    +              _ -> divWith ("", ["section"],
    +                             ("level", T.pack $ show n') : attrs) content
               titleabbrevElAsAttr =
                 case filterChild (named "titleabbrev") e `mplus`
                        (filterChild (named "info") e >>=

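The section branch now emits a plain header followed by its content when there is no `role`, and otherwise wraps both in a `Div` with class `section` and a `level` attribute. A minimal sketch of that decision with hypothetical cut-down types; it assumes that the `attrs` list in the diff amounts to just the role, which the hunk itself does not show.

```haskell
-- Pandoc-style attribute triple: (identifier, classes, key/value pairs).
type Attr = (String, [String], [(String, String)])

-- Hypothetical cut-down block type (not pandoc-types' Block).
data Block
  = Para String
  | Header Int Attr String
  | Div Attr [Block]
  deriving (Eq, Show)

-- Mirrors the new branch: the identifier stays on the Header; a non-empty
-- role moves the header and body into a "section" Div with a level attribute.
sectionBlocks :: String -> Int -> String -> String -> [Block] -> [Block]
sectionBlocks ident level role title body =
  case role of
    "" -> content
    _  -> [Div ("", ["section"], ("level", show level) : roleAttrs) content]
  where
    content   = Header level (ident, [], []) title : body
    roleAttrs = [("role", role)] -- assumption: attrs is only the role here
```

Keeping the `level` on the `Div` lets a writer that sees only the wrapper still know the depth of the section it stands for.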

    @@ -1124,7 +1134,6 @@ parseBlock (Elem e) =
               Nothing -> return b
               Just t -> return $ divWith (attrValue "id" e,[],[])
                                  (divWith ("", ["title"], []) (plain t) <> b)
    -
             -- Admonitions are parsed into a div. Following other Docbook tools that output HTML,
             -- we parse the optional title as a div with the @title@ class, and give the
             -- block itself a class corresponding to the admonition name.


    @@ -1206,8 +1215,8 @@ parseInline :: PandocMonad m => Content -> DB m Inlines
     parseInline (Text (CData _ s _)) = return $ text s
     parseInline (CRef ref) =
       return $ text $ fromMaybe (T.toUpper ref) $ lookupEntity ref
    -parseInline (Elem e) =
    -  case qName (elName e) of
    +parseInline (Elem e) = do
    +  parsedInline <- case qName (elName e) of
         "anchor" -> do
           return $ spanWith (attrValue "id" e, [], []) mempty
         "phrase" -> do


    @@ -1320,7 +1329,8 @@ parseInline (Elem e) =
             "strong" -> innerInlines strong
             "strikethrough" -> innerInlines strikeout
             "underline" -> innerInlines underline
    -        _ -> innerInlines emph
    +        _ -> innerInlines $
    +               spanWith ("", ["emphasis"], getRoleAttr e)
         "footnote" -> note . mconcat <$>
           mapM parseBlock (elContent e)
         "title" -> return mempty


    @@ -1329,6 +1339,9 @@ parseInline (Elem e) =
         -- <?asciidor-br?> to in handleInstructions, above.
         "pi-asciidoc-br" -> return linebreak
         _ -> skip >> innerInlines id
    +  return $ case qName (elName e) of
    +    "emphasis" -> parsedInline
    +    _ -> addPandocAttributes (getRoleAttr e) parsedInline
       where skip = do
               let qn = qName $ elName e
               let name = if "pi-" `T.isPrefixOf` qn

Why this special case for `"emphasis"`?

I tried something that failed, and now I need to figure out a way to handle this. Currently, Pandoc supports `<emphasis role="strong">word</emphasis>`. We get:

Which is not what we would want... So my previous attempt in f0827ee was to circumvent this issue, but then no

Oh, right. Well, you're just preventing a role attribute from going on something parsed from an

But then, that would mean that we can't have any other

If other roles are used, then in the part of the code that checks for role on emphasis and gives you Strong, Underline or whatever in response, you could also check for other roles and simply add them as attributes.

From what I understand from the documentation, I think that DocBook documents expect only one

So, knowing this, I think that it shouldn't be possible to have a

And in that case, we would only need to change the last line from this part:

    "emphasis" -> case attrValue "role" e of
      "bf" -> innerInlines strong
      "bold" -> innerInlines strong
      "strong" -> innerInlines strong
      "strikethrough" -> innerInlines strikeout
      "underline" -> innerInlines underline
      _ -> innerInlines emph

Does it make sense like this? What do you think?

@yanntrividic Not strictly true, DocBook would allow multiple tokens in a role attribute, so e.g.

I see, you are right... Well, I guess it would not be much work to support this feature, but in that case we would support only one pattern, that is to say, space-separated values for

On another note, I've been trying to work on something to distinguish the last case from the others... Not sure how to do it properly though... Here is my attempt: 3703757. I know there is a parsing error in the code, but as I said, I'm quite new to Haskell, and I am really not sure how I could make this kind of type assertion. Basically, the thought process here is that if the element is an

Edit: Oops, it is better with this line corrected: 06214b6

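The thread above suggests keeping the role-based dispatch for `emphasis` while passing any other roles through as attributes, and notes that DocBook allows several space-separated tokens in `role`. A minimal sketch of that idea follows. It uses hypothetical simplified types (not pandoc-types) with `String` in place of `Text`, and `formatFor` and `emphasisWithRole` are illustrative names that do not exist in pandoc.

```haskell
import Data.Maybe (isNothing, mapMaybe)

-- Pandoc-style attribute triple: (identifier, classes, key/value pairs).
type Attr = (String, [String], [(String, String)])

-- Hypothetical cut-down inline type (not pandoc-types' Inline).
data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  | Strikeout [Inline]
  | Underline [Inline]
  | Span Attr [Inline]
  deriving (Eq, Show)

-- Formatting chosen by a recognised role token; Nothing for any other token.
formatFor :: String -> Maybe ([Inline] -> [Inline])
formatFor tok = case tok of
  "bf"            -> Just (pure . Strong)
  "bold"          -> Just (pure . Strong)
  "strong"        -> Just (pure . Strong)
  "strikethrough" -> Just (pure . Strikeout)
  "underline"     -> Just (pure . Underline)
  _               -> Nothing

-- Split the role on whitespace: recognised tokens pick the formatting
-- (Emph when there are none, nested in token order when there are several),
-- and the leftover tokens survive as a role attribute on an enclosing Span.
emphasisWithRole :: String -> [Inline] -> [Inline]
emphasisWithRole role contents
  | null others = formatted
  | otherwise   = [Span ("", [], [("role", unwords others)]) formatted]
  where
    toks      = words role
    wrap      = case mapMaybe formatFor toks of
                  [] -> pure . Emph
                  fs -> foldr1 (.) fs
    others    = filter (isNothing . formatFor) toks
    formatted = wrap contents
```

With this shape, `role="bold note"` gives `Span ("", [], [("role","note")]) [Strong ...]`, and a plain `<emphasis>` is still just `Emph`. Nesting all recognised tokens, rather than keeping only the first, is one design choice among several; the thread does not settle it.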
This seems to create a span with the role attribute but then just the string content. I think it needs to become an `Emph` so as not to break existing writers too much. If it's difficult to achieve, I think reverting the `_` case to `innerInlines emph` (and not getting the `Span` for this particular case) is better than not creating an `Emph`.

Yes, this is what I thought you meant in your first comment. As I said, I am not sure how to achieve what we want there... So if nobody has an idea for pulling that off without too much effort, I think it is fine to accept that if you want a specific attribute on an `Emph`, it could be put on a `phrase` element, which would do the work. What do you think, @jgm?

I don't understand the context; you'd have to explain the issue to me more fully.

@jgm As far as I can discern from testing, `<emphasis>emphasis</emphasis>` (without a `role` attribute, or with a `role` attribute that is not `bf`, `strong`, `bold`, `strikethrough`, or `underline`) becomes `Span ( "" , [ "emphasis" ] , [] ) [ Str "emphasis" ]`. I would prefer `Span ( "" , [ "emphasis" ] , [] ) [ Emph [ Str "emphasis" ] ]` or, failing that, a revert to the current behavior, `Emph [ Str "emphasis" ]`. I fear that not creating the `Emph` will break a lot of conversions.

@yanntrividic My apologies if I've made an unclear comment. I may have misunderstood the intentions of a particular change at some point. The overall change is a really useful one and I think we're close to a good implementation.

No problem! I tried again to work around those types for a bit, but I just can't figure out a nice way to wrap the `Span` around the `Emph`. I'm fine with reverting the changes if nobody has a better proposition :)

I'm a bit lost here: surely `<emphasis>emphasis</emphasis>` should be converted simply as `Emph [ Str "emphasis" ]`. I'm not sure why one would even consider the `Span` conversion. I may be missing some of the context, though?

Ok, so let's start again from the beginning.

By default, `addPandocAttributes` is now applied to all the `Inline` elements to add a `role` attribute, if present. But for a few DocBook elements, and especially here for `emphasis`, the `role` attribute was already taken into account.

For the `emphasis` element, the `role` attribute is used to choose between `Strong`, `Strikeout`, `Underline` and `Emph`. So if we apply the `addPandocAttributes` function after doing that, we can get outputs such as:

Which is unnecessary.

On the other hand, we would like to be able to get outputs such as:

Or even:

But not (I think?):

But if we want to get this output, we have to modify this bit of code.

But I don't know how to modify those lines (or others?) in a good way to achieve this, because the `emph` function's types work well with the `innerInlines` function, but not with the `addPandocAttributes` function.

Is it a bit clearer now, @jgm?
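One possible way around the typing problem described in the last comment: `emph` and `spanWith attr` are both functions from inlines to inlines, so they can be composed with `(.)`, which puts the `Span` outside the `Emph` instead of replacing it. Assuming `innerInlines` takes such a function (as its uses with `strong`, `emph`, `id` and `spanWith (...)` in the diff suggest), the reader's fallback case might become something like `innerInlines (spanWith ("", [], getRoleAttr e) . emph)`; we have not checked this against the full source. The self-contained sketch below demonstrates only the composition, with hypothetical list-based stand-ins for the builder types and a pure `innerInlines` that takes the already-parsed children as an argument.

```haskell
-- Pandoc-style attribute triple: (identifier, classes, key/value pairs).
type Attr = (String, [String], [(String, String)])

-- Hypothetical cut-down inline type (not pandoc-types' Inline).
data Inline = Str String | Emph [Inline] | Span Attr [Inline]
  deriving (Eq, Show)

-- Stand-in for the builder's Inlines: here simply a list (assumption).
type Inlines = [Inline]

-- Stand-ins named after the pandoc builder functions they imitate.
emph :: Inlines -> Inlines
emph xs = [Emph xs]

spanWith :: Attr -> Inlines -> Inlines
spanWith attr xs = [Span attr xs]

-- Hypothetical pure stand-in for the reader's innerInlines: apply the
-- wrapper to children that have already been parsed.
innerInlines :: (Inlines -> Inlines) -> Inlines -> Inlines
innerInlines wrap children = wrap children

-- Without attributes, plain Emph; with attributes, Span wrapping Emph.
emphasisWithAttrs :: [(String, String)] -> Inlines -> Inlines
emphasisWithAttrs []  = innerInlines emph
emphasisWithAttrs kvs = innerInlines (spanWith ("", [], kvs) . emph)
```

Emitting the `Span` only when there are attributes keeps `<emphasis>emphasis</emphasis>` as `Emph [ Str "emphasis" ]`, which is what the reviewers ask for above.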