Figure numbering is broken in HTML output with pandoc 3.x when using Markdown image syntax #1467

Open
N0rbert opened this issue Apr 30, 2024 · 3 comments
Labels: bug (an unexpected problem or unintended behavior), next (to consider for next release)

N0rbert commented Apr 30, 2024

Below is the minimal reproducible example (index.Rmd file):

````markdown
---
title: Figure cross-reference issue in HTML
documentclass: book
site: bookdown::bookdown_site
output:
    bookdown::html_document2: default
---

```{r eval=TRUE, include=TRUE}
sessionInfo('bookdown')
rmarkdown::pandoc_version()
xfun::session_info('bookdown')
```

# Zero 1 {-#zero1}

Some text in 1st unnumbered section.

# Practically long header about some theories {#theory}

## Markdown Syntax {#markdown-syntax}

...

### Images {#markdown-syntax-media}

First figure of first section is below, its reference is Fig. \@ref(fig:md-logo).

![(\#fig:md-logo) Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)

...
````

When I knit this document to HTML using RStudio 2024.04.0 Build 735 with its bundled Pandoc 3.1.11, the figure is rendered incorrectly:

(Screenshot: figure-numbering-issue)

The figure is not numbered: the cross-reference works as expected, but the "Figure 1.1: " prefix is missing from the caption.

This is a fresh installation of Ubuntu MATE 24.04 LTS.
Other details:

```
> xfun::session_info('bookdown')
## R version 4.3.3 (2024-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 24.04 LTS
##
## Locale:
##   LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##   LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##   LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##   LC_PAPER=en_US.UTF-8       LC_NAME=C
##   LC_ADDRESS=C               LC_TELEPHONE=C
##   LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## Package version:
##   base64enc_0.1.3   bookdown_0.39.1   bslib_0.7.0       cachem_1.0.8
##   cli_3.6.2         digest_0.6.35     evaluate_0.23     fastmap_1.1.1
##   fontawesome_0.5.2 fs_1.6.4          glue_1.7.0        graphics_4.3.3
##   grDevices_4.3.3   highr_0.10        htmltools_0.5.8.1 jquerylib_0.1.4
##   jsonlite_1.8.8    knitr_1.46        lifecycle_1.0.4   memoise_2.0.1
##   methods_4.3.3     mime_0.12         R6_2.5.1          rappdirs_0.3.3
##   rlang_1.1.3       rmarkdown_2.26    sass_0.4.9        stats_4.3.3
##   tinytex_0.50      tools_4.3.3       utils_4.3.3       xfun_0.43
##   yaml_2.3.8

> rmarkdown::pandoc_version()
## [1] '3.1.11'
```

Installing the development version with `remotes::install_github("rstudio/bookdown")` does not help.
Also note that RStudio 2023.03.2-454, which bundles Pandoc 2.19.2, renders the figure correctly.

cderv (Collaborator) commented Aug 26, 2024

Including the image with a code chunk works:

```{r md-logo, fig.cap = "Markdown logo", echo = FALSE}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png")
```

The resulting HTML is:

```html
<div class="figure"><span style="display:block;" id="fig:md-logo"></span>
<img role="img" aria-label="Markdown logo" src="data:image/png;base64<...encoded...>" alt="Markdown logo" />
<p class="caption">
Figure 1.1: Markdown logo
</p>
</div>
```

With the Markdown image syntax used in the example, Pandoc produces this HTML instead:

```html
<div class="float">
<img role="img" aria-label=" Markdown logo" src="data:image/png;base64<...encoded...>" alt=" Markdown logo" />
<div class="figcaption"> Markdown logo</div>
</div>
```

bookdown counts figures by matching the `figure` class:

```r
figs = grep('^<div class="figure', content)
```
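To make the mismatch concrete, here is a small sketch that applies the same `grep()` pattern to the two HTML outputs above. The line vectors are hand-copied from those outputs (image data shortened to `...`), and `count_figures()` is a hypothetical helper for illustration, not a bookdown function. Only the opening `<div>` line matters for this count.

```r
# Lines of the knitr::include_graphics() output shown above
chunk_html = c(
  '<div class="figure"><span style="display:block;" id="fig:md-logo"></span>',
  '<img role="img" aria-label="Markdown logo" src="..." alt="Markdown logo" />',
  '<p class="caption">',
  'Figure 1.1: Markdown logo',
  '</p>',
  '</div>'
)
# Lines of the Pandoc 3 output for the Markdown image syntax
markdown_html = c(
  '<div class="float">',
  '<img role="img" aria-label=" Markdown logo" src="..." alt=" Markdown logo" />',
  '<div class="figcaption"> Markdown logo</div>',
  '</div>'
)

# Hypothetical helper using the same pattern as the bookdown line quoted above:
# a figure block is any line starting with <div class="figure
count_figures = function(content) length(grep('^<div class="figure', content))

count_figures(chunk_html)     # 1
count_figures(markdown_html)  # 0: neither "float" nor "figcaption" matches
```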

It also tries to match a `caption` paragraph, not a `figcaption`:

From `bookdown/R/html.R`, lines 718 to 724 in 5dcce03:

```r
if (length(grep('^<p class="caption', content[i - 0:1])) == 0) {
  # remove these labels, because there must be a caption on this or
  # previous line (possible negative case: the label appears in the alt
  # text of <img>)
  labs[[i]] = character(length(lab))
  next
}
```

I believe this is due to a change in how Pandoc renders images.

Pandoc 3 introduced a new `Figure` block element in its document model for figures.

Pandoc 2.19.2

The image syntax still produces a `Para` containing an `Image`, whose `fig:` title marks it as an implicit figure:

```
> pandoc::pandoc_convert(text = "![(\\#fig:md-logo) Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)", to = "native", version = "2.19.2")
[ Para
    [ Image
        ( "" , [] , [] )
        [ Str "(#fig:md-logo)"
        , Space
        , Str "Markdown"
        , Space
        , Str "logo"
        ]
        ( "https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png"
        , "fig:"
        )
    ]
]
```

This is converted to HTML using the `figure` class:

```
> pandoc::pandoc_convert(text = "![(\\#fig:md-logo) Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)", to = "html4", version = "2.19.2")
<div class="figure">
<img
src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png"
alt="" />
<p class="caption">(#fig:md-logo) Markdown logo</p>
</div>
```

Pandoc 3.0

This is now a `Figure` node:

```
> pandoc::pandoc_convert(text = "![(\\#fig:md-logo) Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)", to = "native", version = "3.0")
[ Figure
    ( "" , [] , [] )
    (Caption
       Nothing
       [ Plain
           [ Str "(#fig:md-logo)"
           , Space
           , Str "Markdown"
           , Space
           , Str "logo"
           ]
       ])
    [ Plain
        [ Image
            ( "" , [] , [] )
            [ Str "(#fig:md-logo)"
            , Space
            , Str "Markdown"
            , Space
            , Str "logo"
            ]
            ( "https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png"
            , ""
            )
        ]
    ]
]
```

which leads to different HTML:

```
> pandoc::pandoc_convert(text = "![(\\#fig:md-logo) Markdown logo](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png)", to = "html4", version = "3.0")
<div class="float">
<img
src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Markdown-mark.svg/208px-Markdown-mark.svg.png"
alt="(#fig:md-logo) Markdown logo" />
<div class="figcaption">(#fig:md-logo) Markdown logo</div>
</div>
```
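Running the caption check from `bookdown/R/html.R` (quoted above) against both `html4` outputs shows why the label is not kept. This is a sketch of that one check in isolation, not the full bookdown pipeline: the multi-line `<img>` tags are collapsed onto one line, the URL is shortened to `...`, and a fixed-string search for `(#fig:md-logo)` stands in for bookdown's own label regex. `keeps_label()` and `label_lines()` are hypothetical helpers.

```r
# Pandoc 2.19.2 html4 output: the label sits in a <p class="caption"> paragraph
pandoc2 = c(
  '<div class="figure">',
  '<img src="..." alt="" />',
  '<p class="caption">(#fig:md-logo) Markdown logo</p>',
  '</div>'
)
# Pandoc 3.0 html4 output: the label sits in the alt text and a figcaption div
pandoc3 = c(
  '<div class="float">',
  '<img src="..." alt="(#fig:md-logo) Markdown logo" />',
  '<div class="figcaption">(#fig:md-logo) Markdown logo</div>',
  '</div>'
)

# Same test as the quoted bookdown code: a label found on line i is kept
# only if line i or line i - 1 starts with <p class="caption
keeps_label = function(content, i) {
  length(grep('^<p class="caption', content[i - 0:1])) > 0
}

# Indices of the lines containing the label (simplified fixed-string search)
label_lines = function(content) grep('(#fig:md-logo)', content, fixed = TRUE)

sapply(label_lines(pandoc2), keeps_label, content = pandoc2)  # TRUE
sapply(label_lines(pandoc3), keeps_label, content = pandoc3)  # FALSE FALSE
```

With Pandoc 3 the label only appears in the `alt` attribute and the `figcaption` div, so the `<p class="caption` test fails for both occurrences.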

@yihui we need to adapt bookdown to this change for Markdown figures that carry labels.

@N0rbert `knitr::include_graphics()` writes the HTML directly, so it keeps working: it still produces the markup bookdown expects.

cderv moved this from Backlog to To discuss / To plan in R Markdown Team Projects Aug 26, 2024

cderv added the bug (an unexpected problem or unintended behavior) and pandoc (concerns upstream pandoc) labels, then removed the pandoc label Aug 26, 2024

cderv changed the title "Figure numbering is broken in HTML output while using newest RStudio 2024.04.0 with pandoc 3.x" to "Figure numbering is broken in HTML output with pandoc 3.x when using Markdown image syntax" Aug 26, 2024

cderv added the next (to consider for next release) label Aug 26, 2024
yihui (Member) commented Sep 3, 2024

I feel it's not a matter of simply changing two regular expressions, but it will require us to write a Lua filter to transform the figure, which may not be trivial. Since a workaround exists, I tend to suggest using the workaround (i.e., `knitr::include_graphics()`).

cderv (Collaborator) commented Sep 4, 2024

> I feel it's not a matter of simply changing two regular expressions,

Thanks for looking into it.

> write a Lua filter to transform the figure,

I'll see what I can come up with. Maybe there is a way to keep Pandoc from producing this float 🤔

> Since a workaround exists, I tend to suggest using the workaround (i.e., knitr::include_graphics()).

This is surely the best workaround in this case!

cderv self-assigned this Sep 4, 2024
Project status: To discuss / To plan

Development: no branches or pull requests

3 participants