False positive WARNING: Could not match a code example to HTML with jupytext #191

Open
tovrstra opened this issue Mar 9, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@tovrstra

tovrstra commented Mar 9, 2025

Issue

I get WARNING: Could not match a code example to HTML when building documentation with jupytext sources (using the mystnb extension in combination with sphinx_codeautolink). This happens when code cells have leading empty lines.

Expected behavior

No warnings

Steps to reproduce

source/conf.py

extensions = [
    "myst_nb",
    "sphinx_codeautolink",
]
nb_custom_formats = {
    ".py": ["jupytext.reads", {"fmt": "py:percent"}],
}
exclude_patterns = ["conf.py"]

source/index.py

#!/usr/bin/env python3

# %% [markdown]
#
# This is a bit of text

# %%

print("This is code. Mind the leading empty line.")

Run:

sphinx-build source build

Output (starting without a build directory):

loading translations [en]... done
making output directory... done
myst v4.0.1: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions=set(), disable_syntax=[], all_links_external=False, links_external_new_tab=False, url_schemes=('http', 'https', 'mailto', 'ftp'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=0, heading_slug_func=None, html_meta={}, footnote_sort=True, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
myst-nb v1.2.0: NbParserConfig(custom_formats={'.py': ('jupytext.reads', {'fmt': 'py:percent'}, False)}, metadata_key='mystnb', cell_metadata_key='mystnb', kernel_rgx_aliases={}, eval_name_regex='^[a-zA-Z_][a-zA-Z0-9_]*$', execution_mode='auto', execution_cache_path='', execution_excludepatterns=(), execution_timeout=30, execution_in_temp=False, execution_allow_errors=False, execution_raise_on_error=False, execution_show_tb=False, merge_streams=False, render_plugin='default', remove_code_source=False, remove_code_outputs=False, code_prompt_show='Show code cell {type}', code_prompt_hide='Hide code cell {type}', number_source_lines=False, output_stderr='show', render_text_lexer='myst-ansi', render_error_lexer='ipythontb', render_image_options={}, render_figure_options={}, render_markdown_format='commonmark', output_folder='build', append_css=True, metadata_to_fm=False)
Using jupyter-cache at: /home/toon/tmp/.jupyter_cache
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
/home/toon/tmp/source/index.py: Executing notebook using local CWD [mystnb]
/home/toon/tmp/source/index.py: Executed notebook in 0.90 seconds [mystnb]

looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
copying assets... 
copying static files... 
Writing evaluated template result to /home/toon/tmp/build/_static/basic.css
Writing evaluated template result to /home/toon/tmp/build/_static/documentation_options.js
Writing evaluated template result to /home/toon/tmp/build/_static/language_data.js
Writing evaluated template result to /home/toon/tmp/build/_static/alabaster.css
copying static files: done
copying extra files... 
copying extra files: done
copying assets: done
writing output... [100%] index
generating indices... genindex done
writing additional pages... search done
dumping search index in English (code: en)... done
dumping object inventory... done
/home/toon/tmp/source/index.py: WARNING: Could not match a code example to HTML, source:

print("This is code. Mind the leading empty line.") [codeautolink.match_block]
build succeeded, 1 warning.

The HTML pages are in build.

Comments

This might also be an issue with mystnb, but that is not clear to me at the moment. I don't see where the empty line gets removed; its removal is what triggers the warning.

The matching fails because the source with its leading empty line is compared to the HTML text without it here:

for trans in transforms:
    for ix in range(len(inners)):
        candidate = copy(inners[ix])
        # remove line numbers for matching
        for lineno in candidate.find_all("span", attrs={"class": "linenos"}):
            lineno.extract()
        if trans.source.rstrip() == "".join(candidate.strings).rstrip():
            inner = inners.pop(ix)
            break
    else:
        msg = f"Could not match a code example to HTML, source:\n{trans.source}"
        logger.warning(
            msg, type=warn_type, subtype="match_block", location=document
        )
        continue
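
The comparison can be reproduced in isolation. This is only a minimal sketch: the two strings are hand-written stand-ins (assuming the rendered HTML has lost the leading empty line), not values captured from the build.

```python
# Stand-in for trans.source: the cell source with its leading empty line.
source = '\nprint("This is code. Mind the leading empty line.")'

# Stand-in for "".join(candidate.strings): the text of the rendered <pre>
# block, assumed here to have no leading empty line but a trailing newline.
html_text = 'print("This is code. Mind the leading empty line.")\n'

# Same test as in the loop above: only trailing whitespace is ignored.
leading_matches = source.rstrip() == html_text.rstrip()

# For contrast, a source that differs only by trailing empty lines.
source_trailing = 'print("This is code. Mind the leading empty line.")\n\n'
trailing_matches = source_trailing.rstrip() == html_text.rstrip()

print(leading_matches, trailing_matches)
```

With these inputs the first comparison fails and the second succeeds, because rstrip() only removes whitespace from the end of each string.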

@tovrstra tovrstra added the bug Something isn't working label Mar 9, 2025
@felix-hilden
Owner

Thanks for submitting! It looks like we preserve the line, at least judging from the warning. Could you paste here the resulting HTML block?

@tovrstra
Author

tovrstra commented Mar 9, 2025

Sure. This is the relevant part:

<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">"This is code. Mind the leading empty line."</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="cell_output docutils container">
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>This is code. Mind the leading empty line.
</pre></div>
</div>

It doesn't show the empty line.

Complete file
<!DOCTYPE html>

<html data-content_root="./" lang="en">
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/><meta content="width=device-width, initial-scale=1" name="viewport"/>
<title>&lt;no title&gt; — Project name not set  documentation</title>
<link href="_static/pygments.css?v=5ecbeea2" rel="stylesheet" type="text/css"/>
<link href="_static/basic.css?v=b08954a9" rel="stylesheet" type="text/css"/>
<link href="_static/alabaster.css?v=27fed22d" rel="stylesheet" type="text/css"/>
<link href="_static/mystnb.4510f1fc1dee50b3e5859aac5469c37c29e427902b24a333a5f9fcb2f0b3ac41.css" rel="stylesheet" type="text/css"/>
<link href="_static/sphinx-codeautolink.css?v=b2176991" rel="stylesheet" type="text/css"/>
<script src="_static/documentation_options.js?v=5929fcd5"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link href="genindex.html" rel="index" title="Index"/>
<link href="search.html" rel="search" title="Search"/>
<link href="_static/custom.css" rel="stylesheet" type="text/css"/>
</head><body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<p>This is a bit of text</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">"This is code. Mind the leading empty line."</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="cell_output docutils container">
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>This is code. Mind the leading empty line.
</pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
<div aria-label="Main" class="sphinxsidebar" role="navigation">
<div class="sphinxsidebarwrapper">
<h1 class="logo"><a href="#">Project name not set</a></h1>
<search id="searchbox" role="search" style="display: none">
<div class="searchformwrapper">
<form action="search.html" class="search" method="get">
<input aria-labelledby="searchlabel" autocapitalize="off" autocomplete="off" autocorrect="off" name="q" placeholder="Search" spellcheck="false" type="text"/>
<input type="submit" value="Go"/>
</form>
</div>
</search>
<script>document.getElementById('searchbox').style.display = "block"</script><h3>Navigation</h3>
<div class="relations">
<h3>Related Topics</h3>
<ul>
<li><a href="#">Documentation overview</a><ul>
</ul></li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
      ©.
      
      |
      Powered by <a href="https://www.sphinx-doc.org/">Sphinx 8.2.3</a>
      &amp; <a href="https://alabaster.readthedocs.io">Alabaster 1.0.0</a>
      
      |
      <a href="_sources/index.py.txt" rel="nofollow">Page source</a>
</div>
</body>
</html>

@felix-hilden
Owner

Interesting, and you'd expect the empty line to be there or removed? Is the code block the same if you add even more empty lines? We could be doing everything correctly and the other extension removes the leading newlines unexpectedly afterwards. Unless this happens more broadly for notebooks or something. In any case, custom parsers might help.

If you have time to look into it, that'd be great, but otherwise I'll try to look around in a week or so!

@tovrstra
Author

tovrstra commented Mar 9, 2025

No rush, I primarily wanted to post this in case others run into the same problem, because the cause was not so straightforward to find.

Taking a closer look, it seems that jupytext preserves the empty lines when it converts the Python file to a notebook. The same example, with the Python file replaced by a notebook, produces the same warning. So it is myst_nb that strips the empty lines.

What is most remarkable is that (only) trailing empty lines, which also get stripped, do not cause the same warning. I'm not sure why.

Updated source/conf.py

extensions = [
    "myst_nb",
    "sphinx_codeautolink",
]

source/index.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d38f3cf4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3d5679c7",
   "metadata": {},
   "source": [
    "\n",
    "This is a bit of text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6a5952f0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is code. Mind the leading empty line.\n"
     ]
    }
   ],
   "source": [
    "\n",
    "\n",
    "\n",
    "print(\"This is code. Mind the leading empty line.\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "executable": "/usr/bin/env python3",
   "main_language": "python",
   "notebook_metadata_filter": "-all",
   "text_representation": {
    "extension": ".py",
    "format_name": "percent"
   }
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

@felix-hilden
Owner

Yep, then I'd consider it an issue (or maybe just an incompatibility) with mystnb. In the matching code we .rstrip() both sources, which is why the trailing newlines don't matter. Removing leading empty lines would have an effect on line numbers, which is why I've been more careful about that. But maybe we could also do something smarter 🤔
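
One possible direction, purely as a sketch and not what sphinx_codeautolink actually does: ignore leading blank lines when matching, but keep the number of skipped lines so that line numbers could be shifted afterwards. The helper names `split_leading_blank_lines` and `loosely_matches` are hypothetical and exist only for this illustration.

```python
import re

# Zero or more whitespace-only lines at the very start of the string.
_LEADING_BLANK_LINES = re.compile(r"(?:[ \t\r]*\n)*")


def split_leading_blank_lines(source: str) -> tuple[int, str]:
    """Return (number of leading blank lines, source without them).

    Indentation on the first non-blank line is preserved.
    """
    match = _LEADING_BLANK_LINES.match(source)
    skipped = match.group().count("\n")
    return skipped, source[match.end():]


def loosely_matches(source: str, html_text: str) -> bool:
    """Compare two blocks, ignoring leading blank lines and trailing whitespace."""
    _, src = split_leading_blank_lines(source)
    _, html = split_leading_blank_lines(html_text)
    return src.rstrip() == html.rstrip()


offset, body = split_leading_blank_lines('\n\n\nprint("x")')
print(offset, repr(body))
print(loosely_matches('\nprint("x")', 'print("x")\n'))
```

The returned offset is the piece that would matter for line numbers: anything computed on the original source would need to be shifted by that many lines to line up with the rendered block.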
