fix: detect JSON-LD injected by Next.js App Router __next_s SSR mechanism #1864

Open
cyanistes wants to merge 1 commit into hhursev:main from cyanistes:feat/nextjs-ssr-jsonld-normalization
@cyanistes

Problem

Next.js App Router SSR pages inject JSON-LD via a JavaScript push call rather than a static <script type="application/ld+json"> tag:

<script>
(self.__next_s=self.__next_s||[]).push([0,
    {"type":"application/ld+json","async":true,"children":"{...recipe JSON...}"}
])
</script>

extruct (which recipe-scrapers uses for JSON-LD extraction) only finds static script tags, so it misses this markup entirely and reports no error. As a result, SchemaOrg finds no recipe data and scraping fails on any Next.js App Router site that publishes its Recipe markup this way. meny.no is one confirmed affected site.
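
To illustrate why a scanner that only looks at static tags comes back empty, here is a small stand-in built on the standard library's html.parser. It is not extruct's code; the class StaticJsonLdFinder and the helper find_static_jsonld are hypothetical names used only for this sketch.

```python
from html.parser import HTMLParser


class StaticJsonLdFinder(HTMLParser):
    """Collects the text of <script type="application/ld+json"> tags only."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        # Only a script tag whose type attribute is JSON-LD counts.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def find_static_jsonld(html):
    finder = StaticJsonLdFinder()
    finder.feed(html)
    finder.close()
    return finder.blocks


static_page = '<script type="application/ld+json">{"@type": "Recipe"}</script>'
nextjs_page = (
    "<script>(self.__next_s=self.__next_s||[]).push([0,"
    '{"type":"application/ld+json","children":"{\\"@type\\": \\"Recipe\\"}"}])'
    "</script>"
)

print(find_static_jsonld(static_page))  # one JSON-LD block
print(find_static_jsonld(nextjs_page))  # nothing: the type lives inside JS, not on the tag
```

In the second page the string application/ld+json appears only inside the JavaScript payload, so a tag-attribute match never fires.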

Fix

Add SchemaOrg._normalize_nextjs_jsonld(html) — a static pre-processing step called at the top of __init__ that detects the __next_s.push() pattern and lifts the embedded children JSON-LD string into a proper <script type="application/ld+json"> tag before the HTML reaches extruct.
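
A minimal sketch of what such a normalization step could look like follows. It is not the code in this PR: the module-level function normalize_nextjs_jsonld, the regex, and the choice to insert the new tag before </body> are all assumptions made for illustration. It also assumes the argument passed to push() is valid JSON, which lets json.JSONDecoder.raw_decode parse it in place instead of matching braces by hand.

```python
import json
import re

# Matches the start of the call: (self.__next_s=self.__next_s||[]).push(
_PUSH_RE = re.compile(
    r"__next_s\s*=\s*self\.__next_s\s*\|\|\s*\[\]\s*\)\s*\.push\(\s*"
)
_DECODER = json.JSONDecoder()


def normalize_nextjs_jsonld(html: str) -> str:
    """Copy JSON-LD found in __next_s.push() payloads into static script tags."""
    tags = []
    for match in _PUSH_RE.finditer(html):
        try:
            # Assumption: the push argument is valid JSON, e.g. [0, {...props...}].
            payload, _ = _DECODER.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, list) or len(payload) < 2:
            continue
        props = payload[1]
        if not isinstance(props, dict) or props.get("type") != "application/ld+json":
            continue
        children = props.get("children")
        if not isinstance(children, str):
            continue
        try:
            data = json.loads(children)  # skip payloads that are not valid JSON
        except json.JSONDecodeError:
            continue
        # Re-serialize and escape "</" so the JSON cannot close the script tag early.
        body = json.dumps(data).replace("</", "<\\/")
        tags.append(f'<script type="application/ld+json">{body}</script>')

    if not tags:
        return html  # nothing found: hand back the original string untouched
    block = "".join(tags)
    idx = html.rfind("</body>")
    if idx == -1:
        return html + block
    return html[:idx] + block + html[idx:]
```

Under these assumptions the function leaves the original push() script in place and only adds a static copy, so any other consumer of the HTML sees the same page plus one extra tag per JSON-LD payload.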

Fast path

Two cheap substring checks ("__next_s" not in html and "application/ld+json" not in html) return the original HTML unchanged on non-Next.js pages, so the overhead is negligible for the vast majority of requests.
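
One way to read the guard is sketched below. The helper name may_contain_nextjs_jsonld is hypothetical, and the sketch assumes the intended behavior is to skip the page when either marker is missing, since both must be present for there to be anything to lift.

```python
def may_contain_nextjs_jsonld(html: str) -> bool:
    """Return True only when both markers appear somewhere in the page."""
    # Two plain substring scans; no regex and no HTML parsing on this path.
    return "__next_s" in html and "application/ld+json" in html


plain_page = "<html><body><p>No scripts here</p></body></html>"
nextjs_page = (
    "<script>(self.__next_s=self.__next_s||[]).push([0,"
    '{"type":"application/ld+json","children":"{}"}])</script>'
)

for page in (plain_page, nextjs_page):
    if not may_contain_nextjs_jsonld(page):
        print("fast path: HTML returned unchanged")
    else:
        print("slow path: run the normalization step")
```

A page that already carries a static JSON-LD tag but no __next_s marker also takes the fast path, so ordinary schema.org pages are not rewritten.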

No new dependencies

json and re are stdlib modules.

Before / after (meny.no example)

Before: SchemaOrg finds no Recipe data → scraping raises SchemaOrgException.

After: the injected JSON-LD is promoted to a static script tag before extruct runs → title, ingredients, and instructions are parsed correctly.

Changes

  • recipe_scrapers/_schemaorg.py — add import json, import re; add _normalize_nextjs_jsonld() static method; call it as the first line of __init__
  • tests/library/test_schemaorg.py — add test_nextjs_ssr_jsonld to TestSchemaOrg

References

…nism

Next.js App Router SSR pages inject JSON-LD via a JavaScript push call:

    (self.__next_s=self.__next_s||[]).push([0,
        {"type":"application/ld+json","children":"{...recipe JSON...}"}
    ])

instead of a static <script type="application/ld+json"> tag, so extruct
misses it entirely and SchemaOrg finds no recipe data on affected sites
(meny.no and any other Next.js App Router site with Recipe markup).

Add SchemaOrg._normalize_nextjs_jsonld() — a static pre-processing step
that lifts the embedded JSON-LD children into a proper script tag before
the HTML reaches extruct. The fast-path guard (two substring checks)
returns the HTML unchanged on all non-Next.js pages with zero overhead.

Closes #(TBD)
@hhursev
Owner

hhursev commented Mar 31, 2026

That's a good one! Thanks
