fix: detect JSON-LD injected by Next.js App Router __next_s SSR mechanism #1864
Open
cyanistes wants to merge 1 commit into hhursev:main from
Next.js App Router SSR pages inject JSON-LD via a JavaScript push call:
```js
(self.__next_s=self.__next_s||[]).push([0,
  {"type":"application/ld+json","children":"{...recipe JSON...}"}
])
```
instead of a static <script type="application/ld+json"> tag, so extruct
misses it entirely and SchemaOrg finds no recipe data on affected sites
(meny.no and any other Next.js App Router site with Recipe markup).
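In the push call above, the argument is a JavaScript array whose second element is a props object, and its `children` value is the JSON-LD document serialized as a string. The recipe data is therefore JSON-encoded twice. The short Python illustration below uses a hand-written push argument: the recipe fields are placeholders, not data from meny.no. It also assumes the array literal is valid JSON, which holds for the shape shown here but not for arbitrary JavaScript.

```python
import json

# Hand-written stand-in for the argument of self.__next_s.push(...).
# The recipe fields are placeholders. This literal happens to be valid
# JSON, which is assumed here; arbitrary JavaScript would not be.
push_arg = (
    '[0,{"type":"application/ld+json",'
    '"children":"{\\"@context\\":\\"https://schema.org\\",'
    '\\"@type\\":\\"Recipe\\",\\"name\\":\\"Example Soup\\"}"}]'
)

# First decode: the outer array gives [0, {"type": ..., "children": "..."}].
_, props = json.loads(push_arg)

# The JSON-LD is a *string* inside the props object, so a second decode is
# needed before it can be treated as structured data.
recipe = json.loads(props["children"])
print(recipe["@type"], recipe["name"])  # prints: Recipe Example Soup
```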
Add SchemaOrg._normalize_nextjs_jsonld(), a static pre-processing step
that lifts the embedded JSON-LD children string into a proper script tag
before the HTML reaches extruct. The fast-path guard (two substring
checks) returns the HTML unchanged on all non-Next.js pages, so the added
cost there is negligible.
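The following is a minimal sketch of such a pre-processing step, written as a standalone function. It is not the code in this PR. The name `normalize_nextjs_jsonld`, the regex, and the choice to place the new tags just before `</body>` are all assumptions for illustration. The regex also assumes that `"type"` precedes `"children"` in the props object, as in the example above.

```python
import json
import re

# Hypothetical pattern: assumes "type" comes before "children" in the
# props object passed to __next_s.push([0, {...}]).
_NEXT_S_JSONLD = re.compile(
    r'"type"\s*:\s*"application/ld\+json"\s*,\s*"children"\s*:\s*(?=")'
)
_DECODER = json.JSONDecoder()


def normalize_nextjs_jsonld(html: str) -> str:
    """Hypothetical stand-in for SchemaOrg._normalize_nextjs_jsonld()."""
    # Fast path: without both markers there is nothing to lift.
    if "__next_s" not in html or "application/ld+json" not in html:
        return html

    blocks = []
    for match in _NEXT_S_JSONLD.finditer(html):
        try:
            # raw_decode parses the JSON string literal that starts at the
            # opening quote, undoing the escaping of the embedded JSON-LD.
            children, _ = _DECODER.raw_decode(html, match.end())
            json.loads(children)  # skip payloads that are not valid JSON
        except (ValueError, TypeError):
            continue
        # Escape "</" (valid as "<\/" in JSON) so the payload cannot close
        # the <script> element early.
        safe = children.replace("</", "<\\/")
        blocks.append(f'<script type="application/ld+json">{safe}</script>')

    if not blocks:
        return html
    injected = "".join(blocks)
    # Place the tags just before the last </body> when present, else append.
    end = html.rfind("</body>")
    if end == -1:
        return html + injected
    return html[:end] + injected + html[end:]
```

The sketch decodes the `children` string literal with `JSONDecoder.raw_decode` instead of capturing it with a regex. That way the `json` module handles escaped quotes and `\uXXXX` sequences, and the regex only needs to locate where the literal starts.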
Closes #(TBD)
Owner: That's a good one! Thanks
Problem
Next.js App Router SSR pages inject JSON-LD via a JavaScript push call rather than a static `<script type="application/ld+json">` tag. `extruct`, which recipe-scrapers uses for JSON-LD extraction, parses the HTML without executing JavaScript, so it only finds static script tags and misses this markup entirely. As a result, `SchemaOrg` finds no recipe data and scraping fails on any site built with Next.js App Router that uses structured `Recipe` markup. meny.no is one confirmed affected site.

Fix
Add `SchemaOrg._normalize_nextjs_jsonld(html)`, a static pre-processing step called at the top of `__init__` that detects the `__next_s.push()` pattern and lifts the embedded `children` JSON-LD string into a proper `<script type="application/ld+json">` tag before the HTML reaches `extruct`.

Fast path
Two cheap `in` checks (`"__next_s" not in html` and `"application/ld+json" not in html`) return the original HTML unchanged on all non-Next.js pages, so the added cost for the vast majority of requests is negligible.

No new dependencies
`json` and `re` are stdlib modules.

Before / after (meny.no example)
Before: `SchemaOrg` finds no Recipe data → scraping raises `SchemaOrgException`.

After: the injected JSON-LD is promoted to a static script tag before `extruct` runs → title, ingredients, and instructions are parsed correctly.

Changes
- `recipe_scrapers/_schemaorg.py`: add `import json` and `import re`; add the `_normalize_nextjs_jsonld()` static method; call it as the first line of `__init__`
- `tests/library/test_schemaorg.py`: add `test_nextjs_ssr_jsonld` to `TestSchemaOrg`

References
__next_s: https://github.com/vercel/next.js/search?q=__next_s