Add scraper for rickbayless.com by joeygerovac · Pull Request #1889 · hhursev/recipe-scrapers

joeygerovac · 2026-04-14T06:50:54Z

Add scraper for rickbayless.com

Adds support for scraping recipes from rickbayless.com.

Site structure

Rick Bayless's site uses custom HTML markup with no JSON-LD structured
data, requiring a dedicated scraper. Recipe data is stored in semantic
class names:

Title: <h1> inside div.page-header
Description: div.recipe-description
Ingredients: <li itemprop="ingredients"> with separate spans for
quantity/unit, name, and preparation notes
Instructions: <p> tags inside div.recipe-instructions
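
As a rough illustration of the instruction extraction described above, here is a minimal sketch using only the standard library's html.parser. The scraper in this PR works on self.soup (BeautifulSoup), so the InstructionParser class, the extract_instructions helper, and the sample markup below are hypothetical stand-ins rather than the submitted code.

```python
from html.parser import HTMLParser


class InstructionParser(HTMLParser):
    """Collect the text of each <p> inside div.recipe-instructions."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside the instructions div
        self.in_paragraph = False
        self.steps = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            # Track nesting so an inner </div> does not end the region early.
            if self.div_depth or "recipe-instructions" in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_paragraph = True
            self.steps.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags (for example <a>) is appended as plain text.
        if self.in_paragraph:
            self.steps[-1] += data


def extract_instructions(html):
    parser = InstructionParser()
    parser.feed(html)
    parser.close()
    return [step.strip() for step in parser.steps if step.strip()]


# Invented sample markup that follows the structure described above.
SAMPLE_HTML = """
<div class="page-header"><h1>Mango Guacamole</h1></div>
<div class="recipe-instructions">
  <p>Mash the avocados.</p>
  <p>Mix in the <a href="/mango">mango</a>.</p>
</div>
<p>Footer text</p>
"""

print(extract_instructions(SAMPLE_HTML))
```

Paragraphs outside the instructions div (the footer here) are ignored, and the anchor inside the second step is flattened to plain text.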

Testing

Tested against multiple recipes including:

Simple recipes (classic guacamole)
Recipes with quantity-less ingredients ("Salt")
Recipes with linked ingredient names (anchor tags)
Single and multi-paragraph instruction sets
Recipes with and without serving size data

All 1030 tests pass.

Edge cases handled

Ingredients with no quantity parse correctly
Anchor tags within ingredient names are stripped to plain text
yields() returns None when serving size div is empty
image() falls back gracefully when no og:image tag exists

Adds support for scraping recipes from rickbayless.com using custom HTML parsing (no schema.org markup on the site). Extracts title, description, ingredients, instructions, image, and yields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jknndy

Hi @joeygerovac, thanks for the contribution! I left some comments throughout for your review.

jknndy · 2026-04-29T21:28:08Z

+    def image(self):
+        og_image = self.soup.find("meta", {"property": "og:image"})
+        return og_image["content"] if og_image else None
+


Suggested change

-    def image(self):
-        og_image = self.soup.find("meta", {"property": "og:image"})
-        return og_image["content"] if og_image else None

The built-in best_image functionality returns the same result, so we can remove the custom code here.

jknndy · 2026-04-29T21:39:02Z

+            return None
+        text = servings.get_text(strip=True)
+        result = text.replace("Servings:", "").strip()
+        return result or None


Suggested change

-        return result or None
+        return get_yields(result)
+    def total_time(self):
+        raise FieldNotProvidedByWebsiteException(return_value=None)
+    def author(self):
+        raise StaticValueException(return_value="Rick Bayless")

yields: There is a built-in util that standardizes the yields output, which keeps it consistent across scrapers. There is a bug with this specific site that I'll be opening an issue for.
total_time: Since total time is a mandatory field, we should raise FieldNotProvidedByWebsiteException to signal that this field isn't available from this site.
author: Another mandatory field, but this time we can raise StaticValueException with the author's name, as they seem to be the sole contributor to the site.
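
The pattern behind total_time and author, raising an exception that carries the value the caller should receive, can be sketched as follows. The two exception classes here are local stand-ins that reuse the names from the suggestion; ValueCarryingException, resolve_signalled_value, and RickBaylessSketch are hypothetical, and none of this is the library's actual handling code.

```python
import functools


class ValueCarryingException(Exception):
    """Stand-in base class: carries the value the caller should receive."""

    def __init__(self, return_value):
        super().__init__()
        self.return_value = return_value


class FieldNotProvidedByWebsiteException(ValueCarryingException):
    """Stand-in: the site does not publish this field."""


class StaticValueException(ValueCarryingException):
    """Stand-in: the field has the same value for every page on the site."""


def resolve_signalled_value(method):
    """Hypothetical wrapper that turns a signalled value into a return value."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except ValueCarryingException as exc:
            return exc.return_value

    return wrapper


class RickBaylessSketch:
    @resolve_signalled_value
    def total_time(self):
        raise FieldNotProvidedByWebsiteException(return_value=None)

    @resolve_signalled_value
    def author(self):
        raise StaticValueException(return_value="Rick Bayless")


scraper = RickBaylessSketch()
print(scraper.author())
print(scraper.total_time())
```

Under this reading, the expected test data would carry "author": "Rick Bayless" and "total_time": null, which matches the JSON changes suggested later in the review.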

jknndy · 2026-04-29T21:39:22Z

@@ -0,0 +1,44 @@
+from ._abstract import AbstractScraper
+from ._exceptions import StaticValueException


Suggested change

-from ._exceptions import StaticValueException
+from ._exceptions import StaticValueException, FieldNotProvidedByWebsiteException
+from ._utils import get_yields

jknndy · 2026-04-29T21:40:17Z

@@ -0,0 +1,24 @@
+{
+  "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",


Suggested change

-  "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",
+  "author": "Rick Bayless",
+  "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",

jknndy · 2026-04-29T21:40:30Z

+    "Mix in 2/3 of the diced mango. Taste and season with salt. If not using immediately, cover with plastic wrap pressed directly on the surface of the guacamole and refrigerate—preferably for no more than a few hours.",
+    "When you're ready to serve, scoop the guacamole into a serving bowl and garnish with the remaining diced mango and cilantro sprigs. Serve with tortilla chips, slices of cucumber or jicama."
+  ],
+  "yields": "2 1/2 cups",


Suggested change

-  "yields": "2 1/2 cups",
+  "yields": "2 cups",

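The change in expected output from "2 1/2 cups" to "2 cups" is consistent with a normalizer that keeps only the first whole number it finds. The sketch below shows that effect with a hypothetical first_number_yield function; it is an assumption about how the fraction gets lost, not the get_yields implementation.

```python
import re


def first_number_yield(text):
    """Hypothetical normalizer: keep the first integer and the last unit word."""
    match = re.search(r"\d+", text)
    if match is None:
        return None
    # Ignore purely numeric tokens such as "2" or "1/2" when picking the unit.
    words = [w for w in text.split() if not re.fullmatch(r"[\d/.]+", w)]
    unit = words[-1] if words else "servings"
    return f"{match.group()} {unit}"


print(first_number_yield("2 1/2 cups"))
```

Under that assumption the fractional part is dropped, which would explain why the test data expects "2 cups" until the site-specific bug is addressed.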
jknndy · 2026-04-29T21:40:44Z

+  ],
+  "yields": "2 1/2 cups",
+  "description": "Recipe from Season 6, Mexico—One Plate at a Time",
+  "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"


Suggested change

-  "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"
+  "total_time": null,
+  "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"

Use get_yields() for yield normalization, raise StaticValueException for author, raise FieldNotProvidedByWebsiteException for total_time, and remove the custom image method in favor of the built-in best_image plugin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jknndy requested changes Apr 29, 2026


jknndy mentioned this pull request Apr 29, 2026

Scraper issue with rickbayless #1916

Open

joeygerovac wants to merge 2 commits into hhursev:main from joeygerovac:add-rickbayless-scraper
