Add scraper for rickbayless.com #1889

Open
joeygerovac wants to merge 2 commits into hhursev:main from joeygerovac:add-rickbayless-scraper

Conversation

@joeygerovac

Add scraper for rickbayless.com

Adds support for scraping recipes from rickbayless.com.

Site structure

Rick Bayless's site uses custom HTML markup with no JSON-LD structured
data, so it needs a dedicated scraper. Recipe data is located through
semantic class names and itemprop attributes:

  • Title: <h1> inside div.page-header
  • Description: div.recipe-description
  • Ingredients: <li itemprop="ingredients"> with separate spans for
    quantity/unit, name, and preparation notes
  • Instructions: <p> tags inside div.recipe-instructions
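
As a rough illustration of the ingredient handling described above, here is a minimal sketch that uses the standard library's html.parser instead of the BeautifulSoup object the scraper itself works with. The IngredientParser class and the SAMPLE markup are hypothetical, modelled only on the structure listed in this description, and the span class names are left out because they are not given here.

```python
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collect the visible text of each <li itemprop="ingredients"> element."""

    def __init__(self):
        super().__init__()
        self.ingredients = []  # finished ingredient strings
        self._parts = None     # text fragments of the <li> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("itemprop") == "ingredients":
            self._parts = []

    def handle_data(self, data):
        # Nested tags (spans, anchors) are skipped; only their text is kept.
        if self._parts is not None and data.strip():
            self._parts.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li" and self._parts is not None:
            self.ingredients.append(" ".join(self._parts))
            self._parts = None


# Hypothetical markup (assumption): one ingredient with a quantity, one
# without a quantity whose name is wrapped in an anchor tag.
SAMPLE = """
<ul>
  <li itemprop="ingredients"><span>2</span> <span>ripe avocados</span></li>
  <li itemprop="ingredients"><span><a href="#">Salt</a></span></li>
</ul>
"""

parser = IngredientParser()
parser.feed(SAMPLE)
print(parser.ingredients)
```

Joining the text fragments with single spaces means a missing quantity span simply contributes nothing, and anchor tags disappear because only text nodes are collected.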

Testing

Tested against multiple recipes including:

  • Simple recipes (classic guacamole)
  • Recipes with quantity-less ingredients ("Salt")
  • Recipes with linked ingredient names (anchor tags)
  • Single and multi-paragraph instruction sets
  • Recipes with and without serving size data

All 1030 tests pass.

Edge cases handled

  • Ingredients with no quantity parse correctly
  • Anchor tags within ingredient names are stripped to plain text
  • yields() returns None when serving size div is empty
  • image() falls back gracefully when no og:image tag exists
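
The empty-serving-size case can be sketched in isolation. The helper name parse_servings is hypothetical and takes plain text rather than a parsed element; it only mirrors the string handling in this PR's first revision of yields().

```python
from typing import Optional


def parse_servings(text: Optional[str]) -> Optional[str]:
    """Strip the "Servings:" label and return None when nothing is left."""
    if text is None:
        # Stands in for the case where the serving size div is missing.
        return None
    result = text.replace("Servings:", "").strip()
    # An empty string is falsy, so a label-only div also yields None.
    return result or None


print(parse_servings("Servings: 2 1/2 cups"))
print(parse_servings("Servings:"))
```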

Adds support for scraping recipes from rickbayless.com using custom
HTML parsing (no schema.org markup on the site). Extracts title,
description, ingredients, instructions, image, and yields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jknndy (Collaborator) left a comment

Hi @joeygerovac, thanks for the contribution! I left some comments throughout for your review.

Comment thread recipe_scrapers/rickbayless.py Outdated
Comment on lines +34 to +37
    def image(self):
        og_image = self.soup.find("meta", {"property": "og:image"})
        return og_image["content"] if og_image else None

Suggested change
-     def image(self):
-         og_image = self.soup.find("meta", {"property": "og:image"})
-         return og_image["content"] if og_image else None

The built-in best_image functionality returns the same result, so the custom code can be removed.

Comment thread recipe_scrapers/rickbayless.py Outdated
            return None
        text = servings.get_text(strip=True)
        result = text.replace("Servings:", "").strip()
        return result or None

Suggested change
-         return result or None
+         return get_yields(result)
+
+     def total_time(self):
+         raise FieldNotProvidedByWebsiteException(return_value=None)
+
+     def author(self):
+         raise StaticValueException(return_value="Rick Bayless")

yields: There is a built-in util that standardizes the yields value, which keeps the output consistent. There is a bug with this specific site that I'll be opening an issue for.
total_time: Since total_time is a mandatory field, we should raise FieldNotProvidedByWebsiteException to signal that this field isn't available from this site.
author: Another mandatory field, but this time we can raise StaticValueException with the author's name, since he appears to be the sole contributor to the site.
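
To make the raise-with-return_value pattern concrete, here is a self-contained sketch. The two exception classes below are simplified stand-ins that share only their names and the return_value keyword with the suggestion above; they are not the library's code. The resolve_static_values decorator and DemoScraper are hypothetical and only illustrate how a caller might turn such exceptions into plain values.

```python
import functools


class StaticValueException(Exception):
    """Stand-in: the value is hard-coded in the scraper, not read from the page."""

    def __init__(self, return_value):
        super().__init__(f"static value: {return_value!r}")
        self.return_value = return_value


class FieldNotProvidedByWebsiteException(StaticValueException):
    """Stand-in: the site does not publish this field (subclassing is an assumption)."""


def resolve_static_values(method):
    """Hypothetical wrapper that converts the exceptions above into return values."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except StaticValueException as exc:
            return exc.return_value

    return wrapper


class DemoScraper:
    @resolve_static_values
    def author(self):
        raise StaticValueException(return_value="Rick Bayless")

    @resolve_static_values
    def total_time(self):
        raise FieldNotProvidedByWebsiteException(return_value=None)


demo = DemoScraper()
print(demo.author(), demo.total_time())
```

Raising instead of returning keeps the distinction between "value parsed from the page", "value fixed by the scraper", and "field not offered by the site" visible to whatever code wraps the scraper methods.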

Comment thread recipe_scrapers/rickbayless.py Outdated
@@ -0,0 +1,44 @@
from ._abstract import AbstractScraper
from ._exceptions import StaticValueException

Suggested change
- from ._exceptions import StaticValueException
+ from ._exceptions import StaticValueException, FieldNotProvidedByWebsiteException
+ from ._utils import get_yields

@@ -0,0 +1,24 @@
{
"canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",

Suggested change
- "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",
+ "author": "Rick Bayless",
+ "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/",

"Mix in 2/3 of the diced mango. Taste and season with salt. If not using immediately, cover with plastic wrap pressed directly on the surface of the guacamole and refrigerate—preferably for no more than a few hours.",
"When you're ready to serve, scoop the guacamole into a serving bowl and garnish with the remaining diced mango and cilantro sprigs. Serve with tortilla chips, slices of cucumber or jicama."
],
"yields": "2 1/2 cups",

Suggested change
- "yields": "2 1/2 cups",
+ "yields": "2 cups",

],
"yields": "2 1/2 cups",
"description": "Recipe from Season 6, Mexico—One Plate at a Time",
"image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"

Suggested change
- "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"
+ "total_time": null,
+ "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png"

Use get_yields() for yield normalization, raise StaticValueException for
author, raise FieldNotProvidedByWebsiteException for total_time, and
remove the custom image method in favor of the built-in best_image plugin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>