Add scraper for rickbayless.com #1889
Adds support for scraping recipes from rickbayless.com using custom HTML parsing (no schema.org markup on the site). Extracts title, description, ingredients, instructions, image, and yields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jknndy left a comment
Hi @joeygerovac, thanks for the contribution! I left some comments throughout for your review.
| def image(self): | ||
| og_image = self.soup.find("meta", {"property": "og:image"}) | ||
| return og_image["content"] if og_image else None | ||
| def image(self): | |
| og_image = self.soup.find("meta", {"property": "og:image"}) | |
| return og_image["content"] if og_image else None |
The built-in best_image functionality (see here) returns the same result, so we can drop the custom code here.
| return None | ||
| text = servings.get_text(strip=True) | ||
| result = text.replace("Servings:", "").strip() | ||
| return result or None |
| return result or None | |
| return get_yields(result) | |
| def total_time(self): | |
| raise FieldNotProvidedByWebsiteException(return_value=None) | |
| def author(self): | |
| raise StaticValueException(return_value="Rick Bayless") |
yields: There is a built-in util, get_yields, that standardizes the yields output and keeps it consistent across scrapers. There is a bug with this specific site that I'll be opening an issue for.
total_time: Since total time is a mandatory field, we should raise FieldNotProvidedByWebsiteException to signal that this field isn't available from this site.
author: Another mandatory field, but this time we can raise StaticValueException with the author's name, as they appear to be the sole contributor to the site.
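A minimal sketch of the exception pattern suggested above, for readers unfamiliar with it. The two exception classes here are simplified local stand-ins that only mirror the names and the `return_value` keyword used in the suggestion; they are not the library's own implementations. `RickBaylessSketch` and `read_field` are hypothetical names introduced purely for illustration, and the assumption is that some caller unwraps `return_value` when either exception is raised.

```python
class FieldNotProvidedByWebsiteException(Exception):
    """Local stand-in: the site does not publish this field."""

    def __init__(self, return_value=None):
        super().__init__("field not provided by website")
        self.return_value = return_value


class StaticValueException(Exception):
    """Local stand-in: the value is a constant rather than scraped."""

    def __init__(self, return_value):
        super().__init__("static value")
        self.return_value = return_value


class RickBaylessSketch:
    """Hypothetical scraper showing only the two mandatory fields."""

    def author(self):
        # Single-author site, so the name is hard-coded.
        raise StaticValueException(return_value="Rick Bayless")

    def total_time(self):
        # The site exposes no total time.
        raise FieldNotProvidedByWebsiteException(return_value=None)


def read_field(method):
    """Hypothetical caller: call the method, unwrapping return_value
    if it signals a static or missing field."""
    try:
        return method()
    except (StaticValueException, FieldNotProvidedByWebsiteException) as exc:
        return exc.return_value


scraper = RickBaylessSketch()
author = read_field(scraper.author)
total_time = read_field(scraper.total_time)
```

Under these assumptions, `author` and `total_time` end up with the same values the suggested test data expects for "author" and "total_time".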
| @@ -0,0 +1,44 @@ | |||
| from ._abstract import AbstractScraper | |||
| from ._exceptions import StaticValueException | |||
| from ._exceptions import StaticValueException | |
| from ._exceptions import StaticValueException, FieldNotProvidedByWebsiteException | |
| from ._utils import get_yields |
| @@ -0,0 +1,24 @@ | |||
| { | |||
| "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/", | |||
| "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/", | |
| "author": "Rick Bayless", | |
| "canonical_url": "https://www.rickbayless.com/recipe/mango-guacamole/", |
| "Mix in 2/3 of the diced mango. Taste and season with salt. If not using immediately, cover with plastic wrap pressed directly on the surface of the guacamole and refrigerate—preferably for no more than a few hours.", | ||
| "When you're ready to serve, scoop the guacamole into a serving bowl and garnish with the remaining diced mango and cilantro sprigs. Serve with tortilla chips, slices of cucumber or jicama." | ||
| ], | ||
| "yields": "2 1/2 cups", |
| "yields": "2 1/2 cups", | |
| "yields": "2 cups", |
| ], | ||
| "yields": "2 1/2 cups", | ||
| "description": "Recipe from Season 6, Mexico—One Plate at a Time", | ||
| "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png" |
| "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png" | |
| "total_time": null, | |
| "image": "https://www.rickbayless.com/wp-content/uploads/2013/12/Screen-Shot-2014-04-01-at-12.12.04-PM.png" |
Use get_yields() for yield normalization, raise StaticValueException for author, raise FieldNotProvidedByWebsiteException for total_time, and remove the custom image method in favor of the built-in best_image plugin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add scraper for rickbayless.com
Adds support for scraping recipes from rickbayless.com.
Site structure
Rick Bayless's site uses custom HTML markup with no JSON-LD structured
data, so it requires a dedicated scraper. Recipe data is identified by
semantic class names:
<h1> inside div.page-header
div.recipe-description
<li itemprop="ingredients"> with separate spans for quantity/unit, name, and preparation notes
<p> tags inside div.recipe-instructions
Testing
Tested against multiple recipes including:
All 1030 tests pass.
Edge cases handled
yields() returns None when serving size div is empty
image() falls back gracefully when no og:image tag exists
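The empty-servings case can be sketched separately from the HTML parsing, following the string handling in the original yields() diff. `clean_servings` is a hypothetical helper name, and its input is assumed to be the text of the servings element, or None when the element is missing. This is an illustration of the edge case, not the scraper's actual method.

```python
from typing import Optional


def clean_servings(text: Optional[str]) -> Optional[str]:
    """Strip the "Servings:" label and return None when nothing is left."""
    if text is None:
        # Assumed to mean the servings element was not found at all.
        return None
    result = text.replace("Servings:", "").strip()
    # An empty string is falsy, so an empty div yields None.
    return result or None


with_value = clean_servings("Servings: 2 1/2 cups")
empty_label = clean_servings("Servings:")
missing = clean_servings(None)
```

Returning None instead of an empty string lets callers tell "no yield published" apart from a real value with a single check.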