Commit 4276746
apply the rest of non-suggestion feedback
metalwarrior665 committed Feb 5, 2025
1 parent ed1e9e0 commit 4276746
Showing 4 changed files with 18 additions and 20 deletions.
@@ -28,26 +28,26 @@ If the website has a `robots.txt` file, it often contains sitemap URLs.
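For example, a minimal sketch that pulls those sitemap URLs out of `robots.txt` (assuming Node.js 18+ with the global `fetch`; the helper name is ours):

```ts
// Fetch robots.txt and collect the URLs from its "Sitemap:" directives.
async function sitemapsFromRobots(baseUrl: string): Promise<string[]> {
    const response = await fetch(new URL('/robots.txt', baseUrl));
    if (!response.ok) return [];
    const text = await response.text();
    return text
        .split('\n')
        // The directive looks like "Sitemap: https://example.com/sitemap.xml" and is case-insensitive.
        .filter((line) => line.trim().toLowerCase().startsWith('sitemap:'))
        .map((line) => line.trim().slice('sitemap:'.length).trim());
}

// Usage: const urls = await sitemapsFromRobots('https://example.com');
```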

### Common URL paths

You can try to iterate over common URL paths like:

- /sitemap.xml
- /product_index.xml
- /product_template.xml
- /sitemap_index.xml
- /sitemaps/sitemap_index.xml
- /sitemap/product_index.xml
- /media/sitemap.xml
- /media/sitemap/sitemap.xml
- /media/sitemap/index.xml
You can check some common URL paths, such as the following:

/sitemap.xml
/product_index.xml
/product_template.xml
/sitemap_index.xml
/sitemaps/sitemap_index.xml
/sitemap/product_index.xml
/media/sitemap.xml
/media/sitemap/sitemap.xml
/media/sitemap/index.xml

Also make sure to test the list with `.gz`, `.tar.gz`, and `.tgz` extensions, as well as with capitalized variants (e.g. `/Sitemap_index.xml.tar.gz`).
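A hedged sketch of such a check (Node.js 18+ with the global `fetch`; the helper names are ours, and `paths` should be extended with the full list above):

```ts
// Probe common sitemap paths plus compressed and capitalized variants.
const paths = ['/sitemap.xml', '/sitemap_index.xml', '/sitemaps/sitemap_index.xml']; // extend with the full list
const extensions = ['', '.gz', '.tar.gz', '.tgz'];

// '/sitemap_index.xml' -> '/Sitemap_index.xml'
const capitalize = (path: string) =>
    path.replace(/^\/(\w)/, (_m: string, c: string) => `/${c.toUpperCase()}`);

async function findSitemaps(baseUrl: string): Promise<string[]> {
    const found: string[] = [];
    for (const path of paths) {
        for (const variant of new Set([path, capitalize(path)])) {
            for (const ext of extensions) {
                const url = new URL(`${variant}${ext}`, baseUrl).toString();
                // fetch() only rejects on network errors; a 404 resolves with ok === false.
                const response = await fetch(url, { method: 'HEAD' }).catch(() => null);
                if (response?.ok) found.push(url);
            }
        }
    }
    return found;
}
```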

Some websites also provide an HTML version of their sitemap to help indexing bots find new content. Those include:

- /sitemap
- /category-sitemap
- /sitemap.html
- /sitemap_index
/sitemap
/category-sitemap
/sitemap.html
/sitemap_index

Apify provides the [Sitemap Sniffer](https://apify.com/vaclavrut/sitemap-sniffer), an open-source actor that scans these URL variations for you automatically, so you don't have to check them manually.
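If you would rather trigger it from code, a minimal sketch using the [apify-client](https://docs.apify.com/api/client/js/) package could look like this (the `websiteUrl` input field is an assumption for illustration; check the actor's input schema for the real field names):

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Run the actor and wait for it to finish. The input key below is an
// assumption for illustration - consult the actor's input schema.
const run = await client.actor('vaclavrut/sitemap-sniffer').call({
    websiteUrl: 'https://example.com',
});

// A run's results are typically stored in its default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```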

@@ -9,7 +9,7 @@ slug: /advanced-web-scraping/crawling/crawling-with-search

In this lesson, we will start with a simpler example of scraping HTML-based websites with limited pagination.

Limited pagination is a common practice on e-commerce sites and is becoming more popular over time. It makes sense: a real user will never want to look through more than 200 pages of results – only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.
Limiting pagination is a common practice on e-commerce sites. It makes sense: a real user will never want to look through more than 200 pages of results – only bots love unlimited pagination. Fortunately, there are ways to overcome this limit while keeping our code clean and generic.
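One common technique, sketched below under assumed names, is to keep splitting a filter range (such as price) into halves until every slice returns fewer results than the cap; `countResults` stands in for a site-specific call that reads the result count for a filtered search:

```ts
// Assumed cap: 200 pages x 24 results per page. Adjust to the target site.
const MAX_RESULTS = 200 * 24;

type Range = [min: number, max: number];

// Recursively split [min, max] until each slice returns few enough results
// to be fully paginated. `countResults` is a site-specific call you provide.
async function collectRanges(
    countResults: (min: number, max: number) => Promise<number>,
    min: number,
    max: number,
): Promise<Range[]> {
    const count = await countResults(min, max);
    if (count <= MAX_RESULTS || min >= max) return [[min, max]];
    const mid = Math.floor((min + max) / 2);
    return [
        ...(await collectRanges(countResults, min, mid)),
        ...(await collectRanges(countResults, mid + 1, max)),
    ];
}
```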

![Pagination on a Google search results page](./images/pagination.png)

@@ -5,7 +5,7 @@
slug: /advanced-web-scraping/crawling/sitemaps-vs-search
---

The core crawling problem comes down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end as we did in the Web Scraping for Beginners course.
The core crawling problem comes down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end as we did in the [Web Scraping for Beginners course](/academy/web-scraping-for-beginners).

Unfortunately, _most modern websites restrict pagination_ only to somewhere between 1 and 10,000 products. Solving this problem might seem relatively straightforward at first, but there are multiple hurdles that we will explore in this lesson.

sources/academy/webscraping/advanced_web_scraping/index.md (4 changes: 1 addition & 3 deletions)
@@ -6,8 +6,6 @@ category: web scraping & automation
slug: /advanced-web-scraping
---

# Advanced web scraping

In the [Web scraping for beginners](/academy/web-scraping-for-beginners) course, we learned the basics required to create a scraper. In the following courses, we learned more about specific practices and techniques that will help us solve most of the problems we will face.

In this course, we will take all of that knowledge, add a few more advanced concepts, and apply them to learn how to build a production-ready web scraper.
@@ -16,7 +14,7 @@

To scrape large and complex websites, we need to scale two essential aspects of the scraper: crawling and data extraction. Big websites can have millions of pages, and the data we want to extract requires more sophisticated parsing techniques than just selecting elements by CSS selectors or using APIs as they are.

<!--
<!-- WIP: We want to split this into crawling and data extraction
The following sections will cover the core concepts that will ensure that your scraper is production-ready:
- The advanced crawling section will cover how to ensure we find all pages or products on the website.
- The advanced data extraction section will cover how to efficiently extract data from a particular page or API.
