The actor crawls the Amazon Best Sellers categories and extracts the 100 top-selling items in each. For every item, it extracts the name, price, URL, and thumbnail image.
By default, the actor extracts the 37 main Best Seller subcategories. A further level of crawl depth can be added, which also extracts the top selling items from the first level of the main categories' sub-divisions.
The actor can currently extract from the .com, .co.uk, .de, .fr, .es, and .it domains. If you would like support for another domain, please get in touch or edit the source code yourself.
- Researching retail trends
- Researching marketing trends
- The domain you want to extract
- Depth of crawl - how many subcategories you want to extract
- Proxy
{
"category": "Amazon.co.uk Best Sellers: The most popular items in Books",
"categoryUrl": "https://www.amazon.co.uk/Best-Sellers-Books/zgbs/books/ref=zg_bs_nav_0/261-6986927-7102013",
"items": {
"0": {
"name": "The Mirror and the Light (The Wolf Hall Trilogy)",
"price": "£15.49",
"url": "https://www.amazon.co.uk/Mirror-Light-Wolf-Hall-Trilogy/dp/0007480997/ref=zg_bs_books_1?_encoding=UTF8&psc=1&refRID=3PNZSWBH3A0H1QCWYPP6",
"thumbnail": "https://images-eu.ssl-images-amazon.com/images/I/91-UvTTh4lL._AC_UL200_SR200,200_.jpg"
}
}
}
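Because `items` is an object keyed by index rather than an array, consuming a record usually starts with sorting those keys numerically. The following is a minimal Python sketch, assuming a record shaped like the sample above (URLs shortened here). The `flatten_items` helper is hypothetical and not part of the actor, and treating key "0" as rank 1 is an assumption based on the sample.

```python
import json

# A record shaped like the sample output above (URLs shortened for brevity).
record_json = """
{
  "category": "Amazon.co.uk Best Sellers: The most popular items in Books",
  "categoryUrl": "https://www.amazon.co.uk/Best-Sellers-Books/zgbs/books",
  "items": {
    "0": {
      "name": "The Mirror and the Light (The Wolf Hall Trilogy)",
      "price": "£15.49",
      "url": "https://www.amazon.co.uk/Mirror-Light-Wolf-Hall-Trilogy/dp/0007480997",
      "thumbnail": "https://images-eu.ssl-images-amazon.com/images/I/91-UvTTh4lL._AC_UL200_SR200,200_.jpg"
    }
  }
}
"""


def flatten_items(record):
    """Return the items as a list ordered by numeric key, adding a 1-based rank.

    Assumption: key "0" is the top seller, so rank = key + 1.
    """
    items = record.get("items", {})
    ordered = sorted(items.items(), key=lambda kv: int(kv[0]))
    return [{"rank": int(key) + 1, **item} for key, item in ordered]


record = json.loads(record_json)
rows = flatten_items(record)
for row in rows:
    print(row["rank"], row["name"])
```

Sorting with `int(key)` matters once a category holds more than ten items, since a plain string sort would place "10" before "2".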
Please set Memory to at least 1024 MB so that the crawler has enough power to complete the task in a timely manner. If your plan allows, feel free to increase the memory allocation for more speed.
During the run, the actor will output messages notifying you of which page is being extracted. When the items are extracted, the actor will notify you that they are being saved.
Due to concurrent extraction of pages, these notifications may not be displayed in order.
In case of an error, the actor will complete its run immediately, without adding any data to the dataset.
When it is finished, the actor will display a "Crawl complete." message.
For every 100 pages scraped, the actor will consume 0.6 Compute Units. This means that with 1 Compute Unit, you can scrape roughly 166 pages.
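The arithmetic behind this estimate can be sketched in a few lines of Python. The rate of 0.6 Compute Units per 100 pages is taken from the text above; the function names are illustrative only, and actual consumption will vary with memory settings and page load times.

```python
CU_PER_100_PAGES = 0.6  # rate quoted in this README


def estimate_compute_units(pages):
    """Estimate the Compute Units consumed for a given number of pages."""
    return pages * CU_PER_100_PAGES / 100


def pages_per_compute_unit(units=1.0):
    """Estimate how many whole pages fit into the given number of Compute Units."""
    return int(units * 100 / CU_PER_100_PAGES)


print(estimate_compute_units(500))
print(pages_per_compute_unit())  # 166
```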
The Proxy configuration (proxyConfiguration) option enables you to set proxies that will be used by the scraper in order to prevent its detection by target websites. You can use both Apify Proxy and custom HTTP or SOCKS5 proxy servers.
The following table lists the available options of the proxy configuration setting:
Option | Description |
---|---|
None | The scraper will not use any proxies. All web pages will be loaded directly from IP addresses of Apify servers running on Amazon Web Services. |
Apify Proxy (automatic) | The scraper will load all web pages using Apify Proxy in the automatic mode. In this mode, the proxy uses all proxy groups that are available to the user, and for each new web page it automatically selects the proxy that hasn't been used in the longest time for the specific hostname, in order to reduce the chance of detection by the website. You can view the list of available proxy groups on the Proxy page in the app. |
Apify Proxy (selected groups) | The scraper will load all web pages using Apify Proxy with specific groups of target proxy servers. |
Custom proxies | The scraper will use a custom list of proxy servers. The proxies must be specified in the scheme://user:password@host:port format. |
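Before passing custom proxies to the actor, it can help to check that each URL matches the expected scheme://user:password@host:port shape. The Python sketch below uses only the standard library. The `is_valid_proxy_url` helper is hypothetical; it assumes that the scheme must be http or socks5 (the two proxy types this README mentions) and that a host and a port are both required, which is an inference from the format rather than a documented rule.

```python
from urllib.parse import urlsplit

# Assumption: only the two proxy types named in this README are accepted.
ALLOWED_SCHEMES = {"http", "socks5"}


def is_valid_proxy_url(url):
    """Return True if the URL looks like scheme://[user:password@]host:port."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return (
        parts.scheme in ALLOWED_SCHEMES
        and bool(parts.hostname)
        and port is not None
    )


candidates = [
    "http://user:password@proxy.example.com:8000",
    "socks5://proxy.example.com:1080",
    "ftp://proxy.example.com:21",
    "http://proxy.example.com",
]
for candidate in candidates:
    print(candidate, is_valid_proxy_url(candidate))
```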
The proxy configuration can be set programmatically when calling the actor using the API by setting the proxyConfiguration field. It accepts a JSON object with the following structure:
{
// Indicates whether to use Apify Proxy or not.
"useApifyProxy": Boolean,
// Array of Apify Proxy groups, only used if "useApifyProxy" is true.
// If missing or null, Apify Proxy will use the automatic mode.
"apifyProxyGroups": String[],
// Array of custom proxy URLs, in "scheme://user:password@host:port" format.
// If missing or null, custom proxies are not used.
"proxyUrls": String[],
}
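As an illustration of the structure above, the following Python sketch builds the three kinds of proxyConfiguration object and serializes one of them to JSON. The `build_proxy_configuration` helper and the group name "GROUP_A" are hypothetical placeholders, not real identifiers, and how the resulting object is sent to the API is left out on purpose.

```python
import json


def build_proxy_configuration(use_apify_proxy, apify_proxy_groups=None, proxy_urls=None):
    """Build a proxyConfiguration dict following the structure described above."""
    config = {"useApifyProxy": bool(use_apify_proxy)}
    # Groups only matter when Apify Proxy is enabled; omitting them means automatic mode.
    if use_apify_proxy and apify_proxy_groups:
        config["apifyProxyGroups"] = list(apify_proxy_groups)
    # Custom proxy URLs are omitted entirely when none are given.
    if proxy_urls:
        config["proxyUrls"] = list(proxy_urls)
    return config


automatic = build_proxy_configuration(True)
selected = build_proxy_configuration(True, apify_proxy_groups=["GROUP_A"])  # placeholder group
custom = build_proxy_configuration(
    False, proxy_urls=["http://user:password@proxy.example.com:8000"]
)

print(json.dumps({"proxyConfiguration": selected}, indent=2))
```

Leaving optional keys out, instead of setting them to null, keeps the payload minimal; per the comments in the structure above, a missing key and a null value are treated the same way.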
For more information on the Apify platform, Apify actors, and the Apify CLI, please consult the links below.