Web Scraping with Rust


Rust is rapidly gaining attention as a programming language that offers performance on par with C/C++, which makes it attractive for web scraping. However, unlike Python, which is relatively easy to learn but often trades away performance, Rust can be tricky to pick up.

That doesn't mean scraping with Rust is impossible or especially hard; it is only challenging if you don't know where to begin.

This article walks you through the process of writing a fast and efficient web scraper in Rust.

For a detailed explanation, see this blog post.

Installing and running Rust

Download rustup from the https://www.rust-lang.org/tools/install page. On Windows, download and run rustup-init.exe to install Rust.

On macOS and Linux, the same page shows a command that installs rustup. It will be similar to the following:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Run this command from the terminal and follow the prompts to install Rust.
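To confirm the toolchain installed correctly, check the versions of the compiler and the package manager:

$ rustc --version
$ cargo --version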

Rust web scraper for scraping book data

Setup

Open the terminal and run the following command to create a new project:

$ cargo new book_scraper
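Then change into the newly created project directory:

$ cd book_scraper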

Now, open the Cargo.toml file, and add the following lines:

[dependencies]
reqwest = { version = "0.11", features = ["blocking"] }
scraper = "0.13.0"
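The blocking feature of reqwest enables its synchronous API, which keeps this example simple, while scraper parses HTML and queries it with CSS selectors.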

Making an HTTP request

// main.rs
fn main() {
    let url = "https://books.toscrape.com/";
    // Send a blocking GET request for the page.
    let response = reqwest::blocking::get(url).expect("Could not load url.");
    // Read the response body into a string.
    let body = response.text().expect("No response body found.");
    println!("{}", body);
}
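The expect() calls abort the program with a message on failure. If you prefer propagating errors instead, here is a minimal sketch of the same request using the ? operator and a Result-returning main, assuming the same reqwest 0.11 blocking API:

// main.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://books.toscrape.com/";
    // `?` converts and propagates any network or decoding error.
    let response = reqwest::blocking::get(url)?;
    let body = response.text()?;
    println!("{}", body);
    Ok(())
}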

Parsing HTML with scraper

Open https://books.toscrape.com/ in Chrome and examine the HTML markup of the web page.

[Screenshot: HTML source of books.toscrape.com]

Each book is wrapped in an article element with the class product_pod. Parse the response body into a document and create a selector for those elements:

use scraper::{Html, Selector};
// ...
// Parse the response body into an HTML tree.
let document = Html::parse_document(&body);
// A CSS selector that matches one <article> per book.
let book_selector = Selector::parse("article.product_pod").unwrap();

Now the selector is ready to be used. Add the following lines to the main function:

for element in document.select(&book_selector) {
    // more code here
}
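As a quick sanity check, you can print the raw markup of each match before extracting anything; scraper's ElementRef exposes the matched fragment via html():

for element in document.select(&book_selector) {
    // Print the HTML of each matched <article class="product_pod">.
    println!("{}", element.html());
}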

Extracting product description

The book name and price come from two more selectors, defined before the loop:

let book_name_selector = Selector::parse("h3 a").unwrap();
let book_price_selector = Selector::parse(".price_color").unwrap();

Inside the loop, extract both values from each book element:

for element in document.select(&book_selector) {
    // The <a> tag inside <h3> carries the full title in its title attribute.
    let book_name_element = element.select(&book_name_selector).next().expect("Could not select book name.");
    let book_name = book_name_element.value().attr("title").expect("Could not find title attribute.");
    // The price is the text content of the .price_color element.
    let price_element = element.select(&book_price_selector).next().expect("Could not find price.");
    let price = price_element.text().collect::<String>();
    println!("{:?} - {:?}", book_name, price);
}
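The collected price is a plain string such as "£51.77". If you need it as a number, one sketch, assuming the text is a currency symbol followed by digits, is to strip the leading non-digit characters and parse the remainder:

// Strip any leading non-digit characters (the currency symbol) and parse.
let numeric_price: f64 = price
    .trim()
    .trim_start_matches(|c: char| !c.is_ascii_digit())
    .parse()
    .expect("Could not parse price.");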

Extracting product links

Within the for loop, add the following lines:

let book_link_element = element.select(&book_name_selector).next().expect("Could not find book link element.");
let book_link = book_link_element.value().attr("href").expect("Could not find href attribute.");
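Note that the href values on this page are relative paths (for example, catalogue/...). If you need absolute URLs, a simple sketch is to join them with the base address already stored in url:

// The base URL ends with "/", so plain concatenation yields an absolute URL.
let absolute_link = format!("{}{}", url, book_link);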

Writing scraped data to a CSV file

First, add the following to Cargo.toml dependencies:

csv = "1.1"

Update main.rs as follows:

// main.rs
use scraper::{Html, Selector};
fn main() {
    let url = "https://books.toscrape.com/";
    let response = reqwest::blocking::get(url).expect("Could not load url.");
    let body = response.text().expect("No response body found.");
    let document = Html::parse_document(&body);
    let book_selector = Selector::parse("article.product_pod").expect("Could not create selector.");
    let book_name_selector = Selector::parse("h3 a").expect("Could not create selector.");
    let book_price_selector = Selector::parse(".price_color").expect("Could not create selector.");
    let mut wtr = csv::Writer::from_path("books.csv").expect("Could not create file.");
    wtr.write_record(&["Book Name", "Price", "Link"])
        .expect("Could not write header.");
    for element in document.select(&book_selector) {
        let book_name_element = element
            .select(&book_name_selector)
            .next()
            .expect("Could not select book name.");
        let book_name = book_name_element
            .value()
            .attr("title")
            .expect("Could not find title attribute.");
    let price_element = element
            .select(&book_price_selector)
            .next()
            .expect("Could not find price.");
        let price = price_element.text().collect::<String>();
        let book_link_element = element
            .select(&book_name_selector)
            .next()
            .expect("Could not find book link element.");
        let book_link = book_link_element
            .value()
            .attr("href")
            .expect("Could not find href attribute.");
        wtr.write_record([book_name, price.as_str(), book_link])
            .expect("Could not write record.");
    }
    wtr.flush().expect("Could not flush file.");
    println!("Done");
}

Running the code

Enter the following command to run the code:

$ cargo run
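If everything compiles, the scraper prints Done and writes a books.csv file with one row per book on the first page. The output should look roughly like this (actual titles and prices come from the live page):

Book Name,Price,Link
A Light in the Attic,£51.77,catalogue/a-light-in-the-attic_1000/index.html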

If you wish to find out more about web scraping with Rust, see our blog post.
