Rust is rapidly gaining attention as a programming language that offers performance on par with C/C++, which makes it an appealing choice for web scraping. However, unlike Python, which is relatively easy to learn (often at the cost of performance), Rust can be tricky to figure out.
That doesn't mean scraping with Rust is impossible or extremely hard. It is only challenging if you don't know how to begin.
This article will give you an overview of the process of writing a fast and efficient web scraper in Rust.
For a detailed explanation, see this blog post.
Download rustup from the https://www.rust-lang.org/tools/install page. On Windows, download RUSTUP-INIT and run it to install Rust.
On macOS and Linux, the same page will show you the command to install rustup. The command will be similar to the following:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Run this command from the terminal and follow the prompts to install Rust.
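Once the installation completes, you can verify that the Rust toolchain is available (the exact version printed will vary):
$ cargo --version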
Open the terminal and run the following command to initialize an empty project:
$ cargo new book_scraper
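cargo new creates a directory named after the project, so switch into it before continuing:
$ cd book_scraper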
Now, open the Cargo.toml file and add the following lines:
[dependencies]
reqwest = {version = "0.11", features = ["blocking"]}
scraper = "0.13.0"
The reqwest crate handles HTTP requests, with the blocking feature enabling a simple synchronous API that needs no async runtime, and the scraper crate parses HTML and queries it with CSS selectors.
Then, replace the contents of src/main.rs with the following code, which downloads the page and prints the raw HTML:
// main.rs
fn main() {
    let url = "https://books.toscrape.com/";
    // Send a blocking GET request; panic with a readable message on failure.
    let response = reqwest::blocking::get(url).expect("Could not load url.");
    // Read the entire response body into a string.
    let body = response.text().expect("No response body found.");
    println!("{}", body);
}
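At this point, you can already run the project to confirm that the download works; it should print the HTML of the page to the terminal:
$ cargo run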
Open https://books.toscrape.com/ in Chrome and examine the HTML markup of the web page. You will notice that every book on the page is wrapped in an article element with the class product_pod.
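A simplified sketch of the markup for a single book looks roughly like this (the elided parts and abbreviations are ours; inspect the live page for the exact structure):
<article class="product_pod">
  ...
  <h3><a href="catalogue/..." title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price">
    <p class="price_color">£51.77</p>
    ...
  </div>
</article>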
Back in main.rs, add the scraper imports at the top of the file:
use scraper::{Html, Selector};
Then, inside main, parse the downloaded HTML into a document and create a selector that matches those article elements:
// Parse the raw HTML string into a queryable document tree.
let document = Html::parse_document(&body);
// CSS selector matching the container element of each book.
let book_selector = Selector::parse("article.product_pod").expect("Could not create selector.");
Now the selector is ready to be used. Add the following lines to the main function:
for element in document.select(&book_selector) {
    // more code here
}
Extracting the book name and price
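The loop body needs two more selectors: one for the link inside each book's h3 heading, whose title attribute holds the full book name, and one for the price element. Define them next to book_selector, outside the loop:
let book_name_selector = Selector::parse("h3 a").expect("Could not create selector.");
let book_price_selector = Selector::parse(".price_color").expect("Could not create selector.");
With these in place, extract the name and price inside the loop: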
for element in document.select(&book_selector) {
    // The full book name is stored in the title attribute of the link.
    let book_name_element = element.select(&book_name_selector).next().expect("Could not select book name.");
    let book_name = book_name_element.value().attr("title").expect("Could not find title attribute.");
    // The price is the text content of the .price_color element.
    let price_element = element.select(&book_price_selector).next().expect("Could not find price.");
    let price = price_element.text().collect::<String>();
    println!("{:?} - {:?}", book_name, price);
}
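Run cargo run again and you should see output along these lines (titles and prices come from the live site and may change):
"A Light in the Attic" - "£51.77"
"Tipping the Velvet" - "£53.74"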
Extracting product links
Within the for loop, add the following lines. The same link element that carries the book name also carries the URL, so book_name_selector can be reused:
let book_link_element = element.select(&book_name_selector).next().expect("Could not find book link element.");
let book_link = book_link_element.value().attr("href").expect("Could not find href attribute.");
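Note that the href values on this site are relative (they start with catalogue/). If you need absolute URLs, one simple option is to prepend the base URL; the absolute_link variable below is only an illustration and is not used in the final listing:
// Illustrative only: build an absolute URL from the base url and the relative href.
let absolute_link = format!("{}{}", url, book_link);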
To save the scraped data to a CSV file, first add the csv crate to the [dependencies] section of Cargo.toml:
csv = "1.1"
Then update main.rs as follows:
// main.rs
use scraper::{Html, Selector};

fn main() {
    let url = "https://books.toscrape.com/";
    // Download the page.
    let response = reqwest::blocking::get(url).expect("Could not load url.");
    let body = response.text().expect("No response body found.");

    // Parse the HTML and prepare the selectors.
    let document = Html::parse_document(&body);
    let book_selector = Selector::parse("article.product_pod").expect("Could not create selector.");
    let book_name_selector = Selector::parse("h3 a").expect("Could not create selector.");
    let book_price_selector = Selector::parse(".price_color").expect("Could not create selector.");

    // Create the CSV file and write the header row.
    let mut wtr = csv::Writer::from_path("books.csv").expect("Could not create file.");
    wtr.write_record(&["Book Name", "Price", "Link"])
        .expect("Could not write header.");

    for element in document.select(&book_selector) {
        let book_name_element = element
            .select(&book_name_selector)
            .next()
            .expect("Could not select book name.");
        let book_name = book_name_element
            .value()
            .attr("title")
            .expect("Could not find title attribute.");
        let price_element = element
            .select(&book_price_selector)
            .next()
            .expect("Could not find price.");
        let price = price_element.text().collect::<String>();
        let book_link_element = element
            .select(&book_name_selector)
            .next()
            .expect("Could not find book link element.");
        let book_link = book_link_element
            .value()
            .attr("href")
            .expect("Could not find href attribute.");
        // Write one row per book.
        wtr.write_record([book_name, &price, book_link])
            .expect("Could not write record.");
    }
    wtr.flush().expect("Could not flush file.");
    println!("Done");
}
Enter the following command to run the code:
$ cargo run
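When the program finishes, books.csv in the project directory should contain one row per book, similar to the following (values are taken from the live site and may change):
Book Name,Price,Link
A Light in the Attic,£51.77,catalogue/a-light-in-the-attic_1000/index.html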
If you wish to find out more about web scraping with Rust, see our blog post.