Commit 2b809b6

adding web scraping with rvest in Rmd
1 parent e80d727 commit 2b809b6

2 files changed

+265
-0
lines changed
@@ -0,0 +1,73 @@
---
title: "Scientific Reports Editors"
author: "Ben Best"
date: "November 4, 2015"
output:
  html_document:
    toc: yes
    toc_depth: 2
  pdf_document:
    toc: yes
  word_document: default
---

# Introduction

The goal is to find relevant editors in this [Scientific Reports list of editors](http://www.nature.com/srep/about/editorial-board).

# Methods
Scrape the editor names and keep the environmental and evolutionary groups:

```{r}
library(dplyr)   # data.frame manipulation, %>%
library(stringr) # string manipulation for R
library(readr)   # read and write nicely
library(rvest)   # web scraping
library(DT)      # data table
#library(scholar)
#source('GScholarScraper_3.2.R')

url = 'http://www.nature.com/srep/about/editorial-board'

# get editor names (bold text on the page) and number them in page order
a = read_html(url) %>%
  html_nodes('b') %>%
  html_text() %>%
  data.frame(name = ., stringsAsFactors = FALSE) %>%
  mutate(
    i = row_number())

# get groups: from the first name of a section up to,
# but not including, the first name of the next section
a_env = a %>%
  filter(
    i >= i[name=='Amir AghaKouchak'],
    i <  i[name=='Venugopal Achanta']) %>%
  mutate(group='env')
a_evo = a %>%
  filter(
    i >= i[name=='Arhat Abzhanov'],
    i <  i[name=='Vineet Ahuja']) %>%
  mutate(group='evo') # was 'eco'; 'evo' matches a_evo and the csv name

# bind environmental and evolutionary
a = bind_rows(
  a_env,
  a_evo) %>%
  mutate(
    # split at the first space: first = first word, last = the rest
    first = str_replace(name, '^(.*?) (.*)$', '\\1'),
    last  = str_replace(name, '^(.*?) (.*)$', '\\2')
  )
#write_csv(a, 'scientific-reports_authors-env-evo.csv')
```
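
The slicing step in the Methods chunk depends on each section of the editorial board starting at a known name. A minimal sketch of that logic on made-up data may help when checking it offline. The names in `a_toy` and the helper `slice_section` are hypothetical and not part of the commit.

```r
library(dplyr)

# Hypothetical stand-in for the scraped <b> text; real names come from the page
a_toy <- data.frame(
  name = c("Ann Alpha", "Bob Beta", "Cy Gamma", "Di Delta", "Ed Van Epsilon"),
  stringsAsFactors = FALSE
) %>%
  mutate(i = row_number())

# Hypothetical helper: rows from the `start` name up to, not including, `end`
slice_section <- function(df, start, end, label) {
  lo <- df$i[df$name == start]
  hi <- df$i[df$name == end]
  # fail early if a boundary name is missing or appears more than once
  stopifnot(length(lo) == 1, length(hi) == 1)
  df %>%
    filter(i >= lo, i < hi) %>%
    mutate(group = label)
}

toy_env <- slice_section(a_toy, "Bob Beta", "Di Delta", "env")
toy_env$name
```

The explicit length check is a design choice for this sketch: if the page changes and a boundary name disappears, `i[name == ...]` becomes zero-length, and stopping with a clear error is easier to debug than a failed or empty filter.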

# Results

```{r out, echo=FALSE}
datatable(a)
#library(xtable)
#xtable(a[1:3,1:10])
```
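
The front matter also lists `pdf_document` and `word_document` outputs, while `datatable()` produces an HTML widget. One possible approach, sketched here as an assumption and not as part of the commit, is to fall back to a static table for non-HTML formats. `show_table` is a hypothetical helper; it relies on `knitr::is_html_output()` and `knitr::kable()`.

```r
library(knitr)

# Hypothetical helper: interactive table for HTML output, static table otherwise
show_table <- function(df, html = knitr::is_html_output()) {
  if (html) {
    DT::datatable(df)   # HTML widget, needs the DT package
  } else {
    knitr::kable(df)    # static table that also works for PDF and Word
  }
}

# Toy data standing in for the scraped editors table
toy <- data.frame(name = c("Ann Alpha", "Bob Beta"), group = c("env", "evo"))
tbl <- show_table(toy, html = FALSE)
```

Inside the Results chunk this would be called as `show_table(a)`, letting knitr decide the branch from the format being rendered.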

## Fancy Plot

web_scraping/scientific_reports_editors.html

+192
Large diffs are not rendered by default.