
Commit 8e4ac8c

Add files for package data
1 parent 61319ad commit 8e4ac8c

2 files changed (+21, -0 lines)

R/data.R

Lines changed: 10 additions & 0 deletions
```r
#' Yahoo News summaries from 2014
#'
#' A corpus object containing 2,000 news summaries collected from Yahoo News via
#' RSS feeds in 2014.
#' @name data_corpus_news2014
#' @references Watanabe, K. (2018). Newsmap: A semi-supervised approach to
#' geographical news classification. Digital Journalism, 6(3), 294–309.
#' https://doi.org/10.1080/21670811.2017.1293487
#' @source <https://www.yahoo.com/news/>
"data_corpus_news2014"
```

inst/data.R

Lines changed: 11 additions & 0 deletions
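```r
# Build script for data_corpus_news2014.
library(quanteda)

# Raw Yahoo News data collected via RSS feeds in 2014.
dat <- readRDS('~/yahoo-news.RDS')

# Join the head (headline) and body fields into one text, then drop body.
dat$text <- paste0(dat$head, ". ", dat$body)
dat$body <- NULL
corp <- corpus(dat, text_field = 'text')

# Draw a reproducible random sample of documents.
set.seed(1234)
data_corpus_news2014 <- corpus_sample(corp, 20000)

# Save the object into the package's data/ directory.
usethis::use_data(data_corpus_news2014)
```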
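The steps in inst/data.R can be tried without the raw ~/yahoo-news.RDS file. The sketch below is a minimal illustration, not the author's procedure: it assumes only the `head` and `body` column names used by the build script, runs the same steps on a hypothetical three-row data frame, samples 2 documents instead of 20000, and stands in for `usethis::use_data()` with a base `save()` call using bzip2 compression into a temporary directory. The names `corp_sample`, `path` and `loaded` are illustrative.

```r
library(quanteda)

# Hypothetical toy stand-in for the raw data read from ~/yahoo-news.RDS;
# only the `head` and `body` column names are taken from inst/data.R.
dat <- data.frame(
  head = c("Quake hits Japan", "France votes", "Floods in India"),
  body = c("A strong earthquake struck off the coast",
           "Voters went to the polls on Sunday",
           "Heavy rain displaced thousands"),
  stringsAsFactors = FALSE
)

# Same steps as inst/data.R: join headline and body, drop the body column,
# and build a corpus whose remaining column (head) becomes a docvar.
dat$text <- paste0(dat$head, ". ", dat$body)
dat$body <- NULL
corp <- corpus(dat, text_field = "text")

# Reproducible random sample (2 of 3 documents here, not 20000).
set.seed(1234)
corp_sample <- corpus_sample(corp, 2)

# Approximation of usethis::use_data(): save() the object under its own
# name to an .rda file, here in a temporary directory, then load it back.
path <- file.path(tempdir(), "corp_sample.rda")
save(corp_sample, file = path, compress = "bzip2")
loaded <- new.env()
load(path, envir = loaded)
```

Because the seed is fixed before `corpus_sample()`, rerunning the script selects the same documents, which is what makes the packaged sample reproducible.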
