Release 1.0.0 #40

Merged
merged 150 commits into from
Sep 10, 2024
Changes from 141 commits
99598c3
encode colons in plos search
jeanetteclark May 5, 2021
6bab7ad
Merge branch 'main' into develop
mbjones May 28, 2021
8298dbb
Fixed Jeanette's email address.
mbjones May 28, 2021
58a970a
Add .jpg for logo. From Wikimedia Commons, 17th century pottery, man …
theamarks Mar 15, 2022
e365cf8
Create R script to build logo
theamarks Mar 15, 2022
38a59fe
Added required packages to create hexsticker
theamarks Mar 16, 2022
a9ca649
Added basic hexSticker arguments. Successful creation of hex sticker
theamarks Mar 16, 2022
d45dd0d
Tweaked hexsticker. Dark grey color theme, package title position, re…
theamarks Mar 16, 2022
61af61e
Changed package title font. Used google font
theamarks Mar 16, 2022
a96e00d
Tried different fonts - handwriting style
theamarks Mar 16, 2022
be7452b
narrowed down font choices. Google fonts - 'Architects Daughter' and …
theamarks Mar 16, 2022
1534d22
1st draft of logo complete - Architects Daughter font
theamarks Mar 16, 2022
c345359
Add 2 hex sticker logos .png files
theamarks Mar 17, 2022
ad6ee66
Add package test to logo code
theamarks Mar 28, 2022
d60f474
deleted runner up logo .png
theamarks Mar 28, 2022
8c26c0e
created man/figures folder moved logo source .png to folder
theamarks Mar 28, 2022
9973792
changed logo location to man/figures
theamarks Mar 28, 2022
ee1cdbc
added code skeleton to build favicon
theamarks Mar 28, 2022
b05f476
simplified logo file name
theamarks Mar 28, 2022
45ac224
changed logo folder name to lower case
theamarks Mar 28, 2022
27bdf5a
Added logo R code to .Rbuildignore
theamarks Mar 28, 2022
91d787b
Add logo folder to .Rbuildignore not code script only
theamarks Mar 28, 2022
2292abb
changed logo save location to man/figures
theamarks Mar 28, 2022
76f8849
update code to read logo source image in from man/figures
theamarks Mar 28, 2022
4f003e2
updated logo code with image without a background. Darkened border g…
theamarks Mar 29, 2022
79cbf61
logo file V2
theamarks Mar 29, 2022
70c7802
cleaned up code for git conflict
theamarks Mar 29, 2022
ccd6feb
removed unused google font
theamarks Mar 29, 2022
57a93ab
Created logo with backgroundless image. Cleaner cuts than previous ve…
theamarks Mar 29, 2022
67eb146
Added image source url and information in script
theamarks Mar 30, 2022
e3ac259
Merge pull request #30 from theamarks/develop
mbjones Mar 30, 2022
c498fd0
create new citation_search_xDD function script. Copied from citation_…
theamarks Apr 1, 2022
1491e01
remove get_key function. xDD does not require API key
theamarks Apr 1, 2022
8c14ca2
temporarily removed API throttle message until known if xDD has a tro…
theamarks Apr 1, 2022
91679c6
Added xDD API 'snippets' URL to search 'term' parameter in documents …
theamarks Apr 1, 2022
335e972
create working example code outside of function
theamarks Apr 1, 2022
4145cf8
rename results dataframe
theamarks Apr 1, 2022
d069cec
extract the number of citations (hits) for a single search result
theamarks Apr 1, 2022
8575159
pull number of citations (hits), article DOI, and title from results …
theamarks Apr 1, 2022
7fa1d60
updated returned dataframe code. Not working in for loop. Check how o…
theamarks Apr 1, 2022
ea2b4e3
Next try using example of xDD results extraction found on github
theamarks Apr 2, 2022
808f4df
successfully extracted doc info from xDD API results
theamarks Apr 4, 2022
20cdedb
not working for multiple identifiers input. Need to deal with null res…
theamarks Apr 4, 2022
8e10390
restructured adding results to data.frame. Function not working
theamarks Apr 4, 2022
99ffcad
not working. error: 'list' object cannot be coerced to type 'double'.…
theamarks Apr 4, 2022
d841830
fixed else if () statement to run clean with example dois. error now …
theamarks Apr 5, 2022
b9c82dc
Updated if() for 0 returned results. Full function running clean. Nee…
theamarks Apr 5, 2022
1500e6e
added to else printout statement if function fails
theamarks Apr 5, 2022
80e18e6
Add xDD to sources in citation_search() function
theamarks Apr 5, 2022
76541dc
updated xDD function documentation
theamarks Apr 5, 2022
d394a80
Added xDD to citation_search function
theamarks Apr 5, 2022
c0f6407
modified xDD API URL input to search beyond xDD full-text holdings de…
theamarks Apr 5, 2022
a867c21
updated XDD function Roxygen2 citation
theamarks Apr 7, 2022
e387a5e
update package description and namespace with `usethis::use_testthat(…
theamarks Apr 7, 2022
db63efb
Added xDD as source to test-sycthe.R
theamarks Apr 7, 2022
0e5c6d3
Added xDD documentation with devtools::document()
theamarks Apr 11, 2022
35cd676
Add data citation only found by xDD to test-citations.csv. (dataset d…
theamarks Apr 12, 2022
245ccf9
deleted sys.sleep() in xDD function. Not needed
theamarks Apr 13, 2022
09ecd7c
remove unnecessary else() statement in citation_search_xDD()
theamarks Apr 13, 2022
6b849fb
Added dataset doi to test identifiers that produced 2 results in xDD
theamarks Apr 13, 2022
9d8e0c0
renamed xDD to xdd (lowercase) across functions and filenames. Congru…
theamarks Apr 13, 2022
51da109
Merge pull request #34 from theamarks/develop
jeanetteclark Apr 14, 2022
eb6c69f
add result column for journal title that the citation was found in. x…
theamarks Apr 25, 2022
577b0d3
added citation source column ("xdd") and corrected a couple upper cas…
theamarks Apr 25, 2022
1de9e29
convert uppercase xDD to lowercase xdd within function
theamarks Apr 25, 2022
0463f36
add source column to citation_search_plos() output
theamarks Apr 26, 2022
f2f68ed
added source column in output of citation_search_scopus()
theamarks Apr 26, 2022
0f4c2a2
add source output column to citation_search_springer() function
theamarks Apr 26, 2022
8b04bb6
removed NA results from citation_search_xdd()
theamarks Apr 26, 2022
d59fa61
Updated test-citation.csv to lowercase xdd. Was filtering out in tests.
theamarks Apr 27, 2022
6950db3
Working on new test-sycthe.R test. Replacing pmap() test on all citat…
theamarks Apr 27, 2022
8074b9a
changed output column name to source in plos search function
theamarks Apr 27, 2022
0a76e0c
renamed source output in plos search function
theamarks Apr 27, 2022
a69ef60
added underscore to output source
theamarks Apr 27, 2022
67c1f42
Added test citation to test-citiatons.csv only found on springer, tha…
theamarks Apr 29, 2022
5c468b5
clean up test file
theamarks Apr 29, 2022
6fd9962
reordered plos search output to align with other search functions
theamarks Apr 29, 2022
76486b6
reordered xdd search output columns to align with other search functions
theamarks Apr 29, 2022
98dac56
delete empty data.frame. Do not need to construct new data.frame for …
theamarks Apr 30, 2022
618a1b8
reorganize scopus search function. Works with single citation search …
theamarks Apr 30, 2022
8c22b2e
convert dplyr filter function to base r in testing script
theamarks Apr 30, 2022
b54c44f
corrected scopus search function when adding source column
theamarks May 2, 2022
60804b6
fixed how source column is added to springer search function
theamarks May 2, 2022
64a7849
Added missing ) to springer search function
theamarks May 2, 2022
0378203
removed unnecessary result dataframe column rename for source in plos s…
theamarks May 2, 2022
ee8926b
Fixed error (column names not match) in plos search function rbind.
theamarks May 2, 2022
4aca20e
corrected typo in scopus search function
theamarks May 2, 2022
e08c2dd
changed source column name in scopus search function to align with ot…
theamarks May 6, 2022
c184149
reordered xdd search function internal dataframe columns to match wit…
theamarks May 6, 2022
c16dcb9
change springer search function's column order to align with other so…
theamarks May 6, 2022
9a7014a
reordering xdd results to align with other functions
theamarks May 6, 2022
d17e3bb
Created test for multiple dois in citation_search()
theamarks May 6, 2022
415182e
Wrote out expectations for each individual source search function. On…
theamarks May 6, 2022
71f41a8
removed purr package and changed filter method to dplyr
theamarks May 6, 2022
f7b23d4
Remove single citation tests from test-scythe.R Creating more test sc…
theamarks May 9, 2022
f28b874
Create internal helper function for testing single DOIs across indivi…
theamarks May 9, 2022
1abc100
break test helper functions into two separate functions
theamarks May 9, 2022
afe134d
Fixed citation_test_doi() function to pull dataframe value not select…
theamarks May 10, 2022
2c6c514
Created springer specific test script to test if single known doi cit…
theamarks May 10, 2022
0635335
commented out citation_test_doi() and removed unused line
theamarks May 10, 2022
1979d70
Add api key to citation_test_doi() and change springer test to expect…
theamarks May 10, 2022
7b499a2
added clarification to citation_test_doi() api message if API key not…
theamarks May 10, 2022
77ed5da
Created single doi test script for plos
theamarks May 10, 2022
4799e6c
Add test-xdd script and move test-plos to test folder
theamarks May 10, 2022
929f7da
Added test-scopus and changed all library expect terms to expect_equa…
theamarks May 10, 2022
25ab230
Added doi filter into scythe test script. Pass test()
theamarks May 10, 2022
aecb656
updated xdd documentation .rd with lowercase name
theamarks May 10, 2022
8e2e4ba
Generated documentation for helper test functions. NOT exported (user…
theamarks May 10, 2022
ec02232
Created 3rd internal helper function to pull in API keys or to stop t…
theamarks May 10, 2022
ab1af75
updated helper function documentation
theamarks May 10, 2022
34d67a1
trying to account for NULL values in single doi test with API key not…
theamarks May 10, 2022
e8abf2b
Trying to add NULL stop to test-scythe to stop test if API not availa…
theamarks May 10, 2022
40d0479
Added logic to springer and scopus single doi tests to deal with NULL…
theamarks May 10, 2022
7b0c15a
Move citation_test_doi helper function script to test folder
theamarks May 10, 2022
f5b8021
fixed null message in test-scythe
theamarks May 10, 2022
fbaa0a8
switched `library` parameter to `source` for clarity
theamarks May 10, 2022
a79d85d
updated roxygen2 documentation for helper functions
theamarks May 10, 2022
8d6f8a6
moved citation_test_doi.R back to /R, helper functions not accessible…
theamarks May 11, 2022
9c70128
replace get_test_doi() parameter back to library. source messes with …
theamarks May 11, 2022
2a68874
function documentation updated
theamarks May 11, 2022
156bb4f
Renamed citation_test_doi.R to helper-test-functions.R and moved to /…
theamarks May 11, 2022
97296f1
update roxygen2 documentation
theamarks May 11, 2022
9127272
Merge pull request #36 from theamarks/develop
jeanetteclark May 19, 2022
af78707
reformatted all scripts within scythe package
theamarks May 19, 2022
89436af
Merge branch 'DataONEorg:develop' into develop
theamarks Jun 6, 2022
49a9aae
added @export to xdd search function
theamarks Jun 6, 2022
4fc2865
Merge branch 'develop' of https://github.com/theamarks/scythe into de…
theamarks Jun 6, 2022
d814a4a
Merge pull request #37 from theamarks/develop
jeanetteclark Jun 6, 2022
59bb194
fix a couple small bugs, one where the total number of results can be…
jeanetteclark Jul 15, 2022
5990d63
housekeeping
jeanetteclark Jul 15, 2022
af408d8
remove rplos dependency
jeanetteclark Aug 6, 2024
72879a5
update setup r GHA
jeanetteclark Aug 6, 2024
e2310ae
remove bash script
jeanetteclark Aug 6, 2024
4051e99
remove message, not needed and too loud
jeanetteclark Aug 15, 2024
a199ab7
make the write citations function more robust
jeanetteclark Aug 15, 2024
a3b1ab5
ignore data folder
jeanetteclark Aug 15, 2024
b1a5a6b
depend on devel version of bib2df for now
jeanetteclark Aug 15, 2024
0661ef5
cordon off the bib2df package into suggests
jeanetteclark Aug 20, 2024
1a44a8a
add optional arg to write_citation_pairs to pass to bib2df
jeanetteclark Aug 26, 2024
e524968
skip scopus and springer tests if no key is set
jeanetteclark Sep 6, 2024
48e979c
update readme and description to prepare for release
jeanetteclark Sep 6, 2024
9c87b4e
make argument checking better, and refactor how the source functions …
jeanetteclark Sep 9, 2024
6115374
fix indentation
jeanetteclark Sep 9, 2024
ff179ed
styling and remove unnecessary lapply
jeanetteclark Sep 9, 2024
a0fd521
formatting
jeanetteclark Sep 9, 2024
0387870
add helper report estimated wait function
jeanetteclark Sep 9, 2024
81244e3
update docs
jeanetteclark Sep 9, 2024
fec558e
assign wait seconds early for easier time changing it if needed
jeanetteclark Sep 9, 2024
816395b
remove unnecessary cleanup
jeanetteclark Sep 9, 2024
5d5e063
add DOI and fix minor issue in springer search
jeanetteclark Sep 10, 2024
3 changes: 3 additions & 0 deletions .Rbuildignore
@@ -6,3 +6,6 @@
 ^CONTRIBUTING\.md$
 ^api-scopus-search.sh$
 ^\.github$
+^logo$
+^data$
+^results$
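A quick way to see what the three new patterns cover is to test them against a few paths. This is a minimal sketch, assuming the usual `R CMD build` behaviour of matching each `.Rbuildignore` line as a case-insensitive Perl-style regular expression against paths relative to the package root. The `paths` vector is hypothetical and only for illustration.

```r
# Patterns taken from the .Rbuildignore additions above.
patterns <- c("^logo$", "^data$", "^results$")

# Hypothetical top-level paths, used only for illustration.
paths <- c("logo", "data", "data-raw", "results", "R/data.R")

# TRUE when at least one pattern matches the path.
ignored <- vapply(
  paths,
  function(p) {
    any(vapply(patterns, grepl, logical(1),
               x = p, ignore.case = TRUE, perl = TRUE))
  },
  logical(1)
)

print(ignored)
```

Because the patterns are anchored with `^` and `$`, `data-raw` and `R/data.R` are not matched by `^data$` in this sketch.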
2 changes: 1 addition & 1 deletion .github/workflows/R-CMD-check.yaml
@@ -19,7 +19,7 @@ jobs:
 GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
 steps:
 - uses: actions/checkout@v2
-- uses: r-lib/actions/setup-r@v1
+- uses: r-lib/actions/setup-r@v2
 - name: Install dependencies
 run: |
 install.packages(c("remotes", "rcmdcheck"))
2 changes: 1 addition & 1 deletion .gitignore
@@ -10,6 +10,6 @@
 /results

 key.txt
-
+data-raw
 # Ignore Vim's swap files
 .*.swp
22 changes: 13 additions & 9 deletions DESCRIPTION
@@ -1,25 +1,29 @@
 Package: scythe
 Title: Harvest and register data package citations
-Version: 0.9.1
+Version: 1.0.0
 Authors@R: c(
-person("Jeanette", "Clark", role = c("aut", "cre"), email = "jeanetteclark@nceas.ucsb.edu", comment=c(ORCID = "0000-0003-4703-1974")),
+person("Jeanette", "Clark", role = c("aut", "cre"), email = "jclark@nceas.ucsb.edu", comment=c(ORCID = "0000-0003-4703-1974")),
 person("Matthew B.", "Jones", role = "aut", email = "[email protected]", comment=c(ORCID = "0000-0003-0077-4738")),
-person("Maya", "Samet", role = "aut", email = "[email protected]", comment=c(ORCID = "0000-0002-5248-9712"))
+person("Maya", "Samet", role = "aut", email = "[email protected]", comment=c(ORCID = "0000-0002-5248-9712")),
+person("Althea", "Marks", role = "aut", email = "[email protected]", comment=c(ORCID = "0000-0002-9370-9128"))
 )
-Description: Harvests data package citations from several API sources, including PLOS, Scopus, and Springer.
+Description: Harvests data package citations from several API sources, including PLOS, Scopus, and Springer. This package uses modified functions from `rplos`, which is no longer maintained.
 License: Apache License (>= 2.0)
 Encoding: UTF-8
 LazyData: true
 Imports:
-bib2df,
 curl,
 dplyr,
 jsonlite,
 keyring,
 rcrossref,
-rplos
-Suggests:
+solrium,
+stats
+Remotes: ropensci/bib2df@a8e96e13f5
+Suggests:
+bib2df,
 covr,
 purrr,
-testthat (>= 2.1.0)
-RoxygenNote: 7.1.1
+testthat (>= 3.0.0)
+RoxygenNote: 7.3.1
+Config/testthat/edition: 3
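Moving `bib2df` from Imports to Suggests means code that relies on it can no longer assume the package is installed. A common guard for this is `requireNamespace()`. The helper below is a hypothetical sketch of that pattern, not code from this pull request, and the name `require_suggested` is an assumption made for illustration.

```r
# Hypothetical helper: stop with a clear message when an optional
# (Suggests) package is not installed.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this feature. ",
      "Please install it and try again.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this call passes silently.
require_suggested("stats")
```

A function that needs `bib2df` could call `require_suggested("bib2df")` as its first step, so users without the package get an actionable error instead of a failure deep inside the call.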
3 changes: 2 additions & 1 deletion NAMESPACE
@@ -4,10 +4,11 @@ export(citation_search)
 export(citation_search_plos)
 export(citation_search_scopus)
 export(citation_search_springer)
+export(citation_search_xdd)
 export(scythe_get_key)
 export(scythe_set_key)
 export(write_citation_pairs)
 import(dplyr)
 importFrom(curl,curl)
 importFrom(jsonlite,fromJSON)
-importFrom(rplos,searchplos)
+importFrom(stats,complete.cases)
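The new `importFrom(stats,complete.cases)` line supports dropping rows that contain `NA` from search results. The sketch below shows that step on a small data frame shaped like the PLOS output. The article DOI and title are made up for illustration, and the data frame is an assumption rather than real API output.

```r
# Hypothetical results: the second identifier returned no match,
# so its article columns are NA.
results <- data.frame(
  article_id = c("10.1371/journal.pone.0000001", NA),
  dataset_id = c("10.18739/A22274", "10.18739/A2D08X"),
  article_title = c("An example article", NA),
  source = "plos",
  stringsAsFactors = FALSE
)

# Keep only rows with no missing values.
found <- results[stats::complete.cases(results), ]

print(found)
```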
40 changes: 20 additions & 20 deletions R/citation_search.R
@@ -11,42 +11,42 @@
#' result <- citation_search(identifiers, sources = c("plos"))
#' }
citation_search <- function(identifiers,
sources = c("plos", "scopus", "springer")) {

if(!("character" %in% class(identifiers))){
sources = c("plos", "scopus", "springer", "xdd")) {
if (!("character" %in% class(identifiers))) {
stop("Identifiers must be a character vector.")
}

# run the 'citation_search_*' function for each source
for (source in sources) {
search_function <- paste0(source, " <- citation_search_", source, "(identifiers)")
search_function <-
paste0(source, " <- citation_search_", source, "(identifiers)")
eval(parse(text = search_function))
}

# combine all of the resulting data frames and return the result df
bind_function <- paste0("rbind(", paste0(sources, collapse = ","), ")")
bind_function <-
paste0("rbind(", paste0(sources, collapse = ","), ")")
result <- eval(parse(text = bind_function))

return(result)

}
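The function above builds each call as a string and runs it with `eval(parse())`. One possible alternative is to look the function up in a named list and combine the results with `do.call(rbind, ...)`. This is a sketch under stated assumptions, not the approach taken in this pull request: `search_a`, `search_b`, `search_functions`, and `search_all` are hypothetical stand-ins so the example runs without network access or API keys.

```r
# Stand-in search functions (hypothetical); the real ones call external APIs.
search_a <- function(identifiers) {
  data.frame(
    article_id = paste0("a-", identifiers),
    dataset_id = identifiers,
    source = "a",
    stringsAsFactors = FALSE
  )
}
search_b <- function(identifiers) {
  data.frame(
    article_id = paste0("b-", identifiers),
    dataset_id = identifiers,
    source = "b",
    stringsAsFactors = FALSE
  )
}

# Registry mapping source names to their search functions.
search_functions <- list(a = search_a, b = search_b)

search_all <- function(identifiers, sources = names(search_functions)) {
  if (!is.character(identifiers)) {
    stop("Identifiers must be a character vector.")
  }
  unknown <- setdiff(sources, names(search_functions))
  if (length(unknown) > 0) {
    stop("Unknown source(s): ", paste(unknown, collapse = ", "))
  }
  # Run each source's function, then stack the resulting data frames.
  results <- lapply(sources, function(s) search_functions[[s]](identifiers))
  do.call(rbind, results)
}

combined <- search_all(c("10.18739/A22274", "10.5063/F1T151VR"))
print(combined)
```

With a registry like this, an unrecognized source name fails early with a readable message rather than as a parse or lookup error.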

# Check identifiers to remove characters that interfere with query strings

check_identifiers <- function(identifiers){
if (any(!grepl("10\\.|urn:uuid", identifiers))){
warning(call. = FALSE,
"One or more identifiers does not appear to be a DOI or uuid",
immediate. = TRUE)
check_identifiers <- function(identifiers) {
if (any(!grepl("10\\.|urn:uuid", identifiers))) {
warning(
call. = FALSE,
"One or more identifiers does not appear to be a DOI or uuid",
immediate. = TRUE
)
}

if (any(grepl("doi:|urn:uuid", identifiers))){
if (any(grepl("doi:|urn:uuid", identifiers))) {
identifiers <- gsub("(doi:)|(urn:uuid:)", "", identifiers)
message("Identifier prefix (doi: or urn:uuid) has been stripped out of the search term.")
}


return(identifiers)
}
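To make the two regular expressions in `check_identifiers()` concrete, the sketch below applies them directly to a few identifiers. The UUID is made up for illustration; the DOIs come from the examples elsewhere in this diff.

```r
ids <- c(
  "doi:10.5063/F1T151VR",
  "urn:uuid:3249ada0-afe3-4dd6-875e-0f7928a4c171", # hypothetical UUID
  "10.18739/A22274"
)

# Same test the function uses before warning about unrecognized identifiers.
looks_valid <- grepl("10\\.|urn:uuid", ids)

# Same substitution the function uses to strip prefixes.
stripped <- gsub("(doi:)|(urn:uuid:)", "", ids)

print(looks_valid)
print(stripped)
```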


199 changes: 165 additions & 34 deletions R/citation_search_plos.R
@@ -2,57 +2,188 @@
#'
#' This function searches for citations in PLOS. Requests are throttled
#' at one identifier every 6 seconds so as to not overload the PLOS
#' API.
#' API. This function uses modified source code from the `rplos` package,
#' which is no longer maintained.
#'
#' @param identifiers a vector of identifiers to be searched for
#'
#' @return tibble of matching dataset and publication identifiers
#' @export
#' @importFrom rplos searchplos
#' @examples
#' \dontrun{
#' identifiers <- c("10.18739/A22274", "10.18739/A2D08X", "10.5063/F1T151VR")
#' result <- citation_search_plos(identifiers)
#' }
citation_search_plos <- function(identifiers) {
if (length(identifiers) > 1){
message(paste0("Your result will take ~", length(identifiers)*6 ," seconds to return,
since this function is rate limited to one call every 6 seconds."))
if (length(identifiers) > 1) {
message(
paste0(
"Your result will take ~",
length(identifiers) * 6 ,
" seconds to return,
since this function is rate limited to one call every 6 seconds."
)
)
}

identifiers <- check_identifiers(identifiers)

# encode colons to not break PLOS API
identifiers <- gsub(":", "%3A", identifiers)

# search for identifier
results <- lapply(identifiers, function(x) {
Sys.sleep(6)
v <- searchplos(q = x,
fl = c("id", "title"),
limit = 1000)
return(v)

})

plos_results <- list()
# assign dataset identifier to each result
for (i in 1:length(results)) {
if (results[[i]]$meta$numFound == 0 | is.null(results[[i]])) {
plos_results[[i]] <- data.frame(
id = NA,
dataset_id = identifiers[i],
title = NA,
source = "plos"
)
}
else if (results[[i]]$meta$numFound > 0) {
plos_results[[i]] <- results[[i]]$data
plos_results[[i]]$dataset_id <- identifiers[i]
plos_results[[i]]$source <- "plos"
}

}

# bind resulting tibbles
plos_results <- do.call(rbind, plos_results)
names(plos_results)[which(names(plos_results) == "id")] <-
"article_id"
names(plos_results)[which(names(plos_results) == "title")] <-
"article_title"
plos_results <-
plos_results[stats::complete.cases(plos_results),] # remove incomplete cases (NAs)

return(plos_results)
}
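The throttling in the function above (one request every 6 seconds, with an up-front estimate of the total wait) can be factored into a small wrapper. This is a minimal sketch; `estimate_wait` and `throttled_lapply` are hypothetical names, and the default of 6 seconds is taken from the documentation comment in this file.

```r
# Hypothetical helper: total expected wait for n throttled calls.
estimate_wait <- function(n, wait_seconds = 6) {
  n * wait_seconds
}

# Hypothetical wrapper: apply `fun` to each element, pausing before each call.
throttled_lapply <- function(x, fun, wait_seconds = 6) {
  if (length(x) > 1) {
    message(
      "Your result will take ~", estimate_wait(length(x), wait_seconds),
      " seconds to return."
    )
  }
  lapply(x, function(item) {
    Sys.sleep(wait_seconds)
    fun(item)
  })
}

# wait_seconds = 0 keeps the demonstration fast.
out <- throttled_lapply(c("a", "b"), toupper, wait_seconds = 0)
print(out)
```

Keeping the wait time in one argument makes it easy to change the rate limit, and setting it to zero is convenient in tests.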

identifiers <- check_identifiers(identifiers)

# search for identifier
results <- lapply(identifiers, function(x){
Sys.sleep(6)
v <- rplos::searchplos(q = x,
fl = c("id","title"),
limit = 1000)
return(v)

#' A Modified Version of rplos::searchplos
#'
#' This function is adapted from the searchplos in the `rplos` package, which is no longer maintained.
#'
#' @param q Search terms, eg: field:query
#' @param fl Fields to return
#' @param fq Fields to filter query on
#' @param sort Sort results according to field
#' @param start Record to start at for pagination
#' @param limit Number of results to return for pagination
#' @param sleep Seconds to wait between requests
#' @param errors One of simple or complete
#' @param proxy List of args for proxy connection
#' @param callopts Optional curl options
#' @param progress Optional logic for progress bar
#' @param ... Addtl Solr arguments
searchplos <- function(q = NULL, fl = 'id', fq = NULL, sort = NULL, start = 0,
limit = 10, sleep = 6, errors = "simple", proxy = NULL, callopts = list(),
progress = NULL, ...) {

# Make sure limit is a numeric or integer
limit <- tryCatch(as.numeric(as.character(limit)), warning=function(e) e)
if("warning" %in% class(limit)){
stop("limit should be a numeric or integer class value", call. = FALSE)
}
if(!inherits(limit, "numeric") | is.na(limit))
stop("limit should be a numeric or integer class value", call. = FALSE)

if (is.null(limit)) limit <- 999
if (limit == 0) fl <- NULL
fl <- paste(fl, collapse = ",")

args <- list()
if (!is.null(fq[[1]])) {
if (length(fq) == 1) {
args$fq <- fq
} else {
args <- fq
names(args) <- rep("fq",length(args))
}
}
args <- c(args, ploscompact(list(q = q, fl = fl, start = as.integer(start),
rows = as.integer(limit), sort = sort, wt = 'json')))

conn_plos <- solrium::SolrClient$new(host = "api.plos.org", path = "search", port = NULL)

getnum_tmp <- suppressMessages(
conn_plos$search(params = list(q = q, fl = fl, rows = 0, wt = "json"))
)

plos_results <- list()
# assign dataset identifier to each result
for (i in 1:length(results)){
if (results[[i]]$meta$numFound == 0 | is.null(results[[i]])){
plos_results[[i]] <- data.frame(id = NA,
dataset_id = identifiers[i],
title = NA)
getnumrecords <- attr(getnum_tmp, "numFound")

if (getnumrecords > limit) {
getnumrecords <- limit
} else {
getnumrecords <- getnumrecords
}

if (min(getnumrecords, limit) < 1000) {
if (!is.null(limit)) args$rows <- limit
if (length(args) == 0) args <- NULL
jsonout <- suppressMessages(
conn_plos$search(params = args, callopts = callopts,
minOptimizedRows = FALSE, progress = progress, ...)
)
meta <- dplyr::tibble(
numFound = attr(jsonout, "numFound"),
start = attr(jsonout, "start")
)
return(list(meta = meta, data = jsonout))
} else {
byby <- 500
getvecs <- seq(from = 0, to = getnumrecords - 1, by = byby)
lastnum <- as.numeric(strextract(getnumrecords, "[0-9]{3}$"))
if (lastnum == 0)
lastnum <- byby
if (lastnum > byby) {
lastnum <- getnumrecords - getvecs[length(getvecs)]
} else {
lastnum <- lastnum
}
else if (results[[i]]$meta$numFound > 0){
plos_results[[i]] <- results[[i]]$data
plos_results[[i]]$dataset_id <- identifiers[i]
getrows <- c(rep(byby, length(getvecs) - 1), lastnum)
out <- list()
for (i in seq_along(getvecs)) {
args$start <- as.integer(getvecs[i])
args$rows <- as.integer(getrows[i])
if (length(args) == 0) args <- NULL
jsonout <- suppressMessages(conn_plos$search(
params = ploscompact(list(q = args$q, fl = args$fl,
fq = args[names(args) == "fq"],
sort = args$sort,
rows = as.integer(args$rows), start = as.integer(args$start),
wt = "json")), minOptimizedRows = FALSE, callopts = callopts,
progress = progress, ...
))
out[[i]] <- jsonout
}

resdf <- dplyr::bind_rows(out)
meta <- dplyr::tibble(
numFound = attr(jsonout, "numFound"),
start = attr(jsonout, "start")
)
return(list(meta = meta, data = resdf))
}
}
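The paging branch above splits a large result set into requests of 500 rows and derives the size of the last request from the trailing digits of the record count. The same start offsets and row counts can also be computed arithmetically. The sketch below is an illustration of that idea, not a change proposed in this pull request; `page_plan` is a hypothetical name, and it assumes `n_records` is at least 1.

```r
# Hypothetical helper: start offset and row count for each page request.
page_plan <- function(n_records, page_size = 500) {
  starts <- seq(from = 0, to = n_records - 1, by = page_size)
  # Every page is full except possibly the last one.
  rows <- pmin(page_size, n_records - starts)
  data.frame(start = starts, rows = rows)
}

plan <- page_plan(1234)
print(plan)
```

For 1234 records this yields three requests whose row counts add up to 1234, which matches what the digit-based logic produces for the same input.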
#' This function is from the `rplos` package, which is no longer maintained.
#' @param l a list
ploscompact <- function(l) Filter(Negate(is.null), l)
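As a small illustration of what `ploscompact()` does, the sketch below passes it a parameter list with one `NULL` entry. The parameter values are examples only.

```r
# Same definition as in the diff above.
ploscompact <- function(l) Filter(Negate(is.null), l)

# `sort` is NULL, so it is dropped from the query parameters.
params <- ploscompact(
  list(q = "10.18739/A22274", fl = "id,title", sort = NULL, wt = "json")
)

print(names(params))
```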

# bind resulting tibbles
plos_results <- do.call(rbind, plos_results)
names(plos_results)[which(names(plos_results) == "id")] <- "article_id"
names(plos_results)[which(names(plos_results) == "title")] <- "article_title"
plos_results <- plos_results[complete.cases(plos_results), ]

return(plos_results)
#' This function is from the `rplos` package, which is no longer maintained.
#'
#' @param str A string
#' @param pattern A regex pattern
strextract <- function(str, pattern) {
regmatches(str, regexpr(pattern, str))
}
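The sketch below shows how `strextract()` behaves with the pattern used in the paging code, including the case where nothing matches. The inputs are arbitrary examples.

```r
# Same definition as in the diff above.
strextract <- function(str, pattern) {
  regmatches(str, regexpr(pattern, str))
}

# Last three digits of a count.
last_three <- strextract("1234", "[0-9]{3}$")

# Fewer than three digits: no match, so the result has length zero.
no_match <- strextract("12", "[0-9]{3}$")

print(last_three)
print(length(no_match))
```

Since a non-matching input returns a zero-length vector, callers that compare the result with `==` inside `if ()` may want to check its length first.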