This package provides tools to search, download, and extract coordinates from Elsevier articles.
pip install elsevier-coordinate-extraction
or a local install:
git clone https://github.com/yourusername/elsevier-coordinate-extraction.git
cd elsevier-coordinate-extraction
pip install -e .

from elsevier_coordinate_extraction import search_articles, download_articles, extract_coordinates
# Search for articles
articles = search_articles(query="fmri", max_results=5)
# Download full-text XML for the first article using its DOI/PMID
records = [{"doi": articles[0]["doi"], "pmid": articles[0].get("pmid")}] # type: ignore[index]
downloaded = download_articles(records)
# Extract coordinates
coordinates = extract_coordinates(downloaded)
print(coordinates)

After installing the package (with pip install . or from PyPI), the elsevier-extract script becomes available. It accepts three mutually exclusive identifier inputs:
- --pmids for comma-separated PMIDs or a text file containing one PMID per line
- --dois for comma-separated DOIs or a text file containing one DOI per line
- --jsonl for a JSON Lines file where each line is {"doi": "...", "pmid": "..."}
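An input file for --jsonl can be produced with a few lines of Python. The following is a minimal sketch: the helper name write_records, the file name records.jsonl, and the sample identifiers are placeholders chosen for illustration and are not part of the package.

```python
import json
from pathlib import Path


def write_records(records, path):
    """Write one JSON object per line (JSON Lines) and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


# Placeholder identifiers, for illustration only.
records = [
    {"doi": "10.1016/j.example.2020.01.001", "pmid": "12345678"},
    {"doi": "10.1016/j.example.2021.02.002", "pmid": "23456789"},
]
jsonl_path = write_records(records, "records.jsonl")
```

The resulting file can then be passed to the CLI as elsevier-extract --jsonl records.jsonl.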
Additional flags allow users to skip writing specific outputs (--skip-xml, --skip-text, --skip-tables, --skip-coordinates), continue past failures (--continue-on-error), disable caching (--no-cache), or adjust verbosity (-v/--verbose, -q/--quiet). --output-dir controls the base directory for results, and the CLI honors ELSEVIER_EXTRACTION_WORKERS when no --max-workers override is provided.
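When driving the CLI from a script, it can help to assemble the argument list and environment in one place. The sketch below uses only flags named above; build_command and build_env are hypothetical helpers, the PMIDs are placeholders, and it is an assumption that --max-workers takes an integer value.

```python
import os


def build_command(pmids, output_dir, max_workers=None, continue_on_error=True):
    """Assemble an argv list for elsevier-extract from flags named in this README."""
    command = [
        "elsevier-extract",
        "--pmids", ",".join(pmids),
        "--output-dir", output_dir,
    ]
    if continue_on_error:
        command.append("--continue-on-error")
    if max_workers is not None:
        # Assumption: --max-workers expects an integer worker count.
        command += ["--max-workers", str(max_workers)]
    return command


def build_env(workers):
    """Copy the current environment and set the worker-count variable."""
    env = dict(os.environ)
    env["ELSEVIER_EXTRACTION_WORKERS"] = str(workers)
    return env


command = build_command(["12345678", "23456789"], "results")
env = build_env(4)
# To run it for real: subprocess.run(command, env=env, check=True)
```

Because no --max-workers flag is passed in this example, the CLI would fall back to the ELSEVIER_EXTRACTION_WORKERS value set in env.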
Each article is saved under output-dir/{identifier} where {identifier} is the filesystem-friendly DOI (slashes replaced with _) or PMID. Inside that directory you will find:
- article.xml – the raw XML payload
- metadata.json – download metadata, rate-limit snapshot, and supplementary attachments
- text.txt – formatted article text (title/abstract/body)
- coordinates.json – NIMADS-style evaluation of extracted coordinates
- tables/*.csv – extracted tables named after their labels/captions
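The layout described above can be mirrored in a small helper for locating results after a run. This is a sketch under stated assumptions: article_dir and expected_files are hypothetical names, only the slash-to-underscore rule is taken from this README (any further sanitization the CLI performs is not reproduced), and preferring the DOI when both identifiers are given is a guess.

```python
from pathlib import Path


def article_dir(output_dir, doi=None, pmid=None):
    """Return the expected per-article directory for a DOI or PMID."""
    if doi:
        # Documented rule: slashes in the DOI are replaced with underscores.
        # Assumption: the DOI is preferred when both identifiers are present.
        identifier = doi.replace("/", "_")
    elif pmid:
        identifier = str(pmid)
    else:
        raise ValueError("either doi or pmid is required")
    return Path(output_dir) / identifier


def expected_files(directory):
    """Map each documented output file name to whether it exists on disk."""
    names = ["article.xml", "metadata.json", "text.txt", "coordinates.json"]
    return {name: (Path(directory) / name).exists() for name in names}
```

For example, article_dir("results", doi="10.1016/j.example.2020.01.001") points at results/10.1016_j.example.2020.01.001, where the DOI is a placeholder.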
The CLI also appends every run to manifest.jsonl (with status, timing, and file list) and records failures in errors.jsonl, enabling audit and resumable processing.
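One way to use the manifest for resumable processing is to filter out identifiers that already succeeded before launching the next run. The sketch below is illustrative only: the "identifier" and "status" keys and the "success" value are assumptions about the manifest schema, which this README does not spell out, so check an actual manifest.jsonl before relying on them.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts; a missing file yields []."""
    path = Path(path)
    if not path.exists():
        return []
    entries = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def pending_identifiers(requested, manifest_entries, done_status="success"):
    """Return requested identifiers that have no successful manifest entry."""
    # Assumed schema: each entry carries "identifier" and "status" keys.
    done = {
        entry.get("identifier")
        for entry in manifest_entries
        if entry.get("status") == done_status
    }
    return [identifier for identifier in requested if identifier not in done]
```

A typical call would be pending_identifiers(all_ids, load_jsonl("results/manifest.jsonl")), with the returned list fed back to the CLI through --pmids or --dois.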