Elsevier Coordinate Extraction

This package provides tools to search for, download, and extract coordinates from Elsevier articles.

Installation

pip install elsevier-coordinate-extraction

Or, for a local editable install:

git clone https://github.com/yourusername/elsevier-coordinate-extraction.git
cd elsevier-coordinate-extraction
pip install -e .

Usage

from elsevier_coordinate_extraction import search_articles, download_articles, extract_coordinates

# Search for articles
articles = search_articles(query="fmri", max_results=5)

# Download full-text XML for the first article using its DOI/PMID
records = [{"doi": articles[0]["doi"], "pmid": articles[0].get("pmid")}]  # type: ignore[index]
downloaded = download_articles(records)

# Extract coordinates
coordinates = extract_coordinates(downloaded)
print(coordinates)

Command-Line Interface

After installing the package (with pip install . or from PyPI), the elsevier-extract script becomes available. It accepts three mutually exclusive identifier inputs:

--pmids for comma-separated PMIDs or a text file containing one PMID per line
--dois for comma-separated DOIs or a text file containing one DOI per line
--jsonl for a JSON Lines file where each line is {"doi": "...", "pmid": "..."}
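To illustrate the --jsonl format, here is a minimal sketch that writes such a file with the standard library. The helper name write_identifier_jsonl and the DOI/PMID values are placeholders for illustration, not part of the package.

```python
import json
from pathlib import Path


def write_identifier_jsonl(records, path):
    """Write one JSON object per line, each with "doi" and "pmid" keys.

    Hypothetical helper for illustration; not part of the package API.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            # Keep only the two keys that the --jsonl format describes.
            row = {"doi": record.get("doi"), "pmid": record.get("pmid")}
            handle.write(json.dumps(row) + "\n")
    return path


# Placeholder identifiers, not real articles.
example_records = [
    {"doi": "10.1016/j.example.2020.01.001", "pmid": "12345678"},
    {"doi": "10.1016/j.example.2021.02.002", "pmid": "23456789"},
]
```

Calling write_identifier_jsonl(example_records, "ids.jsonl") produces a file that could then be passed to the CLI with --jsonl ids.jsonl.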

Additional flags allow users to skip writing specific outputs (--skip-xml, --skip-text, --skip-tables, --skip-coordinates), continue past failures (--continue-on-error), disable caching (--no-cache), or adjust verbosity (-v/--verbose, -q/--quiet). --output-dir controls the base directory for results, and the CLI honors ELSEVIER_EXTRACTION_WORKERS when no --max-workers override is provided.
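The precedence between --max-workers and ELSEVIER_EXTRACTION_WORKERS can be pictured with the sketch below. It is an assumption about how such a fallback is typically written, not the package's actual code; the function name resolve_max_workers and the default of 1 are hypothetical.

```python
import os


def resolve_max_workers(cli_value=None, environ=None, default=1):
    """Pick a worker count: explicit CLI value first, then the environment.

    Hypothetical helper; the default of 1 is an assumption for illustration.
    """
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        # An explicit --max-workers value always wins.
        return int(cli_value)
    raw = environ.get("ELSEVIER_EXTRACTION_WORKERS")
    if raw is not None and raw.strip():
        # Fall back to the environment variable when no override is given.
        return int(raw)
    return default
```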

Output layout

Each article is saved under output-dir/{identifier} where {identifier} is the filesystem-friendly DOI (slashes replaced with _) or PMID. Inside that directory you will find:

article.xml – the raw XML payload
metadata.json – download metadata, rate-limit snapshot, and supplementary attachments
text.txt – formatted article text (title/abstract/body)
coordinates.json – NIMADS-style evaluation of extracted coordinates
tables/*.csv – extracted tables named after their labels/captions
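The layout above can be sketched as a small path helper. The function names are hypothetical, and only the slash-to-underscore replacement described above is applied; the package may sanitize identifiers further.

```python
from pathlib import Path


def identifier_to_dirname(identifier):
    """Replace slashes with underscores so a DOI can be used as a directory name."""
    return identifier.replace("/", "_")


def expected_outputs(output_dir, identifier):
    """Return the per-article paths described in the output layout.

    Hypothetical helper for illustration; not part of the package API.
    """
    base = Path(output_dir) / identifier_to_dirname(identifier)
    return {
        "xml": base / "article.xml",
        "metadata": base / "metadata.json",
        "text": base / "text.txt",
        "coordinates": base / "coordinates.json",
        "tables": base / "tables",  # directory holding the extracted *.csv files
    }
```

For example, expected_outputs("results", "10.1016/j.example.2020.01.001")["xml"] points at results/10.1016_j.example.2020.01.001/article.xml (the DOI here is a placeholder).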

The CLI also appends every run to manifest.jsonl (with status, timing, and file list) and records failures in errors.jsonl, enabling audit and resumable processing.
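One way to use manifest.jsonl for resuming is to collect the identifiers that already succeeded and skip them on the next run. The sketch below assumes each manifest line carries an "identifier" key and a "status" key whose success value is "success"; these names are guesses for illustration, so check an actual manifest before relying on them.

```python
import json
from pathlib import Path


def completed_identifiers(manifest_path, id_key="identifier",
                          status_key="status", ok_value="success"):
    """Collect identifiers whose manifest entry reports success.

    The key names and the success value are assumptions, exposed as
    parameters so they can be adjusted to the real manifest schema.
    """
    done = set()
    path = Path(manifest_path)
    if not path.exists():
        return done
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry.get(status_key) == ok_value:
                done.add(entry.get(id_key))
    return done
```

Identifiers returned by this function can be filtered out of the next input list so that only failed or new articles are processed again.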
