Vega Datasets

Vega Datasets is the centralized hub for over 70 datasets featured in the examples and documentation of Vega, Vega-Lite, Altair and related projects. A dataset catalog conforming to the Data Package Standard v2 provides information on data structure, sourcing, and licensing. Generation scripts document data provenance and transformation, enabling reproducibility and transparency throughout the data preparation process. Each dataset is curated to illustrate essential visualization concepts, statistical methods, or domain-specific applications.

This data lives at https://github.com/vega/vega-datasets and can be accessed via CDN at https://cdn.jsdelivr.net/npm/vega-datasets.

Contributing

Modifications of existing datasets should be kept to a minimum as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and examples. Contributions of new datasets, documentation, scripts, corrections and bug fixes are encouraged. Please review the contribution guidelines.

Important

Dataset Licensing: Each dataset hosted in this repository maintains its original license as documented in the datapackage metadata. While we've made efforts to provide accurate licensing information, this metadata should be considered a starting point rather than definitive guidance. Users should verify their intended use complies with original source licensing terms.

Installation

Install Vega Datasets via npm:

npm install vega-datasets

Usage

HTTP Direct Access

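You can fetch the data directly over HTTP, served by GitHub or by jsDelivr (a fast CDN). For example, the cars dataset is available from jsDelivr at https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json.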

You can find a full listing of available datasets at https://cdn.jsdelivr.net/npm/vega-datasets/data/.
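
As a rough sketch of direct HTTP access, the helpers below build a jsDelivr URL for a named file and fetch it as JSON. The names `datasetUrl` and `loadJson` are hypothetical and not part of the package. The URL pattern follows the jsDelivr URLs shown in this README, and a global `fetch` is assumed (browsers, or a recent Node.js release).

```javascript
// Hypothetical helper: build a jsDelivr URL for a file in the data directory.
// `version` defaults to "latest"; pass a specific version to pin the data.
function datasetUrl(fileName, version = 'latest') {
  const file = encodeURIComponent(fileName);
  return `https://cdn.jsdelivr.net/npm/vega-datasets@${version}/data/${file}`;
}

// Hypothetical helper: fetch a JSON dataset and parse the response body.
// Assumes a global `fetch` is available.
async function loadJson(fileName, version = 'latest') {
  const response = await fetch(datasetUrl(fileName, version));
  if (!response.ok) {
    throw new Error(`Request for ${fileName} failed with status ${response.status}`);
  }
  return response.json();
}
```

Inside an async context, `const cars = await loadJson('cars.json');` would then return the parsed records. Non-JSON files (for example CSV) would need `response.text()` and a parser instead.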

Using ESM Import

import data from 'vega-datasets';

const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();

console.log(cars);

In Vega/Vega-Lite Specifications

Reference a dataset via URL:

{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
  }
}
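
If specifications are assembled in code rather than written by hand, the same structure can be produced from a small function. This is only a sketch: `scatterSpec` is a hypothetical helper, and it covers just the properties used in the specification above.

```javascript
// Hypothetical helper: build a minimal Vega-Lite scatter plot spec as a plain object.
// Both fields are treated as quantitative, matching the example specification.
function scatterSpec(dataUrl, xField, yField) {
  return {
    data: { url: dataUrl },
    mark: 'point',
    encoding: {
      x: { field: xField, type: 'quantitative' },
      y: { field: yField, type: 'quantitative' },
    },
  };
}

const spec = scatterSpec(
  'https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json',
  'Horsepower',
  'Miles_per_Gallon'
);

// Serialize to JSON for use wherever a Vega-Lite specification is accepted.
console.log(JSON.stringify(spec, null, 2));
```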

Language Interfaces

Available Datasets

Repository highlights include:

For the complete list and details, see the data directory or review the datapackage.md file.

Dataset Information

Each dataset comes with:

  • Detailed Metadata: Source, structure, and licensing information, following Data Package Standard v2 for enhanced interoperability.
  • Generation Scripts: Automation tools that facilitate data processing and updates, ensuring consistency and reproducibility.

Further information is available in datapackage.md (human-readable) and datapackage.json (machine-readable).
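
The machine-readable descriptor can be inspected programmatically. The sketch below assumes the general Data Package layout: a top-level `resources` array whose entries carry `name`, `path`, and an optional `licenses` array. The descriptor shown is a made-up stand-in, not an excerpt from this repository's datapackage.json, and `summarizeResources` is a hypothetical helper.

```javascript
// Illustrative descriptor with made-up values; the real datapackage.json is much larger.
const descriptor = {
  name: 'example-package',
  resources: [
    { name: 'example-a.json', path: 'data/example-a.json', licenses: [{ name: 'CC0-1.0' }] },
    { name: 'example-b.csv', path: 'data/example-b.csv' },
  ],
};

// Hypothetical helper: list each resource with the license names recorded for it.
// A resource without a `licenses` array yields an empty list.
function summarizeResources(pkg) {
  return (pkg.resources ?? []).map((resource) => ({
    name: resource.name,
    path: resource.path,
    licenses: (resource.licenses ?? []).map(
      (license) => license.name ?? license.title ?? license.path ?? 'unnamed'
    ),
  }));
}

const summary = summarizeResources(descriptor);
console.log(summary);
```

An empty `licenses` list in the summary is a cue to check the original source's terms directly rather than a statement that the data is unrestricted.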

Example Galleries

Visualizations built with these datasets are showcased in several galleries:

Data Usage Note

  • The datasets are designed for instructional and demonstration purposes.
  • Some datasets include intentional inconsistencies to offer opportunities for data cleaning exercises.
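
One common cleaning step is to drop records that lack values for the fields being plotted. The following is a minimal sketch, assuming records are plain objects and that missing values appear as `null`, `undefined`, or `NaN`. The sample records are invented for illustration (field names borrowed from the cars example), and `dropIncomplete` is a hypothetical helper.

```javascript
// Made-up records for illustration; they are not taken from any dataset.
const sample = [
  { Name: 'car a', Horsepower: 130, Miles_per_Gallon: 18 },
  { Name: 'car b', Horsepower: null, Miles_per_Gallon: 25 },
  { Name: 'car c', Horsepower: 95, Miles_per_Gallon: null },
];

// Hypothetical helper: keep only records with a usable value in every required field.
function dropIncomplete(records, requiredFields) {
  return records.filter((record) =>
    requiredFields.every((field) => {
      const value = record[field];
      return value !== null && value !== undefined && !Number.isNaN(value);
    })
  );
}

const complete = dropIncomplete(sample, ['Horsepower', 'Miles_per_Gallon']);
console.log(complete.length); // prints 1
```

Whether to drop, impute, or flag such records depends on the exercise; this sketch only shows the filtering variant.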

Versioning

Vega Datasets follows semantic versioning with additional data-specific guidelines:

  • Patch Releases: Minor formatting or documentation updates without changes to the data.
  • Minor Releases: Data content updates that maintain existing file and field names, including new datasets.
  • Major Releases: Potential changes to file names or removal of datasets that may break backward compatibility.
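
To make these guidelines concrete, the sketch below classifies an upgrade between two version strings by the release level that changed. The function names are hypothetical, the version numbers in the usage note are arbitrary examples rather than real releases, and pre-release tags are not handled.

```javascript
// Parse "MAJOR.MINOR.PATCH" into numbers; other formats are rejected in this sketch.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Unsupported version string: ${version}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Hypothetical helper: report the highest release level that differs between two versions.
//   'major' -> file names may change or datasets may be removed
//   'minor' -> data content may change; file and field names are kept
//   'patch' -> formatting or documentation only
//   'none'  -> identical versions
function upgradeImpact(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return 'major';
  if (a.minor !== b.minor) return 'minor';
  if (a.patch !== b.patch) return 'patch';
  return 'none';
}
```

For example, `upgradeImpact('1.2.3', '1.3.0')` returns `'minor'`, which under the guidelines above suggests re-checking results that depend on data values but not on file or field names.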

Development and Release

For development setup:

npm install

For releasing:

npm run release

License

The repository code is licensed under the BSD-3-Clause License. Note that individual datasets have distinct licensing terms as specified in their metadata.

Acknowledgments

Appreciation is extended to the numerous organizations and individuals who have generously shared their data for use in this collection.