Vega Datasets is the centralized hub for over 70 datasets featured in the examples and documentation of Vega, Vega-Lite, Altair and related projects. A dataset catalog conforming to the Data Package Standard v2 provides information on data structure, sourcing, and licensing. Generation scripts document data provenance and transformation, enabling reproducibility and transparency throughout the data preparation process. Each dataset is curated to illustrate essential visualization concepts, statistical methods, or domain-specific applications.
This data lives at https://github.com/vega/vega-datasets and can be accessed via CDN at https://cdn.jsdelivr.net/npm/vega-datasets.
Modifications of existing datasets should be kept to a minimum as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and examples. Contributions of new datasets, documentation, scripts, corrections and bug fixes are encouraged. Please review the contribution guidelines.
Important (dataset licensing): Each dataset hosted in this repository maintains its original license as documented in the datapackage metadata. While we've made efforts to provide accurate licensing information, this metadata should be considered a starting point rather than definitive guidance. Users should verify that their intended use complies with the original source's licensing terms.
Install Vega Datasets via npm:
npm install vega-datasets
You can also fetch the data directly over HTTP from GitHub or jsDelivr (a fast CDN):
- GitHub: https://vega.github.io/vega-datasets/data/cars.json
- jsDelivr (with fixed version, recommended): https://cdn.jsdelivr.net/npm/vega-datasets@3/data/cars.json
You can find a full listing of available datasets at https://cdn.jsdelivr.net/npm/vega-datasets/data/.
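As a minimal sketch of the HTTP access described above, the snippet below builds a version-pinned jsDelivr URL and fetches a JSON dataset. The helper names `datasetUrl` and `loadJsonDataset` are hypothetical and not part of the vega-datasets package; the URL layout is taken from the jsDelivr example above, and a runtime with a global `fetch` (a modern browser or Node.js 18+) is assumed.

```javascript
// Hypothetical helper: builds a jsDelivr URL for a dataset file.
// The URL layout mirrors the jsDelivr example above; the default version "3"
// is the pinned major version used in that example.
function datasetUrl(file, version = "3") {
  return `https://cdn.jsdelivr.net/npm/vega-datasets@${version}/data/${file}`;
}

// Hypothetical helper: fetches and parses a JSON dataset.
// Assumes a global fetch (modern browsers, Node.js 18+).
async function loadJsonDataset(file, version = "3") {
  const response = await fetch(datasetUrl(file, version));
  if (!response.ok) {
    throw new Error(`Request for ${file} failed with status ${response.status}`);
  }
  return response.json();
}
```

A call such as `const cars = await loadJsonDataset("cars.json");` would then return the parsed contents of cars.json. Pinning the version in the URL keeps an example stable when a later release changes the data.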
In JavaScript, import the package and load a dataset by file name:
import data from 'vega-datasets';
const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();
console.log(cars);
Reference a dataset via URL:
{
"data": {
"url": "https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json"
},
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Miles_per_Gallon", "type": "quantitative"}
}
}
- JavaScript/Observable: Directly import Vega Datasets in Observable. See the example notebook.
- Python: Access datasets using the Vega Datasets Python package.
- Julia: Utilize the VegaDatasets.jl package for Julia integrations.
Repository highlights include:
- Geographic data (world maps, US states, country boundaries)
- Economic indicators (unemployment, stock data, budgets)
- Scientific measurements (weather patterns, earthquake data)
- Statistical examples (Anscombe's quartet, iris dataset)
- Historical records (wheat prices, monarch data)
For the complete list and details, see the data directory or review the datapackage.md file.
Each dataset comes with:
- Detailed Metadata: Source, structure, and licensing information, following Data Package Standard v2 for enhanced interoperability.
- Generation Scripts: Automation tools that facilitate data processing and updates, ensuring consistency and reproducibility.
Further information is available in datapackage.md (human-readable) and datapackage.json (machine-readable).
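Because datapackage.json is machine-readable, the licensing information can be inspected programmatically before use. The following is a rough sketch that assumes the descriptor has a `resources` array whose entries carry `name`, `path`, and an optional `licenses` list, as described by the Data Package Standard; the function name `summarizeResources` and the sample descriptor are made up for illustration, and the actual file in this repository may include more or different properties.

```javascript
// Summarizes each resource in a Data Package descriptor.
// Assumes a "resources" array with "name", "path", and an optional
// "licenses" list; resources without licenses get an empty list.
function summarizeResources(descriptor) {
  const resources = Array.isArray(descriptor.resources) ? descriptor.resources : [];
  return resources.map((resource) => ({
    name: resource.name,
    path: resource.path,
    licenses: (resource.licenses ?? []).map(
      (license) => license.name ?? license.title ?? license.path ?? "unspecified"
    ),
  }));
}

// Made-up descriptor for illustration only; not copied from the repository.
const exampleDescriptor = {
  name: "example-package",
  resources: [
    { name: "example.csv", path: "data/example.csv", licenses: [{ name: "CC0-1.0" }] },
    { name: "unlicensed.json", path: "data/unlicensed.json" },
  ],
};

console.log(summarizeResources(exampleDescriptor));
```

An empty `licenses` list in the summary is a signal to check the original source directly, in line with the licensing note earlier in this document.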
Visualizations built with these datasets are showcased in several galleries.
Notes on the data:
- The datasets are designed for instructional and demonstration purposes.
- Some datasets include intentional inconsistencies to offer opportunities for data cleaning exercises.
Vega Datasets follows semantic versioning with additional data-specific guidelines:
- Patch Releases: Minor formatting or documentation updates without changes to the data.
- Minor Releases: Data content updates that maintain existing file and field names, including new datasets.
- Major Releases: Potential changes to file names or removal of datasets that may break backward compatibility.
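The release rules above can be turned into a small check that tells a consumer what kind of change to expect when moving between two versions. This is an illustrative sketch only: `parseVersion` and `classifyUpgrade` are hypothetical helpers, not part of the package, and they handle plain `MAJOR.MINOR.PATCH` strings without pre-release tags.

```javascript
// Parses "MAJOR.MINOR.PATCH" into numbers; throws on anything else.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Classifies a version change using the release rules listed above:
//   "major" - file names may change or datasets may be removed
//   "minor" - data content may change; file and field names are kept
//   "patch" - formatting or documentation only; data unchanged
//   "none"  - identical versions
function classifyUpgrade(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return "major";
  if (a.minor !== b.minor) return "minor";
  if (a.patch !== b.patch) return "patch";
  return "none";
}
```

For example, a result of "minor" suggests that code referencing existing file and field names should keep working, while results that depend on exact data values may change.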
For development setup:
npm install
For releasing:
npm run release
The repository code is licensed under the BSD-3-Clause License. Note that individual datasets have distinct licensing terms as specified in their metadata.
Appreciation is extended to the numerous organizations and individuals who have generously shared their data for use in this collection.