This repository contains code for cleaning, enriching and automatically generating reports on the Finnish national bibliography, Fennica.
This snapshot of the Fennica dataset includes bibliographic metadata for over 70,000 documents published between 1488 and 1955, representing publishing activity in Finland during that period. It is analyzed in parallel with Kungliga, a related collection of bibliographic metadata from the National Library of Sweden. In the future, the analysis will include the whole dataset, from the earliest documents to the present day.
Copy the repository to your computer:
# In terminal / GIT
git clone https://github.com/fennicahub/fennica.git
Alternatively, download the master branch as a ZIP file from the repository front page on GitHub: Code -> Download ZIP.
Go to the cloned git repository or the extracted ZIP folder and start R. The following example assumes that the folder was downloaded to the user's home folder:
cd fennica
R
Another option is to open an IDE and set the working directory to the fennica folder. In RStudio this can be done in the Files tab by navigating to the fennica folder, clicking the gear icon and selecting "Set as working directory". Alternatively, from the R console:
# See current working directory
getwd()
# Set working directory to fennica, assuming that fennica folder is in your current folder
setwd("fennica")
Install the necessary dependencies:
install.packages("devtools")
library(devtools)
# Install deps for the current project
devtools::install_local(".")
devtools::install_deps(".", dependencies = TRUE)
devtools::install_github("comhis/comhis")
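Before rendering, it can be useful to confirm that the installation succeeded. The following is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper, and the package name `comhis` is assumed to match the GitHub repository name.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages referred to in this README ("comhis" is an assumed package name)
missing_packages(c("devtools", "bookdown", "comhis"))
```

An empty character vector means that all listed packages are available.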
Render the bookdown document:
bookdown::render_book("inst/examples")
Open the rendered book in your browser.
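The rendered book can also be opened from the R console. This is a sketch under one assumption: the output is written to `_book` inside the book source folder, which is the bookdown default. The project's configuration may use a different location. `rendered_index` is a hypothetical helper, not a function of this repository.

```r
# Hypothetical helper: return the path to the rendered index page, or stop
# with a message if the book has not been rendered yet.
# Assumption: output goes to "_book" inside the book source folder
# (the bookdown default); adjust output_dir if the project overrides it.
rendered_index <- function(book_dir = "inst/examples", output_dir = "_book") {
  index <- file.path(book_dir, output_dir, "index.html")
  if (!file.exists(index)) {
    stop("No rendered book found at ", index,
         "; run bookdown::render_book() first.")
  }
  normalizePath(index)
}

# Open the book in the default browser (interactive sessions only)
if (interactive()) {
  browseURL(rendered_index())
}
```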
Alternatively, you can view the same live document deployed in a CSC Rahti container: http://fennica-fennica.rahtiapp.fi
The bookdown document is rendered with GitHub Actions. The generated files are placed in the gh-pages branch of the GitHub repository. A webhook then copies them to Rahti, where they are hosted on an nginx server.
The generated bookdown document consists of 20 sections, or "chapters". Each section focuses on different fields of the MARC-formatted raw data. Most chapters also include visualizations that give a quick overview of what the data looks like. For most fields, processed CSV datasets can also be downloaded for further analysis.
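A downloaded CSV file can be explored with base R. The sketch below uses a toy table in place of a real download. The column names `publication_year` and `language` and all rows are invented for illustration, so the actual files will differ.

```r
# Toy stand-in for a downloaded CSV. The column names and all rows are
# invented for illustration; real files will differ.
csv_text <- "publication_year,language
1642,Latin
1776,Swedish
1779,Finnish
1890,Finnish"

# With a real download, use read.csv("path/to/downloaded_file.csv") instead
docs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Count documents per decade
docs$decade <- (docs$publication_year %/% 10) * 10
decade_counts <- table(docs$decade)
print(decade_counts)
```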
The data is summarized in the following automatically generated files:
- Fennica: a generic overview
- Presentation slide templates (PDF) and code
- A Quantitative Approach to Book Printing in Sweden and Finland, 1640–1828: Source code for the figures
- Knowledge production in Finland 1470-1828: Digital Humanities 2016 conference presentation slides (PDF) and code
- Figures and analyses for CCQ2019
The analyses cover several steps including XML parsing, data harmonization, removing unrecognized entries, enriching and organizing the data, carrying out statistical summaries, analysis, visualization and automated document generation.
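The harmonization, filtering and summary steps can be illustrated on a toy table. This is a minimal sketch in base R, not the repository's implementation: the field names, the rows and the synonym table are all invented, and XML parsing and enrichment are left out.

```r
# Toy raw records; field names and values are invented for illustration
raw <- data.frame(
  publication_place = c("Helsinki", "Helsingfors", "helsinki ",
                        "Turku", "Turku", "[s.l.]"),
  publication_year  = c("1850", "[1876]", "1879?", "1890", "s.a.", "1901"),
  stringsAsFactors = FALSE
)

# Harmonization: map name variants to one preferred form.
# This lookup table is hypothetical, not the project's synonym list.
place_synonyms <- c(helsinki = "Helsinki", helsingfors = "Helsinki",
                    turku = "Turku")
key <- tolower(trimws(raw$publication_place))
raw$place_clean <- unname(place_synonyms[key])

# Harmonization: keep the first four-digit run as the year, otherwise NA
has_year <- grepl("[0-9]{4}", raw$publication_year)
raw$year_clean <- NA_integer_
raw$year_clean[has_year] <- as.integer(
  regmatches(raw$publication_year, regexpr("[0-9]{4}", raw$publication_year))
)

# Remove entries whose place or year was not recognized
clean <- raw[!is.na(raw$place_clean) & !is.na(raw$year_clean), ]

# Statistical summary: number of documents per harmonized place
summary_counts <- table(clean$place_clean)
print(summary_counts)
```

Keeping the raw and harmonized values side by side makes it easy to inspect which entries were discarded and why.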
The analyses and full source code are provided in this repository and can be freely reused under the BSD 2-clause (FreeBSD) open source licence. The analyses are based on R and rely on various R packages.
For the original raw data, see the National Library of Finland.
Email: [email protected]
The project is under active open development.
- Issues and bug reports
- Pull requests (we will acknowledge contributions)
The project is a collaboration between the Helsinki Computational History Group (COMHIS) (University of Helsinki) and the Turku Data Science Group (University of Turku).
Main contributors:
Special thanks:
- National Library of Finland: Fennica data collection
- VRK (Finnish population register): Finnish first name-gender mappings and demographic information
- Maanmittauslaitos: Geographic information
- Tilastokeskus: Demographic information
- OpenStreetMap: Geographic information
- Google Maps: Geographic information
