30 commits
83cf3f2
adding denmark data
balit-raibot Dec 11, 2025
9fc9408
Merge branch 'master' into denmark_census
balit-raibot Jan 26, 2026
ba12706
standardizing file names
balit-raibot Jan 26, 2026
0515e4a
removing old files
balit-raibot Jan 26, 2026
af14a8d
added two imports
balit-raibot Jan 26, 2026
ced166d
resolving comments
balit-raibot Jan 26, 2026
3e62f91
ignored age above 105
niveditasing Jan 29, 2026
e68d166
ignored age above 105
niveditasing Jan 29, 2026
c2835a1
removing stat_vars.mcf file
smarthg-gi Feb 16, 2026
3cb8ed6
Updating directory structure
smarthg-gi Feb 16, 2026
5fc020e
Updating directory structure
smarthg-gi Feb 16, 2026
7bac0f6
Adding Readme file
smarthg-gi Feb 16, 2026
076957b
reorg folder structure
balit-raibot Feb 16, 2026
0467aba
added future dates
balit-raibot Feb 16, 2026
2f333cf
updated README
balit-raibot Feb 16, 2026
4c9b6fc
Merge branch 'master' into denmark_census
balit-raibot Feb 25, 2026
ed1713b
Merge branch 'master' into denmark_census
balit-raibot Mar 12, 2026
16ec59f
Merge branch 'master' into denmark_census
balit-raibot Mar 15, 2026
83c79ef
converted manual downloading to automatic
balit-raibot Mar 16, 2026
b6bd1a6
Merge branch 'denmark_census' of https://github.com/balit-raibot/data…
balit-raibot Mar 16, 2026
7780a0a
adding manifest.json
balit-raibot Mar 16, 2026
fe2e2d3
adding manifest.json
balit-raibot Mar 16, 2026
f905144
added a flag to download all data instead of current and previous year
balit-raibot Mar 16, 2026
4eb4232
increased timeout from 1 hour to 10 hours
balit-raibot Mar 16, 2026
2d4e5e3
refactored download script to download country level stats
balit-raibot Mar 17, 2026
b90c413
Merge branch 'master' into denmark_census
balit-raibot Mar 17, 2026
bb91162
modified manifest resource limits
balit-raibot Mar 17, 2026
1a276dc
Merge branch 'denmark_census' of https://github.com/balit-raibot/data…
balit-raibot Mar 17, 2026
b4afc92
Merge branch 'master' into denmark_census
balit-raibot Mar 17, 2026
44dab64
Merge branch 'master' into denmark_census
balit-raibot Mar 18, 2026
68 changes: 68 additions & 0 deletions statvar_imports/denmark_demographics/README.md
@@ -0,0 +1,68 @@
# Statistics Denmark Demographics Dataset
## Overview
This dataset contains demographic statistics for the population of Denmark, sourced from Statistics Denmark. It comprises two tables, one quarterly and one annual, that break the population down by dimensions such as geography (regions and municipalities), sex, age, and marital status.

The import covers:
- **Population (Quarterly):** Population count by region, marital status, age, and sex at the first day of each quarter (Table FOLK1A).
- **Population (Annual):** Population count by sex and age group (Table BEFOLK2).

Type of place: Country

## Data Source
**Source URL:**
- Main Portal: https://www.statbank.dk/statbank5a/default.asp?w=1396
- Specific Table (FOLK1A): https://www.statbank.dk/FOLK1A

**Provenance Description:**
The data is provided by Statistics Denmark, the central authority for Danish statistics. The population figures are derived from the Central Person Register (CPR) and reflect the population residing in Denmark on the first day of the period.
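
Because each quarterly figure refers to the first day of the quarter, a quarter label can be mapped to a calendar date when an explicit observation date is needed. The following is a minimal sketch, assuming labels of the form `2008Q1`; the label format and the helper name `quarter_start` are illustrative assumptions, not part of the import scripts.

```python
import re
from datetime import date


def quarter_start(label: str) -> date:
    """Return the first day of the quarter named by a label such as '2008Q1'.

    The 'YYYYQn' label format is an assumption made for this sketch.
    """
    match = re.fullmatch(r"(\d{4})Q([1-4])", label.strip())
    if match is None:
        raise ValueError(f"Unrecognized quarter label: {label!r}")
    year, quarter = int(match.group(1)), int(match.group(2))
    # Quarters start in January, April, July and October.
    return date(year, 3 * (quarter - 1) + 1, 1)


print(quarter_start("2008Q1").isoformat())
print(quarter_start("2025Q4").isoformat())
```

Raising on unrecognized labels, rather than guessing, keeps malformed periods from silently turning into wrong observation dates.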

## How To Download Input Data
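The download scripts listed in `manifest.json` (`download_population_quarterly_region_time_marital_status.py` and `download_population_sex_age_time.py`) fetch the input data automatically from the StatBank API. To download the data manually instead: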
1. Go to the [StatBank Denmark Portal](https://www.statbank.dk/statbank5a/default.asp?w=1396).
2. Browse or search for the desired population tables. For quarterly demographics, search for table **FOLK1A** (Population at the first day of the quarter).
3. Select the desired variables:
   - **Region:** All Denmark.
   - **Marital Status:** Total, Never married, Married/separated, Widowed, Divorced.
   - **Age:** Individual ages or age groups.
   - **Sex:** Men, Women.
   - **Time:** Quarters.
4. Click "Show table" and then "Download" to save as CSV.

## Processing Instructions
To process the Denmark Demographics data and generate statistical variables, use the following commands:

**For Data Run (Quarterly Run)**
```bash
python ../../tools/statvar_importer/stat_var_processor.py \
--input_data='gs://unresolved_mcf/country/denmark/input_files/population_quarterly_region_time_marital_status_input.csv' \
--pv_map='population_quarterly_region_time_marital_status_pvmap.csv' \
--output_path='population_quarterly_region_time_marital_status_output' \
--config_file='denmark_demographics_metadata.csv' \
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
```
**For Data Run (Annual Run)**
```bash
python ../../tools/statvar_importer/stat_var_processor.py \
--input_data='gs://unresolved_mcf/country/denmark/input_files/population_sex_age_time_input.csv' \
--pv_map='population_sex_age_time_pvmap.csv' \
--output_path='population_sex_age_time_output' \
--config_file='denmark_demographics_metadata.csv' \
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
```

On a first run, each command generates the following output files, where `output` stands for the value passed to `--output_path`:
- output.csv
- output_stat_vars_schema.mcf
- output_stat_vars.mcf
- output.tmcf
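
Before moving on to validation, it can help to confirm that all four files were actually written. The following is a minimal sketch, assuming the file names are formed by appending the suffixes listed above to the `--output_path` value; the helper name `missing_outputs` is hypothetical and not part of the importer.

```python
from pathlib import Path

# Suffixes taken from the file list above; the prefix is the --output_path value.
OUTPUT_SUFFIXES = (".csv", "_stat_vars_schema.mcf", "_stat_vars.mcf", ".tmcf")


def missing_outputs(output_prefix: str) -> list[str]:
    """Return the expected output files that are absent or empty."""
    missing = []
    for suffix in OUTPUT_SUFFIXES:
        path = Path(output_prefix + suffix)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    prefix = "population_sex_age_time_output"
    print(prefix, "->", missing_outputs(prefix) or "all outputs present")
```

Treating an empty file as missing catches runs that created the file but wrote no rows.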

## Data Quality Checks and Validation
Validation is performed using the Data Commons import tool:

```bash
java -jar datacommons-import-tool-0.1-jar-with-dependencies.jar lint \
output_stat_vars_schema.mcf \
output.csv \
output.tmcf \
output_stat_vars.mcf
```

The tool generates `report.json`, `summary_report.csv`, and `summary_report.html`, which can be used to identify errors or warnings in the generated data.
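
Since the import produces two sets of outputs, the lint command has to be run once per output prefix. The following is a minimal sketch that assembles the command shown above for a given prefix; the helper name `lint_command` is hypothetical, and the jar is assumed to be in the working directory.

```python
import shlex

# Jar name as used in the command above; its location is an assumption.
JAR = "datacommons-import-tool-0.1-jar-with-dependencies.jar"


def lint_command(output_prefix: str, jar: str = JAR) -> list[str]:
    """Build the lint invocation for one set of generated output files."""
    return [
        "java", "-jar", jar, "lint",
        f"{output_prefix}_stat_vars_schema.mcf",
        f"{output_prefix}.csv",
        f"{output_prefix}.tmcf",
        f"{output_prefix}_stat_vars.mcf",
    ]


cmd = lint_command("population_sex_age_time_output")
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list avoids shell quoting problems if a prefix ever contains spaces.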
@@ -0,0 +1,3 @@
parameter,value
output_columns,"observationDate,value,observationAbout,variableMeasured"
dc_api_root,https://api.datacommons.org
@@ -0,0 +1,69 @@
import requests
import pandas as pd
import itertools
import os

# --- CONFIGURATION ---
url = "https://api.statbank.dk/v1/data"
output_dir = "./input_files/"
table_id = "FOLK1A"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

payload = {
    "table": table_id,
    "format": "JSONSTAT",
    "lang": "en",
    "variables": [
        {"code": "OMRÅDE", "values": ["000"]},  # All of Denmark
        {"code": "KØN", "values": ["*"]},
        {"code": "ALDER", "values": ["*"]},
        {"code": "CIVILSTAND", "values": ["*"]},
        {"code": "Tid", "values": ["*"]}
    ]
}

def find_key_recursive(source_dict, target_key):
    # Depth-first search through nested dicts for the first match of target_key.
    if target_key in source_dict:
        return source_dict[target_key]
    for key, value in source_dict.items():
        if isinstance(value, dict):
            found = find_key_recursive(value, target_key)
            if found is not None:
                return found
    return None

response = requests.post(url, json=payload)

if response.status_code == 200:
    full_data = response.json()
    dims = find_key_recursive(full_data, 'dimension')
    vals = find_key_recursive(full_data, 'value')

    if dims and vals:
        ids = find_key_recursive(full_data, 'id') or list(dims.keys())
        role = find_key_recursive(full_data, 'role') or {}
        metric_ids = role.get('metric', [])

        dim_list = []
        col_names = []

        for d_id in ids:
            # Skip the metric ("contents") dimension; it is not a breakdown column.
            if d_id in metric_ids or d_id.lower() in ['indhold', 'contents']:
                continue
            labels = dims[d_id]['category']['label']
            dim_list.append(list(labels.values()))
            col_names.append(d_id)

        # Build the DataFrame
        df = pd.DataFrame(list(itertools.product(*dim_list)), columns=col_names)
        df['Value'] = vals

        # Renaming and Cleanup
        df = df.rename(columns={'OMRÅDE': 'Region', 'ALDER': 'Age', 'CIVILSTAND': 'Marital_Status', 'Tid': 'Quarter', 'KØN': 'Sex'})
        df.loc[df['Sex'] == 'Total', 'Sex'] = 'Gender_Total'
        df.loc[df['Marital_Status'] == 'Total', 'Marital_Status'] = 'Marital_Total'

        filename = 'population_quarterly_region_time_marital_status_input.csv'
        df.to_csv(os.path.join(output_dir, filename), index=False)
        print(f"Done! Saved {len(df)} rows to {filename}")
else:
    print(f"Error: {response.status_code} - {response.text}")
@@ -0,0 +1,76 @@
import requests
import pandas as pd
import os
from io import StringIO
import re

# --- CONFIGURATION ---
url = "https://api.statbank.dk/v1/data"
output_dir = "./input_files/"
table_id = "BEFOLK2"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# --- FETCH DATA ---
payload = {
    "table": table_id,
    "format": "BULK",
    "lang": "en",
    "variables": [
        {"code": "KØN", "values": ["*"]},
        {"code": "ALDER", "values": ["*"]},
        {"code": "Tid", "values": ["*"]}
    ]
}

response = requests.post(url, json=payload)

if response.status_code == 200:
    df = pd.read_csv(StringIO(response.text), sep=';')
    sex_col, age_col, time_col, val_col = df.columns

    # 1. DYNAMIC SEX SORTING (Total -> Men -> Women)
    # We look for "Total" dynamically, then assume the rest are Men/Women
    sex_order = sorted(df[sex_col].unique(), key=lambda x: 0 if 'total' in str(x).lower() else 1)
    # If the API returns Men/Women, this ensures 'Total' is index 0
    df[sex_col] = pd.Categorical(df[sex_col], categories=sex_order, ordered=True)

    # 2. DYNAMIC AGE SORTING (Age, total -> 0-4 -> 5-9...)
    def get_age_rank(age_str):
        age_str = str(age_str).lower()
        if 'total' in age_str:
            return -1
        nums = re.findall(r'\d+', age_str)
        return int(nums[0]) if nums else 999

    # Create a temporary sort key
    df['age_sort'] = df[age_col].apply(get_age_rank)

    # 3. DYNAMIC YEAR SORTING
    # Ensure years are integers so 1901 comes before 2026
    df[time_col] = df[time_col].apply(lambda x: int(re.search(r'\d+', str(x)).group()))

    # Sort the dataframe before pivoting
    df = df.sort_values([sex_col, 'age_sort', time_col])

    # 4. PIVOT
    # We drop the age_sort key during pivot to keep the output clean
    df_pivot = df.pivot_table(
        index=[sex_col, age_col],
        columns=time_col,
        values=val_col,
        aggfunc='first',
        sort=False  # CRITICAL: Keeps our manual sort order
    ).reset_index()
    df_pivot = df_pivot.rename(columns={'ALDER': 'Age', 'KØN': 'Sex'})

    # --- SAVE ---
    filename = "population_sex_age_time_input.csv"
    save_path = os.path.join(output_dir, filename)
    df_pivot.to_csv(save_path, index=False, encoding='utf-8-sig')

    print(f"File saved successfully: {save_path}")

else:
    print(f"Request failed: {response.status_code}")
34 changes: 34 additions & 0 deletions statvar_imports/denmark_demographics/manifest.json
@@ -0,0 +1,34 @@
{
  "import_specifications": [
    {
      "import_name": "Denmark_Demographics",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://www.statbank.dk/statbank5a/default.asp?w=1280",
      "provenance_description": "Population data for Denmark from Statbank",
      "scripts": [
        "download_population_quarterly_region_time_marital_status.py",
        "download_population_sex_age_time.py",
        "../../tools/statvar_importer/stat_var_processor.py --input_data=./input_files/population_quarterly_region_time_marital_status_input.csv --pv_map=./population_quarterly_region_time_marital_status_pvmap.csv --config_file=./denmark_demographics_metadata.csv --output_path=./output/population_quarterly_region_time_marital_status_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf",
        "../../tools/statvar_importer/stat_var_processor.py --input_data=./input_files/population_sex_age_time_input.csv --pv_map=./population_sex_age_time_pvmap.csv --config_file=./denmark_demographics_metadata.csv --output_path=./output/population_sex_age_time_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
      ],
      "import_inputs": [
        {
          "template_mcf": "output/population_sex_age_time_output.tmcf",
          "cleaned_csv": "output/*_output.csv"
        }
      ],
      "source_files": [
        "./input_files/*.csv"
      ],
      "user_script_timeout": 36000,
      "cron_schedule": "0 10 20 2,5,8,11 *",
      "resource_limits": {
        "cpu": 8,
        "memory": 32,
        "disk": 100
      }
    }
  ]
}