datacommonsorg · balit-raibot · Dec 11, 2025 · Jan 26, 2026 · Jan 26, 2026 · Jan 26, 2026
diff --git a/statvar_imports/denmark_demographics/README.md b/statvar_imports/denmark_demographics/README.md
@@ -0,0 +1,68 @@
+# Statistics Denmark Demographics Dataset
+## Overview
+This dataset contains demographic statistics for the population of Denmark, sourced from Statistics Denmark. It includes two datasets covering quarterly and annual population breakdowns by sex, age, and marital status. The source tables also offer regional and municipal breakdowns, but this import covers Denmark as a whole.
+
+The import covers:
+- **Population (Quarterly):** Population count by region, marital status, age, and sex at the first day of each quarter (Table FOLK1A).
+- **Population (Annual):** Population count by sex and age group for each year (Table BEFOLK2).
+
+Type of place: Country
+
+## Data Source
+**Source URL:**
+- Main Portal: https://www.statbank.dk/statbank5a/default.asp?w=1396
+- Specific Table (FOLK1A): https://www.statbank.dk/FOLK1A
+
+**Provenance Description:**
+The data is provided by Statistics Denmark, the central authority for Danish statistics. The population figures are derived from the Central Person Register (CPR) and reflect the population residing in Denmark on the first day of the period.
+
+## How To Download Input Data
+To download the data manually:
+1. Go to the [StatBank Denmark Portal](https://www.statbank.dk/statbank5a/default.asp?w=1396).
+2. Browse or search for the desired population tables. For quarterly demographics, search for table **FOLK1A** (Population at the first day of the quarter).
+3. Select the desired variables:
+   - **Region:** All Denmark.
+   - **Marital Status:** Total, Never married, Married/separated, Widowed, Divorced.
+   - **Age:** Individual ages or age groups.
+   - **Sex:** Men, Women.
+   - **Time:** Quarters.
+4. Click "Show table" and then "Download" to save as CSV.
+
+## Processing Instructions
+To process the Denmark Demographics data and generate statistical variables, use the following commands:
+
+**For Data Run (Quarterly Run)**
+```bash
+python ../../tools/statvar_importer/stat_var_processor.py \
+    --input_data='gs://unresolved_mcf/country/denmark/input_files/population_quarterly_region_time_marital_status_input.csv' \
+    --pv_map='population_quarterly_region_time_marital_status_pvmap.csv' \
+    --output_path='population_quarterly_region_time_marital_status_output' \
+    --config_file='denmark_demographics_metadata.csv' \
+    --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
+```
+**For Data Run (Annual Run)**
+```bash
+python ../../tools/statvar_importer/stat_var_processor.py \
+    --input_data='gs://unresolved_mcf/country/denmark/input_files/population_sex_age_time_input.csv' \
+    --pv_map='population_sex_age_time_pvmap.csv' \
+    --output_path='population_sex_age_time_output' \
+    --config_file='denmark_demographics_metadata.csv' \
+    --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
+```
+
+Each first-time run generates the following output files. The names below use the prefix `output`; the actual prefix is the value passed to `--output_path`:
+- output.csv
+- output_stat_vars_schema.mcf
+- output_stat_vars.mcf
+- output.tmcf
+
+## Data Quality Checks and Validation
+Validation is performed using the Data Commons import tool:
+
+```bash
+java -jar datacommons-import-tool-0.1-jar-with-dependencies.jar lint \
+    output_stat_vars_schema.mcf \
+    output.csv \
+    output.tmcf \
+    output_stat_vars.mcf
+```
+
+The tool generates `report.json`, `summary_report.csv`, and `summary_report.html`, which can be used to identify errors or warnings in the generated data.
diff --git a/statvar_imports/denmark_demographics/denmark_demographics_metadata.csv b/statvar_imports/denmark_demographics/denmark_demographics_metadata.csv
@@ -0,0 +1,3 @@
+parameter,value
+output_columns,"observationDate,value,observationAbout,variableMeasured"
+dc_api_root,https://api.datacommons.org
diff --git a/..._imports/denmark_demographics/download_population_quarterly_region_time_marital_status.py b/..._imports/denmark_demographics/download_population_quarterly_region_time_marital_status.py
@@ -0,0 +1,69 @@
+import requests
+import pandas as pd
+import itertools
+import os
+
+# --- CONFIGURATION ---
+url = "https://api.statbank.dk/v1/data"
+output_dir = "./input_files/"
+table_id = "FOLK1A"
+
+os.makedirs(output_dir, exist_ok=True)
+
+payload = {
+   "table": table_id,
+   "format": "JSONSTAT",
+   "lang": "en",
+   "variables": [
+      {"code": "OMRÅDE", "values": ["000"]},  # All of Denmark
+      {"code": "KØN", "values": ["*"]},
+      {"code": "ALDER", "values": ["*"]},
+      {"code": "CIVILSTAND", "values": ["*"]},
+      {"code": "Tid", "values": ["*"]}
+   ]
+}
+
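+def find_key_recursive(source_dict, target_key):
+    """Return the first value stored under target_key, searching nested dicts depth-first."""
+    if target_key in source_dict:
+        return source_dict[target_key]
+    # Not at this level: descend into each nested dict until a match is found.
+    for value in source_dict.values():
+        if isinstance(value, dict):
+            found = find_key_recursive(value, target_key)
+            if found is not None:
+                return found
+    return None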
+
+response = requests.post(url, json=payload)
+
+if response.status_code == 200:
+    full_data = response.json()
+    dims = find_key_recursive(full_data, 'dimension')
+    vals = find_key_recursive(full_data, 'value')
+
+    if dims and vals:
+        ids = find_key_recursive(full_data, 'id') or list(dims.keys())
+        role = find_key_recursive(full_data, 'role') or {}
+        metric_ids = role.get('metric', [])
+
+        dim_list = []
+        col_names = []
+
+        for d_id in ids:
+            # Skip the metric (contents) dimension so only classification dimensions become columns.
+            if d_id in metric_ids or d_id.lower() in ['indhold', 'contents']:
+                continue
+            labels = dims[d_id]['category']['label']
+            dim_list.append(list(labels.values()))
+            col_names.append(d_id)
+
+        # Build the DataFrame
+        df = pd.DataFrame(list(itertools.product(*dim_list)), columns=col_names)
+        df['Value'] = vals
+
+        # Renaming and Cleanup
+        df = df.rename(columns={'OMRÅDE': 'Region', 'ALDER': 'Age', 'CIVILSTAND': 'Marital_Status', 'Tid': 'Quarter', 'KØN': 'Sex'})
+        df.loc[df['Sex'] == 'Total', 'Sex'] = 'Gender_Total'
+        df.loc[df['Marital_Status'] == 'Total', 'Marital_Status'] = 'Marital_Total'
+
+        filename = 'population_quarterly_region_time_marital_status_input.csv'
+        df.to_csv(os.path.join(output_dir, filename), index=False)
+        print(f"Done! Saved {len(df)} rows to {filename}")
+    else:
+        # Fail loudly instead of finishing without writing an input file.
+        raise SystemExit("Error: response did not contain 'dimension' and 'value' data")
+else:
+    # Exit non-zero so the import pipeline notices the failed download.
+    raise SystemExit(f"Error: {response.status_code} - {response.text}")
diff --git a/statvar_imports/denmark_demographics/download_population_sex_age_time.py b/statvar_imports/denmark_demographics/download_population_sex_age_time.py
@@ -0,0 +1,76 @@
+import requests
+import pandas as pd
+import os
+from io import StringIO
+import re
+
+# --- CONFIGURATION ---
+url = "https://api.statbank.dk/v1/data"
+output_dir = "./input_files/"
+table_id = "BEFOLK2"
+
+os.makedirs(output_dir, exist_ok=True)
+
+# --- FETCH DATA ---
+payload = {
+   "table": table_id,
+   "format": "BULK",
+   "lang": "en",
+   "variables": [
+      {"code": "KØN", "values": ["*"]},
+      {"code": "ALDER", "values": ["*"]},
+      {"code": "Tid", "values": ["*"]}
+   ]
+}
+
+response = requests.post(url, json=payload)
+
+if response.status_code == 200:
+    df = pd.read_csv(StringIO(response.text), sep=';')
+    sex_col, age_col, time_col, val_col = df.columns
+
+    # 1. DYNAMIC SEX SORTING (Total first, then the remaining categories)
+    # Any label containing "total" sorts first; sorted() is stable, so the other
+    # labels (e.g. Men, Women) keep the order in which the API returned them.
+    sex_order = sorted(df[sex_col].unique(), key=lambda x: 0 if 'total' in str(x).lower() else 1)
+    # An ordered Categorical makes sort_values() follow sex_order rather than alphabetical order.
+    df[sex_col] = pd.Categorical(df[sex_col], categories=sex_order, ordered=True)
+
+    # 2. DYNAMIC AGE SORTING (Age, total -> 0-4 -> 5-9...)
+    def get_age_rank(age_str):
+        age_str = str(age_str).lower()
+        if 'total' in age_str:
+            return -1
+        nums = re.findall(r'\d+', age_str)
+        return int(nums[0]) if nums else 999
+
+    # Create a temporary sort key
+    df['age_sort'] = df[age_col].apply(get_age_rank)
+
+    # 3. DYNAMIC YEAR SORTING
+    # Ensure years are integers so 1901 comes before 2026
+    df[time_col] = df[time_col].apply(lambda x: int(re.search(r'\d+', str(x)).group()))
+
+    # Sort the dataframe before pivoting
+    df = df.sort_values([sex_col, 'age_sort', time_col])
+
+    # 4. PIVOT
+    # We drop the age_sort key during pivot to keep the output clean
+    df_pivot = df.pivot_table(
+        index=[sex_col, age_col],
+        columns=time_col,
+        values=val_col,
+        aggfunc='first',
+        sort=False,  # CRITICAL: keeps our manual sort order (requires pandas >= 1.3)
+    ).reset_index()
+    df_pivot = df_pivot.rename(columns={'ALDER': 'Age', 'KØN': 'Sex'})
+
+    # --- SAVE ---
+    filename = "population_sex_age_time_input.csv"
+    save_path = os.path.join(output_dir, filename)
+    df_pivot.to_csv(save_path, index=False, encoding='utf-8-sig')
+
+    print(f"File saved successfully: {save_path}")
+
+else:
+    # Exit non-zero so the import pipeline notices the failed download.
+    raise SystemExit(f"Request failed: {response.status_code} - {response.text}")
diff --git a/statvar_imports/denmark_demographics/manifest.json b/statvar_imports/denmark_demographics/manifest.json
@@ -0,0 +1,34 @@
+{
+    "import_specifications": [
+        {
+            "import_name": "Denmark_Demographics",
+            "curator_emails": [
+                "support@datacommons.org"
+            ],
+            "provenance_url": "https://www.statbank.dk/statbank5a/default.asp?w=1280",
+            "provenance_description": "Population data for Denmark from Statbank",
+            "scripts": [
+                "download_population_quarterly_region_time_marital_status.py",
+                "download_population_sex_age_time.py",
+                "../../tools/statvar_importer/stat_var_processor.py --input_data=./input_files/population_quarterly_region_time_marital_status_input.csv --pv_map=./population_quarterly_region_time_marital_status_pvmap.csv --config_file=./denmark_demographics_metadata.csv --output_path=./output/population_quarterly_region_time_marital_status_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf",
+                "../../tools/statvar_importer/stat_var_processor.py --input_data=./input_files/population_sex_age_time_input.csv --pv_map=./population_sex_age_time_pvmap.csv --config_file=./denmark_demographics_metadata.csv --output_path=./output/population_sex_age_time_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
+            ],
+            "import_inputs": [
+                {
+                    "template_mcf": "output/population_sex_age_time_output.tmcf",
+                    "cleaned_csv": "output/*_output.csv"
+                }
+            ],
+            "source_files": [
+                "./input_files/*.csv"
+            ],
+            "user_script_timeout": 36000,
+            "cron_schedule": "0 10 20 2,5,8,11 *",
+            "resource_limits": {
+                "cpu": 8,
+                "memory": 32,
+                "disk": 100
+            }
+        }
+    ]
+}
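
The quarterly download script above rebuilds table rows by pairing the flat `value` list from the JSON-stat response with the Cartesian product of the dimension labels, which only works if the values are laid out in row-major order with the last dimension varying fastest. The sketch below illustrates that pairing on toy data. The function name `flatten_jsonstat` and the sample labels and numbers are hypothetical and not part of this PR, and the explicit length check is an added safeguard, not something the script does.

```python
import itertools


def flatten_jsonstat(dimension_labels, values):
    """Pair a flat, row-major value list with every combination of labels.

    dimension_labels maps each dimension name to its ordered list of labels.
    (Hypothetical helper for illustration; not taken from the PR.)
    """
    names = list(dimension_labels)
    combos = list(itertools.product(*(dimension_labels[n] for n in names)))
    # A mismatch means the dimensions and values do not describe the same cube.
    if len(combos) != len(values):
        raise ValueError(
            f"expected {len(combos)} values for the label combinations, got {len(values)}"
        )
    return [dict(zip(names, combo), Value=v) for combo, v in zip(combos, values)]


# Toy data, made up for illustration (not real Statistics Denmark figures).
labels = {"Sex": ["Men", "Women"], "Quarter": ["2024Q1", "2024Q2"]}
rows = flatten_jsonstat(labels, [10, 11, 20, 21])
for row in rows:
    print(row)
```

Because the last dimension varies fastest, the second value (11) belongs to `Men` in `2024Q2`, not to `Women` in `2024Q1`. Reordering the dimensions without reordering the values would silently misassign every observation, which is why the script iterates the dimensions in the order given by the response's `id` list.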