The Molecular Observation Network (MONet, https://www.emsl.pnnl.gov/monet) is an ongoing project of the Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory that aims to create a continental-scale database of standardized soil molecular properties to advance the understanding of soil biogeochemistry. The MONet Data Synthesis Project aims to systematically compare and integrate soil data from MONet with established, publicly available datasets such as SoilGrids and the Soil Respiration Database (SRDB). This repository contains scripts to analyze and compare key soil properties (e.g., soil respiration, pH, and clay content) across known drivers of variability, with an emphasis on addressing spatial scale differences between global datasets and national-scale efforts like MONet.
- Understand Soil Properties Relative to Environmental Drivers:
  - Explore soil respiration variations with mean annual precipitation (MAP) and mean annual temperature (MAT).
  - Assess the spatial distribution and variability in pH and clay content at national and sub-national levels.
  - Compare trends across subsamples of larger datasets (e.g., SoilGrids) with regional/national observations (e.g., MONet).
- Enhance Data Accessibility:
  - Provide processed data, visualizations, and detailed scripts to the soil science, biogeochemistry, and environmental science communities.
| Folder / File | Description |
|---|---|
| `data/` | Directory for storing raw and publicly available datasets. |
| `R_data/` | Directory for saving intermediate processed data. |
| `figures/` | Directory for saving plots and visual outputs from the analyses. |
| `MONetSynthesis.Rmd` | R Markdown file containing the analysis of the MONet data product. |
| `MONetDataPreprocessing.R` | R script for downloading and preprocessing data. |
| `README.md` | Overview of the repository, datasets, and instructions for reproducing the analyses. |
This project integrates multiple high-quality datasets. A summary of their sources and attributes is given below:
| Dataset | Description | Source |
|---|---|---|
| MONet Soil Respiration | Processed soil respiration data collected in the United States. | MONet Soil Respiration - Zenodo |
| MONet Clay and pH | Clay content and pH of X number of soil samples in the United States. | MONet |
| SoilGrids | Global gridded clay content and pH at 10 km resolution. | SoilGrids |
| SRDB | Soil respiration observations aggregated from published journal articles globally. | SRDB GitHub Repository |
| Global Gridded 1-km Rh | Soil heterotrophic respiration upscaled globally at 1-km resolution. | NASA Earthdata |
| GCAM Basin Clay Content | Clay content and summary statistics of GCAM's 232 basins. | GCAM |
| GCAM Basin Shapefiles | Spatial boundaries of GCAM's 232 basins at 0.5 arc-minute resolution (CRS: EPSG:4326, WGS 84). | GCAM basin boundaries from moirai |
| Climate Zones | Köppen-Geiger classification shapefile used to group comparisons by climatic similarity. | North American Climate Atlas |
| USA Shapefiles | Spatial reference shapefiles for delineating site boundaries in the continental U.S. | NOAA GIS US States Shapefiles |
- Download Raw Data:
  - Clone this repository and navigate to the `data/` folder.
  - Download the MONet data, Climate Zones, and USA shapefiles from the sources listed above and store them in the corresponding subdirectories of `data/`.
  - For SoilGrids and SRDB, run the `get_SGdata.R` and `get_SRDBdata.R` scripts.
- Install Required R Packages:
  - Install all R libraries used in this analysis.
- Run Data Processing Scripts:
  - Open and run the provided R script for data download and processing (`MONetDataPreprocessing.R`).
  - This script takes some time to run and saves its output as `.RData` files in the `R_data/` directory.
- Run Analysis Scripts:
  - Open the provided R Markdown script (`MONetSynthesis.Rmd`).
  - Run the script to:
    - Conduct exploratory analyses comparing respiration trends, MAP/MAT relationships, etc.
    - Perform spatial and statistical comparisons on pH and clay content.
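
The package-installation step can be sketched as follows. This README does not list the required libraries, so the package names below (`terra`, `sf`, `dplyr`, `ggplot2`) are assumptions chosen for illustration; check the `library()` calls in the scripts for the actual list. The helper `find_missing_packages` is hypothetical and uses only base R.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# ASSUMPTION: illustrative package list; replace it with the libraries
# actually loaded in MONetDataPreprocessing.R and MONetSynthesis.Rmd.
required_pkgs <- c("terra", "sf", "dplyr", "ggplot2")

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Packages to install: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All listed packages are already installed.")
}
```

The install call is left commented out so that the snippet only reports what is missing until the package list has been confirmed.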
Below is the file structure that should exist after downloading the required data and running `MONetDataPreprocessing.R`:
    ├── MONetSynthesis.Rproj
    ├── Morris-Wiens_MONetSynthesis.Rmd
    ├── R/
    │   └── MONetDataPreprocessing.R
    ├── data/
    │   ├── GCAM/
    │   │   └── mapped_clay_KN.csv
    │   ├── MONet/
    │   │   ├── 1000S_processed_L2_summary.csv
    │   │   ├── 1000Soils_Metadata_Site_Mastersheet_v1.csv
    │   │   ├── pH/
    │   │   │   └── processed_data/
    │   │   │       ├── Column_Descriptions.xlsx
    │   │   │       ├── Coordinates.csv
    │   │   │       └── Soil_BioChemical_properties.csv
    │   │   └── clay/
    │   │       └── processed_data/
    │   │           ├── Column_Descroptions.xlsx
    │   │           ├── Coordinates.csv
    │   │           └── Soil_BioChemical_properties.csv
    │   ├── shapefiles/
    │   │   ├── gcam_boundaries_moirai_3p1_0p5arcmin_wgs84/
    │   │   │   └── main_outputs/
    │   │   │       └── glu_boundaries_moirai_landcells_3p1_0p5arcmin.shp
    │   │   ├── s_18_mr25/
    │   │   │   └── s_18mr25.shp
    │   │   └── na_climatezones_shapefile/
    │   │       └── climatezones_shapefile/
    │   │           └── NA_ClimateZones/
    │   │               └── data/
    │   │                   └── NorthAmerica_Climate_Zones.shp
    │   ├── soilgrids*/
    │   │   ├── crop_roi_igh_clay_0-5cm.tif
    │   │   ├── crop_roi_igh_clay_15-30cm.tif
    │   │   ├── crop_roi_igh_ph_0-5cm.tif
    │   │   └── crop_roi_igh_ph_15-30cm.tif
    │   ├── SoilResp_HeterotrophicResp_1928/
    │   │   └── data/
    │   │       └── soil_Rh_mean.tif
    │   ├── srdb*/
    │   │   └── srdb-20250503a/
    │   │       └── srdb-data.csv
    │   └── worldclim_data*/
    │       └── climate/
    │           └── wc2.1_10m/
    │               └── wc2.1_10m_prec_xx.tif
    ├── R_data*/
    │   ├── processed_Rs.RData
    │   ├── processed_pH.RData
    │   └── processed_clay.RData
    ├── figures
    └── README.md

\*These folders contain data generated by `MONetDataPreprocessing.R`.
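
As a quick sanity check after preprocessing, the sketch below tests whether a subset of the files from the structure above is present. The helper `check_expected_files` is hypothetical (it is not part of this repository), and the snippet assumes the working directory is the repository root.

```r
# Hypothetical helper: report which expected files exist under `root`.
# Returns a logical vector named by the relative paths.
check_expected_files <- function(root, rel_paths) {
  found <- file.exists(file.path(root, rel_paths))
  names(found) <- rel_paths
  found
}

# A subset of paths taken from the file structure listed in this README.
expected <- c(
  "R/MONetDataPreprocessing.R",
  "data/GCAM/mapped_clay_KN.csv",
  "data/MONet/1000S_processed_L2_summary.csv",
  "R_data/processed_Rs.RData",
  "R_data/processed_pH.RData",
  "R_data/processed_clay.RData"
)

# ASSUMPTION: the working directory is the repository root.
status <- check_expected_files(".", expected)
if (all(status)) {
  message("All expected files are present.")
} else {
  message("Missing files:\n",
          paste0("  ", names(status)[!status], collapse = "\n"))
}
```

Once the `.RData` files exist, base R's `load()` restores the saved objects into the current environment and invisibly returns their names, which is a convenient way to see what each file contains.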
The analysis pipeline includes:
- Soil Respiration Analysis:
  - Relating soil carbon flux (respiration) to MAP and MAT across the U.S.
  - Identifying climatic influences on observed trends using regression.
- pH and Clay Content Comparison:
  - Cross-referencing MONet-observed properties with SoilGrids data, both spatially and within climatic zones.
- Data Visualization:
  - Generating maps of sample coverage and plots comparing different datasets.
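
To illustrate the regression step in the soil respiration analysis, here is a minimal sketch that fits respiration against MAT and MAP with base R's `lm()`. It runs on synthetic data only: the column names (`Rs`, `MAT`, `MAP`), the coefficients, and the noise level are all assumptions made for the example, and the model form is not necessarily the one used in `MONetSynthesis.Rmd`.

```r
# Synthetic stand-in data: NOT MONet or SRDB observations.
set.seed(42)
n <- 200
sim <- data.frame(
  MAT = runif(n, min = 0, max = 25),     # mean annual temperature, deg C
  MAP = runif(n, min = 200, max = 2000)  # mean annual precipitation, mm
)

# ASSUMPTION: an arbitrary linear response plus noise, in arbitrary units,
# used only so that the regression has something to recover.
sim$Rs <- 300 + 20 * sim$MAT + 0.3 * sim$MAP + rnorm(n, sd = 50)

# Multiple linear regression of respiration on both climate drivers.
fit <- lm(Rs ~ MAT + MAP, data = sim)

# Coefficient estimates with standard errors and p-values.
print(summary(fit)$coefficients)

# Share of variance explained by the two drivers together.
cat("R-squared:", round(summary(fit)$r.squared, 3), "\n")
```

With real observations, the data frame would instead hold respiration values joined to climate values extracted at each site, and a different model form (for example, including an interaction term or transforming the response) may be more appropriate.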