Commit 232752d

Merge pull request #20 from NCAR/fix_repo
Fix repo
2 parents 308231b + a6c7e3c commit 232752d

6 files changed: 47 additions, 135 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # OSDF-Examples

 ## About
-Contains Jupyter notebook workflows which access climate data from various [OSDF](https://osg-htc.org/services/osdf.html) origins using [PelicanFS](https://github.com/PelicanPlatform/pelicanfs). The examples include workflows where computations are executed on various platforms like NCAR's Casper, TACC's Stampede3 and cloud computing platforms like Jetstream2. If you use any workflow in this repository, please cite using [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16863133.svg)](https://doi.org/10.5281/zenodo.16863133)
+This repository contains Jupyter notebook workflows which access climate data from various [OSDF](https://osg-htc.org/services/osdf.html) origins using [PelicanFS](https://github.com/PelicanPlatform/pelicanfs). The examples include workflows where computations are executed on various platforms like NCAR's Casper, TACC's Stampede3 and cloud computing platforms like Jetstream2. If you use any workflow in this repository, please cite using [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16863133.svg)](https://doi.org/10.5281/zenodo.16863133)


 ## Example Workflows

docs/computational_platforms.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+---
+title: Other Computational Platforms
+date: 2026-1-28
+author: Harsha R. Hampapura
+---
+
+
+## Overview
+- This section contains notebooks that are executed on other computational platforms such as TACC's Stampede3, Indiana University's Jetstream2, and the Open Science Pool (OSPool), demonstrating the capability of OSDF and the PelicanFS software to deliver data to compute.

docs/gdex_workflows.md

Lines changed: 8 additions & 120 deletions
@@ -1,127 +1,15 @@
 ---
-title: Reanalysis
-date: 2025-12-05
-author: Chia-Wei Hsu
+title: GDEX workflows
+date: 2026-1-28
+author: Harsha R. Hampapura
 ---


-## Overview
+## Overview - Geoscience Data Exchange (GDEX)
+- This section contains notebooks that are executed on NCAR's HPC system Casper. It has been further subdivided into 3 sections:
+  - NCAR Data Origin - Notebooks that illustrate data streaming from NCAR's OSDF origin
+  - Other Data Origins - Notebooks that illustrate data streaming from other OSDF origins like the AWS Open Data origin.
+  - ML Workflows - Machine Learning Workflows

-Reanalysis products are comprehensive datasets that combine historical observations with models simulation to create a consistent, spatially and temporally complete representation of the Earth's past climate and weather. They are one of the most valuable resources in Earth system science for understanding climate variability, trends, and dynamics.

-## Reanalysis vs. Analysis vs. Simulations vs. Observations
-
-Understanding the differences between these fundamental data types is crucial for Earth system science research.
-
-:::{important} Reanalysis vs. Analysis 👈
-:class: dropdown
-**Analysis** refers to the real-time, operational data assimilation product produced by weather forecasting centers:
-- Uses the **current** version of the forecast model and assimilation system
-- The model and assimilation methods **change over time** as improvements are made
-- Optimized for short-term weather forecasting
-- Creates discontinuities when the system is upgraded
-
-**Reanalysis** retrospectively processes historical observations:
-- Uses a **fixed** model and assimilation system for the entire time period
-- Ensures temporal **consistency** and homogeneity
-- Not updated in real-time; produced in multi-year projects
-- Better suited for climate studies and long-term trend analysis
-- More computationally expensive due to reprocessing decades of data
-
-**Key Distinction**: Analysis prioritizes current forecast skill; reanalysis prioritizes long-term consistency.
-:::
-
-:::{important} Reanalysis vs. Model Simulations 👈
-:class: dropdown
-**Model Simulations** (also called free-running simulations or climate projections):
-- Run forward in time without observational constraints
-- Initial conditions may come from observations, but the model evolves freely
-- Used for future climate projections and sensitivity experiments
-- Can drift from observed climate due to model biases
-- Examples: CMIP6 models, CESM simulations
-
-**Reanalysis**:
-- **Continuously constrained** by observations through data assimilation
-- Cannot drift far from observed atmospheric state
-- Represents the "best estimate" of what actually happened
-- Limited to historical periods where observations exist
-- Blends model physics with observational evidence
-
-**Key Distinction**: Simulations show what the model thinks should happen; reanalysis shows what actually happened (constrained by observations).
-:::
-
-
-:::{important} Reanalysis vs. Observations 👈
-:class: dropdown
-**Direct Observations** (in-situ and remote sensing):
-- Actual measurements from instruments
-- **Highest accuracy** at measurement location and time
-- Spatially and temporally **incomplete** (gaps between stations, satellite swaths)
-- Different instruments have different biases and uncertainties
-- No information about unobserved variables or locations
-- Examples: Weather station data, satellite retrievals, radiosonde profiles
-
-**Reanalysis**:
-- **Gridded, gap-filled** product combining observations with model physics
-- Provides estimates even where/when no observations exist
-- Spatially and temporally **complete**
-- Includes hundreds of variables, many not directly observed
-- **Less accurate** than direct observations at observed locations
-- Smooths out small-scale features
-- Subject to both observational and model uncertainties
-
-**Key Distinction**: Observations provide ground truth but are incomplete; reanalysis provides complete coverage but is less accurate than direct observations.
-:::
-
-
-
-## How Reanalysis Works
-
-### Data Assimilation Process
-
-Reanalysis uses a process called **data assimilation** to blend:
-
-1. **Observational Data**: Including satellite measurements, weather stations, radiosondes (weather balloons), ship and buoy observations, and aircraft reports
-2. **Numerical Models**: Physics-based models that simulate atmospheric, oceanic, and land surface processes
-
-The data assimilation system statistically combines these elements, weighing observations and model predictions based on their respective uncertainties to produce the "best estimate" of the atmospheric state at any given time.
-
-### Key Characteristics
-
-- **Temporal Consistency**: Uses a fixed, modern assimilation system and model throughout the entire reanalysis period.
-- **Spatial Completeness**: Fills gaps where observations are sparse or unavailable using model physics
-- **Regular Grid**: Produces data on uniform spatial and temporal grids, making it easier to analyze
-- **Multiple Variables**: Provides hundreds of atmospheric, oceanic, and land variables that are physically consistent with each other
-
-
-
-## Applications in Earth System Science
-
-Reanalysis products are used for:
-
-1. **Climate Monitoring**: Tracking long-term temperature, precipitation, and circulation patterns
-2. **Extreme Event Analysis**: Studying hurricanes, droughts, heat waves, and other extreme weather
-3. **Model Validation**: Evaluating climate and weather models against a consistent reference
-4. **Forcing Data**: Driving regional models, hydrological models, and impact assessments
-5. **Process Studies**: Understanding physical mechanisms and teleconnections
-6. **Trend Analysis**: Identifying climate change signals and natural variability
-7. **Deep Learning/Machine Learning**: Training and validating AI-based weather and climate models
-
-
-
-## Best Practices
-
-When using reanalysis data:
-
-1. **Choose the Right Product**: Consider temporal coverage, spatial resolution, and which variables are best represented
-2. **Validate Carefully**: Compare with independent observations when possible
-3. **Use Multiple Products**: Cross-validate findings across different reanalysis systems
-4. **Understand Uncertainty**: Be aware of which variables are observation-constrained vs. model-derived
-5. **Check Documentation**: Review known issues and dataset updates from producing centers
-
-## Resources
-
-Major reanalysis centers provide extensive documentation:
-- [ECMWF ERA5 Documentation](https://confluence.ecmwf.int/display/CKB/ERA5)
-- [JMA JRA-3Q Documentation](https://jra.kishou.go.jp/)

docs/introduction.md

Lines changed: 15 additions & 14 deletions
@@ -1,32 +1,33 @@
 ---
 title: Introduction
+author: Harsha R. Hampapura
 date: 2026-1-28
 ---

 # Introduction

-Welcome to the **OSDF Examples** documentation! This repository provides example notebooks and scripts that demonstrate how to access data from NCAR's GDEX for cool geoscience applications and visualizations.
+Welcome to the **OSDF Examples** repository! This repository provides example notebooks and scripts that demonstrate how to access data via the Open Science Data Federation ([OSDF](https://osg-htc.org/services/osdf.html)) using the [PelicanFS](https://github.com/PelicanPlatform/pelicanfs) software. All the notebooks demonstrate how to stream geoscience data into your workflows and perform an interesting calculation/visualization. If you wish to learn more about OSDF and/or Pelican, please refer to the [OSDF cookbook](https://projectpythia.org/osdf-cookbook/).

 :::{warning} Important Notice
-This jupyter book is under active development and is intended primarily for use by users who have access to NCAR's HPC resources (**NCAR HPC users**). In all the examples, we assume that you have access to NCAR's jupyterhub. If you are an external user trying to stream data from NCAR's GDEX into your workflows, please see : [osdf_examples](https://ncar.github.io/osdf_examples/) (**OSDF users**)
+This Jupyter Book is under active development. You will need to set up your Python environment using the requirements.txt file before running any of the notebooks. Please open an issue on the associated GitHub repository to report any bugs or suggest improvements.
 :::

 ## How is the repository organized?

-This repository is organized by into various sections based on the type of dataset used:
+This repository is organized into various sections (mostly) based on the data origins from which the data is accessed and the computational platforms used to execute the notebooks.

-- **Observations** (satellite, in-situ measurements)
-- **Analysis** (NCEP GFS analyses)
-- **Reanalysis** (ERA5, JRA-3Q, etc.)
-- **Model output/Simulations** (CESM, CMIP, etc.)
-- **Data Fusion** (Combination of two or more datasets)
-- **Contribution** (A guide on contributions)
+- **NCAR GDEX workflows** (Workflows that are executed on NCAR's HPC system Casper)
+  - NCAR Data Origin (Illustrate data streaming from NCAR's OSDF origin)
+  - Other Data Origins (Illustrate data streaming from other OSDF origins such as AWS)
+  - ML Workflows (Machine Learning Workflows)
+- **Other Computational Platforms** (Workflows that are executed on other HPC and cloud computing platforms)
+- **NDC Workflows** (Workflows that were developed as a part of the National Discovery Cloud (NDC) initiative)
+- **Scripts** (Python scripts and any content that is not a Jupyter notebook)

-### 🌐 Multiple Access Methods
-Some of these notebooks use intake/intake-ESM catalogs that support multiple access patterns :
-- **POSIX**: Direct filesystem access for NCAR HPC users
-- **HTTPS**: Web-based access for remote users
+### 🌐 Access Methods
+Some of these notebooks use intake/intake-ESM catalogs in conjunction with the PelicanFS software to stream data. Others directly use the PelicanFS software to load data into xarray.

 ## Repository Structure
 - **`docs/`** - Introductory markdown files for each section
-- **`examples/`** - Python script examples for generating dataset catalog
+- **`notebooks/`** - All computational workflows/examples archived as Jupyter notebooks
+- **`scripts/`** - Python scripts and any content that is not a Jupyter notebook
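
Reviewer note: the "Access Methods" text in docs/introduction.md mentions loading data into xarray directly through PelicanFS. A minimal sketch of that pattern might look like the following. It is not taken from the repository's notebooks. The helper names `osdf_url` and `open_osdf_dataset` and the object path are hypothetical, and the sketch assumes that `pelicanfs`, `fsspec`, `xarray`, and `h5netcdf` are installed, that installing `pelicanfs` makes the `osdf` protocol available to `fsspec`, and that the target object is a NetCDF4/HDF5 file.

```python
def osdf_url(object_path: str) -> str:
    """Build an osdf:// URL from an object path (hypothetical helper)."""
    # Normalise to exactly one leading slash before adding the scheme.
    return "osdf://" + "/" + object_path.lstrip("/")


def open_osdf_dataset(object_path: str):
    """Stream one remote file into an in-memory xarray Dataset (sketch)."""
    # Imported lazily so osdf_url() works without these packages installed.
    import fsspec
    import xarray as xr

    # Assumption: pelicanfs is installed and provides the "osdf" protocol.
    with fsspec.open(osdf_url(object_path), mode="rb") as remote_file:
        # Load eagerly because the file handle closes when the block exits.
        return xr.open_dataset(remote_file, engine="h5netcdf").load()


if __name__ == "__main__":
    # Hypothetical path for illustration only; substitute a real object path.
    print(osdf_url("example-namespace/path/to/file.nc"))
```

Loading eagerly keeps the example simple. For large files, the notebooks may prefer lazy loading with the file kept open, which this sketch does not attempt.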

docs/ndc_workflows.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+---
+title: NDC workflows
+author: Harsha R. Hampapura
+date: 2026-1-28
+---
+
+# NDC workflows
+
+This section contains workflows that were developed as a part of the National Discovery Cloud's (NDC) [pathfinder](https://ndc-pathfinders.org/) initiative. Most of these notebooks can be run on your laptop or personal device without the need for access to an HPC system.
+

myst.yml

Lines changed: 4 additions & 0 deletions
@@ -17,10 +17,12 @@ project:
   # To autogenerate a Table of Contents, run "jupyter book init --write-toc"
   toc:
     # Auto-generated by `myst init --write-toc`
+    - file: docs/introduction.md
     - file: README.md

     - title: NCAR GDEX workflows
       children:
+        - file: docs/gdex_workflows.md
         - title: NCAR data origin
           children:
             - file: notebooks/cesm_bias.ipynb
@@ -50,13 +52,15 @@ project:

     - title: Other computational Platforms
       children:
+        - file: docs/computational_platforms.md
         - file: notebooks/cesm_osdf_stampede3.ipynb
         - file: notebooks/ndc_workflows/ncar_benchmark.ipynb
         - file: notebooks/ndc_workflows/ncar_benchmark_simple.ipynb
         - file: notebooks/ndc_workflows/ncar_benchmark_ap40.ipynb

     - title: NDC Workflows
       children:
+        - file: docs/ndc_workflows.md
         - file: notebooks/ndc_workflows/aws_benchmark.ipynb
         - file: notebooks/ndc_workflows/envistor_test_ap40.ipynb
         - file: notebooks/ndc_workflows/pycogss_spectral_change.ipynb
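
Reviewer note: this commit adds several `file:` entries to the `toc` in myst.yml, so a quick check that every listed file exists could catch a broken entry before the book is built. The sketch below is one possible way to do that with the standard library only. The function names are hypothetical, and the line-based regular expression is a simplification that assumes each entry sits on its own line in the `- file: path` form shown in the diff; it is not a full YAML parser.

```python
import re
from pathlib import Path

# Matches lines such as "    - file: docs/introduction.md".
FILE_ENTRY = re.compile(r"^\s*-\s*file:\s*(\S+)\s*$")


def toc_files(myst_text: str) -> list:
    """Return every path named by a '- file:' line, in document order."""
    paths = []
    for line in myst_text.splitlines():
        match = FILE_ENTRY.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_toc_files(myst_text: str, root: str = ".") -> list:
    """Return the listed paths that do not exist as files under root."""
    return [p for p in toc_files(myst_text) if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    text = Path("myst.yml").read_text(encoding="utf-8")
    for path in missing_toc_files(text):
        print(f"missing: {path}")
```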
