Skip to content

Converting new nc into grib format to maintain backwards compatibility with old CBP routine #140

@rburghol

Description

@rburghol

Overview/Tasks

  • Map out nc to grib translation for NLDAS2
    • Confirm mapping with Gopal
  • Determine needed inputs for Gopal's routine
  • Check if NLDAS2 has been updated to include netcdf download from 1979-2023?
    • Yes. It appears that all data has been reformatted to the netCDF in the archive
  • Test routine to harvest layers from nc and set them into geotiff
    • Get the appropriate spatial reference system. Is srid 4087 correct? Can simply setting it make it work out?
    • Add new variable to allow easier testing comparisons? insert into dh_variabledefinition (varkey, varcode, varname) values ('nldas2_precip_netcdf', 'nldas2_precip_netcdf', 'nldas2_precip_netcdf');
    • Do download of nldas2-B in case that is the difference?
  • convert geotiff to grb
  • Add script to convert full dataset to grib file into work flow
    • Make optional setting? No. By making this a mandatory step,grid simply becomes the "source" format for the work flow steps in process.
    • Put into download step (or process?)
      • This should be in the process step, but, to avoid redundant conversions since it is the most time consuming processing step, should have a switch to decide whether or not to re-do if the grb files already exist.
    • deploy in work flow (see bout time addition, testing single raster seemed very low time of processing)
  • Handle all data needing translation
    • Re-download 2020-2025 to get recent retroactive data update
    • Run 01_translate_nc

Scripts

Run Translate Workflow Step for Range of Years

scenario=nldas2 
for year in 2020 2021 2022 2023 2024 2025; do
  sday=1
  ndays=`date -d "${year}-12-31" '+%j'`
  # or, if in the middle of a year
  #ndays=`date '+%j'`
  ndays=$((ndays - 1))
  for i in `seq $sday $ndays`; do
    job_name=`cbp mm_job_name $scenario $i`
    mdate=`date -d "${year}-01-01 $i days" '+%Y-%m-%d'`
    echo "sbatch --job-name=$job_name run_model raster_met $scenario "$mdate" auto met process 01_translate_nc"
    sbatch --job-name=$job_name run_model raster_met $scenario "$mdate" auto met process 01_translate_nc
  done
done

Development

SQL, Spatial Reference and R Tools

Proposed workflows

  • Document formats:
    • use h5ls to get a list of layer names
    • Export to a tiff -- see code in meta_model for example
    • We use tiff instead of grb, since grb can be created ,but not appended, whereas tiff can be appended by gdal
      • Considered multiple tiffs with gdal_merge.py to put them together but not needed
  • Workflow Steps:
    • Create an empty based on Grib template (use old file version)
    • Set projection SRID on source file

Code: gdal_create + gdal_warp

  • From gdal docs:
gdal_create -if in_red.tif -bands 3 out_rgb.tif
gdalwarp in_red.tif out_rgb.tif -srcband 1 -dstband 1
gdalwarp in_green.tif out_rgb.tif -srcband 1 -dstband 2
gdalwarp in_blue.tif out_rgb.tif -srcband 1 -dstband 3
  • Test a single file
    • on deq3 since it has new gdalwarp which can update individual bands
    • Uses geotiff format as the grid format is not editable, only creatable.
  • First: set the file name, the remainder of code is generic
#infile="NLDAS_FORA0125_H.A19890101.0100.020.nc"
infile="NLDAS_FORA0125_H.A20250101.0100.020.nc"
out_type='grb'
  • Now do the conversion and import routine:
# out_temp is the template for output format
out_temp="NLDAS_FORA0125_H.A20230101.0100.002.grb"
maskExtent="/opt/model/model_meteorology/data/cbp_extent.csv"
out_file="test_cbp."$out_type  # final file to import
lyr_file="test_cbp.tiff" # intermediate file - must have this  to accept layer setting, then convert
raster_sql="test_cbp.sql"
rm $out_file
rm $lyr_file 
gdal_create -if $out_temp -bands 11 $lyr_file
#gdaltransform -t_srs EPSG:4087 $outtif 
# clip 
# Note: -srcband is always 1, since we've extracted the band already by specifying the band name like ":Tair"
bnd=0
for i in Tair Qair PSurf Wind_E Wind_N LWdown CRainf_frac CAPE PotEvap Rainf SWdown ;  do
#for i in Rainf;  do
   bnd=$((bnd + 1))
   gdalwarp NETCDF:"$infile":$i -t_srs EPSG:4326 $lyr_file -srcband 1 -dstband $bnd
done
gdalwarp -of grib -t_srs EPSG:4326 $lyr_file $out_file

# now test
raster2pgsql $out_file >$raster_sql
echo "drop table test_cbp;" | psql -h dbase2 drupal.dh03
cat $raster_sql |psql -h dbase2 drupal.dh03

  • TO test, log on to database and set startup vars:
\pset pager 0
\set band 10
\set yr 2025
\set mo 1
\set da 1
\set hr 1
-- :ttid = the timestamp and tid for the matching day
select tsendtime as ttime from dh_timeseries_weather_varkey  where extract(year from to_timestamp(tsendtime)) = :yr and  extract(month from to_timestamp(tsendtime)) = :mo and  extract(day from to_timestamp(tsendtime)) = :da and  extract(hour from to_timestamp(tsendtime)) = :hr limit 1 \gset

  • The temp file import for a single hour:
select (stats).* from (
  select st_summarystats(rast,:band,TRUE) as stats
  from (
    select st_clip(a.rast, b.dh_geofield_geom) as rast
    from test_cbp as a left outer join dh_feature_fielded as b
    on (
      b.hydrocode = 'cbp6_met_coverage'
    )
  ) as bar
) as foo;

  • Example Result:
 count |        sum         |        mean        |       stddev       | min |        max
-------+--------------------+--------------------+--------------------+-----+-------------------
  4744 | 2132.3541013153736 | 0.4494844227056015 | 0.5993693764500673 |   0 | 3.254499912261963
(1 row)
  • summarize the existing rasters
  • Note: imports as UTC, so, EST day is 5 hours earlier hence :ttime - 3600 * 5
select tsendtime, (stats).* from (
  select to_timestamp(tsendtime) as tsendtime, st_summarystats(rast,1,TRUE) as stats
  from (
    select  tstime, tsendtime, count(*), st_union(rast) as rast
    from  dh_timeseries_weather_varkey
    where varkey = 'nldas2_precip_hourly_tiled_16x16' 
    and tsendtime = (:ttime - 3600*5)
    group by tstime, tsendtime
  ) as foo
);
  • Result for 2025-01-01 import (imports as UTC, so, EST day is 5 hours earlier
 
       tsendtime        | count |        sum        |        mean         |       stddev       | min |        max     
------------------------+-------+-------------------+---------------------+--------------------+-----+-------------------
 2024-12-31 20:00:00-05 |  4632 | 2094.369601050159 | 0.45215233183293585 | 0.6014965362747882 |   0 | 3.254499912261963
(1 row)
Unsure of ate Range? Show a bunch
select tsendtime, (stats).* from (
  select to_timestamp(tsendtime) as tsendtime, st_summarystats(rast,1,TRUE) as stats
  from (
    select  tstime, tsendtime, count(*), st_union(rast) as rast
    from  dh_timeseries_weather_varkey
    where varkey = 'nldas2_precip_hourly_tiled_16x16' 
    and tsendtime >= (:ttime - 3600*15)
    and tsendtime <= (:ttime + 3600*15)
    group by tstime, tsendtime
  ) as foo
);
Try a single Land Segment
  • Note: To find land segment with non-zero rainfall on the date of interest, See below: "Find Landsegs with Rainfall"
  • Show the manual import:
\set landseg 'A51027'
select (stats).* from (
  select st_summarystats(rast,:band,TRUE) as stats
  from (
    select st_clip(a.rast, b.dh_geofield_geom) as rast
    from test_cbp as a left outer join dh_feature_fielded as b
    on (
      b.hydrocode = :'landseg' and b.ftype = 'cbp532_landseg'
    )
  ) as bar
) as foo;
  • show the dh_timeseries_weather raster summary for the single hour
select (stats).* from (
  select st_summarystats(rast,1,TRUE) as stats
  from (
    select st_clip(a.rast, b.dh_geofield_geom) as rast
    from (
      select  tstime, tsendtime, count(*), st_union(rast) as rast
      from  dh_timeseries_weather_varkey
      where varkey = 'nldas2_precip_hourly_tiled_16x16'
        and tsendtime = (:ttime - 3600*5)
      group by tstime, tsendtime
    ) as a left outer join dh_feature_fielded as b
    on (
      b.hydrocode = :'landseg' and b.ftype = 'cbp532_landseg'
    )
  ) as bar
) as foo;
  • Show the dh_timeseries_weather rasters for a range including the date of interest
select (stats).* from (
  select st_summarystats(rast,1,TRUE) as stats
  from (
    select st_clip(a.rast, b.dh_geofield_geom) as rast
    from (
      select  tstime, tsendtime, count(*), st_union(rast) as rast
      from  dh_timeseries_weather_varkey
      where varkey = 'nldas2_precip_hourly_tiled_16x16'
        and tsendtime >= (:ttime - 3600*15)
        and tsendtime <= (:ttime + 3600*15)
      group by tstime, tsendtime
    ) as a left outer join dh_feature_fielded as b
    on (
      b.hydrocode = :'landseg' and b.ftype = 'cbp532_landseg'
    )
  ) as bar
) as foo;

Results single test raster
 count |  sum  |  mean  | stddev | min |  max
-------+-------+--------+--------+-----+--------
    10 | 0.125 | 0.0125 |  0.025 |   0 | 0.0625
(1 row)
dh_timeseries_weather imported raster
 count |         sum         |         mean         |        stddev        | min |         max
-------+---------------------+----------------------+----------------------+-----+---------------------
    10 | 0.19900000002235174 | 0.019900000002235175 | 0.029516944293705646 |   0 | 0.07890000194311142
(1 row)



Find Landsegs with Rainfall
select (stats).* from (
  select st_summarystats(rast,1,TRUE) as stats
  from (
    select st_clip(a.rast, b.dh_geofield_geom) as rast
    from (
      select  tstime, tsendtime, count(*), st_union(rast) as rast
      from  dh_timeseries_weather_varkey
      where varkey = 'nldas2_precip_hourly_tiled_16x16'
        and tsendtime = (:ttime - 3600*5)
      group by tstime, tsendtime
    ) as a left outer join dh_feature_fielded as b
    on (
      b.hydrocode = :'landseg' and b.ftype = 'cbp532_landseg'
    )
  ) as bar
) as foo;

Miscellaneous

Use gdalinfo to get layer names.

Names from netCDF file format (2024 forward).
gdalinfo /backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc|grep _NAME
Warning 1: No UNIDATA NC_GLOBAL:Conventions attribute
  SUBDATASET_1_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":time_bnds
  SUBDATASET_2_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":Tair
  SUBDATASET_3_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":Qair
  SUBDATASET_4_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":PSurf
  SUBDATASET_5_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":Wind_E
  SUBDATASET_6_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":Wind_N
  SUBDATASET_7_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":LWdown
  SUBDATASET_8_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":CRainf_frac
  SUBDATASET_9_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":CAPE
  SUBDATASET_10_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":PotEvap
  SUBDATASET_11_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":Rainf
  SUBDATASET_12_NAME=NETCDF:"/backup/meteorology/2025/001/NLDAS_FORA0125_H.A20250101.0200.020.nc":SWdown


Names from grib file format (early 2024 and previous).
rob@deq2:/opt/model/meta_model$ gdalinfo /backup/meteorology/2023/001/NLDAS_FORA0125_H.A20230101.0200.002.grb | grep GRIB_EL
    GRIB_ELEMENT=TMP
    GRIB_ELEMENT=SPFH
    GRIB_ELEMENT=PRES
    GRIB_ELEMENT=UGRD
    GRIB_ELEMENT=VGRD
    GRIB_ELEMENT=DLWRF
    GRIB_ELEMENT=var153
    GRIB_ELEMENT=CAPE
    GRIB_ELEMENT=PEVAP
    GRIB_ELEMENT=APCP
    GRIB_ELEMENT=DSWRF

Use h5ls to get information.

h5ls /backup/meteorology/2025/001/NLDAS_FORB0125_H.A20250101.2200.020.nc
ACond                    Dataset {1, 224, 464}
CRainf                   Dataset {1, 224, 464}
PSurf                    Dataset {1, 224, 464}
PhiS                     Dataset {1, 224, 464}
Qair                     Dataset {1, 224, 464}
Rainf                    Dataset {1, 224, 464}
SWdown                   Dataset {1, 224, 464}
Tair                     Dataset {1, 224, 464}
Wind_E                   Dataset {1, 224, 464}
Wind_N                   Dataset {1, 224, 464}
bnds                     Dataset {2}
lat                      Dataset {224}
lon                      Dataset {464}
time                     Dataset {1}
time_bnds                Dataset {1, 2}
root@deq2:~# gdal_merge.py
No input files selected.
Usage: gdal_merge.py [-o out_filename] [-of out_format] [-co NAME=VALUE]*
                     [-ps pixelsize_x pixelsize_y] [-tap] [-separate] [-q] [-v] [-pct]
                     [-ul_lr ulx uly lrx lry] [-init "value [value...]"]
                     [-n nodata_value] [-a_nodata output_nodata_value]
                     [-ot datatype] [-createonly] input_files
                     [--help-general]

Variables Used by @gopal-bhatt C++ code:

Indicate a match in the new NLDAS2 raster format .nc

  • 1 (yes): TMP Temperature [C] | 2[m]
  • 2 (yes): SPFH Specific humidity [kg/kg]
  • 4 (yes): u-component of wind
  • 11 (yes): SWdown Downward shortwave radiation flux [W/m^2]
  • 5 (yes): VGRD (wind speed v-component)
  • 3 (yes): PRES SFC (Ground or water surface) Pressure [Pa]

Name mapping:

  • New format is only 1 index slot off from old format
Grib Element Grib Number netCDF Name netCDF Number Grib Comment Grib Description
TMP 1 Tair 2 Temperature [C] 2[m] HTGL (Specified height level above ground)
SPFH 2 Qair 3 Specific humidity [kg/kg] 2[m] HTGL (Specified height level above ground)
PRES 3 PSurf 4 SFC (Ground or water surface) Pressure [Pa] 0[-] SFC (Ground or water surface)
UGRD 4 Wind_E 5 u-component of wind [m/s] 10[m] HTGL (Specified height level above ground)
VGRD 5 Wind_N 6 v-component of wind [m/s] 10[m] HTGL (Specified height level above ground)
DLWRF 6 LWdown 7 Downward longwave radiation flux [W/m^2] 0[-] SFC (Ground or water surface)
var153 7 CRainf_frac 8 Undefined 0[-] SFC (Ground or water surface)
CAPE 8 CAPE 9 Convective available potential energy [J/kg] 180-0[hPa] SPDY (Level between 2 levels at specified pressure difference from ground to level)
PEVAP 9 PotEvap 10 Potential evaporation [kg/m^2] 0[-] SFC (Ground or water surface)
APCP 10 Rainf 11 Total precipitation [kg/m^2] 0[-] SFC (Ground or water surface)
DSWRF 11 SWdown* 12 Downward shortwave radiation flux [W/m^2] 0[-] SFC (Ground or water surface)

Excerpt from

//DEBUG0 cout << "$#" << getRasterValue(hourGDAL, 10, grid[j].row, grid[j].col) << "\n";

 gridValue[0] = getRasterValue(hourGDAL, 10, 224 - grid[j].row, grid[j].col - 1);
			gridValue[1] = getRasterValue(hourGDAL, 1, 224 - grid[j].row, grid[j].col - 1);
			gridValue[2] = getRasterValue(hourGDAL, 2, 224 - grid[j].row, grid[j].col - 1);
			gridValue[3] = getRasterValue(hourGDAL, 4, 224 - grid[j].row, grid[j].col - 1);
			gridValue[4] = getRasterValue(hourGDAL, 11, 224 - grid[j].row, grid[j].col - 1);
			gridValue[5] = getRasterValue(hourGDAL, 5, 224 - grid[j].row, grid[j].col - 1);
			gridValue[6] = getRasterValue(hourGDAL, 3, 224 - grid[j].row, grid[j].col - 1);

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions