Integrate Model Variable Renaming Sprint changes into GDASApp yamls and templates #1362

Open · RussTreadon-NOAA opened this issue Nov 5, 2024 · 55 comments · May be fixed by #1355

Comments

@RussTreadon-NOAA
Contributor

Several JEDI repositories have been updated with changes from the Model Variable Renaming Sprint. Updating JEDI hashes in sorc/ requires changes in GDASApp and jcb-gdas yamls and templates. This issue is opened to document these changes.

@RussTreadon-NOAA
Contributor Author

Started from g-w PR #2992 with sorc/gdas.cd populated with GDASApp PR #1346. Used ush/submodules/update_develop.sh to update the hashes for the following JEDI repos:

        modified:   sorc/fv3-jedi (new commits)
        modified:   sorc/ioda (new commits)
        modified:   sorc/iodaconv (new commits)
        modified:   sorc/oops (new commits)
        modified:   sorc/saber (new commits)
        modified:   sorc/soca (new commits)
        modified:   sorc/ufo (new commits)
        modified:   sorc/vader (new commits)

Changes to yamls (templates) thus far include

parm/io/fv3jedi_fieldmetadata_history.yaml
parm/jcb-gdas/model/atmosphere/atmosphere_background.yaml.j2
parm/jcb-gdas/observations/atmosphere/sondes.yaml.j2

Using test_gdasapp_atm_jjob_var_init and test_gdasapp_atm_jjob_var_run to iteratively work through issues.

Puzzled by current failure in the variational analysis job

0: Variable 'virtual_temperature' calculated using Vader recipe AirVirtualTemperature_A
0: OOPS_TRACE[0] leaving Vader::executePlanNL
0: Requested variables Vader could not produce: 25 variables: water_area_fraction, land_area_fraction, ice_area_fraction, surface\
_snow_area_fraction, skin_temperature_at_surface_where_sea, skin_temperature_at_surface_where_land, skin_temperature_at_surface_where_ice, skin_temperature_at_surface_where_snow, vegetation_area_fraction, leaf_area_index, volume_fraction_of_condensed_water_in_soil, soil_temperature, surface_snow_thickness, vegetation_type_index, soil_type, water_vapor_mixing_ratio_wrt_dry_air, mole_fraction_of_ozone_in_air, mass_content_of_cloud_liquid_water_in_atmosphere_layer, effective_radius_of_cloud_liquid_water_particle, mass_content_of_cloud_ice_in_atmosphere_layer, effective_radius_of_cloud_ice_particle, wind_speed_at_surface, wind_from_direction_at_surface, average_surface_temperature_within_field_of_view, geopotential_height
0: OOPS_TRACE[0] leaving Vader::changeVar
0: OOPS_TRACE[0] State::State (from geom, vars and time) starting
0: OOPS_TRACE[0] State::State (from geom, vars and time) done
0: OOPS_TRACE[0] fv3jedi::VarChaModel2GeoVaLs changeVar start
5: Field_fail: Field water_area_fraction cannot be obtained from input fields.
5: Abort(1) on node 5 (rank 5 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5
4: Field_fail: Field water_area_fraction cannot be obtained from input fields.
4: Abort(1) on node 4 (rank 4 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4

A check of atmanlvar.yaml in the run directory finds no references to water_area_fraction. None of the files in the fv3jedi/ directory mention water_area_fraction, and the background fields do not contain it. It is not clear where, how, or why Vader is trying to produce water_area_fraction.

@RussTreadon-NOAA
Contributor Author

test_gdasapp_atm_jjob_var_run assimilates amsua_n19 and sondes. amsua_n19 was removed from the list of assimilated observations. The init job then failed because g-w ush/python/pygfs/task/atm_analysis.py assumes bias correction files will always be staged. atm_analysis.py was modified as follows:

@@ -114,12 +114,15 @@ class AtmAnalysis(Task):
         # stage bias corrections
         logger.info(f"Staging list of bias correction files")
         bias_dict = self.jedi_dict['atmanlvar'].render_jcb(self.task_config, 'atm_bias_staging')
-        bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
-        FileHandler(bias_dict).sync()
-        logger.debug(f"Bias correction files:\n{pformat(bias_dict)}")
+        if bias_dict['copy'] is None:
+            logger.info(f"No bias correction files to stage")
+        else:
+            bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
+            FileHandler(bias_dict).sync()
+            logger.debug(f"Bias correction files:\n{pformat(bias_dict)}")

-        # extract bias corrections
-        Jedi.extract_tar_from_filehandler_dict(bias_dict)
+            # extract bias corrections
+            Jedi.extract_tar_from_filehandler_dict(bias_dict)

         # stage CRTM fix files
         logger.info(f"Staging CRTM fix files from {self.task_config.CRTM_FIX_YAML}")

With this local change in place the init job ran to completion. The var job successfully ran 3dvar assimilating only sondes. The job failed the reference check since the reference state assimilates amsua_n19 and sondes.

Note that test_gdasapp_atm_jjob_var_run runs the variational analysis using the identity matrix for the background error. This test should be rerun using GSIBEC and/or an ensemble.

Has the default behavior for radiance data assimilation changed? Do we now require that numerous surface fields be available? This makes sense if one wants to accurately compute surface emissivity, and surface conditions can also be used for data filtering and QC. This is a change from previous JEDI hashes; test_gdasapp_atm_jjob_var_run previously passed.

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA Is the failure in Jedi.remove_redundant()? Just so I know how I can fix #2992.

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA Yes, the traceback mentions remove_redundant

^[[38;21m2024-11-05 20:02:14,130 - INFO     - jedi        :   END: pygfs.jedi.jedi.render_jcb^[[0m
^[[38;5;39m2024-11-05 20:02:14,132 - DEBUG    - jedi        :  returning: {'mkdir': ['/work/noaa/da/rtreadon/git/global-workflow/pr2992/sor\
c/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18/bc/'], 'copy': None}^[[0m
^[[38;21m2024-11-05 20:02:14,132 - INFO     - jedi        : BEGIN: pygfs.jedi.jedi.remove_redundant^[[0m
^[[38;5;39m2024-11-05 20:02:14,132 - DEBUG    - jedi        : ( None )^[[0m
Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exglobal_atm_analysis_initialize.py", line 26, in <module>
    AtmAnl.initialize()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/atm_analysis.py", line 117, in initialize
    bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/jedi/jedi.py", line 242, in remove_redundant
    for item in input_list:
TypeError: 'NoneType' object is not iterable
+ slurm_script[1]: postamble slurm_script 1730836856 1

If you can fix this in g-w PR #2992, great!

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA : Updated working copy of feature/jcb-obsbias to e59e883. Reran test_gdasapp_atm_jjob_var_init without amsua_n19 in the list of assimilated observations. The ctest failed as before in remove_redundant:

^[[38;21m2024-11-06 11:22:21,613 - INFO     - jedi        :   END: pygfs.jedi.jedi.render_jcb^[[0m
^[[38;5;39m2024-11-06 11:22:21,613 - DEBUG    - jedi        :  returning: {'mkdir': ['/work/noaa/da/rtreadon/git/global-workflow/pr29\
92/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18/bc/'], 'copy': None}^[[0m
^[[38;21m2024-11-06 11:22:21,613 - INFO     - jedi        : BEGIN: pygfs.jedi.jedi.remove_redundant^[[0m
^[[38;5;39m2024-11-06 11:22:21,613 - DEBUG    - jedi        : ( None )^[[0m
Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exglobal_atm_analysis_initialize.py", line 26, in <module>
    AtmAnl.initialize()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/atm_analysis.py", line 118, in initialize
    bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/jedi/jedi.py", line 253, in remove_redundant
    for item in input_list:
TypeError: 'NoneType' object is not iterable

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA That newest commit didn't have a fix yet for this ob issue. I will work on it this morning.

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA Actually, I just committed the changes you suggested. There's really no reason to mess with remove_redundant for this problem.

@DavidNew-NOAA
Collaborator

DavidNew-NOAA commented Nov 6, 2024

Forgot a line. Make sure it's commit 7ac6ccb2bbf88b25fb533185c5d481cd328415ee (latest).

@RussTreadon-NOAA
Contributor Author

Thank you @DavidNew-NOAA . test_gdasapp_atm_jjob_var_init passes without amsua_n19!

@RussTreadon-NOAA
Contributor Author

@danholdaway , @ADCollard , and @emilyhcliu : When I update GDASApp JEDI hashes in develop, ctest test_gdasapp_atm_jjob_var_run fails when processing amsua_n19 with the error

0: Requested variables Vader could not produce: 25 variables: water_area_fraction, land_area_fraction, ice_area_fraction, surface_snow_area_fraction, skin_temperature_at_surface_where_sea, skin_temperature_at_surface_where_land, skin_temperature_at_surface_where_ice, skin_temperature_at_surface_where_snow, vegetation_area_fraction, leaf_area_index, volume_fraction_of_condensed_water_in_soil, soil_temperature, surface_snow_thickness, vegetation_type_index, soil_type, water_vapor_mixing_ratio_wrt_dry_air, mole_fraction_of_ozone_in_air, mass_content_of_cloud_liquid_water_in_at\
mosphere_layer, effective_radius_of_cloud_liquid_water_particle, mass_content_of_cloud_ice_in_atmosphere_layer, effective_radius_of_cloud_ice_particle, wind_speed_at_surface, wind_from_direction_at_surface, average_surface_temperature_within_field_of_view, geopotential_height
0: OOPS_TRACE[0] leaving Vader::changeVar
0: OOPS_TRACE[0] State::State (from geom, vars and time) starting
0: OOPS_TRACE[0] State::State (from geom, vars and time) done
0: OOPS_TRACE[0] fv3jedi::VarChaModel2GeoVaLs changeVar start
0: Field_fail: Field water_area_fraction cannot be obtained from input fields.
4: Field_fail: Field water_area_fraction cannot be obtained from input fields.
2: Field_fail: Field water_area_fraction cannot be obtained from input fields.
3: Field_fail: Field water_area_fraction cannot be obtained from input fields.
5: Field_fail: Field water_area_fraction cannot be obtained from input fields.
1: Field_fail: Field water_area_fraction cannot be obtained from input fields.
1: Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
2: Abort(1) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

Updating the JEDI hashes brings in changes from the Model Variable Renaming Sprint. What changed in fv3-jedi, ufo, or vader that now requires the variables listed on the Vader line above? The input yaml does not mention these fields.

test_gdasapp_atm_jjob_var_run only assimilates amsua_n19 and sondes. The test passes if I remove amsua_n19.

@danholdaway
Contributor

This is failing because this if statement is not true when it should be. Likely because a variable is not being recognized as being present. Can you point me to your GDASapp and jcb-gdas code?

@RussTreadon-NOAA
Contributor Author

This is failing because this if statement is not true when it should be. Likely because a variable is not being recognized as being present. Can you point me to your GDASapp and jcb-gdas code?

@danholdaway : Here are the key directories and the job log file (all on Hercules):

  • GDASApp: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd
  • jcb-gdas: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/parm/jcb-gdas
  • atmanlvar run directory: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18
  • failed job log file: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/atmanlvar-3103929.out

@RussTreadon-NOAA
Contributor Author

Prints added to src/fv3jedi/VariableChange/Model2GeoVaLs/fv3jedi_vc_model2geovals_mod.f90 show that have_ts is .false.. The input cube sphere history surface file contains

        double land(time, tile, grid_yt, grid_xt) ;
        double weasd(time, tile, grid_yt, grid_xt) ;
        double tmpsfc(time, tile, grid_yt, grid_xt) ;
        double vtype(time, tile, grid_yt, grid_xt) ;
        double sotyp(time, tile, grid_yt, grid_xt) ;
        double veg(time, tile, grid_yt, grid_xt) ;
        double soilt1(time, tile, grid_yt, grid_xt) ;
        double soilt2(time, tile, grid_yt, grid_xt) ;
        double soilt3(time, tile, grid_yt, grid_xt) ;
        double soilt4(time, tile, grid_yt, grid_xt) ;
        double soilw1(time, tile, grid_yt, grid_xt) ;
        double soilw2(time, tile, grid_yt, grid_xt) ;
        double soilw3(time, tile, grid_yt, grid_xt) ;
        double soilw4(time, tile, grid_yt, grid_xt) ;
        double snod(time, tile, grid_yt, grid_xt) ;
        double ugrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double vgrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double f10m(time, tile, grid_yt, grid_xt) ;

There is no ts or tsea field. The cube history surface file only contains tmpsfc. A check of the sfc_data tile files shows that these files contain several temperature fields:

        double tsea(Time, yaxis_1, xaxis_1) ;
        double tisfc(Time, yaxis_1, xaxis_1) ;
        double tsfc(Time, yaxis_1, xaxis_1) ;
        double tsfcl(Time, yaxis_1, xaxis_1) ;
        double tiice(Time, zaxis_1, yaxis_1, xaxis_1) ;

Does the cube history file contain all the information we need to define surface characteristics for radiance assimilation?
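
For reference, which of these candidate temperature fields a given file provides can be checked quickly with the netCDF4 Python package (a minimal sketch; the file names are illustrative):

# Minimal sketch: list which candidate skin/surface temperature variables
# each file provides.
from netCDF4 import Dataset

candidates = ["ts", "tsea", "tmpsfc", "tisfc", "tsfc", "tsfcl"]

for path in ["cubed_sphere_grid_sfcf006.nc", "sfc_data.tile1.nc"]:
    with Dataset(path) as nc:
        present = [name for name in candidates if name in nc.variables]
        print(f"{path}: {present}")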

@danholdaway
Contributor

In jcb-gdas you changed surface_geopotential_height to hgtsfc and tsea to tmpsfc. Perhaps try changing them as follows:

surface_geopotential_height -> geopotential_height_times_gravity_at_surface
tsea -> sst?

Switching from the old short name to the IO name may have resulted in crossed wires.

@danholdaway
Contributor

I think the sst change is because of https://github.com/JCSDA-internal/fv3-jedi/pull/1258 rather than variable naming conventions.

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Nov 8, 2024

Thank you @danholdaway for pointing me at fv3-jedi PR #1258. I see there was confusion over the name used for the skin temperature. This confusion remains when I ncdump -hcs the gfs history file cubed_sphere_grid_sfcf006.nc and the restart files sfc_data.tile*.nc.

Our cube sphere surface history files contain the following fields

        double land(time, tile, grid_yt, grid_xt) ;
        double weasd(time, tile, grid_yt, grid_xt) ;
        double tmpsfc(time, tile, grid_yt, grid_xt) ;
        double vtype(time, tile, grid_yt, grid_xt) ;
        double sotyp(time, tile, grid_yt, grid_xt) ;
        double veg(time, tile, grid_yt, grid_xt) ;
        double soilt1(time, tile, grid_yt, grid_xt) ;
        double soilt2(time, tile, grid_yt, grid_xt) ;
        double soilt3(time, tile, grid_yt, grid_xt) ;
        double soilt4(time, tile, grid_yt, grid_xt) ;
        double soilw1(time, tile, grid_yt, grid_xt) ;
        double soilw2(time, tile, grid_yt, grid_xt) ;
        double soilw3(time, tile, grid_yt, grid_xt) ;
        double soilw4(time, tile, grid_yt, grid_xt) ;
        double snod(time, tile, grid_yt, grid_xt) ;
        double ugrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double vgrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double f10m(time, tile, grid_yt, grid_xt) ;

There is neither sst nor tsea. The cube sphere surface history file contains tmpsfc.

Our tiled surface restart files contain the following fields starting with t

        double tsea(Time, yaxis_1, xaxis_1) ;
        double tg3(Time, yaxis_1, xaxis_1) ;
        double t2m(Time, yaxis_1, xaxis_1) ;
        double tisfc(Time, yaxis_1, xaxis_1) ;
        double tprcp(Time, yaxis_1, xaxis_1) ;
        double tsfc(Time, yaxis_1, xaxis_1) ;
        double tsfcl(Time, yaxis_1, xaxis_1) ;
        double tref(Time, yaxis_1, xaxis_1) ;
        double tvxy(Time, yaxis_1, xaxis_1) ;
        double tgxy(Time, yaxis_1, xaxis_1) ;
        double tahxy(Time, yaxis_1, xaxis_1) ;
        double taussxy(Time, yaxis_1, xaxis_1) ;
        double tiice(Time, zaxis_1, yaxis_1, xaxis_1) ;
        double tsnoxy(Time, zaxis_3, yaxis_1, xaxis_1) ;

The restart surface tiles contain tsea. There is no tmpsfc.

Our atmospheric variational and local ensemble yamls now use filetype: cube sphere history for the backgrounds. The updated fv3-jedi code does not recognize tmpsfc. Is there a way via tables, parm files, or yamls to get the code to process tmpsfc?

The restart tiles have what appear to be fields for temperature over various surface types

  • tsea - temperature over sea surface?
  • tisfc - temperature over ice surface?
  • tsfc - temperature over all surfaces?
  • tsfcl - temperature over land surface?

Which temperature or combination of temperature should we pass to CRTM?

I sidestepped this question and ran a simple test: I renamed tmpsfc to ts in the cube sphere surface history file. With this change, ctest test_gdasapp_atm_jjob_var_run passed. This is good, but is tmpsfc, or its renamed variant ts, the correct temperature to pass to CRTM?
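
For the record, the rename test can be reproduced with a few lines of Python (a minimal sketch assuming the netCDF4 package and operating on a copy of the canned test file; NCO's ncrename would do the same job):

# Minimal sketch: rename tmpsfc to ts in a copy of the cube sphere surface
# history file so the skin temperature appears under the expected name.
from netCDF4 import Dataset

with Dataset("cubed_sphere_grid_sfcf006.nc", "r+") as nc:
    if "tmpsfc" in nc.variables and "ts" not in nc.variables:
        nc.renameVariable("tmpsfc", "ts")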

I can replace the variable name tmpsfc with ts in our canned ctest cube sphere surface history files, but is this the right approach?

Tagging @emilyhcliu , @ADCollard , @CoryMartin-NOAA , and @DavidNew-NOAA . Two questions

  1. What's the short term patch to keep this issue moving forward?
  2. What temperature should we be passing to the CRTM?

The response to question 1 can be captured in this issue. Resolution of question 2 likely needs a new issue.

@danholdaway
Contributor

@RussTreadon-NOAA the issue might be in the mapping between tmpsfc and the long name in the FieldMetadata file. Do you know where that is coming from? It might be a fix file I guess.

@RussTreadon-NOAA
Contributor Author

@danholdaway , you are right.

I spent the morning wading through code, yamls, parm files, and fix files. I found the spot to make the correct linkage between the fv3-jedi source code and our gfs cube sphere history files. With the change in place, the variational and local ensemble DA jobs passed. The increment jobs failed; I still need to update the yamls for these jobs.

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build$ ctest -R test_gdasapp_atm_jjob
Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 2016: test_gdasapp_atm_jjob_var_init
 1/11 Test #2016: test_gdasapp_atm_jjob_var_init .........   Passed   45.77 sec
      Start 2017: test_gdasapp_atm_jjob_var_run
 2/11 Test #2017: test_gdasapp_atm_jjob_var_run ..........   Passed  106.22 sec
      Start 2018: test_gdasapp_atm_jjob_var_inc
 3/11 Test #2018: test_gdasapp_atm_jjob_var_inc ..........***Failed   42.28 sec
      Start 2019: test_gdasapp_atm_jjob_var_final
 4/11 Test #2019: test_gdasapp_atm_jjob_var_final ........***Failed   42.23 sec
      Start 2020: test_gdasapp_atm_jjob_ens_init
 5/11 Test #2020: test_gdasapp_atm_jjob_ens_init .........   Passed   45.67 sec
      Start 2021: test_gdasapp_atm_jjob_ens_letkf
 6/11 Test #2021: test_gdasapp_atm_jjob_ens_letkf ........   Passed  554.37 sec
      Start 2022: test_gdasapp_atm_jjob_ens_init_split
 7/11 Test #2022: test_gdasapp_atm_jjob_ens_init_split ...   Passed   45.85 sec
      Start 2023: test_gdasapp_atm_jjob_ens_obs
 8/11 Test #2023: test_gdasapp_atm_jjob_ens_obs ..........   Passed   42.27 sec
      Start 2024: test_gdasapp_atm_jjob_ens_sol
 9/11 Test #2024: test_gdasapp_atm_jjob_ens_sol ..........   Passed   42.28 sec
      Start 2025: test_gdasapp_atm_jjob_ens_inc
10/11 Test #2025: test_gdasapp_atm_jjob_ens_inc ..........***Failed   42.26 sec
      Start 2026: test_gdasapp_atm_jjob_ens_final
11/11 Test #2026: test_gdasapp_atm_jjob_ens_final ........***Failed   74.29 sec

64% tests passed, 4 tests failed out of 11

Total Test time (real) = 1083.98 sec

The file I modified is $HOMEgfs/fix/gdas/fv3jedi/fieldmetadata/gfs-history.yaml. I replaced

- long name: skin_temperature_at_surface
  io name: tsea

with

- long name: skin_temperature_at_surface
  io name: tmpsfc
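
If this swap needs to be reapplied while testing different hashes, it can be scripted (a minimal sketch assuming PyYAML and the entry layout shown above; note that round-tripping through PyYAML drops any comments in the fix file, so hand-editing may still be preferable):

# Minimal sketch: point the skin_temperature_at_surface entry at the tmpsfc
# io name, wherever that entry sits in the fieldmetadata YAML tree.
import yaml

path = "fix/gdas/fv3jedi/fieldmetadata/gfs-history.yaml"  # relative to $HOMEgfs

with open(path) as f:
    doc = yaml.safe_load(f)

def patch(node):
    if isinstance(node, dict):
        if node.get("long name") == "skin_temperature_at_surface":
            node["io name"] = "tmpsfc"
        for value in node.values():
            patch(value)
    elif isinstance(node, list):
        for item in node:
            patch(item)

patch(doc)

with open(path, "w") as f:
    yaml.safe_dump(doc, f, sort_keys=False)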

@danholdaway
Contributor

Thanks @RussTreadon-NOAA, really nice work digging through. If that fix file came directly from fv3-jedi (and was used in the fv3-jedi tests) there wouldn't have been any work to do, so perhaps we should look into doing that.

@RussTreadon-NOAA
Contributor Author

Agreed! We've been bitten by this disconnect more than once.

@RussTreadon-NOAA
Contributor Author

Hercules test
Installed g-w PR #2992 on Hercules. Used sorc/gdas.cd/ush/submodules/update_develop.sh to update JEDI hashes. Iteratively worked through issues to get the atm var and ensda ctests to pass. Ran all test_gdasapp ctests with the following results:

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 1582: test_gdasapp_util_coding_norms
 1/55 Test #1582: test_gdasapp_util_coding_norms ......................................   Passed    4.28 sec
      Start 1583: test_gdasapp_util_ioda_example
 2/55 Test #1583: test_gdasapp_util_ioda_example ......................................   Passed   12.28 sec
      Start 1584: test_gdasapp_util_prepdata
 3/55 Test #1584: test_gdasapp_util_prepdata ..........................................   Passed    5.11 sec
      Start 1585: test_gdasapp_util_rads2ioda
 4/55 Test #1585: test_gdasapp_util_rads2ioda .........................................   Passed    0.94 sec
      Start 1586: test_gdasapp_util_ghrsst2ioda
 5/55 Test #1586: test_gdasapp_util_ghrsst2ioda .......................................   Passed    0.11 sec
      Start 1587: test_gdasapp_util_rtofstmp
 6/55 Test #1587: test_gdasapp_util_rtofstmp ..........................................   Passed    1.41 sec
      Start 1588: test_gdasapp_util_rtofssal
 7/55 Test #1588: test_gdasapp_util_rtofssal ..........................................   Passed    0.45 sec
      Start 1589: test_gdasapp_util_smap2ioda
 8/55 Test #1589: test_gdasapp_util_smap2ioda .........................................   Passed    0.09 sec
      Start 1590: test_gdasapp_util_smos2ioda
 9/55 Test #1590: test_gdasapp_util_smos2ioda .........................................   Passed    0.13 sec
      Start 1591: test_gdasapp_util_viirsaod2ioda
10/55 Test #1591: test_gdasapp_util_viirsaod2ioda .....................................   Passed    0.09 sec
      Start 1592: test_gdasapp_util_icecabi2ioda
11/55 Test #1592: test_gdasapp_util_icecabi2ioda ......................................   Passed    0.12 sec
      Start 1593: test_gdasapp_util_icecamsr2ioda
12/55 Test #1593: test_gdasapp_util_icecamsr2ioda .....................................   Passed    0.11 sec
      Start 1594: test_gdasapp_util_icecmirs2ioda
13/55 Test #1594: test_gdasapp_util_icecmirs2ioda .....................................   Passed    0.09 sec
      Start 1595: test_gdasapp_util_icecjpssrr2ioda
14/55 Test #1595: test_gdasapp_util_icecjpssrr2ioda ...................................   Passed    0.09 sec
      Start 1951: test_gdasapp_check_python_norms
15/55 Test #1951: test_gdasapp_check_python_norms .....................................   Passed    3.95 sec
      Start 1952: test_gdasapp_check_yaml_keys
16/55 Test #1952: test_gdasapp_check_yaml_keys ........................................   Passed    1.28 sec
      Start 1953: test_gdasapp_jedi_increment_to_fv3
17/55 Test #1953: test_gdasapp_jedi_increment_to_fv3 ..................................   Passed    9.07 sec
      Start 1954: test_gdasapp_fv3jedi_fv3inc
18/55 Test #1954: test_gdasapp_fv3jedi_fv3inc .........................................   Passed   24.38 sec
      Start 1955: test_gdasapp_snow_create_ens
19/55 Test #1955: test_gdasapp_snow_create_ens ........................................   Passed    0.84 sec
      Start 1956: test_gdasapp_snow_imsproc
20/55 Test #1956: test_gdasapp_snow_imsproc ...........................................   Passed    3.40 sec
      Start 1957: test_gdasapp_snow_apply_jediincr
21/55 Test #1957: test_gdasapp_snow_apply_jediincr ....................................   Passed    2.35 sec
      Start 1958: test_gdasapp_snow_letkfoi_snowda
22/55 Test #1958: test_gdasapp_snow_letkfoi_snowda ....................................   Passed    7.13 sec
      Start 1959: test_gdasapp_convert_bufr_adpsfc_snow
23/55 Test #1959: test_gdasapp_convert_bufr_adpsfc_snow ...............................   Passed    3.40 sec
      Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
24/55 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   16.80 sec
      Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
25/55 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed  1385.57 sec
      Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_202103241200
26/55 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_202103241200 .............   Passed  643.56 sec
      Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
27/55 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  269.23 sec
      Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
28/55 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......   Passed  269.72 sec
      Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
29/55 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....   Passed  395.73 sec
      Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
30/55 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....***Failed  596.78 sec
      Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
31/55 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...***Failed  294.69 sec
      Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
32/55 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...***Failed  292.32 sec
      Start 1969: test_gdasapp_convert_bufr_adpsfc
33/55 Test #1969: test_gdasapp_convert_bufr_adpsfc ....................................   Passed   11.23 sec
      Start 1970: test_gdasapp_convert_gsi_satbias
34/55 Test #1970: test_gdasapp_convert_gsi_satbias ....................................   Passed    4.34 sec
      Start 1971: test_gdasapp_setup_atm_cycled_exp
35/55 Test #1971: test_gdasapp_setup_atm_cycled_exp ...................................   Passed    2.28 sec
      Start 1972: test_gdasapp_atm_jjob_var_init
36/55 Test #1972: test_gdasapp_atm_jjob_var_init ......................................   Passed   45.04 sec
      Start 1973: test_gdasapp_atm_jjob_var_run
37/55 Test #1973: test_gdasapp_atm_jjob_var_run .......................................   Passed  106.17 sec
      Start 1974: test_gdasapp_atm_jjob_var_inc
38/55 Test #1974: test_gdasapp_atm_jjob_var_inc .......................................   Passed   42.19 sec
      Start 1975: test_gdasapp_atm_jjob_var_final
39/55 Test #1975: test_gdasapp_atm_jjob_var_final .....................................   Passed   42.17 sec
      Start 1976: test_gdasapp_atm_jjob_ens_init
40/55 Test #1976: test_gdasapp_atm_jjob_ens_init ......................................   Passed   44.80 sec
      Start 1977: test_gdasapp_atm_jjob_ens_letkf
41/55 Test #1977: test_gdasapp_atm_jjob_ens_letkf .....................................   Passed  554.25 sec
      Start 1978: test_gdasapp_atm_jjob_ens_init_split
42/55 Test #1978: test_gdasapp_atm_jjob_ens_init_split ................................   Passed  140.93 sec
      Start 1979: test_gdasapp_atm_jjob_ens_obs
43/55 Test #1979: test_gdasapp_atm_jjob_ens_obs .......................................   Passed   74.18 sec
      Start 1980: test_gdasapp_atm_jjob_ens_sol
44/55 Test #1980: test_gdasapp_atm_jjob_ens_sol .......................................   Passed   42.19 sec
      Start 1981: test_gdasapp_atm_jjob_ens_inc
45/55 Test #1981: test_gdasapp_atm_jjob_ens_inc .......................................   Passed   42.17 sec
      Start 1982: test_gdasapp_atm_jjob_ens_final
46/55 Test #1982: test_gdasapp_atm_jjob_ens_final .....................................   Passed   42.19 sec
      Start 1983: test_gdasapp_aero_gen_3dvar_yaml
47/55 Test #1983: test_gdasapp_aero_gen_3dvar_yaml ....................................   Passed    0.46 sec
      Start 1984: test_gdasapp_bufr2ioda_insitu_profile_argo
48/55 Test #1984: test_gdasapp_bufr2ioda_insitu_profile_argo ..........................***Failed    5.33 sec
      Start 1985: test_gdasapp_bufr2ioda_insitu_profile_bathy
49/55 Test #1985: test_gdasapp_bufr2ioda_insitu_profile_bathy .........................***Failed    0.22 sec
      Start 1986: test_gdasapp_bufr2ioda_insitu_profile_glider
50/55 Test #1986: test_gdasapp_bufr2ioda_insitu_profile_glider ........................***Failed    0.22 sec
      Start 1987: test_gdasapp_bufr2ioda_insitu_profile_tesac
51/55 Test #1987: test_gdasapp_bufr2ioda_insitu_profile_tesac .........................***Failed    0.21 sec
      Start 1988: test_gdasapp_bufr2ioda_insitu_profile_tropical
52/55 Test #1988: test_gdasapp_bufr2ioda_insitu_profile_tropical ......................***Failed    0.22 sec
      Start 1989: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
53/55 Test #1989: test_gdasapp_bufr2ioda_insitu_profile_xbtctd ........................***Failed    0.22 sec
      Start 1990: test_gdasapp_bufr2ioda_insitu_surface_drifter
54/55 Test #1990: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................***Failed    0.22 sec
      Start 1991: test_gdasapp_bufr2ioda_insitu_surface_trkob
55/55 Test #1991: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................***Failed    0.22 sec

80% tests passed, 11 tests failed out of 55

Label Time Summary:
gdas-utils    =  25.30 sec*proc (14 tests)
manual        = 4164.41 sec*proc (9 tests)
script        =  25.30 sec*proc (14 tests)

Total Test time (real) = 5451.27 sec

The following tests FAILED:
        1966 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 (Failed)
        1967 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 (Failed)
        1968 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 (Failed)
        1984 - test_gdasapp_bufr2ioda_insitu_profile_argo (Failed)
        1985 - test_gdasapp_bufr2ioda_insitu_profile_bathy (Failed)
        1986 - test_gdasapp_bufr2ioda_insitu_profile_glider (Failed)
        1987 - test_gdasapp_bufr2ioda_insitu_profile_tesac (Failed)
        1988 - test_gdasapp_bufr2ioda_insitu_profile_tropical (Failed)
        1989 - test_gdasapp_bufr2ioda_insitu_profile_xbtctd (Failed)
        1990 - test_gdasapp_bufr2ioda_insitu_surface_drifter (Failed)
        1991 - test_gdasapp_bufr2ioda_insitu_surface_trkob (Failed)
Errors while running CTest
Output from these tests are in: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/Testing/Temporary/LastTest.log

The test_gdasapp_bufr2ioda_insitu failures are a known problem. Each of these jobs fails with the same message, ModuleNotFoundError: No module named 'pyiodaconv'. For example, here is the traceback from test_gdasapp_bufr2ioda_insitu_profile_argo:

1984: Traceback (most recent call last):
1984:   File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/gdas/ush/ioda/bufr2ioda/marine/b2i/bufr2ioda_insitu_profile_argo.py", line 6, in <module>
1984:     from b2iconverter.bufr2ioda_converter import Bufr2ioda_Converter
1984:   File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/ush/ioda/bufr2ioda/marine/b2i/b2iconverter/bufr2ioda_converter.py", line 7, in <module>
1984:     from pyiodaconv import bufr
1984: ModuleNotFoundError: No module named 'pyiodaconv'
1/1 Test #1984: test_gdasapp_bufr2ioda_insitu_profile_argo ...***Failed   16.83 sec

@apchoiCMD , do you have a branch with changes that allow these tests to pass?

The test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 failure appears to be related to the Model Variable Renaming Sprint. The job fails with the message

  0: insitu_profile_argo processed vars: 2 Variables: waterTemperature, salinity
 0: insitu_profile_argo assimilated vars: 2 Variables: waterTemperature, salinity
 1: Unable to find field metadata for: cicen
 9: Unable to find field metadata for: cicen
13: Unable to find field metadata for: cicen
15: Unable to find field metadata for: cicen
 1: Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
 7: Unable to find field metadata for: cicen
 9: Abort(1) on node 9 (rank 9 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 9
13: Abort(1) on node 13 (rank 13 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 13
14: Unable to find field metadata for: cicen
15: Abort(1) on node 15 (rank 15 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 15

@guillaumevernieres , do you know where / what needs to be changed in yamls or fix files to get the marinevar test to pass? The log file for the failed job is /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/logs/2021032418/gdas_marineanlvar.log

@RussTreadon-NOAA
Contributor Author

g-w CI for DA

Successfully ran C96C48_ufs_hybatmDA g-w CI on Hercules.

C96C48_hybatmaerosnowDA and C48mx500_3DVarAOWCDA fail.

The C48mx500_3DVarAOWCDA failure is expected given ctest failures.

The C96C48_hybatmaerosnowDA failure is in the 20211220 18Z enkfgdas_esnowrecen.log. Executable fregrid.x aborts with the following message

NOTE: done calculating index and weight for conservative interpolation
Successfully running fregrid and the following output file are generated.
****./bkg/det_ensres//20211220.150000.sfc_data.tile1.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile2.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile3.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile4.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile5.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile6.nc
^[[38;21m2024-11-11 11:56:28,496 - INFO     - snowens_analysis:   END: pygfs.task.snowens_analysis.regridDetBkg^[[0m
^[[38;5;39m2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis:  returning: None^[[0m
^[[38;21m2024-11-11 11:56:28,496 - INFO     - snowens_analysis: BEGIN: pygfs.task.snowens_analysis.regridDetInc^[[0m
^[[38;5;39m2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis: ( <pygfs.task.snowens_analysis.SnowEnsAnalysis object at 0x1460b0588f10> )^[[0m
^[[38;5;39m2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis: Executing /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x^[[0m
^[[38;21m2024-11-11 11:56:28,678 - INFO     - root        : BEGIN: wxflow.exceptions.__init__^[[0m
^[[38;5;39m2024-11-11 11:56:28,678 - DEBUG    - root        : ( WorkflowException('An error occured during execution of /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x'), 'An error occured during execution of /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x' )^[[0m

...

Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/snowens_analysis.py", line 230, in regridDetInc
    exec_cmd(*arg_list)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/executable.py", line 230, in __call__
    raise ProcessError(f"Command exited with status {proc.returncode}:", long_msg)
wxflow.executable.ProcessError: Command exited with status -11:
'/work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x' '--input_mosaic' './orog/det/C96_mosaic.nc' '--input_dir' './inc/det/' '--input_file' 'snowinc.20211220.150000.sfc_data' '--scalar_field' 'snodl' '--output_dir' './inc/det_ensres/' '--output_file' 'snowinc.20211220.150000.sfc_data' '--output_mosaic' './orog/ens/C48_mosaic.nc' '--interp_method' 'conserve_order1' '--weight_file' './orog/det/C96.mx500_interp_weight' '--weight_field' 'lsm_frac' '--remap_file' './remap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exgdas_enkf_snow_recenter.py", line 27, in <module>
    anl.regridDetInc()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/snowens_analysis.py", line 234, in regridDetInc
    raise WorkflowException(f"An error occured during execution of {exec_cmd}")
wxflow.exceptions.WorkflowException
+ JGDAS_ENKF_SNOW_RECENTER[1]: postamble JGDAS_ENKF_SNOW_RECENTER 1731326133 1

It is not clear from the traceback what the actual error is; exit status -11 suggests fregrid.x was terminated by signal 11 (a segmentation fault). Since this installation of GDASApp includes JEDI hashes with changes from the Model Variable Renaming Sprint, one or more yaml or fix file keywords most likely need to be updated.

@jiaruidong2017, @ClaraDraper-NOAA : Any ideas what we need to change in JEDI snow DA when moving to JEDI hashes which include changes from the Model Variable Renaming Sprint?

The log file for the failed job is /work/noaa/stmp/rtreadon/COMROOT/praero_pr2992/logs/2021122018/enkfgdas_esnowrecen.log on Hercules.

@RussTreadon-NOAA
Contributor Author

test_gdasapp update

Install g-w PR #2992 on Hercules. Specifically, g-w branch DavidNew-NOAA:feature/jcb-obsbias at a6fd65ad was installed. sorc/gdas.cd was replaced with GDASApp branch feature/resume_nightly at 4561ead.

test_gdasapp was run with the following results

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 1582: test_gdasapp_util_coding_norms
 1/64 Test #1582: test_gdasapp_util_coding_norms ......................................   Passed    8.98 sec
      Start 1583: test_gdasapp_util_ioda_example
 2/64 Test #1583: test_gdasapp_util_ioda_example ......................................   Passed   12.24 sec
      Start 1584: test_gdasapp_util_prepdata
 3/64 Test #1584: test_gdasapp_util_prepdata ..........................................   Passed    3.69 sec
      Start 1585: test_gdasapp_util_rads2ioda
 4/64 Test #1585: test_gdasapp_util_rads2ioda .........................................   Passed    0.62 sec
      Start 1586: test_gdasapp_util_ghrsst2ioda
 5/64 Test #1586: test_gdasapp_util_ghrsst2ioda .......................................   Passed    0.12 sec
      Start 1587: test_gdasapp_util_rtofstmp
 6/64 Test #1587: test_gdasapp_util_rtofstmp ..........................................   Passed    1.97 sec
      Start 1588: test_gdasapp_util_rtofssal
 7/64 Test #1588: test_gdasapp_util_rtofssal ..........................................   Passed    0.46 sec
      Start 1589: test_gdasapp_util_smap2ioda
 8/64 Test #1589: test_gdasapp_util_smap2ioda .........................................   Passed    0.12 sec
      Start 1590: test_gdasapp_util_smos2ioda
 9/64 Test #1590: test_gdasapp_util_smos2ioda .........................................   Passed    0.12 sec
      Start 1591: test_gdasapp_util_viirsaod2ioda
10/64 Test #1591: test_gdasapp_util_viirsaod2ioda .....................................   Passed    0.12 sec
      Start 1592: test_gdasapp_util_icecabi2ioda
11/64 Test #1592: test_gdasapp_util_icecabi2ioda ......................................   Passed    0.13 sec
      Start 1593: test_gdasapp_util_icecamsr2ioda
12/64 Test #1593: test_gdasapp_util_icecamsr2ioda .....................................   Passed    0.12 sec
      Start 1594: test_gdasapp_util_icecmirs2ioda
13/64 Test #1594: test_gdasapp_util_icecmirs2ioda .....................................   Passed    0.12 sec
      Start 1595: test_gdasapp_util_icecjpssrr2ioda
14/64 Test #1595: test_gdasapp_util_icecjpssrr2ioda ...................................   Passed    0.12 sec
      Start 1951: test_gdasapp_check_python_norms
15/64 Test #1951: test_gdasapp_check_python_norms .....................................   Passed    2.67 sec
      Start 1952: test_gdasapp_check_yaml_keys
16/64 Test #1952: test_gdasapp_check_yaml_keys ........................................   Passed    0.91 sec
      Start 1953: test_gdasapp_jedi_increment_to_fv3
17/64 Test #1953: test_gdasapp_jedi_increment_to_fv3 ..................................   Passed    8.40 sec
      Start 1954: test_gdasapp_fv3jedi_fv3inc
18/64 Test #1954: test_gdasapp_fv3jedi_fv3inc .........................................   Passed   19.85 sec
      Start 1955: test_gdasapp_snow_create_ens
19/64 Test #1955: test_gdasapp_snow_create_ens ........................................   Passed    3.57 sec
      Start 1956: test_gdasapp_snow_imsproc
20/64 Test #1956: test_gdasapp_snow_imsproc ...........................................   Passed    3.01 sec
      Start 1957: test_gdasapp_snow_apply_jediincr
21/64 Test #1957: test_gdasapp_snow_apply_jediincr ....................................   Passed    4.58 sec
      Start 1958: test_gdasapp_snow_letkfoi_snowda
22/64 Test #1958: test_gdasapp_snow_letkfoi_snowda ....................................   Passed    9.79 sec
      Start 1959: test_gdasapp_convert_bufr_adpsfc_snow
23/64 Test #1959: test_gdasapp_convert_bufr_adpsfc_snow ...............................   Passed    3.04 sec
      Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
24/64 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   28.92 sec
      Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
25/64 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed   45.72 sec
      Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200
26/64 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200 ........   Passed  320.42 sec
      Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
27/64 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  248.05 sec
      Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
28/64 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......***Failed  171.34 sec
      Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
29/64 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....***Failed   58.38 sec
      Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
30/64 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....***Failed   53.74 sec
      Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
31/64 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...***Failed   43.18 sec
      Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
32/64 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...***Failed   45.18 sec
      Start 1969: test_gdasapp_WCDA-hyb-C48mx500
33/64 Test #1969: test_gdasapp_WCDA-hyb-C48mx500 ......................................   Passed   34.23 sec
      Start 1970: test_gdasapp_WCDA-hyb-C48mx500_gdas_stage_ic_202103241200
34/64 Test #1970: test_gdasapp_WCDA-hyb-C48mx500_gdas_stage_ic_202103241200 ...........   Passed   58.17 sec
      Start 1971: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_stage_ic_202103241200
35/64 Test #1971: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_stage_ic_202103241200 .......   Passed   44.28 sec
      Start 1972: test_gdasapp_WCDA-hyb-C48mx500_gdas_fcst_seg0_202103241200
36/64 Test #1972: test_gdasapp_WCDA-hyb-C48mx500_gdas_fcst_seg0_202103241200 ..........   Passed  566.11 sec
      Start 1973: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem001_202103241200
37/64 Test #1973: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem001_202103241200 ....   Passed  426.34 sec
      Start 1974: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem002_202103241200
38/64 Test #1974: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem002_202103241200 ....   Passed  409.62 sec
      Start 1975: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem003_202103241200
39/64 Test #1975: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem003_202103241200 ....   Passed  423.69 sec
      Start 1976: test_gdasapp_WCDA-hyb-C48mx500_gdas_prepoceanobs_202103241800
40/64 Test #1976: test_gdasapp_WCDA-hyb-C48mx500_gdas_prepoceanobs_202103241800 .......   Passed  237.67 sec
      Start 1977: test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800
41/64 Test #1977: test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 .....***Failed  137.85 sec
      Start 1978: test_gdasapp_convert_bufr_adpsfc
42/64 Test #1978: test_gdasapp_convert_bufr_adpsfc ....................................   Passed   16.93 sec
      Start 1979: test_gdasapp_convert_gsi_satbias
43/64 Test #1979: test_gdasapp_convert_gsi_satbias ....................................   Passed    8.28 sec
      Start 1980: test_gdasapp_setup_atm_cycled_exp
44/64 Test #1980: test_gdasapp_setup_atm_cycled_exp ...................................   Passed    7.26 sec
      Start 1981: test_gdasapp_atm_jjob_var_init
45/64 Test #1981: test_gdasapp_atm_jjob_var_init ......................................   Passed   80.19 sec
      Start 1982: test_gdasapp_atm_jjob_var_run
46/64 Test #1982: test_gdasapp_atm_jjob_var_run .......................................   Passed  106.44 sec
      Start 1983: test_gdasapp_atm_jjob_var_inc
47/64 Test #1983: test_gdasapp_atm_jjob_var_inc .......................................   Passed   74.35 sec
      Start 1984: test_gdasapp_atm_jjob_var_final
48/64 Test #1984: test_gdasapp_atm_jjob_var_final .....................................   Passed   42.34 sec
      Start 1985: test_gdasapp_atm_jjob_ens_init
49/64 Test #1985: test_gdasapp_atm_jjob_ens_init ......................................   Passed   79.58 sec
      Start 1986: test_gdasapp_atm_jjob_ens_letkf
50/64 Test #1986: test_gdasapp_atm_jjob_ens_letkf .....................................   Passed  778.65 sec
      Start 1987: test_gdasapp_atm_jjob_ens_init_split
51/64 Test #1987: test_gdasapp_atm_jjob_ens_init_split ................................   Passed  112.04 sec
      Start 1988: test_gdasapp_atm_jjob_ens_obs
52/64 Test #1988: test_gdasapp_atm_jjob_ens_obs .......................................   Passed   42.32 sec
      Start 1989: test_gdasapp_atm_jjob_ens_sol
53/64 Test #1989: test_gdasapp_atm_jjob_ens_sol .......................................   Passed   42.33 sec
      Start 1990: test_gdasapp_atm_jjob_ens_inc
54/64 Test #1990: test_gdasapp_atm_jjob_ens_inc .......................................   Passed  106.33 sec
      Start 1991: test_gdasapp_atm_jjob_ens_final
55/64 Test #1991: test_gdasapp_atm_jjob_ens_final .....................................   Passed   42.38 sec
      Start 1992: test_gdasapp_aero_gen_3dvar_yaml
56/64 Test #1992: test_gdasapp_aero_gen_3dvar_yaml ....................................   Passed    5.77 sec
      Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
57/64 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo ..........................***Failed   10.11 sec
      Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
58/64 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy .........................***Failed    0.42 sec
      Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
59/64 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider ........................***Failed    0.40 sec
      Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
60/64 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac .........................***Failed    0.40 sec
      Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
61/64 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ......................***Failed    0.40 sec
      Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
62/64 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd ........................***Failed    0.41 sec
      Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
63/64 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................***Failed    0.38 sec
      Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
64/64 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................***Failed    0.38 sec

78% tests passed, 14 tests failed out of 64

Label Time Summary:
gdas-utils    =  28.91 sec*proc (14 tests)
manual        = 3352.89 sec*proc (18 tests)
script        =  28.91 sec*proc (14 tests)

Total Test time (real) = 4999.35 sec

The following tests FAILED:
        1964 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 (Failed)
        1965 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 (Failed)
        1966 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 (Failed)
        1967 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 (Failed)
        1968 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 (Failed)
        1977 - test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 (Failed)
        1993 - test_gdasapp_bufr2ioda_insitu_profile_argo (Failed)
        1994 - test_gdasapp_bufr2ioda_insitu_profile_bathy (Failed)
        1995 - test_gdasapp_bufr2ioda_insitu_profile_glider (Failed)
        1996 - test_gdasapp_bufr2ioda_insitu_profile_tesac (Failed)
        1997 - test_gdasapp_bufr2ioda_insitu_profile_tropical (Failed)
        1998 - test_gdasapp_bufr2ioda_insitu_profile_xbtctd (Failed)
        1999 - test_gdasapp_bufr2ioda_insitu_surface_drifter (Failed)
        2000 - test_gdasapp_bufr2ioda_insitu_surface_trkob (Failed)

Log files for failed marine 3DVar jobs are in /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/logs/2021032418. The marine bmat job log file contains the message

12: Unable to find field metadata for: tocn
14: Unable to find field metadata for: tocn
 0: Unable to find field metadata for: tocn
 0: OOPS Ending   2024-11-11 16:49:33 (UTC+0000)
 1: Unable to find field metadata for: tocn
 2: Abort(1) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
 3: Abort(1) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
 4: Unable to find field metadata for: tocn
 5: Abort(1) on node 5 (rank 5 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5

This appears to be a model variable renaming issue. Correcting the bmat job may allow the subsequent marine jobs to run to completion.
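
For reference, the renaming at play here maps the old SOCA short names to new long names; a sketch of that lookup (the pairs match the soca_diagb.yaml.j2 change applied later in this thread):

# Old SOCA short names and their post-sprint long names, as used in this thread.
OLD_TO_NEW = {
    "tocn": "sea_water_potential_temperature",
    "socn": "sea_water_salinity",
    "uocn": "eastward_sea_water_velocity",
    "vocn": "northward_sea_water_velocity",
    "hocn": "sea_water_cell_thickness",
    "ssh": "sea_surface_height_above_geoid",
    "cicen": "sea_ice_area_fraction",
    "hicen": "sea_ice_thickness",
    "hsnon": "sea_ice_snow_thickness",
}

def rename_vars(names):
    """Map old short names to their renamed equivalents; unknown names pass through."""
    return [OLD_TO_NEW.get(name, name) for name in names]

print(rename_vars(["tocn", "cicen", "mom6_mld"]))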

The log file for the failed marine hyb job contains

  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/attrdict.py", line 84, in __missing__
    raise KeyError(name)
KeyError: 'APRUN_MARINEANLLETKF'
+ JGLOBAL_MARINE_ANALYSIS_LETKF[1]: postamble JGLOBAL_MARINE_ANALYSIS_LETKF 1731346303 1
+ preamble.sh[70]: set +x

This error may indicate that it is premature to run the marine letkf ctest. This test may need updates from g-w PR #3401. If true, this again highlights the problem we face with GDASApp getting several development cycles ahead of g-w.

Tagging @guillaumevernieres , @AndrewEichmann-NOAA , and @apchoiCMD for help in debugging the marine DA and bufr2ioda_insitu failures.

@RussTreadon-NOAA
Contributor Author

g-w CI update

Install g-w PR #2992 on Hercules. Specifically, g-w branch DavidNew-NOAA:feature/jcb-obsbias at a6fd65ad was installed. sorc/gdas.cd was replaced with GDASApp branch feature/resume_nightly at 4561ead.

The following g-w DA CI was configured and run

  1. C96C48_hybatmDA - GSI based atmospheric DA (prgsi)
  2. C96C48_ufs_hybatmDA - JEDI based atmospheric DA (prjedi)
  3. C96C48_hybatmaerosnowDA - GSI atmospheric DA, JEDI aerosol and snow DA (praero)
  4. C48mx500_3DVarAOWCDA - GSI atmospheric DA, JEDI marine DA (prwcda)

prgsi (1) and prjedi (2) successfully ran to completion

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prgsi_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 11 2024 18:50:22    Nov 11 2024 19:20:03
202112210000        Done    Nov 11 2024 18:50:22    Nov 11 2024 21:50:03
202112210600        Done    Nov 11 2024 18:50:22    Nov 11 2024 22:10:03

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prjedi_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Nov 11 2024 18:50:23    Nov 11 2024 19:20:04
202402240000        Done    Nov 11 2024 18:50:23    Nov 11 2024 22:40:05
202402240600        Done    Nov 11 2024 18:50:23    Nov 11 2024 23:05:04

praero (3) and prwcda (4) encountered DEAD jobs which halted each parallel

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/praero_pr2992
202112201800     enkfgdas_esnowrecen                     3152798                DEAD                   1         2          51.0

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prwcda_pr2992
202103241800         gdas_marinebmat                     3152661                DEAD                   1         2         114.0

The log files for the DEAD jobs are

  • praero: /work/noaa/stmp/rtreadon/COMROOT/praero_pr2992/logs/2021122018/enkfgdas_esnowrecen.log
  • prwcda: /work/noaa/stmp/rtreadon/COMROOT/prwcda_pr2992/logs/2021032418/gdas_marinebmat.log

@CoryMartin-NOAA
Contributor

@DavidNew-NOAA any insights here?

@DavidNew-NOAA
Collaborator

@CoryMartin-NOAA @jiaruidong2017 FV3-JEDI PR #1289 modified the FMS2 IO interface to ensure that for each dimension, a dimension variable is written. Which FV3-JEDI hash was used to generate this increment?

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA : The most recent tests reported in this issue use GDASApp feature/resume_nightly at 4561ead. sorc/fv3-jedi is at 52507de.

@DavidNew-NOAA
Collaborator

Hmm, this is strange. I need to study the FMS2 IO code a bit and figure out why the block that writes dimension variables may not be activated

@DavidNew-NOAA
Collaborator

I'm also confused as to why variable renaming would have caused this feature to fail

@RussTreadon-NOAA
Contributor Author

The Model Variable Renaming Sprint may not be responsible for the enkfgdas_esnowrecen failure.

Updating JEDI hashes brings in a lot of other changes. For example, I think the gdas_marinebmat failure is due to SOCA PR #1082. This PR is not part of the Model Variable Renaming Sprint.

@DavidNew-NOAA
Collaborator

I will build GW with the GDAS hash you mentioned and dig into this

@RussTreadon-NOAA
Contributor Author

marinebmat failure
Made the following changes in a working copy of /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd:

  1. parm/jcb-gdas/algorithm/marine/soca_diagb.yaml.j2
@@ -18,7 +18,7 @@ background error:
   type: incr

 variables:
-  name: [tocn, socn, uocn, vocn, hocn, ssh, cicen, hicen, hsnon, mom6_mld]
+  name: [sea_water_potential_temperature, sea_water_salinity, eastward_sea_water_velocity, northward_sea_water_velocity, sea_water_cell_thickness, sea_surface_height_above_geoid, sea_ice_area_fraction, sea_ice_thickness, sea_ice_snow_thickness, mom6_mld]

 rescale: 2.0    # rescales the filtered std. dev. by "rescale"
 min sst: 0.0    # Added to sst bkg. err.
  2. parm/soca/fields_metadata.yaml
@@ -58,6 +58,11 @@
   io file: ocn
   io name: ave_ssh

+- name: mom6_mld
+  io file: ocn
+  io name: MLD
+  fill value: 0.0
+
 # --------------------------------------------------------------------------------------------------
 # ice state variables with no categories
 # --------------------------------------------------------------------------------------------------

With these changes in place, marinebmat got further but eventually died with

 0: Background:
 0:
 0:   Valid time: 2021-03-24T21:00:00Z
 0: sea_water_potential_temperature   min=   -1.902229   max=   30.965748   mean=   11.777422
 0:              sea_water_salinity   min=    0.000000   max=   40.043293   mean=   33.680997
 0:     eastward_sea_water_velocity   min=   -0.919190   max=    0.681972   mean=   -0.001670
 0:    northward_sea_water_velocity   min=   -0.547376   max=    0.909467   mean=    0.003174
 0:        sea_water_cell_thickness   min=    0.000000   max= 5416.733887   mean=  128.626722
 0:  sea_surface_height_above_geoid   min=   -1.976035   max=    0.900495   mean=   -0.310314
 0:           sea_ice_area_fraction   min=    0.000000   max=    1.000000   mean=    0.123549
 0:               sea_ice_thickness   min=    0.000000   max=    4.723974   mean=    0.171670
 0:          sea_ice_snow_thickness   min=    0.000000   max=    0.599296   mean=    0.022770
 0:                        mom6_mld   min=    2.703109   max= 2157.872314   mean=  103.642769
 0: ====================== build mesh connectivity
 0: ====================== start variance partitioning
 0: ====================== allocate std. dev. field set
 0: ====================== calculate layer depth
 0: Exception: FieldSet: cannot find field "hocn"  (/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.7.0/cache/build_stage/spack-stage-ecmwf-atlas-0.36.0-73uspevvnzkbkpib5iwnvt7rvwsb6ggl/spack-src/src/atlas/field/FieldSet.cc +81 field)

This is a confusing error message. grep -i hocn -r in the marinebmat job run directory does not return any hits.

I am unable to examine /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.7.0/cache/build_stage/spack-stage-ecmwf-atlas-0.36.0-73uspevvnzkbkpib5iwnvt7rvwsb6ggl/spack-src/src/atlas/field/FieldSet.cc due to permission restrictions.

Not sure where to go from here. Any suggestions @guillaumevernieres or @AndrewEichmann-NOAA ?

@RussTreadon-NOAA
Copy link
Contributor Author

@apchoiCMD, the test_gdasapp_bufr2ioda_insitu* failures on Hercules are due to an error in the PYTHONPATH for these tests.
ctest -VV revealed that these tests run with

`PYTHONPATH=/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/gdas/build/lib/python3.10/`

This is not correct; the GDASApp build directory actually contains /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/gdas/build/lib/python3.7/

I manually changed the PYTHONPATH in GDASApp build/gdas/test/marine/CTestTestfile.cmake to python3.7. After this change the ctests pass

(gdasapp) hercules-login-4:/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build$ ctest -R test_gdasapp_bufr2ioda_insitu
Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
1/8 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo .......   Passed    4.00 sec
    Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
2/8 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy ......   Passed    0.93 sec
    Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
3/8 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider .....   Passed    1.81 sec
    Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
4/8 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac ......   Passed    3.94 sec
    Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
5/8 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ...   Passed    1.18 sec
    Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
6/8 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd .....   Passed    0.94 sec
    Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
7/8 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter ....   Passed    1.12 sec
    Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
8/8 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob ......   Passed    1.00 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) =  15.26 sec

GDASApp test/marine/CMakeLists.txt is hardwired to python3.10/

set(PYIODACONV_DIR "${PROJECT_SOURCE_DIR}/build/lib/python3.10/")

Why does test/marine/CMakeLists.txt hardwire PYIODACONV_DIR to a specific python version? Can cmake determine the python version at configure time and append it to the path?
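
A configure-time alternative seems possible. The sketch below is an assumption on my part, not the actual fix: it uses the standard FindPython3 module (which provides Python3_VERSION_MAJOR/MINOR with CMake >= 3.12) to build the path instead of hardwiring python3.10.

# Hedged sketch only: discover the python version at configure time rather
# than hardwiring it. Assumes FindPython3 is acceptable in this build; the
# PYIODACONV_DIR name matches the existing variable in test/marine/CMakeLists.txt.
find_package(Python3 COMPONENTS Interpreter REQUIRED)
set(PYIODACONV_DIR "${PROJECT_SOURCE_DIR}/build/lib/python${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR}/")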

@apchoiCMD
Copy link
Collaborator

apchoiCMD commented Nov 12, 2024

@RussTreadon-NOAA Thanks for letting us know. I will let @givelberg know what is going on inside of CMakeLists.txt.

@apchoiCMD
Copy link
Collaborator

@RussTreadon-NOAA Thanks for letting us know. I will let @givelberg know what is going on inside of CMakeLists.txt.

I had a quick chat with @givelberg and expect that he will work on it.

@guillaumevernieres
Copy link
Contributor

I'll try to start on this before the end of the week, @RussTreadon-NOAA. The issue you report above is coming from code that needs to be updated in the GDASApp.

@RussTreadon-NOAA
Copy link
Contributor Author

Thank you, @guillaumevernieres. I never thought to look at GDASApp code. I now see that utils/soca/gdas_soca_diagb.h references hocn.
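
For anyone else hitting the atlas exception above: the sketch below is only an illustration (not the actual gdas_soca_diagb.h code) of how a hard-coded Atlas FieldSet lookup produces that message with no yaml involved; the fix is to look the field up by its renamed name.

// Illustration only; the real logic lives in utils/soca/gdas_soca_diagb.h.
// A FieldSet lookup keyed on the old short name throws
//   Exception: FieldSet: cannot find field "hocn"
// once the background FieldSet carries the renamed fields.
#include "atlas/field/Field.h"
#include "atlas/field/FieldSet.h"

atlas::Field getLayerThickness(const atlas::FieldSet & bkg) {
  // return bkg.field("hocn");                    // old name: now throws
  return bkg.field("sea_water_cell_thickness");   // renamed field
}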

@DavidNew-NOAA
Copy link
Collaborator

DavidNew-NOAA commented Nov 13, 2024

@RussTreadon-NOAA I ran C96C48_hybatmaerosnowDA on Hercules with GDASApp hash 4561ead and G-W branch feature/jcb-obsbias, and this was the contents of /work/noaa/da/dnew/COMROOT/test-jedi/gdas.20211220/18/analysis/snow/snowinc.20211220.180000.sfc_data.tile1.nc from running gdas_snowanl:

netcdf snowinc.20211220.180000.sfc_data.tile1 {
dimensions:
	xaxis_1 = 96 ;
	yaxis_1 = 96 ;
	Time = UNLIMITED ; // (1 currently)
variables:
	double xaxis_1(xaxis_1) ;
		xaxis_1:long_name = "xaxis_1\000?" ;
		xaxis_1:units = "none" ;
		xaxis_1:cartesian_axis = "X\000\000\000Fake Latit" ;
	double yaxis_1(yaxis_1) ;
		yaxis_1:long_name = "yaxis_1\000)" ;
		yaxis_1:units = "none" ;
		yaxis_1:cartesian_axis = "Y\000\000\000positive" ;
	double Time(Time) ;
		Time:long_name = "Time\000\000\000\000t" ;
		Time:units = "time" ;
		Time:cartesian_axis = "T\000\000\000descriptio" ;
	double snodl(Time, yaxis_1, xaxis_1) ;
		snodl:long_name = "totalSnow" ;
		snodl:units = "mm]\t?" ;
		snodl:checksum = "A4A145EF5554C960" ;
	double vtype(Time, yaxis_1, xaxis_1) ;
		vtype:long_name = "vtype\024" ;
		vtype:units = "none?" ;
		vtype:checksum = "               0" ;
	double slmsk(Time, yaxis_1, xaxis_1) ;
		slmsk:long_name = "slmsk\024" ;
		slmsk:units = "none?" ;
		slmsk:checksum = "               0" ;

// global attributes:
		:NumFilesInSet = 1 ;
}

I realized that I introduced a bug in FV3-JEDI PR #1289, which is why the attribute names are so strange. I just created FV3-JEDI PR #1304 to fix it, and I get the following output:

netcdf snowinc.20211220.180000.sfc_data.tile1 {
dimensions:
	xaxis_1 = 96 ;
	yaxis_1 = 96 ;
	Time = UNLIMITED ; // (1 currently)
variables:
	double xaxis_1(xaxis_1) ;
		xaxis_1:long_name = "xaxis_1" ;
		xaxis_1:units = "none" ;
		xaxis_1:cartesian_axis = "X" ;
	double yaxis_1(yaxis_1) ;
		yaxis_1:long_name = "yaxis_1" ;
		yaxis_1:units = "none" ;
		yaxis_1:cartesian_axis = "Y" ;
	double Time(Time) ;
		Time:long_name = "Time" ;
		Time:units = "time level" ;
		Time:cartesian_axis = "T" ;
	double snodl(Time, yaxis_1, xaxis_1) ;
		snodl:long_name = "totalSnowDepth" ;
		snodl:units = "mm" ;
		snodl:checksum = "A4A145EF5554C960" ;
	double vtype(Time, yaxis_1, xaxis_1) ;
		vtype:long_name = "vtype" ;
		vtype:units = "none" ;
		vtype:checksum = "               0" ;
	double slmsk(Time, yaxis_1, xaxis_1) ;
		slmsk:long_name = "slmsk" ;
		slmsk:units = "none" ;
		slmsk:checksum = "               0" ;

// global attributes:
		:NumFilesInSet = 1 ;
}

Either way, I'm getting variables associated with each axis. I'm not sure why your run is missing these variables. Perhaps you can double check your hash.

However, enkfgdas_esnowrecen fails with the same error.

@RussTreadon-NOAA
Copy link
Contributor Author

With changes from fv3-jedi PR #1304, job enkfgdas_esnowrecen runs to completion after rerunning gdas_snowanl.

Thanks @DavidNew-NOAA !

@RussTreadon-NOAA
Copy link
Contributor Author

g-w DA CI update

With inclusion of src/fv3jedi/IO/FV3Restart/fv3jedi_io_fms2_mod.f90 from fv3-jedi PR #1304, C96C48_hybatmaerosnowDA is able to successfully run to completion

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/praero_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201200        Done    Nov 11 2024 18:50:24    Nov 11 2024 19:20:05
202112201800        Done    Nov 11 2024 18:50:24    Nov 13 2024 04:30:03
202112210000        Done    Nov 11 2024 18:50:24    Nov 13 2024 11:41:42

C96C48_ufs_hybatmDA already runs to completion.

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prjedi_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Nov 11 2024 18:50:23    Nov 11 2024 19:20:04
202402240000        Done    Nov 11 2024 18:50:23    Nov 11 2024 22:40:05
202402240600        Done    Nov 11 2024 18:50:23    Nov 11 2024 23:05:04

Progress is being made on C48mx500_3DVarAOWCDA.

Variable names in utils/soca/gdas_soca_diagb.h have been updated. gdas_soca_diagb.x runs to completion in gdas_marinebmat. Now gdas_soca_setcorscales.x aborts with

 0: Exception: Could not find "sea_surface_height_above_geoid" in Configuration
 1: Exception: Could not find "sea_surface_height_above_geoid" in Configuration

Previously the code aborted with

 1: Unable to find field metadata for: ssh
 4: Unable to find field metadata for: ssh
 5: Unable to find field metadata for: ssh
 6: Abort(1) on node 6 (rank 6 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 6
 7: Abort(1) on node 7 (rank 7 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 7

ssh was replaced with sea_surface_height_above_geoid in parm/jcb-gdas/algorithm/marine/soca_setcorscales.yaml.j2

@@ -4,7 +4,7 @@ resolution:

 date: "0001-01-01T00:00:00Z"

-corr variables: [ssh]
+corr variables: [sea_surface_height_above_geoid]

 scales:
   vert layers: 5 # in units of layer

Apparently this change is not sufficient. Investigation continues to determine what other changes are needed to get gdas_soca_setcorscales.x to complete in gdas_marinebmat.

Any pointers @guillaumevernieres as to what / where to modify for gdas_soca_setcorscales.x? It's possible that executables or yamls upstream of gdas_soca_setcorscales.x are not correctly configured.

@RussTreadon-NOAA
Copy link
Contributor Author

Found a combination of changes in parm/jcb-gdas

        modified:   algorithm/marine/soca_diagb.yaml.j2
        modified:   algorithm/marine/soca_parameters_diffusion_hz.yaml.j2
        modified:   algorithm/marine/soca_parameters_diffusion_vt.yaml.j2
        modified:   algorithm/marine/soca_setcorscales.yaml.j2

which allow gdas_soca_setcorscales.x and gdas_marinebmat to run to completion. gdas_marineanlinit successfully ran. Now working through failures in gdas_marineanlvar.
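
For reference, the substitutions applied to these templates follow the same old-name to new-name mapping already shown in the soca_diagb.yaml.j2 diff earlier in this issue; the summary below simply restates that mapping and is not the literal contents of any one file.

# soca short name -> Model Variable Renaming Sprint name
tocn:  sea_water_potential_temperature
socn:  sea_water_salinity
uocn:  eastward_sea_water_velocity
vocn:  northward_sea_water_velocity
hocn:  sea_water_cell_thickness
ssh:   sea_surface_height_above_geoid
cicen: sea_ice_area_fraction
hicen: sea_ice_thickness
hsnon: sea_ice_snow_thickness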

@DavidNew-NOAA
Copy link
Collaborator

@RussTreadon-NOAA FV3-JEDI PR #1304 is merged into develop

@RussTreadon-NOAA
Copy link
Contributor Author

gdas_marineanlvar now works. Failures in gdas_marineanlchkpt are being worked through. All failures thus far have been related to the variable name changes in soca PR #1082.

Note: Our init, run, finalize approach can be problematic at times. Case in point: The yamls used in gdas_marineanlchkpt are created in gdas_marineanlinit. While I can locally edit yamls in the run directory, the true test is to edit yamls in HOMEgfs/sorc/gdas.cd and then rerun init, var, and chkpt. This isn't hard. It's just tedious.

@RussTreadon-NOAA
Copy link
Contributor Author

gdas_marineanlchkpt now runs to completion. Given this, I am rerunning the test_gdasapp_WCDA-3DVAR-C48mx500 suite of ctests to ensure each passes.

@danholdaway
Copy link
Contributor

Thank you for this effort, Russ.

@RussTreadon-NOAA
Copy link
Contributor Author

@AndrewEichmann-NOAA , I updated feature/resume_nightly with GDASApp develop. This brought in changes from #1352. Now g-w gdas_marinefinal fails with

0: ========= Processing /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4          date: 2021032418
0: insitu_surface_trkob.2021032418.nc4: read database from /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4 (io pool size: 1)
0: insitu_surface_trkob.2021032418.nc4 processed vars: 2 Variables: seaSurfaceSalinity, seaSurfaceTemperature
0: insitu_surface_trkob.2021032418.nc4 assimilated vars: 1 Variables: seaSurfaceSalinity
0: nlocs =863
0: Exception:   Reason: An exception occurred inside ioda while opening a variable.
0:      name:   ombg/seaSurfaceSalinity
0:      source_column:  0
0:      source_filename:        /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp

Is this failure possibly related to #1352?

@RussTreadon-NOAA
Copy link
Contributor Author

GDASApp PR #1374 modifies test/marine/CMakeLists.txt such that the correct python version is set for test_gdasapp_bufr2ioda_insitu*. With this change in place all test_gdasapp_bufr2ioda_insitu* pass

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
1/8 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo .......   Passed   52.78 sec
    Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
2/8 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy ......   Passed    3.72 sec
    Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
3/8 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider .....   Passed    3.62 sec
    Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
4/8 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac ......   Passed    5.71 sec
    Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
5/8 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ...   Passed    3.33 sec
    Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
6/8 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd .....   Passed    2.62 sec
    Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
7/8 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter ....   Passed    2.41 sec
    Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
8/8 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob ......   Passed    2.86 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) =  78.83 sec

@RussTreadon-NOAA
Copy link
Contributor Author

@AndrewEichmann-NOAA , I rolled back the change to parm/soca/obs/obs_list.yaml from #1352 and reran the test_gdasapp_WCDA-3DVAR-C48mx500 suite of tests. All passed

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
1/9 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   32.22 sec
    Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
2/9 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed   58.00 sec
    Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200
3/9 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200 ........   Passed  408.83 sec
    Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
4/9 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  266.16 sec
    Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
5/9 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......   Passed  168.43 sec
    Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
6/9 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....   Passed  111.26 sec
    Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
7/9 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....   Passed  168.09 sec
    Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
8/9 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...   Passed  180.57 sec
    Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
9/9 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed   68.63 sec

100% tests passed, 0 tests failed out of 9

Label Time Summary:
manual    = 1462.19 sec*proc (9 tests)

Total Test time (real) = 1463.94 sec

Does failure of test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 with the PR #1352 parm/soca/obs/obs_list.yaml make sense?

@RussTreadon-NOAA
Copy link
Contributor Author

@guillaumevernieres and @AndrewEichmann-NOAA : test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 fails with the error

2024-11-14 03:11:43,830 - DEBUG    - marine_da_utils: Executing srun -l --export=ALL --hint=nomultithread -n 16 /work/noaa/da/rtreadon/git/global-workflow/pr2992/exec/gdas_soca_gridgen.x /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml
 2: Exception: Cannot open /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml  (No such file or directory)

There is no g-w directory parm/gdas/soca/gridgen. I checked g-w PR #3041. I do not see any change to sorc/link_workflow.sh to add this directory to parm/gdas/soca.

Should test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 successfully run in GDASApp develop with g-w develop? Does this test work when built and run inside g-w PR #3041 (NOAA-EMC/global-workflow#3041)?
