diff --git a/pipeline/metadata/L1_metadata/readme_files/README_v2-1.txt b/pipeline/metadata/L1_metadata/readme_files/README_v2-1.txt new file mode 100644 index 0000000..a6049ed --- /dev/null +++ b/pipeline/metadata/L1_metadata/readme_files/README_v2-1.txt @@ -0,0 +1,107 @@ +COMPASS-FME Level 1 sensor data +Version: [VERSION] +Date: [DATESTAMP] +Observations: [OBSERVATIONS] +Git commit: [GIT_COMMIT] + +DESCRIPTION +—---------------------------------- +This is the Level 1 (L1) data release [VERSION] for COMPASS-FME +environmental sensors located at field sites in the Lake Erie and +Chesapeake Bay regions. L1 data are close to raw, but are +units-transformed and have flags (out of instrument bounds, out of +service, and possible outlier) added. Duplicates and missing data are +removed but otherwise these data are not filtered, and have not been +subject to any additional algorithmic or human QA/QC. Any scientific +analyses of L1 data should be performed with care. + +CONTACT +—---------------------------------- +Project: https://compass.pnnl.gov +Data lead: Stephanie Pennington, stephanie.pennington@pnnl.gov + +HOW TO CITE THESE DATA +—---------------------------------- +Pennington, Bittencourt Peixoto, Bond-Lamberty, Cheng, LaGorga, +Machado-Silva, Peresta, Phillips, Regier, Rich, Sandoval, Stearns, Ward, +Wilson, Weintraub, Megonigal, and Bailey (2024). COMPASS-FME Level 1 +Sensor Data (version [VERSION] released [DATESTAMP]), downloaded +YYYY-MM-DD, https://compass.pnnl.gov. + +DATA STRUCTURE +—---------------------------------- +Data are organized into {SITE}_{YEAR} folders, with comma-separated +value (CSV) files in each folder for each plot and output variable at +that site. 
+ +The data file naming convention is +{SITE}_{PLOT}_{DATE RANGE}_{OUTPUT VARIABLE}_L1_{VERSION}.csv + +Sites include CRC (Crane Creek), DLG (DELUGE experiment), GCW (GCReW), +GWI (Goodwin Island), MSM (Moneystump Marsh), OWC (Old Woman Creek), PTR +(Portage River), SWH (Sweet Hall Marsh), and TMP (TEMPEST experiment). +See site-specific metadata files in each folder. + +Data are normally logged at a 15 minute interval, but this is **not** +guaranteed. In particular, there may be: +* Missing data points (due to offline sensors, for example); +* Multiple, numerically different observations for a given timestamp (rare); +* Less-than-15-minute time intervals, in particular during TEMPEST flood events. + +DATA VERSIONS +—---------------------------------- +COMPASS-FME L1 data releases use semantic versioning (https://semver.org). +This means that given a version number MAJOR-MINOR-PATCH, we increment the: +* MAJOR number when we make incompatible changes to the data structure; +* MINOR version when we add data in a backwards-compatible manner; and +* PATCH version when we fix documentation and the like. + +Importantly, “backward compatible” does NOT mean that the data don’t +change, only that your scripts using L1 data will probably still work. + +CHANGELOG +—---------------------------------- +Version 2-1 release [DATESTAMP] +* DELUGE (DLG) data are here! 
Well, the TEROS measurements anyway +* TEMPEST AquaTROLL600 pressure is now corrected for atmospheric pressure; see site files +* The `sonde-depth` variable has been removed, as it was unreliable and misleading +* New soil redox data streams at all CB sites +* Plot elevation information is now available in the site metadata files + +Version 2-0 released 2025-07-07 +* Covers late 2019 through June 2025 for TEMPEST and all synoptic sites +* Data files are now annual and single-variable, rather than monthly and multi-variable +* "Source_file" (giving hash of original datalogger file) now listed in data files, replacing ID column +* "F_MAD" outlier flag, based on median absolute deviation, added +* Data plots now include out-of-bounds indicators and informative axis labels +* Back-corrected two years of corrupted AQ600 files at TEMPEST; thanks to SJW +* Back-corrected 2022-2024 TEMPEST AquaTROLL600s unvented `gw-pressure` values +* Minor data fixes: CD8 sapflux sensor, wx_par_tot15 calculation, MSM Buoy time zone, sapflux sensor depth, ClimaVue VP units +* New code examples, documentation improvements, and more +* Many backend improvements; see https://github.com/COMPASS-DOE/sensor-data-pipeline/issues/244 + +Version 1-2 released 2025-02-14 +* Covers late 2019 through December 2024 for TEMPEST and all synoptic sites +* All sonde (EXO) data now appear in their own "OW" (open water) plot +* The TEMPEST (TMP) folder README files now include detailed information on flood timings, volumes, etc.
+ +Version 1-1 released 2024-08-05 +* Covers late 2019 through July 2024 for TEMPEST and all synoptic sites +* TEMPEST redox data now available starting April 2024 +* Now includes high-frequency (1 and 5 min interval) data from TEMPEST floods + +Version 1-0 released 2024-05-29 +* Covers late 2019 through April 2024 for TEMPEST and all synoptic sites +* Restructured for ease of use, with metadata (location, sensor ID, etc) in separate columns +* SWH plot naming reworked for new upland plot +* Mirroring TMP C to GCW UP for synoptic site consistency +* GCReW weather station variables are now included in GCW-W +* Many fixes to variable units and bounds +* Out-of-service is valid for AquaTROLL and EXO + +Version 0-9 released 2024-01-22 +* Preliminary release covering all synoptic site and TEMPEST data collected to date +* Units and bounds (and thus OOB flags) are missing for some ClimaVue, AquaTROLL, and Sonde variables +* Some research_name assignments may be incorrect for ClimaVue +* Out-of-service is working only for AquaTROLL +* No TEMPEST 1- or 5-minute data included diff --git a/pipeline/metadata/L2_metadata/readme_files/README_v2-1.txt b/pipeline/metadata/L2_metadata/readme_files/README_v2-1.txt new file mode 100644 index 0000000..a83a5df --- /dev/null +++ b/pipeline/metadata/L2_metadata/readme_files/README_v2-1.txt @@ -0,0 +1,78 @@ +COMPASS-FME Level 2 sensor data +Version: [VERSION] (BETA) +Date: [DATESTAMP] +Observations: [OBSERVATIONS] +Git commit: [GIT_COMMIT] + +DESCRIPTION +—---------------------------------- +Level 2 (L2) data consist of sensor observations from the COMPASS-FME +synoptic sites, TEMPEST, and DELUGE. Compared to the L1 data, these are +more consistent (always 15-minute timestamps for the entire year); +better QA/QC’d (out of bounds, out of service, and extreme outlier +values are removed); and more complete, with a gap-filled time series +available alongside the main observations, and additional derived +(calculated) variables.
L2 data are intended to be rapidly and easily +usable in analyses and simulations. However, algorithmic outlier +identification always carries the risk of removing valid data, and Level +1 data may be more suitable for analyses that focus on variability or +extreme events. + +CONTACT +—---------------------------------- +Project: https://compass.pnnl.gov +Data lead: Stephanie Pennington, stephanie.pennington@pnnl.gov + +HOW TO CITE THESE DATA +—---------------------------------- +Pennington, Bittencourt Peixoto, Bond-Lamberty, Cheng, LaGorga, +Machado-Silva, Peresta, Phillips, Regier, Rich, Sandoval, Stearns, Ward, +Wilson, Weintraub, Megonigal, and Bailey (2024). COMPASS-FME Level 2 +Sensor Data (version [VERSION] released [DATESTAMP]), downloaded +YYYY-MM-DD, https://compass.pnnl.gov. + +DATA STRUCTURE +—---------------------------------- +Data are organized into {SITE}_{YEAR} folders, with Parquet (a high +performance, space efficient format; see https://parquet.apache.org) +files in each folder for each plot and output variable at that site. + +The data file naming convention is +{SITE}_{PLOT}_{YEAR}_{OUTPUT VARIABLE}_L2_{VERSION}.parquet + +Sites include CRC (Crane Creek), DLG (DELUGE experiment), GCW (GCReW), +GWI (Goodwin Island), MSM (Moneystump Marsh), OWC (Old Woman Creek), PTR +(Portage River), SWH (Sweet Hall Marsh), and TMP (TEMPEST experiment). +See site-specific metadata files in each folder. + +Data timestamps are every 15 minutes, from January 1 00:00 to December +31 23:45. The `N_avg` column indicates how many Level 1 values were averaged +to produce the L2 value. Short (typically <= 1 hour, although this +varies by variable) data gaps are filled by linear interpolation. If you +want to exclude these interpolated values, only use data where the `N_avg` +column is >= 1. + +DATA VERSIONS +—---------------------------------- +COMPASS-FME L2 data releases use semantic versioning (https://semver.org). 
+This means that given a version number MAJOR-MINOR-PATCH, we increment the: +* MAJOR number when we make incompatible changes to the data structure; +* MINOR version when we add data in a backwards-compatible manner; and +* PATCH version when we fix documentation and the like. + +Importantly, “backward compatible” does NOT mean that the data don’t +change, only that your scripts using L2 data will probably still work. + +CHANGELOG +—---------------------------------- +Version 2-1 release [DATESTAMP] +* The Level 2 data format has been tweaked: TODO TODO <<<<----------- +* DELUGE (DLG) data are here! Well, the TEROS measurements anyway +* TEMPEST AquaTROLL600 pressure is now corrected for atmospheric pressure; see site files +* The `sonde-depth` variable has been removed, as it was unreliable and misleading +* New soil redox data streams at all CB sites +* Plot elevation information is now available in the site metadata files + +Version 2-0 released 2025-07-07 +* First release; covers late 2019 through June 2025 for TEMPEST and all synoptic sites +* This release has only a subset of the L1 output variables diff --git a/pipeline/metadata/site_files/DLG.txt b/pipeline/metadata/site_files/DLG.txt index d665e5a..18bbb7f 100644 --- a/pipeline/metadata/site_files/DLG.txt +++ b/pipeline/metadata/site_files/DLG.txt @@ -2,7 +2,7 @@ The DELUGE (DLG) experiment is an ecosystem-scale flooding experiment within the Crane Creek lacustuary region of southwestern Lake Erie, 9.25 miles (14.9 km) northwest of Oak Harbor, Ohio. This site is owned by the USFWS and managed as a part of the Ottawa National WIldlife Refuge -complex. The ploits are located at 41.6153N, 83.2297W) and were +complex. The plots are located at 41.6153N, 83.2297W and were established in 2025. NOTE: Meteorological data for DLG are available in the CRC-W files.