107 changes: 107 additions & 0 deletions pipeline/metadata/L1_metadata/readme_files/README_v2-1.txt
@@ -0,0 +1,107 @@
COMPASS-FME Level 1 sensor data
Version: [VERSION]
Date: [DATESTAMP]
Observations: [OBSERVATIONS]
Git commit: [GIT_COMMIT]

DESCRIPTION
—----------------------------------
This is the Level 1 (L1) data release [VERSION] for COMPASS-FME
environmental sensors located at field sites in the Lake Erie and
Chesapeake Bay regions. L1 data are close to raw, but are
units-transformed and have flags (out of instrument bounds, out of
service, and possible outlier) added. Duplicates and missing data are
removed but otherwise these data are not filtered, and have not been
subject to any additional algorithmic or human QA/QC. Any scientific
analyses of L1 data should be performed with care.

CONTACT
—----------------------------------
Project: https://compass.pnnl.gov
Data lead: Stephanie Pennington, [email protected]

HOW TO CITE THESE DATA
—----------------------------------
Pennington, Bittencourt Peixoto, Bond-Lamberty, Cheng, LaGorga,
Machado-Silva, Peresta, Phillips, Regier, Rich, Sandoval, Stearns, Ward,
Wilson, Weintraub, Megonigal, and Bailey (2024). COMPASS-FME Level 1
Sensor Data (version [VERSION] released [DATESTAMP]), downloaded
YYYY-MM-DD, https://compass.pnnl.gov.

DATA STRUCTURE
—----------------------------------
Data are organized into {SITE}_{YEAR} folders, with comma-separated
value (CSV) files in each folder for each plot and output variable at
that site.

The data file naming convention is
{SITE}_{PLOT}_{DATE RANGE}_{OUTPUT VARIABLE}_L1_{VERSION}.csv

Sites include CRC (Crane Creek), DLG (DELUGE experiment), GCW (GCReW),
GWI (Goodwin Island), MSM (Moneystump Marsh), OWC (Old Woman Creek), PTR
(Portage River), SWH (Sweet Hall Marsh), and TMP (TEMPEST experiment).
See site-specific metadata files in each folder.
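
As an illustration of the naming convention above, here is a minimal Python sketch that splits an L1 file name into its parts. The function name `parse_l1_filename`, the example file name, and the exact format of the date range and version strings are assumptions for illustration, not taken from the data release. The sketch assumes that the site, plot, and date range contain no underscores, while the output variable may.

```python
from typing import NamedTuple


class L1FileName(NamedTuple):
    site: str
    plot: str
    date_range: str
    variable: str
    version: str


def parse_l1_filename(name: str) -> L1FileName:
    """Split {SITE}_{PLOT}_{DATE RANGE}_{OUTPUT VARIABLE}_L1_{VERSION}.csv.

    Hypothetical helper: assumes only the output variable may contain
    underscores, so the name is split from both ends.
    """
    if not name.endswith(".csv"):
        raise ValueError(f"not a CSV file name: {name}")
    stem = name[: -len(".csv")]
    # The last two fields are the level marker and the version.
    head, level, version = stem.rsplit("_", 2)
    if level != "L1":
        raise ValueError(f"not an L1 file name: {name}")
    # The first three fields are site, plot, and date range; the rest
    # (which may itself contain underscores) is the output variable.
    site, plot, date_range, variable = head.split("_", 3)
    return L1FileName(site, plot, date_range, variable, version)


# Example with a made-up file name (date range and version formats assumed).
example = parse_l1_filename("TMP_C_20240101-20241231_wx_par_tot15_L1_v2-1.csv")
print(example.site, example.plot, example.variable, example.version)
```

If real file names use a different date range or version format, only the example string needs to change, as long as those fields contain no underscores.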

Data are normally logged at a 15 minute interval, but this is **not**
guaranteed. In particular, there may be:
* Missing data points (due to offline sensors, for example);
* Multiple, numerically different observations for a given timestamp (rare);
* Less-than-15-minute time intervals, in particular during TEMPEST flood events.
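
Because the 15-minute interval is not guaranteed, it can help to check a time series before analysis. The following Python sketch counts the three situations listed above for a list of timestamps. The function name `summarize_intervals` and the returned keys are hypothetical, and reading timestamps from the CSV files is left out because the column names are not described here.

```python
from collections import Counter
from datetime import datetime, timedelta

EXPECTED_STEP = timedelta(minutes=15)


def summarize_intervals(timestamps, expected=EXPECTED_STEP):
    """Count duplicated timestamps, gaps, and shorter-than-expected steps."""
    counts = Counter(timestamps)
    # Timestamps that occur more than once (multiple observations).
    duplicated = sum(1 for n in counts.values() if n > 1)
    unique = sorted(counts)
    # Time steps between consecutive distinct timestamps.
    steps = [later - earlier for earlier, later in zip(unique, unique[1:])]
    return {
        "duplicated_timestamps": duplicated,
        "gaps": sum(1 for step in steps if step > expected),
        "short_intervals": sum(1 for step in steps if step < expected),
    }


# Made-up timestamps: one duplicate, one 30-minute gap, one 5-minute step.
sample = [
    datetime(2024, 6, 1, 0, 0),
    datetime(2024, 6, 1, 0, 15),
    datetime(2024, 6, 1, 0, 15),
    datetime(2024, 6, 1, 0, 45),
    datetime(2024, 6, 1, 0, 50),
]
print(summarize_intervals(sample))
```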

DATA VERSIONS
—----------------------------------
COMPASS-FME L1 data releases use semantic versioning (https://semver.org).
This means that given a version number MAJOR-MINOR-PATCH, we increment the:
* MAJOR number when we make incompatible changes to the data structure;
* MINOR version when we add data in a backwards-compatible manner; and
* PATCH version when we fix documentation and the like.

Importantly, “backward compatible” does NOT mean that the data don’t
change, only that your scripts using L1 data will probably still work.
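
A script that depends on a particular data structure could check the MAJOR number before running. The Python sketch below parses hyphen-separated version strings such as "2-1"; the function names, the optional leading "v", and treating a missing PATCH number as 0 are assumptions made for illustration.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR-MINOR-PATCH' (hyphen-separated) into a tuple of ints.

    Assumptions: an optional leading 'v' is ignored, and missing MINOR or
    PATCH numbers are treated as 0.
    """
    parts = version.lstrip("vV").split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string: {version}")
    numbers = [int(part) for part in parts]
    numbers += [0] * (3 - len(numbers))
    return (numbers[0], numbers[1], numbers[2])


def same_data_structure(version_a: str, version_b: str) -> bool:
    """True if both versions share a MAJOR number."""
    return parse_version(version_a)[0] == parse_version(version_b)[0]


print(parse_version("2-1"))
print(same_data_structure("2-0", "2-1"))
print(same_data_structure("1-2", "2-0"))
```

Sharing a MAJOR number only suggests that file layout and columns are compatible; as noted above, the data values themselves may still differ between releases.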

CHANGELOG
—----------------------------------
Version 2-1 released [DATESTAMP]
* DELUGE (DLG) data are here! Well, the TEROS measurements anyway
* TEMPEST AquaTROLL600 pressure is now corrected for atmospheric pressure; see site files
* The `sonde-depth` variable has been removed, as it was unreliable and misleading
* New soil redox data streams at all CB sites
* Plot elevation information is now available in the site metadata files

Version 2-0 released 2025-07-07
* Covers late 2019 through June 2025 for TEMPEST and all synoptic sites
* Data files are now annual and single-variable, rather than monthly and multi-variable
* "Source_file" (giving hash of original datalogger file) now listed in data files, replacing ID column
* "F_MAD" outlier flag, based on median absolute deviation, added
* Data plots now include out-of-bounds indicators and informative axis labels
* Back-corrected two years of corrupted AQ600 files at TEMPEST; thanks to SJW
* Back-corrected 2022-2024 TEMPEST AquaTROLL600s unvented `gw-pressure` values
* Minor data fixes: CD8 sapflux sensor, wx_par_tot15 calculation, MSM Buoy time zone, sapflux sensor depth, ClimaVue VP units
* New code examples, documentation improvements, and more
* Many backend improvements; see https://github.com/COMPASS-DOE/sensor-data-pipeline/issues/244

Version 1-2 released 2025-02-14
* Covers late 2019 through December 2024 for TEMPEST and all synoptic sites
* All sonde (EXO) data now appear in their own "OW" (open water) plot
* The TEMPEST (TMP) folder README files now include detailed information on flood timings, volumes, etc.

Version 1-1 released 2024-08-05
* Covers late 2019 through July 2024 for TEMPEST and all synoptic sites
* TEMPEST redox data now available starting April 2024
* Now includes high-frequency (1 and 5 min interval) data from TEMPEST floods

Version 1-0 released 2024-05-29
* Covers late 2019 through April 2024 for TEMPEST and all synoptic sites
* Restructured for ease of use, with metadata (location, sensor ID, etc) in separate columns
* SWH plot naming reworked for new upland plot
* Mirroring TMP C to GCW UP for synoptic site consistency
* GCReW weather station variables are now included in GCW-W
* Many fixes to variable units and bounds
* Out-of-service is valid for AquaTROLL and EXO

Version 0-9 released 2024-01-22
* Preliminary release covering all synoptic site and TEMPEST data collected to date
* Units and bounds (and thus OOB flags) are missing for some ClimaVue, AquaTROLL, and Sonde variables
* Some research_name assignments may be incorrect for ClimaVue
* Out-of-service is working only for AquaTROLL
* No TEMPEST 1- or 5-minute data included
78 changes: 78 additions & 0 deletions pipeline/metadata/L2_metadata/readme_files/README_v2-1.txt
@@ -0,0 +1,78 @@
COMPASS-FME Level 2 sensor data
Version: [VERSION] (BETA)
Date: [DATESTAMP]
Observations: [OBSERVATIONS]
Git commit: [GIT_COMMIT]

DESCRIPTION
—----------------------------------
Level 2 (L2) data consist of sensor observations from the COMPASS-FME
synoptic sites, TEMPEST, and DELUGE. Compared to the L1 data, these are
more consistent (always 15-minute timestamps for the entire year);
better QA/QC’d (out of bounds, out of service, and extreme outlier
values are removed); and more complete, with a gap-filled time series
available alongside the main observations, and additional derived
(calculated) variables. L2 data are intended to be rapidly and easily
usable in analyses and simulations. However, algorithmic outlier
identification always carries the risk of removing valid data, and Level
1 data may be more suitable for analyses that focus on variability or
extreme events.

CONTACT
—----------------------------------
Project: https://compass.pnnl.gov
Data lead: Stephanie Pennington, [email protected]

HOW TO CITE THESE DATA
—----------------------------------
Pennington, Bittencourt Peixoto, Bond-Lamberty, Cheng, LaGorga,
Machado-Silva, Peresta, Phillips, Regier, Rich, Sandoval, Stearns, Ward,
Wilson, Weintraub, Megonigal, and Bailey (2024). COMPASS-FME Level 2
Sensor Data (version [VERSION] released [DATESTAMP]), downloaded
YYYY-MM-DD, https://compass.pnnl.gov.

DATA STRUCTURE
—----------------------------------
Data are organized into {SITE}_{YEAR} folders, with Parquet (a high
performance, space efficient format; see https://parquet.apache.org)
files in each folder for each plot and output variable at that site.

The data file naming convention is
{SITE}_{PLOT}_{YEAR}_{OUTPUT VARIABLE}_L2_{VERSION}.parquet

Sites include CRC (Crane Creek), DLG (DELUGE experiment), GCW (GCReW),
GWI (Goodwin Island), MSM (Moneystump Marsh), OWC (Old Woman Creek), PTR
(Portage River), SWH (Sweet Hall Marsh), and TMP (TEMPEST experiment).
See site-specific metadata files in each folder.
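
As an illustration of the folder and file naming described above, this Python sketch builds the expected relative path of an L2 file. The function name `l2_file_path`, the root folder, and the example plot, variable, and version values are hypothetical; check them against the actual release contents.

```python
from pathlib import PurePosixPath


def l2_file_path(root, site, plot, year, variable, version):
    """Build {root}/{SITE}_{YEAR}/{SITE}_{PLOT}_{YEAR}_{VARIABLE}_L2_{VERSION}.parquet."""
    folder = f"{site}_{year}"
    filename = f"{site}_{plot}_{year}_{variable}_L2_{version}.parquet"
    return PurePosixPath(root) / folder / filename


# Example values are assumptions, including the "v2-1" version format.
print(l2_file_path("L2", "TMP", "C", 2024, "gw-pressure", "v2-1"))
```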

Data timestamps are every 15 minutes, from January 1 00:00 to December
31 23:45. The `N_avg` column indicates how many Level 1 values were averaged
to produce the L2 value. Short (typically <= 1 hour, although this
varies by variable) data gaps are filled by linear interpolation. If you
want to exclude these interpolated values, only use data where the `N_avg`
column is >= 1.
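
The following Python sketch illustrates two points from the paragraph above: how many 15-minute timestamps a full year contains, and how to keep only rows backed by at least one Level 1 value. The function names, the `Value` key, and the representation of rows as dictionaries are assumptions for illustration; only the `N_avg` column name comes from this README.

```python
import calendar


def expected_rows(year):
    """Number of 15-minute timestamps from January 1 00:00 to December 31 23:45."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24 * 4


def observed_only(rows):
    """Keep rows where N_avg >= 1, dropping interpolated or empty rows."""
    return [
        row for row in rows
        if row["N_avg"] is not None and row["N_avg"] >= 1
    ]


# Made-up rows; the "Value" key is hypothetical.
rows = [
    {"Value": 10.0, "N_avg": 1},
    {"Value": 10.5, "N_avg": 0},   # assumed form of a gap-filled value
    {"Value": 11.0, "N_avg": 3},
]
print(expected_rows(2023), expected_rows(2024))
print(observed_only(rows))
```

With the data loaded into a pandas data frame `df`, the same filter would be `df[df["N_avg"] >= 1]`.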

DATA VERSIONS
—----------------------------------
COMPASS-FME L2 data releases use semantic versioning (https://semver.org).
This means that given a version number MAJOR-MINOR-PATCH, we increment the:
* MAJOR number when we make incompatible changes to the data structure;
* MINOR version when we add data in a backwards-compatible manner; and
* PATCH version when we fix documentation and the like.

Importantly, “backward compatible” does NOT mean that the data don’t
change, only that your scripts using L2 data will probably still work.

CHANGELOG
—----------------------------------
Version 2-1 released [DATESTAMP]
* The Level 2 data format has been tweaked: TODO TODO <<<<-----------
* DELUGE (DLG) data are here! Well, the TEROS measurements anyway
* TEMPEST AquaTROLL600 pressure is now corrected for atmospheric pressure; see site files
* The `sonde-depth` variable has been removed, as it was unreliable and misleading
* New soil redox data streams at all CB sites
* Plot elevation information is now available in the site metadata files

Version 2-0 released 2025-07-07
* First release; covers late 2019 through June 2025 for TEMPEST and all synoptic sites
* This release has only a subset of the L1 output variables
2 changes: 1 addition & 1 deletion pipeline/metadata/site_files/DLG.txt
@@ -2,7 +2,7 @@ The DELUGE (DLG) experiment is an ecosystem-scale flooding experiment
within the Crane Creek lacustuary region of southwestern Lake Erie, 9.25
miles (14.9 km) northwest of Oak Harbor, Ohio. This site is owned by the
USFWS and managed as a part of the Ottawa National Wildlife Refuge
-complex. The plots are located at 41.6153N, 83.2297W) and were
+complex. The plots are located at 41.6153N, 83.2297W and were
established in 2025.

NOTE: Meteorological data for DLG are available in the CRC-W files.