% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Data_handling.R
\name{merge_eddy}
\alias{merge_eddy}
\title{Merge Regular Date-Time Sequence and Data Frames}
\usage{
merge_eddy(
  x,
  start = NULL,
  end = NULL,
  check_dupl = TRUE,
  interval = NULL,
  format = "\%Y-\%m-\%d \%H:\%M",
  tz = "GMT"
)
}
\arguments{
\item{x}{List of data frames, each with \code{"timestamp"} column of class
\code{"POSIXt"}. Optionally with attributes \code{varnames} and
\code{units} for each column.}
\item{start, end}{A value specifying the first (last) value of the generated
date-time sequence. If \code{NULL}, \code{\link{min}} (\code{\link{max}})
is taken across the values in \code{"timestamp"} columns across \code{x}
elements. If numeric, the value specifies the year for which the first
(last) date-time value will be generated, considering the given time
\code{interval} and the convention of assigning measured records to the end
of the time interval. Otherwise, a character representation of a specific
half hour is expected with the given \code{format} and \code{tz}.}
\item{check_dupl}{A logical value specifying whether rows with duplicated
date-time values checked across \code{x} elements should be excluded before
merging.}
\item{interval}{A numeric value specifying the time interval (in seconds) of
the generated date-time sequence.}
\item{format}{A character string. Format of \code{start} (\code{end}) if
provided as a character string. The default \code{\link[=strptime]{format}}
is \code{"\%Y-\%m-\%d \%H:\%M"}.}
\item{tz}{A time zone (see \code{\link{time zones}}) specification to be used
for the conversion of \code{start} (\code{end}) if provided as a character
string.}
}
\value{
A data frame with attributes \code{varnames} and \code{units} for
each column, containing date-time information in column \code{"timestamp"}.
}
\description{
Merge a generated regular date-time sequence with one or more data
frames.
}
\details{
The primary purpose of \code{merge_eddy} is to combine chunks of data
vertically along their column \code{"timestamp"} with date-time information.
This \code{"timestamp"} is expected to be regular with the given time
\code{interval}. The resulting data frame contains added rows for the expected
date-time values that were missing in the \code{"timestamp"} column, with
\code{NA}s in the remaining columns. If \code{check_dupl = TRUE} and
\code{"timestamp"} values across \code{x} elements overlap, detected
duplicated rows are removed (the order in which duplicates are evaluated
depends on the order of \code{x} elements). In the special case when \code{x}
has only one element, missing date-time values are filled in the
\code{"timestamp"} column of the given data frame.

Storage mode of the \code{"timestamp"} column is set to integer instead
of double. This simplifies application of \code{\link{round_df}} but could
lead to unexpected behavior if the date-time information is expected to
resolve fractional seconds.

The list of data frames, each with column \code{"timestamp"}, is sequentially
\code{\link{merge}}d using \code{\link{Reduce}}. A \emph{(full) outer join},
i.e. \code{merge(..., all = TRUE)}, is performed to keep all columns of
\code{x} elements. The order of \code{x} elements can affect the result.
Duplicated column names within \code{x} elements are corrected using
\code{\link{make.unique}}. The merged data frame is then merged on the
validated \code{"timestamp"} column that can be either automatically
extracted from \code{x} or manually specified.

For horizontal merging (adding columns instead of rows), \code{check_dupl =
FALSE} must be set, but a simple \code{\link{merge}} could be preferred.
Combining vertical and horizontal merging in one call should be avoided as
the result depends on the order of \code{x} elements and can lead to row
duplication. Instead, data chunks from different data sources should first be
merged vertically, each source separately, and then merged horizontally in a
following step.
}
\examples{
set.seed(123)
n <- 20 # number of half-hourly records
tstamp <- seq(c(ISOdate(2021,3,20)), by = "30 mins", length.out = n)
x <- data.frame(
  timestamp = tstamp,
  H = rf(n, 1, 2, 1),
  LE = rf(n, 1, 2, 1),
  qc_flag = sample(c(0:2, NA), n, replace = TRUE)
)
openeddy::varnames(x) <- c("timestamp", "sensible heat", "latent heat",
                           "quality flag")
openeddy::units(x) <- c("-", "W m-2", "W m-2", "-")
str(x)
y1 <- ex(x, 1:10)
y2 <- ex(x, 11:20)
y <- merge_eddy(list(y1, y2))
str(y)
attributes(y$timestamp)
typeof(y$timestamp)
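
# Fill gaps in a single data frame (x with only one element, see Details).
# A minimal sketch: records 6 to 9 are dropped first, so merge_eddy() is
# expected to add their timestamps back with NAs in the remaining columns.
# The interval is passed explicitly (1800 s = 30 min) as an assumption made
# for clarity, not because the default is known to fail here.
gappy <- ex(x, c(1:5, 10:20))
filled <- merge_eddy(list(gappy), interval = 1800)
nrow(filled)
filled[5:10, ]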
# Duplicated rows and different number of columns
z1 <- ex(x, 8:20, 1:3)
z <- merge_eddy(list(y1, z1))
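
# Base R sketch of the sequential full outer join described in Details.
# It only illustrates Reduce() with merge(..., all = TRUE) on toy data
# (a and b are hypothetical) and is not the internal code of merge_eddy().
a <- data.frame(timestamp = 1:3, H = c(10, 11, 12))
b <- data.frame(timestamp = 3:5, H = c(12, 13, 14))
ab <- Reduce(function(p, q) merge(p, q, all = TRUE), list(a, b))
ab
make.unique(c("H", "LE", "H")) # base R tool used for duplicated column names

# Vertical merging first, horizontal merging in a following step.
# tair is a hypothetical data frame standing in for another data source.
# Plain merge() is used for the horizontal step, so the varnames and units
# attributes may not be preserved in the result.
tair <- data.frame(timestamp = tstamp, Tair = rnorm(n))
vert <- merge_eddy(list(y1, y2), interval = 1800)
both <- merge(vert, tair, by = "timestamp")
str(both)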
}
\seealso{
\code{\link{merge}}, \code{\link{Reduce}}, \code{\link{strptime}},
\code{\link{time zones}}, \code{\link{make.unique}}
}