Skip to content

Commit 2cb0316

Browse files
Don't mention 14-year-old version info in ?alloc.col (#6887)
* Don't mention 14-year-old version info in ?alloc.col * Also omit mention of ancient R version
1 parent 77b4394 commit 2cb0316

File tree

1 file changed

+16
-5
lines changed

1 file changed

+16
-5
lines changed

man/truelength.Rd

+16-5
Original file line numberDiff line numberDiff line change
@@ -22,16 +22,27 @@ alloc.col(DT,
2222
\item{verbose}{ Output status and information. }
2323
}
2424
\details{
25-
When adding columns by reference using \code{:=}, we \emph{could} simply create a new column list vector (one longer) and memcpy over the old vector, with no copy of the column vectors themselves. That requires negligible use of space and time, and is what v1.7.2 did. However, that copy of the list vector of column pointers only (but not the columns themselves), a \emph{shallow copy}, resulted in inconsistent behaviour in some circumstances. So, as from v1.7.3 data.table over allocates the list vector of column pointers so that columns can be added fully by reference, consistently.
25+
When adding columns by reference using \code{:=}, we \emph{could} simply create a new column list vector (one longer) and memcpy over the old vector,
26+
with no copy of the column vectors themselves. That requires negligible use of space and time, and long ago we did just that.
27+
However, that copy of the list vector of column pointers only (but not the columns themselves), a \emph{shallow copy}, resulted in inconsistent behaviour
28+
in some circumstances. Therefore, data.table over-allocates the list vector of column pointers so that columns can be added fully by reference, consistently.
2629

27-
When the allocated column pointer slots are used up, to add a new column \code{data.table} must reallocate that vector. If two or more variables are bound to the same data.table this shallow copy may or may not be desirable, but we don't think this will be a problem very often (more discussion may be required on data.table issue tracker). Setting \code{options(datatable.verbose=TRUE)} includes messages if and when a shallow copy is taken. To avoid shallow copies there are several options: use \code{\link{copy}} to make a deep copy first, use \code{setalloccol} to reallocate in advance, or, change the default allocation rule (perhaps in your .Rprofile); e.g., \code{options(datatable.alloccol=10000L)}.
30+
When the allocated column pointer slots are used up, to add a new column \code{data.table} must reallocate that vector. If two or more variables are bound to
31+
the same data.table this shallow copy may or may not be desirable, but we don't think this will be a problem very often (more discussion may be required on
32+
data.table issue tracker). Setting \code{options(datatable.verbose=TRUE)} includes messages if and when a shallow copy is taken. To avoid shallow copies there
33+
are several options: use \code{\link{copy}} to make a deep copy first, use \code{setalloccol} to reallocate in advance, or, change the default allocation rule
34+
(perhaps in your .Rprofile); e.g., \code{options(datatable.alloccol=10000L)}.
2835
29-
Please note : over allocation of the column pointer vector is not for efficiency \emph{per se}; it is so that \code{:=} can add columns by reference without a shallow copy.
36+
Please note: over-allocation of the column pointer vector is not for efficiency \emph{per se}; it is so that \code{:=} can add columns by reference without a shallow copy.
3037
}
3138
\value{
32-
\code{truelength(x)} returns the length of the vector allocated in memory. \code{length(x)} of those items are in use. Currently, it is just the list vector of column pointers that is over-allocated (i.e. \code{truelength(DT)}), not the column vectors themselves, which would in future allow fast row \code{insert()}. For tables loaded from disk however, \code{truelength} is 0 in \R 2.14.0+ (and random in \R <= 2.13.2), which is perhaps unexpected. \code{data.table} detects this state and over-allocates the loaded \code{data.table} when the next column addition occurs. All other operations on \code{data.table} (such as fast grouping and joins) do not need \code{truelength}.
39+
\code{truelength(x)} returns the length of the vector allocated in memory. \code{length(x)} of those items are in use. Currently, it is just the list vector of column
40+
pointers that is over-allocated (i.e. \code{truelength(DT)}), not the column vectors themselves, which would in future allow fast row \code{insert()}. For tables loaded
41+
from disk however, \code{truelength} is 0, which is perhaps unexpected. \code{data.table} detects this state and over-allocates the loaded \code{data.table} when the
42+
next column addition occurs. All other operations on \code{data.table} (such as fast grouping and joins) do not need \code{truelength}.
3343
34-
\code{setalloccol} \emph{reallocates} \code{DT} by reference. This may be useful for efficiency if you know you are about to going to add a lot of columns in a loop. It also returns the new \code{DT}, for convenience in compound queries.
44+
\code{setalloccol} \emph{reallocates} \code{DT} by reference. This may be useful for efficiency if you know you are about to going to add a lot of columns in a loop.
45+
It also returns the new \code{DT}, for convenience in compound queries.
3546
}
3647
\seealso{ \code{\link{copy}} }
3748
\examples{

0 commit comments

Comments
 (0)