<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>WiredTiger: File formats and compression</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="wiredtiger.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://wiredtiger.com/"><img src="LogoFinal-header.png" alt="WiredTiger" /></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
 <span id="projectnumber">Version 2.0.1</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="banner">
<a href="https://github.com/wiredtiger/wiredtiger">Fork me on GitHub</a>
<a class="last" href="http://groups.google.com/group/wiredtiger-users">Join my user group</a>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.3.1 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="examples.html"><span>Examples</span></a></li>
<li><a href="community.html"><span>Community</span></a></li>
<li><a href="license.html"><span>License</span></a></li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('file_formats.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">File formats and compression </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="file_formats_formats"></a>
File formats</h1>
<p>WiredTiger supports two underlying file formats, row-store and column-store. Both are key/value stores.</p>
<p>In a row-store, both keys and data are variable-length byte strings. In a column-store, keys are 64-bit record numbers (key_format type 'r'), and values are either variable- or fixed-length byte strings.</p>
<p>Generally, row-stores are faster for queries where all of the columns are required by every lookup (because there's only a single set of meta-data pages to read into the cache and search). Column-stores are faster when most queries require only a subset of the columns (because columns can be separated into multiple files and only the columns being returned need be present in the cache).</p>
<p>Row-store keys and values, and variable-length column-store values, can be up to (4GB - 512B) in length. Keys and values too large to fit on a normal page are stored as overflow items in the file, and are likely to require additional file I/O to access.</p>
<p>Fixed-length column-store values (value_format type 't') are limited to 8 bits, so only values between 0 and 255 may be stored. Additionally, there is no out-of-band fixed-length "deleted" value, so deleting a value is the same as storing a value of 0. For the same reason, storing a value of 0 will cause cursor scans to skip the record.</p>
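<p>The consequence of having no out-of-band "deleted" value can be illustrated with a small model. This is only a sketch, not WiredTiger code: the array-backed store and the <code>fix_remove</code> and <code>fix_next</code> helpers are hypothetical names, and the scan is assumed to visit slots in ascending order.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a fixed-length store: one 8-bit value per slot.
 * Removing a record simply stores 0, because there is no separate
 * "deleted" marker.
 */
static void
fix_remove(uint8_t *store, size_t slot)
{
    assert(store != NULL);
    store[slot] = 0;
}

/*
 * Return the first slot at or after start that a scan would visit, or
 * nslots if none remains. Slots holding 0 are skipped, whether the 0 came
 * from a removal or was stored explicitly.
 */
static size_t
fix_next(const uint8_t *store, size_t nslots, size_t start)
{
    assert(store != NULL);
    while (start < nslots && store[start] == 0)
        ++start;
    return (start);
}
```

<p>In this model, a store holding the values 5, 0 and 7 yields only the first and third slots to a scan, and removing the first slot leaves only the third visible. An application that needs to distinguish "zero" from "absent" would have to reserve a different value for one of them.</p>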
<p>WiredTiger does not support duplicate data items: there can be only a single value for any given key, and applications are responsible for creating unique key/value pairs.</p>
<p>WiredTiger allocates space from the underlying files in block units. The minimum file allocation unit WiredTiger supports is 512B and the maximum file allocation unit is 512MB. File block offsets are 64-bit, so the maximum file size is far beyond any practical limit.</p>
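<p>As a rough illustration of what allocating in block units means for space use, the sketch below rounds a requested size up to a whole number of allocation units. The <code>alloc_round_up</code> function and the <code>ALLOC_MIN</code> and <code>ALLOC_MAX</code> names are hypothetical; only the 512B and 512MB bounds come from the text above, and any further constraints WiredTiger places on the unit are not modeled.</p>

```c
#include <assert.h>
#include <stdint.h>

#define ALLOC_MIN ((uint64_t)512)               /* 512B minimum unit */
#define ALLOC_MAX ((uint64_t)512 * 1024 * 1024) /* 512MB maximum unit */

/*
 * Hypothetical helper: round size up to the next multiple of unit, which
 * must lie within the supported range of allocation units.
 */
static uint64_t
alloc_round_up(uint64_t size, uint64_t unit)
{
    assert(unit >= ALLOC_MIN && unit <= ALLOC_MAX);
    return ((size + unit - 1) / unit * unit);
}
```

<p>With a 4096-byte unit, for example, an item of 513 bytes occupies a full 4096 bytes in this model, so smaller units waste less space on small items.</p>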
<h1><a class="anchor" id="file_formats_compression"></a>
File formats and compression</h1>
<p>Row-stores support four types of compression: key prefix compression, dictionary compression, Huffman encoding and block compression.</p>
<ul>
<li><p class="startli">Key prefix compression reduces the size requirement of both in-memory and on-disk objects by storing any identical key prefix only once per page.</p>
<p class="startli">The cost is minor additional CPU and some additional memory use when operating on the in-memory tree. Specifically, sequential cursor movement through a prefix-compressed page in reverse (but not forward) order, or the random lookup of a key/value pair, will allocate sufficient memory to hold some number of uncompressed keys. So, for example, if key prefix compression only saves a small number of bytes per key, the additional memory cost of instantiating the uncompressed key may mean prefix compression is not worthwhile. Further, in cases where the on-disk cost is the primary concern, block compression may mean prefix compression is less useful.</p>
<p class="startli">Applications may limit the use of prefix compression by configuring the minimum number of bytes that must be gained before prefix compression is used with the <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file.">WT_SESSION::create</a> method's <code>prefix_compression_min</code> configuration string, or turn off key prefix compression entirely using the <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file.">WT_SESSION::create</a> method's <code>prefix_compression</code> configuration string.</p>
<p class="startli">Key prefix compression is enabled by default.</p>
</li>
</ul>
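<p>To judge whether key prefix compression is likely to pay off for a given key set, the saving can be estimated with a small model. This is a sketch under stated assumptions, not WiredTiger's on-disk format: it assumes keys are NUL-terminated strings in sorted order and that each key omits the bytes it shares with the key before it. The <code>prefix_len</code> and <code>bytes_saved</code> functions are hypothetical, and the <code>min_gain</code> parameter only approximates the effect of a minimum-gain threshold.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the prefix shared by two keys. */
static size_t
prefix_len(const char *a, const char *b)
{
    size_t n = 0;

    assert(a != NULL && b != NULL);
    while (a[n] != '\0' && a[n] == b[n])
        ++n;
    return (n);
}

/*
 * Estimate the bytes saved on one page of sorted keys when each key omits
 * the prefix it shares with the previous key. Shared prefixes shorter than
 * min_gain are left uncompressed and contribute nothing.
 */
static size_t
bytes_saved(const char *keys[], size_t nkeys, size_t min_gain)
{
    size_t i, saved = 0, shared;

    assert(keys != NULL);
    for (i = 1; i < nkeys; ++i) {
        shared = prefix_len(keys[i - 1], keys[i]);
        if (shared >= min_gain)
            saved += shared;
    }
    return (saved);
}
```

<p>Keys such as <code>user:1000</code>, <code>user:1001</code> and <code>user:1002</code> share long prefixes and save several bytes each in this model, while keys with little in common save close to nothing, which is the case where the extra memory for instantiating uncompressed keys is harder to justify.</p>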
<ul>
<li><p class="startli">Dictionary compression reduces the size requirement of both the in-memory and on-disk objects by storing any identical value only once per page. The cost is minor additional CPU and memory use when writing pages to disk.</p>
<p class="startli">Dictionary compression is disabled by default.</p>
</li>
</ul>
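<p>The idea behind dictionary compression, storing each identical value once per page and referring back to it, can be sketched as follows. This is an illustrative model only: the <code>dict_build</code> function is hypothetical, values are assumed to be NUL-terminated strings, and the quadratic search is chosen for brevity rather than speed.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: for each value on a page, record the index of the
 * first identical value (its own index if it is the first occurrence).
 * Returns the number of distinct values, which is how many would actually
 * need to be stored in this model.
 */
static size_t
dict_build(const char *values[], size_t nvalues, size_t first_index[])
{
    size_t i, j, unique = 0;

    assert(values != NULL && first_index != NULL);
    for (i = 0; i < nvalues; ++i) {
        /* Look for an earlier, identical value. */
        for (j = 0; j < i; ++j)
            if (strcmp(values[j], values[i]) == 0)
                break;
        first_index[i] = j; /* j == i when no earlier match exists */
        if (j == i)
            ++unique;
    }
    return (unique);
}
```

<p>A page whose values are drawn from a small set, such as status strings or country codes, collapses to a handful of stored values in this model, while a page of mostly unique values gains nothing.</p>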
<ul>
<li><p class="startli">Huffman encoding reduces the size requirement of both the in-memory and on-disk objects by compressing individual key/value items, and can be configured separately for keys, values, or both. The cost is additional CPU and memory use when searching the in-memory tree (if keys are encoded), and additional CPU and memory use when returning values from the in-memory tree and when writing pages to disk. Note the additional CPU cost of Huffman encoding can be high, and should be considered. (See <a class="el" href="huffman.html">Huffman Encoding</a> for details.)</p>
<p class="startli">Huffman encoding is disabled by default.</p>
</li>
</ul>
<ul>
<li><p class="startli">Block compression reduces the size requirement of on-disk objects by compressing blocks of the backing object's file. The cost is additional CPU and memory use when reading and writing pages to disk. Note the additional CPU cost of block compression can be high, and should be considered. (See <a class="el" href="compression.html">Compressors</a> for details.)</p>
<p class="startli">Block compression is disabled by default.</p>
</li>
</ul>
<p>Column-stores with variable-length byte string values support four types of compression: run-length encoding, dictionary compression, Huffman encoding and block compression.</p>
<ul>
<li><p class="startli">Run-length encoding reduces the size requirement of both the in-memory and on-disk objects by storing sequential, duplicate values in the store only a single time (with an associated count). The cost is minor additional CPU and memory use when returning values from the in-memory tree and when writing pages to disk.</p>
<p class="startli">Run-length encoding is always enabled and cannot be turned off.</p>
</li>
</ul>
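<p>Run-length encoding of sequential, duplicate values can be sketched as below. This is a minimal model rather than WiredTiger's cell format: the <code>rle_cell</code> structure and <code>rle_encode</code> function are hypothetical, and values are assumed to be NUL-terminated strings.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cell: one stored value and the number of records it covers. */
struct rle_cell {
    const char *value;
    size_t count;
};

/*
 * Collapse runs of identical, consecutive values into cells. The caller
 * supplies an output array with room for nvalues cells (the worst case,
 * where no two neighbors match). Returns the number of cells written.
 */
static size_t
rle_encode(const char *values[], size_t nvalues, struct rle_cell out[])
{
    size_t i, ncells = 0;

    assert(values != NULL && out != NULL);
    for (i = 0; i < nvalues; ++i) {
        /* Extend the current run if this value repeats the last one. */
        if (ncells > 0 && strcmp(out[ncells - 1].value, values[i]) == 0) {
            ++out[ncells - 1].count;
            continue;
        }
        out[ncells].value = values[i];
        out[ncells].count = 1;
        ++ncells;
    }
    return (ncells);
}
```

<p>Only adjacent duplicates are merged in this model, so the same value appearing in two separate runs produces two cells. Column-stores whose neighboring records often repeat therefore benefit most.</p>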
<ul>
<li><p class="startli">Dictionary compression reduces the size requirement of both the in-memory and on-disk objects by storing any identical value only once per page. The cost is minor additional CPU and memory use when returning values from the in-memory tree and when writing pages to disk.</p>
<p class="startli">Dictionary compression is disabled by default.</p>
</li>
</ul>
<ul>
<li><p class="startli">Huffman encoding reduces the size requirement of both the in-memory and on-disk objects by compressing individual value items. The cost is additional CPU and memory use when returning values from the in-memory tree and when writing pages to disk. Note the additional CPU cost of Huffman encoding can be high, and should be considered. (See <a class="el" href="huffman.html">Huffman Encoding</a> for details.)</p>
<p class="startli">Huffman encoding is disabled by default.</p>
</li>
</ul>
<ul>
<li><p class="startli">Block compression reduces the size requirement of on-disk objects by compressing blocks of the backing object's file. The cost is additional CPU and memory use when reading and writing pages to disk. Note the additional CPU cost of block compression can be high, and should be considered. (See <a class="el" href="compression.html">Compressors</a> for details.)</p>
<p class="startli">Block compression is disabled by default.</p>
</li>
</ul>
<p>Column-stores with fixed-length byte values support a single type of compression: block compression.</p>
<ul>
<li><p class="startli">Block compression reduces the size requirement of on-disk objects by compressing blocks of the backing object's file. The cost is additional CPU and memory use when reading and writing pages to disk. Note the additional CPU cost of block compression can be high, and should be considered. (See <a class="el" href="compression.html">Compressors</a> for details.)</p>
<p class="startli">Block compression is disabled by default. </p>
</li>
</ul>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="index.html">Reference Guide</a></li><li class="navelem"><a class="el" href="programming.html">Writing WiredTiger applications</a></li>
<li class="footer">Copyright (c) 2008-2013 WiredTiger, Inc. All rights reserved. Contact <a href="mailto:[email protected]">[email protected]</a> for more information.</li>
</ul>
</div>
</body>
</html>