<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>WiredTiger: Huffman Encoding</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="wiredtiger.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://wiredtiger.com/"><img alt="WiredTiger" src="LogoFinal-header.png" /></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
 <span id="projectnumber">Version 2.0.1</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="banner">
<a href="https://github.com/wiredtiger/wiredtiger">Fork me on GitHub</a>
<a class="last" href="http://groups.google.com/group/wiredtiger-users">Join my user group</a>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.3.1 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="examples.html"><span>Examples</span></a></li>
<li><a href="community.html"><span>Community</span></a></li>
<li><a href="license.html"><span>License</span></a></li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('huffman.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">Huffman Encoding </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Keys in row-stores and variable-length values in either row- or column-stores can be compressed with Huffman encoding.</p>
<p>Huffman-encoded data stays compressed in memory as well as on disk, so it can increase the amount of usable data the cache holds and decrease the size of the data on disk. The additional CPU cost of Huffman coding can be high and should be weighed against these savings.</p>
<p>To configure Huffman encoding for the key in a row-store, specify <code>huffman_key=english</code>, <code>huffman_key=utf8&lt;file&gt;</code> or <code>huffman_key=utf16&lt;file&gt;</code> in the configuration passed to <code><a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file.">WT_SESSION::create</a></code>.</p>
<p>To configure Huffman encoding for a variable-length value in either a row-store or a column-store, specify <code>huffman_value=english</code>, <code>huffman_value=utf8&lt;file&gt;</code> or <code>huffman_value=utf16&lt;file&gt;</code> in the configuration passed to <code><a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file.">WT_SESSION::create</a></code>.</p>
<p>Setting Huffman encoding to <code>english</code> configures WiredTiger to use a built-in English-language frequency table. This table is based on <code>"Case-sensitive letter and bigram frequency counts from large-scale English corpora"</code>, by Michael N. Jones and D.J.K. Mewhort, modified to support space and tab characters.</p>
<p>Setting Huffman encoding to <code>utf8&lt;file&gt;</code> or <code>utf16&lt;file&gt;</code> configures WiredTiger to use a frequency table read from a file. (Note: the <code>&lt;</code> and <code>&gt;</code> characters are not literal, and should not appear in the string.)</p>
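<p>As a minimal sketch of how these settings might be assembled, the Python helper below builds the Huffman part of a configuration string. The helper name <code>huffman_config</code> and the file name <code>freq.txt</code> are hypothetical, and the sketch assumes the file name follows the <code>utf8</code> or <code>utf16</code> prefix directly, with no separator, as the note about the angle brackets suggests. It only builds a string; it does not call WiredTiger.</p>

```python
def huffman_config(field, table, path=""):
    """Build one Huffman setting for a WT_SESSION::create configuration.

    field: "key" or "value"
    table: "english", "utf8" or "utf16"
    path:  frequency table file name, required for utf8/utf16
    """
    if field not in ("key", "value"):
        raise ValueError("field must be 'key' or 'value'")
    if table == "english":
        if path:
            raise ValueError("the built-in english table takes no file")
        return "huffman_" + field + "=english"
    if table in ("utf8", "utf16"):
        if not path:
            raise ValueError(table + " requires a frequency table file")
        # Assumption: the file name follows the prefix directly, with no
        # angle brackets and no separator.
        return "huffman_" + field + "=" + table + path
    raise ValueError("unknown table: " + table)


# Settings are comma-separated; freq.txt is a made-up file name.
config = ",".join([
    "key_format=S",
    "value_format=S",
    huffman_config("key", "english"),
    huffman_config("value", "utf8", "freq.txt"),
])
print(config)
```

<p>The resulting string would then be passed as the configuration argument to <code>WT_SESSION::create</code>.</p>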
<p>Each line of the frequency table file contains a pair of unsigned integers separated by whitespace: the symbol value, followed by its frequency value. Symbol values may be specified as hexadecimal numbers (with a leading <code>0x</code> prefix) or as integers. For example, an English-language frequency table for the characters <code>0</code> through <code>9</code> might look like this:</p>
<div class="fragment"><div class="line">0x30 546233</div>
<div class="line">0x31 460946</div>
<div class="line">0x32 333499</div>
<div class="line">0x33 187606</div>
<div class="line">0x34 192528</div>
<div class="line">0x35 374413</div>
<div class="line">0x36 153865</div>
<div class="line">0x37 120094</div>
<div class="line">0x38 182627</div>
<div class="line">0x39 282364</div>
</div><!-- fragment --><p>Symbol values in a frequency table must be unique. In <code>utf8</code> files, symbol values must be in the range 0 to 255; in <code>utf16</code> files, they must be in the range 0 to 65,535. Frequency values need not be unique, but must be in the range 0 to the maximum 32-bit unsigned integer value (4,294,967,295). The lower a frequency value, the less likely the symbol is to occur.</p>
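<p>The rules above can be checked before a table is handed to WiredTiger. The following Python sketch is one possible validator, not WiredTiger's own parser. The function name <code>parse_frequency_table</code> is hypothetical, and the sketch assumes that symbol values without a <code>0x</code> prefix are decimal, that frequency values are decimal, and that blank lines are ignored.</p>

```python
MAX_SYMBOL = {"utf8": 255, "utf16": 65535}
MAX_FREQUENCY = 2**32 - 1  # 4,294,967,295


def parse_frequency_table(text, encoding):
    """Return {symbol: frequency}, raising ValueError on a rule violation."""
    max_symbol = MAX_SYMBOL[encoding]
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'symbol frequency'")
        symbol_text, frequency_text = fields
        # Symbols may be hexadecimal (0x prefix); otherwise assumed decimal.
        if symbol_text.lower().startswith("0x"):
            symbol = int(symbol_text, 16)
        else:
            symbol = int(symbol_text, 10)
        frequency = int(frequency_text, 10)
        if symbol not in range(max_symbol + 1):
            raise ValueError(f"line {lineno}: symbol {symbol} out of range")
        if frequency not in range(MAX_FREQUENCY + 1):
            raise ValueError(f"line {lineno}: frequency out of range")
        if symbol in table:
            raise ValueError(f"line {lineno}: duplicate symbol {symbol}")
        table[symbol] = frequency
    return table


sample = "0x30 546233\n0x31 460946\n57 282364\n"
table = parse_frequency_table(sample, "utf8")
print(sorted(table.items()))
```

<p>A table that repeats a symbol, or lists a symbol above 255 in a <code>utf8</code> file, is rejected with a <code>ValueError</code> naming the offending line.</p>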
<p>Any symbol values not listed in the frequency table are assumed to have frequencies of 0. Input containing symbol values that do not appear in the frequency table (or that appear with a frequency value of 0) is accepted, but does not compress as well as it would if those symbols were listed with frequency values other than 0. </p>
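<p>To illustrate why symbols with low frequencies compress less well, the Python sketch below computes code lengths with the textbook Huffman construction, using the digit frequencies from the example table above. This is a generic illustration rather than WiredTiger's implementation. The function name <code>huffman_code_lengths</code> is hypothetical, and how WiredTiger handles zero frequencies internally is not covered here.</p>

```python
import heapq
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a textbook Huffman code."""
    if len(frequencies) == 1:
        return {symbol: 1 for symbol in frequencies}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), [symbol])
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(frequencies, 0)
    while len(heap) > 1:
        # Merge the two least frequent groups into one subtree.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        merged = group1 + group2
        for symbol in merged:
            lengths[symbol] += 1  # each merge puts the symbol one level deeper
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return lengths


# Digit frequencies copied from the example frequency table.
digits = {
    0x30: 546233, 0x31: 460946, 0x32: 333499, 0x33: 187606, 0x34: 192528,
    0x35: 374413, 0x36: 153865, 0x37: 120094, 0x38: 182627, 0x39: 282364,
}
lengths = huffman_code_lengths(digits)
for symbol in sorted(digits, key=digits.get, reverse=True):
    print(chr(symbol), digits[symbol], lengths[symbol])
```

<p>With these frequencies, the most frequent digit (<code>0</code>) receives the shortest code and the least frequent digit (<code>7</code>) the longest, which is why input dominated by rare symbols encodes to more bits.</p>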
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="index.html">Reference Guide</a></li><li class="navelem"><a class="el" href="programming.html">Writing WiredTiger applications</a></li><li class="navelem"><a class="el" href="file_formats.html">File formats and compression</a></li>
<li class="footer">Copyright (c) 2008-2013 WiredTiger, Inc. All rights reserved. Contact <a href="mailto:[email protected]">[email protected]</a> for more information.</li>
</ul>
</div>
</body>
</html>