-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathlsm.html
131 lines (131 loc) · 10.1 KB
/
lsm.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>WiredTiger: Log-Structured Merge Trees</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
$(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="wiredtiger.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><a href="http://wiredtiger.com/"><img alt="Logo" src="LogoFinal-header.png" alt="WiredTiger" /></a></td>
<td style="padding-left: 0.5em;">
<div id="projectname">
 <span id="projectnumber">Version 2.8.0</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="banner">
<a href="https://github.com/wiredtiger/wiredtiger">Fork me on GitHub</a>
<a class="last" href="http://groups.google.com/group/wiredtiger-users">Join my user group</a>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.9.1 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="index.html"><span>Main Page</span></a></li>
<li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
<li><a href="modules.html"><span>Modules</span></a></li>
<li><a href="examples.html"><span>Examples</span></a></li>
<li><a href="community.html"><span>Community</span></a></li>
<li><a href="license.html"><span>License</span></a></li>
</ul>
</div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('lsm.html','');});
</script>
<div id="doc-content">
<div class="header">
<div class="headertitle">
<div class="title">Log-Structured Merge Trees </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><h1><a class="anchor" id="lsm_background"></a>
Background</h1>
<p>A common requirement is sustained throughput under a workload that consists of random inserts, where either the key range is chosen so that inserts are very unlikely to conflict (e.g., 128-bit hashes), or where inserts are expected to overwrite existing values.</p>
<p>With traditional btree variants, inserts are very fast while the data set remains in cache, but once the tree overflows the cache, performance drops significantly. There are two factors involved:</p>
<ol type="1">
<li>once the data fills the cache, new inserts have some probability of going to a page that is not in cache, requiring a read; and</li>
<li>the cache is full of dirty pages, so pages have to be written to free space in the cache before the read can be satisfied.</li>
</ol>
<h1><a class="anchor" id="lsm_description"></a>
Description of LSM trees</h1>
<p>Log-Structured Merge Trees are described in this paper by Patrick O'Neil, Edward Cheng, Dieter Gawlick and Elizabeth O'Neil: <a href="http://www.cs.umb.edu/~poneil/lsmtree.pdf">http://www.cs.umb.edu/~poneil/lsmtree.pdf</a></p>
<p>A logical tree is split into several physical pieces so that the most-recently-updated portion of data is in a tree that fits entirely in memory. The size of the in-memory chunk can be configured with the <code>"lsm=(chunk_size)"</code> configuration key to <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file. ">WT_SESSION::create</a>.</p>
<p>Once the in-memory tree reaches a threshold size, a new in-memory tree is created and the old tree synced to disk. Once written to disk, trees are read-only, though they are merged in the background with other on-disk trees to reduce the cost of reads.</p>
<p>With this structure, "blind writes" can be performed entirely on the in-memory tree. Deletes are implemented by inserting a special "tombstone" record into the in-memory tree.</p>
<h1><a class="anchor" id="lsm_api"></a>
Interface to LSM trees</h1>
<p>An LSM tree can be created as follows, in much the same way as a WiredTiger btree file:</p>
<div class="fragment"><div class="line">session-><a class="code" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb">create</a>(session, <span class="stringliteral">"table:bucket"</span>, <span class="stringliteral">"type=lsm,key_format=S,value_format=S"</span>);</div>
</div><!-- fragment --><p>Once created, the LSM tree can be accessed using the same cursor interface as other data sources in WiredTiger:</p>
<div class="fragment"><div class="line"><a class="code" href="struct_w_t___c_u_r_s_o_r.html">WT_CURSOR</a> *c;</div>
<div class="line"></div>
<div class="line">session-><a class="code" href="struct_w_t___s_e_s_s_i_o_n.html#afb5b4a69c2c5cafe411b2b04fdc1c75d">open_cursor</a>(session, <span class="stringliteral">"table:bucket"</span>, NULL, NULL, &c);</div>
<div class="line"><span class="keywordflow">for</span>(;;) {</div>
<div class="line"> c-><a class="code" href="struct_w_t___c_u_r_s_o_r.html#ad1088d719df40babc1f57d086691ebdc">set_key</a>(c, <span class="stringliteral">"key"</span>);</div>
<div class="line"> c-><a class="code" href="struct_w_t___c_u_r_s_o_r.html#a27f7cbd0cd3e561f6a145704813ad64c">set_value</a>(c, <span class="stringliteral">"value"</span>);</div>
<div class="line"> c-><a class="code" href="struct_w_t___c_u_r_s_o_r.html#aac90d9fbcc031570f924db55f8a1cee3">insert</a>(c);</div>
<div class="line">}</div>
</div><!-- fragment --><p>If an LSM cursor is configured with <code>"overwrite=false"</code> passed to <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#afb5b4a69c2c5cafe411b2b04fdc1c75d" title="Open a new cursor on a data source or duplicate an existing cursor. ">WT_SESSION::open_cursor</a>, the result will be a search through the levels of the LSM tree before every modification.</p>
<h1><a class="anchor" id="lsm_merge"></a>
Merging</h1>
<p>A background thread is opened for each active LSM tree. This thread is responsible for both writing old chunks to stable storage, and for merging multiple chunks together so that reads can be satisfied from a small number of files. There is currently no way to configure merges: they are performed automatically by the background thread.</p>
<h1><a class="anchor" id="lsm_bloom"></a>
Bloom filters</h1>
<p>WiredTiger creates a Bloom filter when merging. This is an additional file that contains a configurable number of bits per key (default 8). Keys are hashed a configurable number of times (default 4), and the corresponding bits set. The Bloom filter is used to avoid reading from a chunk if the key cannot be present.</p>
<p>With the defaults, the Bloom filter only requires one byte per key, so it usually fits in cache. The Bloom parameters can be configured with <code>"lsm=(bloom_bit_count)"</code> and <code>"lsm=(bloom_hash_count)"</code> configuration keys to <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file. ">WT_SESSION::create</a>. The Bloom file can be configured with the <code>"lsm=(bloom_config)"</code> key.</p>
<h1><a class="anchor" id="lsm_schema"></a>
Creating tables using LSM trees</h1>
<p>Tables or indices can be stored using LSM trees. Schema support is provided for LSM as an extension to the <a class="el" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb" title="Create a table, column group, index or file. ">WT_SESSION::create</a> method:</p>
<div class="fragment"><div class="line">session-><a class="code" href="struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb">create</a>(session, <span class="stringliteral">"table:T"</span>, <span class="stringliteral">"type=lsm"</span>);</div>
</div><!-- fragment --><p>The default type for all schema objects will continue to be btree.</p>
<h1><a class="anchor" id="lsm_caveats"></a>
Caveats</h1>
<h2><a class="anchor" id="lsm_hazard"></a>
Hazard configuration</h2>
<p>Reads from an LSM cursor may need to position a cursor in each active chunk. The number of chunks depends on the chunk size, and how many chunks have been merged. There must be at least as many hazard pointers available as there are chunks in the tree for each cursor that is open on the LSM tree. The number of hazard pointers is configured with the <code>"hazard_max"</code> configuration key to <a class="el" href="group__wt.html#ga9e6adae3fc6964ef837a62795c7840ed" title="Open a connection to a database. ">wiredtiger_open</a>.</p>
<h2><a class="anchor" id="lsm_checkpoints"></a>
Named checkpoints</h2>
<p>Named checkpoints are not supported on LSM trees, and cursors cannot be opened with a non-empty <code>"checkpoint"</code> configuration (that is, only the most recent standard checkpoint can be read). </p>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="navelem"><a class="el" href="index.html">Reference Guide</a></li><li class="navelem"><a class="el" href="programming.html">Writing WiredTiger applications</a></li>
<li class="footer">Copyright (c) 2008-2016 MongoDB, Inc. All rights reserved. Contact <a href="mailto:[email protected]">[email protected]</a> for more information.</li>
</ul>
</div>
</body>
</html>