-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathimpl-hashmaps.html
286 lines (253 loc) · 20.4 KB
/
impl-hashmaps.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2022-W11-4 02:13 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Implementing Hash Maps</title>
<meta name="author" content="Inanna" />
<meta name="description" content="A short introduction to hash maps, how they work, and a short implementation thereof." />
<meta name="generator" content="Org Mode" />
<link href="/site.css" rel="stylesheet" type="text/css" /><link href="images/website-icon.png" rel="icon" />
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
displayAlign: "center",
displayIndent: "0em",
"HTML-CSS": { scale: 100,
linebreaks: { automatic: "false" },
webFont: "TeX"
},
SVG: {scale: 100,
linebreaks: { automatic: "false" },
font: "TeX"},
NativeMML: {scale: 100},
TeX: { equationNumbers: {autoNumber: "AMS"},
MultLineWidth: "85%",
TagSide: "right",
TagIndent: ".8em"
}
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_HTML"></script>
</head>
<body>
<div id="preamble" class="status">
<div><a href="/index.html"><img alt="An abstract logo representing a series of three assembly line stamping machines with the words CONS, DEV, and embalzoned in white on each machine." id="site-logo" src="/images/website-logo.png" /></a></div>
</div>
<div id="content" class="content">
<h1 class="title">Implementing Hash Maps</h1>
<div id="table-of-contents" role="doc-toc">
<h2>Table of Contents</h2>
<div id="text-table-of-contents" role="doc-toc">
<ul>
<li><a href="#orgbc14afd">Introduction</a></li>
<li><a href="#orgee609fa">Algorithm</a></li>
<li><a href="#org079625f">Implementation</a>
<ul>
<li><a href="#org067bf0a">Hash Function</a></li>
<li><a href="#org43d0bbd">The Hashmap Constructor</a></li>
<li><a href="#org0aff215">Adding Items</a></li>
<li><a href="#org1e27653">Getting Items</a></li>
</ul>
</li>
<li><a href="#orgc53612e">Conclusion</a></li>
</ul>
</div>
</div>
<div id="outline-container-orgbc14afd" class="outline-2">
<h2 id="orgbc14afd">Introduction</h2>
<div class="outline-text-2" id="text-orgbc14afd">
<p>
A hashmap is a type of key-value store that uses a hash function to index.
</p>
</div>
</div>
<div id="outline-container-orgee609fa" class="outline-2">
<h2 id="orgee609fa">Algorithm</h2>
<div class="outline-text-2" id="text-orgee609fa">
<p>
A hashmap is fairly simple, but in essence you have an array of 2-tuples with the first containing a key and the second containing a value. To insert into the array you first hash the key and then take the modulus of the key with respect to the length of the array.
</p>
<div id="org7b8afc9" class="figure">
<p><img src="./images/hashmap-adding_c8a4a74c-a139-45d5-a340-c2ed895b2e5a.png" alt="Two items being added to a hashmap." />
</p>
</div>
<p>
However, as you might imagine, there can occasionally be collisions, like this.
</p>
<div id="orgd7e2efb" class="figure">
<p><img src="./images/hashmap-collison_58e621d6-4ffd-4fed-87d3-592912398f2b.png" alt="A third item being added to a hash map but experiencing a hash collision." />
</p>
</div>
<p>
To solve this you can either resize the array, shove the collided items in a sub-map <label class="sidenote-number" for="1"><sup>1</sup></label><input checked="checked" id="1" style="display:none" type="checkbox" /><span class="sidenote"><span class="sidenote-number"> 1</span> Which will over time cause the hash map to have \(O(log(n))\) search times.</span>, or shift them over in some constant direction. Here we choose to resize the array, which is an \(O(n)\) operation.
</p>
<div id="org975c839" class="figure">
<p><img src="./images/hashmap-resizing_58e621d6-4ffd-4fed-87d3-592912398f2b.png" alt="All three itmes being added again to the newly resized has hmap." />
</p>
</div>
<p>
To find items in the array you first find the index of the key just like you did for insertion and then you compare the key to the key in the 2-tuple you stored at that index. If it is the same, then you return the value from the tuple.
</p>
<div id="orgc3bc292" class="figure">
<p><img src="./images/hashmap-finding-key_251f9892-8648-4230-b99b-f8b50575f140.png" alt="The value of foo being obtained from the hash map." />
</p>
</div>
</div>
</div>
<div id="outline-container-org079625f" class="outline-2">
<h2 id="org079625f">Implementation</h2>
<div class="outline-text-2" id="text-org079625f">
</div>
<div id="outline-container-org067bf0a" class="outline-3">
<h3 id="org067bf0a">Hash Function</h3>
<div class="outline-text-3" id="text-org067bf0a">
<p>
This function simply multiplies the list of numbers together with a pad, creating a somewhat random number. It is the exact same as the one in my <a href="impl-bloom-filters.html#ID-a2a663d2-d5be-4e28-a4df-8fd8e502ce1f">Implementing Bloom Filters</a> tutorial.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">defn</span> <span style="font-weight: bold; font-style: italic;">hash-obj</span> <span style="color: #bbb;">[</span>obj<span style="color: #bbb;">]</span>
<span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">let</span> <span style="color: #494949;">[</span>pad 3484115242153581271N<span style="color: #494949;">]</span>
<span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">->></span> <span style="color: #bbb;">(</span>str obj<span style="color: #bbb;">)</span>
<span style="color: #bbb;">(</span>map int<span style="color: #bbb;">)</span>
<span style="color: #bbb;">(</span>apply <span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">fn</span> <span style="color: #bbb;">[</span>x & xs<span style="color: #bbb;">]</span> <span style="color: #bbb;">(</span>cons <span style="color: #494949;">(</span>mod <span style="color: #bbb;">(</span>*' x pad<span style="color: #bbb;">)</span> <span style="color: #E53935;">Long</span>/MAX_VALUE<span style="color: #494949;">)</span> xs<span style="color: #bbb;">)</span><span style="color: #494949;">)</span><span style="color: #bbb;">)</span>
<span style="color: #bbb;">(</span>reduce #<span style="color: #494949;">(</span>mod <span style="color: #bbb;">(</span>*' <span style="font-style: italic;">%1</span> <span style="font-style: italic;">%2</span> pad<span style="color: #bbb;">)</span> <span style="color: #E53935;">Long</span>/MAX_VALUE<span style="color: #494949;">)</span><span style="color: #bbb;">)</span>
long<span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-org43d0bbd" class="outline-3">
<h3 id="org43d0bbd">The Hashmap Constructor</h3>
<div class="outline-text-3" id="text-org43d0bbd">
<p>
To define a hash map we first have to define a simple object for it and a constructor that creates an array of nils of the relevant size.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">defrecord</span> <span style="color: #E53935;">HashMap</span> <span style="color: #bbb;">[</span>arr size<span style="color: #bbb;">]</span><span style="color: #494949;">)</span>
<span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">defn</span> <span style="font-weight: bold; font-style: italic;">new-hashmap</span> <span style="color: #bbb;">[</span>size<span style="color: #bbb;">]</span> <span style="color: #bbb;">(</span>->HashMap <span style="color: #494949;">(</span>vec <span style="color: #bbb;">(</span>repeat size <span style="color: #E53935;">nil</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span> size<span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-org0aff215" class="outline-3">
<h3 id="org0aff215">Adding Items</h3>
<div class="outline-text-3" id="text-org0aff215">
<p>
Now that we can get stuff from the hashmap, let's consider adding stuff to it as well. Here we add a function that takes the map, a key, and a value, and then adds that to the hash map. Should a collision occur it resizes.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">defn</span> <span style="font-weight: bold; font-style: italic;">add-hash</span>
<span style="color: #bbb;">[</span><span style="color: #494949;">{</span><span style="color: #E53935;">:keys</span> <span style="color: #bbb;">[</span>arr size<span style="color: #bbb;">]</span> <span style="color: #E53935;">:as</span> map<span style="color: #494949;">}</span> key val<span style="color: #bbb;">]</span>
<span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">if</span> <span style="color: #494949;">(</span>nil? <span style="color: #bbb;">(</span>nth arr <span style="color: #494949;">(</span>mod <span style="color: #bbb;">(</span>hash-obj key<span style="color: #bbb;">)</span> size<span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
<span style="color: #494949;">(</span>assoc-in map <span style="color: #bbb;">[</span><span style="color: #E53935;">:arr</span> <span style="color: #494949;">(</span>mod <span style="color: #bbb;">(</span>hash-obj key<span style="color: #bbb;">)</span> size<span style="color: #494949;">)</span><span style="color: #bbb;">]</span> <span style="color: #bbb;">[</span>key val<span style="color: #bbb;">]</span><span style="color: #494949;">)</span>
<span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">let</span> <span style="color: #bbb;">[</span>new-hashmap <span style="color: #494949;">(</span>new-hashmap <span style="color: #bbb;">(</span>* 2 size<span style="color: #bbb;">)</span><span style="color: #494949;">)</span><span style="color: #bbb;">]</span>
<span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">->></span> arr
<span style="color: #494949;">(</span>filter identity<span style="color: #494949;">)</span>
<span style="color: #494949;">(</span>cons <span style="color: #bbb;">[</span>key val<span style="color: #bbb;">]</span><span style="color: #494949;">)</span>
<span style="color: #494949;">(</span>reduce <span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">fn</span> <span style="color: #494949;">[</span>new-map <span style="color: #bbb;">[</span>key val<span style="color: #bbb;">]</span><span style="color: #494949;">]</span> <span style="color: #494949;">(</span>add-hash new-map key val<span style="color: #494949;">)</span><span style="color: #bbb;">)</span> new-hashmap<span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
<p>
Here we also see the sole weakness of the hash map, either resizing has to occur or you have to accept degraded performance once a certain size has been reached. Resizing is, while deterministic, not idempotent and it is absolutely possible that it could go on infinitely.<label class="sidenote-number" for="2"><sup>2</sup></label><input checked="checked" id="2" style="display:none" type="checkbox" /><span class="sidenote"><span class="sidenote-number"> 2</span> This is because (ignoring for a moment that our hash function is terrible and probably not properly random) the probability of a collision should be equal to the fraction of the map that is filled. This probability is never exactly 0 for an insertion (and of course resizing itself involves insertions), leading to possible dramatic growth in some cases.</span> You also will not be able to fill <i>all</i> the indices in the hash map, meaning that each component will inevitably have only some set of the items.
</p>
<p>
To demonstrate the resizing ability, consider the execution of the following.
</p>
<p>
First we define a new map.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">def</span> <span style="font-style: italic;">hm</span> <span style="color: #bbb;">(</span>new-hashmap 5<span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
<p>
Then we start adding opinions to it about different programming languages.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">def</span> <span style="font-style: italic;">hm-with-opinions</span>
<span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">-></span> hm
<span style="color: #494949;">(</span>add-hash <span style="color: #494949;">"Clojure"</span> <span style="color: #494949;">"Is extremely awesome."</span><span style="color: #494949;">)</span>
<span style="color: #494949;">(</span>add-hash <span style="color: #494949;">"JavaScript"</span> <span style="color: #494949;">"Kind of sucks."</span><span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
<p>
Which gives us this, and as you can see the two items have been added at their respective indices.
</p>
<div class="org-src-container">
<pre class="src src-clojure">#hm-demo.HashMap<span style="color: #494949;">{</span><span style="color: #E53935;">:arr</span>
<span style="color: #bbb;">[</span><span style="color: #494949;">[</span><span style="color: #494949;">"Clojure"</span> <span style="color: #494949;">"Is extremely awesome."</span><span style="color: #494949;">]</span>
<span style="color: #E53935;">nil</span>
<span style="color: #E53935;">nil</span>
<span style="color: #494949;">[</span><span style="color: #494949;">"Javascript"</span> <span style="color: #494949;">"Kind of sucks."</span><span style="color: #494949;">]</span>
<span style="color: #E53935;">nil</span><span style="color: #bbb;">]</span>,
<span style="color: #E53935;">:size</span> 5<span style="color: #494949;">}</span>
</pre>
</div>
<p>
Then we just insert my opinion on PHP.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span>add-hash hm-with-opinions <span style="color: #494949;">"PHP"</span> <span style="color: #494949;">"Oh god no, please spare me from this hell."</span><span style="color: #494949;">)</span>
</pre>
</div>
<p>
And now, as you can see, after adding the last element the hasmap automatically resizes, doubling in size.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">{</span><span style="color: #E53935;">:arr</span>
<span style="color: #bbb;">[</span><span style="color: #494949;">[</span><span style="color: #494949;">"PHP"</span> <span style="color: #494949;">"Oh god no, please spare me from this hell."</span><span style="color: #494949;">]</span>
<span style="color: #E53935;">nil</span>
<span style="color: #E53935;">nil</span>
<span style="color: #E53935;">nil</span>
<span style="color: #E53935;">nil</span>
<span style="color: #494949;">[</span><span style="color: #494949;">"Clojure"</span> <span style="color: #494949;">"Is extremely awesome."</span><span style="color: #494949;">]</span>
<span style="color: #E53935;">nil</span>
<span style="color: #E53935;">nil</span>
<span style="color: #494949;">[</span><span style="color: #494949;">"JavaScript"</span> <span style="color: #494949;">"Kind of sucks."</span><span style="color: #494949;">]</span>
<span style="color: #E53935;">nil</span><span style="color: #bbb;">]</span>,
<span style="color: #E53935;">:size</span> 10<span style="color: #494949;">}</span>
</pre>
</div>
</div>
</div>
<div id="outline-container-org1e27653" class="outline-3">
<h3 id="org1e27653">Getting Items</h3>
<div class="outline-text-3" id="text-org1e27653">
<p>
The <code>get-hash</code> function simply takes the hahmap and a key. Should the key correspond with a value in the map it returns true, else <code>nil</code> or the value provided in <code>not-found</code>, which replicates the functionality of the native <code>get</code> function.
</p>
<div class="org-src-container">
<pre class="src src-clojure"><span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">defn</span> <span style="font-weight: bold; font-style: italic;">get-hash</span>
<span style="color: #bbb;">(</span><span style="color: #494949;">[</span>map key<span style="color: #494949;">]</span> <span style="color: #494949;">(</span>get-hash map key <span style="color: #E53935;">nil</span><span style="color: #494949;">)</span><span style="color: #bbb;">)</span>
<span style="color: #bbb;">(</span><span style="color: #494949;">[</span><span style="color: #bbb;">{</span><span style="color: #E53935;">:keys</span> <span style="color: #494949;">[</span>arr size<span style="color: #494949;">]</span><span style="color: #bbb;">}</span> key not-found<span style="color: #494949;">]</span>
<span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">let</span> <span style="color: #bbb;">[</span>idx <span style="color: #494949;">(</span>mod <span style="color: #bbb;">(</span>hash-obj key<span style="color: #bbb;">)</span> size<span style="color: #494949;">)</span>
item <span style="color: #494949;">(</span>nth arr idx <span style="color: #494949;">)</span><span style="color: #bbb;">]</span>
<span style="color: #bbb;">(</span><span style="color: #E53935; font-style: italic;">if</span> <span style="color: #494949;">(</span><span style="color: #E53935; font-style: italic;">or</span> <span style="color: #bbb;">(</span>nil? item<span style="color: #bbb;">)</span>
<span style="color: #bbb;">(</span>not= key <span style="color: #494949;">(</span>first item<span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
not-found
<span style="color: #494949;">(</span>second item<span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span><span style="color: #bbb;">)</span><span style="color: #494949;">)</span>
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-orgc53612e" class="outline-2">
<h2 id="orgc53612e">Conclusion</h2>
<div class="outline-text-2" id="text-orgc53612e">
<p>
Hash maps are a great way to store key-value pairs with a constant lookup time. However, due to their limited size-efficiency they can have somewhat non-deterministic insertion times, which can be painful if you are working with very large maps.
</p>
</div>
</div>
<!-- Footnotes --><!--
<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1" role="doc-backlink">1</a></sup> <div class="footpara" role="doc-footnote"><p class="footpara">Which will over time cause the hash map to have \(O(log(n))\) search times.</p></div></div>
<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2" role="doc-backlink">2</a></sup> <div class="footpara" role="doc-footnote"><p class="footpara">This is because (ignoring for a moment that our hash function is terrible and probably not properly random) the probability of a collision should be equal to the fraction of the map that is filled. This probability is never exactly 0 for an insertion (and of course resizing itself involves insertions), leading to possible dramatic growth in some cases.</p></div></div>
--></div>
<div id="postamble" class="status">
<p class="date">Last Modified: 2021-W52-2 01:00</p><p class="creator">Generated Using: <a href="https://www.gnu.org/software/emacs/">Emacs</a> 27.2 (<a href="https://orgmode.org">Org</a> mode 9.4.6)</p><p class="license">Except where otherwise noted content on <a href="https://cons.dev">cons.dev</a> is licensed under a <a href="https://creativecommons.org/licenses/by-sa/4.0/" rel="license">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</p>
</div>
</body>
</html>