
Commit 7f0d1d1

Update Neural_Turing_Machines_Reports_and_Discussion.ipynb
1 parent 653afbf commit 7f0d1d1

File tree

1 file changed: +23 −23 lines


Neural_Turing_Machines_Reports_and_Discussion.ipynb

+23 −23
@@ -208,7 +208,7 @@
 "source": [
 "### **(b)** Hyper-parameter Values\n",
 "We experimented with a large set of candidates to choose our hyper-parameters from. We found out a few suitable candidates for our tasks which worked well across all our methods. Below we are providing the hyper-parameters we eventually used.\n",
-"<center><table style=\"width:100%\" border=\"1px solid black\">\n",
+"<table style=\"width:100%\" border=\"1px solid black\">\n",
 " <tr>\n",
 " <th>Method's Name</th>\n",
 " <th>Learning Rate</th>\n",
@@ -219,33 +219,33 @@
 " <th>Gradient Clipped at Max Norm of</th>\n",
 " </tr>\n",
 " <tr>\n",
-" <td><center>LSTM-NTM</center></td>\n",
-" <td><center>0.01</center></td>\n",
-" <td><center>0.5</center></td>\n",
-" <td><center>100</center></td>\n",
-" <td><center>Stochastic Gradient Descent</center></td>\n",
-" <td><center>Softmax</center></td>\n",
-" <td><center>10</center></td>\n",
+" <td>LSTM-NTM</td>\n",
+" <td>0.01</td>\n",
+" <td>0.5</td>\n",
+" <td>100</td>\n",
+" <td>Stochastic Gradient Descent</td>\n",
+" <td>Softmax</td>\n",
+" <td>10</td>\n",
 " </tr>\n",
 " <tr>\n",
-" <td><center>Feedforward-NTM</center></td>\n",
-" <td><center>0.01</center></td>\n",
-" <td><center>0.5</center></td>\n",
-" <td><center>100</center></td>\n",
-" <td><center>Stochastic Gradient Descent</center></td>\n",
-" <td><center>Softmax</center></td>\n",
-" <td><center>10</center></td>\n",
+" <td>Feedforward-NTM</td>\n",
+" <td>0.01</td>\n",
+" <td>0.5</td>\n",
+" <td>100</td>\n",
+" <td>Stochastic Gradient Descent</td>\n",
+" <td>Softmax</td>\n",
+" <td>10</td>\n",
 " </tr>\n",
 " <tr>\n",
-" <td><center>LSTM</center></td>\n",
-" <td><center>0.001</center></td>\n",
-" <td><center>0.5</center></td>\n",
-" <td><center>100</center></td>\n",
-" <td><center>Adam</center></td>\n",
-" <td><center>Softmax</center></td>\n",
-" <td><center>10</center></td>\n",
+" <td>LSTM</td>\n",
+" <td>0.001</td>\n",
+" <td>0.5</td>\n",
+" <td>100</td>\n",
+" <td>Adam</td>\n",
+" <td>Softmax</td>\n",
+" <td>10</td>\n",
 " </tr>\n",
-"</table></center><br>\n",
+"</table><br>\n",
 "* Note: In order to eliminate other hyper-parameter values and pick the best ones for our tasks, we let each model train for only $200$ epochs and observed whether the loss curve decreased towards $0$. We then picked the values for which the loss curve approached zero most closely. The main reason we did not train our models past $200$ epochs was our limited computational resources. We will discuss these issues further in the **Discussion** section."
 ]
 },
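The commit only touches the table markup, but the hyper-parameters listed in that table (SGD at a learning rate of 0.01 for the two NTM variants, Adam at 0.001 for the plain LSTM baseline, and gradient clipping at a max norm of 10) map onto a standard training loop. Below is a minimal sketch assuming PyTorch; the `nn.LSTM` stand-in model, the dummy data, and the loss function are placeholders rather than code from the notebook, and the two table columns whose headers the diff does not show (the 0.5 and 100 values) are not wired in.

```python
import torch
import torch.nn as nn

# Stand-in sequence model; the notebook itself trains NTM and LSTM architectures.
model = nn.LSTM(input_size=9, hidden_size=64, batch_first=True)

# NTM variants (LSTM-NTM, Feedforward-NTM): Stochastic Gradient Descent, lr = 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Plain LSTM baseline: Adam, lr = 0.001.
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

criterion = nn.BCEWithLogitsLoss()
inputs = torch.randn(32, 20, 9)                      # dummy batch: (batch, time, features)
targets = torch.randint(0, 2, (32, 20, 64)).float()  # dummy binary targets

for epoch in range(200):  # the hyper-parameter search was capped at 200 epochs
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    # Clip gradients at a max norm of 10, as listed in the table.
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
    optimizer.step()
```

Calling `clip_grad_norm_` between `backward()` and `step()` is the usual reading of "clipped at a max norm of 10": the overall gradient norm is rescaled before the optimizer applies the update.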
