|
208 | 208 | "source": [
|
209 | 209 | "### **(b)** Hyper-parameter Values\n",
|
210 | 210 | "We experimented with a large set of candidates to choose our hyper-parameters from. We found out a few suitable candidates for our tasks which worked well across all our methods. Below we are providing the hyper-parameters we eventually used.\n",
|
211 |
| - "<center><table style=\"width:100%\" border=\"1px solid black\">\n", |
| 211 | + "<table style=\"width:100%\" border=\"1px solid black\">\n", |
212 | 212 | " <tr>\n",
|
213 | 213 | " <th>Method's Name</th>\n",
|
214 | 214 | " <th>Learning Rate</th>\n",
|
|
219 | 219 | " <th>Gradien Clipped at Max Norm of</th>\n",
|
220 | 220 | " </tr>\n",
|
221 | 221 | " <tr>\n",
|
222 |
| - " <td><center>LSTM-NTM</center></td>\n", |
223 |
| - " <td><center>0.01</center></td>\n", |
224 |
| - " <td><center>0.5</center></td>\n", |
225 |
| - " <td><center>100</center></td>\n", |
226 |
| - " <td><center>Stochastic Gradient Descent</center></td>\n", |
227 |
| - " <td><center>Softmax</center></td>\n", |
228 |
| - " <td><center>10</center></td>\n", |
| 222 | + " <td>LSTM-NTM</td>\n", |
| 223 | + " <td>0.01</td>\n", |
| 224 | + " <td>0.5</td>\n", |
| 225 | + " <td>100</td>\n", |
| 226 | + " <td>Stochastic Gradient Descent</td>\n", |
| 227 | + " <td>Softmax</td>\n", |
| 228 | + " <td>10</td>\n", |
229 | 229 | " </tr>\n",
|
230 | 230 | " <tr>\n",
|
231 |
| - " <td><center>Feedforward-NTM</center></td>\n", |
232 |
| - " <td><center>0.01</center></td>\n", |
233 |
| - " <td><center>0.5</center></td>\n", |
234 |
| - " <td><center>100</center></td>\n", |
235 |
| - " <td><center>Stochastic Gradient Descent</center></td>\n", |
236 |
| - " <td><center>Softmax</center></td>\n", |
237 |
| - " <td><center>10</center></td>\n", |
| 231 | + " <td>Feedforward-NTM</td>\n", |
| 232 | + " <td>0.01</td>\n", |
| 233 | + " <td>0.5</td>\n", |
| 234 | + " <td>100</td>\n", |
| 235 | + " <td>Stochastic Gradient Descent</td>\n", |
| 236 | + " <td>Softmax</td>\n", |
| 237 | + " <td>10</td>\n", |
238 | 238 | " </tr>\n",
|
239 | 239 | " <tr>\n",
|
240 |
| - " <td><center>LSTM</center></td>\n", |
241 |
| - " <td><center>0.001</center></td>\n", |
242 |
| - " <td><center>0.5</center></td>\n", |
243 |
| - " <td><center>100</center></td>\n", |
244 |
| - " <td><center>Adam</center></td>\n", |
245 |
| - " <td><center>Softmax</center></td>\n", |
246 |
| - " <td><center>10</center></td>\n", |
| 240 | + " <td>LSTM</td>\n", |
| 241 | + " <td>0.001</td>\n", |
| 242 | + " <td>0.5</td>\n", |
| 243 | + " <td>100</td>\n", |
| 244 | + " <td>Adam</td>\n", |
| 245 | + " <td>Softmax</td>\n", |
| 246 | + " <td>10</td>\n", |
247 | 247 | " </tr>\n",
|
248 |
| - "</table></center><br>\n", |
| 248 | + "</table><br>\n", |
249 | 249 | "* Note: In order to eliminate other hyper-parameter values and to pick the best ones for our tasks, we only let each model to progress for $200$ epochs, we observed that the loss curve decreases towards $0$. We then picked the values for which the loss curve better approached zero. The main reason we did not let our models to be trained passed $200$ epochs was purely due to our limited computational resources. We will discuss these issues further in the **Discussion** section."
|
250 | 250 | ]
|
251 | 251 | },
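
For concreteness, here is a minimal PyTorch sketch of how the table's settings for the NTM variants might be wired up: SGD with a learning rate of 0.01 and gradient clipping at a max norm of 10 (the LSTM baseline would instead use Adam with a learning rate of 0.001). The model, loss function, and tensor shapes below are hypothetical placeholders, not the notebook's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the controller; any nn.Module would do here.
model = nn.LSTM(input_size=8, hidden_size=100, batch_first=True)
criterion = nn.MSELoss()

# SGD with the table's learning rate of 0.01 (the LSTM baseline uses Adam, lr=0.001).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(inputs, targets):
    """One optimization step, clipping gradients at a max norm of 10 as in the table."""
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
    optimizer.step()
    return loss.item()

# Example usage with dummy data: a batch of 4 sequences, length 20, 8 features each.
x = torch.randn(4, 20, 8)
y = torch.randn(4, 20, 100)  # matches the LSTM's hidden_size output
print(train_step(x, y))
```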
|