### Papers

- **Dropout** (2012, 2014)
  - **`Regularizer`**, **`Ensemble`**
  - [arXiv (2012)](https://arxiv.org/abs/1207.0580), [JMLR (2014)](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf), [note](notes/dropout.md)
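  - A minimal NumPy sketch of dropout at one layer (the papers rescale at test time; this shows the equivalent "inverted" form common in practice, with illustrative names):

    ```python
    import numpy as np

    def dropout(x, p_drop=0.5, training=True):
        """Inverted dropout: zero units with prob p_drop, rescale survivors."""
        if not training or p_drop == 0.0:
            return x
        mask = np.random.rand(*x.shape) >= p_drop  # keep each unit with prob 1 - p_drop
        return x * mask / (1.0 - p_drop)           # rescale so E[output] matches test time
    ```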
- Regularization of Neural Networks using DropConnect (2013)
  - **`Regularizer`**, **`Ensemble`**
  - [paper](https://cs.nyu.edu/~wanli/dropc/dropc.pdf), [note](notes/dropconnect.md), [project_page](https://cs.nyu.edu/~wanli/dropc/)
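  - A sketch of the core idea, masking weights rather than activations (the paper approximates mask-averaging at inference with a Gaussian; the simple rescaling below is an assumption for brevity):

    ```python
    import numpy as np

    def dropconnect_linear(x, W, b, p_drop=0.5, training=True):
        """DropConnect: randomly zero individual weights during training."""
        if training and p_drop > 0.0:
            M = np.random.rand(*W.shape) >= p_drop  # per-weight Bernoulli mask
            W = W * M / (1.0 - p_drop)              # keeps E[x @ W] unchanged
        return x @ W + b
    ```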
- Recurrent Neural Network Regularization (2014. 9)
  - **`RNN`**, **`Dropout to Non-Recurrent Connections`**
  - [arXiv](https://arxiv.org/abs/1409.2329)
- **Batch Normalization** (2015. 2)
  - **`Regularizer`**, **`Accelerate Training`**, **`CNN`**
  - [arXiv](https://arxiv.org/abs/1502.03167), [note](notes/batch_normalization.md)
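  - A training-mode sketch of the normalization step (inference with running statistics is omitted; names are illustrative):

    ```python
    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        """Normalize each feature over the batch, then apply learnable scale/shift."""
        mu = x.mean(axis=0)                    # per-feature batch mean
        var = x.var(axis=0)                    # per-feature batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # whitened activations
        return gamma * x_hat + beta            # restore representational power
    ```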
- Training Very Deep Networks (2015. 7)
  - **`Highway`**, **`LSTM-like`**
  - [arXiv](https://arxiv.org/abs/1507.06228), [note](notes/highway_networks.md)
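  - A sketch of one highway layer, y = T(x) * H(x) + (1 - T(x)) * x (the tanh transform and parameter names are illustrative):

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def highway_layer(x, W_h, b_h, W_t, b_t):
        """Gate T interpolates between the transform H(x) and the identity x."""
        H = np.tanh(x @ W_h + b_h)    # candidate transform
        T = sigmoid(x @ W_t + b_t)    # transform gate; b_t is often initialized negative
        return T * H + (1.0 - T) * x  # carry gate C = 1 - T
    ```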
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
  - **`Variational RNN`**, **`Dropout - RNN`**, **`Bayesian interpretation`**
  - [arXiv](https://arxiv.org/abs/1512.05287)
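  - The key trick is reusing one dropout mask per sequence across all timesteps; a sketch (the RNN cell itself is assumed):

    ```python
    import numpy as np

    def variational_mask(batch, dim, p_drop=0.25):
        """One inverted-dropout mask per sequence, shared by every timestep."""
        return (np.random.rand(batch, dim) >= p_drop) / (1.0 - p_drop)

    # mask_x, mask_h = variational_mask(B, D_in), variational_mask(B, D_h)
    # for t in range(T):                       # same masks at every step
    #     h = rnn_cell(x[t] * mask_x, h * mask_h)
    ```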
- Deep Networks with Stochastic Depth (2016. 3)
  - **`Dropout`**, **`Ensemble`**, **`Beyond 1000 layers`**
  - [arXiv](https://arxiv.org/abs/1603.09382), [note](notes/stochastic_depth.md)
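  - A sketch of one residual block under stochastic depth (`residual_fn` is a stand-in for the block's conv path):

    ```python
    import numpy as np

    def stochastic_depth_block(x, residual_fn, p_survive, training=True):
        """Skip the whole residual branch with prob 1 - p_survive during training."""
        if training:
            if np.random.rand() < p_survive:
                return x + residual_fn(x)      # block active
            return x                           # identity shortcut only
        return x + p_survive * residual_fn(x)  # expected-value rescaling at test time
    ```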
- Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
  - **`ACT`**, **`Dynamically`**, **`Logic Task`**
  - [arXiv](https://arxiv.org/abs/1603.08983)
- Layer Normalization (2016. 7)
  - **`Regularizer`**, **`Accelerate Training`**, **`RNN`**
  - [arXiv](https://arxiv.org/abs/1607.06450), [note](notes/layer_normalization.md)
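  - A sketch of layer normalization over the feature axis (batch-size independent, which is what makes it RNN-friendly):

    ```python
    import numpy as np

    def layer_norm(x, gain, bias, eps=1e-5):
        """Normalize across features of each sample, not across the batch."""
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True)
        return gain * (x - mu) / (sigma + eps) + bias
    ```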
- Recurrent Highway Networks (2016. 7)
  - **`RHN`**, **`Highway`**, **`Depth`**, **`RNN`**
  - [arXiv](https://arxiv.org/abs/1607.03474), [note](notes/recurrent_highway.md)
- Using Fast Weights to Attend to the Recent Past (2016. 10)
  - **`Cognitive`**, **`Attention`**, **`Memory`**
  - [arXiv](https://arxiv.org/abs/1610.06258), [note](notes/fast_weights_attn.md)
- Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)
  - **`Professor Forcing`**, **`RNN`**, **`Inference Problem`**, **`Training with GAN`**
  - [arXiv](https://arxiv.org/abs/1610.09038), [note](notes/professor_forcing.md)
- Equality of Opportunity in Supervised Learning (2016. 10)
  - **`Equalized Odds`**, **`Demographic Parity`**, **`Bias`**
  - [arXiv](https://arxiv.org/abs/1610.02413), [the_morning_paper](https://blog.acolyer.org/2018/05/07/equality-of-opportunity-in-supervised-learning/)
- Categorical Reparameterization with Gumbel-Softmax (2016. 11)
  - **`Gumbel-Softmax Distribution`**, **`Reparameterization`**, **`Smooth Relaxation`**
  - [arXiv](https://arxiv.org/abs/1611.01144), [open_review](https://openreview.net/forum?id=rkE3y85ee)
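  - A sketch of drawing a soft one-hot sample, softmax((logits + Gumbel noise) / tau); as tau -> 0 it approaches a discrete sample:

    ```python
    import numpy as np

    def gumbel_softmax(logits, tau=1.0):
        """Differentiable ~one-hot sample from a categorical distribution."""
        u = np.random.uniform(1e-20, 1.0, size=logits.shape)
        g = -np.log(-np.log(u))                       # Gumbel(0, 1) noise
        y = (logits + g) / tau                        # temperature controls smoothness
        y = np.exp(y - y.max(axis=-1, keepdims=True)) # stable softmax
        return y / y.sum(axis=-1, keepdims=True)
    ```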
- Understanding deep learning requires rethinking generalization (2016. 11)
  - **`Generalization Error`**, **`Role of Regularization`**
  - [arXiv](https://arxiv.org/abs/1611.03530)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)
  - **`MoE Layer`**, **`Sparsely-Gated`**, **`Capacity`**, **`Google Brain`**
  - [arXiv](https://arxiv.org/abs/1701.06538), [note](notes/very_large_nn_moe_layer.md)
- **A simple neural network module for relational reasoning** (2017. 6)
  - **`Relational Reasoning`**, **`DeepMind`**
  - [arXiv](https://arxiv.org/abs/1706.01427), [note](notes/relational_network.md), [code](https://github.com/DongjunLee/relation-network-tensorflow)
- On Calibration of Modern Neural Networks (2017. 6)
  - **`Confidence Calibration`**, **`Maximum Calibration Error (MCE)`**
  - [arXiv](https://arxiv.org/abs/1706.04599)
- When is a Convolutional Filter Easy To Learn? (2017. 9)
  - **`Conv + ReLU`**, **`Non-Gaussian Case`**, **`Polynomial Time`**
  - [arXiv](https://arxiv.org/abs/1709.06129), [open_review](https://openreview.net/forum?id=SkA-IE06W)
- mixup: Beyond Empirical Risk Minimization (2017. 10)
  - **`Data Augmentation`**, **`Vicinal Risk Minimization`**, **`Generalization`**
  - [arXiv](https://arxiv.org/abs/1710.09412), [open_review](https://openreview.net/forum?id=r1Ddp1-Rb)
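  - A batch-level sketch of mixup with one-hot labels (alpha is the paper's Beta-distribution hyperparameter; names are illustrative):

    ```python
    import numpy as np

    def mixup(x, y, alpha=0.2):
        """Train on convex combinations of random example pairs and their labels."""
        lam = np.random.beta(alpha, alpha)    # mixing coefficient
        perm = np.random.permutation(len(x))  # partner examples from the same batch
        return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
    ```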
- Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
  - **`Learns Surface Statistical Regularities`**, **`not High-Level Semantics`**
  - [arXiv](https://arxiv.org/abs/1711.11561), [the_morning_paper](https://blog.acolyer.org/2018/05/29/measuring-the-tendency-of-cnns-to-learn-surface-statistical-regularities/)
- MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
  - **`MentorNet - StudentNet`**, **`Curriculum Learning`**, **`Output is Weight`**
  - [arXiv](https://arxiv.org/abs/1712.05055)
- Deep Learning Scaling is Predictable, Empirically (2017. 12)
  - **`Power-Law Exponents`**, **`Grow Training Sets`**
  - [arXiv](https://arxiv.org/abs/1712.00409), [the_morning_paper](https://blog.acolyer.org/2018/03/28/deep-learning-scaling-is-predictable-empirically/)
- Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
  - **`Robustness`**, **`Data Perturbations`**, **`Survey`**
  - [arXiv](https://arxiv.org/abs/1802.08760), [open_review](https://openreview.net/forum?id=HJC2SzZCW)
- Can recurrent neural networks warp time? (2018. 2)
  - **`RNN`**, **`Learnable Gate`**, **`Chrono Initialization`**
  - [open_review](https://openreview.net/forum?id=SJcKhk-Ab)
- Spectral Normalization for Generative Adversarial Networks (2018. 2)
  - **`GAN`**, **`Training Discriminator`**, **`Constrain Lipschitz`**, **`Power Method`**
  - [open_review](https://openreview.net/forum?id=B1QRgziT-&noteId=BkxnM1TrM)
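  - A sketch of the power-iteration estimate of the largest singular value used to normalize the discriminator's weight matrices (one iteration per training step, carrying `u` across steps):

    ```python
    import numpy as np

    def spectral_normalize(W, u, n_iters=1):
        """Divide W by an estimate of its spectral norm (largest singular value)."""
        for _ in range(n_iters):
            v = W.T @ u
            v /= np.linalg.norm(v) + 1e-12
            u = W @ v
            u /= np.linalg.norm(u) + 1e-12
        sigma = u @ W @ v      # leading-singular-value estimate
        return W / sigma, u    # reuse u at the next training step
    ```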
- On the importance of single directions for generalization (2018. 3)
  - **`Importance`**, **`Confusing Neurons`**, **`Selective Neurons`**, **`DeepMind`**
  - [arXiv](https://arxiv.org/abs/1803.06959), [deepmind_blog](https://deepmind.com/blog/understanding-deep-learning-through-neuron-deletion/)
- Group Normalization (2018. 3)
  - **`Group Normalization (GN)`**, **`Batch (BN)`**, **`Layer (LN)`**, **`Instance (IN)`**, **`Independent of Batch Size`**
  - [arXiv](https://arxiv.org/abs/1803.08494)
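  - A sketch for NCHW feature maps; with `groups=C` it reduces to InstanceNorm, with `groups=1` to LayerNorm over (C, H, W):

    ```python
    import numpy as np

    def group_norm(x, gamma, beta, groups=32, eps=1e-5):
        """Normalize over channel groups and spatial dims; batch-size independent."""
        N, C, H, W = x.shape
        xg = x.reshape(N, groups, C // groups, H, W)
        mu = xg.mean(axis=(2, 3, 4), keepdims=True)
        var = xg.var(axis=(2, 3, 4), keepdims=True)
        xn = ((xg - mu) / np.sqrt(var + eps)).reshape(N, C, H, W)
        return gamma * xn + beta  # gamma, beta broadcast as (1, C, 1, 1)
    ```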
- Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
  - **`Autoregressive`**, **`Latent Transformer`**, **`Discretization`**
  - [arXiv](https://arxiv.org/abs/1803.03382)
- Delayed Impact of Fair Machine Learning (2018. 3)
  - **`Outcome Curve`**, **`Max Profit`**, **`Demographic Parity`**, **`Equal Opportunity`**
  - [arXiv](https://arxiv.org/abs/1803.04383), [the_morning_paper](https://blog.acolyer.org/2018/08/13/delayed-impact-of-fair-machine-learning/), [bair_blog](https://bair.berkeley.edu/blog/2018/05/17/delayed-impact/)
- How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
  - **`Smoothing Effect`**, **`BatchNorm’s Reparametrization`**
  - [arXiv](https://arxiv.org/abs/1805.11604)
- When Recurrent Models Don't Need To Be Recurrent (2018. 5)
  - **`Approximate`**, **`Feed-Forward`**
  - [arXiv](https://arxiv.org/abs/1805.10369), [bair_blog](http://bair.berkeley.edu/blog/2018/08/06/recurrent/)
- Relational inductive biases, deep learning, and graph networks (2018. 6)
  - **`Survey`**, **`Relation`**, **`Graph`**
  - [arXiv](https://arxiv.org/abs/1806.01261)
- Universal Transformers (2018. 7)
  - **`Transformer`**, **`Weight Sharing`**, **`Adaptive Computation Time (ACT)`**
  - [arXiv](https://arxiv.org/abs/1807.03819), [google_ai_blog](https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html)