This repository shows how NLP can tackle real problems. It includes source code, datasets, and state-of-the-art NLP methods.
- Data Augmentation in NLP
 - Data Augmentation library for Text
 - Does your NLP model able to prevent adversarial attack?
 - How does Data Noising Help to Improve your NLP Model?
 - Data Augmentation library for Speech Recognition
 - Data Augmentation library for Audio
 - Unsupervised Data Augmentation
 - Adversarial Attacks in Textual Deep Neural Networks
 - Back Translation in Text Augmentation by nlpaug
 
| Section | Sub-Section | Description | Story |
|---|---|---|---|
| Tokenization | Subword Tokenization | | Medium |
| Tokenization | Word Tokenization | | Medium Github |
| Tokenization | Sentence Tokenization | | Medium Github |
| Part of Speech | | | Medium Github |
| Lemmatization | | | Medium Github |
| Stemming | | | Medium Github |
| Stop Words | | | Medium Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium Github |
| | Lexicon-based | SymSpell | Medium Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium Github |

| Section | Sub-Section | Research Lab | Story | Source |
|---|---|---|---|---|
| Traditional Method | Bag-of-words (BoW) | | Medium Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium Github | |
| Character Level | Character Embedding | NYU | Medium Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | Medium | |
| | Word2Vec, GloVe, fastText | | Medium Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium Github | Paper Code |
| | Misspelling Oblivious (word) Embeddings | | Medium | Paper |
| | Embeddings from Language Models (ELMo) | AI2 | Medium Github | Paper Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper Code |
| Sentence Level | Skip-thoughts | | Medium Github | Paper Code |
| | InferSent | | Medium Github | Paper Code |
| | Quick-Thoughts | | Medium | Paper Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | | Medium | Paper(2019) Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper(2019) Code |
| | Self-Governing Neural Networks (SGNN) | | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper(2019) |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper(2019) Code |
| | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | Medium | Paper Code |
| | BERT in Science Domain | | Medium | Paper(2019) Paper(2019) |
| | BERT in Clinical Domain | NYU/PU | Medium | Paper(2019) Paper(2019) |
| | RoBERTa | UW/Facebook | Medium | Paper(2019) Paper |
| | Unified Language Model for NLP and NLU (UNILM) | Microsoft | Medium | Paper(2019) |
| | Cross-lingual Language Model (XLMs) | | Medium | Paper(2019) |
| | Transformer-XL | CMU/Google | Medium | Paper(2019) |
| | XLNet | CMU/Google | Medium | Paper(2019) |
| | CTRL | Salesforce | Medium | Paper(2019) |
| | ALBERT | Google/Toyota | Medium | Paper(2019) |
| | T5 | Google | Medium | Paper(2019) |
| | MultiFiT | | Medium | Paper(2019) |
| | XTREME | | Medium | Paper(2020) |
| | REALM | | Medium | Paper(2020) |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | Google | Medium Github | Paper |

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| Named Entity Recognition (NER) | Pattern-based Recognition | | | Medium | |
| | Lexicon-based Recognition | | | Medium | |
| | spaCy Pre-trained NER | | | Medium Github | |
| Optical Character Recognition (OCR) | Printed Text | Google Cloud Vision API | | Medium | Paper |
| | Handwriting | LSTM | | Medium | Paper |
| Text Summarization | Extractive Approach | | | Medium Github | |
| | Abstractive Approach | | | Medium | |
| Emotion Recognition | Audio, Text, Visual | 3 Multimodals for Emotion Recognition | | Medium | |

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| Feature Representation | Unsupervised Learning | Introduction to Audio Feature Learning | | Medium | Paper 1 Paper 2 Paper 3 |
| Feature Representation | Unsupervised Learning | Speech2Vec and Sentence Level Embeddings | | Medium | Paper 1 Paper 2 |
| Feature Representation | Unsupervised Learning | Wav2vec | | Medium | Paper |
| Speech-to-text | | Introduction to Speech-to-text | | Medium | |

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | | Medium Github | |
| Edit Distance | Levenshtein Distance | | | Medium Github | |
| Word Mover's Distance (WMD) | | | | Medium Github | |
| Supervised Word Mover's Distance (S-WMD) | | | | Medium | |
| Manhattan LSTM | | | | Medium | Paper |
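
As a quick illustration of three of the measures listed in the table above, here is a minimal, dependency-free Python sketch of Jaccard similarity, cosine similarity over word counts, and Levenshtein distance. It is not the code from the linked stories: the function names and the whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter


def jaccard_similarity(a, b):
    """Jaccard similarity between the sets of whitespace tokens of two strings."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # convention assumed here: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words count vectors of two strings."""
    vec_a, vec_b = Counter(a.split()), Counter(b.split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so report no similarity
    return dot / (norm_a * norm_b)


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]
```

Jaccard and cosine are similarities (higher means closer), while Levenshtein is a distance (lower means closer), so they are not directly comparable without normalization.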

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| ELI5, LIME and Skater | | | | Medium Github | |
| SHapley Additive exPlanations (SHAP) | | | | Medium Github | |
| Anchors | | | | Medium Github | |

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| Embeddings | TransE, RESCAL, DistMult, ComplEx, PyTorch BigGraph | | | Medium | RESCAL(2011) TransE(2013) DistMult(2015) ComplEx(2016) PyTorch BigGraph(2019) |
| Embeddings | DeepWalk, node2vec, LINE, GraphSAGE | | | Medium | DeepWalk(2014) node2vec(2015) LINE(2015) GraphSAGE(2018) |
| Embeddings | WLG, GCN, GAT, GIN | | | Medium | WLG(2011) GCN(2017) GAT(2017) GraphSAGE(2018) |
| Embeddings | PinSAGE(2018) | | | Medium | |
| Embeddings | HolE(2015), SimplE(2018) | | | Medium | |
| Embeddings | ContE(2017), ETE(2017) | | | Medium | |

| Section | Sub-Section | Description | Story |
|---|---|---|---|
| Introduction | | Matching Nets(2016), MANN(2016), LSTM-based meta-learner(2017), Prototypical Networks(2017), ARC(2017), MAML(2017), MetaNet(2017) | Medium |
| NLP | Dialog Generation | DAML(2019), PAML(2019), NTMS(2019) | Medium |
| | Classification | Intent Embeddings(2016), LEOPARD(2019) | Medium |
| CV | Unsupervised Learning | CACTUs(2018) | Medium |
| General | | Siamese Network(1994), Triplet Network(2015) | Medium |
| | | MAML+(2018) | Medium |

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
|---|---|---|---|---|---|
| Object Detection | R-CNN | | | Medium | Paper(2013) |
| Object Detection | Fast R-CNN | | | Medium | Paper(2015) |
| Object Detection | Faster R-CNN | | | Medium | Paper(2015) |
| Object Detection | VGGNet | | | Medium | Paper(2014) |
| Instance Segmentation | Mask R-CNN | | FAIR | Medium | Paper(2017) |
| Image Classification | ResNet(2015) | | Microsoft | Medium | |
| Image Classification | ResNeXt(2016) | | | Medium | |

| Section | Sub-Section | Description | Story |
|---|---|---|---|
| Introduction | | | Medium |
| Classification | Confusion Matrix, ROC, AUC | | Medium |
| Regression | MAE, MSE, RMSE, MAPE, WMAPE | | Medium |
| Textual | Perplexity, BLEU, GER, WER, GLUE | | Medium |
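
To make the regression metrics named in the table above concrete, the following is a small plain-Python sketch. It is not taken from the linked story. The function names are chosen for this example, and the WMAPE formula shown (total absolute error divided by total absolute actuals) is one common definition that is assumed here; other weightings exist.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def wmape(y_true, y_pred):
    """Weighted MAPE, in percent (assumed definition: sum|error| / sum|actual|)."""
    total_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return 100.0 * total_error / sum(abs(t) for t in y_true)


if __name__ == "__main__":
    actual = [100, 200, 300]
    predicted = [110, 190, 330]
    for metric in (mae, mse, rmse, mape, wmape):
        print(metric.__name__, round(metric(actual, predicted), 4))
```

MAPE weights every sample equally, so small actual values can dominate it, whereas the WMAPE variant above weights errors by the size of the actuals.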

| Section | Sub-Section | Description | Link |
|---|---|---|---|
| Spellcheck | | | Github |
| InferSent | | | Github |