From f5ae86822196cc1c36b171803bf4aaca0db3ca3f Mon Sep 17 00:00:00 2001
From: Shiva Shankar <shankaarshiva@gmail.com>
Date: Wed, 13 Jul 2022 00:35:13 +0800
Subject: [PATCH] Add Haystack Annotation Tool

The SQuAD format targets question answering, so the Haystack Annotation Tool, which exports labels in that format, provides an easy way to prepare training data for NLP models.
---
 README.md | 101 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index 7c87e60..01751b9 100644
--- a/README.md
+++ b/README.md
@@ -12,43 +12,65 @@ _Please read the [contribution guidelines](contributing.md) before contributing.
 
 ## Contents
 
-* [Research Summaries and Trends](#research-summaries-and-trends)
-* [Prominent NLP Research Labs](#prominent-nlp-research-labs)
-* [Tutorials](#tutorials)
-  * [Reading Content](#reading-content)
-  * [Videos and Courses](#videos-and-online-courses)
-  * [Books](#books)
-* [Libraries](#libraries)
-  * [Node.js](#user-content-node-js)
-  * [Python](#user-content-python)
-  * [C++](#user-content-c++)
-  * [Java](#user-content-java)
-  * [Kotlin](#user-content-kotlin)
-  * [Scala](#user-content-scala)
-  * [R](#user-content-r)
-  * [Clojure](#user-content-clojure)
-  * [Ruby](#user-content-ruby)
-  * [Rust](#user-content-rust)
-* [Services](#services)
-* [Annotation Tools](#annotation-tools)
-* [Datasets](#datasets)
-* [NLP in Korean](#nlp-in-korean)
-* [NLP in Arabic](#nlp-in-arabic)
-* [NLP in Chinese](#nlp-in-chinese)
-* [NLP in German](#nlp-in-german)
-* [NLP in Polish](#nlp-in-polish)
-* [NLP in Spanish](#nlp-in-spanish)
-* [NLP in Indic Languages](#nlp-in-indic-languages)
-* [NLP in Thai](#nlp-in-thai)
-* [NLP in Danish](#nlp-in-danish)
-* [NLP in Vietnamese](#nlp-in-vietnamese)
-* [NLP for Dutch](#nlp-for-dutch)
-* [NLP in Indonesian](#nlp-in-indonesian)
-* [NLP in Urdu](#nlp-in-urdu)
-* [NLP in Persian](#nlp-in-persian)
-* [NLP in Ukrainian](#nlp-in-ukrainian)
-* [Other Languages](#other-languages)
-* [Credits](#credits)
+- [awesome-nlp](#awesome-nlp)
+  - [Contents](#contents)
+  - [Research Summaries and Trends](#research-summaries-and-trends)
+  - [Prominent NLP Research Labs](#prominent-nlp-research-labs)
+  - [Tutorials](#tutorials)
+    - [Reading Content](#reading-content)
+    - [Videos and Online Courses](#videos-and-online-courses)
+    - [Books](#books)
+  - [Libraries](#libraries)
+    - [Services](#services)
+    - [Annotation Tools](#annotation-tools)
+  - [Techniques](#techniques)
+    - [Text Embeddings](#text-embeddings)
+      - [Word Embeddings](#word-embeddings)
+      - [Sentence and Language Model Based Word Embeddings](#sentence-and-language-model-based-word-embeddings)
+    - [Question Answering and Knowledge Extraction](#question-answering-and-knowledge-extraction)
+  - [Datasets](#datasets)
+  - [Multilingual NLP Frameworks](#multilingual-nlp-frameworks)
+  - [NLP in Korean](#nlp-in-korean)
+    - [Libraries](#libraries-1)
+    - [Blogs and Tutorials](#blogs-and-tutorials)
+    - [Datasets](#datasets-1)
+  - [NLP in Arabic](#nlp-in-arabic)
+    - [Libraries](#libraries-2)
+    - [Datasets](#datasets-2)
+  - [NLP in Chinese](#nlp-in-chinese)
+    - [Libraries](#libraries-3)
+    - [Anthology](#anthology)
+  - [NLP in German](#nlp-in-german)
+  - [NLP in Polish](#nlp-in-polish)
+  - [NLP in Spanish](#nlp-in-spanish)
+    - [Libraries](#libraries-4)
+    - [Data](#data)
+    - [Word and Sentence Embeddings](#word-and-sentence-embeddings)
+  - [NLP in Indic languages](#nlp-in-indic-languages)
+    - [Data, Corpora and Treebanks](#data-corpora-and-treebanks)
+      - [Corpora/Datasets that need a login/access can be gained via email](#corporadatasets-that-need-a-loginaccess-can-be-gained-via-email)
+    - [Language Models and Word Embeddings](#language-models-and-word-embeddings)
+    - [Libraries and Tooling](#libraries-and-tooling)
+  - [NLP in Thai](#nlp-in-thai)
+    - [Libraries](#libraries-5)
+    - [Data](#data-1)
+  - [NLP in Danish](#nlp-in-danish)
+  - [NLP in Vietnamese](#nlp-in-vietnamese)
+    - [Libraries](#libraries-6)
+    - [Data](#data-2)
+  - [NLP for Dutch](#nlp-for-dutch)
+  - [NLP in Indonesian](#nlp-in-indonesian)
+    - [Datasets](#datasets-3)
+    - [Libraries & Embedding](#libraries--embedding)
+  - [NLP in Urdu](#nlp-in-urdu)
+    - [Datasets](#datasets-4)
+    - [Libraries](#libraries-7)
+  - [NLP in Persian](#nlp-in-persian)
+    - [Libraries](#libraries-8)
+    - [Datasets](#datasets-5)
+  - [NLP in Ukrainian](#nlp-in-ukrainian)
+  - [Other Languages](#other-languages)
+  - [License](#license)
 
 ## Research Summaries and Trends
 
@@ -69,7 +91,7 @@ _Please read the [contribution guidelines](contributing.md) before contributing.
 * [Language Technologies Institute, Carnegie Mellon University](http://www.cs.cmu.edu/~nasmith/nlp-cl.html) - Notable projects include [Avenue Project](http://www.cs.cmu.edu/~avenue/), a syntax driven machine translation system for endangered languages like Quechua and Aymara and previously, [Noah's Ark](http://www.cs.cmu.edu/~ark/) which created [AQMAR](http://www.cs.cmu.edu/~ark/AQMAR/) to improve NLP tools for Arabic.
 * [NLP research group, Columbia University](http://www1.cs.columbia.edu/nlp/index.cgi) - Responsible for creating BOLT ( interactive error handling for speech translation systems) and an un-named project to characterize laughter in dialogue.
 * [The Center or Language and Speech Processing, John Hopkins University](http://clsp.jhu.edu/) - Recently in the news for developing speech recognition software to create a diagnostic test or Parkinson's Disease, [here](https://www.clsp.jhu.edu/2019/03/27/speech-recognition-software-and-machine-learning-tools-are-being-used-to-create-diagnostic-test-for-parkinsons-disease/#.XNFqrIkzYdU).
-* [Computational Linguistics and Information Processing Group, University of Maryland](https://wiki.umiacs.umd.edu/clip/index.php/Main_Page) - Notable contributions include [Human-Computer Cooperation or Word-by-Word Question Answering](http://www.umiacs.umd.edu/~jbg/projects/IIS-1652666) and modeling development of phonetic representations. 
+* [Computational Linguistics and Information Processing Group, University of Maryland](https://wiki.umiacs.umd.edu/clip/index.php/Main_Page) - Notable contributions include [Human-Computer Cooperation or Word-by-Word Question Answering](http://www.umiacs.umd.edu/~jbg/projects/IIS-1652666) and modeling development of phonetic representations.
 * [Penn Natural Language Processing, University of Pennsylvania](https://nlp.cis.upenn.edu/)- Famous for creating the [Penn Treebank](https://www.seas.upenn.edu/~pdtb/).
 * [The Stanford Nautral Language Processing Group](https://nlp.stanford.edu/)- One of the top NLP research labs in the world, notable for creating [Stanford CoreNLP](https://nlp.stanford.edu/software/corenlp.shtml) and their [coreference resolution system](https://nlp.stanford.edu/software/dcoref.shtml)
 
@@ -174,7 +196,7 @@ Material can be found [here](https://github.com/aws-samples/aws-machine-learning
   - [Rita DSL](https://github.com/zaibacu/rita-dsl) - a DSL, loosely based on [RUTA on Apache UIMA](https://uima.apache.org/ruta.html). Allows to define language patterns (rule-based NLP) which are then translated into [spaCy](https://spacy.io/), or if you prefer less features and lightweight - regex patterns.
   - [Transformers](https://github.com/huggingface/transformers) - Natural Language Processing for TensorFlow 2.0 and PyTorch.
   - [Tokenizers](https://github.com/huggingface/tokenizers) - Tokenizers optimized for Research and Production.
-  - [fairSeq](https://github.com/pytorch/fairseq) Facebook AI Research implementations of SOTA seq2seq models in Pytorch. 
+  - [fairSeq](https://github.com/pytorch/fairseq) Facebook AI Research implementations of SOTA seq2seq models in Pytorch.
   - [corex_topic](https://github.com/gregversteeg/corex_topic) - Hierarchical Topic Modeling with Minimal Domain Knowledge
   - [Sockeye](https://github.com/awslabs/sockeye) - Neural Machine Translation (NMT) toolkit that powers Amazon Translate.
   - [DL Translate](https://github.com/xhlulu/dl-translate) - A deep learning-based translation library for 50 languages, built on `transformers` and Facebook's mBART Large.
@@ -278,6 +300,7 @@ NLP as API with higher level functionality such as NER, Topic tagging and so on
 - [Datasaur](https://datasaur.ai/) support various NLP tasks for individual or teams, freemium based
 - [Konfuzio](https://konfuzio.com/en/) - team-first hosted and on-prem text, image and PDF annotation tool powered by active learning, freemium based, costs $
 - [UBIAI](https://ubiai.tools/) - Easy-to-use text annotation tool for teams with most comprehensive auto-annotation features. Supports NER, relations and document classification as well as OCR annotation for invoice labeling, costs $
+- [Haystack Annotation Tool](https://haystack.deepset.ai/components/annotation) - Free and open-source web-based annotation tool (can also run as a local Docker container). Create question and answer labels in SQuAD style, or use a set of predetermined questions and search the document for the answers (Natural Questions style). Labels can be exported in SQuAD format.
 
 ## Techniques