Commit 5cf9aa0

Merge pull request #893 from aravinth/patch-1
Update README.md
2 parents 75a5feb + 1f3f100 commit 5cf9aa0

1 file changed: 7 additions, 4 deletions

README.md

@@ -10,9 +10,12 @@
 | Translator | [![Build Status](http://jenkins.idc.tarento.com/buildStatus/icon?job=anuvaad%2Fanuvaad-etl-translator)](http://jenkins.idc.tarento.com/job/anuvaad/job/anuvaad-etl-translator/) |
 
 
+# Anuvaad
 
+Anuvaad is an AI based open source Document Translation Platform to translate documents in Indic languages at scale. Anuvaad provides easy-to-edit capabilities on top the plug & play NMT models. Separate instances of Anuvaad are deployed to Diksha (NCERT), Supreme Court of India (SUVAS) and Supreme Court of Bangladesh (Amar Vasha).
+
+<img width="1135" alt="image" src="https://github.com/project-anuvaad/anuvaad/assets/1707796/84426483-e948-470f-b525-819fb374e77e">
 
-![Anuvaad Solution Diagram](https://github.com/project-anuvaad/anuvaad/blob/master/anuvaad-documentation/images/Anuvaad_Solution_Diagram.png)
 
 ### Components ###
 
@@ -28,7 +31,7 @@ Layout Detector | Microservice interface for Layout detection model.
 Block Segmenter | Handles layout detection miss-classifications , region unifying.
 Word Detector | Word detection.
 Block Merger | An OCR system that extracts texts, images, tables, blocks etc from the input file and makes it avaible in the format which can be utilised by downstream services to perform Translation. This can also be used as an independent product that can perform OCR on files, images, ppts, etc.
-Translator | Translator pushes sentences to [OpenNMT](https://opennmt.net/) which are translated and pushed back during the document translation flow.
+Translator | Translator pushes sentences to IndicTrans which are translated and pushed back during the document translation flow.
 Content Handler | Repository Microservice which maintains and manages all the translated documents
 Translation Memory X(TMX) | System translation memory to facilitate overriding NMT translation with user preferred translation. TMX provides three levels of caching - Global , User , Organisation.
 User Translation Memory(UTM) | System tracks and remembers individual user translations or corrected translations and applies automatically when same sentences are encountered again.
@@ -41,13 +44,13 @@ Component | Details
 [Google Vision](https://cloud.google.com/vision) | Used for OCR in Document Digitization v1.0 , v1.5. Replaced with custom trained Tesseract in latest versions.
 [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Used for Line detection.
 [Tesseract](https://github.com/tesseract-ocr) | Custom trained Tesseract used for OCR.
-[OpenNMT](https://opennmt.net/) | Custom trained OpenNMT used for translation.
+[IndicTrans](https://github.com/AI4Bharat/indicTrans) | Custom trained Indic NMT model used for translation.
 
 ### Technology Stack ###
 
 Component | Details
 ------------- | -------------
-[Apache Kafka](https://kafka.apache.org/) | Translator and [OpenNMT](https://opennmt.net/) are integrated through Kafka messaging.
+[Apache Kafka](https://kafka.apache.org/) | Translator and [IndicTrans](https://github.com/AI4Bharat/indicTrans) are integrated through Kafka messaging.
 [MongoDB](https://www.mongodb.com/) | Primary data storage.
 [Redis](https://redis.io/) | Secondary in memory storage.
 Cloud Storage | Samba storage is used to store user input files.
