Merlin - bring back auto-language detection #121

@smahoney58

Newman used to be able to automatically detect the language used in an email and index it appropriately. Now the user has to pick the language before ingesting. Problems with this include:

1 - You cannot know which language an email uses before you ingest it.
2 - It only works if the email dataset contains just two languages (e.g. a dataset with English, Spanish, and Chinese emails can't be processed, since you can only pick one other language).

Currently, the only other language supported is Spanish. Issue #120 is the request to support other languages.

In general, how version 4.x handles multiple languages needs to be re-designed and re-implemented. Almost every dataset we have ingested includes multiple languages.
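
To make the request concrete, here is a minimal sketch of per-email detection, so that each message is routed by its own language instead of one language being picked for the whole dataset. It is not Newman's actual code: `detect_language`, `group_by_language`, the `STOPWORDS` table, and the `id`/`body` email fields are all hypothetical names, and the stopword counting is only a stand-in for a real language detector.

```python
import re

# Tiny illustrative stopword lists (assumption: a real system would use a
# trained language detector instead of this heuristic).
STOPWORDS = {
    "en": {"the", "and", "is", "to", "of", "in", "that", "for", "with", "you"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "para", "con", "una"},
}


def detect_language(text, default="unknown"):
    """Return the language whose stopwords occur most often in `text`.

    Falls back to `default` when no stopword matches or when the top
    score is tied, so unsupported languages are not mislabeled.
    """
    # Unicode-aware tokenization: runs of letters only.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    best_lang, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score == 0 or list(scores.values()).count(best_score) > 1:
        return default
    return best_lang


def group_by_language(emails):
    """Group email ids by detected language (hypothetical email dicts)."""
    groups = {}
    for email in emails:
        lang = detect_language(email["body"])
        groups.setdefault(lang, []).append(email["id"])
    return groups


if __name__ == "__main__":
    sample = [
        {"id": 1, "body": "The meeting is in the morning and you need to bring the report"},
        {"id": 2, "body": "La reunión es en la oficina de los directores y una persona llega con el informe"},
        {"id": 3, "body": "你好世界"},
    ]
    print(group_by_language(sample))
```

With something along these lines, a dataset mixing English, Spanish, and Chinese emails would not need a language chosen up front: each email gets its own label, and anything the detector does not recognize lands in an `unknown` group rather than blocking the ingest.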
