language detection failing on basic English input #41

Closed
illerucis opened this issue Dec 23, 2021 · 2 comments

illerucis commented Dec 23, 2021

Hey folks,

I'm testing out this library with some sample posts from our application, and I'm getting some strange results:

console.log
  post text: take me back home

console.log
  language detection results: [ [ 'pidgin', 0.3275 ], [ 'hawaiian', 0.2816666666666666 ] ]

console.log
  post text: i like your hair

console.log
  language detection results: [ [ 'hawaiian', 0.26625 ], [ 'norwegian', 0.25145833333333334 ] ]

I installed via npm install languagedetect --save

illerucis (Author) commented

English is not in the top 5 results:

console.log
  post text: take me back home

console.log
  language detection results: [
    [ 'pidgin', 0.3275 ],
    [ 'hawaiian', 0.2816666666666666 ],
    [ 'hausa', 0.265625 ],
    [ 'dutch', 0.20395833333333335 ],
    [ 'slovene', 0.19854166666666662 ]
  ]

console.log
  post text: i like your hair

console.log
  language detection results: [
    [ 'hawaiian', 0.26625 ],
    [ 'norwegian', 0.25145833333333334 ],
    [ 'icelandic', 0.23479166666666662 ],
    [ 'turkish', 0.22270833333333329 ],
    [ 'welsh', 0.21479166666666671 ]
  ]

FGRibreau (Owner) commented

Yes. language-detect is trigram-based, which is a statistical approach, so it sometimes needs more input text to produce the right result :)
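
To illustrate why short inputs are hard for a trigram-based detector, here is a minimal sketch that counts the trigrams in the first sample post. The `trigrams` helper is hypothetical and written only for this example; it is not the library's code, and the library's actual normalization and scoring may differ.

```javascript
// Hypothetical helper (not part of languagedetect): count the overlapping
// three-character sequences in a piece of text.
function trigrams(text) {
  // Lowercase, collapse anything that is not a-z into single spaces,
  // and pad with one space on each side so word boundaries count too.
  const normalized =
    ' ' + text.toLowerCase().replace(/[^a-z]+/g, ' ').trim() + ' ';
  const counts = new Map();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    const gram = normalized.slice(i, i + 3);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

const shortGrams = trigrams('take me back home');
const longerGrams = trigrams(
  'take me back home, because I have been travelling for weeks and I miss my family'
);

// The short post yields only a handful of distinct trigrams to compare
// against each language profile; the longer one yields many more.
console.log('distinct trigrams (short):', shortGrams.size);
console.log('distinct trigrams (longer):', longerGrams.size);
```

With so few trigrams, a handful of sequences that happen to be common in another language can outweigh the rest, which is consistent with the low, closely spaced scores in the results above. Passing longer text, or several posts by the same user joined together, gives the statistics more to work with.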
