agx-leon/word2vec4j

This is a prototype implementation of the Continuous Skip-gram Model (CSGM) in plain Java using the Fork/Join framework. It was developed as part of a student project at Hochschule der Medien Stuttgart. Its training throughput is competitive with gensim's when both are applied to the first 50k articles of the German Wikipedia:

word2vec4j vs. gensim: CSGM training throughput

| \|v\| = 100 | gensim (numpy) | gensim (cython) | word2vec4j | gensim (BLAS) |
|---|---:|---:|---:|---:|
| kwords/sec | 0.16 | 180.11 | 205.11 | 309.87 |
| docs/sec | 0.11 | 138.75 | 145.19 | 238.28 |

This project is currently just a proof of concept. It still contains paths to local files and folders that are specific to my machine, the tests are crappy, and it lacks documentation. It is really raw, but it will be refined into a full-fledged library in the future. Stay tuned!

About

word2vec implementation for Java
