agx-leon/word2vec4j: word2vec implementation for Java

This is a prototype implementation of the Continuous Skip-gram Model (CSGM) in plain Java, using the Fork/Join framework. It was developed as part of a student project at Hochschule der Medien Stuttgart. Trained on the first 50k articles of the German Wikipedia, it is competitive with gensim: it processes about 14% more kwords/sec than gensim(cython) and reaches roughly two thirds of the throughput of gensim(BLAS):

word2vec4j vs. gensim regarding CSGM

| \|v\|=100 | gensim(numpy) | gensim(cython) | word2vec4j | gensim(BLAS) |
|---|---|---|---|---|
| kwords/sec | 0.16 | 180.11 | 205.11 | 309.87 |
| docs/sec | 0.11 | 138.75 | 145.19 | 238.28 |

This project is currently just a proof of concept. The code still contains paths to files and folders specific to my machine, the tests are crappy, and documentation is missing. It is really raw, but it will be refined into a full-fledged library in the future. Stay tuned!
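
For readers unfamiliar with the two building blocks named above, here is a minimal sketch of how skip-gram (center, context) pairs could be generated with the Fork/Join framework. It is not taken from this repository: the class `SkipGramPairs`, the task `PairTask`, and the `threshold` parameter are hypothetical names chosen for illustration, and the sketch covers only pair extraction, not the actual training step (gradient updates, sampling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration, not the word2vec4j source code.
final class SkipGramPairs {

    /** Collects (center, context) pairs for center positions in [from, to). */
    static final class PairTask extends RecursiveTask<List<int[]>> {
        private final int[] tokens;   // word ids of one document
        private final int window;     // max distance between center and context
        private final int threshold;  // ranges up to this size are processed directly
        private final int from;
        private final int to;

        PairTask(int[] tokens, int window, int threshold, int from, int to) {
            this.tokens = tokens;
            this.window = window;
            this.threshold = Math.max(1, threshold); // guard against endless splitting
            this.from = from;
            this.to = to;
        }

        @Override
        protected List<int[]> compute() {
            if (to - from <= threshold) {
                // Small enough: pair each center word with its neighbours.
                List<int[]> out = new ArrayList<>();
                for (int c = from; c < to; c++) {
                    int lo = Math.max(0, c - window);
                    int hi = Math.min(tokens.length - 1, c + window);
                    for (int j = lo; j <= hi; j++) {
                        if (j != c) {
                            out.add(new int[] {tokens[c], tokens[j]});
                        }
                    }
                }
                return out;
            }
            // Otherwise split the range in half and process both halves in parallel.
            int mid = (from + to) >>> 1;
            PairTask left = new PairTask(tokens, window, threshold, from, mid);
            PairTask right = new PairTask(tokens, window, threshold, mid, to);
            left.fork();                          // schedule left half asynchronously
            List<int[]> rightPairs = right.compute();
            List<int[]> result = left.join();     // wait for left half
            result.addAll(rightPairs);            // keeps pairs in position order
            return result;
        }
    }

    static List<int[]> pairs(int[] tokens, int window, int threshold) {
        return ForkJoinPool.commonPool()
                .invoke(new PairTask(tokens, window, threshold, 0, tokens.length));
    }

    public static void main(String[] args) {
        int[] document = {0, 1, 2, 3};
        for (int[] p : pairs(document, 1, 2)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

In a real trainer each pair would feed a parameter update instead of being collected into a list; the list is used here only to keep the sketch easy to inspect.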

