Word embeddings for the web
Blog post: https://mb-14.github.io/tech/2019/02/19/word-embeddings-js.html
Word embeddings often require a large number of parameters, which results in a large memory and storage footprint. This makes it difficult to deploy pre-trained word embeddings like fastText and GloVe in mobile and browser environments. In this project, we compress pre-trained word vectors using simple post-processing techniques: PCA dimensionality reduction and product quantization. The resulting embeddings are significantly smaller than the original ones, with no considerable drop in accuracy. The final vectors, along with helper methods to access them, are bundled into a JavaScript library. The library uses TensorFlow.js to decode the word embeddings and perform general-purpose operations on them. To speed up inference, we set the runtime backend to WASM for accelerated CPU computation at near-native speed.
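To make the compression step concrete, here is a minimal Python sketch of PCA dimensionality reduction followed by product quantization, using numpy and scikit-learn. The function names (`compress`, `decode`) and the parameter values (150 output dimensions, 30 sub-vectors, 256 centroids) are illustrative assumptions, not the project's actual compressor code.

```python
# Illustrative compression pipeline: PCA + product quantization.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def compress(embeddings, pca_dims=150, num_subvectors=30, num_centroids=256):
    """Reduce dimensionality with PCA, then product-quantize the result."""
    # Step 1: PCA dimensionality reduction (e.g. 300 -> 150 dims).
    reduced = PCA(n_components=pca_dims).fit_transform(embeddings)

    # Step 2: product quantization. Split each vector into sub-vectors and
    # learn a small k-means codebook for each sub-space; each sub-vector is
    # then stored as a single byte (its centroid index) instead of floats.
    sub_dim = pca_dims // num_subvectors
    codes = np.zeros((len(reduced), num_subvectors), dtype=np.uint8)
    codebooks = np.zeros((num_subvectors, num_centroids, sub_dim), dtype=np.float32)
    for i in range(num_subvectors):
        block = reduced[:, i * sub_dim:(i + 1) * sub_dim]
        kmeans = KMeans(n_clusters=num_centroids, n_init=4).fit(block)
        codebooks[i] = kmeans.cluster_centers_
        codes[:, i] = kmeans.labels_
    return codes, codebooks

def decode(codes, codebooks):
    """Reconstruct approximate vectors from codes + codebooks
    (analogous to what the JS library does when decoding the embeddings)."""
    return np.concatenate(
        [codebooks[i][codes[:, i]] for i in range(codes.shape[1])], axis=1
    )
```

With these settings, each word goes from `pca_dims` 32-bit floats down to `num_subvectors` bytes plus a small shared codebook, which is where the bulk of the size reduction comes from.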
You can check out a demo of the JS library on this page: https://mb-14.github.io/embeddings.js
- `compressor` - Module to compress pre-trained word embeddings using PCA and product quantization
- `sentiment_classification` - LSTM model for sentiment classification trained on the sentiment140 dataset (sketched below)
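For reference, a small LSTM sentiment classifier over pre-trained embeddings might look like the Keras sketch below. The `build_model` helper, layer sizes, and sequence length are assumptions for illustration, not the exact architecture used in the `sentiment_classification` module.

```python
# Illustrative LSTM sentiment classifier on top of frozen pre-trained embeddings.
import numpy as np
import tensorflow as tf

def build_model(embedding_matrix, max_len=40):
    vocab_size, embedding_dim = embedding_matrix.shape
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(
            vocab_size, embedding_dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False),  # keep the pre-trained vectors frozen
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # positive vs. negative
    ])

# Example with a dummy embedding matrix (20k words x 150 dims).
model = build_model(np.zeros((20000, 150), dtype="float32"))
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```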
This project uses yarn to manage dependencies:
- Install dependencies: `yarn`
- Start the demo server: `yarn run demo`, then open http://localhost:8080 to check out all the demos
- Build: `yarn build`