Support clustering of vectors #31

Open
mountain opened this issue Jun 11, 2014 · 0 comments
Comments

@mountain
Member

In personalized recommendation, the set of user profile vectors is usually very large, for example ~10M vectors or more. We take 10M vectors as the baseline because that much data can still be stored on a single physical machine.

10M vectors * 2048 dimensions * 4 bytes per float ≈ 80 GB of memory (81.92 GB exactly)
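
The estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: `vector_store_bytes` is a hypothetical helper, not part of the project, and it counts raw float storage only (no index or object overhead).

```python
def vector_store_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, ignoring any index or container overhead."""
    return num_vectors * dims * bytes_per_value


# Baseline from the issue: 10M vectors, 2048 dimensions, 4-byte floats.
total = vector_store_bytes(10_000_000, 2048)

print(f"{total / 1e9:.2f} GB")      # decimal gigabytes
print(f"{total / 2**30:.2f} GiB")   # binary gibibytes
```

In decimal units this is about 82 GB, in binary units about 76 GiB, so "80 G" is a fair round figure either way.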

The current solution does not scale to this level: write latency would be ~30s, which is not acceptable.

One idea: instead of recommending for a single user, recommend for a cluster of similar users.

Two choices: online KMeans or SimHash?
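
To make the two options concrete, here is a minimal sketch of each in plain Python. Everything in it is an assumption for illustration: the class names `OnlineKMeans` and `SimHash`, the Gaussian initialization, and the 16-bit signature width are hypothetical and not taken from the project. The k-means variant shown is the sequential one (one running-mean centroid update per incoming vector), and SimHash is shown in its random-hyperplane form for cosine similarity.

```python
import random
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means: each incoming vector moves only its nearest centroid."""

    def __init__(self, k: int, dims: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Assumption: random Gaussian initial centroids; real data may need better seeding.
        self.centroids: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)
        ]
        self.counts = [0] * k

    def nearest(self, v: Sequence[float]) -> int:
        """Index of the centroid with the smallest squared Euclidean distance to v."""
        def sq_dist(c: Sequence[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(c, v))
        return min(range(len(self.centroids)), key=lambda i: sq_dist(self.centroids[i]))

    def update(self, v: Sequence[float]) -> int:
        """Assign v to a cluster and shift that centroid toward v (running mean)."""
        i = self.nearest(v)
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]  # per-cluster learning rate
        c = self.centroids[i]
        for j, x in enumerate(v):
            c[j] += eta * (x - c[j])
        return i


class SimHash:
    """Random-hyperplane hashing: vectors with a small angle tend to share bits."""

    def __init__(self, dims: int, bits: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.bits = bits
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(bits)]

    def signature(self, v: Sequence[float]) -> int:
        """One bit per hyperplane: 1 if v lies on its non-negative side."""
        sig = 0
        for plane in self.planes:
            dot = sum(p * x for p, x in zip(plane, v))
            sig = (sig << 1) | (1 if dot >= 0 else 0)
        return sig


# Usage: cluster id from online k-means, bucket id from SimHash.
km = OnlineKMeans(k=4, dims=8)
sh = SimHash(dims=8, bits=16)
user = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.5, 0.75]
cluster_id = km.update(user)
bucket_id = sh.signature(user)
```

The trade-off this sketch is meant to show: online k-means keeps mutable state (k centroids) and costs O(k * dims) per write, while SimHash is stateless after the hyperplanes are drawn and costs O(bits * dims) per write, but its buckets are fixed by the random planes rather than adapted to the data.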

@mountain mountain added this to the 0.2.0 milestone Jun 11, 2014
@mountain mountain changed the title Support kmeans clustering Support vector clustering Jun 14, 2014
@mountain mountain changed the title Support vector clustering Support clustering of vectors Jun 14, 2014