Data structure
robertogithub edited this page Jul 30, 2013 · 11 revisions
users
{
"_id" : ObjectId("51dad1aad8c6cd4700000001"),
"created_at" : ISODate("2013-07-08T14:50:18.146Z"),
"current_sign_in_at" : ISODate("2013-07-17T21:03:40.785Z"),
"current_sign_in_ip" : "127.0.0.1",
"email" : "roberto@gmail.com",
"encrypted_password" : "$2a$10$TFi0BTBPbfK4igvFGe6yqODDzAgCtprrZTw/mfSXcQsof8z6zLg4a",
"last_sign_in_at" : ISODate("2013-07-09T10:35:11.378Z"),
"last_sign_in_ip" : "127.0.0.1",
"name" : "Roberto Bartolome",
"sign_in_count" : 4,
"updated_at" : ISODate("2013-07-17T21:03:40.936Z")
}
searches
{
_id: ObjectId("51e7095fd8c6cdef2f000002"),
user_id: ObjectId("51dad1aad8c6cd4700000001"),
body: "coca cola barcelona"
}
tweets
{
_id: ObjectId("51e70943d8c6cd67c8000001"),
search_id: ObjectId("51e7095fd8c6cdef2f000002"),
(and the rest of the tweet's info...)
}
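The three documents above are linked only through `user_id` and `search_id`. As a rough illustration of how those references are followed, here is a minimal sketch in plain JavaScript. It runs without a database: the arrays are in-memory stand-ins for the collections, plain strings replace the `ObjectId` values, and `tweetsForUser` is a hypothetical helper name, not something from the project.

```javascript
// In-memory stand-ins for the three collections.
// Assumption: plain strings replace the ObjectId values shown above.
const users = [{ _id: "51dad1aad8c6cd4700000001", name: "Roberto Bartolome" }];
const searches = [
  {
    _id: "51e7095fd8c6cdef2f000002",
    user_id: "51dad1aad8c6cd4700000001",
    body: "coca cola barcelona",
  },
];
const tweets = [
  { _id: "51e70943d8c6cd67c8000001", search_id: "51e7095fd8c6cdef2f000002" },
];

// Hypothetical helper: follow user -> searches -> tweets via the reference fields.
function tweetsForUser(userId) {
  // Step 1: collect the ids of every search owned by this user.
  const searchIds = new Set(
    searches.filter((s) => s.user_id === userId).map((s) => s._id)
  );
  // Step 2: keep the tweets whose search_id points at one of those searches.
  return tweets.filter((t) => searchIds.has(t.search_id));
}

const found = tweetsForUser(users[0]._id);
console.log(found.length);
```

Against a real database the same two steps would probably be two queries: one `find` on `searches` filtered by `user_id`, then one `find` on `tweets` with `search_id` matched through `$in`.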
Some interesting commands to know:
$ db.tweets.ensureIndex({search_id: 1})
$ db.tweets.find({search_id: ObjectId("51e7095fd8c6cdef2f000002")}).explain()
$ tweet = {author: "the name",...}
$ db.tweets.save(tweet)
$ db.tweets.find()
$ db.books.find({tags: "comic"})
$ db.books.findAndModify({
query:{inprogress: false},
sort:{priority: -1},
update:{ $set: {inprogress: true, started: new Date()}}
})
According to this site, we need to keep an eye on document size, even if it is not a pressing concern right now. The documentation states: "The maximum BSON document size is 16 megabytes." So if we embed all the tweets of a search inside a single document, a search with tons of tweets will eventually exceed that limit.
So we should split the data into three collections (users, searches and tweets) and use 'referencing' instead of 'embedding'. The structure should look like the one above. What do you think?
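To get a feel for how quickly embedding could hit the limit, here is a back-of-the-envelope sketch. It rests on several assumptions: the size of a document is approximated by the byte length of its JSON form (real BSON sizes differ somewhat), `sampleTweet` is an invented, heavily trimmed tweet, and `approxBytes` and `maxEmbeddedTweets` are hypothetical helper names.

```javascript
// 16 megabytes, the maximum BSON document size quoted above.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough estimate only: UTF-8 byte length of the JSON form, not the true BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// How many copies of a typical tweet would fit in one document if we embedded them.
function maxEmbeddedTweets(tweet, limitBytes = MAX_BSON_BYTES) {
  return Math.floor(limitBytes / approxBytes(tweet));
}

// Invented sample, trimmed to a few fields; real tweets carry far more metadata.
const sampleTweet = {
  text: "Enjoying a coca cola in barcelona",
  screen_name: "Dieguitson",
  created_at: "2013-07-17T21:03:40.785Z",
  lang: "en",
};

console.log(approxBytes(sampleTweet), "bytes per tweet (approx.)");
console.log(maxEmbeddedTweets(sampleTweet), "tweets before the limit (approx.)");
```

Since full tweets are much larger than this sample, the real ceiling for an embedded design would be lower than what the sketch prints. With referencing, each tweet is its own small document, so the limit applies per tweet rather than per search.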
About the user and tweet:
- name (e.g. Diego Sanchez)
- screenname (e.g. @Dieguitson)
- date and time
- language
- city
- country
About the tweet content:
- tweet text
- hashtags: list of keywords
- links
About the interaction:
- RT (if it is a retweet, get the original source if possible)
- in_reply_to_screen_name
- favorited
About the relevance:
- followers_count (number of followers)
- friends_count (number of accounts the user follows)
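The field lists above could be pulled out of a raw tweet along these lines. This is only a sketch: it assumes the raw object follows the Twitter REST API v1.1 JSON layout (`user`, `place`, `entities`, `retweeted_status`), the output key names are our own choice, and `extractTweetFields` is a hypothetical helper.

```javascript
// Hypothetical helper: map a raw tweet to the fields listed above.
// Assumption: `raw` follows the Twitter REST API v1.1 JSON layout.
function extractTweetFields(raw) {
  const user = raw.user || {};
  const place = raw.place || {}; // often null: most tweets carry no place
  const entities = raw.entities || {};
  const original = raw.retweeted_status || null;

  return {
    // About the user and tweet
    name: user.name,
    screen_name: user.screen_name,
    created_at: raw.created_at,
    language: raw.lang,
    city: place.name, // assumption: only a city when the place type is "city"
    country: place.country,
    // About the tweet content
    text: raw.text,
    hashtags: (entities.hashtags || []).map((h) => h.text),
    links: (entities.urls || []).map((u) => u.expanded_url),
    // About the interaction
    is_retweet: original !== null,
    retweet_source:
      original && original.user ? original.user.screen_name : null,
    in_reply_to_screen_name: raw.in_reply_to_screen_name,
    favorited: raw.favorited,
    // About the relevance
    followers_count: user.followers_count,
    friends_count: user.friends_count,
  };
}

// Invented sample tweet, for illustration only.
const rawSample = {
  created_at: "Wed Jul 17 21:03:40 +0000 2013",
  text: "Coca cola in #barcelona",
  lang: "en",
  favorited: false,
  in_reply_to_screen_name: null,
  user: {
    name: "Diego Sanchez",
    screen_name: "Dieguitson",
    followers_count: 120,
    friends_count: 80,
  },
  place: { name: "Barcelona", country: "Spain" },
  entities: { hashtags: [{ text: "barcelona" }], urls: [] },
};

const fields = extractTweetFields(rawSample);
console.log(fields.screen_name, fields.hashtags, fields.is_retweet);
```

The extracted object could then be stored in the tweets collection together with its `search_id`. Fields that are missing from the raw tweet simply come out as `undefined`.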