History DB is a truly scalable (hundreds of millions of updates per day) distributed archive system with per-user and per-day activity statistics.

reverbrain/historydb

History DB uses elliptics as its storage backend. It supports two record types: user logs (updated via append or write) and activity. Each user log file holds the logs of one user for a specified period (1 day by default). Activity is a secondary index that records which users were active during a specified period (1 day by default).

User log primary key is username (string prefix) + '.' + timestamp (of the day).

Activity primary key is timestamp (of the day) + '.' + chunk number.

All daily activity is divided into chunks. You can specify your own activity prefix if needed.
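
The key layout described above can be sketched as follows. This is only an illustration, not HistoryDB's own code: it assumes that "timestamp of the day" means a Unix timestamp truncated to the start of its UTC day, and the helper names (`day_timestamp`, `user_log_key`, `activity_key`) are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def day_timestamp(ts):
    """Truncate a Unix timestamp to the start of its UTC day (assumed convention)."""
    return ts - ts % SECONDS_PER_DAY


def user_log_key(user, ts):
    """User log key: username + '.' + timestamp of the day."""
    return "{}.{}".format(user, day_timestamp(ts))


def activity_key(ts, chunk):
    """Activity key: timestamp of the day + '.' + chunk number."""
    return "{}.{}".format(day_timestamp(ts), chunk)
```

With these helpers, every update a user makes within the same day maps to the same user log key, while the activity index for that day is spread over several chunk keys.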

User log is a blob which can be either appended or rewritten.

The History DB interface is declared in provider.h.

provider::provider() - constructor

provider::set_session_parameters() - sets parameters for all sessions.
	These include the vector of elliptics groups (replicas) in which HistoryDB stores data and
	the minimum number of succeeded writes.

provider::add_log() - appends data to user log

provider::add_activity() - updates user activity

provider::add_log_with_activity() - appends data to user log and updates user activity

provider::get_user_logs() - gets user logs.

provider::get_active_user() - gets active user for specified day.

provider::for_user_logs() - iterates over user's logs in specified time period.

provider::for_active_user() - iterates over activity logs in specified time period.

You can fetch user logs for a specified period of time, as well as the list of all users who were active (had at least one log update) during the requested period.

See extended tutorial

First, complete the elliptics tutorial. It is necessary before you start working with elliptics.

Download source tree

git clone http://github.com/reverbrain/historydb.git

Building library

cd historydb
debuild
sudo dpkg -i ../historydb_*.deb

Now you can start using the HistoryDB library: include historydb/provider.h, create a provider instance, and use it to write and read user logs, update activity, get active user statistics, repartition shards, etc.

HistoryDB has the following HTTP interface, implemented via fastcgi-daemon2:

"/add_log" POST - adds a record to user logs.
	Parameters:
		user - name of the user
		data - data of the log record
		time or key. If both key and time are specified, key is used
			time - timestamp of the record
			key - custom key of the record

"/add_activity" POST - marks the user as active on the day.
	Parameters:
		user - name of the user
		time or key. If both key and time are specified, key is used
			time - timestamp of activity statistics
			key - custom key of activity statistics

"/add_log_with_activity" POST - appends a record to user logs and updates user activity.
	Parameters:
		user - name of the user
		data - data of the log record
		time or key. If both key and time are specified, key is used
			time - timestamp of the record
			key - custom key of the record

"/get_active_users" GET - returns users who were active on the day.
	Parameters:
		time or key. If both key and time are specified, key is used
			time - timestamp of activity statistics
			key - custom key of activity statistics

"/get_user_logs" GET - returns logs of the user.
	Parameters:
		user - name of the user
		begin_time and end_time - time period for logs

"/" POST&GET - has no parameters. Returns HTTP 200 if everything is OK. May be used to check that the service is up.

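As an illustration of the interface above, the following sketch builds (but does not send) a POST request for "/add_log" using only the Python standard library. Several details are assumptions rather than documented behaviour: the base URL is a placeholder, the parameters are sent as a form-encoded body, and the client requires at least one of time or key. `build_add_log_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address; replace with wherever the fastcgi daemon is exposed.
BASE_URL = "http://localhost:8080"


def build_add_log_request(user, data, time=None, key=None):
    """Build a POST request for /add_log without sending it."""
    if time is None and key is None:
        raise ValueError("either time or key must be given")
    params = {"user": user, "data": data}
    if key is not None:
        # The server prefers key when both are given, so send only key.
        params["key"] = key
    else:
        params["time"] = str(time)
    # Assumption: parameters travel as a form-encoded request body.
    body = urlencode(params).encode("ascii")
    return Request(BASE_URL + "/add_log", data=body, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` once the service address is known; the other POST endpoints would follow the same pattern with their own parameter sets.
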
The HistoryDB component element should have the following children:

<log_file>/path/to/log_file</log_file> - path to elliptics client logs

<log_level>LOG_LEVEL</log_level> - valid values = { DATA, ERROR, INFO, NOTICE, DEBUG }

<elliptics> - one <elliptics> for each elliptics node
	<addr>address</addr> - address of elliptics node
	<port>port</port> - listening port on elliptics node
	<family>family</family> - protocol family
</elliptics>

<group>group_number</group> - number of a group that historydb will work with. One <group> for each elliptics group.

<min_writes>number</min_writes> - minimum number of succeeded writes across groups. For example, if historydb tries to write to 5 groups and min_writes is 3,
the attempt fails if the write succeeds in fewer than 3 groups.
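
The min_writes rule can be expressed in a few lines. This is a sketch of the rule as described above, not the library's implementation; `write_succeeded` and the shape of `group_results` (group number mapped to a success flag) are hypothetical.

```python
def write_succeeded(group_results, min_writes):
    """Return True if enough groups acknowledged the write.

    group_results maps a group number to True (write succeeded)
    or False (write failed) for that group.
    """
    succeeded = sum(1 for ok in group_results.values() if ok)
    return succeeded >= min_writes
```

For the example in the text, a write to 5 groups with min_writes set to 3 counts as successful when 3 or more groups succeed and as failed when only 2 do.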

The hdb_tool.py tool goes through activity statistics for the specified days, aggregates the logs of each active user, and saves them under the specified new subkeys.

Usage: hdb_tool.py KEYS NEW_KEY [options]

Options:
	-h, --help            show this help message and exit
	-b BATCH_SIZE, --batch-size=BATCH_SIZE
		Number of keys in read_bulk/write_bulk batch [default:1024]
	-d, --debug           Enable debug output [default: False]
	-r ELLIPTICS_REMOTE, --remote=ELLIPTICS_REMOTE
		Elliptics node address [default: none]
	-g ELLIPTICS_GROUPS, --groups=ELLIPTICS_GROUPS
		Comma separated list of groups [default: all]
	-l FILE, --log=FILE   Output log messages from library to file [default:hdb_tool.log]
	-L ELLIPTICS_LOG_LEVEL, --log-level=ELLIPTICS_LOG_LEVEL
		Elliptics client verbosity [default: 1]
	-u USERS, --user=USERS
		User whose logs should be aggregated
