CloudProject

This project provides a robust, distributed framework for real-time extraction, processing, and streamlined analytics of data from various e-commerce websites.

We built this system using Scrapy, Apache Kafka, Elasticsearch, and Kibana. Each component scales to large volumes of data while keeping latency reasonably low.

The main objectives of the project are:

Selecting the target websites and extracting the required data with Scrapy, using configured XPaths.
Formatting the extracted data as JSON.
Publishing the JSON data to topics on the Kafka brokers using Kafka producers.
Reading the data from those topics using Kafka consumers and indexing it into Elasticsearch.
Displaying the data indexed in Elasticsearch in Kibana through user-defined dashboards.
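
The first four objectives can be sketched end to end as below. This is only an illustrative sketch using the Python standard library, not the project's actual code: the sample page, the XPaths, and the function names are all hypothetical, `xml.etree.ElementTree` stands in for Scrapy's selectors, a `queue.Queue` stands in for a Kafka topic, and a plain dict stands in for an Elasticsearch index.

```python
import json
import queue
import xml.etree.ElementTree as ET

# Hypothetical product page; a real crawl would fetch pages with Scrapy.
PAGE = """
<html><body>
  <div class="product"><h2>Laptop</h2><span class="price">799.00</span></div>
  <div class="product"><h2>Phone</h2><span class="price">399.50</span></div>
</body></html>
"""

# Configured XPaths (hypothetical): one to locate items, one per output field.
ITEM_XPATH = ".//div[@class='product']"
FIELD_XPATHS = {"name": "./h2", "price": "./span[@class='price']"}


def extract_items(page):
    """Objectives 1-2: apply the configured XPaths and format each item as JSON."""
    root = ET.fromstring(page.strip())
    for node in root.findall(ITEM_XPATH):
        item = {field: node.findtext(xpath) for field, xpath in FIELD_XPATHS.items()}
        yield json.dumps(item, sort_keys=True)


def produce(topic, messages):
    """Objective 3: stand-in for a Kafka producer; puts each message on the topic."""
    for message in messages:
        topic.put(message)


def consume_and_index(topic, index):
    """Objective 4: stand-in for a Kafka consumer that indexes each document."""
    while not topic.empty():
        doc = json.loads(topic.get())
        # Sequential id for illustration only; a real index manages its own ids.
        index[len(index) + 1] = doc


topic = queue.Queue()  # in-memory stand-in for a Kafka topic
index = {}             # in-memory stand-in for an Elasticsearch index

produce(topic, extract_items(PAGE))
consume_and_index(topic, index)
print(index)
```

Keeping the XPaths in a configuration mapping, separate from the extraction logic, means a new website can be supported by changing configuration rather than code. The producer and the consumer share nothing but the topic, which is what lets the real components scale independently.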

Details of each component and its features are given in the respective README files.

Repository contents (25 commits):

Scrapy Crawler
consumer
elasticsearch-1.7.2
elasticsearch-river-kafka-master
kafkaComponents
kafka_2.10-0.8.2.2
kibana-4.1.2-linux-x64
README.md
kafkaComponents-1.0-jar-with-dependencies.jar

chennakesava/CloudProject
