BigDataETLAndSentimentAnalysis

Overview

This project processes and analyzes Reuters news data. It includes:

  • A Java application for parsing and storing news articles in MongoDB.
  • An Apache Spark job that computes word frequencies directly from the .sgm files.
  • A Java sentiment analysis implementation that uses a Bag-of-Words model to determine the polarity of words.
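The parsing step can be sketched with plain regular expressions. This is a minimal, dependency-free illustration, not the project's actual parser. It assumes the Reuters-21578 SGML layout, in which each article is wrapped in `<REUTERS ...>` and carries `<TITLE>` and `<BODY>` elements. The `SgmParser` class, the `NewsArticle` record and the sample string are hypothetical. The MongoDB insert is left out so the sketch runs without the driver; in the project, each extracted article would then be written to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SgmParser {

    // Hypothetical record holding the fields extracted from one article.
    public record NewsArticle(String title, String body) {}

    // Assumption: each article sits between <REUTERS ...> and </REUTERS>.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    // Returns the first capture group of the pattern, trimmed, or "" if the tag is absent.
    private static String extract(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    // Splits raw .sgm content into articles and pulls out title and body.
    public static List<NewsArticle> parse(String sgm) {
        List<NewsArticle> articles = new ArrayList<>();
        Matcher m = ARTICLE.matcher(sgm);
        while (m.find()) {
            String block = m.group(1);
            articles.add(new NewsArticle(extract(TITLE, block), extract(BODY, block)));
        }
        return articles;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed Reuters-21578 shape; the second article has no <BODY>.
        String sample = "<REUTERS NEWID=\"1\"><TEXT><TITLE>OIL PRICES RISE</TITLE>"
                + "<BODY>Crude oil prices rose today.</BODY></TEXT></REUTERS>"
                + "<REUTERS NEWID=\"2\"><TEXT><TITLE>MARKETS FALL</TITLE></TEXT></REUTERS>";
        for (NewsArticle a : parse(sample)) {
            System.out.println(a.title() + " | " + a.body());
        }
    }
}
```

Regular expressions are enough for this flat, well-delimited format. A real SGML parser would also decode character entities such as `&lt;`, which this sketch leaves untouched.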

Features

  • Data Parsing and Storage: Extracts news articles from .sgm files and stores them in a MongoDB database.
  • Word Frequency Analysis: Uses Apache Spark to count word frequencies in news articles.
  • Sentiment Analysis: Implements a Bag-of-Words model in Java to classify news article titles as positive, negative, or neutral.
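A Bag-of-Words title classifier might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation. Each title is lowercased and split into a word-count map, with word order discarded. Each occurrence of a positive word adds 1 and each negative word subtracts 1. A score above zero is labeled positive, below zero negative, and zero neutral. The `POSITIVE` and `NEGATIVE` word sets are hypothetical placeholders, because the project's actual polarity lexicon is not described here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TitleSentiment {

    // Hypothetical placeholder lexicons; the project's real word lists are not given.
    private static final Set<String> POSITIVE = Set.of("rise", "gain", "growth", "profit", "up");
    private static final Set<String> NEGATIVE = Set.of("fall", "loss", "decline", "cut", "down");

    // Bag of words: lowercase token -> number of occurrences, ignoring order.
    public static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> bag = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                bag.merge(token, 1, Integer::sum);
            }
        }
        return bag;
    }

    // Score = positive occurrences minus negative occurrences.
    public static int score(String title) {
        int score = 0;
        for (Map.Entry<String, Integer> e : bagOfWords(title).entrySet()) {
            if (POSITIVE.contains(e.getKey())) {
                score += e.getValue();
            } else if (NEGATIVE.contains(e.getKey())) {
                score -= e.getValue();
            }
        }
        return score;
    }

    // Maps the sign of the score to a label.
    public static String classify(String title) {
        int s = score(title);
        return s > 0 ? "positive" : s < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        String[] titles = {"OIL PRICES RISE ON PROFIT GAIN", "MARKETS FALL", "FED MEETS TODAY"};
        for (String t : titles) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

With these placeholder lists, the first title scores +3 and is labeled positive, the second scores -1 and is labeled negative, and the third matches no lexicon word and is labeled neutral. Swapping in a larger polarity lexicon changes only the two sets.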

Technologies Used

  • Java
  • MongoDB
  • Apache Spark
  • Bag-of-Words Model