This project processes and analyzes Reuters news data. It includes:
- A Java application for parsing and storing news articles in MongoDB.
- An Apache Spark job for word frequency analysis directly from .sgm files.
- A Java-based sentiment analysis implementation using a Bag-of-Words model that scores the polarity of individual words.
- Data Parsing and Storage: Extracts news articles from .sgm files and stores them in a MongoDB database.
- Word Frequency Analysis: Utilizes Apache Spark to count word frequencies in news articles.
- Sentiment Analysis: Implements a Bag-of-Words model in Java to classify news article titles as positive, negative, or neutral.
- Java
- MongoDB
- Apache Spark
- Bag-of-Words Model
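
To illustrate the sentiment analysis step, here is a minimal sketch of how a Bag-of-Words classifier for article titles might look. It is not the project's actual code: the class name `TitleSentiment`, the method names, and the tiny word lists are hypothetical placeholders, and it assumes the polarity of a title is the count of positive words minus the count of negative words.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical class for illustration; not taken from the project's source.
final class TitleSentiment {

    // Placeholder lexicons (assumption): a real run would load full
    // positive and negative word lists from files.
    private static final Set<String> POSITIVE =
            Set.of("gain", "gains", "growth", "profit", "rise", "strong");
    private static final Set<String> NEGATIVE =
            Set.of("loss", "losses", "fall", "drop", "weak", "crisis");

    // Treats the title as a bag of words: every occurrence of a positive
    // word adds 1 to the score, every occurrence of a negative word subtracts 1.
    static int score(String title) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter a-z.
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        return score;
    }

    // Maps the numeric score to one of the three labels.
    static String classify(String title) {
        int s = score(title);
        if (s > 0) {
            return "positive";
        }
        return s < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("Strong profit growth at ACME"));
        System.out.println(classify("Oil prices fall amid crisis"));
        System.out.println(classify("Company announces meeting"));
    }
}
```

A title whose positive and negative words cancel out, or that contains no lexicon words at all, is labeled neutral. Word order and negation are ignored, which is the usual trade-off of a Bag-of-Words approach.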