In the process of software development, many questions arise. Developers often resort to Q&A websites like Stack Overflow to seek answers. Stack Overflow, a part of the Stack Exchange Network, allows users to ask and answer questions, vote on content, and earn reputation points and badges. Higher reputation unlocks additional privileges such as voting, commenting, and editing.
This project involves developing a web application using Spring Boot to store, analyze, and visualize Stack Overflow Q&A data related to Java programming. The goal is to understand common questions, answers, and resolution activities associated with Java programming.
- Objective: Collect Stack Overflow threads tagged `java`.
- Requirement: Gather at least 1000 threads.
- Approach:
  - Use Stack Overflow's REST API for data retrieval.
  - Persist data offline in a database (PostgreSQL, MySQL, etc.) or local files.
  - Avoid on-the-fly API requests; all analysis should be performed on collected data.
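The collection step could be sketched roughly as follows. This is a minimal sketch, not a reference implementation: it assumes version 2.3 of the Stack Exchange API and its `/questions` endpoint, assumes responses arrive gzip-compressed, and stores raw JSON pages as local files. The class name `ThreadCollector` and the file naming are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: plans and downloads pages of java-tagged questions.
final class ThreadCollector {
    // Assumption: the API allows at most 100 items per page.
    static final int PAGE_SIZE = 100;

    // Number of pages needed to reach the target thread count (ceiling division).
    static int pagesNeeded(int targetThreads) {
        return (targetThreads + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Builds the request URI for one page of questions tagged "java".
    static URI pageUri(int page) {
        return URI.create("https://api.stackexchange.com/2.3/questions"
                + "?site=stackoverflow&tagged=java&order=desc&sort=creation"
                + "&pagesize=" + PAGE_SIZE + "&page=" + page);
    }

    // Downloads one page and stores the raw JSON in outDir for offline analysis.
    static Path fetchPage(HttpClient client, int page, Path outDir)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(pageUri(page))
                .header("Accept-Encoding", "gzip")
                .GET()
                .build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        Path target = outDir.resolve("questions-page-" + page + ".json");
        // Assumption: the body is gzip-compressed, so decompress before writing.
        try (InputStream body = new GZIPInputStream(response.body())) {
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

With a target of 1000 threads, `ThreadCollector.pagesNeeded(1000)` gives 10 pages. Running the download once and analyzing the saved files keeps the application itself free of live API calls.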
Each question in this section requires:
- Identifying relevant data.
- Implementing backend data analysis.
- Visualizing results on the frontend.
Question: What are the top N most frequently asked Java topics on Stack Overflow?
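One possible way to compute this with the Stream API is sketched below. It assumes that a thread's tags stand in for its topics and that the `java` tag is excluded because every collected thread carries it. The class name `TopicFrequency` and the input shape (one tag list per thread) are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: counts tag occurrences across stored threads.
final class TopicFrequency {
    // Each inner list holds the tags of one stored thread.
    static List<Map.Entry<String, Long>> topN(List<List<String>> threadTags, int n) {
        Map<String, Long> counts = threadTags.stream()
                .flatMap(List::stream)
                // Assumption: "java" is on every thread, so it carries no signal.
                .filter(tag -> !tag.equals("java"))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Sort by count descending, breaking ties alphabetically for stable output.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

The alphabetical tie-break is a design choice that keeps chart ordering deterministic between runs.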
Question: What are the top N topics with the most engagement from high-reputation users?
- Engagement includes edits, answers, comments, upvotes, and downvotes.
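A rough sketch of one way to rank topics by high-reputation engagement follows. It assumes engagement events have already been flattened into one row per event, each with its topic and the acting user's reputation, and that "high reputation" means meeting a threshold you choose. `EngagementAnalysis`, `Activity`, and the threshold parameter are hypothetical names, and every event counts equally here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: ranks topics by activity from high-reputation users.
final class EngagementAnalysis {
    // Hypothetical flattened row: one engagement event on a thread with the given topic.
    static final class Activity {
        final String topic;
        final String kind;          // e.g. "answer", "comment", "edit"
        final int userReputation;

        Activity(String topic, String kind, int userReputation) {
            this.topic = topic;
            this.kind = kind;
            this.userReputation = userReputation;
        }
    }

    // Returns the n topics with the most events from users at or above minReputation.
    static List<String> topTopics(List<Activity> activities, int minReputation, int n) {
        Map<String, Long> counts = activities.stream()
                .filter(a -> a.userReputation >= minReputation)
                .collect(Collectors.groupingBy(a -> a.topic, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Weighting event kinds differently (for example, answers above comments) would be a natural variation; the `kind` field is kept for that purpose.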
Question: What are the top N errors and exceptions frequently discussed by Java developers?
- Classify errors as fatal errors (e.g., `OutOfMemoryError`) and exceptions (e.g., checked and runtime exceptions).
- Extract error-related data from thread content using techniques like regular expressions.
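The regular-expression approach might look like the sketch below. It assumes that any capitalized identifier ending in `Exception` or `Error` is a mention worth counting, and it classifies by suffix alone, which is a heuristic: splitting checked from runtime exceptions would need a lookup against the actual class hierarchy. The class name `ErrorExtractor` and the pattern are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: finds error and exception names in thread text.
final class ErrorExtractor {
    // Matches simple class names such as NullPointerException or OutOfMemoryError.
    private static final Pattern NAME =
            Pattern.compile("\\b[A-Z][A-Za-z0-9]*(?:Exception|Error)\\b");

    // Counts every mention across the given texts (titles, bodies, answers, comments).
    static Map<String, Long> countMentions(List<String> texts) {
        return texts.stream()
                .flatMap(text -> NAME.matcher(text).results())
                .map(MatchResult::group)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Heuristic classification by suffix only.
    static String classify(String name) {
        return name.endsWith("Error") ? "error" : "exception";
    }
}
```

Because the pattern requires at least one character before the suffix, the bare words "Error" and "Exception" in ordinary prose are not counted.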
Question: What factors contribute to high-quality answers (accepted or highly upvoted answers)?
- Investigate:
  - Elapsed time between question creation and first answer.
  - Reputation of the answering user.
- Propose one additional factor affecting answer quality.
- Visualize findings effectively.
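As a starting point for the elapsed-time factor, the sketch below compares the average answer delay for accepted and non-accepted answers. It assumes each stored answer is available together with its question's creation time; to study first answers only, filter the input beforehand. `AnswerTiming` and `AnswerSample` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: relates answer delay to acceptance.
final class AnswerTiming {
    // Hypothetical row: one answer together with its question's creation time.
    static final class AnswerSample {
        final Instant questionCreated;
        final Instant answerCreated;
        final boolean accepted;

        AnswerSample(Instant questionCreated, Instant answerCreated, boolean accepted) {
            this.questionCreated = questionCreated;
            this.answerCreated = answerCreated;
            this.accepted = accepted;
        }
    }

    // Average delay in minutes, keyed by whether the answer was accepted.
    static Map<Boolean, Double> averageDelayMinutes(List<AnswerSample> samples) {
        return samples.stream().collect(Collectors.partitioningBy(
                s -> s.accepted,
                Collectors.averagingLong(
                        s -> Duration.between(s.questionCreated, s.answerCreated).toMinutes())));
    }
}
```

The same partitioning works for the reputation factor by averaging the answerer's reputation instead of the delay. Averages can hide skew, so a histogram or box plot of the raw delays may be a better basis for the visualization.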
Develop a REST API to answer the following questions in JSON format:
- Topic Frequency: Retrieve the frequency of a specific topic or the top N topics sorted by frequency.
- Bug Frequency: Retrieve the frequency of a specific error/exception or the top N errors/exceptions sorted by frequency.
- Implement data analysis using Java Collections, Lambda, and Stream API.
- Backend: Spring Boot.
- Database: PostgreSQL/MySQL (or local files for storage).
- API responses should be in JSON format.
- Start data collection early due to API rate limits and potential instabilities.
- Ensure meaningful data analysis and effective visualizations.
- Utilize frontend charts for presenting insights.
- Use `docker compose up -d --build` to start the services.
- Run the project.
- Check the Dockerfile configuration for database connection details.
Deadline: 2024-11-10