This project performs sentiment analysis on reviews from MP2L Master’s students using a text-processing pipeline based on Apache Spark (PySpark) and a logistic regression model.
- Automate the classification of student reviews (positive, neutral, negative)
- Visualize the distribution of sentiments globally, by semester, and by year
- Provide an interactive web interface to dynamically explore the results
- Python 3: Main programming language
- Apache Spark (PySpark): Distributed computing framework for large-scale data processing
- Flask: Web interface
- Spark MLlib: Machine learning library in Spark used for logistic regression
- Hadoop (requires winutils.exe configuration on Windows)
- Python libraries: pyspark, pandas, matplotlib, PrettyTable, json, pathlib
sentiment-analysis-spark/
├── data/
│ └── avis_etudiants_dataset.csv # Student reviews dataset
├── src/
│ ├── main.py # Main analysis script
│ └── webapp/ # Flask interface
│ ├── app.py
│ ├── static/
│ │ └── style.css
│ ├── templates/
│ │ └── index.html
│ └── images/
│ ├── uvt_logo.png
│ └── isi_logo.png
├── output/ # Results
│ ├── results.txt # Detailed predictions for 15 samples
│ ├── results.png # Sentiment distribution chart
│ ├── sentiments_par_annee.json # Data by year (for the web interface)
│ └── sentiments_par_semestre.json # Data by semester (for the web interface)
└── README.md
- Run the data analysis:
  python src/main.py
- Launch the web interface:
  python src/webapp/app.py
- Model accuracy: ~86%
- Predictions (sample) saved in output/results.txt
- Global sentiment distribution chart automatically generated (output/results.png)
- Interactive charts by year available in the Flask interface (loaded from sentiments_par_annee.json)
- Interactive charts by semester available in the Flask interface (loaded from sentiments_par_semestre.json)
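As a rough illustration of how per-year and per-semester files like these could be produced, the sketch below counts predicted sentiments per period and writes them as JSON. The JSON layout (period, then label, then count) and the row keys `annee` and `prediction` are assumptions. The actual schema written by src/main.py is not documented here.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

# The three classes named in this README.
LABELS = ("positive", "neutral", "negative")


def count_by_period(rows, period_key):
    """Count predictions per period (year or semester).

    rows is an iterable of dicts holding period_key and a "prediction" entry
    (hypothetical field names). Labels outside LABELS are ignored.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[str(row[period_key])][row["prediction"]] += 1
    # Emit every label for every period so each chart has the same three bars.
    return {
        period: {label: counter[label] for label in LABELS}
        for period, counter in sorted(counts.items())
    }


def write_json(data, path):
    """Write data as UTF-8 JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
```

For example, `write_json(count_by_period(rows, "annee"), "output/sentiments_par_annee.json")` would produce a file in the assumed shape.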
- Global sentiment distribution visualization
- Interactive charts by year and semester
- Dynamic display using Chart.js
- Smooth navigation and clean design
- Responsive and minimalist interface
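One way a Flask app could expose the JSON results to Chart.js is a small read-only endpoint, sketched below. The route `/api/sentiments/<period>`, the `create_app` factory, and the `FILES` mapping are hypothetical names chosen for illustration. The real src/webapp/app.py may be organized differently.

```python
import json
from pathlib import Path

from flask import Flask, abort, jsonify

# File names taken from the output/ directory described in this README.
FILES = {
    "annee": "sentiments_par_annee.json",
    "semestre": "sentiments_par_semestre.json",
}


def create_app(output_dir="output"):
    """Build a Flask app that serves the precomputed sentiment counts."""
    app = Flask(__name__)

    @app.route("/api/sentiments/<period>")
    def sentiments(period):
        filename = FILES.get(period)
        if filename is None:
            abort(404)  # unknown period name
        path = Path(output_dir) / filename
        if not path.exists():
            abort(404)  # analysis has not been run yet
        with path.open(encoding="utf-8") as fh:
            return jsonify(json.load(fh))

    return app
```

On the page, a script could fetch `/api/sentiments/annee` and pass the counts to a Chart.js dataset. Calling `create_app().run(debug=True)` starts a local development server.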
- Add new data sources (surveys, forums, etc.)
- Automatically generate educational recommendations
- Implement an authentication system for personalized access
This project is distributed under the MIT License. You are free to reuse it for educational or personal purposes, provided that the original author is credited.


