Please note that comments are available in the notebook itself:
EE0005_Mini_Project_Fraudulent_Job_Postings_(FINAL).ipynb
Because of the large number of modules imported, we have made a duplicate Google Colab notebook here to make the notebook easier to run.
- Identify fake job postings by sentiment analysis.
- Which model is the best, in terms of metric scores, memory used and speed?
- Identify countries that are associated with fraudulent postings.
- Find features associated with fraudulent and non-fraudulent postings.
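
As a rough illustration of the model-comparison question above, the sketch below scores a single classifier on the three axes named there: a metric score (F1 on the fraudulent class), wall-clock speed, and peak memory. It is not taken from the notebook. The names `benchmark`, `f1_score_binary`, and `keyword_model`, and the four example postings, are hypothetical stand-ins, and the toy keyword rule only takes the place of a real trained model.

```python
import time
import tracemalloc
from typing import Callable, Dict, Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive class (1 = fraudulent posting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def benchmark(
    predict: Callable[[str], int],
    texts: Sequence[str],
    y_true: Sequence[int],
) -> Dict[str, float]:
    """Run `predict` over `texts` and report score, time, and peak memory.

    tracemalloc only tracks memory allocated by Python itself, so the
    figure is a lower bound for models backed by native libraries.
    """
    tracemalloc.start()
    start = time.perf_counter()
    y_pred = [predict(text) for text in texts]
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "f1": f1_score_binary(y_true, y_pred),
        "seconds": seconds,
        "peak_bytes": peak_bytes,
    }


def keyword_model(text: str) -> int:
    """Hypothetical stand-in model: flags postings that mention a wire transfer."""
    return int("wire transfer" in text.lower())


# Made-up example postings and labels, for illustration only.
texts = [
    "Earn $5000 weekly, wire transfer fee required",
    "Senior data engineer, hybrid role",
    "No experience needed, pay a wire transfer deposit",
    "Work from home, send bank details today",
]
labels = [1, 0, 1, 1]

result = benchmark(keyword_model, texts, labels)
print(result)
```

Calling `benchmark` once per candidate model on the same held-out data gives one row per model, which can then be compared side by side. F1 on the fraudulent class is used here as an assumption, since plain accuracy can look high on imbalanced data even when most fraudulent postings are missed.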
This project was submitted as part of the requirements of EE0005 Introduction to Data Science and Artificial Intelligence.
| Name (Alphabetical Order) | Contributions |
|---|---|
| Goh Lee Hua | Text pre-processing, Random forest classifier and GridSearchCV, Markdown comments |
| Hansel Tay | Lemmatization, Metrics, Oversampling and undersampling techniques |
| Philip Lee Hann Yung (Team Leader) | Feature extraction, TF-IDF vectorization, Modelling and Hyperparameter tuning, Organization of project pipeline, Markdown comments |
| Tan Keng Soon | Visualisation, Exploratory data analysis (EDA) |