Skip to content

Stack to collect bank reviews, evaluate classic ML and DL models to predict stars and classify opinions on bank service efficacy, communication and value using semantic proximity.

License

Notifications You must be signed in to change notification settings

LeWaHe/SentiBank

 
 

Repository files navigation

SentiBank

This repo is a fork from a Data Science training project @ DataScientest / La Sorbonne university.

Team:

  • Leonardo Heyerdahl (lead)
  • Alexis Garatti
  • Huazhen Hou
  • Alexandre PRZYBYLSKI

Objectives:

  • Collect banking reviews
  • Predict stars using review texts and meta data
  • Characterize user opinions regarding banking services

Results:

  • We collected 170k banking reviews in French from Truspilot
  • We evaluated the performance of classic ML models (KNN, lr, Random Forests, XGBoost) and one Transformer based model (CAMEMBERT), which outperformed (f1 = .64)
  • We created a 3 axis dichotomic taxonomy to characterize user opinions about banks:
    • Communication Good/Bad
    • Value Good / Bad
    • Efficacy Good / Bad
  • We hand labelled 200 reviews using these 6 labels
  • We evaluated different strategies for automatic labelling using
    • Different granularities (sentence based review, whole review, sentence based label references,..)
    • CamemBERT infered sentiment filtering
    • All words vs tailored stop words filtering
  • Our semantic labeler performed between .54 and .80 on test results.
  • We labelled the whole data set and produced global notations on the 3 axis for all banking actors in France

About

Stack to collect bank reviews, evaluate classic ML and DL models to predict stars and classify opinions on bank service efficacy, communication and value using semantic proximity.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.8%
  • Python 0.2%