Natural Language Processing projects in this repository:
- Language Modeling for sentence auto-completion: Preprocessing a text corpus, creating unigram, bigram, and trigram language models, and using the smoothed bigram and trigram models to predict the next words in a sentence. Calculating perplexity scores.
- Named Entity Recognition using BERT: Developing a Named Entity Recognition (NER) pipeline for sentences using a pre-trained BERT language model. The NER data is split into training and validation sets, and the model is evaluated on the validation set.
- Sentiment Prediction using Naive Bayes and LSTM Classifier: Classifying movie reviews from the IMDB dataset as positive or negative using a Naive Bayes classifier and a bidirectional LSTM-based classifier. Stemming and lemmatization are performed as preprocessing steps, and the accuracy scores of both models are compared.
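
As a rough illustration of the language modeling project listed above, here is a minimal sketch of a smoothed bigram model with next-word prediction and perplexity. It is not the repository's code: the function names (`train_bigram`, `bigram_prob`, `predict_next`, `perplexity`), the toy corpus, and the choice of add-one (Laplace) smoothing are all assumptions made for the example, and the project itself may use a different smoothing method.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev) (assumed smoothing scheme)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def predict_next(prev, unigrams, bigrams, vocab_size):
    """Return the most probable word to follow `prev`; <s> is never predicted."""
    candidates = [w for w in unigrams if w != "<s>"]
    return max(candidates,
               key=lambda w: bigram_prob(prev, w, unigrams, bigrams, vocab_size))

def perplexity(sentences, unigrams, bigrams, vocab_size):
    """Perplexity = exp of the negative average log-probability per predicted token."""
    log_prob, n = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_prob += math.log(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
            n += 1
    return math.exp(-log_prob / n)

# Toy corpus, invented for illustration only.
corpus = [["i", "like", "tea"], ["i", "like", "coffee"], ["you", "like", "tea"]]
unigrams, bigrams = train_bigram(corpus)
# Vocabulary of predictable tokens: every word plus </s>, but not <s>.
vocab_size = len([w for w in unigrams if w != "<s>"])

next_word = predict_next("like", unigrams, bigrams, vocab_size)
ppl = perplexity(corpus, unigrams, bigrams, vocab_size)
print(next_word, round(ppl, 3))
```

A trigram model follows the same pattern with two words of context instead of one. In practice, perplexity would be measured on held-out text rather than on the training corpus as done here.
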
Python libraries used:
- BeautifulSoup
- NLTK
- Pandas
- NumPy
- scikit-learn
- Regex
- Keras