Text-Mining-with-movie-scripts

General info

In this project, we use a dataset of movie scripts to perform text mining tasks such as classification, clustering, and topic analysis. The dataset was built by scraping web pages that host movie scripts and was then enriched with additional information from the TMDB API. For the purpose of this project, we use only the raw corpus of the scripts and the information about the main genres of each movie.

The movie scripts are represented in several ways. We start with a document-term matrix and then investigate representations better suited to the long and sparse nature of the data: dimensionality reduction with SVD (https://en.wikipedia.org/wiki/Singular_value_decomposition) and word embeddings from Word2Vec (https://en.wikipedia.org/wiki/Word2vec) and GloVe (https://en.wikipedia.org/wiki/GloVe_(machine_learning)). We use these representations to perform our tasks and analysis. We also use t-SNE (https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) to reduce the dimensionality further, to just 2 components, so that the scripts can be visualized in a plane.
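As an illustration of the first two representations, here is a minimal sketch of a document-term matrix followed by a truncated SVD. It is not the code from the notebooks: the toy `scripts` list and the helper names `build_document_term_matrix` and `reduce_with_svd` are assumptions made for this example, and it uses plain numpy rather than whatever the notebooks rely on.

```python
from collections import Counter

import numpy as np

# Toy stand-ins for movie scripts (hypothetical data, not from the dataset).
scripts = [
    "the detective follows the killer through the city",
    "the killer hides from the detective",
    "two friends fall in love in the city",
    "love letters between two friends",
]


def build_document_term_matrix(docs):
    """Return (matrix, vocabulary): one row per document, one column per term."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = np.zeros((len(docs), len(vocabulary)))
    for i, tokens in enumerate(tokenized):
        # Raw term counts; a real pipeline would likely add stop-word
        # removal and TF-IDF weighting.
        for term, count in Counter(tokens).items():
            matrix[i, column[term]] = count
    return matrix, vocabulary


def reduce_with_svd(matrix, n_components):
    """Project documents onto the top singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


dtm, vocab = build_document_term_matrix(scripts)
embedding = reduce_with_svd(dtm, n_components=2)
print(dtm.shape, embedding.shape)
```

Each row of `embedding` is a dense 2-dimensional summary of one script. On a real corpus the number of components would typically be much larger, with t-SNE applied afterwards only for plotting.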

Technologies

The project is created with:

numpy
scipy
IPython
pandas
nltk
tensorflow
sklearn
matplotlib
gensim
pyLDAvis

Needed Files:

To obtain glove.6B.300d.txt.word2vec, follow these steps.

To obtain the pre-trained Doc2Vec files, download English Wikipedia Skip-Gram (1.4GB).
