The file news.csv contains text extracted from online news articles, each labeled with a classification based on its content. There are five classifications: economy, sports, science, culture and entertainment.
We create a function that reads the data from the CSV file and prints each of the five classifications along with a list of its x most frequent words. The function takes two parameters: the name of the file to read and the number of words to show per classification.
We use pandas, NLTK, gensim and the standard-library collections module.
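As a rough illustration of the counting logic only, here is a minimal sketch that depends on nothing outside the standard library. It is not the actual implementation: it swaps pandas for `csv.DictReader`, and swaps the NLTK and gensim text processing for a simple regex tokenizer and a tiny hand-written stop-word set. The column names `text` and `category`, and the function names `tokenize`, `top_words_by_category` and `print_top_words`, are assumptions made for the example; the real layout of news.csv may differ.

```python
import csv
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word set (an assumption for this sketch);
# a full stop-word list such as NLTK's would be the more complete choice.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_words_by_category(rows, n):
    """Map each category to a list of its n most frequent words.

    rows is an iterable of (category, text) pairs.
    """
    counters = defaultdict(Counter)
    for category, text in rows:
        counters[category].update(tokenize(text))
    return {
        category: [word for word, _ in counter.most_common(n)]
        for category, counter in counters.items()
    }


def print_top_words(filename, n, text_col="text", category_col="category"):
    """Read the CSV file and print the n most frequent words per category.

    text_col and category_col are hypothetical column names.
    """
    with open(filename, newline="", encoding="utf-8") as f:
        rows = [(row[category_col], row[text_col]) for row in csv.DictReader(f)]
    for category, words in top_words_by_category(rows, n).items():
        print(f"{category}: {', '.join(words)}")
```

Under those assumptions, a call such as `print_top_words("news.csv", 10)` would print one line per classification. Keeping the counting in `top_words_by_category` separate from the file reading makes it possible to check the logic on a few in-memory rows without the CSV file.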