Following the Spring 2023 CS 4395 course taught by Karen Mazidi at UT Dallas
A document discussing basic background information (definition, history, personal interest) about NLP.
A program that takes a CSV file and processes its entries with format checking. It then creates a dictionary from all of the entries and saves it to a pickle file. It immediately unpacks the pickle file and uses the restored objects and their methods.
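The flow described above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code: the `Person` class, the column names (`last`, `first`, `id`), and the ID pattern are all assumptions made for the example, and the pickle round trip is done in memory rather than through a file.

```python
import csv
import io
import pickle
import re


class Person:
    """Hypothetical record type; the field names are assumptions for this sketch."""

    def __init__(self, last, first, person_id):
        self.last = last
        self.first = first
        self.person_id = person_id

    def display(self):
        return f"{self.person_id}: {self.first} {self.last}"


def load_people(csv_file):
    """Build a dict of Person objects keyed by ID, skipping rows whose ID is malformed."""
    people = {}
    for row in csv.DictReader(csv_file):
        person_id = row["id"].strip().upper()
        # Assumed ID format for illustration: two letters followed by four digits.
        if not re.fullmatch(r"[A-Z]{2}\d{4}", person_id):
            continue
        people[person_id] = Person(
            row["last"].strip().capitalize(),
            row["first"].strip().capitalize(),
            person_id,
        )
    return people


# Stand-in for the real data file, so the sketch runs on its own.
sample = io.StringIO("last,first,id\nsmith,JANE,ab1234\ndoe,john,bad-id\n")
people = load_people(sample)

# Round trip through pickle; pickle.dump / pickle.load work the same way with a file.
restored = pickle.loads(pickle.dumps(people))
print(restored["AB1234"].display())
```

After unpickling, the restored objects keep their methods, which is why `display()` can be called directly on the loaded entry.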
Download the contents of the folder labeled "Homework1" and open it in your preferred IDE. Pass the location of the data file as a command-line argument (sys.argv); in my case it was data\data.csv. Set the script path to the .py file named "Homework1_cmt180004.py".
Python is useful for text processing because of how flexibly it treats data. Strings are sequences of characters, so they support indexing, slicing, and iteration much like lists (although, unlike lists, they are immutable), which keeps operations simple and straightforward. Python also has many built-in methods and checks for text, such as testing whether a string is alphabetic, changing its case, or checking whether it is empty. This flexibility can also be a weakness: code can easily run without raising errors yet still behave unexpectedly, which requires extensive testing to catch.
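A few lines are enough to illustrate both sides of this. The values below are made up for the example and are not taken from the assignment.

```python
word = "Hello"

# Indexing and slicing work on strings as on any sequence.
first, last, middle = word[0], word[-1], word[1:4]
print(first, last, middle)

# Built-in checks and case conversion.
print(word.isalpha(), word.islower(), word.upper())

# The empty string is falsy, so "not text" is a common emptiness check.
print(not "")

# The weakness: this runs without any error, but it repeats the string
# instead of doubling a number.
doubled = "10" * 2
print(doubled)
```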
This assignment taught me how to use regular expressions. I had also never used pickle files before. They seem helpful when developing code and working with data that would waste a lot of time to process over and over. Instead, the processed data can be saved in a pickle file and unpacked for future steps. This assignment was also a useful review of Python lists and classes.
A program that accepts a text file and does some preprocessing (including calculating lexical diversity, filtering, and POS tagging using NLTK) before starting a hangman game with the user.
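Lexical diversity is commonly computed as the number of unique tokens divided by the total number of tokens. The sketch below assumes that definition and uses a simple regular-expression tokenizer in place of NLTK so that it runs without extra dependencies; the function name and sample text are hypothetical.

```python
import re


def lexical_diversity(text):
    """Return unique tokens / total tokens, or 0.0 for text with no tokens."""
    # Simplified tokenizer (an assumption): lowercase runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
print(round(lexical_diversity(sample), 2))
```

A value close to 1 means few words are repeated, while a lower value means the text reuses the same words often.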
A program that explores different features of WordNet and sentiment analysis.
This assignment uses two programs. The first uses the given texts to train three models, each representing a different language. The second unpacks the models and runs them on test data to identify the language of each test item. Finally, the models' outputs are compared with the solution given by a human annotator to assess their performance.
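The final comparison step can be sketched as a plain accuracy calculation. The labels below are invented for illustration; they are not the assignment's data, and the languages named are assumptions.

```python
def accuracy(predicted, gold):
    """Fraction of predictions that match the annotator's labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical model output and human-annotated solution, one label per test line.
predicted = ["English", "French", "Italian", "English"]
gold = ["English", "French", "French", "English"]

# Record which test lines were misclassified, for inspection.
wrong_lines = [i + 1 for i, (p, g) in enumerate(zip(predicted, gold)) if p != g]
print(accuracy(predicted, gold), wrong_lines)
```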
A document that uses three kinds of parsers for a complex sentence. All of these parsers aim to reduce ambiguity in different ways.
A program that recursively scrapes websites to get info about a predefined topic (Mediterranean food!). It does its best to filter out what may not be helpful and keep what is. It builds a knowledge base that can have further applications, such as a chatbot.
A report in which I created three different ML models that attempt sentiment analysis, a common application of NLP. I found that logistic regression performed best with this data set and my chosen hyperparameters.
A program that attempts to implement a chatbot that can answer AI/ML/NLP-related questions.
A program that does multi-class classification using deep learning. Two different architectures are examined and embedding is tested.
Throughout this semester I learned about the many uses of natural language processing and the approaches to it, and I was able to strengthen the skills listed here. As with most technology, advancements are constantly being made. I believe this is especially true for NLP because, to be honest, some of the libraries I used were far from perfect. Language is hard even for the complex human mind: a new language can take thousands of years to evolve from an existing one, and no one fully masters a language in their lifetime. I hope to stay in the know about new developments in NLP, and not just the 'hype' ones. I have a strong interest in literature, so I think I may one day use the skills I learned in this class. I hope to become a data scientist, and this class further confirmed my interest in statistics and artificial intelligence.