- Get the data
- Modify the data for analysis
- Produce basic plots/graphs for understanding the data
- Apply simple k-means methods
- Apply hierarchical clustering
- Apply expectation maximization (EM)
- Apply a new method based on research
- Test and compare methods
- Produce visualizations
- Create final report
- Create presentation
- Change topic from sentiment analysis to stock market clustering and predictions.
- Create informal project proposal.
- Determine as a group which stocks to analyze. Currently, we plan to analyze S&P 500 stocks.
- Determine as a group which time periods to examine in order to avoid outlier years.
- Get all group members familiar with scikit-learn and R through individual exploration.
- Gather all stock data and convert it into the format needed for analysis
- Produce visualization graphics using dummy data
- Create and test models built with different algorithms
- Create a visualization that demonstrates our results
- If we get good results, experiment with the data and attempt to predict stock prices from related stocks. This would be a form of supervised learning: the data would be restructured so that the prices of the other stocks in a cluster are the inputs and the price of our target stock is the value to predict.
- Begin work on the project progress report.
- Finish project progress report.
- Attempt to use alternative algorithms to cluster the data.
- Begin work on final project report.
- Finish final project report.
- Begin working on the project presentation.
- Gather data using R or Python techniques
- Save the data in the correct file formats for future analysis
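
The data-gathering and conversion steps above could look roughly like the following sketch. It is only an illustration under stated assumptions: the prices are randomly generated dummy data (no real stock data is downloaded), the tickers `AAA`, `BBB`, and `CCC` and the helper names `make_dummy_prices` and `to_log_returns` are hypothetical, and daily log returns are just one possible representation for the later clustering steps.

```python
import io

import numpy as np
import pandas as pd


def make_dummy_prices(tickers, n_days=250, seed=0):
    """Build a table of fake closing prices (random walks), one column per ticker."""
    rng = np.random.default_rng(seed)
    # Small random daily log-price changes, accumulated into a price path.
    steps = rng.normal(0.0, 0.01, size=(n_days, len(tickers)))
    prices = 100.0 * np.exp(np.cumsum(steps, axis=0))
    dates = pd.bdate_range("2015-01-01", periods=n_days)
    return pd.DataFrame(prices, index=dates, columns=tickers)


def to_log_returns(prices):
    """Convert prices to daily log returns; the first day has no return and is dropped."""
    return np.log(prices / prices.shift(1)).dropna()


# Hypothetical tickers standing in for the stocks the group selects.
prices = make_dummy_prices(["AAA", "BBB", "CCC"])
returns = to_log_returns(prices)

# Clustering libraries expect one row per item, so put one stock on each row.
features = returns.T

# Save as CSV. An in-memory buffer is used here; a file path works the same way.
buffer = io.StringIO()
features.to_csv(buffer)

# Read the saved data back to confirm the format survives a round trip.
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col=0)
```

Storing one stock per row means the saved file can be handed directly to a clustering routine in either Python or R without reshaping.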
Task 2: Determine the most important attributes to use and which machine learning techniques to implement (in short, data manipulation)
- Analyze importance of each attribute
- Add or remove attributes
- Determine which types of algorithms would work best
- Design and create the optimal models using basic and advanced algorithms
- Support Vector Machines
- K-Nearest Neighbor
- Expectation Maximization
- Density-Based Clustering
- Test the methods on the data
- Modify and optimize the methods based on the testing
- Find trends in the results
- Create charts and graphs to visualize the trends
- Create a network structure to represent similarities between different stocks
- Combine the visual results with our conclusions to form a final report
- Use charts and graphs to present the trends found in our data and analysis results
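
As a rough sketch of the "test and compare methods" step, the clustering algorithms named in this plan (k-means, hierarchical clustering, expectation maximization, and density-based clustering) can be run on the same data and compared with a single score. Everything below is an assumption made for illustration: the data are synthetic points in three well-separated groups rather than real stock features, the parameter values (`n_clusters=3`, `eps=0.5`, `min_samples=5`) are arbitrary, and the silhouette score is only one of several possible comparison metrics.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Dummy data: three hypothetical groups of 20 "stocks", each described by
# two made-up features (for example, mean return and volatility).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(20, 2)) for c in centers])

# Put both features on the same scale so neither dominates the distances.
X = StandardScaler().fit_transform(X)

# Run each method on the same data and collect its cluster labels.
labelings = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "EM (Gaussian mixture)": GaussianMixture(
        n_components=3, random_state=0
    ).fit_predict(X),
    "density-based (DBSCAN)": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
}

scores = {}
for name, labels in labelings.items():
    # DBSCAN marks noise points with -1; do not count that as a cluster.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    # The silhouette score is only defined for two or more clusters.
    # Note: if DBSCAN reports noise, those points are scored as one extra group.
    if n_clusters >= 2:
        scores[name] = silhouette_score(X, labels)
    else:
        scores[name] = float("nan")

for name, score in scores.items():
    print(f"{name}: silhouette = {score:.3f}")
```

A higher silhouette score means tighter, better-separated clusters. On real stock data the methods are likely to disagree far more than on this toy example, which is where the comparison becomes informative.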