rmkeezer/Clustering-Analysis

 
 


Task Checklist

  • Get the data
  • Modify the data for analysis
  • Produce basic plots/graphs for understanding the data
  • Apply simple k-means methods
  • Apply hierarchical clustering
  • Apply expectation maximization (EM)
  • Apply a new method based on research
  • Test and Compare methods
  • Produce visualizations
  • Create Final Report
  • Create presentation
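
As an illustration of the k-means item in the checklist, here is a minimal sketch of Lloyd's k-means applied to per-stock return vectors. It is written in plain Python to stay dependency-free; in practice scikit-learn's `KMeans` would be the more likely choice. The tickers `AAA` to `DDD` and their return values are made-up placeholders, not project data, and the `kmeans` helper is hypothetical.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on equal-length lists of floats."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical daily returns for four placeholder tickers.
returns = {
    "AAA": [0.010, 0.012, -0.004, 0.008],
    "BBB": [0.011, 0.013, -0.005, 0.009],
    "CCC": [-0.020, 0.001, 0.015, -0.010],
    "DDD": [-0.019, 0.000, 0.016, -0.011],
}
tickers = list(returns)
labels, centroids = kmeans([returns[t] for t in tickers], k=2)
cluster_of = dict(zip(tickers, labels))
```

With this toy data, `cluster_of` groups `AAA` with `BBB` and `CCC` with `DDD`; which group receives label 0 depends on the random initialization.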

Project Plan

Week of 6/7

  • Change topic from sentiment analysis to stock market clustering and predictions.
  • Create informal project proposal.

Week of 6/14

  • Determine as a group which stocks we would like to analyze. Currently, we plan to analyze S&P 500 stocks.
  • Determine as a group what time periods we would like to look at in order to avoid outlier years.
  • Get all group members familiar with scikit-learn and R through individual exploration.
  • Gather all data for the stocks and convert it into the format needed for analysis.

Week of 6/21

  • Produce visualization graphics using dummy data.
  • Create and test models built with different algorithms.

Week of 6/28

  • Create a visualization that demonstrates our results
  • If we get good results, experiment with the data and attempt to predict a stock's price from related stocks. This would be a form of supervised learning: the data is restructured so that the model is given the prices of the other stocks in a cluster and must predict the price of the target stock.
  • Begin work on the project progress report.
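
The supervised prediction idea in this week's plan could be sketched roughly as follows: average the prices of the other stocks in a cluster, then fit a one-variable least-squares line from that average to the target stock's price. This is only one possible framing, under the assumption that a linear relationship is a reasonable first guess; the helper names and all prices below are hypothetical.

```python
def peer_average(peer_series):
    """Average the peers' prices day by day."""
    return [sum(day) / len(day) for day in zip(*peer_series)]

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical prices: two peers from the same cluster and one target stock.
peers = [
    [10.0, 12.0, 14.0, 16.0],
    [20.0, 20.0, 22.0, 26.0],
]
target = [31.0, 33.0, 37.0, 43.0]

features = peer_average(peers)          # one feature per day
slope, intercept = fit_line(features, target)

# Predict the target's price on a new day from the peers' prices that day.
new_day_peers = [18.0, 26.0]
prediction = slope * (sum(new_day_peers) / len(new_day_peers)) + intercept
```

A real experiment would hold out later dates for testing rather than predicting on the data the line was fitted to.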

Week of 7/5

  • Finish project progress report.
  • Attempt to use alternative algorithms to cluster the data.
  • Begin work on final project report.

Week of 7/12

  • Finish final project report.
  • Begin working on the project presentation.

Individual Tasks

Task 1: Gathering data using R or Python (everyone)

  • Gather data using R or Python techniques
  • Save the data in the correct file formats for future analysis

Task 2: Determining the most important attributes to use and which machine learning techniques to implement (in short, data manipulation)

  • Analyze the importance of each attribute
  • Add or remove attributes
  • Determine which types of algorithms would work best

Task 3: Generating and testing models.

  • Design and create the optimal models using basic and advanced algorithms:
      • Support Vector Machines
      • K-Nearest Neighbor
      • Expectation Maximization
      • Density-Based Clustering
  • Test the methods on the data
  • Modify and optimize the methods based on the testing

Task 4: Visualizing results

  • Finding trends in the data results
  • Creating charts and graphs to visualize the trends
  • Creating a network structure to represent similarities between different stocks
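
One way the similarity network could be built, shown here as a hedged sketch: compute the Pearson correlation between every pair of return series and keep an edge wherever the correlation clears a threshold. The threshold of 0.8, the helper names, and the placeholder tickers and returns are all assumptions for illustration; the result is a plain adjacency dictionary that a plotting library could draw later.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def similarity_network(returns, threshold=0.8):
    """Adjacency dict: ticker -> {neighbour: correlation} for strong pairs."""
    graph = {ticker: {} for ticker in returns}
    for a, b in combinations(returns, 2):
        r = pearson(returns[a], returns[b])
        if r >= threshold:
            # Undirected edge, so store it in both directions.
            graph[a][b] = r
            graph[b][a] = r
    return graph

# Hypothetical daily returns for four placeholder tickers.
stock_returns = {
    "AAA": [0.010, 0.012, -0.004, 0.008],
    "BBB": [0.011, 0.013, -0.005, 0.009],
    "CCC": [-0.020, 0.001, 0.015, -0.010],
    "DDD": [-0.019, 0.000, 0.016, -0.011],
}
graph = similarity_network(stock_returns)
```

On this toy input the graph links `AAA` with `BBB` and `CCC` with `DDD`, and leaves the negatively correlated pairs unconnected.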

Task 5: Writing the final report

  • Combine the visual results along with the concluding ideas to form a final report

Task 6: Create the Presentation

  • Use charts and graphs to present the trends found in our data and analysis results


Languages

  • R 55.8%
  • Python 44.2%