This repository contains two projects that use Apache Spark for large-scale data processing and machine learning. Each project shows how Spark distributes computation-heavy work so that large datasets can be processed efficiently.
Linear Regression and Gradient Descent
- Implements linear regression using both the normal equation and gradient descent optimization techniques.
- Utilizes Spark RDDs and Breeze for matrix operations and inversion.
- Includes RMSE calculation for model performance evaluation.
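
The three pieces listed above (normal equation, gradient descent, RMSE) can be sketched on an RDD roughly as follows. This is a minimal illustration, not the code in Linear_Regression/: the object name `LinearRegressionSketch`, the `Example` type alias, the toy data, and the learning rate and iteration count are all assumptions made for the example. Each feature vector is assumed to carry a leading 1.0 for the intercept term.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, inv}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch; names and data are illustrative only.
object LinearRegressionSketch {
  // One training example: features x (leading 1.0 = intercept) and target y.
  type Example = (DenseVector[Double], Double)

  // Normal equation: theta = (X^T X)^{-1} X^T y.
  // X^T X and X^T y are built as sums of per-example terms, so each
  // partition contributes a small d x d matrix and a d-vector.
  def normalEquation(data: RDD[Example]): DenseVector[Double] = {
    val xtx: DenseMatrix[Double] = data.map { case (x, _) => x * x.t }.reduce(_ + _)
    val xty: DenseVector[Double] = data.map { case (x, y) => x * y }.reduce(_ + _)
    inv(xtx) * xty // Breeze performs the inversion on the driver
  }

  // Batch gradient descent on the mean squared error.
  def gradientDescent(data: RDD[Example], alpha: Double, iterations: Int): DenseVector[Double] = {
    val n = data.count().toDouble
    val dim = data.first()._1.length
    var theta = DenseVector.zeros[Double](dim)
    for (_ <- 1 to iterations) {
      val current = theta // immutable snapshot captured by the closure
      // Sum over examples of (prediction - y) * x.
      val gradient = data.map { case (x, y) => x * ((current dot x) - y) }.reduce(_ + _)
      theta = current - gradient * (alpha / n)
    }
    theta
  }

  // Root mean squared error of the predictions theta . x against y.
  def rmse(data: RDD[Example], theta: DenseVector[Double]): Double = {
    val mse = data.map { case (x, y) => math.pow((theta dot x) - y, 2) }.mean()
    math.sqrt(mse)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("LinearRegressionSketch")
    val sc = new SparkContext(conf)
    try {
      // Toy data generated from y = 1 + 2x, so both solvers should
      // recover coefficients close to (1.0, 2.0).
      val points: Seq[Example] =
        (0 to 4).map(_.toDouble).map(x => (DenseVector(1.0, x), 1.0 + 2.0 * x))
      val data = sc.parallelize(points).cache()

      val thetaNe = normalEquation(data)
      val thetaGd = gradientDescent(data, alpha = 0.05, iterations = 2000)

      println(s"normal equation:  $thetaNe, RMSE = ${rmse(data, thetaNe)}")
      println(s"gradient descent: $thetaGd, RMSE = ${rmse(data, thetaGd)}")
    } finally {
      sc.stop()
    }
  }
}
```

The normal equation needs only one pass over the data but inverts a d x d matrix on the driver, so it suits a modest number of features. Gradient descent makes one pass per iteration and avoids the inversion, at the cost of choosing a learning rate and an iteration count.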
Matrix Multiplication using Spark
- Demonstrates efficient matrix multiplication using Spark's distributed computation capabilities.
- Handles large matrices that may not fit into the memory of a single machine, showcasing Spark's scalability.
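
One common way to multiply matrices that do not fit on one machine is to store each matrix as an RDD of (row, column, value) entries and join the two on the shared inner index. The sketch below illustrates that approach. It is an assumption about how such a multiplication could be written, not a copy of the code in Matrix_Multiplication/, and the object name `MatrixMultiplicationSketch` and the `Entry` alias are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch; names and data are illustrative only.
object MatrixMultiplicationSketch {
  // A matrix entry in coordinate form: (row, column, value).
  type Entry = (Int, Int, Double)

  // C = A * B, where C(i, j) = sum over k of A(i, k) * B(k, j).
  def multiply(a: RDD[Entry], b: RDD[Entry]): RDD[Entry] = {
    // Key A by its column index and B by its row index: the shared index k.
    val aByCol = a.map { case (i, k, v) => (k, (i, v)) }
    val bByRow = b.map { case (k, j, w) => (k, (j, w)) }

    aByCol.join(bByRow)                                     // pair every A(i, k) with every B(k, j)
      .map { case (_, ((i, v), (j, w))) => ((i, j), v * w) } // partial products
      .reduceByKey(_ + _)                                    // sum the partial products per cell
      .map { case ((i, j), sum) => (i, j, sum) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("MatrixMultiplicationSketch")
    val sc = new SparkContext(conf)
    try {
      // A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]]
      val a = sc.parallelize(Seq((0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)))
      val b = sc.parallelize(Seq((0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)))

      // collect() returns entries in no particular order, so sort before printing.
      val c = multiply(a, b).collect().sortBy { case (i, j, _) => (i, j) }
      c.foreach(println)
      // prints:
      // (0,0,19.0)
      // (0,1,22.0)
      // (1,0,43.0)
      // (1,1,50.0)
    } finally {
      sc.stop()
    }
  }
}
```

Because no step collects a whole matrix onto a single node, the inputs and the result stay distributed until the final `collect()`, which is used here only because the example is tiny. The join does shuffle data across the cluster, so for large dense matrices a block-partitioned layout is often preferred over single entries.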
Technologies used:
- Apache Spark
- Scala
- Breeze (for numerical operations in the linear regression project)
Go to the individual project directories for detailed instructions on running each project:
- Linear_Regression/: Contains the implementation and instructions for the linear regression and gradient descent project.
- Matrix_Multiplication/: Contains the implementation and instructions for the matrix multiplication project.
This repository is licensed under the MIT License. See the LICENSE file for more details.