This project implements the KMeans clustering algorithm in Scala, showcasing an efficient approach to partition a set of points in three-dimensional space into clusters. This algorithm is crucial for data analysis, pattern recognition, and image processing. It demonstrates the power of parallel collections in Scala for improving performance.
Parallel Collections: Utilizes Scala's parallel collections for efficient data processing, enabling faster computation by taking advantage of multicore processors.
Custom Point Class: Defines a Point class to represent points in three-dimensional space, providing functionalities to calculate distances between points.
Modular Interface: Implements a trait KMeansInterface to define the core functionalities of the KMeans algorithm, ensuring a clear and modular structure.
Efficient Clustering: Includes methods for classifying points into clusters based on their proximity to the centroids, updating centroids based on classified points, and checking for convergence.
Performance Measurement: Incorporates performance measurement using ScalaMeter, allowing for benchmarking of sequential vs. parallel execution times, showcasing the speedup achieved through parallelism. Utility Functions: Provides utility functions for generating random points and initializing means, facilitating the testing and demonstration of the algorithm.
To utilize this KMeans clustering project:
Clone the repository: git clone https://github.com/Emre-Akgul/K-Means-Clustering/
Ensure Scala and sbt are installed on your system. In the project directory, use the following command to run sbt: sbt
After setting up the project, you can run the main application to execute the KMeans algorithm and measure its performance. Use the following command in sbt: run
For testing and benchmarking, use the following command in sbt: test
The project includes a performance measurement setup that compares the execution time of the algorithm using sequential and parallel collections. It demonstrates the effectiveness of parallelism in Scala for computational tasks like KMeans clustering.
For any additional questions or collaboration opportunities, feel free to contact me at emreakgulcs@gmail.com or visit my GitHub Profile.