Skip to content

Emre-Akgul/K-Means-Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KMeans Clustering Implementation in Scala

Introduction

This project implements the KMeans clustering algorithm in Scala, showcasing an efficient approach to partition a set of points in three-dimensional space into clusters. This algorithm is crucial for data analysis, pattern recognition, and image processing. It demonstrates the power of parallel collections in Scala for improving performance.

Features

Parallel Collections: Utilizes Scala's parallel collections for efficient data processing, enabling faster computation by taking advantage of multicore processors.

Custom Point Class: Defines a Point class to represent points in three-dimensional space, providing functionalities to calculate distances between points.

Modular Interface: Implements a trait KMeansInterface to define the core functionalities of the KMeans algorithm, ensuring a clear and modular structure.

Efficient Clustering: Includes methods for classifying points into clusters based on their proximity to the centroids, updating centroids based on classified points, and checking for convergence.

Performance Measurement: Incorporates performance measurement using ScalaMeter, allowing for benchmarking of sequential vs. parallel execution times, showcasing the speedup achieved through parallelism. Utility Functions: Provides utility functions for generating random points and initializing means, facilitating the testing and demonstration of the algorithm.

Installation and Setup

To utilize this KMeans clustering project:

Clone the repository: git clone https://github.com/Emre-Akgul/K-Means-Clustering/

Ensure Scala and sbt are installed on your system. In the project directory, use the following command to run sbt: sbt

Usage

After setting up the project, you can run the main application to execute the KMeans algorithm and measure its performance. Use the following command in sbt: run

For testing and benchmarking, use the following command in sbt: test

Performance Insights

The project includes a performance measurement setup that compares the execution time of the algorithm using sequential and parallel collections. It demonstrates the effectiveness of parallelism in Scala for computational tasks like KMeans clustering.

Contact Information

For any additional questions or collaboration opportunities, feel free to contact me at emreakgulcs@gmail.com or visit my GitHub Profile.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages