This repository was archived by the owner on Jan 31, 2024. It is now read-only.
K-means Clustering #2
Hi Imad,
Your article, K-means Clustering: Algorithm, Applications, Evaluation Methods, and Drawbacks, is great!
I would like to point out a couple of issues:
- In the Kmeans class implementation:
  - In `def initializ_centroids`, simply putting `np.random.RandomState` on a line by itself has no effect (FYR). You could do:

    ```python
    r = np.random.RandomState(self.random_state)
    random_idx = r.permutation(X.shape[0])
    ```

  - In `def predict`, `old_centroids` is out of scope. You could do:

    ```python
    distance = self.compute_distance(X, self.centroids)
    ```

  - In `def compute_distance`, squaring the calculated distance is unnecessary, although it doesn't hurt. I would do:

    ```python
    distance[:, k] = norm(X - centroids[k, :], axis=1)
    ```
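Taken together, the three fixes above can be sketched as one minimal runnable class. The method and attribute names follow the article's snippets; everything else (the constructor arguments, the centroid selection) is assumed here purely for illustration, not taken from the article:

```python
import numpy as np
from numpy.linalg import norm

class Kmeans:
    """Minimal sketch combining the three suggested fixes."""

    def __init__(self, n_clusters, random_state=42):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def initializ_centroids(self, X):
        # Fix 1: seed a RandomState object and actually use it.
        r = np.random.RandomState(self.random_state)
        random_idx = r.permutation(X.shape[0])
        self.centroids = X[random_idx[:self.n_clusters]]
        return self.centroids

    def compute_distance(self, X, centroids):
        # Fix 3: plain Euclidean distance; squaring it is unnecessary.
        distance = np.zeros((X.shape[0], self.n_clusters))
        for k in range(self.n_clusters):
            distance[:, k] = norm(X - centroids[k, :], axis=1)
        return distance

    def predict(self, X):
        # Fix 2: use self.centroids rather than the out-of-scope old_centroids.
        distance = self.compute_distance(X, self.centroids)
        return np.argmin(distance, axis=1)
```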
- In the Image Compression instance, the following description doesn't make sense to me:

  > "The original image size was 396 x 396 x 24 = 3,763,584 bits; however, the new compressed image would be 30 x 24 + 396 x 396 x 4 = 627,984 bits. The huge difference comes from the fact that we’ll be using centroids as a lookup for pixels’ colors and that would reduce the size of each pixel location to 4-bit instead of 8-bit."

  - The original size is 396 x 396 x 24 bits because the image has 396 x 396 pixels in total and each pixel uses a 24-bit color representation. After compression, however, the image has 30 colors, which require at least 5 bits per pixel to index (4 bits can represent only 16 colors). Adding the overhead of storing the 30 colors themselves, the total should be 30 x 24 + 396 x 396 x 5 bits.
  - The number of bits at each pixel location is reduced to 5-bit from 24-bit, not to 4-bit from 8-bit.
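For reference, the corrected arithmetic can be checked directly. This is just a small sketch of the calculation; the 5-bit index width follows from ceil(log2(30)):

```python
import math

width = height = 396          # image dimensions from the article
n_colors = 30                 # number of k-means centroids used as a palette

original_bits = width * height * 24              # 24-bit color per pixel
bits_per_index = math.ceil(math.log2(n_colors))  # 5 bits: 4 bits cover only 16 colors
palette_bits = n_colors * 24                     # overhead of storing the 30 colors
compressed_bits = palette_bits + width * height * bits_per_index

print(original_bits)    # 3,763,584 bits, matching the article
print(compressed_bits)  # 784,800 bits with the corrected 5-bit indices
```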
Thanks!