This repository was archived by the owner on Jan 31, 2024. It is now read-only.
K-means Clustering #2
Hi Imad,
Your article, K-means Clustering: Algorithm, Applications, Evaluation Methods, and Drawbacks, is great!
I would like to point out a couple of issues:
- In the Kmeans class implementation:
  - In `def initializ_centroids`, simply putting `np.random.RandomState` on a line by itself has no effect (FYR). You could do:

    ```python
    r = np.random.RandomState(self.random_state)
    random_idx = r.permutation(X.shape[0])
    ```

  - In `def predict`, `old_centroids` is out of scope. You could do:

    ```python
    distance = self.compute_distance(X, self.centroids)
    ```

  - In `def compute_distance`, squaring the calculated distance is unnecessary, although it doesn't hurt. I would do:

    ```python
    distance[:, k] = norm(X - centroids[k, :], axis=1)
    ```
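Taken together, the three fixes above can be sketched as one minimal runnable class. The method and attribute names follow the article's snippets; everything else (the constructor arguments, the centroid selection) is assumed here purely for illustration, not taken from the article:

```python
import numpy as np
from numpy.linalg import norm

class Kmeans:
    """Minimal sketch combining the three suggested fixes."""

    def __init__(self, n_clusters, random_state=42):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def initializ_centroids(self, X):
        # Fix 1: seed a RandomState object and actually use it.
        r = np.random.RandomState(self.random_state)
        random_idx = r.permutation(X.shape[0])
        self.centroids = X[random_idx[:self.n_clusters]]
        return self.centroids

    def compute_distance(self, X, centroids):
        # Fix 3: plain Euclidean distance; squaring it is unnecessary.
        distance = np.zeros((X.shape[0], self.n_clusters))
        for k in range(self.n_clusters):
            distance[:, k] = norm(X - centroids[k, :], axis=1)
        return distance

    def predict(self, X):
        # Fix 2: use self.centroids rather than the out-of-scope old_centroids.
        distance = self.compute_distance(X, self.centroids)
        return np.argmin(distance, axis=1)
```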
- In the Image Compression instance, the following description doesn't make sense to me:

  > "The original image size was 396 x 396 x 24 = 3,763,584 bits; however, the new compressed image would be 30 x 24 + 396 x 396 x 4 = 627,984 bits. The huge difference comes from the fact that we’ll be using centroids as a lookup for pixels’ colors and that would reduce the size of each pixel location to 4-bit instead of 8-bit."

  - The original size is 396 x 396 x 24 bits because the image has 396 x 396 pixels in total and each pixel uses a 24-bit color representation. After compression, however, the image has 30 colors, which require at least 5 bits per pixel to index (4 bits can represent only 16 colors). Adding the overhead of storing the 30 colors themselves, the total should be 30 x 24 + 396 x 396 x 5 bits.
  - The number of bits at each pixel location is reduced to 5-bit from 24-bit, not to 4-bit from 8-bit.
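For reference, the corrected arithmetic can be checked directly. This is just a small sketch of the calculation; the 5-bit index width follows from ceil(log2(30)):

```python
import math

width = height = 396          # image dimensions from the article
n_colors = 30                 # number of k-means centroids used as a palette

original_bits = width * height * 24              # 24-bit color per pixel
bits_per_index = math.ceil(math.log2(n_colors))  # 5 bits: 4 bits cover only 16 colors
palette_bits = n_colors * 24                     # overhead of storing the 30 colors
compressed_bits = palette_bits + width * height * bits_per_index

print(original_bits)    # 3,763,584 bits, matching the article
print(compressed_bits)  # 784,800 bits with the corrected 5-bit indices
```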
Thanks!