questions/162_implement-k-nearest-neighbors/learn.md
+1 -20 (1 addition & 20 deletions)
@@ -7,23 +7,4 @@ The key insight is to use numpy's vectorized operations to efficiently calculate
 Convert to numpy arrays - Transform the input tuples into numpy arrays for vectorized operations
 Calculate distances - Use broadcasting to compute Euclidean distance from query point to all points at once
 Find k nearest - Use np.argsort() to get indices of points sorted by distance, then take first k
-Return as tuples - Convert the selected points back to tuple format
-
-## Key Implementation Details:
-
-Vectorized distance calculation: np.sqrt(np.sum((points_array - query_array) ** 2, axis=1)) computes all distances in one operation instead of looping
-Broadcasting: numpy automatically handles the subtraction between the query point and all data points
-Index-based sorting: np.argsort() returns the indices that would sort the distances and leaves the distance array itself unmodified; slicing the first k indices then selects the k nearest points
-Dimension agnostic: The solution works for any number of dimensions (2D, 3D, etc.) without modification
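The distance expression quoted above can be checked in isolation. The snippet below is a minimal sketch with made-up sample data (the arrays and their values are assumptions, not part of the original file); it shows the shapes involved in the broadcast and that the same expression carries over to 3D input.

```python
import numpy as np

# Hypothetical sample data: three 2D points and one query point.
points_array = np.array([(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)])  # shape (3, 2)
query_array = np.array((1.0, 2.0))                             # shape (2,)

# Broadcasting stretches the (2,) query across all 3 rows of the (3, 2) array.
diff = points_array - query_array               # shape (3, 2)
distances = np.sqrt(np.sum(diff ** 2, axis=1))  # shape (3,), one distance per point

# The same expression works unchanged for 3D input.
points_3d = np.array([(1.0, 2.0, 2.0), (0.0, 0.0, 0.0)])
query_3d = np.zeros(3)
distances_3d = np.sqrt(np.sum((points_3d - query_3d) ** 2, axis=1))
```

Summing over axis=1 collapses the coordinate axis, which is why the number of dimensions never appears in the code.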
-
-Time Complexity: O(n·d + n log n) for n points in d dimensions; for fixed d this is O(n log n), dominated by the sorting step
-Space Complexity: O(n) for the distance array, plus an O(n·d) temporary for the coordinate differences
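If the O(n log n) sort ever matters, one possible alternative (not used in the original text) is np.argpartition, which finds the k smallest entries in linear time on average, so only those k need sorting. The helper name `k_smallest_indices` is hypothetical, and the sketch assumes k is non-negative.

```python
import numpy as np

def k_smallest_indices(distances, k):
    """Indices of the k smallest distances, ordered nearest first (assumes k >= 0)."""
    n = distances.shape[0]
    k = min(k, n)
    if k == 0:
        return np.array([], dtype=np.intp)
    if k == n:
        # Nothing to partition: every index is needed, so sort them all.
        return np.argsort(distances)
    # argpartition puts the k smallest values in the first k slots, in no particular order.
    candidates = np.argpartition(distances, k - 1)[:k]
    # Sort only those k candidates by their distance.
    return candidates[np.argsort(distances[candidates])]

distances = np.array([5.0, 1.0, 4.0, 0.5, 3.0])
nearest = k_smallest_indices(distances, 2)
```

For small inputs the plain argsort version is simpler and likely just as fast; the partition step only pays off when n is much larger than k.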
-
-## Edge Case Handling:
-
-Empty points list returns empty result
-k larger than available points returns all points
-Single point datasets work correctly
-Duplicate points at same distance keep their input order only if np.argsort() is called with kind="stable"; the default quicksort does not guarantee a stable order
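The edge cases listed above can be exercised with a small helper. This is a sketch under assumptions: `nearest_order` is a hypothetical name, the empty-input guard is one way to get the stated behavior, and kind="stable" is passed explicitly so that tied distances keep their input order.

```python
import numpy as np

def nearest_order(points, query_point):
    """Indices of points ordered by distance to query_point, ties in input order."""
    # Guard: an empty list becomes a shape-(0,) array, which cannot be
    # broadcast against the query or summed over axis=1.
    if len(points) == 0:
        return np.array([], dtype=np.intp)
    diff = np.asarray(points, dtype=float) - np.asarray(query_point, dtype=float)
    distances = np.sqrt(np.sum(diff ** 2, axis=1))
    return np.argsort(distances, kind="stable")

# Indices 0 and 2 are both at distance 1.0 from the origin.
order = nearest_order([(1, 0), (3, 0), (0, 1), (0, 0)], (0, 0))

top_two = order[:2]       # nearest point, then the first of the tied pair
everything = order[:10]   # slicing past the end just returns all indices
single = nearest_order([(2, 2)], (0, 0))[:1]
empty = nearest_order([], (0, 0))
```

Because slicing never raises for an out-of-range stop, the k larger than n case needs no special handling once the indices are sorted.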
-
-The numpy approach is typically much faster than a naive Python loop, especially for large datasets, because the arithmetic runs in numpy's optimized C routines.
+Return as tuples - Convert the selected points back to tuple format
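The four steps that remain in the file after this change can be sketched end to end as follows. The diff does not show the actual solution code, so the function name, signature, and sample data here are assumptions, and the sketch leaves out edge-case guards.

```python
import numpy as np

def k_nearest_neighbors(points, query_point, k):
    """Return the k points closest to query_point as a list of tuples."""
    # Step 1: convert the input tuples to numpy arrays.
    points_array = np.array(points, dtype=float)
    query_array = np.array(query_point, dtype=float)

    # Step 2: Euclidean distance from the query to every point via broadcasting.
    distances = np.sqrt(np.sum((points_array - query_array) ** 2, axis=1))

    # Step 3: indices sorted by distance, keeping the first k.
    nearest = np.argsort(distances)[:k]

    # Step 4: hand back the selected points as tuples, preserving input values.
    return [tuple(points[i]) for i in nearest]

# Hypothetical sample data with no tied distances.
points = [(1, 2), (3, 4), (1, 1), (5, 6)]
result = k_nearest_neighbors(points, (2, 2), 2)
```

Indexing the original list in step 4 keeps the caller's integer coordinates intact instead of returning the float copies held in the array.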