Implementation of KNN based on the Spark-ML-LSH #3

Hash function: `h_i(x) = floor(r_i.dot(x) / bucketLength)`

* threshold = 2000
* NHT = number of hash tables
* W = bucketLength
   * The number of buckets will be `(max L2 norm of input vectors) / bucketLength`.
   * If the input vectors are normalized, 1-10 times `pow(numRecords, -1/inputDim)` would be a reasonable value.
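
For reference, here is a minimal plain-Python sketch of the hash function and the bucket-length rule of thumb quoted above. It is not the Spark implementation: the helper names (`make_projections`, `hash_vector`, `suggested_bucket_length`) are made up for illustration, and drawing each `r_i` as a unit-length Gaussian random vector is an assumption.

```python
import math
import random


def make_projections(num_tables, dim, seed=0):
    """Draw one random unit vector r_i per hash table (assumed Gaussian)."""
    rng = random.Random(seed)
    projections = []
    for _ in range(num_tables):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        projections.append([c / norm for c in v])
    return projections


def hash_vector(x, projections, bucket_length):
    """Return [h_1(x), ..., h_NHT(x)] with h_i(x) = floor(r_i.dot(x) / bucketLength)."""
    return [
        math.floor(sum(r_j * x_j for r_j, x_j in zip(r, x)) / bucket_length)
        for r in projections
    ]


def suggested_bucket_length(num_records, input_dim, factor=1.0):
    """Rule of thumb for normalized inputs: factor * pow(numRecords, -1/inputDim),
    with factor somewhere between 1 and 10."""
    return factor * num_records ** (-1.0 / input_dim)
```

For example, `hash_vector(x, make_projections(3, len(x)), 2.0)` would give the three bucket indices of `x` for NHT = 3 and W = 2. A larger W puts more points into the same bucket, which is consistent with the higher accuracy and longer query times in the table.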

k | NHT | W | Accuracy_train | Accuracy_test | T_index | T_query
---|---|---|---|---|---|---
1 | 3 | 2 | - | 0.9087 | 54 | 175848
5 | 3 | 2 | - | 0.893 | 54 | 174651
9 | 3 | 2 | - | 0.8808 | 54 | 155673
1 | 5 | 2 | - | 0.9291 | 29 | 251302
5 | 5 | 2 | - | 0.9137 | 29 | 275162
9 | 5 | 2 | - | 0.9036 | 29 | 367008
1 | 7 | 2 | - | 0.9372 | 34 | 523696
5 | 7 | 2 | - | 0.9238 | 34 | 460986
9 | 7 | 2 | - | 0.9145 | 34 | 485565
1 | 3 | 5 | - | 0.9357 | 30 | 367245
5 | 3 | 5 | - | 0.9263 | 30 | 340930
9 | 3 | 5 | - | 0.9171 | 30 | 341963
1 | 5 | 5 | - | 0.9459 | 41 | 596984
5 | 5 | 5 | - | 0.9401 | 41 | 559091
9 | 5 | 5 | - | 0.93 | 41 | 561646
1 | 7 | 5 | - | 0.9496 | 22 | 770659
5 | 7 | 5 | - | 0.9465 | 22 | 787571
9 | 7 | 5 | - | 0.9385 | 22 | 841044
1 | 3 | 8 | - | 0.9419 | 37 | 439672
5 | 3 | 8 | - | 0.9348 | 37 | 417642
9 | 3 | 8 | - | 0.9253 | 37 | 422822
1 | 5 | 8 | - | 0.9481 | 24 | 605899
5 | 5 | 8 | - | 0.9438 | 24 | 609686
9 | 5 | 8 | - | 0.9358 | 24 | 609061
1 | 7 | 8 | - | 0.9511 | 22 | 780209
5 | 7 | 8 | - | 0.9447 | 22 | 769710
9 | 7 | 8 | - | 0.9409 | 22 | 769710
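
To make the roles of `k`, `NHT` and `W` in the table concrete, here is a rough plain-Python sketch of one common way to run a KNN classifier on top of an LSH index: gather the training points that share a bucket with the query in at least one hash table, rank them by exact Euclidean distance, and take a majority vote over the `k` closest. This is only an illustration under those assumptions, not the code that produced the numbers above and not how Spark ML selects candidates internally; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def lsh_hash(x, projections, bucket_length):
    """h_i(x) = floor(r_i.dot(x) / bucketLength), one value per hash table."""
    return [
        math.floor(sum(r_j * x_j for r_j, x_j in zip(r, x)) / bucket_length)
        for r in projections
    ]


def build_index(points, projections, bucket_length):
    """One dict per hash table, mapping bucket id -> indices of training points."""
    tables = [defaultdict(list) for _ in projections]
    for idx, x in enumerate(points):
        for t, h in enumerate(lsh_hash(x, projections, bucket_length)):
            tables[t][h].append(idx)
    return tables


def knn_predict(query, k, points, labels, tables, projections, bucket_length):
    """Majority vote over the k closest candidates; None if no bucket matches."""
    candidates = set()
    for t, h in enumerate(lsh_hash(query, projections, bucket_length)):
        candidates.update(tables[t].get(h, []))
    if not candidates:
        return None
    # Rank candidates by exact distance (index as tie-breaker), keep the k closest.
    nearest = sorted(candidates, key=lambda i: (math.dist(query, points[i]), i))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch, more hash tables or a larger bucket length both enlarge the candidate set, so recall of the true neighbours goes up at the cost of more exact distance computations per query.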
