Benchmark

Main.Benchmark History


April 16, 2014, at 03:47 PM by 108.211.200.12 -
Changed line 39 from:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] ||255.9
to:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] ||112.8
April 16, 2014, at 03:42 PM by 108.211.200.12 -
Changed line 39 from:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] ||114.6
to:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] ||255.9
April 16, 2014, at 03:39 PM by 108.211.200.12 -
Changed line 38 from:
||          ||kdtree||kmeans||hierarchical||kgraph ||kgraph-pruned
to:
||          ||kdtree||kmeans||hierarchical||kgraph ||kgraph, level-2 pruned
Changed line 41 from:
||parameters ||trees=8||default    ||trees=12    ||default, with --prune=1||default, with --prune2, search with -M 50
to:
||parameters ||trees=8||default    ||trees=12    ||default, with --prune=1||default, with --prune=2, search with -M 50
April 16, 2014, at 03:39 PM by 108.211.200.12 -
April 16, 2014, at 03:38 PM by 108.211.200.12 -
Added line 36:
Changed line 39 from:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] || 114.6
to:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] ||114.6
April 16, 2014, at 03:37 PM by 108.211.200.12 -
Changed lines 37-40 from:
||          ||kdtree||kmeans||hierarchical||kgraph
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--]
||index size ||138M ||250M  ||120M        ||107M
||parameters ||trees=8||default    ||trees=12    ||default, with --prune=1
to:
||          ||kdtree||kmeans||hierarchical||kgraph ||kgraph-pruned
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--] || 114.6
||index size ||138M  ||250M  ||120M        ||107M ||97M
||parameters ||trees=8||default    ||trees=12    ||default, with --prune=1||default, with --prune2, search with -M 50
March 14, 2014, at 11:24 PM by 108.211.200.12 -
Changed lines 34-35 from:
I've tuned the indices to have comparable size (for kmeans it appears there's nothing I can do to tune size).
to:
I've tuned the indices to have comparable size (for kmeans it appears there's nothing I can do to tune the size).  For the same algorithm, a bigger index typically allows it to achieve higher recall at the same online search cost.
Changed line 38 from:
||index time ||21.7s ||156.2s||114.3s      ||40s %red%in parallel%%
to:
||index time ||21.7s ||156.2s||114.3s      ||40s [--or 252.7s in serial--]
March 14, 2014, at 11:18 PM by 108.211.200.12 -
Changed line 33 from:
Index construction time and size.  (FLANN indexing building is not parallelized, while KGraph is.  It should be rather straightforward to parallelize FLANN indexing and achieve 5x speedup and beat KGraph in index building.  At least it will make their auto tuning run much faster.)
to:
Index construction time and size.  (FLANN index building is not parallelized, while KGraph's is.  It should be rather straightforward to parallelize FLANN indexing and achieve 5x speedup and beat KGraph in index building.  It will also make their auto tuning run much faster.)
March 14, 2014, at 11:18 PM by 108.211.200.12 -
Changed line 33 from:
Index construction time and size.  (FLANN indexing building is not parallelized, while KGraph is.)
to:
Index construction time and size.  (FLANN indexing building is not parallelized, while KGraph is.  It should be rather straightforward to parallelize FLANN indexing and achieve 5x speedup and beat KGraph in index building.  At least it will make their auto tuning run much faster.)
Changed line 38 from:
||index time ||21.7s ||156.2s||114.3s      ||40s
to:
||index time ||21.7s ||156.2s||114.3s      ||40s %red%in parallel%%
March 14, 2014, at 11:08 PM by 108.211.200.12 -
Changed line 12 from:
All the benchmarks in this page are performed on a desktop machine with a 6-core 3930K CPU with 12 threads, and 64GB main memory.  Only SSE2 instructions are used and AVX instructions, which is not so commonly available, are not enabled.
to:
All the benchmarks in this page are performed on a desktop machine with a 6-core 3930K CPU with 12 threads, and 64GB main memory.  Only SSE2 instructions are used; AVX instructions, which are not so commonly available, are not enabled.
March 14, 2014, at 11:07 PM by 108.211.200.12 -
Changed line 10 from:
The most common mistake people make (including me) when writing a paper is when they benchmark an approximate search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
to:
The most common mistake people make (including me) when writing a paper is when they benchmark an approximate search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not the same for the two methods.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
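The arithmetic in the paragraph above can be sketched as follows (the function name `recall_adjusted_speedup` is illustrative, not part of FLANN or KGraph), assuming brute-force time scales linearly with the fraction of the dataset scanned:

```python
def recall_adjusted_speedup(brute_force_time, index_time, index_recall):
    """Speedup of an approximate index over brutal force at equal recall.

    Assumes scanning half the dataset by brutal force costs half the time
    and yields about 50% recall, so brutal-force time is scaled by the
    index's recall before dividing.
    """
    return (brute_force_time * index_recall) / index_time

# The example from the text: brutal force takes 100 s (recall 1.0);
# the index answers in 10 s at recall 0.5.
naive = 100.0 / 10.0                              # 10x, the unfair claim
fair = recall_adjusted_speedup(100.0, 10.0, 0.5)  # 5x, the fair figure
print(naive, fair)
```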
March 14, 2014, at 11:07 PM by 108.211.200.12 -
Changed line 10 from:
The most common mistake people make (including me) when writing a paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
to:
The most common mistake people make (including me) when writing a paper is when they benchmark an approximate search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
March 14, 2014, at 11:06 PM by 108.211.200.12 -
Changed line 10 from:
The most common mistake people make (including me) when writing paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
to:
The most common mistake people make (including me) when writing a paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
March 14, 2014, at 10:43 PM by 108.211.200.12 -
Added line 43:
The curves are generated by varying the C parameter of FLANN and the P parameter of KGraph.
March 14, 2014, at 10:42 PM by 108.211.200.12 -
Changed line 20 from:
We use FLANN's brutal force as baseline, and compute speedup using
to:
I use FLANN's brutal force as baseline, and compute speedup using
March 14, 2014, at 10:41 PM by 108.211.200.12 -
Changed line 18 from:
This shows the implementation dependent speedup of KGraph over FLANN.  Which is pretty much the difference between hand-coded L2 distance code vs machine generated code.
to:
This shows the implementation-dependent speedup of KGraph over FLANN.  The difference is purely a matter of code quality.
March 14, 2014, at 10:40 PM by 108.211.200.12 -
Added lines 31-32:
!3. Comparison of Various Methods
Changed line 42 from:
!3. Comparison of Various Methods
to:
Online speed comparison.  The time reported is the total amount of time spent on running 10,000 queries in parallel.
March 14, 2014, at 10:39 PM by 108.211.200.12 -
Changed lines 12-13 from:
All the benchmarks in this page are performed on a desktop machine with a 6-core 3770K CPU with 12 threads, and 64GB main memory.  Only SSE2 instructions are used and AVX instructions, which is not so commonly available, are not enabled.
to:
All the benchmarks in this page are performed on a desktop machine with a 6-core 3930K CPU with 12 threads, and 64GB main memory.  Only SSE2 instructions are used and AVX instructions, which is not so commonly available, are not enabled.
Added lines 24-25:
All online searches are parallelized by query.  The time reported is the total time spent on running 10,000 queries in parallel.
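A minimal sketch of query-parallel timing, assuming a toy stand-in for the real index search (the `search` and `run_queries` helpers are hypothetical, not FLANN/KGraph APIs); one task is spawned per query and the total wall time over the whole batch is reported, mirroring how the benchmark times its 10,000 queries:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def search(index, query):
    # Toy 1-D nearest-neighbor scan standing in for a real index query.
    return min(index, key=lambda p: abs(p - query))

def run_queries(index, queries, workers=12):
    # One task per query; report total wall time over the whole batch.
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda q: search(index, q), queries))
    return results, time.time() - start
```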
Added lines 31-38:
Index construction time and size.  (FLANN indexing building is not parallelized, while KGraph is.)
I've tuned the indices to have comparable size (for kmeans it appears there's nothing I can do to tune size).

||border=1
||          ||kdtree||kmeans||hierarchical||kgraph
||index time ||21.7s ||156.2s||114.3s      ||40s
||index size ||138M  ||250M  ||120M        ||107M
||parameters ||trees=8||default    ||trees=12    ||default, with --prune=1
March 14, 2014, at 10:21 PM by 108.211.200.12 -
Deleted lines 29-31:


Added lines 31-32:

||Attach:knn-time.png||Attach:knn-speedup.png
March 14, 2014, at 11:52 AM by 108.211.200.12 -
Changed line 10 from:
The most common mistake people make (including me) when writing paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall, sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
to:
The most common mistake people make (including me) when writing paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall (percentage of exact nearest neighbors found, 50% == 0.5, recall of brutal force scanning the whole dataset is always 1), sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.
March 14, 2014, at 11:48 AM by 108.211.200.12 -
Changed line 27 from:
|| speedup || 1.0x || 1.48x ||
to:
|| speedup || 1.0x || 1.54x ||
March 13, 2014, at 04:59 PM by 108.211.200.12 -
Changed line 26 from:
||search time/s || 135.1 || 91.1 ||
to:
||search time/s || 135.1 || 87.3 ||
March 13, 2014, at 03:51 PM by 108.211.200.12 -
Changed lines 26-27 from:
||search time/s || 131.9 || 91.1 ||
|| speedup || 1.0x || ||
to:
||search time/s || 135.1 || 91.1 ||
|| speedup || 1.0x || 1.48x ||
March 13, 2014, at 03:51 PM by 108.211.200.12 -
Added lines 20-23:
We use FLANN's brutal force as baseline, and compute speedup using

%red%speedup = (flann_brutal_force_time * index_search_recall) / (index_search_time)

Changed lines 26-31 from:
||search time/s || 131.9 || 88.0 ||
|| speedup || 1.0x || 1.50x ||

In the rest benchmarking, we use FLANN's brutal force as baseline, and compute speedup using

speedup = (brutal-force-time x index-recall) / (index-time)
to:
||search time/s || 131.9 || 91.1 ||
|| speedup || 1.0x || ||



March 13, 2014, at 03:47 PM by 108.211.200.12 -
Changed line 14 from:
The dataset I use is the [[http://corpus-texmex.irisa.fr/|one million SIFT feature dataset]] with 10,000 queries.  Each SIFT feature is represented as a 128-dimensional vector of 32-bit floating point numbers.  Similarity is the L2 square distance. Recall is measured with the first 20 nearest neighbors.
to:
The dataset I use is the [[http://corpus-texmex.irisa.fr/|one million SIFT feature dataset]] with 10,000 queries.  Each SIFT feature is represented as a 128-dimensional vector of 32-bit floating point numbers.  Similarity is the L2 square distance. Recall is measured with the first 10 nearest neighbors.
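Recall measured over the first k nearest neighbors can be sketched as below (the `recall_at_k` helper is hypothetical, not an API of either library): the fraction of the true k nearest neighbors that appear among the first k returned results.

```python
def recall_at_k(true_neighbors, returned, k=10):
    # Fraction of the true k nearest neighbors found in the top-k results.
    # IDs are assumed unique, as for distinct points in the SIFT dataset.
    return len(set(true_neighbors[:k]) & set(returned[:k])) / float(k)
```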
March 13, 2014, at 02:51 PM by 108.211.200.12 -
March 13, 2014, at 02:44 PM by 108.211.200.12 -
Changed line 23 from:
| speedup || 1.0x || 1.50x ||
to:
|| speedup || 1.0x || 1.50x ||
March 13, 2014, at 02:44 PM by 108.211.200.12 -
Changed line 23 from:
to:
| speedup || 1.0x || 1.50x ||
March 13, 2014, at 02:43 PM by 108.211.200.12 -
Changed lines 1-2 from:
1. Benchmarking Fairly
to:
!1. Benchmarking Fairly
Changed lines 16-17 from:
2. Brutal Force
to:
!2. Brutal Force
Changed line 29 from:
3. Comparison of Various Methods
to:
!3. Comparison of Various Methods
March 13, 2014, at 02:43 PM by 108.211.200.12 -
Changed line 27 from:
(brutal-force-time x index-recall) / (index-time)
to:
speedup = (brutal-force-time x index-recall) / (index-time)
March 13, 2014, at 02:42 PM by 108.211.200.12 -
Changed lines 21-34 from:
|| || Brutal Force, FLANN || Brutal Force, KGRAPH ||
||Search Time/s || 131.9 || 88.0 ||


3. FLANN Brutal Force vs KD-tree

||border=1
|| || Brutal Force, FLANN || KD-tree || KD-tree, C=6500 ||
||Search Time/s || 131.9 || 0.097 ||  6.1
||Recall || 1.0 || 0.118 || 0.90
||Speedup || 1x || 160x || 19x


to:
|| || brutal force, flann || brutal force, kgraph ||
||search time/s || 131.9 || 88.0 ||


In the rest benchmarking, we use FLANN's brutal force as baseline, and compute speedup using

(brutal-force-time x index-recall) / (index-time)

3. Comparison of Various Methods
March 13, 2014, at 02:39 PM by 108.211.200.12 -
Changed lines 18-19 from:
That shows the implementation dependent speedup of KGraph over FLANN.
to:
This shows the implementation dependent speedup of KGraph over FLANN.  Which is pretty much the difference between hand-coded L2 distance code vs machine generated code.
Changed lines 22-24 from:
||Time/s || 131.9 || 88.0 ||

to:
||Search Time/s || 131.9 || 88.0 ||


3. FLANN Brutal Force vs KD-tree

||border=1
|| || Brutal Force, FLANN || KD-tree || KD-tree, C=6500 ||
||Search Time/s || 131.9 || 0.097 ||  6.1
||Recall || 1.0 || 0.118 || 0.90
||Speedup || 1x || 160x || 19x
March 13, 2014, at 02:29 PM by 108.211.200.12 -
Added lines 18-22:
That shows the implementation dependent speedup of KGraph over FLANN.

||border=1
|| || Brutal Force, FLANN || Brutal Force, KGRAPH ||
||Time/s || 131.9 || 88.0 ||
March 13, 2014, at 02:16 PM by 108.211.200.12 -
Added lines 11-16:

All the benchmarks in this page are performed on a desktop machine with a 6-core 3770K CPU with 12 threads, and 64GB main memory.  Only SSE2 instructions are used and AVX instructions, which is not so commonly available, are not enabled.

The dataset I use is the [[http://corpus-texmex.irisa.fr/|one million SIFT feature dataset]] with 10,000 queries.  Each SIFT feature is represented as a 128-dimensional vector of 32-bit floating point numbers.  Similarity is the L2 square distance. Recall is measured with the first 20 nearest neighbors.

2. Brutal Force
March 13, 2014, at 02:02 PM by 108.211.200.12 -
Changed line 1 from:
1. Benchmark in a Fair Way
to:
1. Benchmarking Fairly
March 13, 2014, at 02:02 PM by 108.211.200.12 -
Changed lines 1-17 from:
Coming...
to:
1. Benchmark in a Fair Way

Software performance usually involves several aspects.  For nearest neighbor search, the relevant aspects include index construction time, index size, online search speed, and search accuracy.  To fairly compare two programs, we need to tune the programs such that

* The performance is acceptable on aspects we don't care so much about, like index construction time -- so long as it finishes within a reasonable amount of time, people can usually wait for offline indexing to be done. We still need to report the number, so it can be considered when the programs are to be deployed. (Depending on the scenario, index construction time can be the most critical factor, and the benchmarking scheme has to be changed accordingly.)
* They perform equivalently on aspects that are hard to control and that we care less about, like index size.  They have to produce indices of similar size.  For example, we consider 4GB and 5GB similar sizes.  It is usually not possible to make a program generate an index of exactly a given size.  But it is not fair to compare an index of 4GB to one of 40GB.
* Fix all critical factors except one.  For similarity search, we need to make their recall/accuracy the same, say 95%.
* Compare the remaining factor, the online search speed.

The most common mistake people make (including me) when writing paper is when they benchmark an indexed search algorithm against brutal force.  For example, if brutal force finishes in 100 seconds, and the index finishes searching in 10 seconds with 50% recall, sometimes they  go ahead and claim that the index is 10x the speed of brutal force.  But that is unfair, because recall is not fixed.  Actually, we can safely assume that it would take brutal force about 50 seconds to scan half the dataset and achieve the same 50% recall.  So the index is only about 5x the speed of brutal force at the recall of 50%.






March 12, 2014, at 09:16 PM by 108.211.200.12 -
Added line 1:
Coming...