## State Of Art Algorithm

## What's wrong with LSH?

The problem of LSH is that an ad hoc hash function family has to be constructed for each similarity measure. The recent kernelized LSH partially solved that problem, but the distortion of the distance measure by the hash approximation remains huge. It turns out that the most effective hash function family, at least for L2 distance, is K-means clustering.

## What's wrong with K-means clustering?

You have to be able to take the "average" of a bunch of objects. That pretty much limits it's application to vector-formed objects.

## What's wrong with KD-Trees, randomized or not?

It only applies to vector-formed objects.