Whats Wrong With Hierarchical Approach In General

Here's why I believe in high-dimensional space a non-logN (non-hierarchical) algorithm might win.

Summary of notations:

  • N: # points.
  • D: dimension
  • R: radius of the dataset (distance between two farthest pair of points in the dataset), R ~ N^(1/D)
  • delta: in each step, you walk this distance towards your target.

In a hierarchical algorithm, you do logN stops.

In a flat algorithm, the radius is R, and you make sure in every step you become a little bit closer to your target (>= delta). Then you can get to the target in at most R/delta steps, or N^(1/D)/delta steps.

Now we are basically comparing logN vs N^(1/D). For hierarchical algorithm to win, one needs

logN < N^(1/D)

Let X = logN, then equivalently one needs

logX < X/D

Or logX / X < 1/D.

If D = 128, then X will have to be > 865, or N will have to be > exp(865). That's a HUGE number. Even N > 2^D is not big enough.

The argument for hierarchical algorithm is that when N is REALLY big, logN always beats polynomial. But my argument against hierarchical algorithm is that when a big D kicks in, even in theory one can hardly reach such a big N.