0xDE ([info]11011110) wrote,
@ 2008-02-07 20:10:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Entry tags:splay trees

Static optimality for splay trees
Earlier this week in my graduate data structures course we covered splay trees. I found a slightly different way of covering the static optimality property (splay trees are to within a constant factor as good as any fixed tree for any sequence of data accesses) that I think makes it easier to understand than the usual entropy-based approach.

To begin with, I followed the analysis of Sleator and Tarjan's original STOC '83 paper (repeated in Tarjan's Data Structures and Network Algorithms). According to this analysis, if each data item is given a weight wi, with W denoting the sum of the weights, then the amortized cost to access item i is O(log W/wi). One of the beauties of splay trees is that they automatically adjust themselves to achieve this performance even when these weights are not known by the algorithm.

Proving this amortized time bound is not difficult. As in the STOC paper, define the rank of an item to be the floor of the log (base two) of the sum of the weights in its subtree (the 1985 JACM version gets a tighter analysis by omitting the floor). If one uses as a potential function the sum of the ranks, then one can analyze the sequence of splay steps by which item i is accessed, by partitioning the steps into the ones in which the rank of i increases and the ones in which it stays the same. The number of steps in which the rank increases can be at most the difference between the smallest and largest ranks available to i, which is log W/wi, and the potential function increases in these steps by a telescoping sum that ends up proportional to the same amount. In the remaining steps, the potential function decreases, canceling the amortized time for the steps.

The point, though, is not so much how one gets to this O(log W/wi) bound but how one sets the weights wi. If they are all equal to one, then W = n and the amortized cost per operation is O(log n). If the access sequence is chosen randomly at each step from some distribution with probability wi of accessing item i, then W = 1 and the average amortized cost per operation ends up as O(Σwi log 1/wi), the Shannon entropy.

At this point it's typical to wave one's hands, assert that optimal binary search trees have an access cost proportional to the Shannon entropy, and conclude that splay trees are within a constant factor of optimal. Sleator and Tarjan, for instance, say that this follows by “a standard theorem of information theory” and cite Abramson's information theory book, without even a hint at a page number where such a theorem may be found.

Instead, it turns out to be easier to prove directly that splay trees are as good as any static tree T (and therefore, as good as the optimal binary search tree, whatever tree that might be). To do so, let li denote the number of nodes on the path to item i in tree T, and let wi = 3-li. In any finite binary tree, W < 1 (in a complete infinite binary tree, the sum is exactly one, as can be seen by summing each level separately and then summing the levels as a geometric series); this is related to Kraft's inequality. Therefore, using these weights for the splay tree analysis, the time per access is O(log 1/3-li) = O(li), within a constant factor of the time in tree T as was to be shown.

Sadly, this approach doesn't seem to work to prove dynamic optimality of splay trees. Conceivably, one can define weights based on exponential functions of the access cost in some unknown dynamic optimal search tree, and use these weights to analyze a splay tree on the same access sequence, but it seems that a change to the optimal tree can cause a large increase in the potential of the splay tree, making it have a large amortized cost for that operation.



(7 comments) - (Post a new comment)

Nice argument
(Anonymous)
2008-03-07 01:09 pm UTC (link)
That is nice. I normally just show them that the cost of an access sequence is proportional to the the length of the sequence times the empirical entropy of the sequence. After that I say something vague about how, for long enough sequences where the queries are iid, the empirical entropy converges to the entropy of the underlying distribution.

Your way is much nicer and has other consequences, including the static finger property. I like that you can just point to a particular binary search tree and say "it's as good as that one."

Pat Morin

(Reply to this)


(Anonymous)
2008-04-01 07:34 pm UTC (link)
Nice

(Reply to this)

Question about Splay Trees?
(Anonymous)
2008-05-16 01:25 pm UTC (link)

In which cases can i use this dynamic datastructure?
can i use splay trees in online applications?
Thanks

(Reply to this) (Thread)

Re: Question about Splay Trees?
[info]11011110
2008-05-16 01:56 pm UTC (link)
Sure, they're just a binary search tree. You can use them almost any place you would use any other binary search tree. The only situations I know of where they don't work are (1) if you really need a worst-case time bound per operation rather than an amortized bound (say as part of a real-time system), or (2) in a multilevel data structure where you're attaching extra information per node that can't be updated when the tree is rotated.

(Reply to this) (Parent)

Splay Trees??
[info]nizarbn
2008-05-16 02:47 pm UTC (link)
Can i use Splay Tree to implement the IP Lookup foncionality used in high speed routers???
In the other words , Can i use Splay Trees to exploit the skewness in the traffic IP, and exploit the locality in the incoming IP packets...
Thanks
NizarBN

(Reply to this) (Thread)

Re: Splay Trees??
[info]11011110
2008-05-16 03:26 pm UTC (link)
I imagine that they could perform that task better than other dynamic binary tree data structures -- you pay a little in the performance of splay trees, compared with other kinds of trees, to do the rotations on every query, and it depends on how unbalanced the query stream is whether that cost is made up for by the performance gain. That's the kind of question that would have to be answered by experiment rather than by theory.

But my understanding is that in high speed routers, binary search trees are in general not good enough, and that you want to use table lookup techniques that take less than logarithmic time per query.

(Reply to this) (Parent)(Thread)

Re: Splay Trees??
[info]nizarbn
2008-05-16 08:53 pm UTC (link)
Thank you for your answers :-)
I have find this paper published in 2006:
http://paper.ijcsns.org/07_book/200605/200605B05.pdf
This paper is about the use of splay trees to resolve the packet classification problem implemented in the high speed routers.
It is what am searching for
:-)
Thanks
NizarBN

(Reply to this) (Parent)


(7 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…