Treap is a data-structure, a kind of binary tree which we already have seen, but with very curious property - it uses randomization to optimize internal data distribution. This means that building Treap from the same data produces different actual tree shape and placement of values inside.
Let's study this interesting concept. Particularly we'll try to see how much it is affected by poor choice of randomizer.
Naive binary search tree, like one we met in Tree Builder has obvious issue - in
unlucky case the data could be inserted in such an order that the whole structure is not well-balanced. In worst
case (think of adding 1, 2, 3, ... n
in order) it will become a string of nodes of length n
.
Many ideas were proposed to automatically re-balance the data. In case of Treap
idea is the following:
X
fields of the nodesY
fields are assigned a random value(X, Y)
found its proper place - this may include
some "rotations" of nodes, but overall it is O(log(N))
operationsX
is just as usual search over tree, also in O(log(N))
steps.It may be regarded as a Tree regarding X
and min-Heap regarding Y
- hence the name (Tree + Heap).
Actually that's all! If you have seen Cartesian Tree, the only more details you need are about mentioned
insertion steps. It is like this: firstly insert the new node according to its X
field as in normal
binary tree (i.e. descend in left-right fashion to some empty leaf and dangle it here). Now we see that Y
of this node could be in wrong relation to its parent (we remember in Cartesian Tree all Y
should be ordered
from top to bottom). In this case it is necessary to swap (or more exactly, "rotate") the node with its parent.
This involves possible detaching of some child of that parent and attaching it to our new node.
This process is repeated until the new node "bubbles up" to proper place according to its Y
(while "rotations"
preserve X
ordering). Let's regard some example of rotation:
600 600
/ /
300 500
/ \ / \
100 500 300 550
/ \ / \
400 550 100 400
Suppose we want 500
to rotate with its parent 300
(their Y
are not shown, but suppose their order
needs this rotation). This means that:
500
should become left child of 600
instead of 300
300
which was left child of 600
before, will now become left child of 500
itself400
which was left child of 500
will become right child of 300
(obviously it was greater than its grandparent)Rotation is always about changing 3
links between nodes (which probably are double-links), though some of them
may be pointing at nulls.
We can cheat by first putting all the nodes into array and sorting them, then building a tree from array as in Cartesian Tree problem - but rotations are used in other trees also so it's better to do things properly and practice this useful trick.
We shall use simple shift register randomizer.
Generate N
sequential random values to be inserted into the tree (as X
-s), starting with seed S
.
Now conduct M
tests for which we are given M
values of seed for randomizer. On every
test re-initialize randomizer with one of these seeds and create the tree anew (e.g. always with the same X
-s
but different Y
-s every time).
For every run calculate average depth (height) across all nodes. Find which run gives the best and the worst average. (ideally we want randomizer which usually gives similar results, but this one is not ideal).
For example, here is a tree with X:Y
values specified using colon while depth is shown in parentheses. It
is generated using seed S = 44257
and test-seed 17753
.
17753:7306 (1)
left:
5479:29226 (2)
right:
10958:58453 (3)
right:
22128:8876 (2)
left:
21916:51371 (3)
right:
35507:14613 (3)
right:
43832:37206 (4)
You see, the root has height 1
, it's children 2
and so on. Average over 7
elements is
(1 + 2 + 3 + 2 + 3 + 3 + 4) / 7 = 18 / 7 = 2.5714...
Randomizer
Use the following formula (it creates sequence with period 65535
and comes from the example in Wikipedia):
# calculate next value for 'seed'
bit = (seed ^ (seed >> 2) ^ (seed >> 3) ^ (seed >> 5)) & 1
seed = (seed >> 1) | (bit << 15)
For example, seed of ACE1
is followed by values 5670
, AB38
, 559C
, 2ACE
, 1567
, 8AB3
, 4559
(all
hexadecimals).
Input data contain three values in the first line: N
is the size of the tree, S
is the starting seed
for generating X
values for the tree, and M
is the amount of test-seeds, following in the second line.
Answer should give 4
values - best test-seed, best average, worst test-seed, worst average.
Example
input data:
30 44257 7
57401 6876 16061 17524 40078 41354 34145
answer
34145 4.4666667 41354 6.1666667
Note: if you need to review tree structure for larger example, here is one
for N=30
, S=44257
and TEST_SEED=41354
. Here are also references to parent nodes after "depth" in parentheses.