Tree review (also see cs280 slides)

generic tree - no limit on the number of children (branching) or on balance factors.
------------

binary tree (BT) b=2
-----------

heap (see prev. lecture)
----

binary search tree (BST)
------------------
left child < parent
right child > parent

find - trivial (recursive or iterative)
  search is O(log n) average, but O(n) worst
insert - trivial since no balancing (recursive or iterative)
  insert is O(log n) average, but O(n) worst
immediate predecessor:
  right-most in the left branch, or (if the left branch is empty)
  the first ancestor (parent of parent of ...) which is reached
  (as you ascend) through a right link.
immediate successor: DIY
delete: 3 cases:
  leaf/external (definition - has no children): trivial
  non-leaf/inner with 1 child: trivial
  non-leaf/inner with 2 children:
    1) copy in-order predecessor values into the node
    2) delete the original predecessor location
    thing to understand: a 2-child node's predecessor has 0 or 1 child
    (prove it by assuming the opposite)
    note: using the successor also works. A good implementation will alternate.

BALANCING:
re-balancing is always done by rotations.

Two very useful numbers:
height: the longest path from the node to a leaf, counted in nodes
        (a leaf has height 1, an empty subtree has height 0)
balance factor: height(left tree) - height(right tree)

height may be calculated each time recursively (inefficient)
  height(x) = max( height(x->left), height(x->right) ) + 1
running time O(n) - prove
therefore an efficient implementation needs to "maintain" height (and balance factor)
notice that during regular - non-balancing - insert it is simple:
just add a loop at the end of the insert function that goes up to the root
and recalculates height using
  height(x) = max( x->left->height, x->right->height ) + 1
compare to the previous height(x) formula to make sure there is no
recursion now. Balance factor is updated in the same loop.

left rotation
-------------
    x               y
   / \             / \
  A   y           x   C
     / \         / \
    B   C       A   B
prove: the resulting tree is still a BST.
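One possible way to approach the proof, as a sketch only: it assumes distinct keys and uses the fact that a binary tree is a BST exactly when its in-order traversal is sorted. The notation inorder(T) for the in-order key sequence of a subtree T is introduced here and is not from the notes.

```latex
\begin{align*}
\text{before rotation (root } x\text{):}\quad
  & \operatorname{inorder}(A),\ x,\ \operatorname{inorder}(B),\ y,\ \operatorname{inorder}(C) \\
\text{after rotation (root } y\text{):}\quad
  & \operatorname{inorder}(A),\ x,\ \operatorname{inorder}(B),\ y,\ \operatorname{inorder}(C)
\end{align*}
```

The two sequences are identical, so if the first one is sorted (the tree was a BST), the second one is sorted as well.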
Code:
    y = x->right;                               // the node that moves up
    xParent = x->parent;                        // needed in the last lines, but will be overwritten
    x->setRight(y->left);                       // x-->B
    if ( x->right ) { x->right->setParent(x); } // B-->x
    y->setLeft(x);                              // y-->x
    if ( y->left ) { y->left->setParent(y); }   // x-->y
    y->setParent(xParent);                      // y-->rest of the tree (not shown on the diagram)
    if ( xParent ) { if ( xParent->left == x ) xParent->setLeft(y); else xParent->setRight(y); } // rest of the tree-->y

maintaining height:
notice that nodes (and thus their heights) inside A,B,C are not changed.
    x->height = max( root of A ->height, root of B ->height ) + 1
    y->height = max( x->height, root of C ->height ) + 1
NOTE: usually you'll do height/balance manipulation inside setLeft/setRight/setParent methods

also all ancestors of y should be updated:
    p = y;
    while ( (p = p->parent) ) {
        p->height  = max( p->left ? p->left->height : 0, p->right ? p->right->height : 0 ) + 1;
        p->balance = ( p->left ? p->left->height : 0 ) - ( p->right ? p->right->height : 0 );
    }
NOTE: from now on we will not pay attention to NULL pointers (empty subtrees) to make formulas simpler.

right rotation (the opposite of left)
-------------
      x             y
     / \           / \
    y   C         A   x
   / \               / \
  A   B             B   C

if B is empty
      x             y
     / \           / \
    y   C         A   x
   /                   \
  A                     C
rotation is just "pulling"

complexity:
----------
rotation itself is constant, but height maintaining is O(log n) in a balanced tree
if implemented efficiently (one pass up to the root), O(n) if done recursively.

splay tree
----------
no height, no balance factor - the decision to rotate is based on access.
Invented by D.D. Sleator and R.E. Tarjan in 1985.

    * Most of the time, we don't make any assumptions about the data.
    * Usually we assume a uniform distribution of data and random values.
    * Non-random data can lead to worst-case situations (building a BST from already-sorted data).
    * Non-uniform distribution of data can also lead to worst-case situations.
    * If we know how the data is distributed, we can choose better data structures.

A splay tree uses this knowledge to its advantage.

    * A splay tree is a binary search tree.
    * Newly inserted items are propagated (promoted) to the root.
    * This propagation occurs when writing (inserting) and reading (accessing) an item.
    * We call this propagation of a node splaying.

The idea behind splay trees is that frequently accessed data tends to stay near the top.

    * Splay trees are not guaranteed to be balanced
    * Worst-case is not guaranteed to be "good" (a single operation may take O(n))
    * Average time may be excellent (this may be more important than worst case)
    * Algorithms for splaying a node are simple (only require rotation)
    * The algorithm is a variation of the more general BST root insertion

Splaying algorithm

    * We want to splay a node two levels at a time.
    * This means we want to promote the node to the position of its grandparent (parent's parent)
    * The algorithm depends on the node's orientation to its grandparent (1 of 4 orientations)

   left-left      left-right     right-left     right-right
       G              G              G               G
      /              /                \               \
     P              P                  P               P
    /                \                /                 \
   C                  C              C                   C

Promoting a node simply means rotating about the node's parent (which we've done).
Promoting a node doesn't require you to specify left or right. The direction is implied.

    * If node is a right-child, rotate parent LEFT
    * If node is a left-child, rotate parent RIGHT

   # left-left, promote the parent, promote the node
   # left-right, promote the node, promote the node (node is promoted twice)
   # right-left, promote the node, promote the node (node is promoted twice)
   # right-right, promote the parent, promote the node
or using "rotate around X" terminology
   # left-left   : G right, P right
   # left-right  : P left , G right
   # right-left  : P right, G left
   # right-right : G left , P left

    * We continue to promote until we reach the root.
    * The "special case" is if our parent is the root. If the parent is the root,
      simply perform a rotation to bring the node to the root.

AVL tree
--------
One type of balanced tree is the AVL tree. (Two Russian Mathematicians, Adel'son-Vel'skii and Landis)
Адельсон-Вельский / Ландис.

Tree is balanced if balance factor is -1,0,1 for EVERY node.

    * An AVL tree is essentially a balanced binary search tree (BST).
    * The insert and delete operations are more complicated (need to maintain the balanced property).
    * Still, fairly simple to understand and implement.
    * Worst case for searching is now O(lg N), which is very good.

Insert using nodes with parent pointer (non-recursive = iterative):
1. Insert as usual
2. Iteratively go up starting with the parent of the new node and check balance factor.
   The factor may be -2,-1,0,1,2 only. Why?
   -1,0,1  - continue looping.
   -2 or 2 - need to fix balance;
             this is our LAST iteration when inserting (why? -- see proof below),
             continue looping when deleting (after delete you may need several rebalancings)

-2 and 2 are symmetric, consider case -2 (right branch is 2 levels deeper than the left)
NOTE: from now on height() or h() will refer to the height variable (and NOT the recursive function)

Balance factor = -2:
There are 2 cases:
case 1: height(V) <= height(W) - right-right
  note: during insert height(V) != height(W) (so balance of u is -1 or 1)
        while during delete height(V) == height(W) is possible (so balance of u is -1, 1, or 0);
        balance 0 is handled by the case that requires a single rotation.
  single left rotation about y:
      y               u
     / \             / \
    A   u           y   W
       / \         / \
      V   W       A   V
  prove that -1 <= height(y) - height(W) <= 1, that is u is balanced,
  when inserting AND deleting
  Proof: since after insert, but before rotation h(A)-h(u) = -2, and h(u) = max(h(V),h(W)) + 1,
     h(A) - max(h(V),h(W)) = -1.                 (1)
     min( h(A) - h(V), h(A) - h(W) ) = -1        ( -max(i,j) = min(-i,-j) !!!)
  but h(V) < h(W) (see case 1 condition), and since we did not rebalance u
  at the previous iteration!!!, we have h(V) - h(W) = -1.
  so h(A) - h(W) = h(A) - h(V) - 1, and combining with (1)
     h(A) - h(W) = -1, and h(A) - h(V) = 0
  thus after the rotation (y now has children A and V, so h(y) = max( h(A), h(V) ) + 1):
     h(y) - h(W) = max( h(A), h(V) ) + 1 - h(W)
                 = max( h(A) - h(W), h(V) - h(W) ) + 1
                 = max( -1, -1 ) + 1 = 0
  when deleting, h(V) = h(W) is also possible; then (1) gives h(A) - h(W) = -1 and
     h(y) - h(W) = max( -1, 0 ) + 1 = 1
  so u is balanced in both cases.
  DONE

  using notations
    h_bi() - height "before insert"
    h_ai() - height "after insert, but before rotation"
    h()    - height "after insert and rotation"
  the following is true:
    h_bi(A) = h_bi(V) = h_bi(W)
    h(A) = h_ai(A) = h_bi(A)      // no change
    h(V) = h_ai(V) = h_bi(V)      // no change
    h(W) = h_ai(W) = h_bi(W) + 1  // insert was in W -- why?

  prove that all ancestors of u are balanced WHEN INSERTING
  (not true if deleting -- see example below)
  Proof: basically we have to show that it's safe to terminate
  iteration at this point:
  before the insert (which happened in W)
     h_bi(y) = 2 + h_bi(W) = h(A) + 2
  after insert and rotation:
     h(u) = max( h(y), h(W) ) + 1
          = max( max( h(A), h(V) ) + 1, h(W) ) + 1
          = h(W) + 1 = h(A) + 2
  so h_bi(y) = h(u),
  that is the height of the subtree remained the same, thus all balance factors
  "above" y (now u) do not change, and since the tree was balanced before
  insert, it remains balanced.
  DONE

example of a tree that requires 2 rebalancings on delete:
uses array
int a[] = {368,66,972,616,228,938,984,502,217,798,111,85,607,946,504,222,288,271,982,389,
           993,801,470,626,272,713,490,746,539,106,511,259,172,835,876,753,125,212,255,342};
int n = sizeof(a)/sizeof(a[0]);
inserts and deletes are in the array order (delete uses swap with successor)
(trees are printed sideways: right subtree above the node, left subtree below it)

            876 -> 0 (h=1,b=0)
          /
        835 -> 0 (h=2,b=-1)
      /
    753 -> 0 (h=3,b=-1)
  /   \
        539 -> 0 (h=1,b=0)
511 -> 0 (h=5,b=1)
        342 -> 0 (h=2,b=1)
      /   \
            259 -> 0 (h=1,b=0)
  \
    255 -> 0 (h=4,b=1)
                212 -> 0 (h=1,b=0)
              /
            172 -> 0 (h=2,b=-1)
      \   /
        125 -> 0 (h=3,b=-1)
          \
            106 -> 0 (h=1,b=0)
--------------------
after delete 539 (no rebalancing yet)
            876 -> 0 (h=1,b=0)
          /
        835 -> 0 (h=2,b=-1)
      /
    753 -> 0 (h=3,b=-2) <------------------ first unbalanced node
  /
511 -> 0 (h=5,b=1)
        342 -> 0 (h=2,b=1)
      /   \
            259 -> 0 (h=1,b=0)
  \
    255 -> 0 (h=4,b=1)
                212 -> 0 (h=1,b=0)
              /
            172 -> 0 (h=2,b=-1)
      \   /
        125 -> 0 (h=3,b=-1)
          \
            106 -> 0 (h=1,b=0)

after the first rebalancing
left rotation around 753
--------------------
        876 -> 0 (h=1,b=0)
      /
    835 -> 0 (h=2,b=0)
  /   \
        753 -> 0 (h=1,b=0)
511 -> 0 (h=5,b=2) <------------------ second unbalanced node
        342 -> 0 (h=2,b=1)
      /   \
            259 -> 0 (h=1,b=0)
  \
    255 -> 0 (h=4,b=1)
                212 -> 0 (h=1,b=0)
              /
            172 -> 0 (h=2,b=-1)
      \   /
        125 -> 0 (h=3,b=-1)
          \
            106 -> 0 (h=1,b=0)
--------------------
after the second rebalancing
right rotation around 511
--------------------
            876 -> 0 (h=1,b=0)
          /
        835 -> 0 (h=2,b=0)
      /   \
            753 -> 0 (h=1,b=0)
    511 -> 0 (h=3,b=0)
  /   \
        342 -> 0 (h=2,b=1)
          \
            259 -> 0 (h=1,b=0)
255 -> 0 (h=4,b=0)
            212 -> 0 (h=1,b=0)
          /
        172 -> 0 (h=2,b=-1)
  \   /
    125 -> 0 (h=3,b=-1)
      \
        106 -> 0 (h=1,b=0)
-----------------------------------
Question (extra-extra): is it true that maximum number of rebalancings on delete is 2?
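The two rebalancings in the example above can be replayed with a small C++ sketch. Everything named here (Node, link, replaceChild, rotate, rebalanceUp, insertPlain, replayExample) is hypothetical and chosen for illustration; the notes do not prescribe this layout. The sketch assumes heights are counted as leaf = 1, empty = 0, and it handles only the single-rotation cases (right-right and its mirror, left-left); the double-rotation cases are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical node type for illustration; names are not taken from the notes.
struct Node {
    int   key;
    Node* left   = nullptr;
    Node* right  = nullptr;
    Node* parent = nullptr;
    int   height = 1;                       // leaf = 1, empty subtree = 0
    explicit Node(int k) : key(k) {}
};

int  h(const Node* n)       { return n ? n->height : 0; }
int  balance(const Node* n) { return h(n->left) - h(n->right); }
void update(Node* n)        { n->height = std::max(h(n->left), h(n->right)) + 1; }

// Make c (possibly null) the left or right child of p and fix c's parent link.
void link(Node* p, Node* c, bool asLeft) {
    (asLeft ? p->left : p->right) = c;
    if (c) c->parent = p;
}

// Hang n where old used to be: under old's former parent, or as the tree root.
void replaceChild(Node*& root, Node* oldParent, Node* old, Node* n) {
    n->parent = oldParent;
    if (!oldParent)                  root = n;
    else if (oldParent->left == old) oldParent->left = n;
    else                             oldParent->right = n;
}

// Single rotation about x. toLeft == true is the left rotation from the notes
// (x's right child y moves up); false is its mirror image. Returns y.
Node* rotate(Node*& root, Node* x, bool toLeft) {
    Node* y = toLeft ? x->right : x->left;
    assert(y != nullptr);
    Node* xParent = x->parent;                       // saved before x is re-parented
    link(x, toLeft ? y->left : y->right, !toLeft);   // x --> B
    link(y, x, toLeft);                              // y --> x
    replaceChild(root, xParent, x, y);               // rest of the tree --> y
    update(x);                                       // x first: it is now below y
    update(y);
    return y;
}

// Walk from start to the root, refreshing heights and fixing nodes whose
// balance is -2 or 2 with a single rotation. Returns the number of rotations.
int rebalanceUp(Node*& root, Node* start) {
    int rotations = 0;
    for (Node* p = start; p; p = p->parent) {
        update(p);
        if (balance(p) == -2 && balance(p->right) <= 0) {
            p = rotate(root, p, true);               // right-right: left rotation
            ++rotations;
        } else if (balance(p) == 2 && balance(p->left) >= 0) {
            p = rotate(root, p, false);              // left-left: right rotation
            ++rotations;
        }
    }
    return rotations;
}

// Plain BST insert (no balancing) that keeps parent links and heights current.
Node* insertPlain(Node*& root, std::deque<Node>& pool, int key) {
    pool.emplace_back(key);                          // deque keeps element addresses stable
    Node* n = &pool.back();
    Node* p = nullptr;
    for (Node* cur = root; cur; cur = key < cur->key ? cur->left : cur->right) p = cur;
    n->parent = p;
    if (!p)                root = n;
    else if (key < p->key) p->left = n;
    else                   p->right = n;
    for (; p; p = p->parent) update(p);
    return n;
}

// Rebuilds the example as it looks right after deleting 539 (keys are inserted
// level by level to get the same shape), then rebalances upward from 753.
// Returns the rotation count; rootKey receives the key of the final root.
int replayExample(int& rootKey) {
    std::deque<Node> pool;
    Node* root = nullptr;
    Node* n753 = nullptr;
    for (int k : {511, 255, 753, 125, 342, 835, 106, 172, 259, 876, 212}) {
        Node* n = insertPlain(root, pool, k);
        if (k == 753) n753 = n;
    }
    int rotations = rebalanceUp(root, n753);
    rootKey = root->key;
    return rotations;
}
```

Calling replayExample performs the left rotation around 753 and then the right rotation around 511, leaving 255 at the root. Note that the loop keeps climbing after a rotation, which is what delete needs; an insert-only version could stop after the first rotation.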
case 2: ( height(v) > height(W) ) - right-left
  right rotation about u
      y                 y
     / \               / \
    A   u             A   v
       / \               / \
      v   W             C   u
     / \                   / \
    C   D                 D   W
  followed by left rotation about y
      y
     / \
    A   v                   v
       / \                /   \
      C   u              y     u
         / \            / \   / \
        D   W          A   C D   W
  prove that -1 <= height(y) - height(u) <= 1, that is v is balanced
  prove that all ancestors of v are balanced

Insert using nodes without parent pointer (needs a stack; the stack is dynamically
allocated, which may slow down execution):
This next example shows how you can traverse back up the tree without having a pointer to
your parent and without using recursion. (You could use recursion to achieve the same
effect.)

Pseudocode for Insertion:

   1. Insert the item into the tree using the same algorithm for BSTs. Call this new node x.
          * While traversing the tree looking for the appropriate insertion point for x,
            push the visited nodes onto a stack. (Actually, you are pushing pointers to the
            nodes.) It is not necessary to push x onto the stack.
   2. Check if there are more nodes on the stack.
         A. If the stack is empty, the algorithm is complete and the tree is balanced.
         B. If any nodes remain on the stack, go to step 3.
   3. Remove the top node pointer from the stack and call it y.
   4. Check the height of the left and right subtrees of y.
         A. If they are equal or differ by no more than 1 (hence, balanced), go to step 2.
         B. If they differ by more than 1, perform a rotation on one or two nodes as
            described below. After the rotation(s), the algorithm is complete and the
            tree is balanced.

2-3-4 trees (2-4/2,4)
----------
Search Trees (but not binary)
Multi-way Search Trees
Each internal node has up to 3 keys, thus up to 4 children

         3 , 6 , 9
       /   |   |   \
    1,2    5   8    12,13

ALL LEAVES in a 2-3-4 tree are on the same level, thus all nodes on the same level
have the same height, thus a 2-3-4 tree is ALWAYS balanced.

Search is done similarly to a binary tree, but instead of 1 comparison per node
it may need up to 3.
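A minimal C++ sketch of the search just described. The node layout (Node234 with nkeys, keys, child) and the helper exampleTree are assumptions made for illustration; the notes do not define a node structure at this point.

```cpp
#include <cassert>

// Hypothetical node layout for illustration; not taken from the notes.
struct Node234 {
    int      nkeys;      // number of keys in use: 1, 2 or 3
    int      keys[3];    // keys in ascending order
    Node234* child[4];   // child[i] holds the keys between keys[i-1] and keys[i];
                         // all children are null in a leaf
};

// Returns true if key is stored somewhere in the subtree rooted at node.
bool contains(const Node234* node, int key) {
    while (node) {
        assert(node->nkeys >= 1 && node->nkeys <= 3);
        int i = 0;
        // Scan at most 3 keys: stop at the first key that is not smaller than key.
        while (i < node->nkeys && key > node->keys[i]) {
            ++i;
        }
        if (i < node->nkeys && key == node->keys[i]) {
            return true;
        }
        node = node->child[i];   // descend into the gap just before keys[i]
    }
    return false;
}

// Builds the example tree from the notes: root 3,6,9 with leaves 1,2 / 5 / 8 / 12,13.
const Node234* exampleTree() {
    static Node234 leafA{2, {1, 2}, {}};
    static Node234 leafB{1, {5}, {}};
    static Node234 leafC{1, {8}, {}};
    static Node234 leafD{2, {12, 13}, {}};
    static Node234 root{3, {3, 6, 9}, {&leafA, &leafB, &leafC, &leafD}};
    return &root;
}
```

For instance, contains(exampleTree(), 7) scans 3, 6, 9 in the root, descends into the leaf holding 8, and reports that 7 is absent. A single loop covers 2-, 3- and 4-nodes here because nkeys records how many keys are in use.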
Insert first does a search, then adds the key to a 2-node or 3-node (making them
3-node and 4-node correspondingly); when adding to a 4-node (which is called overflow)
insert performs a split. Split breaks the illegal 5-node into a 3-node and a 2-node (5=3+2)
by sending the third key up to the parent (possibly creating a new root).

insert 1 into:
            10
           /  \
      2,3,4    11,12
results in a 5-node which is then split:
            10                      3,10
           /  \        --->        /  |  \
    1,2,3,4    11,12            1,2   4   11,12

Notice that if there is no parent to send the 3rd key to, the tree GROWS.
Make sure you understand that this is the only way a 2-3-4 tree will grow - at the root,
unlike other trees which grow at the bottom. The result is that a 2-3-4 tree remains
balanced when it grows - really, adding a new root doesn't change nodes' heights
at all, and since the tree was balanced, it remains balanced:

insert 1 into: 2,3,4
results in a 5-node which is then split, there is no parent to send 3 to, so the tree grows:
                          3
                         / \
    1,2,3,4   --->    1,2   4

you may need to apply split several times -- O(log n), if the parent which
accepts a key is also a 4-node.
insert 1 into
           10,20,30
         /   \   \   \
    2,3,4     11   .....
results in 5-node
           10,20,30
         /   \   \   \
  1,2,3,4     11   .....
split by sending 3 up:
          3,10,20,30
        /  |   \   \   \
     1,2   4    11   .....
now root is a 5-node, split again by sending the 3rd key (20) up - create new root:
              20
            /    \
        3,10      30
       /  |  \    \  \
    1,2   4   11   ...

delete:
first part is similar to a regular delete - search and swap with predecessor to
make sure that the item is at the bottom:
using the tree above and deleting 3, after the swap the tree looks like
              20
            /    \
        2,10      30
       /  |  \    \  \
    1,3   4   11   ...
second part:
case 1: deleting from a 3-node or 4-node, just remove the key and make it into
        a 2-node or 3-node correspondingly.
in the previous example:
              20
            /    \
        2,10      30
       /  |  \    \  \
      1   4   11   ...
case 2: delete from a 2-node
proceed with delete, you'll get an empty node (making the 2-3-4 tree illegal)
so need to fix (situation is called underflow).
Fix by "transfer" - take a key from the parent, while the parent covers the hole
with a key from another child (a sibling of our node)
delete 1 from
              20
            /    \
        2,10      30
       /  |  \    \  \
      1  4,5  11   ...
delete 1, results in empty node (??)
              20
            /    \
        2,10      30
       /  |  \    \  \
     ??  4,5  11   ...
steal 2 from parent
              20
            /    \
       ??,10      30
       /  |  \    \  \
      2  4,5  11   ...
parent takes 4 from 4,5
              20
            /    \
        4,10      30
       /  |  \    \  \
      2   5   11   ...

case 2: special case 1: what if the sibling is also a 2-node:
delete 2 from
              20
            /    \
        4,10      30
       /  |  \    \  \
      2   5   11   ...
then "fuse" the empty node and "5" by taking key "4" from the parent
              20
            /    \
        4,10      30
       /  |  \    \  \
     ??   5   11   ...
fuse
              20
            /    \
        4,10      30
       /    \     \  \
      5      11    ...
notice that the tree is illegal (3-node "4,10" has only 2 children), that's
why we need 4 from the parent (making the parent a 2-node):
              20
            /    \
         10       30
       /    \     \  \
    4,5      11    ...

case 2: special case 2: what if the parent was a 2-node?
the underflow will go (cascade) up the tree (same as overflow - no more than
O(log n) times). BTW if underflow reaches the root, the root will be deleted,
and the tree will shrink.
delete 5 from
              20
            /    \
         10       30
       /    \     \  \
      5      11    ...
just delete
              20
            /    \
         10       30
       /    \     \  \
      ?      11    ...
fuse and take 10 from parent
              20
            /    \
         ???      30
         /        \  \
    10,11          ...
??? is underflow, fuse and borrow 20 from parent
           ???
           /
        20,30
       /   \  \
   10,11    ...
remove ??? (underflow at root - just get rid of the root)
        20,30
       /   \  \
   10,11    ...

In conclusion:
2-3-4 trees are always balanced - no need to keep track of height and balance
search, delete, insert all O(log n)
simple algorithms.
BUT have 3 different node structures

Red-Black tree
--------------