Talk:AA tree

is this right?
I'm not sure this is right: "AA trees require O(log(N)) bits of metadata per node, in the form of an integer 'level'." If this were true, then to store 32 billion nodes or so I'd need a 32-bit level. But what is level, anyway? It's basically how many child nodes you've got until you hit a leaf node (with leaf nodes having a value of 1), if I'm not mistaken. Don't take my word for it since I am learning this for the first time, but I'm thinking a balanced binary search tree with 4 billion nodes can have a maximum level of 64 or so, i.e. roughly double log(N). If we ignore the estimated factor of 2 (I could be way off here, but this is what I think) then the number of bits required by the level should be O(log(log(N))). This article is in critical need of the attention of an expert. What kind of complexity guarantee's do AA trees have?? It is not always perfectly balanced, but in what way does this affect performance? Is this article the same as the trees listed here? http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_andersson.aspx

(not the author of the question above) Also noticed the missing log and fixed the article. It's definitely O(log(log(N))). — Preceding unsigned comment added by 94.245.87.79 (talk) 12:27, 5 July 2019 (UTC)

aa
There seems to be lots of errors with this article : Insert does not have a while loop. Insert compares the new data item with ones already in the tree, so it does not cater for data types without an ordering. -Nic Roets 09:53, 15 July 2006 (UTC)

Recursion
I see one of the editors argued that recursion should be freely used in the example code, and the code I replaced used recursion. My opinion is that recursion should NOT be used the sample code because a reader may just want to implement the code in assembly. Recursion may also significantly increase the space requirements of the code.

Comment on above: AA-tree major feature is algorithm simplicity. Current "assembler-friendly non-recursive" example code does major disservice to public by obfuscating that. I recently implemented AA-tree using online resources and code in current wikipedia page was totally useless in either understanding the algorithm or helping to write the code. I think whether the example code is recursive/non-recursive does not matter at all. Only thing that matters if the code illustrates underlying algorithm. And there the current code fails absolutely. —Preceding unsigned comment added by 217.159.130.170 (talk) 12:25, 16 October 2007 (UTC)

If people want to implement in assembly there are other resources on the webnishantjr (talk) 18:39, 11 December 2008 (UTC)

Psuedocode
I totally agree with the above. Look at the Wikipedia page on algorithms. This stuff should be explained at a high level. I think this page should be rewritten with pseudocode, which is what I may or may not be about to do, depending on whether or not I can follow through on it. I just implemented the data structure in Haskell, and this page's description of the algorithm basically conflicts with the description of the original inventor when it comes to node deletion. —Preceding unsigned comment added by 18.248.3.158 (talk) 20:59, 9 December 2007 (UTC)

I rewrote it according to the above. I deleted the example deletion, as pretty as it was, because it was no longer consistent with the algorithm described on the page.

Rewrite
I think the recent edits (176876788 onwards) are a step backwards. I approve of the move to pseudocode in principle but I don't think it's been done well. Criticisms of the page rewrite include:


 * Removal of the brief definition at the beginning of the article is unhelpful
 * The 'five invariants' given do not insist that the level of a node must be at most one less than the level of its parent
 * The pseudocode for skew and split implies that the entire tree is passed as the parameter
 * Insertion: 'the increment of level(T)" is not the only thing that could necessitate recursion up the tree
 * I don't like the pseudocode in general. It's not really pseudocode anyway as it's full of implied functions. e.g. nil(T) should be phrased as "if T is the empty tree". Also what does node(X, 1, Nil, Nil) mean? See http://en.wikipedia.org/wiki/Pseudocode#Examples_of_pseudocode.
 * Insert pseudocode: there's no need to assume the existence of the functions 'successor' and 'predecessor'. AA trees need not be threaded.
 * The preamble before the deletion pseudocode is totally misleading. There is certainly no need to skew and split "the entire level" - remember the operations are all O(log(n)) for a tree of n nodes. It's unnecessary to criticism Andersson's deletion algorithm as being unoptimised. The sentence "This approach was favored, because when laid down conceptually, it has three easily understood separate steps:" sounds like it's taken from a dissertation and is not neutral. Also "easily understood" is likely to make the reader feel inadequate. Even for people who have a thorough grounding in advanced data structures, it is by no means obvious that you need at most three skews and two splits during each iteration of the rebalance after deletion.

I suggest a revert. Dragonmeat (talk) 23:43, 10 December 2007 (UTC)

Rewrite can be done
The pseudocode is difficult to understand, I implemented the tree in Haskell (which seems to outperform Data.Map!, still testing), but it wasn't simple as stated. This wasn't because the tree is very complex. The pseudocode contains a lot of implicit assumptions. For example in function split:

We have two horizontal right links. Take the middle node, elevate it, and return it. R = right(T) right(T) := left(R) left(R) := T       level(R) := level(R) + 1 return R

Note: := is a assignment (the pointer is dereferenced), and = is a pointer assignment. (A hint can be found in skew)

R = right T -- save pointer to right right T = left R -- The right node of T is replaced by the left node of the right node of T left R  = T -- The left node of T is replaced by T level R  = level R + 1 -- The level of the right node is incremented by 1 return R -- Return the modified right node

I didn't spotted := and = at first, so especially the level R = level R + 1 line became somewhat difficult to grasp. Do I have to change the level of right(T), the original value, or R, the modified value? This partially because in the case of Haskell := and = are the same operation. They replace a value. I had to implement the tree to understand the code. And tested it rather thoroughly to spot the bug.

Here is the situation in haskell: if you didn't catch the difference between := and =. the level (R) + 1 could also be placed here: split :: a :- b -- AATree a b -> AATree a b        level(right(right(T))     level(R) := level (R) + 1           /                                                    /                                    /                       / split t@(Node d a b l (Node dr ra rb rl rr@(Node drr _ _ _ _))) | d == drr  = Node (dr + 1) ra rb (Node d a b l rl) rr                    | |              |   |                      | otherwise = t                    | right (T) = R  |   right (right (T)) = right(R)                    | left (T)       | left(right(T)) = left(R) {-- Catch all other cases --} split t = t

In haskell it is easy to not see := and =. There are probably more languages, which don't have pointers or references.

Besides that, the algorithm seems to be correct. It nicely matches the behaviour of a binary search tree.

I became somewhat overly verbose, but what I wanted to say was, that I don't think a revert is necessary. The page just need some explanation.

--195.241.100.248 (talk) 17:13, 19 July 2010 (UTC)

Name
The article should start mentioning what AA stands for, I am guessing Arne Andersson but I don't know. 80.203.66.120 (talk) 19:31, 8 August 2008 (UTC)
 * I added that as I saw that it had previously been removed from the article without stating reason. 80.203.66.120 (talk) 21:17, 8 August 2008 (UTC)

Wrong assumption regarding removal of internal nodes
"Because of the AA property of all nodes of level greater than one having two children, the successor or predecessor node will be in level 1, making their removal trivial."

This sentence is either false or misleading (depending on the meaning of "or"). The successor/predecessor does not have two children, but it might still have one, making removal nontrivial. Removing an internal node may involve two swaps with a successor/predecessor. For instance, in the following tree, removing 0 may require two swaps (0 with 1 and 1 with 2):

0   / \  -1   1        \         2

Koning robot (talk) 08:05, 25 May 2011 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 1 one external link on AA tree. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20071022052324/http://www.softpedia.com:80/get/Others/Home-Education/AA-Visual-2007.shtml to http://www.softpedia.com/get/Others/Home-Education/AA-Visual-2007.shtml

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 23:53, 30 September 2016 (UTC)

An alternative way of providing a new node during insert
I was looking into adapting this material to use in a garbage collector, to access blocks. I realised that, in this use, I would already have the data structure around when the code needed to do an insert and that it would be unsuitable to call a helper function like node to get it. It struck me that the way to go - for my needs - is to pass a pointer to that data structure to the insert as yet another parameter, that it would either use directly or pass further on via one of the recursive calls of insert.

So far, so good, for my own purposes. But I was wondering if it was worth covering this alternative in this article, or if it would be too distracting and/or superfluous? If it is covered, I could do it, but I would need to show that the supplied structure:-

- was not used more than once; and

- was used once, i.e. that cases like a key already existing were handled OK, with something there to discard the data structure if that had to be done (this is a non-problem for my current purposes, but it might be for other work).

Thoughts? PMLawrence (talk) 13:19, 4 April 2023 (UTC)


 * Has this technique been published anywhere? Wikipedia isn't for original material. MrOllie (talk) 13:31, 4 April 2023 (UTC)


 * I get that, but it's not about the technique as such - it's basically a limited case of "passing the world" as a parameter, used in monads, so it's not new - it's about whether covering more than one variant would make for clutter in a write up, what with having two variants of insert around and/or flabby comments saying "... but if you do this instead ...". PMLawrence (talk) 14:04, 4 April 2023 (UTC)

The issue of passing a new node to the insert code from outside it came up in a project I am very occasionally working on. Something related but trickier came up in the delete code, and I thought it was interesting enough to let Professor Andersson know about it (if my emails ever get through!). As it might be relevant to others interested in this article, here is the body of that email:-


 * I thought you might be interested to hear of something that recently came up in the design stage of a garbage collector for a concatenative programming language, a low priority and amateur project in odd pieces of free time. There is something common to both your example code for deletion and the variant of it at wikipedia: they assume that the key value is independent data that can be simply amended on its own. However, in this use - accessing blocks according to their size, location and free/allocated/live status - the key will have to be generated from those attributes and cannot change independently. This is only a minor complication, though still a real one, as it will be straightforward to make deletion work by cutting and pasting the replacement leaf node in place of the node to be deleted and then updating its other fields. The main issue will be avoiding bugs, which is harder when there are several things to do.

Perhaps I should emphasise that "cutting and pasting the replacement leaf node in place of the node to be deleted" involves changing pointers to it as well. Or is that teaching people to suck eggs? PMLawrence (talk) 03:49, 13 May 2023 (UTC)