Talk:Binary search tree/Archive 1

"Value" versus "Key"
The definition and description use the term "value", and some of the code examples use the term "key". I believe they are interchangeable in this article. We should pick one and be consistent. I propose we pick "key". I think it is more precise. —Preceding unsigned comment added by Adamuu (talk • contribs) 21:20, 21 July 2009 (UTC)
 * (This issue is hardly unique to this article.)  " ...each node has a comparable key (and an associated value)'  certainly implies that they are distinct, but what - a tree is not a hash!

68.183.92.210 (talk)

Nodes with Equal Sort Value
Does anyone know what a tree with duplicate sort values looks like? -Smoke003723 —Preceding unsigned comment added by 203.143.250.6 (talk) 10:41, 7 July 2008 (UTC)

"Introduction to Algorithms" by Cormen, Liesersonand ,Rivest define the binary search tree as: "Let x be a node in a binary search tre. If y is a node in the left subtree of x, then key[y] <= key[x]. If y is a node in the right subtree of x, then key[x] <= key[y]" (Section 13.1 p 245 16th printing 1996). (talk) 20:13, 28 February 2010 (UTC)

bst
Shouldn't the definition of BST be that every node's right subtree has values greater than or equal to the node's value? The current definition says it's strictly greater, which isn't the case.
 * You can put equal nodes either on the left or the right, but not both. It's an arbitrary decision. I'll say this somewhere. Deco 01:17, 12 May 2005 (UTC)

I disagree. Here is an excerpt from the lecture I once took at UC Berkeley: "A binary search tree satisfies the _binary_search_tree_invariant_: for any node X, every key in the left subtree of X is less than or equal to X's key, and every key in the right subtree of X is greater than or equal to X's key.". The full lecture can be found here (http://www.cs.berkeley.edu/~jrs/61b/lec/26). I've also verified this with my reference book on algorithms.


 * I'm sorry, it can be done this way (although there are authoritative references which don't do it this way). It makes some operations more difficult, such as finding all the nodes with a given value, but it really makes almost no difference which way it is defined. In the article I've used your definition, then clarified that in some definitions equal values can be excluded from the left or right side. Deco 01:02, 14 May 2005 (UTC)

By this definition self-balancing binary search trees are not binary search tree. Rotations do not keep this _binary_search_tree_invariant_. To be concrete: Consider an empty AVL tree and insert three equal values. After the rotation there are equal values to both sides of the root node. So you either need to exclude equal values from binary search trees or allow equal values to both sides of a node. E.g: "The left subtree of a node contains only values less or equal to the node's value. The right subtree of a node contains only values greater than or equal to the node's value." 141.84.26.134 01:04, 27 February 2007 (UTC)

It is confusing that the article starts by claiming that equal nodes should always be in the right subtree, while the example image puts equal nodes (such as the seven, which equals the root) in the left subtree. 128.84.153.196 13:29, 29 September 2006 (UTC)

Copyright violation?
Why does the blurb about Optimal BSTs have a reference to "section 15.5 of Introduction to Algorithms" ? Is this text copied and pasted out of a book? Is this a (c) violation?
 * Copyright does not cover ideas but an explicit realization of those ideas (see idea-expression divide). The text here, written by me, was based on my understanding of the material in CLRS and not copied verbatim from that book. Deco 00:41, 15 December 2005 (UTC)

Why are the examples in Python?
I know that the example algorithms are merely for show - but wouldn't examples in a more mainstream language like C/C++ or Java be better for the purpose of understanding? The Phython deletion algorithm, for instance, uses return structures that are foreign to C/C++ or similar languages.
 * If you think a mainstream programmer might find them difficult to understand that's a legitimate concern. I agree that "find_remove_max" is quite obscure in its use of tuples of two return values. On the other hand, I think the use of tuples or records combined with pattern matching really abbreviates things like the insertion and search algorithms, while still being pretty clear, which is a pretty good reason to use them instead of C/C++/Java. I'm going to look at rewriting parts of the deletion code in a more intuitive way... er, as soon as I learn a little Python. :-) Deco 22:27, 15 February 2006 (UTC)
 * I decided to write "destructive" versions of insert and delete in C++, which lends itself quite well to this purpose, as you can see from the relative simplicity of the resulting code. I left in the Python insertion to demonstrate the concept of persistent trees, but took out the excessively complex Python deletion example. I hope this makes things clearer for you! Deco 03:48, 16 February 2006 (UTC)

Yes - thank you very much. The C++ example is much more clear to me - and you are right, it was the returning tuples that was causing my head to spin. --208.186.134.102 06:00, 16 February 2006 (UTC) Wouldn't the code in the new C++ delete example cause memory leaks when you set "node=node->right" and don't delete node->right? (or node->left)... (thanks for adding it... it's a good, simple example. very clean) ++Ryangardner
 * Good point. I've gotten too used to garbage collection. :-) Thanks for your positive feedback. Deco 06:27, 16 February 2006 (UTC)

I see a few other (minor) flaws in the algorithm on the page. If you are deleting a leaf node, you must remove the link from the parent of that node to the node itself. I also think that the delete method as it is implemented would cause an error, because you are deleting the node and then trying to get data from it... Ryangardner

Slightly off topic, but could the python examples be made clearer by the use of exceptions instead of returning numbers (eg. -1, -2)? --RohanMitchell (talk) 00:10, 11 June 2008 (UTC)

I also find it bizarre & irritating that the examples are in python. C is the appropriate choice here. —Preceding unsigned comment added by 124.168.85.52 (talk) 14:53, 30 August 2010 (UTC)

I can't understand the "Traversal" example because I don't know what "callback" is or does. An example in another language is needed.

68.183.92.210 (talk) — Preceding undated comment added 00:22, 15 May 2014 (UTC)

This seems like a discussion without proper arguments. People prefer different languages. I find Python more readable. I don't think the article should have any examples in concrete programming languages. Pseudocode should be more than enough, this is an encyclopedic article, not a coding cookbook. I think we should try making it as easy to understand for non-computer-scientists as reasonably possible. Blocks of code don't really help with this objective. --192.35.17.12 (talk) 15:06, 4 March 2016 (UTC)

Merge with binary search algorithm
It seems to me this article details the same concept as Binary search algorithm. I suggest a merger.He Who Is 01:59, 25 May 2006 (UTC)


 * No, they are very different. The binary search algorithm is an algorithm that works on sorted arrays.  To use it, you need an array and you need to sort it beforehand (which is expensive).  The binary search tree is a data structure.  Unlike arrays, binary search trees do not preserve order of insertion.  Binary search trees are used because they have the advantage that the tree structure encodes the relative ordering of the elements (i.e. it is always "sorted"), without needing to sort. --Spoon! 00:04, 24 August 2006 (UTC)

Errors
The deletion algorithm should check for the case of both left and right being NULL before checking either side.

No, the code works fine - I tested it. If both sides are NULL, then instead of assigning the other side it will assign NULL (after deleting the node) - which is correct behavior. —Preceding unsigned comment added by 68.207.121.5 (talk) 05:35, 9 November 2007 (UTC)


 * Indeed, the deletion doesn't seem to work. Could anybody fix this? Jogers (talk) 15:14, 6 September 2006 (UTC)
 * Yup, it does seem to be broken. Here is some code that should work.  It compiles, but I don't feel up to verifying that it works 100% by actually testing it.  It will be better than what's there, though...

void DeleteNode(struct node* node) { struct node **parentLink; // If there is no parent, we are a root node if (parent == NULL) { free(node); return; }     // Figure out which child of the parent we are if (node == node->parent->left) parentLink = &(node->parent->left); else if (node == node->parent->right) parentLink = &(node->parent->right); // This will handle cases where both children are null. if (node->left == NULL) { *parentLink = node->right; free(node); // Free any memory allocated for the node } else if (node->right == NULL) { *parentLink = node->left; free(node); // In C++, delete node could be used here } else { // Follow the right side of the tree until there is an empty space struct node* temp = node->left; while (temp->right != NULL) { temp = temp->right; }         node->value = temp->value; DeleteNode(temp); } }


 * Wolever 02:50, 31 March 2007 (UTC)

I worked on understanding the code in the article and found some issues. I described them on my blog, see http://kasterma.wordpress.com/2008/06/15/the-wikipedia-entry-on-bst/ 20:50, 15 June 2008 (GMT+1)  —Preceding unsigned comment added by 83.117.8.28 (talk)

I hate to keep just pasting code around like this, but here's my take on the C version of removeNode. Tiger of Doom (talk) 01:33, 26 June 2008 (UTC)

I implemented a recursive delete function - since BSTs are recursive abstractions I think that the deletion function in the article should use recursion. Enzoferber (talk) 16:55, 23 April 2015 (UTC)

Does it necessarily need a total ordering?
C++ STL's "Sorted Associative Containers" (which includes "set", "multiset", "map", and "multimap" classes, all of which are binary search trees) only requires its key types to be a strict weak ordering: http://www.sgi.com/tech/stl/SortedAssociativeContainer.html --Spoon! 00:08, 24 August 2006 (UTC)
 * You're right, a strict weak ordering suffices to order the tree and enable useful operations. The details behind this are rather complicated though and I think would be best relegated to their own section. Deco 04:09, 24 August 2006 (UTC)

Optimal binary search trees
In the section about Optimal binary search trees, it is mentioned that: We can then use a dynamic programming solution, detailed in section 15.5 of Introduction to Algorithms by Thomas H. Cormen Sec Edition, to construct the tree with the least possible expected search cost. I wish that someone who has the book (or has a nearby library from which he can borrow it) could take the time to write about it.

Thanks. --Hosam Aly 20:26, 7 January 2007 (UTC)


 * And the given code-piece "procedure Optimum Search Tree" is obviously incomplete. --Nomen4Omen (talk) 21:00, 22 December 2012 (UTC)

Insertion
I think the insertion code is not very clear, so I suggest this one, written in pseudocode:

x is current node, y always points to x's parent.

Tree-Insert(T, z) y = null                        //x will be set to root, root's parent is null x = root(T) while x != null do             //we will traverse the tree until we find the leaf y = x                       //we are going to change x, so store x as it will became next parent if key(z) < key(x)          //if the value of inserted node is lesser then actual node value, move left then x = left(x) else x = right(x) //now we should have y pointing to the leaf, and x to null (notice when the while loop terminates) parent(z) = y                  //this leaf will become our parent if y = null                    //this is a special case, when we are inserting into an empty tree then root(T) = z             //so the inserted node will be the root node else if key(z) < key(y)         //tree is non-empty, so insert the node then left(y) = z       else right(y) = z

Kremso 17:15, 11 January 2007 (UTC)
 * Heck, why not just re-write all the code in psudo code? It will keep people from directly copy+pasting it in to their code (which generally won't work so well) and it is quite a bit easier to understand.  Wolever 02:50, 31 March 2007 (UTC)
 * I agree. Pseudocode would be much better than the current two-language mess (but it would be even worse if we had Python, C and pseudocode). LittleDan talk 04:39, 15 April 2007 (UTC)


 * I disagree. Using a specific language like C or C++ makes it clear how the memory is being modified. If you use pseudo code it might not be clear the operations, because generally much of the traversal code deals with double pointers (or references to pointers). You want to be able to modify not only the children of the node, but also may want to NULL out the node itself (after freeing it), so any other nodes that point to the node will point to a NULL reference. That might not be clear if using psuedo code. —Preceding unsigned comment added by 68.207.121.5 (talk) 15:18, 10 November 2007 (UTC)

The C++ code shown in the example is basically a C code with stripped structs, can we add the structs and call it C? Using pointers and not using templates makes it an eyesore for a C++ programmer. — Preceding unsigned comment added by 77.92.12.214 (talk) 07:15, 1 December 2012 (UTC)

Deletion
"Once we find either the in-order successor or predecessor, swap it with N, and then delete it. Since both the successor and the predecessor must have fewer than two children, either one can be deleted using the previous two cases. In a good implementation, it is generally recommended[citation needed] to avoid consistently using one of these nodes, because this can unbalance the tree."

I disagree with this. Depending on which side of a node the equal values go to (left or right), you can only base this deletion schema in predecessor OR sucessor (not arbitrary). There are cases where taking the wrong choice will make equal values appear to the side they shouldn't be in. 13:11, 5 June 2007


 * I think the point is mute, as well. If you were truely concerned with balanced trees, you would use red-black or AVL trees. Otherwise, it is pretty hard to ensure balanced trees unless you carefully insert the whole dataset in a specific order —Preceding unsigned comment added by 68.207.121.5 (talk) 15:21, 10 November 2007 (UTC)


 * I'm pretty sure you're right if you allow equal nodes in the tree, you have to be consistent in your insertion and deletion policy. If you disallow them, then it's pretty obvious that you'll get an unbalanced tree if you consistently choose to fiddle with the left or right subtree.
 * Aside: Am I the only one that thinks Wikipedia is getting a bit funny with the citations for the bloody obvious? It really does feel like http://bash.org/?757724 sometimes. (`Here is my impression of Wikipedia. "There are five fingers on the human hand [citation needed]"') 129.11.126.179 20:05, 12 June 2007 (UTC)

I'm just wondering whether the example code for Deletion is correct... couldn't a tree be built such that it would lose additional nodes given the example code (e.g. if your nodes are integers and you add them via the above code in the order 6 3 5 4 and then delete 5 would you not then lose Node 4 as well?

The example seems to fail when the deleted key is not found in the tree. It also acts differently than the upper image (the example deletes the minimum of the right side, where the image deletes the max of the left side). — Preceding unsigned comment added by 79.177.180.19 (talk) 18:27, 5 February 2016 (UTC)

I created an iterative delete function for a binary search tree:

omega n squared worst case for insertion?
The current text says:


 * In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but &Omega;(n2) time in the worst case.

Isn't Omega(n) the worst case?

-- nyenyec &#9742; 23:15, 10 November 2007 (UTC)
 * Yes, assuming n is the number of items already in the tree. It's &Omega;(n2) in the worst case to insert n items, and that's probably where the mix-up came from. Dcoetzee 23:24, 10 November 2007 (UTC)

Clarity Rewrite
I'm a student at Cal Poly State University San Luis Obispo, and as part of a class on teaching technical subjects we were asked to find an article that was unclear or otherwise inaccessible for the wider audience and clarify it, with an emphasis on improving understanding from a teaching point of view.

Most of the changes I am making are directed towards improving the _clarity_ of the subject matter, and NOT an attempt to make them more factually accurate.

One large change I made is to remove the section on BST's being sets/multisets from the introduction. I'm sure most people reading this article aren't interested in graph/set theory, so that information is more pertinent in other articles. Feel free to reintegrate that information if you'd like, but in my opinion it doesn't belong in this article, or if it does it belongs in it's own section.

My wiki markup skills are not very good, so feel free to correct any incorrect wiki-links I may inadvertently create. I've done my best to make them at least link somewhere correct, but I can't get the hang of linking to sections in other articles. Mgius (talk) 23:14, 26 February 2008 (UTC)

Improved Image
the left subtree of a node contains only values less than the node's value;

So the image can contain one or more duplicated values, to show where they go. —Preceding unsigned comment added by 79.36.54.169 (talk) 18:02, 12 June 2008 (UTC)

The search tree may become:

This allows to look for None items too

External link
I have done several articles on BST, which can be located at [http://simpleprogrammingtutorials.com/tutorials/bst-overview.php Binary Search Tree Tutorial. Java and C++ implementations]. I am asking senior wikipedia editors to approve this link and add it to the current article. Thanks in advance. —Preceding unsigned comment added by 92.39.232.52 (talk) 23:12, 19 January 2009 (UTC)

Iterative search algorithm
I added an iterative algorithm to search a BST in C++. I also made the intro clearer. I am pretty sure the code is correct, but any tips would be helpful. GRHooked (talk) 05:15, 12 July 2009 (UTC)

this recent edit
Regarding the commit comment, there is no way for val to be neither equal to, less than nor greater than next->value, well, technically, there is. What if either value is NaN? -- RoySmith (talk) 18:45, 1 August 2010 (UTC)


 * Floats with NaN can't be ordered, so they're not valid values to put in an ordered data structure. --192.35.17.12 (talk) 13:02, 4 March 2016 (UTC)

Observations
Hi, What do you think of adding an observations section with regard to the data structure? it's like deductions based on its state. for example, to know the expected range of numbers below you, if you're a left child, go up to get the end, and then left to get the beginning of the range.

more observations, every node's children are already sorted, so if you delete the node's parent, you can add that node to the tree's root. just one insertion. this would be simpler for people to understand, though not most effective. Another observation is how to get the next or previous. for example, go to the right child and then choose left repeatedly to get the next item. if you don't have a child, then it depends if you're a left or right child. if you're a left, then your parent is the next. if you're a right, you need to go up until you see the first node that is a left-child, and then get its parent. etc... What about modified structures where the root contains the 'count', or has 'max' and 'min' fields, so it can have fewer operations in such cases? (for example, directly adding to the max reference, instead of recursing the tree)

So should I edit away? or how things really work here? sorry, still a beginner. I don't have much background in CS, but I wrote a binary-tree vocabulary once.

worst-case theta(n^2) for sorting
I changed this from theta(n^2) to O(n^2).. it does not make sense for something to have worst case complexity defined by theta notation. Theta notation bounds from above and below by a constant factor so saying that it is theta(n^2) implies that it is at least n^2 time (e.g. big-omega). —Preceding unsigned comment added by 128.12.16.152 (talk) 07:29, 7 December 2010 (UTC)

There is no parent pointer (there shouldn't be)
The Python code assumes a parent pointer. The definition of a BST does not include a parent pointer. --Ysangkok (talk) 16:18, 31 March 2013 (UTC)