Talk:Set (abstract data type)

Contains vs. Belongs
In set theory, the terms "contains" and "belongs" have well-established, non-overlapping standard meanings. The predicate "contains" ($$\supseteq$$) is strictly a relation between two subsets of the same set: "A contains B" means "B is a subset of A". The relationship between a set and its elements ($$\in$$) is read either "belongs" or "is in": "x belongs to A" is the same as "x is an element of A". The converse predicate ($$\ni$$) is rarely used, and may be read "has" or "owns", never "contains". So, to avoid conflict with the mathematical tradition, methinks that the best choice for the abstract set type operations would be
 * &equiv; $$x \in S$$, and
 * &equiv; $$S\subseteq T$$.

All the best, --Jorge Stolfi (talk) 23:58, 22 May 2009 (UTC)
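
As an illustration of the distinction drawn above, here is a minimal sketch in Python, whose built-in set type happens to keep the two relations apart: `in` for membership ($$\in$$) and `<=` / `>=` for inclusion ($$\subseteq$$, $$\supseteq$$). The variable names are made up for the example; this is not a proposal for the article's operation names.

```python
# Membership versus inclusion, using Python's built-in set type.
S = {1, 2, 3}
T = {1, 2, 3, 4}
x = 2

is_element = x in S      # x ∈ S: "x belongs to S"
is_subset = S <= T       # S ⊆ T: "S is a subset of T"
is_superset = T >= S     # T ⊇ S: "T contains S" in the mathematical sense

# The two relations really are different: a set is an element of another
# set only if it was put there as a single item (frozenset is hashable).
nested = {frozenset(S), 4}
s_is_element_of_nested = frozenset(S) in nested   # S ∈ nested
s_is_subset_of_nested = S <= nested               # S ⊆ nested does not hold
```

Here `S` is an element of `nested` but not a subset of it, while `S` is a subset of `T` but not an element of it.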

Setoids
What is a setoid? Is this appropriate for the page that is linked to describe std::set?

Bunches and packaging
I would suggest removing the part about bunches and packaging, at least from the introduction. As far as I can tell these are not widely used concepts, and I could only find the one paper cited as a reference, A theory of bunches from 2001, which introduced the concepts.

Four basic data structures claim
I have added a 'dubious - discuss' to the claim that "there are four basic data structures". This claim has two citations, but they are from the same author (Hehner). I looked at the second citation, here it is:

Eric Hehner, copyright 1993 to 2018, this is a 250-page book

This segment of the Wikipedia article is a very brief summary of Chapter 2 of the book linked above. I read the whole chapter, it is very interesting. I should point out a few things:


 * 1) What Hehner calls a "string" is explicitly neither null-terminated (as in the C programming language) nor length-prefixed (as in the Pascal programming language) nor anything more complex (as in most "modern" programming languages - for instance, object-oriented languages often make strings first-class objects with methods, and a user-defined class can often inherit from the string class). What Hehner calls a string is very much like a string in a "real" programming language, but it is more abstract and more basic, so the length is not stored, but the length must be known in order to actually concatenate strings or print the whole string and stop at the end exactly. Hehner assumes it is possible to concatenate strings and print strings, but that is an implementation detail that is explicitly not covered in his book and will be left up to the implementer.
 * 2) These "four basic data structures" are for storing data elements that are of fixed size or of unimportant size. If you want to be really abstract, you can imagine the data elements as being of unimportant size. So perhaps they are integers, but they can be arbitrarily large; they are not necessarily 64-bit integers, and if you want a bigger integer you can use it. That's fine, the chapter is still internally consistent and valid. The four data structures are equally valid (and easier to imagine) if you use data elements of fixed size: 8-bit characters from the ISO 8859-1 (Latin 1) character set, 32-bit characters from UTF-32 (but not UTF-16, whose 16-bit code units are paired into 32 bits for high code points), booleans, integers of fixed size, floating point numbers of fixed size, or even JPEG image files padded to be all the same size. The chapter is not valid if each data element has a definite size but the sizes vary, such as a string of UTF-8 including some one-byte ASCII characters and some emoji that are four bytes per character (the classic smiling face emoji is "\xF0\x9F\x98\x8A" in UTF-8), or if you want to make a list of strings and the strings are different lengths.
 * 3) The definition of "basic" is actually clear, in the book (not in this Wikipedia article): You are only allowed to do two "advanced" things, you are allowed to package data elements and you are allowed to index data elements. Because two operations are allowed, this makes a 2x2 matrix of data collection types, and these are the four resulting types. Aside from the bunch data collection type, which is notable mainly for being useless and is mentioned to show why the other three are far more useful (and for completeness), the other three data collections are indeed present in most programming languages. But if you allow data collections to have more properties, you get far more than four data collections that seem fairly basic. One that immediately springs to mind is the map, hash table, or dictionary, which is a set of key-value pairs. The set of key-value pairs with fast lookup given a key is a very important data collection, but it is not "basic" in the sense of this chapter of this book. In the setting of a real programming language, packaging and indexing are not the only two choices in a collection of data elements. Is it mutable? Is it pass-by-reference or pass-by-value? Is it a first-class object (important in object oriented paradigms)? If evaluated, is it allowed to cause side effects (important in functional programming paradigms)? Is it thread-local (important in parallel programming and asynchronous situations)? And of course, how is it stored in memory?
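
To make the fixed-size versus variable-size point in item 2 concrete, here is a small Python sketch that encodes the same two characters in UTF-8, UTF-16 and UTF-32 and records the byte length of each. The sample text and variable names are my own choices for illustration; the big-endian codec variants are used only so that no byte-order mark is added.

```python
# One ASCII letter followed by the smiling face emoji (U+1F60A).
text = "a\U0001F60A"

# Encode each character separately and record how many bytes it takes.
utf8_sizes = [len(ch.encode("utf-8")) for ch in text]
utf16_sizes = [len(ch.encode("utf-16-be")) for ch in text]
utf32_sizes = [len(ch.encode("utf-32-be")) for ch in text]

# The emoji's UTF-8 bytes, as quoted in item 2 above.
emoji_utf8 = "\U0001F60A".encode("utf-8")

print(utf8_sizes, utf16_sizes, utf32_sizes, emoji_utf8)
```

Only the UTF-32 sizes are uniform (4 and 4); UTF-8 gives 1 and 4 bytes and UTF-16 gives 2 and 4 bytes, so neither of those yields fixed-size data elements.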

All that said, it is interesting to see how the "set" data type is fundamentally similar to the "string" and "list" data types, and that "list" is kind of like the Cartesian product of "set" and "string". As a result, you can find the intersection of two lists (just like two sets), and you can concatenate a list with another copy of itself (just like a string), but sets only support the first operation and strings only support the second operation. The book lists "axioms" and "operations" for each of the data collection types, and it is interesting in my opinion.

But the claim that there are exactly four "basic" ways to collect data is misleading at best. Especially because one of the four is the degenerate non-collection, which is neither packaged nor indexed. That's kind of like saying: "There are four types of government: democracy, dictatorship, hybrid democracy-dictatorship, and countries with absolutely no people. Notice that countries with exactly one person qualify as dictatorships. Only the empty country has the fourth kind of government. The Moon currently has such a government." Democracy is a useful term, but how often you vote is a very important "implementation detail", much like the null-terminated string versus length-prefixed string. Fluoborate (talk) 07:50, 18 October 2018 (UTC)


 * @Fluoborate, I agree with everything you say here (including the interest of this classification in itself). However, the claim as stated in the article is in my view not only wrong as it stands, but also does not fit in this article, still less in the introduction (it is also lengthy), and even less so since it is the view of a single author and thus does not belong on Wikipedia. As a consequence, and since the editor who originally introduced this claim has not come forward here to support it (and give more references), I will just erase it. denis 'spir' (talk) 08:29, 7 February 2019 (UTC)
 * EDIT: Done. But I provided a definition and a link to multiset. denis 'spir' (talk) 08:35, 7 February 2019 (UTC)

Sorted Set
The definition of this term is missing. I know many people would say that it's obvious. If it's obvious, why not define it somewhere, e.g. here? There's a good definition in the Oracle documentation, but that's just Java.

Vlad Patryshev (talk) 23:09, 14 November 2020 (UTC)
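
As a starting point for such a definition: a sorted set can be described as a set (no duplicate elements) whose elements are kept, and iterated, in the order given by a total ordering on them. Below is a minimal Python sketch of that idea, assuming the elements are mutually comparable. The class `SortedSet` and its methods are hypothetical names for illustration, not an existing library API, and the list-based storage is just one possible implementation.

```python
import bisect


class SortedSet:
    """Hypothetical minimal sorted set: unique elements kept in ascending order."""

    def __init__(self, items=()):
        self._items = []          # always sorted, never contains duplicates
        for item in items:
            self.add(item)

    def add(self, item):
        # Binary search for the insertion point; skip the insert if present.
        i = bisect.bisect_left(self._items, item)
        if i == len(self._items) or self._items[i] != item:
            self._items.insert(i, item)

    def __contains__(self, item):
        i = bisect.bisect_left(self._items, item)
        return i < len(self._items) and self._items[i] == item

    def __iter__(self):
        return iter(self._items)  # iteration follows the element ordering

    def __len__(self):
        return len(self._items)


s = SortedSet([3, 1, 2, 3, 1])
in_order = list(s)
```

Here `in_order` comes out as `[1, 2, 3]` whatever the insertion order, which is the property that distinguishes a sorted set from a plain set.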

Move discussion in progress
There is a move discussion in progress on Talk:Associative array which affects this page. Please participate on that page and not in this talk page section. Thank you. —RMCD bot 00:32, 27 January 2022 (UTC)

India Education Program course assignment
This article was the subject of an educational assignment supported by Wikipedia Ambassadors through the India Education Program.

The above message was substituted from by PrimeBOT (talk) on 20:11, 1 February 2023 (UTC)

Adding SETL to the list of language support
In the language support section it might be pertinent to add SETL. While it is no longer in any real use, it is described as a set-theoretic programming language, and its core types are unordered sets and sequences. It has universal and existential quantification as keywords, set-based functions such as union, and set-builder notation :

EverythingIsPhine (talk) 03:43, 23 December 2023 (UTC)
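
For readers unfamiliar with the features listed above, here is a rough analogue in Python. This is deliberately not SETL syntax, only a sketch of the same three ideas (set-builder notation, universal and existential quantification, and union), with example values chosen for illustration.

```python
numbers = range(1, 11)

# Set-builder notation: { n : n in numbers | n is even }
evens = {n for n in numbers if n % 2 == 0}
squares = {n * n for n in numbers}

# Universal quantification: every n in numbers is positive.
all_positive = all(n > 0 for n in numbers)

# Existential quantification: some square is even.
some_square_is_even = any(s % 2 == 0 for s in squares)

# Set union.
even_or_square = evens | squares
```

In Python the quantifiers are ordinary functions over generator expressions rather than keywords, which is one way the comparison with SETL is only approximate.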