Talk:Apriori algorithm

Set Notation
Is it correct to surround the items with curly braces when they are pairs or triplets? Most people reading the article will know what is meant, but strictly speaking the set {1, 2, 3, 4} does not contain the set {1, 2} as an element (this one does: {{1, 2}, 3, 4}). So in the example tables, I suggest that the "Item" column should not list {1, 2}; instead, rename the column to "Items" and just list 1 and 2 without the curly braces. — Preceding unsigned comment added by 199.212.215.11 (talk) 17:26, 30 May 2012 (UTC)
 * I tried to improve this example and make it consistently clear by saying "x is a superset of y" rather than "x contains y". --a3_nm (talk) 13:57, 1 January 2013 (UTC)
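The distinction drawn above can be checked with Python's built-in sets (a small illustration, not taken from the article): `in` tests membership, while `<=` tests the subset relation that the example tables actually rely on.

```python
# Membership vs. subset: the two relations discussed above.
items = {1, 2, 3, 4}
pair = frozenset({1, 2})  # frozenset, so it can itself be an element of a set

print(pair in items)   # False: {1, 2} is not an *element* of {1, 2, 3, 4}
print(pair <= items)   # True: {1, 2, 3, 4} is a *superset* of {1, 2}

# A set that really does contain {1, 2} as an element: {{1, 2}, 3, 4}
nested = {pair, 3, 4}
print(pair in nested)  # True
print(pair <= nested)  # False: 1 and 2 are not elements of nested
```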

Removal of important content
The first version of this page included a good definition and pseudocode that I wrote. I now see that some wise person has removed the pseudocode. Where is the algorithm? Without an explanation of the algorithm, this page is useless. I'm going to add it back; this is unacceptable. Exa (talk)

It would be very useful to include a bit more context here, and perhaps a more conceptual (as opposed to mathematical) definition. Also, if we use variable/set symbols, they should probably be defined explicitly within the article. As it is written now, I suspect the only people who would understand the article are those people who are already familiar with the topic (i.e. those who already understand it, and thus don't need the article).

12.64.120.218 15:34, 31 May 2004 (UTC)

I second that! I was going to say an implementation in C or Java or something would be helpful.

-- Just added a much better example, replacing the vague subsetting example that was there before, to give a more concrete explanation. VSEPR (talk) 16:25, 16 February 2009 (UTC)

Page title
AFAIK, this is the Apriori algorithm, but the page is entitled A priori algorithm. I'm going to move it. --aciel 16:57, 30 August 2005 (UTC)

The Apriori algorithm for finding frequent itemsets is a little outdated. There are more efficient algorithms for finding frequent itemsets. Here are some good sources:

http://fimi.cs.helsinki.fi/fimi03/ http://fimi.cs.helsinki.fi/fimi04/

Well, the Apriori algorithm might be outdated, but a) this page is about that algorithm! and b) though it hardly needs stating, it was the first significant algorithm, and its basic idea is used again and again in several succeeding algorithms, so it is important to understand it. Exa 18:33, 16 May 2007 (UTC)

Could you also include references to other algorithms in the article, please? Andthu (talk) 14:16, 2 January 2008 (UTC)

Example
Could someone please generate the "list of all 3-triples of the frequent items" at the end of the example? --ooscarr (talk) 07:19, 17 January 2011 (UTC)

Done, though not exactly: there is no such list, so the example explains why.

--5/3/2012: Done a little better. I generated the list of all possible "3-triples" (triplets), which had only one entry, {2,3,4}, and it fell below the minimum support of 3/7. The paragraph explaining this was wrong: it mentioned {1,2,4}, which wouldn't even have been checked because {1,4} was discarded in the previous step.

Could somebody please add the concept of a confidence threshold to the article as well? — Preceding unsigned comment added by 74.192.12.203 (talk) 06:28, 3 May 2012 (UTC)
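One common way to state the confidence threshold (hedged: this is the usual textbook formulation, and the article's own notation may differ). Here supp denotes support, the fraction (or count) of transactions containing an itemset; X and Y are disjoint, non-empty itemsets whose union is frequent; and minconf is an illustrative name for the user-chosen minimum confidence.

```latex
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad
\text{report } X \Rightarrow Y \text{ only if } \operatorname{conf}(X \Rightarrow Y) \ge \mathit{minconf}
```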

Boilerplate messages
confusing: Lots of jargon (some of which is redlinked), without a good explanation of what it's supposed to mean.

wikify: The algorithm block is ugly. It's set in monospace with LaTeX blocks on part of the line, so some bits are rendered as images and others as plain text. That is not really a suitable format. I would suggest stripping the leading spaces and using the : prefix instead, along with writing full statements in math markup (spacing commands such as \, are available).

cleanup-context: There is one sentence briefly explaining that it's an algorithm, and then the article dives deep into the technical detail. It needs a much better WP:LEAD section, and I'm not in a position to write one.
 * PS: ironically, \LaTeX and \TeX are not supported by the LaTeX extension.

-- 81.104.165.184 22:13, 22 May 2006 (UTC)

Article Cleanup Co-Ordination Point
Cleanup Co-ordination
This article has recently been tagged as requiring cleanup to meet Wikipedia's quality standards. The article may have been flagged as needing cleanup because it has been suggested that:
 * the article needs formatting, proofreading, or rephrasing in comprehensible English.
 * the article has multiple overlapping problems.
 * the article is very short and might need expanding, removal, or merging with a broader article.
For a full list of possible problems, see Manual of Style.

As part of the cleanup process, the automated bot PocKleanBot has generated this notice as a focus of cleanup efforts, and has also contacted several contributing editors of the article to bring the problem to their attention. You should use this section to discuss possible resolutions of the problem and achieve consensus for action. Only when there is consensus that the article has been cleaned up should you de-list it by deleting the cleanup tag from the article; this causes the article to drop off the monthly cleanup-needed list page.

Discussion

In the first sentence of the first paragraph, we say that Apriori is for transactional data. Then we say that (all?) other algorithms are for finding association rules on non-transactional data, which is obviously not the case.

Also, is Apriori really for "learning association rules"? To me it looks like the output of the algorithm is a set of frequencies of data per transaction, not a set of association rules.

Also, Winepi and Minepi are not examples of data, they are algorithms, so, if included at all, they should be in brackets after the words "Other algorithms", and not after the words "data having no transactions".

Also, I can't figure out why we're mentioning timestamps and DNA sequencing in the last sentence of the first paragraph. First we say Apriori is for transactional data, then we say there are other algorithms for non-transactional data... as far as I can tell, that covers all the cases: transactional and non-transactional... so why do we go on to talk about data that has no timestamp? We are implying that data with no timestamp is neither transactional nor non-transactional, aren't we? So is this a Schroedinger's Cat kind of thing? (Sarcasm.) — Preceding unsigned comment added by 199.212.215.11 (talk) 17:53, 30 May 2012 (UTC)
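On the question above of whether Apriori itself learns rules: in the usual presentation, Apriori outputs the frequent itemsets together with their support counts, and association rules are derived from those in a separate step. A minimal Python sketch of that second step follows; the function name `association_rules`, the dict-of-counts input, and the example counts are illustrative assumptions, not the article's notation.

```python
from itertools import combinations

def association_rules(support, min_conf):
    """Derive rules X -> Y from frequent itemsets (hypothetical helper).

    support: dict mapping frozenset -> support count. Every subset of a
    frequent itemset must also be a key, which Apriori's output guarantees.
    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    rules = []
    for itemset, supp_xy in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                conf = supp_xy / support[x]  # supp(X u Y) / supp(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules

# Hypothetical support counts, as an Apriori run might report them.
counts = {frozenset({1}): 3, frozenset({2}): 6, frozenset({1, 2}): 3}
print(association_rules(counts, 0.8))
# [(frozenset({1}), frozenset({2}), 1.0)]
```

The rule {2} -> {1} is dropped here because its confidence is 3/6 = 0.5, below the 0.8 threshold.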

Pruning missing: Although the description of the algorithm talks about pruning the candidate list, the pseudocode does not do any pruning after computing the join of L(k-1) with L(k-1). That is, every candidate in C(k) that has a (k-1)-subset not present in L(k-1) can be removed from C(k). — Preceding unsigned comment added by Vaivaswatha (talk • contribs) 15:42, 22 March 2014 (UTC)
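The missing prune step can be sketched as follows (the function name `apriori_gen` is hypothetical; Python is used for brevity). One assumption to flag: the join shown here simply unions pairs of (k-1)-itemsets whose union has exactly k items, a simplified variant of the sorted-prefix join usually described. After pruning, both versions keep exactly the k-itemsets whose (k-1)-subsets are all in L(k-1).

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Build C(k) from L(k-1): join step, then prune step.

    prev_frequent: set of frozensets of size k-1 (this is L(k-1)).
    """
    # Join: unions of two (k-1)-itemsets that have exactly k items.
    candidates = {
        a | b
        for a in prev_frequent
        for b in prev_frequent
        if len(a | b) == k
    }
    # Prune: drop any candidate with a (k-1)-subset that is not in L(k-1).
    return {
        c for c in candidates
        if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))
    }

# Hypothetical L(2); {1,3} and {1,4} are assumed infrequent.
frequent_pairs = {frozenset(p) for p in [(1, 2), (2, 3), (2, 4), (3, 4)]}
print(apriori_gen(frequent_pairs, 3))  # {frozenset({2, 3, 4})}
```

The join produces {1,2,3}, {1,2,4} and {2,3,4}; the first two are pruned because {1,3} and {1,4} are not in L(2).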

Vocabulary and definitions: in the referenced papers, as well as in the article, a definition is given for itemset, but in the article the noun phrase item set is used instead. Also, definitions of support and large itemset are missing. Finally, the threshold C seems to be the support threshold epsilon mentioned later, so I think the notation should be consistent (and C is more confusing). If there's no objection to these changes, I'll make them. Stefpac (talk) 00:50, 24 November 2014 (UTC)
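For reference, one common way to write the missing definitions, using ε for the support threshold as proposed above. T denotes the transaction database (notation assumed here). Hedged: some presentations count support as an absolute number of transactions rather than a fraction.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
X \text{ is large (frequent)} \iff \operatorname{supp}(X) \ge \varepsilon
```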

Programming code
You can edit or download sample code for this algorithm at:

1. Data Mining C# simple Apriori code by thsot, in console mode
2. C# APRIORI ALGORITHM SOURCE CODE RELEASE - 2001 VERSION FOR MARKET BASKET ANALYSIS. Copyright by http://www.kdkeys.net/ Writer: thepbac-Vietnamese —Preceding unsigned comment added by Thepbac (talk • contribs) 04:53, 19 February 2009 (UTC)


 * Can someone write example code in any programming language and post it in this article? 64.128.27.82 (talk) —Preceding undated comment added 16:21, 2 November 2012 (UTC)
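In response to the request above, here is a compact Python sketch of the whole loop. Assumptions: transactions are iterables of hashable items, `min_support` is an absolute transaction count, and the join is a simple union-based variant followed by the subset prune. It scans the data once per level but skips optimizations such as hash trees for candidate counting, so treat it as an illustration rather than a reference implementation.

```python
from collections import Counter
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: count} for every itemset contained in at least
    min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent (large) ones.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): n for i, n in item_counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates in one pass over the transactions.
        cand_counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in cand_counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical data for illustration.
result = apriori([[1, 2, 3], [1, 2], [2, 3], [1, 3]], min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# [1] 3
# [2] 3
# [3] 3
# [1, 2] 2
# [1, 3] 2
# [2, 3] 2
```

The candidate {1, 2, 3} survives pruning but occurs in only one transaction, so it is not reported and the loop stops.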

Large 1-itemsets?
What does this even mean? — Preceding unsigned comment added by 83.153.126.238 (talk) 12:22, 18 June 2016 (UTC)
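A possible answer: in the terminology of the original Apriori paper (Agrawal and Srikant, 1994), a "large" itemset is what is now usually called a frequent itemset, one whose support meets the minimum support threshold, and a "1-itemset" is an itemset with exactly one item. So the large 1-itemsets (L1) are the individual items that appear in at least the minimum number of transactions. The sketch below computes L1 with an absolute-count threshold; the function name and the basket data are hypothetical.

```python
from collections import Counter

def large_1_itemsets(transactions, min_support):
    """L1: single items that occur in at least min_support transactions."""
    # set(t) so an item repeated within one transaction is counted once.
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n for item, n in counts.items() if n >= min_support}

baskets = [["bread", "milk"], ["bread", "eggs"], ["milk"], ["bread", "milk"]]
print(large_1_itemsets(baskets, 3))
# bread and milk each appear in 3 baskets; eggs appears in only 1
```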

Proposed Deletion (PROD)
This article doesn't feel encyclopedic. The Association Rules article seems to cover this topic sufficiently well. It appears that this article hasn't had any appreciable attention since RichardWeiss's refimprove from several years ago, to say nothing of the previous decade of obviously confused comments on this talk page.

PERSONAL USER EXPERIENCE: Despite having used the algorithm several times, I find this article more confusing than helpful. Especially given the refimprove, the Association Rules article seems sufficient.

As such, I'm replacing refimprove with a PROD. Happy to be overruled though! Ipsherman (talk) 04:01, 13 February 2022 (UTC)

Example 2 - Mistake?
Either I'm missing something or there's a problem in Example 2. It says that both {1,3} and {1,4} are infrequent, and therefore they should be pruned. Both of those pairs appear in the first set {1,2,3,4}, so, as I understand the text, that set should be pruned. If it is pruned, though, then only one instance of the triple {2,3,4} remains, not two. If I am wrong, then I was misled by the explanation and we need to fix that. But if I'm right, then we need to fix the example. --Eliyahu S Talk 01:12, 27 November 2023 (UTC)
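A possible resolution (hedged; the article's Example 2 still needs to be checked directly): in Apriori, pruning removes candidate itemsets, not transactions. Infrequent pairs such as {1,3} and {1,4} rule out candidate triples that contain them, but when the support of a surviving candidate like {2,3,4} is counted, every transaction is still scanned, including {1,2,3,4}. The sketch below uses a hypothetical transaction list and a hypothetical minimum support of 3 to illustrate; `support_count` is an illustrative helper.

```python
# Hypothetical transactions; minimum support assumed to be 3.
transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]

def support_count(itemset, transactions):
    """Number of transactions that are supersets of itemset."""
    return sum(1 for t in transactions if itemset <= t)

# {1,3} (count 1) and {1,4} (count 2) are below 3, so candidates such as
# {1,3,4} are pruned without ever being counted ...
print(support_count({1, 3}, transactions))     # 1
print(support_count({1, 4}, transactions))     # 2
# ... but no transaction is removed, so {1,2,3,4} still counts toward {2,3,4}.
print(support_count({2, 3, 4}, transactions))  # 2
```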