Wikipedia:Reference desk/Archives/Mathematics/2007 April 24

= April 24 =

PrefixSpan Algorithm
Hello, I have a question regarding the PrefixSpan algorithm. The algorithm is described in the paper "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach" by Jian Pei, Jiawei Han, Behzad Mortazavi-Asl and Helen Pinto.

My problem is with this sentence: "To avoid checking every possible combination of a potential candidate sequence, one can first fix the order of items within each element. Since items within an element of a sequence can be listed in any order, without loss of generality, one can assume that they are always listed alphabetically."

Furthermore, the definition of a prefix says, among other things, that a sequence A is a prefix of B only if every item in the last set of items of A comes alphabetically before every item of the corresponding set of items in B that is missing from A. The authors then define the suffix of a sequence with respect to a prefix, and the projection of a sequence database with respect to A as the set of suffixes obtained with A as prefix.

I don't understand how the assumption of alphabetical ordering does not result in loss of generality. For example, given a database like:

a(abc)bd

a(ac)bdef

A frequent pattern with minimum support count 2 would be a(ac)bd (i.e. item a, followed by items a and c together, then item b, then item d). If we follow the assumption of lexicographical ordering within itemsets, won't we fail to find this pattern, since a(ac) is not a prefix of both sequences? Or am I misreading something in the paper?

Kind regards and thanks for your time.

62.48.159.19 09:27, 24 April 2007 (UTC)
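
 * A minimal Python sketch of the containment test behind the support count in the example above. It assumes the usual definition, in which each element of the pattern must be a subset of a distinct, later element of the sequence. The helper names `parse`, `contains` and `support` are illustrative and do not come from the paper, and this is only the support check, not PrefixSpan itself.

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]


def parse(text: str) -> Sequence:
    """Parse notation like 'a(abc)bd' into a list of itemsets."""
    seq: Sequence = []
    i = 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)          # matching close of this element
            seq.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            seq.append(frozenset(text[i]))  # single-item element
            i += 1
    return seq


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if every pattern element is a subset of a distinct element
    of the sequence, in the same order (greedy earliest match)."""
    pos = 0
    for element in sequence:
        if pos < len(pattern) and pattern[pos] <= element:
            pos += 1
    return pos == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(s, pattern) for s in database)


database = [parse("a(abc)bd"), parse("a(ac)bdef")]
pattern = parse("a(ac)bd")
print(support(database, pattern))  # prints 2
```

Because elements are compared as sets, the order in which items are written inside an element does not change the support count; the alphabetical convention only fixes how each element is written down.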

Correlated random walks
Could you please include an article on correlated random walks, covering their definition, history and applications to date?


 * Are you talking about random walks that are correlated with each other somehow, or ones that are correlated with themselves (aka autocorrelated)? If the second, then it's not a fantastic article at the moment but I suspect you'd be looking at something like an ARIMA model with appropriate parameters. For history, I should have some good references about me somewhere but I can't find them, but for applications you'd be looking at things like X-12-ARIMA and similar packages which are used for modelling a large number of time series that have some kind of autocorrelation structure, and which are in use at many statistical agencies around the world (including, I would guess, many of those listed in Category:National statistical services). Confusing Manifestation 23:36, 25 April 2007 (UTC)

Non-random number generator
Someone who wishes to remain anonymous has a Klondike solitaire card game on their computer. Probability with random card ordering would predict that each ace would appear first, second, third or fourth with equal frequency, yet their observations (made over hundreds of games) show that the ace of hearts is the last of the four aces to appear in 75-80% of games. This includes a sequence of 8 consecutive games where the ace of hearts was the last ace. Why might this deviation from probability occur? What (if it was random) is the probability of the 8 consecutive games occurring? Can the error be corrected? Seans Potato Business 17:11, 24 April 2007 (UTC)


 * The probability of 8 games in a row is $$(1/4)^8 = 1/65536$$. However, if you ran many series of 8 games, then the probability that one of those series would have that pattern is then much higher.  I'd say the program used a rather poor method for selecting pseudo-random numbers.  I doubt if it can be fixed without rewriting the program.  There is something you can sometimes do called "setting the random number seed", but that would likely need to be done in the program.  You might want to post this on the Computer Ref Desk, too. StuRat 17:27, 24 April 2007 (UTC)
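
 * The two figures in the reply above can be checked with a short Python sketch. It assumes independent games in which the ace of hearts is last with probability 1/4, and tracks the current streak length to get the chance of at least one run of 8 somewhere in a longer session. The function name `prob_run` and the session lengths are illustrative assumptions.

```python
def prob_run(n: int, p: float, r: int) -> float:
    """Probability of at least one run of r consecutive successes
    in n independent trials, each succeeding with probability p."""
    state = [0.0] * r   # state[s] = P(current streak is s, no run of r yet)
    state[0] = 1.0
    hit = 0.0           # probability that a run of r has already occurred
    for _ in range(n):
        new = [0.0] * r
        for s, mass in enumerate(state):
            new[0] += mass * (1 - p)      # streak broken
            if s + 1 == r:
                hit += mass * p           # streak reaches r: absorbed
            else:
                new[s + 1] += mass * p    # streak extended
        state = new
    return hit


print(prob_run(8, 0.25, 8))        # a single series of 8 games: (1/4)**8
for games in (100, 300, 1000):     # longer sessions, same 1/4 chance per game
    print(games, prob_run(games, 0.25, 8))
```

With exactly 8 games this reduces to (1/4)^8; the longer sessions show how much the chance grows when there are many places for such a run to start.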


 * The chance that you'd observe more than 40 games out of 100 with the ace of hearts (AH) as the last of the four is about 1 in a thousand. It becomes about 1 in a million for over 47. For 75, it is virtually zero. (See Binomial distribution.) Hence the random number generator of your Klondike game is definitely biased, assuming your observations are correct. --Hirak 99 17:45, 24 April 2007 (UTC)
 * Just a quick observation, but you probably want to look at the chance of any specific ace being the last ace in 75% of the games. The reason is that you have to consider that if you were paying attention to the order aces came out you probably also would be suspicious if you noticed the Ace of Spades being last most of the time, or the Ace of Clubs or Ace of Diamonds.  Therefore there are actually four possible scenarios that would have set off a red flag in your head, and you just happened to witness one of those four.  The question then is what are the odds of a randomized deck delivering one of the four scenarios you would notice (which is I believe four times the probability of the Ace of Hearts specific scenario).  That probability is still really, really low, but technically not quite as low as the probabilities mentioned previously. Dugwiki 23:04, 24 April 2007 (UTC)
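
 * The binomial tail probabilities discussed in the two replies above can be computed exactly with a few lines of Python. This is a sketch that assumes 100 independent games with a 1/4 chance per game that a given ace is last; the function name `tail_above` is an illustrative choice.

```python
from math import comb


def tail_above(n: int, k: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))


for k in (40, 47, 74):  # "more than 40", "more than 47", "75 or more"
    one_ace = tail_above(100, k, 0.25)
    # Multiplying by four is an upper bound on "some ace is last this often";
    # it is exact once k >= 50, because two different aces cannot both be
    # last in more than half of the games.
    any_ace = 4 * one_ace
    print(k, one_ace, any_ace)
```

Even after allowing for any of the four aces, the result for 75 or more games out of 100 stays vanishingly small.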


 * If they wish to remain anonymous, they should make sure the random number generator in their crypto software is better than the one in their card games. :) --TotoBaggins 17:09, 25 April 2007 (UTC)


 * See clustering illusion: you may be imagining a pattern where none exists. --h2g2bob 03:50, 26 April 2007 (UTC)