Talk:Isotonic regression

Pool-Adjacent Violators Algorithm

 * I've created this here since I'm not a user. I apologize but I do not recall my sources. I found something on Google (I think I searched "pool-adjacent violators" and it was the first hit), and borrowed it for my purposes. I suppose I could claim the exact syntax of the psuedocode is somewhat original, although it is based on something I read, so that some symbol conventions are clearly borrowed. I'll let people who care about this stuff fix it up. All I personally care about is the math itself, not the people behind it. Math transcends people. I think it is silly for people to claim to "own" math. Still, I am very grateful for all who came before me and all deserve credit, but I would rather fill my brain with ideas, rather than record who created them (waste of disk space).

Suppose we have a set of $$ n\ $$ numbers $$ X_1, ... , X_n\ $$ with weights $$ W_1 , ... , W_n\ $$. The goal of the algorithm is to optimally shift these X-values so that the result, which we'll call $$ Y_1\, ... , Y_n\ $$ , are monotone increasing; that is, $$ Y_1 \le Y_2 \le ... \le Y_n $$. The shifting takes into account the weights, so that values with larger weights are shifted less than values with smaller weights. It is optimal since it can be shown (I don't know how) that the weighted mean squared deviation of the X's from the Y's is minimimzed.

We start with the first value and proceed right until we reach the first pair of values that are not in order. Suppose we call these $$ X_i, X_{i+1}\ $$. Note that $$ X_i > X_{i+1}\ $$. Next, we "pool" these values and replace both with the weighted mean (call it $$ M\ $$ ) of the two values. Then, we proceed backward one value at a time, and check to see if anything that was in order before, has now become out of order. Suppose we reach a $$ j\ $$ where $$ X_j > M \ $$. Then we pool again, yet this time we pool all the values $$ X_j, ... , X_{i+1}\ $$ and replace them with their weighted mean (also, we redefine $$ M\ $$ to be this new mean). Eventually this backtracking will terminate, and then we can jump back to position $$ i+1\ $$, and proceed as we did initially. This back and forth process will terminate, leaving a monotone sequence.


 * A naive analysis of this algorithm is that it is $$O(n^2)$$. Is there a sharper analysis that shows it to be $$O(n)$$ as claimed in the main article? Eclecticos (talk) 17:17, 28 March 2021 (UTC)

Here is pseudocode

( Caution, it is UNTESTED!! )

For i = 1 to n Do    Y(i) = X(i) End For i = 1 to n-1 Do   If Y(i) > Y(i+1) Then Do     TotY = W(i) * Y(i)  +  W(i+1) * Y(i+1) TotW = W(i)        +  W(i+1) MeanY = TotY / TotW j = i - 1 While ( j > 0 and Y(j) > MeanY ) Do       TotY = TotY  +  W(j) * Y(j) TotW = TotW +  W(j) MeanY = TotY / TotW j = j - 1 End j = j + 1 Do k = j to i+1 Y(k) = MeanY End End End