Wikipedia:Reference desk/Archives/Mathematics/2020 May 8

= May 8 =

Secretary problem variations with costs
I was reading up on the secretary problem and thinking about how, in real life, there are costs (time, space, travel, etc.) incurred with each additional interview, which led me to the following variations of the problem: Consider a machine that has a cost each time it is run. Each time this machine is run, it generates a random integer between 1 and a known maximum M, inclusive, and then asks the user whether the user wants to accept that random value. If the user does not accept the value, then nothing happens. However, if the user accepts the value, then that machine gives that random integer as a payout and then self-destructs. What is the optimal strategy to maximize payout...:

(1) ...if the cost is a constant C each time the machine is run?

(2) ...if the cost is linearly increasing, starting at $1? That is, the machine costs $1 the first time it is run, $2 the second time it is run, $3 the third time it is run, and so forth.

(3) ...if the cost is a general function c(t) of the number of times t that the machine is run? (The previous two questions would then just be the c(t) = C and c(t) = t cases of this problem.)

—SeekingAnswers (reply) 20:24, 8 May 2020 (UTC)
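For concreteness, the game can be sketched as a short Monte Carlo simulation (Python; the function names and the cutoff of 6 are illustrative choices, not part of the question):

```python
import random

def play(M, cost_fn, cutoff, rng):
    """Play one game: in round t = 1, 2, ... pay cost_fn(t), draw a
    uniform integer from 1..M, and accept as soon as the draw reaches
    the cutoff.  Returns the payout minus the total cost paid."""
    total_cost = 0
    t = 1
    while True:
        total_cost += cost_fn(t)
        draw = rng.randint(1, M)
        if draw >= cutoff:
            return draw - total_cost
        t += 1

# Example: maximum M = 10, constant cost 1 per run, arbitrary cutoff 6
rng = random.Random(0)
mean = sum(play(10, lambda t: 1, 6, rng) for _ in range(100_000)) / 100_000
print(round(mean, 2))   # sample mean net gain for this cutoff
```

The sample mean estimates the expected net gain of a fixed-cutoff strategy; the analysis below derives the optimal cutoff exactly.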


 * I assume that you mean, maximize the expected value of payout minus cumulative cost. Any strategy for maximizing that will be an optimal strategy. For case (1) I got a result which I need to verify, but that will have to wait till another day. Obviously, if C ≥ M, the player should always accept in the very first round, regardless of the number drawn. They cannot gain but only lose by playing further. --Lambiam 23:21, 8 May 2020 (UTC)


 * Ah, yes, I meant maximize payout less the costs. Also, to clarify, one is allowed to not play. So for case (1), if C ≥ M, then not playing at all would be optimal in that case. —SeekingAnswers (reply) 02:45, 9 May 2020 (UTC)


 * In considering this, it is easy to succumb to the sunk cost fallacy, in which the cost that has already been incurred is weighed in decisions about future actions. This fallacy can work in two directions: a suboptimal decision to stay the course because otherwise the past cost will have been in vain, or a suboptimal decision to stop early because the cost incurred already exceeds possible future gain, although the expected value of future gain is positive so that at least some of the prior cost can be recouped. (I model a loss as a negative gain.) The decision whether to accept or continue should disregard the cost of earlier rounds. This applies to all versions of the game.


 * A strategy should tell the player at each round whether to accept the payout $$P$$ offered by the machine, or to reject it and continue. If payout $$P$$ is acceptable, then so is obviously any payout $$P^\prime$$ with $$P^\prime > P$$. Let $$A$$ stand for the lowest acceptable payout. So the strategy will be: accept when $$P \ge A$$, otherwise reject and continue.


 * In version (1), the cost for each round is constant, so the value of $$A$$ is the same for each round, depending only on the values of $$C$$ and $$M$$. Note that in versions with a constant maximal payout, including this version, $$A > M$$ does not make sense – here it would mean that the player always keeps playing, only incurring cost, never receiving a payout. So we know that $$1 \le A \le M$$. We also assume that $$M > 1$$; otherwise we already know that $$A = 1$$. Let $$G(A)$$ stand for the expected gain, given the $$A$$-based strategy. If the machine offers a payout $$P$$ such that $$P < A$$, the player takes the loss $$C$$ of this round but continues to gain $$G(A)$$ in future rounds. The probability of rejection (assuming the machine is fair) equals $$\tfrac{A{-}1}{M}$$, the fraction of payouts to be rejected. Then with probability $$1-\tfrac{A{-}1}{M}$$ the payout offered will be acceptable. Each of the values $$A, A+1,...,M$$ being equally likely, the expected value of $$P$$ is then $$\tfrac{1}{2}(A+M)$$, from which $$C$$ is to be subtracted if we want to compute its contribution to $$G(A)$$. Combining this, we have:
 * $$G(A) = \tfrac{A{-}1}{M}(G(A)-C) + (1-\tfrac{A{-}1}{M})(\tfrac{1}{2}(A+M)-C).$$
 * Solving this equation for $$G(A)$$ results in:
 * $$G(A)= \frac{M^2 - (2C - 1) M - A(A {-} 1)}{2(M-A+1)}.$$
 * We need to determine what value of $$X$$ maximizes $$G(X)$$. If $$X$$ were a continuous quantity, we could just solve $$\frac{dG(X)}{dX}=0$$ for $$X$$, picking the appropriate root. In this discrete case, we reason as follows. If $$X < M$$ and $$G(X) < G(X{+}1)$$, then $$X$$ is unacceptable, since the player stands to gain more by using $$X{+}1$$ instead. So the acceptable payouts are characterized by $$X = M$$ or $$G(X) \ge G(X{+}1)$$. We need to find the least value of $$X$$ satisfying the inequality. After simplification, the numerator of $$G(X) - G(X{+}1)$$ is
 * $$X^2-(2M+1)X+M(M-2C+1) = (X-(M+\tfrac{1}{2}-R))(X-(M+\tfrac{1}{2}+R)),$$
 * where
 * $$R=\sqrt{2CM+\tfrac{1}{4}}.$$
 * The difference $$G(X) - G(X{+}1)$$ is nonnegative when $$X \ge M+\tfrac{1}{2}-R$$, so, since $$A$$ is a whole number, we find that the optimal strategy is given by the least integer $$X$$ in the range from $$1$$ to $$M$$ satisfying this inequality, which is:
 * $$A= \lceil M+\tfrac{1}{2}-R \rceil,$$
 * where $$\lceil\cdot\rceil$$ denotes the ceiling function. (When $$M+\tfrac{1}{2}-R$$ happens to be a whole number, it does not matter whether we choose $$A$$ to be equal to this number, as in the formula with the ceiling function, or its successor; both result in the same optimal value for $$G(A)$$. The lower choice has the advantage, only expressed implicitly in the mathematical model, that the player can go home sooner.) If the formula for $$A$$ results in a value less than $$1$$, use $$A=1$$ instead.  --Lambiam 17:59, 9 May 2020 (UTC)
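The closed form for version (1) is easy to put in code; a minimal Python sketch (function names are mine), with a brute-force check over all thresholds $$1,...,M$$:

```python
import math

def threshold(M, C):
    """Least acceptable payout A = ceil(M + 1/2 - R), R = sqrt(2CM + 1/4),
    clamped to at least 1, per the closed-form formula above."""
    R = math.sqrt(2 * C * M + 0.25)
    return max(1, math.ceil(M + 0.5 - R))

def expected_gain(M, C, A):
    """G(A) = (M^2 - (2C-1)M - A(A-1)) / (2(M - A + 1))."""
    return (M * M - (2 * C - 1) * M - A * (A - 1)) / (2 * (M - A + 1))

M, C = 10, 1
A = threshold(M, C)
# Brute-force check: the formula's A should maximize G over 1..M
best = max(range(1, M + 1), key=lambda X: expected_gain(M, C, X))
print(A, expected_gain(M, C, A), best)  # A and best agree (both 6 here)
```

For $$M = 10$$, $$C = 1$$ the boundary $$M+\tfrac{1}{2}-R$$ is exactly $$6$$, so thresholds $$6$$ and $$7$$ yield the same expected gain, illustrating the tie noted above.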


 * For variant (2), linearly increasing cost, both the least acceptable payout in a round and the corresponding expected gain function depend on the value $$c$$ of the cost for that round. We incorporate this into the model by adding subscripts $$c$$ to $$A$$ and $$G(\cdot)$$, so the strategy in the round with cost $$c$$ is to accept when the payout is at least $$A_c$$. We abbreviate $$G_c(A_c)$$, the expected gain under the optimal strategy, by $$G_c$$. As before, when $$c \ge M$$, we have $$A_c = 1$$, so, in particular, $$A_M = 1$$, and $$G_M = G_M(1) = \tfrac{1}{2}(M+1)-M = -\tfrac{1}{2}(M-1)$$. For $$c < M$$, we have, using the same line of reasoning as before,
 * $$G_c(X) = \tfrac{X{-}1}{M}G_{c{+}1} + (1-\tfrac{X{-}1}{M})(\tfrac{1}{2}(X+M))-c.$$
 * Then the numerator of $$G_c(X) - G_c(X{+}1)$$ is
 * $$X-G_{c+1}.$$
 * The difference is nonnegative when $$X \ge G_{c+1}$$, so the least acceptable payout is now given by:
 * $$A_c= \lceil G_{c{+}1} \rceil,$$
 * with a lower bound of $$A_c= 1$$, as before, and
 * $$G_c = G_c(A_c).$$
 * This allows us to calculate $$A_c$$ and $$G_c$$ backwards for $$c = M, M{-}1, M{-}2,..., 1$$. Because of the ceiling function, there is no pretty closed formula. Here are the computed values for $$M = 10$$:

  c   A_c    G_c
 10    1   -4.500
  9    1   -3.500
  8    1   -2.500
  7    1   -1.500
  6    1   -0.500
  5    1    0.500
  4    1    1.500
  3    2    2.550
  2    3    3.710
  1    4    5.013
 * --Lambiam 18:49, 11 May 2020 (UTC)
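The backward recursion for variant (2) can be sketched in Python (names are mine); it reproduces the table above for $$M = 10$$:

```python
import math

def variant2_table(M):
    """Backward recursion A_c = max(1, ceil(G_{c+1})), G_c = G_c(A_c)
    for linearly increasing cost, computed for c = M down to 1."""
    def G(X, G_next, c):
        p = (X - 1) / M                      # probability of rejecting the offer
        return p * G_next + (1 - p) * 0.5 * (X + M) - c

    rows, G_next = [], None
    for c in range(M, 0, -1):
        if c == M:
            A = 1                            # accept anything (see above)
            g = 0.5 * (M + 1) - M            # = -(M-1)/2
        else:
            A = max(1, math.ceil(G_next))
            g = G(A, G_next, c)
        rows.append((c, A, g))
        G_next = g
    return rows

for c, A, g in variant2_table(10):
    print(f"{c:2d} {A:2d} {g:8.3f}")
```

The clamp to $$1$$ handles the rounds where $$G_{c+1}$$ is negative or less than one, matching the lower bound mentioned above.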


 * For the third variant, let the round-dependent cost be given as a sequence $$c_0, c_1, c_2,...$$ of positive integers. The strategy will be determined by a corresponding sequence $$A_0, A_1, A_2,...$$ of integers in the range $$1$$ to $$M$$, denoting the least acceptable payout in each round. The sequence of functions $$G_{0}(\cdot), G_{1}(\cdot), G_{2}(\cdot),...$$ denotes the expected gain given a proposed acceptance threshold, assuming all later rounds will be played with an optimal strategy. We abbreviate $$G_i(A_i)$$ by $$G_i$$. Completely analogous to before,
 * $$G_i(X) = \tfrac{X{-}1}{M}G_{i{+}1} + (1-\tfrac{X{-}1}{M})(\tfrac{1}{2}(X+M))-c_i,$$
 * and
 * $$A_i= \lceil G_{i{+}1} \rceil,$$
 * again with a minimum of $$1$$.
 * If, for some round $$i$$, we have $$c_i = M$$ and the cost never drops below $$M$$ in later rounds (as with nondecreasing costs), we know (as above) that $$A_i=1$$ and $$G_i=-\tfrac{1}{2}(M-1)$$. Then we can successively compute backwards as before for rounds $$i{-}1, i{-}2,..., 0$$.
 * If the costs remain below the maximum, pick some large index $$h$$ ($$h$$ for horizon). We know bounds on $$A_h$$ and $$G_h$$. For the $$A_h$$:
 * $$1 = A_h^{\mathrm{lo}} \le A_h \le A_h^{\mathrm{hi}} = \lceil M+\tfrac{1}{2} - \sqrt{2M+\tfrac{1}{4}}\rceil,$$
 * in which the upper bound corresponds to the most favourable case for the player, namely $$c_i = 1$$ for all $$i \ge h$$. Then also
 * $$G_h(A_h^{\mathrm{lo}}) = G_h^{\mathrm{lo}} \le G_h \le G_h^{\mathrm{hi}} = G_h(A_h^{\mathrm{hi}}).$$
 * We can then compute lower and upper bounds backwards. With some luck, they will coincide after a number of steps. Since $$A_i^{\mathrm{lo}} \le A_i \le A_i^{\mathrm{hi}}$$, once the bounds coincide on an initial stretch of indices starting at $$0$$, we have the optimal strategy for the rounds up to but not including the earliest point of divergence. If the bounds have not yet coincided when index $$0$$ is reached, or a longer initial stretch is needed, repeat with a more distant horizon.  --Lambiam 19:04, 12 May 2020 (UTC)
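For the case where the costs reach the maximum and stay at or above it afterwards, the exact backward computation can be sketched in Python (names are mine; the doubling cost sequence $$1, 2, 4, 8, 16$$ is an illustrative example, and the formula $$G_n = \tfrac{1}{2}(M{+}1) - c_n$$ is my slight generalization of the $$c_i = M$$ case above to $$c_n \ge M$$):

```python
import math

def variant3(costs, M):
    """Thresholds A_i and gains G_i for a cost sequence that reaches the
    maximum M and stays at or above it afterwards (e.g. nondecreasing
    costs).  At the first round n with costs[n] >= M it is optimal to
    accept any offer, so G_n = (M+1)/2 - costs[n]; earlier rounds follow
    from the backward recursion A_i = max(1, ceil(G_{i+1}))."""
    n = next(i for i, c in enumerate(costs) if c >= M)
    A = [1] * (n + 1)
    G = [0.0] * (n + 1)
    G[n] = 0.5 * (M + 1) - costs[n]
    for i in range(n - 1, -1, -1):
        A[i] = max(1, math.ceil(G[i + 1]))
        p = (A[i] - 1) / M                   # probability of rejecting
        G[i] = p * G[i + 1] + (1 - p) * 0.5 * (A[i] + M) - costs[i]
    return A, G

A, G = variant3([1, 2, 4, 8, 16], 10)       # doubling costs, M = 10
print(A, [round(g, 3) for g in G])
```

With constant costs this recursion reduces to variant (1), and with $$c_i = i + 1$$ it reproduces the variant (2) table, so it can double as a check on the earlier computations.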