Atkinson–Stiglitz theorem

The Atkinson–Stiglitz theorem is a theorem of public economics. It implies that no indirect taxes need to be employed when the utility function is separable between labor and all commodities and the government can use non-linear income taxation. The theorem was established in a seminal article by Anthony Atkinson and Joseph Stiglitz in 1976. It is an important theoretical result in public economics and spawned a broad literature that delimited the conditions under which it holds. For example, Emmanuel Saez, a French-American economist, demonstrated that the theorem does not hold if households have heterogeneous rather than homogeneous preferences.

In practice, the Atkinson–Stiglitz theorem has often been invoked in the debate on optimal capital income taxation. Because capital income taxation can be interpreted as taxing future consumption more heavily than present consumption, the theorem implies that governments should refrain from taxing capital income when non-linear income taxation is available: capital income taxation would not improve equity relative to the non-linear income tax, while additionally distorting savings.

Optimal taxation
For an individual whose wage is $$ w $$, the budget constraint is:
 * $$ \sum_{j} q_{j} x_{j} = \sum_{j}  (x_{j} + t_{j} (x_{j} ) )  = wL - T(wL)  \;,  $$

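where $$ x_{j} $$ is the purchase of the $$ j $$-th commodity and $$ q_{j} $$ its consumer price, $$ t_{j}(x_{j}) $$ is the tax paid on purchases of commodity $$ j $$ (producer prices are normalized to one), $$ L $$ is labor supply, and $$ T $$ is the income tax schedule. Writing $$ U_{j} $$ and $$ U_{L} $$ for the partial derivatives of the utility function with respect to $$ x_{j} $$ and $$ L $$, the first-order conditions for utility maximization are: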
 * $$ U_{j} = \frac{ (1 + t'_{j} ) (-  U_{L} )   }{w (1 - T')}  \;   (j=1,2, ..., N).  $$
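As a minimal numerical sketch, the first-order condition can be checked for a hypothetical utility function $$ U = \sum_{j} a_{j} \ln x_{j} - L^{2}/2 $$ with linear commodity taxes $$ t_{j}(x_{j}) = \tau_{j} x_{j} $$ and a linear income tax $$ T(wL) = \tau wL $$. These functional forms and the parameter names `a`, `tau` and `tau_inc` are assumptions chosen only because the consumer's problem then has a closed-form solution. The function returns $$ U_{j} - (1 + t'_{j})(-U_{L})/(w(1 - T')) $$ for each commodity, which should vanish at the optimum.

```python
import math

# Hypothetical example (assumed, not from the source):
#   U(x, L) = sum_j a_j * ln(x_j) - L**2 / 2,
#   linear commodity taxes t_j(x_j) = tau_j * x_j  (so t'_j = tau_j),
#   linear income tax T(wL) = tau_inc * w * L       (so T' = tau_inc).
# Producer prices are one, so the consumer price of good j is 1 + tau_j.

def consumer_optimum(a, tau, tau_inc, w):
    """Closed-form utility-maximizing purchases x and labor supply L."""
    total = sum(a)
    # labor FOC after substituting the optimal purchases: total/L - L = 0
    labor = math.sqrt(total)
    net_income = w * (1 - tau_inc) * labor  # wL - T(wL)
    x = [a_j * net_income / ((1 + t_j) * total) for a_j, t_j in zip(a, tau)]
    return x, labor

def foc_residuals(a, tau, tau_inc, w):
    """U_j - (1 + t'_j)(-U_L) / (w (1 - T')) for every commodity j."""
    x, labor = consumer_optimum(a, tau, tau_inc, w)
    u_l = -labor  # derivative of the -L**2/2 term
    return [a_j / x_j - (1 + t_j) * (-u_l) / (w * (1 - tau_inc))
            for a_j, x_j, t_j in zip(a, x, tau)]

res = foc_residuals(a=[0.5, 1.5], tau=[0.1, 0.25], tau_inc=0.3, w=2.0)
print(res)  # each residual is zero up to floating-point rounding
```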

The government maximizes social welfare, the integral of $$ G(U) $$ over the distribution $$ F $$ of wages, subject to the revenue constraint:
 * $$ \int^{\infty }_{0 } \left[ wL - \sum_{j } x_{j} -  \overline{R}    \right] d F = 0 \; .  $$

Writing $$ f $$ for the density of $$ F $$, the Hamiltonian is:
 * $$ H = \left[ G(U) - \lambda \left\lbrace wL - \sum_{j } x_{j} - \overline{R}  \right\rbrace  \right]f - \mu \theta U_{L}  \; .    $$

Varying $$ x_{j} $$ while adjusting $$ x_{1} $$ so as to keep utility $$ U $$ constant, the condition for a maximum is:
 * $$ - \lambda \left[ \left( \frac{\partial x_{1} }{\partial x_{j } }  \right)_{U} + 1  \right] - \frac{\mu \theta }{f} \left[ \frac{\partial^{2} U}{\partial x_{1} \partial L} \left( \frac{\partial x_{1} }{\partial x_{j} } \right)_{U }  + \frac{\partial^{2} U}{\partial x_{j} \partial L} \right] = 0  \; .  $$

From the consumer's first-order conditions, the rate at which $$ x_{1} $$ must be adjusted to keep utility constant is:
 * $$ \left( \frac{\partial x_{1} }{\partial x_{j} } \right)_{U} =  - \frac{U_{j} }{ U_{1} } =  - \frac{1 + t'_{j} }{ 1 + t'_{1} } \; .  $$

Substituting this relation into the above condition yields:
 * $$ \lambda \left[ \frac{1 + t'_{j}}{1 + t'_{1}} - 1 \right] = \frac{\mu \theta U_{j}}{f} \left[ \frac{\partial^{2} U}{\partial L \partial x_{j}} \cdot \frac{1}{U_{j}} - \frac{\partial^{2} U}{\partial L \partial x_{1}} \cdot \frac{1}{U_{1}} \right] = \frac{\mu \theta U_{j}}{f} \frac{\partial}{\partial L} \left( \ln{U_{j}} - \ln{U_{1}} \right) \;, $$

that is:
 * $$ \lambda \left[ \frac{1 + t'_{j}}{1 + t'_{1}} - 1 \right] = \frac{\mu \theta U_{j}}{f} \frac{\partial}{\partial L} \left( \ln{\frac{U_{j}}{U_{1}}} \right) \; . $$

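There is no loss of generality in normalizing $$ t'_{1} = 0 $$, since the conditions above determine only the ratios $$ (1 + t'_{j})/(1 + t'_{1}) $$. Writing $$ \alpha = -U_{L}/(w(1 - T')) $$, the consumer's first-order condition gives $$ U_{j} = (1 + t'_{j}) \alpha $$, and therefore: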
 * $$ \frac{t'_{j } }{1 + t'_{j} } = \frac{\mu \theta \alpha }{\lambda f }  \frac{\partial }{\partial L} \left( \ln{ \frac{U_{j} }{U_{1}}  }  \right)  \; . $$

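If the utility function is weakly separable between labor and all consumption goods, that is, $$ U = U(u(x_{1}, \ldots, x_{N}), L) $$ for some sub-utility function $$ u $$, then $$ U_{j}/U_{1} = u_{j}/u_{1} $$ does not depend on $$ L $$, the right-hand side vanishes, and $$ t'_{j} = 0 $$ for every $$ j $$. Thus no indirect taxation needs to be employed: the non-linear income tax alone is sufficient.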
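In the formula above, the right-hand side is proportional to $$ \partial \ln(U_{j}/U_{1}) / \partial L $$. The sketch below estimates that derivative by central differences for two hypothetical utility functions (assumptions, not from the source): it is exactly zero for a separable utility, so $$ t'_{j} = 0 $$, but nonzero when labor changes the marginal rate of substitution between goods, in which case the formula generally implies a nonzero $$ t'_{j} $$.

```python
import math

# Two hypothetical utility functions U(x1, x2, L), represented by their
# marginal utilities (U_1, U_2) with respect to x1 and x2.

def marginal_utils_separable(x1, x2, L):
    # U = ln x1 + ln x2 - L**2 / 2 is weakly separable between goods and labor
    return 1 / x1, 1 / x2

def marginal_utils_nonseparable(x1, x2, L):
    # U = ln x1 + (1 + L) ln x2 - L**2 / 2: labor raises the value of x2
    return 1 / x1, (1 + L) / x2

def d_log_ratio_dL(marginal_utils, x1, x2, L, eps=1e-6):
    """Central-difference estimate of d/dL ln(U_2 / U_1), goods held fixed."""
    def log_ratio(labor):
        u1, u2 = marginal_utils(x1, x2, labor)
        return math.log(u2 / u1)
    return (log_ratio(L + eps) - log_ratio(L - eps)) / (2 * eps)

print(d_log_ratio_dL(marginal_utils_separable, 1.0, 2.0, 0.5))     # 0.0
print(d_log_ratio_dL(marginal_utils_nonseparable, 1.0, 2.0, 0.5))  # about 2/3
```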

Other approaches
Joseph Stiglitz also explained why indirect taxation is unnecessary, approaching the Atkinson–Stiglitz theorem from a different perspective based on self-selection between two categories of individuals.

Basic concepts
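Suppose there are two categories of individuals and that those in category 2 are the more able. An individual of category $$ i $$ who consumes $$ C $$ and earns before-tax income $$ Y $$ obtains utility $$ V_{i}(C, Y) $$. The government aims at Pareto-efficient taxation, which imposes two conditions. The first is that the utility of category 1 is at least a given level $$ \overline{U}_{1} $$: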
 * $$ \overline{U}_{1} \le  V_{1} (C_{1}, Y_{1} ) \quad .    $$

The second is that the government revenue $$ R $$, total income minus total consumption, is at least the revenue requirement $$ \overline{R} $$:
 * $$ R = -(C_{1} - Y_{1}) N_{1} - (C_{2} - Y_{2} ) N_{2} \;, $$
 * $$ \overline{R} \le R \;, $$

where $$ N_{1} $$ and $$ N_{2} $$ are the numbers of individuals of each type. Subject to these conditions, the government maximizes the utility $$ V_{2}(C_{2}, Y_{2}) $$ of category 2. The Lagrangian for this problem is:
 * $$ \mathcal{L} = V_{2}(C_{2},Y_{2}) + \mu V_{1}(C_{1},Y_{1}) + \lambda_{2} \left( V_{2}(C_{2},Y_{2}) - V_{2}(C_{1},Y_{1}) \right) + \lambda_{1} \left( V_{1}(C_{1},Y_{1}) - V_{1}(C_{2},Y_{2}) \right) + \gamma \left( -(C_{1} - Y_{1}) N_{1} - (C_{2} - Y_{2}) N_{2} - \overline{R} \right) \;, $$

where the terms multiplied by $$ \lambda_{1} $$ and $$ \lambda_{2} $$ are the self-selection constraints, which ensure that each category weakly prefers its own bundle to the other's. The first-order conditions are:
 * $$ \mu \frac{\partial V_{1}}{\partial C_{1}} - \lambda_{2} \frac{\partial V_{2}}{\partial C_{1}} + \lambda_{1} \frac{\partial V_{1}}{\partial C_{1}} - \gamma N_{1} = 0 \;, $$
 * $$ \mu \frac{\partial V_{1}}{\partial Y_{1}} - \lambda_{2} \frac{\partial V_{2}}{\partial Y_{1}} + \lambda_{1} \frac{\partial V_{1}}{\partial Y_{1}} + \gamma N_{1} = 0 \;, $$
 * $$ \frac{\partial V_{2}}{\partial C_{2}} + \lambda_{2} \frac{\partial V_{2}}{\partial C_{2}} - \lambda_{1} \frac{\partial V_{1}}{\partial C_{2}} - \gamma N_{2} = 0 \;, $$
 * $$ \frac{\partial V_{2}}{\partial Y_{2}} + \lambda_{2} \frac{\partial V_{2}}{\partial Y_{2}} - \lambda_{1} \frac{\partial V_{1}}{\partial Y_{2}} + \gamma N_{2} = 0 \;. $$
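The constraints of this problem can be checked for a candidate allocation. The sketch below assumes a hypothetical utility $$ V_{k}(C, Y) = \ln C - (Y/w_{k})^{2}/2 $$ with productivities $$ w_{1} = 1 $$ and $$ w_{2} = 2 $$, one individual of each type, and illustrative values of $$ \overline{U}_{1} $$ and $$ \overline{R} $$; `check_allocation` is a hypothetical helper, not part of the source.

```python
import math

# Hypothetical primitives: V_k(C, Y) = ln C - (Y / w_k)**2 / 2, w_2 > w_1.
W = {1: 1.0, 2: 2.0}

def V(k, C, Y):
    return math.log(C) - 0.5 * (Y / W[k]) ** 2

def check_allocation(c1, y1, c2, y2, n1, n2, u1_bar, r_bar):
    """Report which constraints of the Pareto-efficient tax problem hold."""
    revenue = -(c1 - y1) * n1 - (c2 - y2) * n2
    return {
        "utility_floor": V(1, c1, y1) >= u1_bar,
        "revenue": revenue >= r_bar,
        # self-selection: each category weakly prefers its own bundle
        "self_selection_2": V(2, c2, y2) >= V(2, c1, y1),
        "self_selection_1": V(1, c1, y1) >= V(1, c2, y2),
    }

ok = check_allocation(1.0, 0.8, 1.6, 2.0, n1=1, n2=1, u1_bar=-0.5, r_bar=0.1)
print(ok)   # all four constraints hold

bad = check_allocation(1.0, 0.8, 1.1, 2.0, n1=1, n2=1, u1_bar=-0.5, r_bar=0.1)
print(bad)  # self_selection_2 fails: category 2 would mimic category 1
```

Raising category 2's consumption from 1.1 to 1.6 is what restores its self-selection constraint, at the cost of lower, though still sufficient, revenue.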

For the case where $$ \lambda_{1}=0 $$ and $$ \lambda_{2}=0 $$:
 * $$ \frac{\partial V_{i} / \partial Y_{i} }{\partial V_{i} / \partial C_{i}  } + 1 = 0 \;,  $$

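for $$ i=1,2 $$. With neither self-selection constraint binding, both categories face a zero marginal tax rate, and the government achieves the same outcome as with lump-sum taxation. For the case where $$ \lambda_{1}=0 $$ and $$ \lambda_{2}>0 $$: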
 * $$ \frac{\partial V_{2} / \partial Y_{2} }{\partial V_{2} / \partial C_{2}  } + 1 = 0 \;,  $$

so the marginal tax rate for category 2, the more able, is zero. For category 1:
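 * $$ \frac{\partial V_{1} / \partial Y_{1}}{\partial V_{1} / \partial C_{1}} = - \frac{1 - \lambda_{2} (\partial V_{2} / \partial Y_{1}) / (N_{1} \gamma)}{1 + \lambda_{2} (\partial V_{2} / \partial C_{1}) / (N_{1} \gamma)} \; . $$

Define $$ \delta_{i} = \frac{\partial V_{i} / \partial Y_{1}}{\partial V_{i} / \partial C_{1}} $$ for $$ i = 1, 2 $$, the slopes evaluated at the bundle of category 1. Since an individual facing the schedule $$ C = Y - T(Y) $$ chooses a point where the slope of the indifference curve, $$ -\delta_{1} $$, equals $$ 1 - T' $$, the marginal tax rate for category 1 is $$ \delta_{1} + 1 $$.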
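The expression for category 1 can be traced back to the first-order conditions. With $$ \lambda_{1} = 0 $$, the first two conditions give:
 * $$ \mu \frac{\partial V_{1}}{\partial C_{1}} = \gamma N_{1} + \lambda_{2} \frac{\partial V_{2}}{\partial C_{1}} \;, \qquad \mu \frac{\partial V_{1}}{\partial Y_{1}} = - \gamma N_{1} + \lambda_{2} \frac{\partial V_{2}}{\partial Y_{1}} \; . $$

Dividing the second condition by the first, and then dividing the numerator and denominator by $$ N_{1} \gamma $$, yields the expression for category 1 above.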

This condition can be rewritten as:
 * $$ \delta_{1} = - \left( \frac{1 - \nu \delta_{2} }{1 + \nu} \right)  \;,  $$

where $$ \nu $$ is defined by:
 * $$ \nu = \frac{ \lambda_{2} (\partial V_{2} / \partial C_{1}) }{ N_{1} \gamma }   \; . $$

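Because the more able individuals of category 2 have flatter indifference curves, $$ \delta_{1} < \delta_{2} $$ by assumption. Rearranging the expression for $$ \delta_{1} $$ gives $$ 1 + \delta_{1} = \nu (\delta_{2} - \delta_{1}) $$, and since $$ \nu > 0 $$ it follows directly that $$ -1 < \delta_{1} < \delta_{2} $$. Accordingly, the marginal tax rate for category 1 is positive.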
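The rearrangement can be checked numerically. This sketch evaluates $$ \delta_{1} = -(1 - \nu \delta_{2})/(1 + \nu) $$ on a small grid of assumed values with $$ \nu > 0 $$ and $$ \delta_{2} \in (-1, 0) $$ (in this formula, $$ \delta_{2} > -1 $$ is exactly what makes $$ \delta_{1} < \delta_{2} $$ hold) and confirms both the identity and the positive marginal tax rate $$ 1 + \delta_{1} $$.

```python
def delta_1(nu, delta_2):
    """delta_1 = -(1 - nu * delta_2) / (1 + nu), from the first-order conditions."""
    return -(1.0 - nu * delta_2) / (1.0 + nu)

for nu in (0.1, 1.0, 5.0):            # assumed multiplier ratios, nu > 0
    for delta_2 in (-0.9, -0.5, -0.2):  # assumed slopes for category 2
        d1 = delta_1(nu, delta_2)
        # identity: 1 + delta_1 = nu * (delta_2 - delta_1)
        assert abs((1 + d1) - nu * (delta_2 - d1)) < 1e-12
        # ordering, hence a positive marginal tax rate 1 + delta_1
        assert -1 < d1 < delta_2
        print(f"nu={nu}, delta_2={delta_2}: marginal tax rate {1 + d1:.3f}")
```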

For the case where $$ \lambda_{1}>0 $$ and $$ \lambda_{2} = 0 $$, the roles are reversed: category 1 faces a zero marginal tax rate, while the marginal tax rate for category 2 is negative. In this case, if lump-sum taxes were feasible, the lump-sum tax imposed on an individual of category 1 would be larger than that imposed on an individual of category 2.
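The three cases can be illustrated by mapping a bundle to the marginal tax rate that supports it, using $$ T' = 1 + V_{Y}/V_{C} $$. The utility $$ V_{k}(C, Y) = \ln C - (Y/w_{k})^{2}/2 $$, with $$ w_{k} $$ a productivity parameter, is a hypothetical choice not taken from the source; for it $$ V_{Y}/V_{C} = -CY/w_{k}^{2} $$, so bundles with $$ CY = w_{k}^{2} $$ carry a zero marginal rate.

```python
# Hypothetical utility V_k(C, Y) = ln C - (Y / w)**2 / 2 (assumed form).
def partials(C, Y, w):
    """Return (V_C, V_Y) for the hypothetical utility."""
    return 1.0 / C, -Y / w ** 2

def implicit_marginal_tax(C, Y, w):
    """T' = 1 + V_Y / V_C: the marginal tax rate supporting bundle (C, Y)."""
    V_C, V_Y = partials(C, Y, w)
    return 1.0 + V_Y / V_C

print(implicit_marginal_tax(1.0, 1.0, w=1.0))   # zero: C * Y = w**2
print(implicit_marginal_tax(2.0, 2.0, w=2.0))   # zero: C * Y = w**2
print(implicit_marginal_tax(1.0, 0.8, w=1.0))   # positive, about 0.2
print(implicit_marginal_tax(1.0, 1.25, w=1.0))  # negative, -0.25
```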

Various commodities
Suppose now that income and the purchases of several commodities are observable. The consumption of each category is then a vector:
 * $$ \textbf{C}_{1} = \sum_{j} C_{1j} \textbf{e}_{j}      $$
 * $$ \textbf{C}_{2} = \sum_{j} C_{2j} \textbf{e}_{j} \; . $$

where $$ \textbf{e}_{j} $$ is the unit vector for commodity $$ j $$. With producer prices normalized to one, the government's budget constraint is:
 * $$ R \leq \sum^{2}_{k=1} ( Y_{k} N_{k} )  -  N_{1} \sum_{j} C_{1j}  - N_{2} \sum_{j} C_{2j}   \; .  $$

The first-order conditions now hold for each commodity $$ j $$:
 * $$ \mu \frac{\partial V_{1}}{\partial C_{1j}} - \lambda_{2} \frac{\partial V_{2} }{\partial C_{1j} }  + \lambda_{1} \frac{\partial V_{1} }{\partial C_{1j} }  -  \gamma N_{1} = 0 \;, $$
 * $$ \mu \frac{\partial V_{1}}{\partial Y_{1}} - \lambda_{2} \frac{\partial V_{2} }{\partial Y_{1} }  + \lambda_{1} \frac{\partial V_{1} }{\partial Y_{1} }  +  \gamma N_{1} = 0 \;, $$
 * $$ \frac{\partial V_{2}}{\partial C_{2j}} + \lambda_{2} \frac{\partial V_{2}  }{\partial C_{2j} }  - \lambda_{1} \frac{\partial V_{1} }{\partial C_{2j} } -  \gamma N_{2} = 0 \;, $$
 * $$ \frac{\partial V_{2}}{\partial Y_{2}} + \lambda_{2} \frac{\partial V_{2}}{\partial Y_{2}} - \lambda_{1} \frac{\partial V_{1}}{\partial Y_{2}} + \gamma N_{2} = 0 \;. $$

Consider again the case $$ \lambda_{1}=0 $$ and $$ \lambda_{2} > 0 $$. The conditions for category 2 then imply, for any two commodities $$ j $$ and $$ n $$:
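 * $$ \frac{\partial V_{2} / \partial C_{2j}}{\partial V_{2} / \partial C_{2n}} = 1 \;, \quad \frac{\partial V_{2} / \partial C_{2j}}{\partial V_{2} / \partial Y_{2}} = -1 \; , $$

so category 2 faces neither commodity taxes nor a marginal income tax. Suppose now that all individuals have the same preferences over consumption and labor (the same indifference curves in the C-L plane), differing only in ability. Separability between leisure and consumption can be expressed as:
 * $$ \frac{\partial^{2} U_{k}}{\partial C_{kj} \partial L_{k}} = 0 \;, $$

so the marginal utility of each commodity does not depend on labor. A high-ability individual who mimics category 1 consumes the same bundle and differs only in the labor supplied, which yields:
 * $$ \frac{\partial V_{1}}{\partial C_{1j}} = \frac{\partial V_{2}}{\partial C_{1j}} \; . $$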

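Substituting this into the first-order condition for $$ C_{1j} $$ with $$ \lambda_{1} = 0 $$ gives $$ (\mu - \lambda_{2}) \, \partial V_{1} / \partial C_{1j} = \gamma N_{1} $$ for every $$ j $$, and as a result: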
 * $$ \frac{ \frac{\partial V_{1}}{\partial C_{1j} } }{ \frac{\partial V_{1}}{\partial C_{1n} }  } = 1  \;  . $$

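For both categories, then, the marginal rate of substitution between any two commodities equals the ratio of producer prices, and Stiglitz concluded that it is unnecessary to impose taxes on commodities.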
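The key step, that a mimicking high-ability individual values each good exactly as category 1 does, can be checked numerically. This sketch assumes a hypothetical common utility $$ U(C, L) = \sum_{j} a_{j} \ln C_{j} - L^{2}/2 $$ with labor $$ L = Y/w_{k} $$ for category $$ k $$; the weights, productivities and the bundle are illustrative assumptions, and the bundle is chosen so that category 1's marginal utilities are equal across goods, as the first-order conditions require.

```python
import math

# Hypothetical primitives (assumptions, not from the source):
# U(C, L) = sum_j a_j ln C_j - L**2 / 2, identical for everyone, and a
# category-k individual earning Y supplies labor L = Y / w_k.
A = [0.3, 0.7]          # taste weights a_j
W = {1: 1.0, 2: 2.0}    # productivities, w_2 > w_1

def V(k, C, Y):
    """Utility of category k from commodity vector C and income Y."""
    return sum(a * math.log(c) for a, c in zip(A, C)) - 0.5 * (Y / W[k]) ** 2

def dV_dC(k, C, Y, j, eps=1e-6):
    """Central-difference estimate of dV_k / dC_j."""
    up, down = list(C), list(C)
    up[j] += eps
    down[j] -= eps
    return (V(k, up, Y) - V(k, down, Y)) / (2 * eps)

C1, Y1 = [0.6, 1.4], 0.8   # assumed bundle for category 1
for j in range(len(C1)):
    # category 1 and a category-2 mimic have the same marginal utility of good j
    print(j, dV_dC(1, C1, Y1, j), dV_dC(2, C1, Y1, j))

# a_j / C_1j = 0.5 for both goods, so the marginal rate of substitution is
# (up to rounding) one, the producer-price ratio
print(dV_dC(1, C1, Y1, 0) / dV_dC(1, C1, Y1, 1))
```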

Conditions for randomization
Consider a scenario in which highly able individuals, who would typically earn higher incomes reflecting their skills, conceal their abilities. In this case, it could be argued that the government should randomize the taxes imposed on low-ability individuals to make screening more effective, and under certain conditions taxes can be randomized without making the low-ability individuals worse off. An individual who reveals high ability is assigned the bundle $$ \lbrace C^{*}_{2}, Y^{*}_{2} \rbrace $$. An individual who does not, whether genuinely of low ability or concealing high ability, receives one of two bundles: $$ \lbrace C^{*}_{1}, Y^{*}_{1} \rbrace $$ or $$ \lbrace C^{**}_{1}, Y^{**}_{1} \rbrace $$. The randomization is designed so that this risk weighs differently on a high-ability individual who conceals their ability than on a genuinely low-ability individual.

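To avoid making the low-ability group worse off, mean consumption must be shifted upward at each $$ Y $$ to compensate for the risk, and a higher $$ \overline{C}_{1} $$ is set for a higher $$ \overline{Y}_{1} $$. The two bundles are placed symmetrically around these means (here $$ \lambda $$ is a parameter of the lottery, unrelated to the multipliers $$ \lambda_{1} $$ and $$ \lambda_{2} $$ above):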
 * $$ C^{*}_{1} = \overline{C}_{1} + h \;,  \quad  Y^{*}_{1} = \overline{Y}_{1} + \lambda h $$
 * $$ C^{**}_{1} = \overline{C}_{1} - h \;, \quad Y^{**}_{1} = \overline{Y}_{1} - \lambda h   \; . $$
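Why mean consumption must rise can be seen with a small calculation. Assuming a hypothetical concave utility $$ V_{1}(C, Y) = \ln C - Y^{2}/2 $$ for the low-ability category and equal probabilities for the two bundles (both assumptions of this sketch), the lottery lowers expected utility, and a bisection search finds the upward shift $$ s $$ of $$ \overline{C}_{1} $$ that restores it. The names `lottery` and `compensating_shift` are hypothetical; `lam` is the lottery slope $$ \lambda $$.

```python
import math

# Hypothetical low-ability utility V_1(C, Y) = ln C - Y**2 / 2 (assumed form).
def V1(C, Y):
    return math.log(C) - 0.5 * Y ** 2

def lottery(C_bar, Y_bar, h, lam):
    """The two equally likely bundles (C*, Y*) and (C**, Y**)."""
    return (C_bar + h, Y_bar + lam * h), (C_bar - h, Y_bar - lam * h)

def expected_V1(C_bar, Y_bar, h, lam):
    (c_hi, y_hi), (c_lo, y_lo) = lottery(C_bar, Y_bar, h, lam)
    return 0.5 * (V1(c_hi, y_hi) + V1(c_lo, y_lo))

def compensating_shift(C_bar, Y_bar, h, lam, tol=1e-12):
    """Upward shift s of mean consumption that keeps E[V_1] at V_1(C_bar, Y_bar)."""
    target = V1(C_bar, Y_bar)
    lo, hi = 0.0, 1.0  # assumed bracket; expected utility rises with s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_V1(C_bar + mid, Y_bar, h, lam) < target:
            lo = mid
        else:
            hi = mid
    return hi

s = compensating_shift(C_bar=1.0, Y_bar=0.8, h=0.2, lam=0.5)
print(s)  # positive: the lottery must be paired with higher mean consumption
```

For these particular numbers the shift has the closed form $$ s = -1 + \sqrt{0.04 + e^{0.01}} \approx 0.025 $$, which the bisection reproduces.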

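A high-ability individual who conceals their ability obtains utility $$ V_{2}(C^{*}_{1}, Y^{*}_{1}) $$ or $$ V_{2}(C^{**}_{1}, Y^{**}_{1}) $$. Keeping the sum of these utilities (the expected utility under equal probabilities) unchanged, so that the self-selection constraint continues to hold, requires: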
 * $$ V_{2C^{*} } (d \overline{C}_{1} + d h ) + V_{2 Y^{*}} (d \overline{Y}_{1} + \lambda d h) + V_{2 C^{**}} (d \overline{C}_{1 }   - d h  ) + V_{2 Y^{**} } (d \overline{Y}_{1} - \lambda d h ) = 0  \;,  $$

Likewise, keeping the expected utility of the low-ability individuals unchanged requires:
 * $$ V_{1C^{*} } (d \overline{C}_{1} + d h ) + V_{1 Y^{*}} (d \overline{Y}_{1} + \lambda d h) + V_{1 C^{**}} (d \overline{C}_{1 }   - d h  ) + V_{1 Y^{**} } (d \overline{Y}_{1} - \lambda d h ) = 0  \; .   $$

Together, these two conditions can be written in matrix form:
 * $$ \begin{bmatrix} SV_{2C} & SV_{2Y} \\ SV_{1C} & SV_{1Y} \end{bmatrix} \begin{bmatrix} d \overline{C} \\ d \overline{Y} \end{bmatrix} = - \begin{bmatrix} DV_{2C} + \lambda DV_{2Y} \\ DV_{1C} + \lambda DV_{1Y} \end{bmatrix} d h \;, $$

where $$ SV_{kC} = V_{kC^{*}} + V_{kC^{**}} $$, $$ SV_{kY} = V_{kY^{*}} + V_{kY^{**}} $$, $$ DV_{kC} = V_{kC^{*}} - V_{kC^{**}} $$ and $$ DV_{kY} = V_{kY^{*}} - V_{kY^{**}} $$, for $$ k=1,2 $$.

Then:
 * $$ \lim_{h \rightarrow 0} \frac{d (\overline{Y} - \overline{C} ) }{d h} = \frac{ F_{1} - F_{2} }{ (-2) ( MRS_{1} - MRS_{2}) }  \;,   $$

where $$ MRS_{k} = - \left( \frac{\partial V_{k}}{\partial C_{1}} \right)^{-1} \frac{\partial V_{k}}{\partial Y_{1}} $$ is the marginal rate of substitution of category $$ k $$ at the bundle of category 1, and $$ F_{1} = \left( \frac{\partial V_{2}}{\partial C_{1}} \right)^{-1} M_{2} (1 - MRS_{1}) $$ and $$ F_{2} = \left( \frac{\partial V_{1}}{\partial C_{1}} \right)^{-1} M_{1} (1 - MRS_{2}) $$, with $$ M_{k} = DV_{kC} + \lambda DV_{kY} $$. At $$ h=0 $$ the two bundles coincide, so $$ DV_{kC} = DV_{kY} = 0 $$ and $$ M_{k}=0 $$; the first derivative of $$ \overline{Y} - \overline{C} $$ with respect to $$ h $$ therefore vanishes at $$ h=0 $$, and its second derivative must be calculated.
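The limit follows from the matrix equation by Cramer's rule. As $$ h \rightarrow 0 $$, $$ SV_{kC} \rightarrow 2 \, \partial V_{k}/\partial C_{1} $$ and $$ SV_{kY} \rightarrow 2 \, \partial V_{k}/\partial Y_{1} $$, while $$ \partial V_{k}/\partial C_{1} + \partial V_{k}/\partial Y_{1} = (\partial V_{k}/\partial C_{1})(1 - MRS_{k}) $$. Hence the determinant and the solution satisfy:
 * $$ SV_{2C} SV_{1Y} - SV_{2Y} SV_{1C} \rightarrow -4 \frac{\partial V_{1}}{\partial C_{1}} \frac{\partial V_{2}}{\partial C_{1}} (MRS_{1} - MRS_{2}) \;, $$
 * $$ \frac{d (\overline{Y} - \overline{C})}{d h} = \frac{M_{2} (SV_{1C} + SV_{1Y}) - M_{1} (SV_{2C} + SV_{2Y})}{SV_{2C} SV_{1Y} - SV_{2Y} SV_{1C}} \rightarrow \frac{M_{2} \frac{\partial V_{1}}{\partial C_{1}} (1 - MRS_{1}) - M_{1} \frac{\partial V_{2}}{\partial C_{1}} (1 - MRS_{2})}{-2 \frac{\partial V_{1}}{\partial C_{1}} \frac{\partial V_{2}}{\partial C_{1}} (MRS_{1} - MRS_{2})} \;, $$

and dividing the numerator and denominator by $$ (\partial V_{1}/\partial C_{1})(\partial V_{2}/\partial C_{1}) $$ gives $$ (F_{1} - F_{2}) / \left( (-2)(MRS_{1} - MRS_{2}) \right) $$.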


 * $$ \frac{d^{2} ( \overline{Y} - \overline{C} )}{d h^{2}} = H_{1} + H_{2}  \;, $$

where $$ H_{1} = \frac{d(F_{1} - F_{2})}{dh} \frac{1}{-2( MRS_{1} - MRS_{2} ) } $$ and $$ H_{2} = (-1) \frac{ d( \overline{Y} - \overline{C} ) }{ d h } \frac{d \ln{(-2)( MRS_{1} - MRS_{2} )} }{d h} $$. Since the first derivative vanishes at $$ h = 0 $$, so does $$ H_{2} $$, and at $$ h = 0 $$:
 * $$ \frac{d^{2} ( \overline{Y} - \overline{C} )}{d h^{2}} = \frac{I_{1} + I_{2}}{ (-1) (MRS_{1} - MRS_{2}) } \; \; . $$
 * $$ I_{1} =      ( V_{2CC} + 2 \lambda V_{2 CY} +  \lambda^{2} V_{2 Y Y }   )   ( \frac{\partial V_{2}}{\partial C_{1}} )^{-1} (1 - MRS_{1}) $$
 * $$ I_{2} = (-1) ( V_{1CC} + 2 \lambda V_{1 CY} +  \lambda^{2} V_{1 Y Y }   )   ( \frac{\partial V_{1}}{\partial C_{1}} )^{-1} (1 - MRS_{2}) $$

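Since $$ MRS_{2} < MRS_{1} < 1 $$, the denominator $$ -(MRS_{1} - MRS_{2}) $$ is negative. Randomization is desirable when the second derivative is positive, that is, when a small lottery raises the net revenue $$ \overline{Y} - \overline{C} $$ collected from category 1 while leaving unchanged the utility of category 1 and that of a would-be mimic; this requires $$ I_{1} + I_{2} < 0 $$. Multiplying by $$ (\partial V_{1}/\partial C_{1})(\partial V_{2}/\partial C_{1}) > 0 $$, the condition becomes:
 * $$ (V_{2CC} + 2 \lambda V_{2CY} + \lambda^{2} V_{2YY}) ( V_{1C_{1}} + V_{1Y_{1}} ) - (V_{1CC} + 2 \lambda V_{1CY} + \lambda^{2} V_{1YY}) ( V_{2C_{1}} + V_{2Y_{1}} ) < 0 \; . $$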
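The second-order result can be checked against the linear system it comes from. This sketch assumes hypothetical utilities $$ V_{k}(C, Y) = \ln C - (Y/w_{k})^{2}/2 $$ with $$ w_{1} = 1 $$ and $$ w_{2} = 2 $$, an illustrative bundle $$ (\overline{C}_{1}, \overline{Y}_{1}) = (1, 0.8) $$ and lottery slope $$ \lambda = 0.5 $$. It solves the 2x2 system by Cramer's rule at a small $$ h $$, divides by $$ h $$ (the first derivative is zero at $$ h = 0 $$), and compares the result with $$ (I_{1} + I_{2}) / (-(MRS_{1} - MRS_{2})) $$.

```python
# Hypothetical utilities V_k(C, Y) = ln C - (Y / w_k)**2 / 2, w_1 = 1, w_2 = 2.
W = {1: 1.0, 2: 2.0}

def V_C(k, C, Y):
    return 1.0 / C

def V_Y(k, C, Y):
    return -Y / W[k] ** 2

def V_CC(k, C, Y):
    return -1.0 / C ** 2

def V_CY(k, C, Y):
    return 0.0

def V_YY(k, C, Y):
    return -1.0 / W[k] ** 2

def slope_of_net_revenue(C, Y, h, lam):
    """d(Ybar - Cbar)/dh from the 2x2 linear system at lottery size h."""
    hi, lo = (C + h, Y + lam * h), (C - h, Y - lam * h)
    SV = {(k, v): f(k, *hi) + f(k, *lo)
          for k in (1, 2) for v, f in (("C", V_C), ("Y", V_Y))}
    M = {k: (V_C(k, *hi) - V_C(k, *lo)) + lam * (V_Y(k, *hi) - V_Y(k, *lo))
         for k in (1, 2)}
    det = SV[2, "C"] * SV[1, "Y"] - SV[2, "Y"] * SV[1, "C"]
    # Cramer's rule on [[SV2C, SV2Y], [SV1C, SV1Y]] [dC, dY] = -[M2, M1]
    dC = -(M[2] * SV[1, "Y"] - SV[2, "Y"] * M[1]) / det
    dY = -(SV[2, "C"] * M[1] - SV[1, "C"] * M[2]) / det
    return dY - dC

def second_derivative_formula(C, Y, lam):
    """(I_1 + I_2) / (-(MRS_1 - MRS_2)) evaluated at h = 0."""
    mrs = {k: -V_Y(k, C, Y) / V_C(k, C, Y) for k in (1, 2)}
    A = {k: V_CC(k, C, Y) + 2 * lam * V_CY(k, C, Y) + lam ** 2 * V_YY(k, C, Y)
         for k in (1, 2)}
    I1 = A[2] / V_C(2, C, Y) * (1 - mrs[1])
    I2 = -A[1] / V_C(1, C, Y) * (1 - mrs[2])
    return (I1 + I2) / (-(mrs[1] - mrs[2]))

C, Y, lam, h = 1.0, 0.8, 0.5, 1e-4
numeric = slope_of_net_revenue(C, Y, h, lam) / h
print(numeric, second_derivative_formula(C, Y, lam))  # both close to -1.3125
```

For these assumed utilities the second derivative is negative, so a small lottery would lower net revenue and randomization would not pay; equivalently, the left-hand side of the condition above is positive here.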