Talk:Gini coefficient/Archive 3

alleged predatory source?
You deleted a reference claiming it was a "predatory source". The reference in question is:




 * 1) How do you know this is "predatory publishing"?
 * 2) Even if it is, why should that matter in this case?

The reference you deleted includes the formula for the Gini coefficient for which it was cited. It also provides additional, useful commentary on how to estimate the Gini coefficient and standard errors.

More generally, the last I checked, Wikipedia did not require all citations to be to sources officially identified as "credible". I'm sure that supporters of people with power wish that Wikipedia were more restrictive in the sources it allows, because honest journalism often appears in disestablishment sources.

For example, Patricia Díaz-Rubio, General Manager of Wikimedia Chile, said in a presentation at Wikimania 2021 that the Latin American Wiki Human Rights campaign had made a material contribution to the success of the vote for rewriting the constitution of Chile, which was written under the Pinochet dictatorship. The campaign did this by encouraging people to post photos of abuse of force by law enforcement to Wikimedia Commons and use them in articles in the Spanish-language Wikipedia.

By extension, are you saying that people should not be allowed to post photos or videos, e.g., of the George Floyd murder, to Wikimedia Commons, because it was not officially sanctioned? Or that the Wikipedia article on George Floyd should not be allowed to cite sources not blessed by the government?

In my judgment, excessive elimination of sources that are deprecated or are otherwise disliked by some is an attack on the free and open public debate that is essential to the progress of democracy. For more on this, see Managing conflict on Wikipedia and internationally.

Accordingly, I've reverted your deletion of that source. Thanks, DavidMCEddy (talk) 15:51, 5 January 2022 (UTC)
 * I've re-removed it. The Kogaion Publishing Center is a well-known predatory publisher, and European Academic Research is a well-known predatory journal. These do not meet our WP:RS standards. I fail to see what Wikimedia Chile's general manager's opinion of Pinochet has to do with our reliable sourcing policies. Headbomb {t · c · p · b} 16:09, 5 January 2022 (UTC)

lognormal v. "log normal" v. "log-normal"
The name of the Wikipedia article on the "log-normal" distribution is spelled with a dash or hyphen: It's not "lognormal" nor "log normal". If you think it should be something different, can you please propose a name change on Talk:Log-normal distribution?

Google just now found "about" 5.4 billion matches for "log-normal", 5.37 billion for "log normal", and 19.7 million for "lognormal". This suggests we keep "log-normal", though Wolfram and R use "log normal". In the past, I've preferred "lognormal", but that returned less than 0.5 percent of the matches for either "log-normal" or "log normal".

In the meantime, I'm changing this reference to match that of the Wikipedia article. If you think it should be different without changing the name of the associated Wikipedia article, that would seem to require a different discussion.

Thanks for your efforts to improve "the sum of all human knowledge" freely available. DavidMCEddy (talk) 14:24, 20 February 2022 (UTC)

Thank you for pointing that out.

Borsycle1 — Preceding unsigned comment added by Borsycle1 (talk • contribs) 14:38, 20 February 2022 (UTC)

Actual Gini shape
I think the article should include an example Gini coefficient with a Lorenz curve taken from real data. The examples tend to leave out the possibility of oligarchy or of negative wealth. For instance here is what the wealth curve for the US in 2016 looks like. As a matter of interest, I also wonder what the "Global share of wealth by wealth group" diagram, which is second in the article, should look like if it were to show rather than ignore negative wealth. Should there be a fraction of people on the left going below the baseline? NadVolum (talk) 07:42, 15 June 2022 (UTC)
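
To illustrate the question about negative wealth, here is a minimal sketch of how Lorenz curve points behave when some holdings are negative. The four wealth values are made up for illustration and are not taken from any real data set; the function name `lorenz_points` is also just a label chosen here.

```python
def lorenz_points(values):
    """Return (population share, cumulative wealth share) pairs.

    Values are sorted from poorest to richest. Negative values are
    allowed, as long as the total is positive.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, x in enumerate(xs, start=1):
        running += x
        points.append((k / n, running / total))
    return points


# Hypothetical wealth values: one household in debt, three with assets.
points = lorenz_points([-2, 1, 3, 8])
for pop_share, wealth_share in points:
    print(f"{pop_share:.2f} {wealth_share:+.2f}")
```

With these made-up numbers the curve starts at zero, dips below the baseline for the poorest households, and only climbs back above zero once enough positive wealth has accumulated. That suggests a diagram that shows negative wealth would indeed have a segment below the baseline on the left.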

Definition section.
Please refer to this text (as of 04/01/2022):


 * In terms of income-ordered population percentiles, the Gini coefficient is the cumulative shortfall from equal share of the total income up to each percentile. That summed shortfall is then divided by the value it would have in the case of complete equality.
 * If all people have non-negative income (or wealth, as the case may be), the Gini coefficient can theoretically range from 0 (complete equality) to 1 (complete inequality); it is sometimes expressed as a percentage ranging between 0 and 100.

In the first paragraph, the text was originally "inequality" and was later changed to "equality". [Revision as of 07:23, 15 March 2021 (Woodstone)] This does not seem correct to me, since complete equality equals 0, and you cannot divide by 0. However, I cannot reconcile this short definition with the other formulas provided, so I do not want to change this back.

Would someone with an understanding of the underlying concept please determine whether the summed shortfall is divided by the value for complete "equality" or "inequality", and revert to "inequality" if needed?

Thank you Bobsd  (talk) 20:51, 1 April 2022 (UTC)


 * You're right. The meaning of 'it' in "That summed shortfall is then divided by the value it would have in the case of complete equality" is not clear and logically is wrong. I'll try and find a good source for a fix. NadVolum (talk) 07:40, 16 June 2022 (UTC)
 * As far as I can see Gini is normally defined in terms of the Lorenz curve and nothing like the definition in that paragraph is used. It isn't cited so I'll just delete it. NadVolum (talk) 07:54, 16 June 2022 (UTC)
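
For reference, the Lorenz-curve definition mentioned above can be checked numerically. This is only a sketch for a small finite sample, assuming non-negative values with a positive total; the function name and the sample incomes are invented for illustration.

```python
def gini_from_lorenz(values):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve).

    The Lorenz curve of a finite sample is piecewise linear, so the
    area is summed exactly with trapezoids of width 1/n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    area = 0.0
    prev_share = 0.0
    running = 0.0
    for x in xs:
        running += x
        share = running / total
        area += (prev_share + share) / (2 * n)
        prev_share = share
    return 1.0 - 2.0 * area


equal = gini_from_lorenz([5, 5, 5, 5])      # everyone has the same income
one_has_all = gini_from_lorenz([0, 0, 0, 10])  # one person has everything
print(equal, one_has_all)
```

Complete equality gives 0, which is why dividing by "the value in the case of complete equality" cannot be right. With one of n people holding everything, this formula gives 1 - 1/n, which approaches 1 as n grows.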

An alternative formula for when the pdf is continuous
I believe that an additional formula should be added for the case where the density f(x) is continuous. On the article for mean absolute difference, I added the formula for the continuous case as $$\mathrm{MD} = \int_{0}^\infty \int_{-\infty}^\infty 2\,f(x)\,f(x+\delta)\,\delta\,dx\,d\delta $$. The reason for this change is that it removes the absolute values that appear in the original equation. This lets you use calculus to compute the mean absolute difference as well as the Gini coefficient. You can even use my proposed version of the formula to compute mean absolute differences and Gini coefficients using online integral calculators, which I have done.

The argument for why the two equations are equivalent is this. To calculate the mean absolute difference, we need to pick two random points, calculate the probability density of picking both of those points, multiply that probability density by the distance between the two points, and then do this for all possible pairs of points. For our first two points, choose the points x and x + delta with delta > 0. The probability density of choosing the point x is f(x), whereas the probability density of choosing the point x + delta is f(x + delta). The probability density of picking the point x first and then x + delta second is f(x)*f(x + delta), which is the same as the probability density of picking the point x + delta first and then the point x second. So in total, the probability density of picking both the points x and x + delta as your two points is 2*f(x)*f(x + delta). Also notice that the absolute difference between x and x + delta is just delta. So for any two points, 2*f(x)*f(x + delta)*delta represents the probability density of picking those two points, multiplied by the distance between those two points. To find the mean absolute difference over all pairs of points, we must integrate this expression across all starting points x and all possible distances from x, namely delta. Note that x can be any value from -infinity to +infinity, and that delta can be any value from 0 to infinity. So our equation for mean absolute difference is

$$\mathrm{MD} = \int_{0}^\infty \int_{-\infty}^\infty 2\,f(x)\,f(x+\delta)\,\delta\,dx\,d\delta .$$

So the final equation for the Gini coefficient would be

$$G = \frac{1}{2\mu}\int_{0}^\infty \int_{-\infty}^\infty 2\,f(x)\,f(x+\delta)\,\delta\,dx\,d\delta.$$

-Collin Paul Reitz, Null Simplex


 * I'm afraid no matter how good your work is, it can't go in Wikipedia unless it is cited. Otherwise what you have there is what is called WP:Original research on Wikipedia and is not allowed. NadVolum (talk) 07:17, 16 June 2022 (UTC)
 * Actually I'm happy with that formula being at Mean absolute difference and this article referencing that, rather than duplicating too much from the other article. NadVolum (talk) 23:21, 12 August 2022 (UTC)
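
As a sanity check on the double-integral form discussed in this thread, here is a rough numerical sketch using only the Python standard library. It assumes an exponential density with rate 1 (mean 1), for which the Gini coefficient is known to be 1/2. The midpoint rule, the truncation of the integration range at 30, and the grid size are arbitrary choices made for this illustration.

```python
import math


def mean_abs_difference(pdf, lower, upper, n=600):
    """Approximate MD = integral of 2 f(x) f(x + delta) delta dx d(delta).

    A midpoint rule is used on an n-by-n grid, with x in [lower, upper]
    and delta in [0, upper - lower]. The density is assumed to be
    negligible outside [lower, upper].
    """
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * h
        fx = pdf(x)
        for j in range(n):
            delta = (j + 0.5) * h
            total += 2.0 * fx * pdf(x + delta) * delta
    return total * h * h


def exp_pdf(x):
    """Exponential density with rate 1 (an assumed test case)."""
    return math.exp(-x) if x >= 0 else 0.0


mu = 1.0  # mean of the exponential density with rate 1
md = mean_abs_difference(exp_pdf, 0.0, 30.0)
gini = md / (2.0 * mu)
print(md, gini)
```

The result should come out close to MD = 1 and G = 0.5, which agrees with the closed-form values for this density. This is only a numerical check of one case and not a citation for the formula.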

Please simplify!
Why isn't there a simple graph illustrating Pareto's 80:20 rule, extended to 64:4, 51.2:0.8, etc.? Also, the Gini/Lorenz curves (p-p plots) should exhibit the Pareto case. And why is there no mention of the simple form of the equation for the Gini/Pareto Lorenz curve, $$Y = 1-(1-x)^{\log 0.8/\log 0.2}$$? ~ JdelaF (talk) 07:04, 12 March 2023 (UTC)


 * Pareto principle is perhaps what you're looking for. A nice rule but not close enough for wealth either in the world or America where 80% is owned by perhaps 12%. Those distributions in the article are just people fitting curves and not based on models and nowhere near as close as the affine wealth model. NadVolum (talk) 15:29, 13 March 2023 (UTC)
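
For anyone who wants to check the numbers in the question, the curve Y = 1-(1-x)^(log 0.8/log 0.2) can be evaluated directly. The sketch below is illustrative only. The function names are chosen here, and the Gini value uses the closed form (1 - k)/(1 + k), which comes from integrating this particular curve.

```python
import math

# Exponent from the 80:20 rule: the top 20% hold 80%.
K = math.log(0.8) / math.log(0.2)


def lorenz(x):
    """Share of the total held by the poorest fraction x."""
    return 1.0 - (1.0 - x) ** K


def top_share(q):
    """Share of the total held by the richest fraction q."""
    return 1.0 - lorenz(1.0 - q)


# Gini = 1 - 2 * (integral of lorenz over [0, 1]) = (1 - K) / (1 + K).
gini = (1.0 - K) / (1.0 + K)

for q in (0.2, 0.04, 0.008):
    print(q, top_share(q))
print("Gini:", gini)
```

The top 20%, 4% and 0.8% come out at about 80%, 64% and 51.2% respectively, matching the sequence in the question, and the implied Gini coefficient is roughly 0.76.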

Income and Wealth conflated?
Isn't income typically revenue (cash flow) and wealth assets (balance sheet)? Huge difference, yes? 71.231.158.70 (talk) 21:31, 22 February 2023 (UTC)
 * No one is saying they are the same. The Gini coefficient is a measure of disparity and you get different Gini coefficients for wealth and for income. NadVolum (talk) 15:33, 13 March 2023 (UTC)