Talk:Stratified sampling

Untitled
The real-world example below does not seem correct: "A real-world example of using stratified sampling would be for a US political survey. If we wanted the respondents to reflect the diversity of the population of the United States, the researcher would specifically seek to include participants of various minority groups such as race or religion, based on their proportionality to the total population as mentioned above. A stratified survey could thus claim to be more representative of the US population than a survey of simple random sampling or systematic sampling." The reason is that old polls used to have quotas like these for religion, income, etc., and the accuracy of those polls was far worse than that of the Gallup polls using simple random sampling. Simple random sampling with a sufficient sample size already results in a representative sample.

Could somebody please define the difference between stratified and clustered random sampling? —Preceding unsigned comment added by 65.95.199.124 (talk) 15:51, 25 June 2008 (UTC)

Shouldn't the title be changed to "Stratified sample" to reflect the article "Simple random sample"? — Preceding unsigned comment added by 212.56.102.224 (talk) 09:14, 28 March 2007 (UTC)

- In the article: The reasons to use stratified sampling rather than simple random sampling include[2] 1. If measurements within strata have lower standard deviation, stratification gives smaller error in estimation.

But I read in the reference: https://onlinecourses.science.psu.edu/stat506/node/27/

Stratification may produce a smaller error of estimation than would be produced by a simple random sample of the same size. This result is particularly true if measurements within strata are very homogeneous.

In my interpretation this is a whole other statement.

Paul Nollen (talk) 17:06, 15 July 2018 (UTC)
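
To make the distinction above concrete, here is a minimal simulation sketch, not a definitive treatment. It assumes a made-up population of two equally sized strata with normally distributed values; the function names, stratum means, and sample sizes are all hypothetical choices for illustration. It compares the spread of the simple-random-sample mean with the spread of the proportionally allocated stratified mean when the strata are internally homogeneous but far apart.

```python
import random
import statistics


def srs_mean(population, n, rng):
    """Mean of a simple random sample (without replacement) of size n."""
    return statistics.fmean(rng.sample(population, n))


def stratified_mean(strata, n_per_stratum, rng):
    """Stratified estimate of the population mean: (1/N) * sum(N_h * mean_h)."""
    total = sum(len(s) for s in strata)
    weighted = sum(
        len(s) * statistics.fmean(rng.sample(s, n_h))
        for s, n_h in zip(strata, n_per_stratum)
    )
    return weighted / total


def compare(stratum_means=(10.0, 50.0), within_sd=1.0, seed=0, reps=2000):
    """Return (variance of SRS mean, variance of stratified mean) over reps draws.

    Assumption: two strata of 500 units each, total sample size 20,
    split 10/10 (proportional allocation). All values are illustrative.
    """
    rng = random.Random(seed)
    strata = [[rng.gauss(m, within_sd) for _ in range(500)] for m in stratum_means]
    population = [x for s in strata for x in s]
    srs = [srs_mean(population, 20, rng) for _ in range(reps)]
    strat = [stratified_mean(strata, [10, 10], rng) for _ in range(reps)]
    return statistics.pvariance(srs), statistics.pvariance(strat)


if __name__ == "__main__":
    var_srs, var_strat = compare()
    print(f"variance of SRS mean:        {var_srs:.4f}")
    print(f"variance of stratified mean: {var_strat:.4f}")
```

In this setup the stratified estimator should come out far less variable, because almost all of the population variance lies between the strata. Calling `compare(stratum_means=(10.0, 10.0))` makes the strata indistinguishable, and the gain should largely disappear, which is consistent with the reference's more cautious "may produce a smaller error" wording.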

Lack of clarity about what random variable is used
The first mathematical formulae in the article are introduced with the phrase "The mean and variance of stratified random sampling". However "mean" and "variance" are quantities that are defined for random variables, not for sampling procedures. Unless a sampling procedure employs a unique and universally understood random variable, it doesn't make sense to refer to the mean and variance of that procedure.

In the case of stratified sampling, there are examples on the internet (e.g. https://jkim.public.iastate.edu/teaching/book5.pdf Example 5.1) where the random variable associated with stratified sampling lacks the factor of 1/N that is suggested by the formula given in the article. The article would be improved by stating explicitly what random variables are involved.

In particular, the meaning of $$ s^2_h $$ should be explained by stating what random variable it refers to. It appears to be the variance of the random variable defined by taking a random sample of size 1 from (the entire population of) stratum $$h$$ (as opposed to the variance of the random variable that is the mean value of a sample of size $$ n_h $$ taken from that population).

Tashiro~enwiki (talk) 12:43, 16 May 2019 (UTC)
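
For reference, here is a sketch of how the random variables are usually spelled out in the standard textbook treatment. I am assuming the article follows that treatment; the symbols $$ x_{hi} $$ (the $$i$$-th sampled value in stratum $$h$$), $$ \bar{x}_{st} $$ and $$ \hat{T} $$ are my own labels and may not match the article's notation. With $$L$$ strata of sizes $$ N_h $$ (total $$ N = \sum_h N_h $$) and a simple random sample of size $$ n_h $$ drawn without replacement in each stratum:

$$ \bar{x}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} x_{hi}, \qquad s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} \left( x_{hi} - \bar{x}_h \right)^2 $$

$$ \bar{x}_{st} = \frac{1}{N} \sum_{h=1}^{L} N_h \bar{x}_h, \qquad \hat{T} = N \bar{x}_{st} = \sum_{h=1}^{L} N_h \bar{x}_h $$

$$ \widehat{\operatorname{Var}}\left( \bar{x}_{st} \right) = \sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h} $$

Under these assumptions, the "mean" and "variance" belong to the estimator $$ \bar{x}_{st} $$, and $$ s_h^2 $$ is the sample variance of individual observations within stratum $$h$$, not the variance of the stratum sample mean (that would be roughly $$ s_h^2 / n_h $$ before the finite population correction). An estimator of the population total, $$ \hat{T} $$, carries no factor of 1/N, which may account for the difference seen in some other sources.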

Possible citation for math in this article
The results in this article are all found in the book "Sampling Techniques, Third Edition" by William G. Cochran (https://www.wiley.com/en-us/Sampling+Techniques%2C+3rd+Edition-p-9780471162407). I have no experience editing Wikipedia pages, but perhaps someone else with access to the book and knowledge of how to edit could follow up on this comment. WpeditTscott (talk) 18:58, 30 July 2023 (UTC)