Talk:Point-biserial correlation coefficient

I suspect the bad link to "biserial correlation" in the first paragraph is actually supposed to link to "bivariate correlation", but I'm reluctant to change it because I don't understand the preceding "the new dichotomous variable may be conceptualized as having an underlying continuity" and, moreover, biserial and bivariate seem to be the same thing. It's just that in biserial one of the variables can only be zero or one - as in zero for the control group and 1 for the experimental group to get an effect size in terms of r.   — Preceding unsigned comment added by 160.39.69.88 (talk) 17:00, 7 December 2019 (UTC)

This is wrong. The point biserial correlation coefficient is still important today in the field of Psychometrics. —Preceding unsigned comment added by 128.97.86.17 (talk • contribs) 08:57, 21 June 2006

I would like to suggest deleting the "external information" link, since the formula for r_pb on that page is wrong. That is, it makes exactly the mistake I am warning about. Kmir78 01:50, 18 January 2007 (UTC)

Do you have a source with the correct formula? MrArt 05:43, 18 January 2007 (UTC)

Glass and Hopkins (Statistical Methods in Education and Psychology (3rd Edition)) have the correct formula, but I could also easily put a derivation from the normal formula in here. I don't know any online source. Kmir78 04:48, 19 January 2007 (UTC)

Wrong formula?
It is not easy to just say that a formula is wrong if one doesn't know its meaning. One should at least distinguish between population and sample. In the population the coefficient is a parameter, and in the sample it is an estimator of this parameter.

Parameter:
 * $$\rho = \frac{\mu_1-\mu_0}{\sigma_Y}\sqrt{p(1-p)},$$

with
 * $$\mu_x=E(Y|X=x)$$

and
 * $$p=P(X=1)$$

Estimator
 * $$r= \frac{M_1-M_0}{S_Y}\sqrt{\frac{N_1(n-N_1)}{n(n-1)}},$$

where $$M_1$$ and $$M_0$$ are the sample means of Y for X=1 and X=0, $$N_1$$ is the number of observations with X=1, and $$S_Y$$ is the usual sample standard deviation "with n-1 in the denominator".

If one instead takes for $$S_Y$$ the sample standard deviation "with n in the denominator", the formula reads:


 * $$r= \frac{M_1-M_0}{nS_Y}\sqrt{N_1(n-N_1)},$$

Even in the case where SY is the usual sample standard deviation "with n-1 in the denominator" the formula:
 * $$r= \frac{M_1-M_0}{nS_Y}\sqrt{N_1(n-N_1)},$$

gives a good estimator of ρ. Nijdam 14:07, 24 January 2007 (UTC)

Asymptotically, both formulas will yield the same result. But as far as I know, $$r_{pb}$$ is supposed to equal $$r_{XY}$$ in the sample. $$r_{XY}$$ is independent of the denominator $$n$$ or $$n-1$$ and therefore, $$r_{pb}$$ with 'some $$n$$' stuck in there somewhere instead of $$n-1$$ will not equal $$r_{XY}$$. Kmir78 06:40, 28 January 2007 (UTC)
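
For anyone who wants to check this numerically, here is a minimal sketch in plain Python. The data and all variable and function names are made up for illustration and do not come from the article or from any of the sources mentioned above. It computes the ordinary Pearson $$r_{XY}$$ with X coded 0/1, then the two estimator forms given above (n-1 used consistently, n used consistently), and finally the "mixed" version that pairs the n-1 standard deviation with the n-based factor.

```python
import math
import statistics

# Made-up illustrative data: x is the 0/1 group indicator, y the continuous score.
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [2.0, 3.5, 1.0, 4.0, 5.0, 6.5, 4.5, 7.0, 5.5, 6.0]


def pearson_r(x, y):
    """Ordinary product-moment correlation; any n or n-1 divisor cancels."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_summary(x, y):
    """Return (M1 - M0, N1, N0) for a 0/1 indicator x."""
    y1 = [b for a, b in zip(x, y) if a == 1]
    y0 = [b for a, b in zip(x, y) if a == 0]
    return statistics.fmean(y1) - statistics.fmean(y0), len(y1), len(y0)


diff, n1, n0 = group_summary(x, y)
n = n1 + n0

r_xy = pearson_r(x, y)

# S_Y with n-1 in the denominator, paired with the n(n-1) factor.
r_n_minus_1 = diff / statistics.stdev(y) * math.sqrt(n1 * n0 / (n * (n - 1)))

# S_Y with n in the denominator, paired with the 1/n factor.
r_n = diff / statistics.pstdev(y) * math.sqrt(n1 * n0) / n

# Mixed version: n-1 standard deviation paired with the 1/n factor.
r_mixed = diff / statistics.stdev(y) * math.sqrt(n1 * n0) / n

print("Pearson r_XY:       ", r_xy)
print("n-1 used throughout:", r_n_minus_1)
print("n used throughout:  ", r_n)
print("mixed version:      ", r_mixed)
```

The two consistent versions agree with $$r_{XY}$$ up to floating-point error, while the mixed version is smaller by a factor of $$\sqrt{(n-1)/n}$$, which tends to 1 as n grows. That seems to match both points made above: the mixed formula is asymptotically fine but does not reproduce the sample correlation exactly.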

significance
Information on how to determine the significance of a pb correlation would be useful. 208.253.150.100 02:02, 23 April 2007 (UTC)

I have included a note on assessing the significance of a pb correlation by relating it to Student's t distribution.

On the question of using n-1 rather than n as a denominator, the former should be used to estimate the population standard deviation and the latter to calculate the sample standard deviation. The pb coefficient however depends on the ratio between the standard deviations of X and Y. Clearly, if n-1 is used in both cases (and I don't see how you could justify using it for one but not the other), they just cancel out and the formula remains unchanged. In what sense then can either version of the formula be regarded as "wrong"?

I don't have a copy of the Glass and Hopkins book. What is the "right" formula which they give? Ted7815 (talk) 10:04, 28 March 2008 (UTC)
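
On the significance question, the relation to Student's t can be illustrated with a short sketch. It assumes the usual conversion $$t = r_{pb}\sqrt{(n-2)/(1-r_{pb}^2)}$$ with n-2 degrees of freedom, and compares it with the pooled-variance two-sample t statistic computed directly from the two groups. The data and the function names are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Made-up illustrative data: x is the 0/1 group indicator, y the continuous score.
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [2.0, 3.5, 1.0, 4.0, 5.0, 6.5, 4.5, 7.0, 5.5, 6.0]


def pearson_r(x, y):
    """Product-moment correlation; with x coded 0/1 this is r_pb."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Convert a correlation to a t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def pooled_t(group0, group1):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = statistics.fmean(group0), statistics.fmean(group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


y0 = [b for a, b in zip(x, y) if a == 0]
y1 = [b for a, b in zip(x, y) if a == 1]

r_pb = pearson_r(x, y)
t_via_r = t_from_r(r_pb, len(y))
t_direct = pooled_t(y0, y1)

print("t from r_pb:           ", t_via_r)
print("pooled two-sample t:   ", t_direct)
print("degrees of freedom:    ", len(y) - 2)
```

The two t values coincide up to rounding, so testing $$r_{pb}$$ against zero amounts to the ordinary unpaired t-test on the two group means. The p-value would then be looked up in a t table or computed with a statistics package; that step is left out here to keep the sketch free of external dependencies.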

biserial correlation
The assumption of the biserial correlation is not that X is normal, but that the variable underlying Y is normal. — Preceding unsigned comment added by Mcfanda (talk • contribs) 16:56, 22 December 2011 (UTC)