Bivariate analysis

Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two variables (often denoted X and Y) for the purpose of determining the empirical relationship between them.

Bivariate analysis can be helpful in testing simple hypotheses of association. It can show to what extent knowing the value of one variable (possibly the independent variable) makes it easier to predict the value of the other (possibly a dependent variable); see also correlation and simple linear regression.

Bivariate analysis can be contrasted with univariate analysis, in which only one variable is analysed. Like univariate analysis, bivariate analysis can be descriptive or inferential. Bivariate analysis is the simple two-variable special case of multivariate analysis, in which multiple relations between multiple variables are examined simultaneously.

Bivariate Regression
Regression is a statistical technique used to investigate how variation in one or more variables predicts or explains variation in another variable. Bivariate regression aims to find the equation of the line (or curve) that best fits the relationship between two variables in a particular data set. This equation can then be used to predict values of the dependent variable for values of the independent variable that are not present in the original data set. Through regression analysis, one can derive the equation of the fitted curve or straight line and obtain the correlation coefficient.

Simple Linear Regression
Simple linear regression is a statistical method used to model the linear relationship between an independent variable and a dependent variable. It assumes that the relationship between the variables is linear, and it is sensitive to outliers. The best-fitting straight line is the one chosen to minimize the differences between the values predicted by the equation and the actual observed values of the dependent variable.

Equation: $$y = mx + b$$

$$x$$: independent variable (predictor)

$$y$$: dependent variable (outcome)

$$m$$: slope of the line

$$b$$: $$y$$-intercept

Least Squares Regression Line (LSRL)
The least squares regression line is the line used in simple linear regression to model the linear relationship between two variables, and it serves as a tool for making predictions for new values of the independent variable. It is calculated using the least squares criterion: the goal is to minimize the sum of the squared vertical distances (residuals) between the observed y-values and the corresponding predicted y-values across all data points.
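
As a minimal sketch in plain Python (no statistics library assumed), the closed-form least squares solution for one predictor is $$m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$ and $$b = \bar{y} - m\bar{x}$$. The function names `least_squares_line` and `sum_squared_residuals` and the five data points are illustrative, not taken from any particular source.

```python
from typing import Sequence, Tuple


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (m, b) for the line y = m*x + b that minimizes the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b


def sum_squared_residuals(xs: Sequence[float], ys: Sequence[float], m: float, b: float) -> float:
    """Sum of squared vertical distances between observed and predicted y-values."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
m, b = least_squares_line(xs, ys)  # m = 0.6, b = 2.2 (up to floating-point rounding)
prediction = m * 6.0 + b           # predicted y for a new x value of 6
```

Nudging either the slope or the intercept away from the fitted values increases `sum_squared_residuals`, which is exactly what the least squares criterion requires.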

Bivariate Correlation
A bivariate correlation is a measure of whether and how two variables covary linearly, that is, whether the values of one tend to change in a linear fashion as the values of the other change.

Covariance can be difficult to interpret across studies because its magnitude depends on the units or scale of measurement of the two variables. For this reason, covariance is standardized by dividing it by the product of the standard deviations of the two variables to produce the Pearson product–moment correlation coefficient (also referred to as the Pearson correlation coefficient or simply the correlation coefficient), which is usually denoted by the letter “r”.
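
To illustrate why the standardization matters, the sketch below (plain Python; the function names and data are illustrative) computes the sample covariance and $$r = \frac{\operatorname{cov}(X, Y)}{s_X s_Y}$$, then shows that re-expressing X in different units rescales the covariance but leaves r unchanged.

```python
import math
from typing import Sequence


def sample_covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance divided by the product of the two sample standard deviations."""
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)


xs_m = [1.0, 2.0, 3.0, 4.0, 5.0]   # x measured in metres (illustrative)
xs_cm = [100.0 * x for x in xs_m]  # the same x values in centimetres
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_m = sample_covariance(xs_m, ys)   # 1.5
cov_cm = sample_covariance(xs_cm, ys) # 100 times larger
r_m = pearson_r(xs_m, ys)
r_cm = pearson_r(xs_cm, ys)           # equal to r_m up to rounding: r has no units
```

The n - 1 denominators cancel in the ratio, so r would be the same with n in their place.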

Pearson’s correlation coefficient is used when both variables are measured on an interval or ratio scale. Other correlation coefficients or analyses are used when the variables are not measured on an interval or ratio scale, or when they are not normally distributed. Examples include Spearman’s rank correlation coefficient, Kendall’s tau, biserial correlation, and chi-square analysis.
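
As one example of these alternatives, Spearman’s coefficient can be computed as Pearson’s r applied to the ranks of the data, with tied values given their average rank. The sketch below is plain Python with illustrative function names and made-up data; Kendall’s tau, biserial correlation, and chi-square analysis are not shown.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's coefficient computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]  # monotonic but not linear
r = pearson_r(xs, ys)      # below 1: the points do not lie on a straight line
rho = spearman_rho(xs, ys) # 1: the rankings agree perfectly
```

Because it only uses ranks, Spearman’s coefficient measures monotonic rather than strictly linear association.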

Three important points should be kept in mind with regard to correlation:

 * The presence of outliers can severely bias the correlation coefficient.
 * Large sample sizes can result in statistically significant correlations that may have little or no practical significance.
 * It is not possible to draw conclusions about causality based on correlation analyses alone.
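
The first two points can be checked numerically. In the sketch below (plain Python, made-up data, illustrative function names), a single extreme point turns a correlation of zero into a strong one, and the usual test statistic $$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$ (with n - 2 degrees of freedom) shows how the same weak r becomes statistically significant once n is large. The third point, about causality, is not something a calculation can illustrate.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Point 1: one outlier can manufacture a strong correlation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 3.0, 4.0, 2.0]                   # no linear trend: r = 0
r_clean = pearson_r(xs, ys)
r_outlier = pearson_r(xs + [20.0], ys + [20.0])  # one extreme point added

# Point 2: the same weak r is "significant" or not depending only on n.
r_weak = 0.05                                    # r**2 = 0.0025, i.e. 0.25% of variance
t_n20 = t_statistic(r_weak, 20)
t_n10000 = t_statistic(r_weak, 10_000)
# For large n, |t| > 1.96 is roughly the two-sided 5% significance threshold.
```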

When there is a dependent variable
If the dependent variable (the one whose value is determined to some extent by the other, independent variable) is categorical, such as the preferred brand of cereal, then probit or logit regression (or multinomial probit or multinomial logit) can be used. If both variables are ordinal, meaning they are ranked in a sequence as first, second, and so on, then a rank correlation coefficient can be computed. If only the dependent variable is ordinal, ordered probit or ordered logit can be used. If the dependent variable is continuous, measured at either the interval or the ratio level (such as temperature or income), then simple regression can be used.
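
For the simplest categorical case, a dependent variable with two categories, the sketch below fits a binary logit model $$P(y = 1 \mid x) = \frac{1}{1 + e^{-(a + bx)}}$$ by maximum likelihood using Newton-Raphson iterations. It is plain Python with hypothetical function names and made-up data; probit, multinomial, and ordered variants are not shown, and in practice a statistics package would normally be used. It assumes the two outcome groups are not perfectly separated by x, since otherwise the maximum likelihood estimate does not exist.

```python
import math
from typing import Sequence, Tuple


def predict_proba(a: float, b: float, x: float) -> float:
    """Logit model: probability that y = 1 given x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logit(xs: Sequence[float], ys: Sequence[int], max_iter: int = 50) -> Tuple[float, float]:
    """Maximum likelihood estimates of (a, b) via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        ga = gb = 0.0          # gradient of the log-likelihood
        iaa = iab = ibb = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = predict_proba(a, b, x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            iaa += w
            iab += w * x
            ibb += w * x * x
        det = iaa * ibb - iab * iab
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return a, b


# Made-up data: larger x makes y = 1 more likely, without perfect separation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_logit(xs, ys)
```

At the maximum the gradient is zero, so the fitted probabilities sum to the number of observed 1s.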

If both variables are time series, a particular type of causality known as Granger causality can be tested for, and vector autoregression can be performed to examine the intertemporal linkages between the variables.
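
The idea behind a Granger causality test can be sketched under simplifying assumptions: a single lag, ordinary least squares, and an F statistic comparing a restricted model of $$y_t$$ on $$y_{t-1}$$ with an unrestricted model that also includes $$x_{t-1}$$. The function names and simulated series are illustrative. The sketch stops at the F statistic (a p-value would need the F distribution, for example from a statistics library), and it does not fit a vector autoregression.

```python
import random
from typing import List, Sequence


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(rows: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit; each row starts with 1 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f_lag1(cause: Sequence[float], effect: Sequence[float]) -> float:
    """F statistic: does cause[t-1] help predict effect[t] beyond effect[t-1]?"""
    y = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, y)
    rss_u = ols_rss(unrestricted, y)
    q, k = 1, 3  # one restriction; three parameters in the unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (len(y) - k))


# Simulated series in which x drives y with a one-period delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_x_to_y = granger_f_lag1(x, y)  # large: lagged x improves the prediction of y
f_y_to_x = granger_f_lag1(y, x)  # small: lagged y adds little for x
```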

When there is not a dependent variable
When neither variable can be regarded as dependent on the other, regression is not appropriate but some form of correlation analysis may be.

Graphical methods
The graphs that are appropriate for bivariate analysis depend on the types of the variables. For two continuous variables, a scatterplot is a common graph. When one variable is categorical and the other continuous, a box plot is common, and when both are categorical, a mosaic plot is common. These graphs are part of descriptive statistics.
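
The geometry behind a mosaic plot can be sketched without any plotting library. The unit square is split into columns whose widths are the marginal proportions of the first variable, and each column is split into tiles whose heights are the conditional proportions of the second variable, so each tile's area equals the joint proportion. The function name, the (x0, y0, width, height) tile format, and the data are illustrative assumptions; drawing the rectangles is left to whatever plotting tool is at hand.

```python
from collections import Counter
from typing import Dict, List, Tuple

Tile = Tuple[float, float, float, float]  # (x0, y0, width, height) in the unit square


def mosaic_tiles(a: List[str], b: List[str]) -> Dict[Tuple[str, str], Tile]:
    """Rectangles of a two-variable mosaic plot, keyed by (level of a, level of b)."""
    n = len(a)
    joint = Counter(zip(a, b))
    a_counts = Counter(a)
    b_levels = sorted(set(b))
    tiles: Dict[Tuple[str, str], Tile] = {}
    x0 = 0.0
    for a_level in sorted(a_counts):
        width = a_counts[a_level] / n  # marginal proportion of a_level
        y0 = 0.0
        for b_level in b_levels:
            # Conditional proportion of b_level within this column.
            height = joint[(a_level, b_level)] / a_counts[a_level]
            tiles[(a_level, b_level)] = (x0, y0, width, height)
            y0 += height
        x0 += width
    return tiles


# Made-up observations of two categorical variables.
group = ["g1", "g1", "g1", "g2", "g2", "g2", "g2", "g2"]
choice = ["A", "A", "B", "A", "B", "B", "B", "B"]
tiles = mosaic_tiles(group, choice)
```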