Talk:Design matrix

Minor suggestions
Very good article! Regards, Herbmuell (talk) 06:26, 25 July 2015 (UTC)
 * "...data ON the independent variables ... observed data ON a response variable ..." -> ON sounds a bit funny to me.
 * "A notable feature of the concept of a design matrix is..." -> Why not simply "A notable feature of a design matrix is..." ?

Proposed merger of "Data Matrix" into this article
The proposed article for merging into this one is Data matrix (multivariate statistics).

In my experience the terms "data matrix" and "design matrix" are used interchangeably, and the article linked does not seem to state any real differentiation between the two concepts. The definitions seem to be the same. — Preceding unsigned comment added by Denziloe (talk • contribs) 22:58, 26 April 2016 (UTC)

"Data matrix" does not seem to be an established term, but a simple composition of "data" and "matrix". References on the page do not show that "data matrix" is a separate concept. Merging "Data Matrix" and "Design Matrix" seems to be a reasonable proposition. Akaravaev (talk) 19:12, 27 June 2017 (UTC)
 * Agreed and ✅ Klbrain (talk) 13:58, 1 February 2018 (UTC)


 * The point surely is that this article presents the design matrix, which is the matrix X in the equation
 * y ≈ X b
 * where a vector b of parameters is estimated from a vector y of observations of a single observable.  This is the equation presented in this article.


 * But in multivariate statistics, when there are many observables, there is a matrix of observations Y, which is used to estimate a matrix of regression coefficients B,
 * Y ≈ X B
 * Here X is the design matrix. It encodes how the parameters or regression coefficients B are modelled to affect the observations Y.


 * Y is not the design matrix. It is the matrix of actual data observed, hence its name, the data matrix. It is important to keep clear the distinction between the two, underlined by their different names.


 * It's hard to follow why you think this article is sufficiently presenting the data matrix, when it doesn't cover multivariate regression at all. Jheald (talk) 22:04, 1 February 2018 (UTC)
 * If you look at the unopposed comments from Denziloe and Akaravaev (from 2016 and 2017), you can see that the key argument is that there is no evidence presented for the data matrix as a distinct and notable term. I accept that your use does show a distinction, but can you point to reliable sources which demonstrate that use? Adding those to the article would be helpful. Given the potential for confusion, perhaps the differences between data matrix and design matrix could be discussed on this page. Klbrain (talk) 22:17, 1 February 2018 (UTC)
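The distinction drawn in this thread between y ≈ X b and Y ≈ X B can be illustrated with a small sketch. It assumes an ordinary least-squares fit via the normal equations and a two-column X (intercept plus one explanatory variable). The helper names (`transpose`, `matmul`, `inverse_2x2`, `least_squares`) and all numbers are made up for illustration and are not taken from the article or from any cited source.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def least_squares(X, Y):
    """Solve the normal equations (X^T X) B = X^T Y for a two-column X."""
    Xt = transpose(X)
    return matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, Y))

# Design matrix X (hypothetical): a column of ones plus one explanatory variable.
X = [[Fraction(1), Fraction(x)] for x in (0, 1, 2, 3)]

# Univariate case: one observable, stored as a one-column matrix y.
y = [[Fraction(v)] for v in (1, 3, 5, 7)]
b = least_squares(X, y)

# Multivariate case: two observables, so the observed data form a matrix Y
# with one column per observable. The design matrix X is unchanged.
Y = [[Fraction(u), Fraction(v)] for u, v in ((1, 0), (3, -1), (5, -2), (7, -3))]
B = least_squares(X, Y)

print(b)  # one column of coefficients
print(B)  # one column of coefficients per observable
```

In this sketch X is the same object in both fits, while the observed data go from a vector y to a matrix Y, which is the role the thread assigns to the "data matrix".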

Examples
Unless I’m missing something, the first example (calculating the arithmetic mean) states the wrong design matrix. It says that the design matrix needed would be a single column of ones, which cannot be true because that’d mean that the original data plays no part in the calculation! Instead, wouldn’t the correct design matrix be a column vector containing x_1, x_2, …, x_n, or the same divided by ‘n’ (depending upon what beta is desired to be)? BrianOfRugby (talk) 11:16, 7 March 2023 (UTC)
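One way to check the question above numerically is the following minimal sketch. It assumes the example is an ordinary least-squares fit y ≈ Xβ in which the observed values enter through the response vector y rather than through X. The function name `ols_single_column` and the data values are hypothetical.

```python
from fractions import Fraction

def ols_single_column(x_col, y):
    """Least-squares estimate for a one-column design matrix: (x^T x)^-1 x^T y."""
    xtx = sum(x * x for x in x_col)
    xty = sum(x * v for x, v in zip(x_col, y))
    return Fraction(xty, xtx)

# Observed data (made-up values): these form the response vector y.
y = [2, 4, 9, 5]

# Design matrix: a single column of ones, one row per observation.
ones = [1] * len(y)

beta = ols_single_column(ones, y)
mean = Fraction(sum(y), len(y))

print(beta, mean)
```

Under that assumption, x^T x equals n and x^T y equals the sum of the observations, so the estimate coincides with the arithmetic mean even though X itself contains no data.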