Talk:Panel data

I'm not overly concerned one way or the other; however, the last edit is unnecessary. "Longitudinal data" is time-series data. See any of the many definitions on the web, for example http://www.cacr.ca/news/2002/0204pahwa.htm Wikiant 17:47, 31 January 2006 (UTC)

Removed this from External Links section because it leads to a 404 error: Lingamer8 16:46, 31 January 2007 (UTC)
 * SLID

I second that Time Series is not necessarily one-dimensional. See Multiple-Time-Series. — Preceding unsigned comment added by 195.176.26.99 (talk) 15:57, 6 July 2011 (UTC)

The examples of panels above do not match the matrix description below: "A panel has the form x_it =.." In fact, the panels above cannot be described with a single matrix, because they contain more than one variable (sex, age, income). So the statement "A panel has the form [...]" is wrong. — Preceding unsigned comment added by 213.23.17.234 (talk) 13:09, 3 November 2011 (UTC)

$$\nu_{it}$$ is currently undefined for fixed effects model. — Preceding unsigned comment added by 158.129.140.71 (talk) 12:44, 27 August 2014 (UTC)

Dr. Mora's comment on this article
Dr. Mora has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"It is important to include pseudo-panel theory, selection bias in panels, and selection bias in pseudo-panels."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Mora has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference: Jhon James Mora Rodríguez and Juan Muro (2015). Consistent Estimation in Pseudo Panels in the Presence of Selection Bias. Economics: The Open-Access, Open-Assessment E-Journal, 8 (2014-43): 1–25. http://dx.doi.org/10.5018/economics-ejournal.ja.2014-43

ExpertIdeasBot (talk) 21:30, 21 May 2016 (UTC)

Dr. Sul's comment on this article
Dr. Sul has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"1. Need to discuss why panel data have attracted growing interest. Panel information usually helps to identify hidden parameters that cannot be identified using only cross-sectional or only time-series information. For example, a factor structure cannot be identified in univariate time series data.

2. Example: This section should show how the panel method helps to identify and estimate the hidden parameters. Difference-in-differences and factor structure would be good examples.

3. Analysis of panel data: Too short. It should either be deleted or given extended coverage.

4. Data sets which have a panel design: All examples are micro panels. Need to mention why many countries need to construct panel surveys."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Sul has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : Peter C.B. Phillips & Donggyu Sul, 2007. "Transition Modeling and Econometric Convergence Tests," Cowles Foundation Discussion Papers 1595, Cowles Foundation for Research in Economics, Yale University.

ExpertIdeasBot (talk) 18:54, 15 June 2016 (UTC)

Dr. Nguena's comment on this article
Dr. Nguena has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"The content "Analysis of panel data" is not complete. Please add the following:

1) Individual-Effects Model. The standard panel data specification is that there is an individual-specific effect which enters linearly in the regression:

$$y_{it} = x_{it}'\beta + u_i + e_{it}.$$

The typical maintained assumptions are that the individuals $$i$$ are mutually independent, that $$u_i$$ and $$e_{it}$$ are independent, that $$e_{it}$$ is iid across individuals and time, and that $$e_{it}$$ is uncorrelated with $$x_{it}$$. OLS of $$y_{it}$$ on $$x_{it}$$ is called pooled estimation. It is consistent if

$$E(x_{it}u_i) = 0. \qquad (1)$$

If this condition fails, then OLS is inconsistent. Condition (1) fails if the individual-specific unobserved effect $$u_i$$ is correlated with the observed explanatory variables $$x_{it}$$. This is often believed to be plausible if $$u_i$$ is an omitted variable. If (1) is true, however, OLS can be improved upon via a GLS technique. In either event, OLS appears a poor estimation choice. Condition (1) is called the random effects hypothesis. It is a strong assumption, and most applied researchers try to avoid its use.

2) Fixed Effects. This is the most common technique for estimation of non-dynamic linear panel regressions. The motivation is to allow $$u_i$$ to be arbitrary, and to have arbitrary correlation with $$x_{it}$$. The goal is to eliminate $$u_i$$ from the estimator, and thus achieve invariance.

3) Dynamic Panel Regression. A dynamic panel regression has a lagged dependent variable:

$$y_{it} = \alpha y_{it-1} + x_{it}'\beta + u_i + e_{it}. \qquad (2)$$

This is a model suitable for studying dynamic behavior of individual agents. Unfortunately, the fixed effects estimator is inconsistent, at least if $$T$$ is held finite as $$n \to \infty$$. This is because the sample mean of $$y_{it-1}$$ is correlated with that of $$e_{it}$$. The standard approach to estimate a dynamic panel is to combine first-differencing with IV or GMM. Taking first-differences of (2) eliminates the individual-specific effect:

$$\Delta y_{it} = \alpha \Delta y_{it-1} + \Delta x_{it}'\beta + \Delta e_{it}. \qquad (3)$$

However, if $$e_{it}$$ is iid, then $$\Delta e_{it}$$ will be correlated with $$\Delta y_{it-1}$$:

$$E(\Delta y_{it-1}\Delta e_{it}) = E((y_{it-1} - y_{it-2})(e_{it} - e_{it-1})) = -E(y_{it-1}e_{it-1}) = -\sigma_e^2.$$

So OLS on (3) will be inconsistent. But if there are valid instruments, then IV or GMM can be used to estimate the equation. Typically, we use lags of the dependent variable, two periods back, as $$y_{it-2}$$ is uncorrelated with $$\Delta e_{it}$$. Thus values of $$y_{it-k}$$, $$k \ge 2$$, are valid instruments. Hence a valid estimator of $$\alpha$$ and $$\beta$$ is to estimate (3) by IV using $$y_{it-2}$$ as an instrument for $$\Delta y_{it-1}$$ (which is just identified). Alternatively, GMM can be applied using $$y_{it-2}$$ and $$y_{it-3}$$ as instruments (which is overidentified, but loses a time-series observation). A more sophisticated GMM estimator recognizes that for time periods later in the sample, there are more instruments available, so the instrument list should be different for each equation. This is conveniently organized by the GMM principle, as this enables the moments from the different time periods to be stacked together to create a list of all the moment conditions. A simple application of GMM yields the parameter estimates and standard errors.

Reference: William H. Greene (2002) "ECONOMETRIC ANALYSIS". FIFTH EDITION. New York University."
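A note on points 1) and 2) of the quoted comment: the difference between pooled OLS and the fixed-effects (within) estimator can be illustrated with a small simulation. This is only a sketch under stated assumptions; the panel dimensions, the seed, the single regressor, and all variable names are hypothetical and do not come from the comment or from Greene. The data are generated so that condition (1) fails by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta = 500, 5, 2.0   # hypothetical panel size and true slope (assumptions)

u = rng.normal(size=n)                      # individual effects u_i
x = u[:, None] + rng.normal(size=(n, T))    # x_it depends on u_i, so E(x_it u_i) != 0
e = rng.normal(size=(n, T))                 # idiosyncratic errors e_it
y = beta * x + u[:, None] + e               # y_it = x_it * beta + u_i + e_it


def ols_slope(x, y):
    """Slope of a no-intercept regression of y on x (inputs already demeaned)."""
    return float((x * y).sum() / (x * x).sum())


# Pooled OLS: only the grand mean is removed, so u_i stays in the error term.
pooled = ols_slope(x - x.mean(), y - y.mean())

# Within (fixed-effects) estimator: subtracting each individual's time mean
# removes u_i from the equation before running OLS.
within = ols_slope(
    x - x.mean(axis=1, keepdims=True),
    y - y.mean(axis=1, keepdims=True),
)

print(f"true beta = {beta}, pooled OLS = {pooled:.3f}, within = {within:.3f}")
```

Under this particular design the pooled estimate should drift toward roughly beta + 0.5 in large samples (the covariance of x with u divided by the variance of x), while the within estimate should stay near beta.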

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Nguena has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : NGUENA Christian-Lambert & NANFOSSO Roger, 2014. "Macroeconomic Factors and Dynamics of Financial Deepening: An empirical Investigation applied to the CEMAC Sub-region," Working Papers 14/015, African Governance and Development Institute.

ExpertIdeasBot (talk) 18:10, 27 June 2016 (UTC)
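On point 3) of the comment above, the just-identified IV estimator that uses the twice-lagged level as an instrument for the lagged difference can be sketched on simulated data. This is a minimal illustration, not a full GMM implementation: the regressors x are omitted, and the sample sizes, burn-in length, seed, and variable names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, burn, alpha = 5000, 6, 50, 0.5   # hypothetical design (assumptions)

# Simulate y_it = alpha * y_{it-1} + u_i + e_it, with a burn-in so the
# retained periods are close to the stationary distribution.
u = rng.normal(size=n)
y = np.zeros((n, T + burn))
y[:, 0] = u / (1 - alpha) + rng.normal(size=n)
for t in range(1, T + burn):
    y[:, t] = alpha * y[:, t - 1] + u + rng.normal(size=n)
y = y[:, burn:]                      # keep the last T periods only

dy = y[:, 2:] - y[:, 1:-1]           # first difference of y_it
dy_lag = y[:, 1:-1] - y[:, :-2]      # first difference of y_{it-1}
z = y[:, :-2]                        # instrument: the level y_{it-2}

# OLS on the differenced equation: inconsistent, because the lagged
# difference is correlated with the differenced error.
ols_fd = float((dy_lag * dy).sum() / (dy_lag ** 2).sum())

# Just-identified IV with y_{it-2} as the instrument for the lagged difference.
iv = float((z * dy).sum() / (z * dy_lag).sum())

print(f"true alpha = {alpha}, FD-OLS = {ols_fd:.3f}, IV = {iv:.3f}")
```

In this design the OLS estimate on differences should land well below the true alpha, while the IV estimate should be close to it. Adding further lags as instruments would turn this into the overidentified GMM case described in the comment.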

Dr. Sosvilla-Rivero's comment on this article
Dr. Sosvilla-Rivero has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"After the example, "and person 3 is not observed in 2001 or 2003" would be better than "and person 3 is not observed in 2003 or 2001."

When presenting the structure of the data and the two formats, "long" and "wide", it could be worth mentioning that Stata has two commands, "reshape long" and "reshape wide", to convert from one format to the other.

An example of the wide format could be presented:

id   sex   Income2001   Income2002   Income2003
1    1     1300         1600         2000
2    2     2000         2300         2400

When presenting the analysis of panel data, a third model should be mentioned: the pooled-OLS model.

It should be mentioned that several statistical tests can be used to determine which of the candidate methods is empirically appropriate for a given panel. In particular, to choose between fixed-effects and random-effects, the Hausman test statistic can be used to test for non-correlation between the unobserved effect and the regressors (see Baltagi, 2008, chapter 4). Additionally, to choose between pooled-OLS and random-effects, the Breusch and Pagan (1980) Lagrange multiplier test can be used to test for the presence of an unobserved effect. Finally, the F test for fixed effects can be used to test whether all unobservable individual effects are zero, in order to discriminate between pooled-OLS and fixed-effects.

References:

Baltagi, B. H. (2008). Econometric analysis of panel data, fourth ed. Chichester: John Wiley and Sons.

Breusch, T. S. and Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47, 239-253."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Sosvilla-Rivero has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : Marta Gomez-Puig & Simon Sosvilla-Rivero & Maria del Carmen Ramos-Herrera, 2014. "An update on EMU sovereign yield spread drivers in time of crisis: A panel data analysis," Working Papers 2014-04, Universitat de Barcelona, UB Riskcenter.

ExpertIdeasBot (talk) 18:49, 27 June 2016 (UTC)
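Regarding the suggestion above to mention Stata's reshape commands: for readers without Stata, a rough pandas analogue of the wide-to-long direction can be sketched as follows. The numbers are the wide-format example from the comment; the choice of pandas and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Wide format: one row per person, one income column per year
# (values taken from the example in the comment above).
wide = pd.DataFrame({
    "id": [1, 2],
    "sex": [1, 2],
    "Income2001": [1300, 2000],
    "Income2002": [1600, 2300],
    "Income2003": [2000, 2400],
})

# wide_to_long splits each "Income<year>" column into a stub ("Income")
# and a numeric suffix, which becomes the new "year" column.
long = (
    pd.wide_to_long(wide, stubnames="Income", i="id", j="year")
    .reset_index()
    .astype({"year": int})
    .sort_values(["id", "year"])
    .reset_index(drop=True)
)

print(long)
```

The result has one row per person and year, with the time-invariant variable sex repeated on each of that person's rows.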

Dr. Farsi's comment on this article
Dr. Farsi has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"Clear and concise."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Farsi has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : Cullmann, Astrid & Farsi, Mehdi & Filippini, Massimo, 2009. "Unobserved Heterogeneity and International Benchmarking in Public Transport," Quaderni della facolta di Scienze economiche dell'Universita di Lugano 0904, USI Universita della Svizzera italiana.

ExpertIdeasBot (talk) 18:51, 27 June 2016 (UTC)

Dr. Reed's comment on this article
Dr. Reed has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"This article is missing a discussion of the great variety of different panel data estimators such as the list below:

- Fixed Effects
- Random Effects
- OLS (Robust Cluster)
- FGLS (Weights/EVC)
- Panel Corrected Standard Error (PCSE)
- OLS (Driscoll–Kraay)
- Correlated Random Effects
- Anderson–Hsiao
- Difference GMM
- System GMM
- Mean Group
- Pooled Mean Group
- Dynamic Fixed Effect
- Common Correlated Effects Mean Group
- Augmented Mean Group

A good discussion of modern panel data estimators, with many relevant references, is Eberhardt, M. and Teal, F. (2011), Econometrics for Grumblers: A New Look at the Literature on Cross-Country Growth Empirics. Journal of Economic Surveys, 25: 109–155. doi:10.1111/j.1467-6419.2010.00624.x"

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

We believe Dr. Reed has expertise on the topic of this article, since he has published relevant scholarly research:


 * Reference : W. Robert Reed & Rachel Webb, 2010. "The PCSE Estimator is Good -- Just Not as Good as You Think," Working Papers in Economics 10/53, University of Canterbury, Department of Economics and Finance.

ExpertIdeasBot (talk) 16:07, 12 July 2016 (UTC)

Dr. Marquez-Ramos's comment on this article
Dr. Marquez-Ramos has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"1. The original text says: "Panel data contain observations of multiple phenomena obtained over multiple time periods for the same firms or individuals". Other entities can also be followed over time, not only firms or individuals, so I would suggest finishing the sentence with "for the same units, such as regions, countries, firms or individuals".

2. "In biostatistics, the term longitudinal data is often used instead". The term can be used in econometrics too (and probably in other disciplines). I suggest writing "The term longitudinal data is also often used instead".

3. I suggest starting a new paragraph after "Because each person is observed every year, the left-hand data set is called a balanced panel, whereas the data set on the right hand is called an unbalanced panel, since person 1 is not observed in year 2003 and person 3 is not observed in 2003 or 2001", so that the paragraph describes the dataset.

4. After the description of the example/dataset, I suggest starting with a new section "Structure of panel data: long and wide formats"

5. After this, the new section would read as follows: "This specific structure these data sets are in is called long format where one row holds one observation per time. Another way to structure panel data would be the wide format where one row represents one observational unit for all points in time (for the example, the wide format would have only two (left example) or three (right example) rows of data with additional columns for each time-varying variable (income, age)). Representing panel data in long format is much more common than using the wide format."

However, I believe that this description might not be easily understood by the audience, and I suggest simplifying things and showing the dataset in wide format:

"The previous example provides an illustration in long format, which uses multiple rows for each observation or participant. Another way to structure panel data would be using the wide format, which uses one row for each observation or participant".

Then, I suggest showing the example in wide format. The balanced panel in wide format is:

person   income2001   income2002   income2003   age2001   age2002   age2003   sex2001   sex2002   sex2003
1        1300         1600         2000         27        28        29        1         1         1
2        2000         2300         2400         38        39        40        2         2         2

An explanation could be added: "It can be observed that the first person (1), with sex encoded as 1 (let's say, female) and aged 27 in 2001, earned 1300 monetary units in 2001; 1600 monetary units in 2002; and 2000 monetary units in 2003."

I can continue to illustrate the "analysis of panel data" with this specific example later."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

We believe Dr. Marquez-Ramos has expertise on the topic of this article, since he has published relevant scholarly research:


 * Reference : Luis Marcelo Florensa & Laura Marquez-Ramos & Maria Luisa Recalde & Maria Victoria Barone, 2014. "Does economic integration increase trade margins? Empirical evidence from LAIAs countries," Working Papers 2014/05, Economics Department, Universitat Jaume I, Castellon (Spain).

ExpertIdeasBot (talk) 16:35, 2 August 2016 (UTC)
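The long-to-wide conversion described in Dr. Marquez-Ramos's comment above can be sketched in pandas. The values are those of the balanced panel in the comment; the library choice, the column naming scheme (variable name followed by year), and the variable names are assumptions for illustration.

```python
import pandas as pd

# Long format: one row per person and year (balanced panel from the comment).
long = pd.DataFrame({
    "person": [1, 1, 1, 2, 2, 2],
    "year":   [2001, 2002, 2003, 2001, 2002, 2003],
    "income": [1300, 1600, 2000, 2000, 2300, 2400],
    "age":    [27, 28, 29, 38, 39, 40],
    "sex":    [1, 1, 1, 2, 2, 2],
})

# pivot spreads each variable across years, giving (variable, year) columns.
wide = long.pivot(index="person", columns="year", values=["income", "age", "sex"])

# Flatten the two-level column labels into names such as "income2001".
wide.columns = [f"{var}{year}" for var, year in wide.columns]
wide = wide.reset_index()

print(wide)
```

Each person now occupies a single row, with one column per variable and year. Note that a time-invariant variable such as sex is repeated across the year columns, exactly as in the table in the comment.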