
03 Panel Data Analysis Methods

Saved by editor
on November 22, 2012 at 5:58:07 pm
 

Panel Data Analysis

 



 

Introduction


 

Panel data, also called longitudinal data or cross-sectional time series data, are data in which the same entities (panels), such as people, firms, or countries, are observed at multiple time points.

  • The National Longitudinal Survey is an example of panel data: a sample of people was followed over many years.
  • General Social Survey data, by contrast, are not longitudinal data. Although the survey has been conducted for many years, the respondents are not necessarily the same people each year.

 

 

Methodology


 

Setting up the Stata dataset

 

  • Open your data file
    • Example: open nlswork.dta with the command:

                    . use http://www.stata-press.com/data/r11/nlswork.dta

 

  • Declare the dataset to be panel data
    • Specify
      • the panel variable (idcode)
      • the time variable (year)
      • Both the panel and time variables must be numeric
    • Then type:

                    . xtset idcode year

                    OR: Select Statistics -> Longitudinal/panel data -> Setup and Utilities -> Declare dataset to be panel data

[Screenshot: xtset]

                    Output:

      • "unbalanced" next to idcode: not every panel (person) is observed in the same time periods
      • "Year... but with gaps": some years are missing from the time variable; no need to do anything
      • Other considerations:
        • Create a numeric code for any string panel variable, and convert string dates to Stata date format
  • The data should look like this (long format: one row per panel per time period):

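          Panel ID     Time variable     Var3     Var4     Var5 ...
             1              1978           ..       ..       ..
             1              1979           ..       ..       ..
             2              1978           ..       ..       ..
             2              1979           ..       ..       ..
          (numeric)    (numeric; Stata date format if it is a date)

          Each combination of panel ID and time value appears only once.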
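The xtset messages described above can be illustrated with a small Python sketch. It is not Stata's implementation; it only approximates the two checks on a hypothetical list of (panel ID, year) pairs, treating "balanced" as every panel having the same set of years and "gaps" as some panel missing a year between its first and last observation, assuming yearly data (delta = 1).

```python
from collections import defaultdict


def describe_panel(rows, delta=1):
    """Summarize long-format panel data given as (panel_id, time) pairs.

    Approximates what xtset reports; this is not Stata's own code.
    """
    times_by_panel = defaultdict(set)
    for pid, t in rows:
        # In long format each (panel, time) combination may appear only once.
        if t in times_by_panel[pid]:
            raise ValueError(f"panel {pid} has more than one row for time {t}")
        times_by_panel[pid].add(t)

    sets = list(times_by_panel.values())
    # Balanced: every panel is observed in exactly the same time periods.
    balanced = all(s == sets[0] for s in sets)
    # Gaps: some panel skips a period between its first and last observation.
    gaps = any(len(s) < (max(s) - min(s)) // delta + 1 for s in sets)
    return {"panels": len(sets), "balanced": balanced, "gaps": gaps}


# Hypothetical data: person 2 is not observed in 1979.
rows = [(1, 1978), (1, 1979), (1, 1980), (2, 1978), (2, 1980)]
print(describe_panel(rows))  # {'panels': 2, 'balanced': False, 'gaps': True}
```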

 

 

Fixed Effects Regression

 

Entity Fixed Effects

What is it?

  • It controls for omitted variables that differ across panels but are constant over time

 

Example: effect of experience (tenure) on earnings (wage)

  • Earnings (wage) are clearly influenced by factors other than experience (tenure), such as a person's personality, which can be assumed to stay constant over time
  • Assuming that these other person-specific factors are fixed over time, we can run a fixed effects regression based on the following model:

  • ln(Wage) = intercept + b1*(TenureForEachPanel&Time) + b2*(UnobservedCharacteristicsForEachPanel) + ErrorForEachPanel&Time

 

  • In Stata:

 

. xtreg ln_wage tenure, fe

 

    • xtreg fits linear regression models for panel data that have been declared with xtset
    • fe requests the fixed effects (within) estimator

 


Here, UnobservedCharacteristics varies from person to person but does not change over time, so the equation is equivalent to:

LogWage = interceptForEachPanel + b1*TenureForEachPanel&Time + ErrorForEachPanel&Time

in which interceptForEachPanel absorbs both the common intercept and the term b2*(UnobservedCharacteristicsForEachPanel) from the previous equation. Estimated on sample data, this becomes:

EstimatedLogWage = PanelFixedEffects + b1*TenureForEachPanel&Time
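The estimated equation above can be made concrete with a Python sketch of the within transformation on made-up numbers (not the nlswork data). Subtracting each panel's own means from x (tenure) and y (log wage) removes PanelFixedEffects, and an ordinary least squares slope on the demeaned data gives b1. The hypothetical panel intercepts are deliberately correlated with x, which biases the pooled slope but not the within slope.

```python
from collections import defaultdict


def panel_means(data):
    """Return {panel_id: (mean_x, mean_y)} for data given as (panel_id, x, y)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, x, y in data:
        s = sums[pid]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {pid: (sx / n, sy / n) for pid, (sx, sy, n) in sums.items()}


def within_slope(data):
    """Fixed effects slope: OLS on deviations from each panel's own means."""
    means = panel_means(data)
    num = den = 0.0
    for pid, x, y in data:
        mx, my = means[pid]
        num += (x - mx) * (y - my)
        den += (x - mx) ** 2
    return num / den


def pooled_slope(data):
    """Ordinary OLS slope that ignores the panel structure."""
    n = len(data)
    mx = sum(x for _, x, _ in data) / n
    my = sum(y for _, _, y in data) / n
    num = sum((x - mx) * (y - my) for _, x, y in data)
    den = sum((x - mx) ** 2 for _, x, _ in data)
    return num / den


# Hypothetical data built as y = a_i + 0.5 * x, with a_1 = 1 and a_2 = 3.
data = [(1, 1, 1.5), (1, 2, 2.0), (1, 3, 2.5),
        (2, 4, 5.0), (2, 5, 5.5), (2, 6, 6.0)]
b1 = within_slope(data)
intercepts = {pid: my - b1 * mx for pid, (mx, my) in panel_means(data).items()}
print(b1, intercepts)       # 0.5 {1: 1.0, 2: 3.0}
print(pooled_slope(data))   # about 1.014, biased by the omitted panel effects
```

The recovered intercepts are the panel fixed effects; in this balanced example their average is 2.0.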

 

To run this fixed effects regression in Stata, type:

. xtreg ln_wage tenure, fe

If you are using menus, select Statistics -> Longitudinal/panel data -> Linear models -> Linear regression (FE, RE, PA, BE).

[Screenshot: output of xtreg ln_wage tenure, fe]

The output shows that this is a fixed-effects regression with group variable idcode. There are 28,101 observations in 4,699 groups (persons, in this case). The number of observations per group, that is, the number of years each person is observed, ranges from 1 to 15. The constant that xtreg, fe reports is the average of the panel fixed effects. Plugging the coefficients into the model above, we have:

EstimatedLogWage = 1.57 + 0.034*TenureForEachPanel&Time

This is equivalent to including n-1 dummy variables (plus a constant) in the model, where n is the total number of panels in your data. Alternatively, and perhaps more intuitively, you can exclude the constant and include all n dummy variables. The dummy variables absorb the differences between panels that are constant over time. In this data it is not practical to create a dummy variable for each person, as there are close to 5,000 people. If you have a small number of panels, however, a regular regression with dummy variables gives the same coefficient. We examine this in the time fixed effects section, because 15 years are more manageable than 5,000 panels.
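The equivalence with dummy variables can be checked numerically. The Python sketch below, on hypothetical data, regresses y on x plus one dummy per panel with no constant, solving the normal equations with a small hand-written Gauss-Jordan elimination (only the standard library is assumed), and compares the slope with the within estimator computed from demeaned data.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lsdv(data):
    """OLS of y on x and one dummy per panel, no constant. Returns (slope, {panel: intercept})."""
    panels = sorted({pid for pid, _, _ in data})
    X = [[float(x)] + [1.0 if pid == p else 0.0 for p in panels] for pid, x, _ in data]
    y = [yv for _, _, yv in data]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yv for row, yv in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return beta[0], dict(zip(panels, beta[1:]))


def within_slope(data):
    """Fixed effects slope from deviations around each panel's means."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, x, y in data:
        s = sums[pid]
        s[0] += x
        s[1] += y
        s[2] += 1
    num = den = 0.0
    for pid, x, y in data:
        sx, sy, n = sums[pid]
        num += (x - sx / n) * (y - sy / n)
        den += (x - sx / n) ** 2
    return num / den


# Hypothetical noisy data for three panels of different lengths.
data = [(1, 1, 1.6), (1, 2, 1.9), (1, 3, 2.6),
        (2, 4, 5.1), (2, 5, 5.4), (2, 6, 6.0),
        (3, 2, 0.4), (3, 4, 1.7)]
slope, intercepts = lsdv(data)
print(round(slope, 6), round(within_slope(data), 6))  # the two slopes agree
```

The dummy coefficients in `intercepts` are the estimated panel fixed effects, one per panel.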

Time Fixed Effects

Suppose instead that you assume there are unobserved effects on income that vary across time rather than across persons. For example, the national economy may affect everyone's income in the same way at a given point in time, but differently at different points in time. To control for such unobserved variables that vary over time, you can run a time fixed effects regression. Such a model may look like:

LogWage = interceptForEachTime + b1*TenureForEachPanel&Time + ErrorForEachPanel&Time

Here the intercept captures variation over time rather than across panels. The estimated model is then:

EstimatedLogWage = TimeFixedEffects + b1*TenureForEachPanel&Time
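Time fixed effects use the same demeaning idea as entity fixed effects, only grouped by year instead of by person. A minimal Python sketch on hypothetical data, assuming each row is a dict with the illustrative keys id, year, x, and y:

```python
from collections import defaultdict


def demeaned_slope(rows, group):
    """OLS slope after subtracting the mean of x and y within each value of row[group].

    group="year" gives time fixed effects; group="id" gives entity fixed effects.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for r in rows:
        s = sums[r[group]]
        s[0] += r["x"]
        s[1] += r["y"]
        s[2] += 1
    num = den = 0.0
    for r in rows:
        sx, sy, n = sums[r[group]]
        dx = r["x"] - sx / n
        dy = r["y"] - sy / n
        num += dx * dy
        den += dx * dx
    return num / den


# Hypothetical data built as y = g_t + 0.5 * x, where the year effect g_t
# is 0 in 1978 and 2 in 1979, and x happens to be larger in 1979.
rows = [
    {"id": 1, "year": 1978, "x": 1.0, "y": 0.5},
    {"id": 2, "year": 1978, "x": 2.0, "y": 1.0},
    {"id": 1, "year": 1979, "x": 3.0, "y": 3.5},
    {"id": 2, "year": 1979, "x": 4.0, "y": 4.0},
]
print(demeaned_slope(rows, "year"))  # 0.5: the year effect is removed
print(demeaned_slope(rows, "id"))    # 1.5: entity effects alone leave it in
```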

In Stata, you can fit the time fixed effects model with areg, specifying year as the variable to absorb. In the Command window, type:

. areg ln_wage tenure, absorb(year)

If using menus, select Statistics -> Linear models and related -> Other -> Linear regression absorbing one cat. variable

[Screenshot: output of areg ln_wage tenure, absorb(year)]

Plugging in these coefficients, we have:

EstimatedLogWage = 1.55 + 0.039*TenureForEachPanel&Time

As with entity fixed effects, you can instead include t-1 dummy variables (plus a constant) in the model, where t is the total number of years in the data. The tabulate command with the generate() option creates one dummy variable for each year, yr1 through yr15, and the wildcard yr* includes all of them in the regression. If you include a constant, it takes on the effect of the omitted year. Here we deliberately include all 15 dummies and use the hascons option, which tells regress not to add its own constant because the dummies together already act as one. In areg, the year effects are absorbed and only a single constant is reported, whereas in the dummy-variable version you get an intercept for each year.

. tabulate year, generate(yr)
. regress ln_wage tenure yr*, hascons

[Screenshot: output of regress ln_wage tenure yr*, hascons]

Notice that the coefficient, standard error, and t-value of tenure are the same as in the areg results. The time fixed effects model assumes that the slope for tenure is the same in every year while the intercept differs. If you think that not only the intercept but also the effect of tenure differs from year to year, you can run a separate regression for each year, although this is a bit of a digression from time fixed effects:

. sort year
. by year: regress ln_wage tenure
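by year: regress runs an ordinary least squares regression separately within each year, so each year gets its own intercept and its own tenure slope. A Python sketch of the same idea, on hypothetical rows shaped as dicts with the illustrative keys year, x, and y:

```python
from collections import defaultdict


def ols(xs, ys):
    """Simple regression of ys on xs with an intercept; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def regress_by(rows, group):
    """Run a separate regression for each value of row[group], like `by year: regress`."""
    cells = defaultdict(lambda: ([], []))
    for r in rows:
        xs, ys = cells[r[group]]
        xs.append(r["x"])
        ys.append(r["y"])
    return {g: ols(xs, ys) for g, (xs, ys) in sorted(cells.items())}


# Hypothetical data: intercept 1.0 and slope 0.5 in 1978,
# intercept 2.0 and slope 0.8 in 1979.
rows = [{"year": 1978, "x": x, "y": 1.0 + 0.5 * x} for x in (1, 2, 3)]
rows += [{"year": 1979, "x": x, "y": 2.0 + 0.8 * x} for x in (1, 2, 3)]
for year, (b0, b1) in regress_by(rows, "year").items():
    print(year, round(b0, 3), round(b1, 3))
```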

Random Effects Regression

If you believe that some omitted variables are constant over time but vary across panels, while others are fixed across panels but vary over time, you can apply a random effects regression model. The random effects model also assumes that the unobserved panel-specific effects are uncorrelated with the explanatory variables. Stata's random-effects estimator is a matrix-weighted average of the fixed (within) and between effects estimators. You can run the model in Stata with the command:

. xtreg ln_wage tenure, re

[Screenshot: output of xtreg ln_wage tenure, re]

The estimated model is:

EstimatedLogWage = 1.55 + 0.0375*TenureForEachPanel&Time
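One way to see where random effects sits between pooled OLS and fixed effects is the quasi-demeaning form of the estimator: subtract only a fraction theta of each panel's means before running OLS. The Python sketch below assumes a balanced panel with T periods and treats the variance components sigma_u2 (between panels) and sigma_e2 (idiosyncratic) as known inputs; Stata estimates them from the data, and with unbalanced panels theta differs by panel. theta = 0 reproduces pooled OLS and theta = 1 reproduces the within (fixed effects) estimator.

```python
import math
from collections import defaultdict


def re_theta(sigma_u2, sigma_e2, T):
    """Quasi-demeaning weight for a balanced panel with T periods per panel."""
    return 1.0 - math.sqrt(sigma_e2 / (T * sigma_u2 + sigma_e2))


def quasi_demeaned_slope(data, theta):
    """OLS slope (with intercept) of y - theta*mean_i(y) on x - theta*mean_i(x)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, x, y in data:
        s = sums[pid]
        s[0] += x
        s[1] += y
        s[2] += 1
    xs, ys = [], []
    for pid, x, y in data:
        sx, sy, n = sums[pid]
        xs.append(x - theta * sx / n)
        ys.append(y - theta * sy / n)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


# Hypothetical balanced data (3 periods per panel) built as y = a_i + 0.5 * x.
data = [(1, 1, 1.5), (1, 2, 2.0), (1, 3, 2.5),
        (2, 4, 5.0), (2, 5, 5.5), (2, 6, 6.0)]
theta = re_theta(sigma_u2=1.0, sigma_e2=1.0, T=3)  # assumed variance components
print(theta)                              # 0.5
print(quasi_demeaned_slope(data, 0.0))    # pooled OLS, about 1.014
print(quasi_demeaned_slope(data, theta))  # random effects with the assumed theta
print(quasi_demeaned_slope(data, 1.0))    # fixed effects, 0.5
```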

 

Choosing Between Fixed and Random Effects

If you are not sure which model to use, fixed effects or random effects, you can run a Hausman test. In Stata, you first store the coefficients from each model with the estimates store command and then use the stored results in the test:

. xtreg ln_wage tenure, fe

. estimates store fixed

. xtreg ln_wage tenure, re

. estimates store random

. hausman fixed random

From the menu, select Statistics -> Postestimation -> Tests -> Hausman specification test

[Screenshot: output of hausman fixed random]

The Hausman test tests the null hypothesis that the coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator. If the null hypothesis is not rejected, it is reasonable to use random effects. If the p-value is statistically significant, however, you should use fixed effects. In this example the p-value is statistically significant, so fixed effects is more appropriate.
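With a single coefficient, the Hausman statistic reduces to the squared difference between the fixed and random effects estimates divided by the difference of their variances, compared with a chi-squared distribution with 1 degree of freedom. (With several coefficients the test uses the full covariance matrices.) A Python sketch of the one-coefficient arithmetic; the inputs in the example are made-up numbers, not the values in the output above.

```python
import math


def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-squared with 1 df).

    var_fe and var_re are the squared standard errors of each estimate.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe - var_re must be positive for this one-coefficient form")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper tail of chi-squared(1): P(Z**2 > h) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


h, p = hausman_one_coef(b_fe=1.0, b_re=0.8, var_fe=0.05, var_re=0.04)
print(round(h, 3), round(p, 4))  # 4.0 0.0455
```

Here p is below 0.05, so with these hypothetical numbers the test would favor fixed effects.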

Between Effects Regression

To run a between effects regression, type:

. xtreg ln_wage tenure, be

[Screenshot: output of xtreg ln_wage tenure, be]

Applying the between effects regression model is equivalent to taking the mean of each variable in the model for each panel across time and running a regression on the collapsed dataset of means. In this data, some people's tenure information is missing. xt commands automatically use only the observations that have no missing values in any of the model's variables. collapse, however, computes each mean over whatever values of that variable are nonmissing, so a person's mean log wage would also include years in which tenure is missing. We therefore drop the cases with missing tenure before collapsing.

. drop if tenure == .
. sort idcode
. collapse (mean) meanLnWage=ln_wage meanTenure=tenure, by(idcode)
. regress meanLnWage meanTenure

[Screenshot: output of regress meanLnWage meanTenure]

The number of observations in this regression matches the number of groups in the xtreg, be regression, and the results are identical to the xtreg, be output above.
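The role of missing values can be sketched in Python on hypothetical data where None marks a missing tenure value. Panel means computed only from observations with both variables present (what xtreg does) give a different between slope than averaging each variable over its own nonmissing values (what collapse does by default), which is why the cases with missing tenure are dropped first.

```python
from collections import defaultdict


def panel_means(data, casewise=True):
    """Per-panel (mean_x, mean_y) from (panel_id, x, y) rows; None marks a missing value.

    casewise=True drops any row with a missing value, as xtreg does.
    casewise=False averages each variable over its own nonmissing values.
    """
    vals = defaultdict(lambda: ([], []))
    for pid, x, y in data:
        if casewise and (x is None or y is None):
            continue
        xs, ys = vals[pid]
        if x is not None:
            xs.append(x)
        if y is not None:
            ys.append(y)
    return {pid: (sum(xs) / len(xs), sum(ys) / len(ys))
            for pid, (xs, ys) in vals.items() if xs and ys}


def between_slope(data, casewise=True):
    """OLS slope of the panel mean of y on the panel mean of x (one point per panel)."""
    pts = list(panel_means(data, casewise).values())
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    sxy = sum((px - mx) * (py - my) for px, py in pts)
    sxx = sum((px - mx) ** 2 for px, _ in pts)
    return sxy / sxx


# Hypothetical data: panel 1 has one year with missing x (tenure).
data = [(1, 1.0, 2.0), (1, None, 5.0), (1, 3.0, 3.0),
        (2, 4.0, 6.0), (2, 6.0, 7.0),
        (3, 2.0, 2.5), (3, 4.0, 4.5)]
print(between_slope(data, casewise=True))   # about 1.357
print(between_slope(data, casewise=False))  # about 1.119, a different answer
```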

 

References

Stata Longitudinal/Panel-Data Reference Manual Release 11. Stata Press.

Stock, James H. and Mark W. Watson. 2007. "Chapter 10: Regression with Panel Data." In Introduction to Econometrics, Second Edition. Pearson Education, Inc.

UCLA Academic Technology Services. Using xtreg. http://www.ats.ucla.edu/stat/stata/code/xt.htm ; retrieved August, 2010.

UCLA Academic Technology Services. Stata FAQ: What is the relationship between xtreg-re, xtreg-fe, and xtreg-be? http://www.ats.ucla.edu/stat/stata/faq/revsfe.htm ; retrieved August, 2010.

 

 

 
