Introduction
Instrumental variable methods allow consistent estimation when the explanatory variables (covariates) are correlated with the error terms of a regression relationship. Such correlation may occur when the dependent variable causes at least one of the covariates ("reverse" causation), when there are relevant explanatory variables which are omitted from the model, or when the covariates are subject to measurement error. In this situation, ordinary linear regression generally produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation and is correlated with the endogenous explanatory variables, conditional on the other covariates (1).
In linear models, there are two main requirements for using an IV:
- The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates.
- The instrument cannot be correlated with the error term in the explanatory equation, that is, the instrument cannot suffer from the same problem as the original predicting variable (1).
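The two requirements can be written compactly for the simplest case of one endogenous regressor and one instrument. The notation below (outcome Y, endogenous regressor D, instrument Z, error term ε) follows the labels used later on this page; the single-regressor linear form without additional covariates is a simplifying assumption made for illustration.

```latex
\begin{align*}
Y &= \alpha + \beta D + \varepsilon, & \operatorname{Cov}(D, \varepsilon) &\neq 0 && \text{(endogeneity)} \\
& & \operatorname{Cov}(Z, D) &\neq 0 && \text{(requirement 1: relevance)} \\
& & \operatorname{Cov}(Z, \varepsilon) &= 0 && \text{(requirement 2: exogeneity)} \\
\operatorname{Cov}(Z, Y) &= \beta \operatorname{Cov}(Z, D) + \operatorname{Cov}(Z, \varepsilon)
  & \Longrightarrow \quad \beta_{IV} &= \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, D)}
\end{align*}
```

Under these assumptions the ratio in the last line is well defined only because of requirement 1, and it equals β only because of requirement 2.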
Methodology
STATA: ivreg
- Syntax: Need to specify D (instrumented) and Z (the instrument)
. ivreg Y (D=Z)
Instrumental variables (2SLS) regression
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    7.64
       Model |         357     1         357           Prob > F      =  0.0128
    Residual |       136.2    18  7.56666667           R-squared     =  0.7238
-------------+------------------------------           Adj R-squared =  0.7085
       Total |       493.2    19  25.9578947           Root MSE      =  2.7508
------------------------------------------------------------------------------
           Y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |        8.5    3.07544     2.76   0.013      2.03874    14.96126
       _cons |       69.7   1.945079    35.83   0.000     65.61354    73.78646
------------------------------------------------------------------------------
Instrumented: D
Instruments: Z
Example Applications
The Impact of Public Expenditure on Agriculture Outcomes in Nepal
IFPRI | Dillon, Sharma, Zhang
Problem:
The study estimates the impact of public expenditure (on agricultural extension & subsidies, irrigation, and roads) on per capita spending, in order to assess the contribution of public expenditure to agricultural outcomes.
1) Specify the model in terms of Y, D, Z
Y (Outcome) = Per capita spending
D (Treatment) = Level of Public expenditure
Z (Instrument) = Conflict killings (highly correlated with public expenditure, but assumed to be uncorrelated with the error term)
2) Regress D on Z to check whether the instrument is statistically significant
D was first regressed on the instrument, and the instrument was found to be statistically significant.
3) If it is statistically significant, run ivreg of Y on D, instrumenting D with Z, along with the covariates
The ivreg regression was performed (note D is divided into 3 sub-parts):
lnY = f(AGexp, IRexp, RDexp, landarea, landelevation, districtpop, rainfall, region, belt)
This process showed these results:
- Rural roads are one of the most productive expenditures, with marginal benefits that range from 5.43% to 30.25% on consumption per capita
- Estimates of the impact of irrigation expenditures are also very high: 3.92-9.61% on consumption per capita
- But the estimated impact of agriculture spending is much lower (1.36-2.98%)
- Hence:
- returns on agricultural extension & subsidies are low
- returns on roads and irrigation are much higher
- ... roads & irrigation should be the target areas
- ... need to find out why returns on agricultural extension & subsidies are so low
Reference
(1) Wikipedia: Instrumental Variable
(2) Course materials from Experimental and quasi-experimental methods in program evaluation 2012. Alan Yang. Columbia - SIPA.
- Instrumental Variables Estimate: Stata
- First, estimate the share of compliers: the difference in the proportion of units with D=1 between Z=1 and Z=0
- This is just the coefficient on the instrument (Z) in a regression of D on Z
. regress D Z
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    3.60
       Model |          .8     1          .8           Prob > F      =  0.0739
    Residual |           4    18  .222222222           R-squared     =  0.1667
-------------+------------------------------           Adj R-squared =  0.1204
       Total |         4.8    19  .252631579           Root MSE      =   .4714
------------------------------------------------------------------------------
           D |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Z |         .4   .2108185     1.90   0.074    -.0429133    .8429133
       _cons |         .4   .1490712     2.68   0.015      .086813     .713187
------------------------------------------------------------------------------
- Next, compute the ITT (intention-to-treat) estimate, which is the coefficient on Z from a regression of Y on Z
. regress Y Z
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    2.39
       Model |        57.8     1        57.8           Prob > F      =  0.1396
    Residual |       435.4    18  24.1888889           R-squared     =  0.1172
-------------+------------------------------           Adj R-squared =  0.0681
       Total |       493.2    19  25.9578947           Root MSE      =  4.9182
------------------------------------------------------------------------------
           Y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Z |        3.4   2.199495     1.55   0.140    -1.220967    8.020967
       _cons |       73.1   1.555278    47.00   0.000     69.83248    76.36752
------------------------------------------------------------------------------
- We then divide the ITT estimate (3.4) by the share of “Compliers” (.40)
- 3.4/.40 = 8.5
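The ratio computed above can be sketched in a few lines of Python. The function name `wald_iv` and the ten-observation data set are hypothetical and chosen only for illustration; they are not the 20 observations behind the Stata output. With a binary instrument and no covariates, the two regression coefficients are simply differences in group means.

```python
from statistics import mean

def wald_iv(y, d, z):
    """Ratio (Wald) IV estimate for a binary instrument z.

    Returns (itt, first_stage, iv), where
      itt         = mean(Y | Z=1) - mean(Y | Z=0)
      first_stage = mean(D | Z=1) - mean(D | Z=0)
      iv          = itt / first_stage
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    itt = mean(y1) - mean(y0)
    first_stage = mean(d1) - mean(d0)
    return itt, first_stage, itt / first_stage

# Hypothetical toy data (NOT the data used in the Stata output).
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
y = [70, 72, 71, 80, 77, 79, 81, 78, 82, 70]

itt, first_stage, iv = wald_iv(y, d, z)
print(f"toy data: ITT = {itt:.2f}, first stage = {first_stage:.2f}, IV = {iv:.2f}")

# The same arithmetic with the coefficients reported in the Stata output.
page_iv = 3.4 / 0.40
print(f"page values: IV = {page_iv:.2f}")
```

The ratio is only meaningful when the first-stage difference is clearly away from zero; a denominator near zero makes the estimate unstable.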
- Another way to obtain the IV estimate (two-stage least squares):
- Regress D on Z (1st stage)
- Save the predicted values (Dhat)
- Regress Y on Dhat (2nd stage)
- The coefficient on Dhat is the IV estimate
- The standard error on that coefficient is not valid, however, because the second-stage regression treats Dhat as data and doesn’t account for the fact that it is a prediction
- Stata’s ivreg command gives the correct s.e. (see below)
. regress D Z
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    3.60
       Model |          .8     1          .8           Prob > F      =  0.0739
    Residual |           4    18  .222222222           R-squared     =  0.1667
-------------+------------------------------           Adj R-squared =  0.1204
       Total |         4.8    19  .252631579           Root MSE      =   .4714
------------------------------------------------------------------------------
           D |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Z |         .4   .2108185     1.90   0.074    -.0429133    .8429133
       _cons |         .4   .1490712     2.68   0.015      .086813     .713187
------------------------------------------------------------------------------
. predict pre_1, xb
. regress Y pre_1
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    2.39
       Model |        57.8     1        57.8           Prob > F      =  0.1396
    Residual |       435.4    18  24.1888889           R-squared     =  0.1172
-------------+------------------------------           Adj R-squared =  0.0681
       Total |       493.2    19  25.9578947           Root MSE      =  4.9182
------------------------------------------------------------------------------
           Y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pre_1 |        8.5   5.498737     1.55   0.140    -3.052418    20.05242
       _cons |       69.7   3.477707    20.04   0.000     62.39361    77.00639
------------------------------------------------------------------------------
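The two-stage procedure, and the reason its second-stage standard error is off, can be illustrated with a small pure-Python sketch. The helper names `ols` and `slope_se` and the toy data are hypothetical. The corrected standard error uses the textbook 2SLS approach of recomputing the residuals with the actual D instead of Dhat, with an n - 2 degrees-of-freedom adjustment; that this matches what a given Stata command reports is an assumption here, not something the sketch verifies.

```python
from math import sqrt
from statistics import mean

def ols(y, x):
    """OLS of y on a constant and a single regressor x; returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def slope_se(resid, x):
    """Standard error of the slope given residuals and the regressor actually used."""
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sigma2 = sum(e ** 2 for e in resid) / (len(x) - 2)
    return sqrt(sigma2 / sxx)

# Hypothetical toy data (NOT the data used in the Stata output).
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
d = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
y = [70, 72, 71, 80, 77, 79, 81, 78, 82, 70]

# 1st stage: regress D on Z and keep the fitted values.
a1, b1 = ols(d, z)
d_hat = [a1 + b1 * zi for zi in z]

# 2nd stage: regress Y on Dhat; the slope is the IV estimate.
a2, b2 = ols(y, d_hat)

# Naive s.e.: residuals computed with Dhat (what a plain second-stage OLS reports).
naive_resid = [yi - a2 - b2 * dh for yi, dh in zip(y, d_hat)]
se_naive = slope_se(naive_resid, d_hat)

# Corrected s.e.: residuals computed with the actual D, same coefficients.
iv_resid = [yi - a2 - b2 * di for yi, di in zip(y, d)]
se_2sls = slope_se(iv_resid, d_hat)

print(f"IV estimate = {b2:.3f}, naive s.e. = {se_naive:.3f}, corrected s.e. = {se_2sls:.3f}")
```

The point estimate is identical either way; only the residuals, and therefore the standard error, differ. In the Stata output on this page the same pattern appears: the manual second stage reports a standard error of 5.498737 for a coefficient of 8.5, while ivreg reports 3.07544 for the same coefficient.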
- STATA: ivreg
- Simple IV estimate
- Syntax: Need to specify D (instrumented) and Z (the instrument)
. ivreg Y (D=Z)
Instrumental variables (2SLS) regression
      Source |          SS    df          MS           Number of obs =      20
-------------+------------------------------           F(  1,    18) =    7.64
       Model |         357     1         357           Prob > F      =  0.0128
    Residual |       136.2    18  7.56666667           R-squared     =  0.7238
-------------+------------------------------           Adj R-squared =  0.7085
       Total |       493.2    19  25.9578947           Root MSE      =  2.7508
------------------------------------------------------------------------------
           Y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |        8.5    3.07544     2.76   0.013      2.03874    14.96126
       _cons |       69.7   1.945079    35.83   0.000     65.61354    73.78646
------------------------------------------------------------------------------
Instrumented: D
Instruments: Z
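As a reading aid for the ivreg output, the t statistic and confidence interval for D can be recomputed from the reported coefficient and standard error. This is a minimal sketch: the critical value is hard-coded from a standard t table for 18 degrees of freedom (an assumption of the sketch, since the Python standard library has no t quantile function), and 18 is taken to be the 20 observations minus the two estimated parameters.

```python
# Values copied from the ivreg output above.
coef = 8.5
std_err = 3.07544
n_obs, n_params = 20, 2  # 20 observations; constant and D

# Assumption: two-sided 95% critical value of Student's t with 18 df,
# taken from a standard t table rather than computed here.
t_crit = 2.100922

dof = n_obs - n_params
t_stat = coef / std_err
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(f"df = {dof}, t = {t_stat:.2f}, 95% CI = [{ci_low:.5f}, {ci_high:.5f}]")
```

The interval is symmetric around the coefficient, so its half-width is the critical value times the standard error.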