
STATA: Quick Command Reference

Saved by editor
on November 22, 2012 at 8:49:16 pm
 

 

  1. Getting Started
    1. Help and Search
      1. help regress
      2. search fixed effect
    2. Creating log (procedure and output) file
      1. log using a:\wage1, replace
      2. log close
    3. Increasing the amount of memory
      1. set memory 5M
    4. Batch or interactive
      1. Using commands from keyboard
      2. Using a do file
    5. Reading Data Files
      1. use a:\wage1
      2. infile
      3. save
      4. clear
    6. Looking At and Summarizing Your Data
      1. list wage edu
      2. list wage edu in 1/20
      3. list married age if hours == 0
      4. list married age if union == 1 & hours >= 40
      5. sum wage edu tenu married
      6. sum wage edu tenure married, detail
      7. sum wage edu tenu married if year == 1990
      8. sum wage edu in 1/20
      9. tab married
      10. by married: sum wage
      11. drop if ~union (or drop if union == 0)
      12. keep if (year >= 1986) & (year <= 1990)
      13. drop in 2672
    7. Defining New Variables
      1. generate expsq = exp^2
      2. gen lwage = ln(wage) if hours > 0
      3. gen ccrime = crime - crime[_n-1] if year == 1987
      4. replace expsq = exp^2
  2. Basic Estimation Commands
    1. Ordinary Least Squares
      1. reg lwage edu exp expsq tenu union
      2. predict lwagehat
      3. predict uhat, resid
      4. test north south east
      5. reg lwage educ exper expersq married black, robust
    2. Instrumental Variable (Two Stage Least Squares)
      1. ivreg lwage (edu = married) exp expsq tenu union
    3. Fixed Effect Model
      1. xtreg lwage edu exp expsq tenu union, fe
    4. Random Effect Model
      1. xtreg lwage edu exp expsq tenu union, re
    5. Between Effect Model
      1. xtreg lwage edu exp expsq tenu union, be
  3. Editing the Command Line
  4. Using Stata as a Calculator and Computing p-values
    1. Calculator
      1. di .048/(2*.0016)
    2. P-values
      1. di normprob(1.58)
      2. di tprob(df, t)
      3. di fprob(df1, df2, F)
  5. Estimation and test techniques
    1. Time Series Analysis
      1. Generating lags and leads
    2. Duration Model
    3. Simultaneous equation model
    4. Constrained regression
    5. Other diagnostics

     

     

     

    Getting Started


    Help and Search

     

    help regress

    search fixed effect

     

    Creating log (procedure and output) file

     

    Suppose you want to print out the “tutorial” or the results from I. Stata for Dummies. To do so, you should create an output file. For involved projects in particular, you must keep a record of what you have done (data transformations, regressions, and so on), and a log file serves this purpose. With a diskette in the A: drive, before doing any analysis, type:

     

    log using a:\wage1, replace

     

    This will create the file wage1.log on the diskette in the A: drive. Alternatively, click the log start/stop icon and follow the directions.

     

    Stata log files are standard ASCII files. They can be sent directly to a printer, but I do not like the resulting font size and format. The best way is to open the log in MS Word or any text editor (hint: Courier New at font size 8 works best). As an exercise, open a log file, type “tutorial intro”, and print the result. When you are finished, close the log file for good with:

     

    log close

     

    After typing this command, log on will not reopen the log file. If you decide to add to the end of an existing log file, type:

     

    log using a:\wage1, append

     

    Increasing the amount of memory

     

    The default is only 1Mb. If your data set is bigger than 1Mb, you cannot even open it with this default. Always allocate at least twice as much memory as the size of your data set. There are two ways to increase memory. The first is to type the following at the beginning of a session (for 5Mb):

     

    set memory 5M

     

    However, allocating memory every time is cumbersome. The second way is to change the amount of memory Stata allocates when it starts. Create a shortcut to the wstata.exe file, click Properties, and choose the Shortcut tab. The target probably says something like

     

                C:\stata\wstata.exe /k1000

     

    This tells Stata to allocate 1000k (1Mb). If you change the option to, say, /k5000, Stata will allocate 5Mb. Change the number as you wish; it needs to be a multiple of 1000. If your computer’s memory is smaller than the allocation, Stata will use virtual memory (not recommended).

     

    Batch or interactive

     

    Using commands from keyboard

    Using a do file

    Stata can execute the commands stored in a file (batch mode) just as if they were entered from the keyboard (interactive mode). To do so, type do followed by the filename; if the filename is specified without an extension, .do is assumed. Such a file is called a “do file”. You will find that batch mode is extremely helpful. Refer to the ancillary handouts.
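
    A do file is simply a text file of commands. The following is a minimal sketch of what one might contain, assuming it is saved under the hypothetical name wage1.do and that the variables wage, edu, and exp exist in wage1.dta:

```stata
* wage1.do: a hypothetical do file; file and variable names are illustrative
log using a:\wage1, replace
use a:\wage1, clear
gen lwage = ln(wage)
reg lwage edu exp
log close
```

    Typing do wage1 from the directory that holds the file would then run all of these commands in order and record the output in the log.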

     

     

    Reading Data Files

     

    The command to read a Stata file is “use”. Of course you can instead use the Stata tool bar. If the Stata file is called wage1.dta, and the file is on the diskette in the A: drive, the command is:

     

    use a:\wage1

     

    After entering this command, the data file wage1 is loaded into memory. (Note that Stata is case sensitive.) However, life is not always that easy: not every data set is in Stata format. If the data are not in Stata format, you must convert them. There are many ways to do this. Here are my hints.

     

    1. If you want to enter data by hand, use the “Data Editor” in Stata. After you enter the data, save it by pulling down File and choosing Save As. The Data Editor is compatible with MS Excel, so as long as your data are in Excel format, you can copy and paste them into the editor.

     

    2. If the data are saved in another software format, for example Excel, SPSS, SAS, dBase, Limdep, RATS, or Gauss, you can use the Stat/Transfer program. This is the easiest way to convert a data set from one format to another. If you think you will make heavy use of micro data sets in the future, consider buying it.

     

    3. Or you can create a Stata data set from an ASCII (text) file. You may be able to convert your data set into ASCII format; in many cases it is already an ASCII file. The infile command allows you to read an ASCII file. The file must be organized with one observation in each row and each variable in its own column.

     

    a) If each number is separated by space, for example, suppose a wage data set is organized as

     

    10.75   12        6          1          0
    16.50   16        3          0          0
    12.10   12        8          1          1

     

    Each row corresponds to an individual. In the example above, the first variable is hourly wage, the second is years of education, the third is experience, the fourth is a dummy variable equal to unity for unionized firm, zero for non-unionized firm, and the last variable is an indicator variable for marital status (which equals unity for married individuals). The variables in this example are equally spaced but this spacing is not essential. If these data are in the file wage.raw on the A: drive, then the command:

     

    infile wage edu exp union married using a:\wage.raw

     

    reads in each row of data, and stores the data on each variable into the appropriate name. Once the ASCII file has been read, it is a good idea to save it as a Stata file. The command:

     

    save a:\wage2 

     

    creates the Stata file wage2.dta on the A: drive. Notice that the file type dta denotes a Stata file. If you are working with multiple years of data for each individual, it is a good idea to include in your data set variables indicating the year and the id of each observation.

     

    b) If the numbers are not separated by spaces or tabs, you should create a dictionary file to read the data. Create an ASCII file named exer1.dct as follows, assuming that the data set is named original.dat:

     

    dictionary using C:\original.dat {
        year       %2f
        firmsize   %1f
        sampling   %3f
        union      %1f
        idnumber   %4f
        _column(40)
        str1 sex   %1s
        _column(44)
        msts       %1f
        emptype    %1f
        shift      %2f
        expyears   %1f
    }

    The variable sex is a string (s) and the others are numeric (f). %2f means that the number occupies 2 columns. If you do not want to read every variable, _column() lets you jump ahead: here _column(40) skips to column 40, where the variable sex begins.

     

                Then type

     

    infile using a:\exer1.dct

     

    For more advanced features for inputting data, refer to the Stata User’s Guide, published by Stata Press.

     

    If you have completed your analysis with a file such as wage1.dta, and then wish to use a different data set, you simply clear the existing data set from memory. The command to use is

     

    clear

     

    Be aware that when you issue this command, any changes you made to the data set during your current Stata session will be lost unless you saved them first.

     

     

    Looking At and Summarizing Your Data

     

    After reading in a data file, you can get a list of the available variables by typing des. Often a short description has been given to each variable. To look at the observations of one or more variables, use the list command. For example, to look at the variables wage and edu for all observations, type:

     

    list wage edu

     

    This will list, one screen at a time, the data on wage and edu for every person in the sample. (Missing values in Stata are denoted by a period.) If the data set is large, you may not wish to look at all observations. You can always stop the listing by hitting Ctrl-Break on the keyboard. In fact, Ctrl-Break can be used to interrupt any Stata command.

    Alternatively, there are various ways to restrict the range of the listing and many other Stata commands. To look at the first 20 observations on wage and edu type:  

     

    list wage edu in 1/20

     

    Rather than specifying a range of observations, you can use a logical condition instead. For example, to look at the data on marital status and age for people with zero hours worked, type:

     

    list married age if hours == 0

     

    Notice that Stata uses the double equal sign to test for equality. The other relational operators in Stata are > (greater than), < (less than), >= (greater than or equal), <= (less than or equal), and ~= (not equal). Or, if you want to restrict attention to non-union members, type:

     

    list married age if union == 0

     

    The variable union is a binary indicator equal to unity for union members, and zero otherwise. The ~ is the logical "not" operator, so the same condition could also be written as if ~union. We can combine many different logical statements. The command:

     

    list married age if union == 1 & hours >= 40

     

    restricts attention to union members who work at least 40 hours a week. (Logical and is denoted by “&” and logical or is denoted by “|” in Stata.)
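
    The same conditions work with most Stata commands. As a brief sketch, using the same variables as above, the logical or operator and the count command look like this:

```stata
* union members, or anyone working at least 40 hours a week
list married age if union == 1 | hours >= 40
* count reports how many observations satisfy a condition
count if union == 1 & hours >= 40
```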

    Two useful commands for summarizing data are the sum and tab commands. The sum command computes the sample average, standard deviation, and the minimum and maximum values of all (nonmissing) observations. Because this command tells you how many observations were used for each variable in computing the summary statistics, you can easily find out how many missing data points there are for any variable. Thus, the command:

     

    sum wage edu tenu married

     

    computes the summary statistics for the four variables listed. Because married is a binary variable, its minimum and maximum values are not very interesting. The average value reported is simply the proportion of people in the sample who are married.

    To obtain more summary information for each of these variables you must type:

     

    sum wage edu tenure married, detail

     

    By adding the detail option, Stata provides an extensive list of summary statistics for each of these variables including the median and other percentiles of the empirical distribution.

    Stata also provides summary statistics for any subgroup of the sample if you add a logical statement:

     

    sum wage edu tenu married if union

     

    If the data is a pooled cross section or a panel data set, to summarize for 1990 type:

     

    sum wage edu tenu married if year == 1990

     

    The sample can be restricted to certain observation ranges by using the in m/n option, just as illustrated in the list command:

     

    sum wage edu in 1/20

     

    For variables that take on a relatively small number of values - such as number of children or number of times an individual was arrested during a year - you can use the tab command to get frequency tabulation:

     

    tab married

     

    This command reports the frequency associated with each value of married in the sample. You can also combine this command with logical statements or restrict the range of observations.

     

    To summarize a variable separately for each value of another variable, for example wage by marital status, you will need the sort command. First sort the data by married by typing:

     

    sort married

     

    Once the data is sorted then you summarize the variable by typing:

     

    by married: sum wage

     

    To summarize wage by another variable, say year, you will need to re-sort the data by year and then use the sum command again with the by prefix.
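
    More recent versions of Stata (7 and later, as far as I know) offer the bysort prefix, which sorts and summarizes in one step. A short sketch using the same variables:

```stata
* sort by married and summarize wage within each group in one step
bysort married: sum wage
* the same pattern for another grouping variable, here year
bysort year: sum wage
```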

    Sometimes, you may want to restrict all subsequent analysis to a particular subset of the data. In such cases it is useful to delete the data that will not be used subsequently. This can be done using the drop or keep commands. For example, if we want to analyze only union members in a wage equation, then we can type:

     

    drop if ~union

    (or, equivalently, drop if union == 0)

     

    This drops everyone in the sample who is not a union member. Or, to analyze only the years between 1986 and 1990 (inclusive), we can type:

     

    keep if (year >= 1986) & (year <= 1990)

     

    In order to drop a particular observation, say observation 2672, you must type:

     

    drop in 2672

     

    It is important to know that the data dropped are gone from the current Stata session. If you want to get them back, you must reread the original data file. Along these lines, do not make the mistake of saving the smaller data set over the original one, or you will lose a chunk of your data.

    BE SURE TO KEEP AN EXTRA BACKUP FILE OF YOUR STATA DATA SETS (for both beginners and experts!!). 

     

     

     

    Defining New Variables

     

    It is easy to create variables that are functions of existing variables. In Stata, this is accomplished using the gen command (short for generate). For example, to create the square of experience, type:

     

    generate expsq = exp^2

     

    The new variable, expsq, can be used in a regression or anywhere else Stata variables are used. (Stata does not allow us to put expressions such as exp^2 into regression commands; we must create the variables first.) When creating Stata variables, remember that variable names cannot be longer than eight characters. Stata will refuse to accept names longer than eight characters in the gen command (and in all other Stata commands). If an observation has a missing value for exp then, naturally, expsq will also be missing for that observation. In fact, Stata will tell you how many missing values were created after every gen command. If Stata reports nothing, then no missing values were generated.

    To find the natural log of a variable such as wage, type:

     

    gen lwage = ln(wage)

     

    If wage is missing then lwage will also be missing. For functions such as the natural log, there is an additional consideration: ln(wage) is not defined for wage <= 0. When a function is not defined for particular values of the variable, Stata sets the result to missing.

    Logical commands can be used to restrict observations used for generating new variables. For example:

     

    gen lwage = ln(wage) if hours > 0

     

    creates ln(wage) for people who work (and therefore whose wage can be observed). Using the gen command without the condition if hours > 0 has the same effect in this example because wage is missing for those individuals who do not work.

    Creating interaction terms is easy:

     

    gen blckedu = black*edu

     

    where “*” denotes multiplication; the division operator is “/”, addition is “+”, and subtraction is “-”. The gen command can also be used to create binary variables. For example, if fratio is the funding ratio of a firm's pension plan, the dummy variable overfund can be created, which is unity when fratio > 1 and zero otherwise:

     

    gen overfund = fratio > 1

     

    The way this command works is that the logical statement on the right hand side is evaluated to be true or false; then true is assigned the value unity, and false is assigned the value zero. So overfund is unity if fratio > 1 and overfund is zero if fratio <= 1. As another example, we can create year dummies using a command such as:

     

    gen y85 = (year == 1985)

     

    where year is assumed to be a variable defined in the data set. The variable y85 is unity for observations corresponding to 1985, and zero otherwise. We can do this for each year in our sample to create a full set of year dummies.
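
    Two practical points are worth a sketch. First, Stata treats a missing value as larger than any number, so a comparison such as fratio > 1 is true when fratio is missing; adding a condition keeps the dummy missing in that case. Second, the generate() option of tabulate builds a full set of dummies in one command. The stub name yr below is only an illustrative choice:

```stata
* leave overfund missing when fratio is missing (missing counts as very large)
gen overfund = fratio > 1 if fratio < .
* one dummy per distinct value of year, named yr1, yr2, ...
tab year, gen(yr)
```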

     

    The gen command can also be used to difference data across years. Suppose that, for a sample of cities, we have two years of data for each city (say 1982 and 1987). The data are stored so that the two years for each city are adjacent in the file, with the 1982 observation preceding the 1987 observation. To eliminate unobserved "fixed" effects, say in relating city crime rates to expenditures on crime and other city characteristics, we can relate changes over time. We store the changes between 1982 and 1987 alongside the 1987 data. It is important to remember that for 1982 there is no change from a previous time period because we do not have data on a previous time period. Therefore, we should define the change data so that it is missing in 1982. For example:

     

    gen ccrime = crime - crime[_n-1] if year == 1987

     

    gen cexpend = expend - expend[_n-1] if year == 1987

     

    The symbol "_n" is the reserved Stata name for the number of the current observation; thus, crime[_n-1] is the value of crime in the previous observation, that is, the variable lagged once. The variable ccrime is the change in crime between 1982 and 1987; cexpend is the change in expenditures between 1982 and 1987. These new change variables are stored next to the 1987 observations, and the corresponding change variables for 1982 are missing, denoted by a ".". We can then use these change variables in a regression analysis, or some other analysis.
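
    An alternative that does not rely on the rows already being in the right order is to sort first and difference within each city using the by prefix. This is a sketch only; city is a hypothetical name for the city identifier, which the text above does not name:

```stata
* assumption: city is a hypothetical city identifier in the data set
sort city year
* within each city, _n restarts at 1, so the first year (1982) gets a missing change
by city: gen ccrime = crime - crime[_n-1]
by city: gen cexpend = expend - expend[_n-1]
```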

     

    The replace command is useful for correcting mistakes in definitions and for redefining variables after the values of other variables have changed. Suppose, for example, that when creating the variable expsq, you mistakenly typed "gen expsq = exp^3". One possibility is to drop the variable expsq and try again:

     

    drop expsq

     

    gen expsq = exp^2

     

    (Note that the drop command really has two purposes: to delete all variables for certain observations and to drop one or more variables for all observations.) A faster route is to use the replace command:

     

    replace expsq = exp^2

     

    Stata explicitly requires the replace command to write over the contents in a previously defined variable.

     

     

    Basic Estimation Commands

     

    Ordinary Least Squares

     

    For OLS regression, we use the command reg. Immediately following reg is the dependent variable, and after that, all of the independent variables (order of the independent variables is not, of course, important). An example is:

     

    reg lwage edu exp expsq tenu union

     

    This command produces OLS estimates, standard errors, t statistics, confidence intervals, and a variety of other statistics usually reported with OLS output. Unless a specific range of observations or a logical statement is included, Stata uses all possible observations in obtaining estimates. It does not use observations for which data on the dependent variable or any of the independent variables are missing. Thus, you must be aware that adding another explanatory variable can result in fewer observations being used in the regression if that variable is missing for some observations. If a variable called "motheduc" (mother's education) is added to the independent variables in the above regression, and this variable is missing for, say, 10 percent of individuals, then the sample size used in obtaining the OLS estimates decreases accordingly.

    Sometimes we want to restrict our regression analysis based on the size of one or more of the variables. For example,

     

    reg lwage edu exp expsq tenu union if edu<16

     

    restricts the analysis to individuals with fewer than 16 years of education. The regression can also be restricted to a particular year using a similar if statement, or to a particular observation range using in m/n.

    Predicted values are obtained using the predict command. Thus, if a regression is run with lwage as the dependent variable, to get the fitted values type:

     

    predict lwagehat

     

    The choice of the name lwagehat is arbitrary, subject to its being no more than eight characters and its not already being used. The predict command saves the fitted values for the most recently run regression.

    The residuals can be obtained by:

     

    predict uhat, resid

     

    where again the name uhat is arbitrary.

    You can test multiple linear restrictions after an OLS regression by using the test command. Consider a regression which controls for four Census regions: north, south, east, and west. Because the regression includes a constant term, we can identify parameters for three of the four regional dummy variables. Suppose we exclude the "west" dummy from the regression and we wish to test whether there are any "regional effects" in the data. To test whether the coefficients for the north, south, and east dummy variables are jointly zero, you just list the variables hypothesized to have no effect:

     

    test north south east

     

    The result of this test tells you whether the three regional indicators can be excluded from the previously estimated model. Along with the value of the F-statistic, Stata also reports a p-value. As with the predict command, test is applied to the most recently estimated model.
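
    Put together, the sequence might look like the sketch below. It assumes that the regional dummies north, south, and east exist in the data set alongside the wage variables used earlier; the last line shows that test also accepts a single linear restriction written as an equation:

```stata
* assumption: north, south and east are regional dummies in the data set
reg lwage edu exp expsq north south east
* joint test that all three regional coefficients are zero
test north south east
* a single linear restriction: equal coefficients on north and south
test north = south
```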

    OLS estimates with heteroskedasticity-robust standard errors and t statistics can be obtained using the robust option. Remember, this is still OLS, but the asymptotic variance is estimated in a heteroskedasticity-robust fashion. For example,

     

    reg lwage educ exper expersq married black, robust

     

    Instrumental Variable (Two Stage Least Squares)

     

    Models can be estimated by 2SLS with the ivreg command. After the dependent variable, the endogenous explanatory variables (those correlated with the error) are listed in parentheses, followed by an equal sign and the instruments that are excluded from the structural equation. The exogenous explanatory variables are listed outside the parentheses. Naturally, the list of instruments does not contain any endogenous variables.

     

    An example of a 2SLS command is:

     

    ivreg lwage (edu = married) exp expsq tenu union

     

    This command produces 2SLS estimates, standard errors, t statistics, and so on. By looking at this command, we see that edu is an endogenous explanatory variable in the lwage equation while exp, expsq, tenu, and union are assumed to be exogenous explanatory variables. The variable married is assumed to be an additional exogenous variable that does not appear in the lwage structural equation but should have some correlation with edu. Stata automatically uses the exogenous explanatory variables as instruments along with married.

     

    The order in which we list the instruments is not important. The necessary condition for the model to be identified is that the number of excluded instruments is at least as large as the number of endogenous explanatory variables. In this example, there is one of each, and so the order condition holds.

     

    In the previous example, we allowed for just one endogenous explanatory variable, "edu". Allowing for more than one endogenous explanatory variable is also easy. After 2SLS, we can test multiple restrictions using the test command, just as with OLS.
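
    As an illustration only, the sketch below treats both edu and tenu as endogenous. The instruments motheduc and fatheduc are hypothetical variables assumed to be in the data set; with three excluded instruments and two endogenous variables, the order condition holds:

```stata
* assumption: motheduc and fatheduc are hypothetical excluded instruments
ivreg lwage (edu tenu = married motheduc fatheduc) exp expsq union
* joint test on the experience terms after 2SLS
test exp expsq
```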

     

    Fixed Effect Model

     

    iis id

    tis year

    xtreg lwage edu exp expsq tenu union, fe

     

     


     

     

     

    Random Effect Model

    iis id

    tis year

    xtreg lwage edu exp expsq tenu union, re

     

    Between Effect Model

    iis id

    tis year

    xtreg lwage edu exp expsq tenu union, be
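
    In Stata 10 and later, the xtset command declares the panel and time variables in one step and takes the place of iis and tis. A minimal sketch, assuming the same id and year variables:

```stata
* declare id as the panel variable and year as the time variable
xtset id year
xtreg lwage edu exp expsq tenu union, fe
```

    The re and be options are used after xtset in the same way.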

     
     

     

     

     

     

    Editing the Command Line

     

    Stata has several shortcuts for entering commands. Two useful keys are Page Up and Page Down. If at any point you hit Page Up, the previously executed command appears on the command line. This can save a lot of typing because you can hit Page Up and edit the previous command. Among other things, this makes it easier to add an independent variable to a regression or to expand an instrument list. Hitting Page Up repeatedly allows you to step back through previously executed commands until you find the one you want. Hitting Page Down takes you back down through the commands.

    It is easy to edit the command line. Hitting Home on the keyboard takes the cursor to the beginning of the line; hitting End moves the cursor to the end of the line. The Delete key deletes a single character to the right of the cursor; holding it down will delete many characters. The Backspace key (a left arrow on many keyboards) deletes a character to the left of the cursor. Hitting the left arrow moves you one character to the left, and the right arrow takes you one character to the right. You can hold down either to move several characters. The Ins key allows you to toggle between insert and overwrite modes. Both of these modes are useful for editing commands.

     

     

    Using Stata as a Calculator and Computing p-values

     

    Calculator

    Stata can be used to compute a variety of expressions, including certain functions that are not available on a standard calculator. The command to compute an expression is disp or di for short. The command:

     

    di .048/(2*.0016)

     

    will return "15." We can use the di command to compute natural logs, exponentials, squares, and so on. For example:

     

    di exp(3.5 + 4*.06)

     

    returns the value 42.098 (approximately). These calculations can be performed on most calculators.

     

    P-values

    More importantly, we can use di to compute p-values after computing a test statistic. The command:

    di normprob(1.58)

     

    gives the probability that a standard normal random variable is less than the value 1.58 (about .943). Thus, if a standard normal test statistic takes on the value 1.58, the one-sided p-value is 1 - .943 = .057. Other functions are geared to give the p-value directly.

     

    di tprob(df, t)

     

    returns the p-value for a t test against a two-sided alternative (t is the absolute value of the “t” statistic and “df” is the degrees of freedom). For example, with df= 31 and t = 1.32, the command returns the value .196. To obtain the p-value for an F test, the command is:

     

    di fprob(df1, df2, F)

     

    where “df1” is the numerator degrees of freedom, “df2” is the denominator df, and F is the value of the F statistic. As an example:

     

    di fprob(3, 142, 2.18)

     

    returns the p-value .093.
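
    More recent versions of Stata document the functions normal(), ttail(), and Ftail() for these calculations. The sketch below gives the equivalents of the three examples above and should agree with the values quoted there. Note that ttail() returns a one-tailed probability, so it is doubled for a two-sided test:

```stata
* upper-tail probability of a standard normal at 1.58
di 1 - normal(1.58)
* two-sided p-value for t = 1.32 with 31 degrees of freedom
di 2*ttail(31, 1.32)
* p-value for F = 2.18 with (3, 142) degrees of freedom
di Ftail(3, 142, 2.18)
```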

     

     

    Estimation and test techniques

     

    1. Time Series
    2. Duration Model
    3. Simultaneous Equation
    4. Constrained Regression
    5. Test and Diagnostics

     

     

    Time Series Analysis

     

     

    Generating lags and leads

     

    If you sort the data by date, then the lagged variable x can be obtained by typing

     

    . gen xlag1 = x[_n-1]

     

    Of course you can use as many lags as you want.

     

    . gen xlag2 = x[_n-2]

     

    Likewise, you can lead the data by using _n+1, _n+2, and so on.

     

    If you are a serious user of time-series data, you will be better served by the time series operators. These are L (lag), F (lead), D (difference), and S (seasonal difference). Before using them, you must declare the time variable with the tsset command.

     

    . tsset  year, yearly        /*declare dataset to be time-series data*/

    . reg consume gnp L.gnp L2.gnp D.gnp D2.gnp S2.gnp

    . sum interest if F.gnp<gnp
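
    The operators can also be used to create the lag and lead variables from the previous section. The sketch below assumes a variable x in a data set with the time variable year. Unlike x[_n-1], L.x refers to the previous time period rather than the previous row, so a gap in the years produces a missing value instead of a wrong lag:

```stata
* declare year as the time variable
tsset year
gen xlag1 = L.x
gen xlag2 = L2.x
gen xlead1 = F.x
* first difference, x minus its first lag
gen dx = D.x
```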

     

     

    2) Examples

     

    corrgram lists a table of the autocorrelations, partial autocorrelations, and Q statistics. It also lists a character-based plot of the autocorrelations and partial autocorrelations. The ac command produces a correlogram (the autocorrelations) with pointwise confidence intervals obtained from the Q statistic. The wntestq command produces the Box-Pierce Q test statistic. The null hypothesis is that the autocorrelation coefficients are jointly equal to zero.

     

    . corrgram r

    . corrgram r, lags(5)

    . wntestq r

     

    dfuller performs the augmented Dickey-Fuller test for unit roots. The test runs a regression of the differenced variable on its lagged level and a user-specified number of lagged differences of the variable. Optionally, a trend term may be included (trend option) and the associated regression may be displayed (regress option). The null hypothesis is that there is a unit root.

     

     . dfuller r

     . dfuller r, lags(3) trend regress

     

    arima estimates a model of depvar on varlist where the disturbances are allowed to follow a linear autoregressive moving-average (ARMA) specification. The dependent and independent variables may be differenced or seasonally differenced to any degree.  When independent variables are not specified, these models reduce to autoregressive integrated moving-average (ARIMA) models in the dependent variable.  Missing data are allowed and are handled using the Kalman filter. arima allows time-series operators in the dependent variable and independent variable lists and it is often convenient to make extensive use of these operators.

     

     . arima r, arima(1,1,1)

     . arima D.r, ar(1) ma(1)  /*same as above*/

     . arima r, arima(3,2,4)

     . arima D2.r, ar(1/3) ma(1/4)   /*same as above*/

     

    There are other test commands, such as Granger causality and cointegration tests. Their ado program files are not installed by default, but you can find and download them from the STB (Stata Technical Bulletin) web site. If your computer is connected to the network, just click the search result.

     

    Duration Model

     

    If the data set already contains duration data, you can estimate a duration model, for example the Cox or the Weibull model. You can also use the hr option to report hazard ratios instead of coefficients.

     

         . cox studytim drug age, dead(died)

         . cox studytim drug age, dead(died) hr

     

    If you work with duration data regularly, you may want to use the stset command, which declares the data to be survival-time data.

     

         . stset studytim, failure(died)

         . stset studytim, failure(outcome==2)

        . stset studytim, failure(outcome==2) id(patientno)  /*multiple failure*/

     

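    Once the data are declared as duration data, use the st versions of the commands, which take the time and failure variables from stset:

         . stcox drug age, nohr   /*coefficients*/

         . stcox drug age         /*hazard ratios (the default)*/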

     

    Compare these results with those above. A Weibull model can be used in place of the Cox model.
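
    The stset workflow can be sketched as follows, using the variable names from the examples above. This assumes a release of Stata that provides stcox and streg, which may not be the version this reference was written for:

```stata
* declare survival-time data once
stset studytim, failure(died)
* Cox model; stcox reports hazard ratios unless nohr is given
stcox drug age
stcox drug age, nohr
* Weibull model on the same data
streg drug age, distribution(weibull)
```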

     

     

    Simultaneous equation model

     

    sureg estimates Zellner's seemingly unrelated regression models. This is a feasible GLS (FGLS) estimator, and it is especially useful when the explanatory variables in every equation are all exogenous. Suppose GDP = f(G, I) and M2 = g(r). Type

     

    . sureg (gdp g i) (m2 r)

     

    Compare the result with the results of two separate OLS regressions.
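
    The comparison can be sketched as follows. The corr option is added here as a suggestion: it displays the correlation matrix of the residuals across equations and a Breusch-Pagan test of independence, which indicates whether SUR is likely to differ much from OLS.

```stata
* equation-by-equation OLS
reg gdp g i
reg m2 r
* joint FGLS estimation, with a test of cross-equation independence
sureg (gdp g i) (m2 r), corr
```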

     

    Now suppose some equations contain endogenous variables among the explanatory variables. The single-equation approach is the familiar 2SLS method (ivreg). Another approach is to estimate the whole system of structural equations with the reg3 command. Suppose GDP = f(G, I, M2) and M2 = g(GDP, r). Notice that each dependent variable now appears as an explanatory variable in the other equation. With the sure option, reg3 can also estimate systems of equations by seemingly unrelated regression (SURE).

     

    . reg3 (gdp m2 g i) (m2 gdp r)

    . reg3 (gdp m2 g i) (m2 gdp r), sure

     

     

    Constrained regression

     

    cnsreg estimates constrained linear regression models. constraint(constraint list) is not optional; it specifies the constraint numbers of the constraints to be applied.  Constraints are defined using the constraint command.

     

            . constraint def 1 price = weight

            . cnsreg mpg price weight, constraint(1)

            . constraint def 2 gratio = -foreign

            . cnsreg mpg price weight displ gratio foreign length, c(1-2)

            . constraint define 3 _cons = 0

            . cnsreg mpg price weight displ gratio foreign length, c(1-3)
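
    Before imposing a constraint, it can be informative to test it on the unconstrained regression. A sketch using the first constraint above:

```stata
* unconstrained OLS
reg mpg price weight
* Wald test of the restriction that constraint 1 imposes
test price = weight
* list the constraints currently defined
constraint list
```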

     

    1. Test and Regress Diagnostics

     

    test tests linear hypotheses about the estimated parameters from the most recently estimated model. testnl tests nonlinear (or linear) hypotheses about the estimated parameters from the most recently estimated model.

     

            . reg gdp m2 g i

            . test m2 g

            . test m2-5*g+3 = 0

            . testnl _b[m2]/_b[g] = 1

            . testnl (_b[m2]/_b[g] = 1)  (_b[m2] = 2)

     

    test performs Wald tests.  See help lrtest for likelihood-ratio tests.
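
    A likelihood-ratio test compares two nested models fitted by maximum likelihood. A sketch, assuming a release of Stata that supports estimates store (the lrtest syntax differs in older versions, so check help lrtest first); the model and the names full and restricted are illustrative:

```stata
* unrestricted model
logit married edu exp lwage
estimates store full
* restricted model, fitted on the same observations
logit married edu exp if e(sample)
estimates store restricted
* LR test of the restriction that the coefficient on lwage is zero
lrtest full restricted
```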

     

    Other diagnostics

     

    rvfplot       Graph residual-versus-fitted plot

    rvpplot      Graph residual-versus-predictor plot

    ovtest        Perform Ramsey RESET test for omitted variables

    dwstat       Compute Durbin-Watson d statistic (the data must be declared as time series)

    vif             Calculate VIFs (variance inflation factors)
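
    These commands are run after regress and use the most recent estimates. A sketch with the regression from the previous section; in Stata 9 and later, ovtest, vif, and dwstat are documented as estat ovtest, estat vif, and estat dwatson.

```stata
reg gdp m2 g i
* residuals against fitted values
rvfplot
* residuals against one regressor
rvpplot m2
* Ramsey RESET test
ovtest
* variance inflation factors
vif
* Durbin-Watson statistic (requires tsset data)
dwstat
```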

     

     

     

    ---

    Quick Example

     

    • Click File, Open, and choose a:\wage1.dta (dta is the extension for Stata data files). Then type the following in the Stata Command window:

     

    regress lwage edu exp expsq tenu union   ↵ (↵ is the Enter key)

     

    You just ran a wage regression based on Mincer’s human capital model. Or type,

     

    ivreg lwage (edu = married) exp expsq tenu union ↵

     

    You just used the Two Stage Least Squares method with “married” as the identifying variable. This is also called the Instrumental Variable (IV) method. The instrumented variable is “edu”, and the instruments are married together with the exogenous regressors exp, expsq, tenu, and union. Or try,

     

    iis id ↵

    tis year ↵

    xtreg lwage edu exp expsq tenu union, fe ↵

     

    You just used the Fixed Effect (FE) estimator. Or try

     

    logit married edu exp lwage ↵

    or         probit married edu exp lwage ↵

     

    These are the Logit and Probit models (do not worry about the results). Or try

     

    cox tenu edu lwage union married ↵

     

    This is the Cox hazard model.

    These examples show that you can estimate something i) if you know how to interact with Stata (open data, save and print the results, create a panel from cross-section data, etc.), ii) if you have a “nice” Stata data set, and iii) if you know the estimation commands.
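
    The same session can be collected in a do file so that it can be rerun and logged. A sketch, assuming a hypothetical file name wage1.do and the a:\wage1 data set and log file used earlier in this reference:

```stata
* wage1.do: a hypothetical do file collecting the commands above
log using a:\wage1, replace
use a:\wage1, clear
* OLS wage regression
regress lwage edu exp expsq tenu union
* 2SLS with married as the identifying instrument
ivreg lwage (edu = married) exp expsq tenu union
* declare the panel and time variables, then fixed effects
iis id
tis year
xtreg lwage edu exp expsq tenu union, fe
log close
```

    Run it from the Command window with do wage1.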

     

     
