
Factor Analysis & Scale Construction


 

Introduction

 

Factor analysis describes the variability among observed, correlated variables (such as V1, V2, V3) in terms of a potentially smaller number of unobserved variables called factors (for example, a single factor F1 underlying V1, V2 and V3). In other words, variations in three or four observed variables may mainly reflect variations in fewer unobserved variables. Factor analysis searches for such joint variation (e.g. highly correlated V1, V2 and V3) in response to unobserved latent variables. The observed variables are modeled as linear combinations of the potential factors, plus "error" terms. (1)

 

But why?

The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a data set. (1)

 

 

Types of Factor Analysis

 

  • Exploratory factor analysis (EFA) - Used when there is no prior theory; the factor loadings are used to discover the factor structure of the data.

 

  • Confirmatory factor analysis (CFA) - Used when there is a prior theory; the method tests whether the number of factors and the loadings of the measured (indicator) variables on them conform to that theory. Indicator variables are selected on the basis of the theory, and the analysis checks whether they load as predicted.

 

Types of Factoring

 

  • Principal component analysis (PCA) - It seeks the linear combination of variables that accounts for the maximum variance. It removes this variance and then seeks a second linear combination that explains the maximum proportion of the remaining variance, and so on. This results in uncorrelated factors.
    • Can be used to reduce the number of variables
    • One use: simplifying a regression analysis by reducing the number of predictor variables

 

  • Others (1)

 

 

Application

 

  • Used where there is a large quantity of data: behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences
  • Used when a survey asks about a similar idea or concept from different angles, e.g. "How often do you read the Bible?" and "How religious are you?" Responses to such questions can be represented by a single latent variable, or factor, which is then used in further analysis (such as regression analysis).

 

Demonstrative Example:

 

Theory: 'Intelligence' is of 2 kinds: 1) Verbal intelligence and 2) Mathematical intelligence

 

  • Intelligence = ( A * verbal intelligence) + ( B * mathematical intelligence)
  • Here:
    • Intelligence is a linear combination of 2 factors: verbal and mathematical intelligence
    • A and B are the factor loadings (the weights on each factor), which here are based on theory
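
The linear combination above can be sketched in a few lines of Python (used here instead of Stata only so the arithmetic is easy to run anywhere). The function name `observed_score`, the loadings 0.7 and 0.3, and the factor scores are all made up for illustration; they do not come from any data set.

```python
def observed_score(loadings, factor_scores, error=0.0):
    """Linear factor model: sum of loading * factor score, plus an error term."""
    if len(loadings) != len(factor_scores):
        raise ValueError("need exactly one loading per factor")
    return sum(a * f for a, f in zip(loadings, factor_scores)) + error


# Hypothetical loadings: A for verbal, B for mathematical intelligence.
A, B = 0.7, 0.3
# Hypothetical factor scores for one person.
verbal, mathematical = 110.0, 90.0

score = observed_score([A, B], [verbal, mathematical])
print(round(score, 2))
```

With a nonzero `error` argument the same function shows how two people with identical factor scores can still differ on the observed variable.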

 

 

Methodology

 

Considerations: (3)

  • Factor analysis needs large samples, which is one of its main drawbacks
  • The more reliable the correlations are, the fewer subjects are needed
  • Enough subjects are needed for stable estimates:
    • 50 is very poor, 100 poor, 200 fair, 300 good, 500 very good, and 1000+ excellent
    • Aim for a minimum of 300 in most cases
    • The more highly correlated the marker variables, the fewer subjects are needed
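
As a rough sketch, the sample-size rule of thumb above can be written as a lookup. The function name `rate_sample_size` is hypothetical, and the cut-offs rest on an assumption the source does not state: each label is taken to apply from its threshold up to the next one, and anything below 100 is treated as "very poor".

```python
def rate_sample_size(n):
    """Return the rule-of-thumb label for a sample of n subjects."""
    # (minimum n, label), checked from the largest threshold down.
    thresholds = [
        (1000, "excellent"),
        (500, "very good"),
        (300, "good"),
        (200, "fair"),
        (100, "poor"),
    ]
    for minimum, label in thresholds:
        if n >= minimum:
            return label
    return "very poor"  # assumption: everything below 100


for n in (50, 250, 300, 737, 1200):
    print(n, rate_sample_size(n))
```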

 

STEPS
  1. Preparation of appropriate correlation matrix
  2. Extraction of initial (orthogonal) factors 
  3. Rotation to terminal solution
  4. Use Cronbach's Alpha to measure internal consistency of the groups
  5. Scale Construction

 

Step 1: Preparation of appropriate correlation matrix

This gives a sense of which variables are correlated and are therefore likely to load on the same factor. Run a simple correlation in Stata.

 

. cor rrtherm gayther2 schlpray relipoli relidivi bible abortion adjmoral tradf

(obs=987)

 

             |  rrtherm gayther2 schlpray relipoli relidivi    bible abortion

-------------+---------------------------------------------------------------

     rrtherm |   1.0000

    gayther2 |   0.1916   1.0000

    schlpray |   0.3432   0.2489   1.0000

    relipoli |   0.3660   0.0872   0.1611   1.0000

    relidivi |   0.3950   0.1131   0.2510   0.6198   1.0000

       bible |   0.4265   0.2819   0.3481   0.2465   0.3144   1.0000

    abortion |   0.3697   0.2997   0.2395   0.2744   0.3145   0.4186   1.0000

    adjmoral |   0.0139   0.1351   0.0699   0.0958   0.1123   0.1233   0.1397

    tradfami |   0.3400   0.2661   0.2870   0.1209   0.1981   0.3247   0.2772

 

             | adjmoral tradfami

-------------+------------------

    adjmoral |   1.0000

    tradfami |   0.1627   1.0000
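
To make clear what each cell of the matrix above contains, here is a minimal Python sketch of a pairwise Pearson correlation matrix. The toy columns `v1`, `v2`, `v3` are invented and have no missing values; the sketch does not reproduce how Stata handles missing observations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict r[a][b] holding the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: v2 is an exact multiple of v1, v3 runs mostly the other way.
data = {
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 4, 6, 8, 10],
    "v3": [5, 3, 4, 1, 2],
}
r = correlation_matrix(data)
for a in data:
    print(a, [round(r[a][b], 4) for b in data])
```

Variables whose correlations with each other are high in absolute value, like `v1` and `v2` here, are the ones expected to end up on the same factor.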

 

Step 2: Extraction

Extraction Methods:

 

Principal Component Analysis vs. Factor Analysis (3)

PCA

  • The goal is to extract as much variance as possible with the fewest factors
  • Gives a unique solution
  • Begins with 1s in the diagonal of the correlation matrix
  • Analyzes (extracts) all variance (each variable is given equal weight)
  • Outputs inflated communality estimates
  • Reproduces the R (correlation) matrix near perfectly

FA

  • The goal is to explain as much of the correlations as possible with the fewest factors
  • Can give multiple solutions (depending on the method and on how the shared variance, i.e. the communalities, is estimated)
  • Begins with communality estimates in the diagonal
  • Analyzes only shared variance (covariance)
  • Outputs more realistic communality estimates
  • Gives a close approximation to the R matrix

 

Supplementary notes:

Maximum Likelihood

    • Computationally intensive method that estimates the loadings that maximize the likelihood (probability) of observing the sample correlation matrix.

Unweighted least squares

    • Ignores the diagonal and tries to minimize the off-diagonal residuals
    • Communalities are derived from the solution
    • Originally called the Minimum Residual method (Comrey)

 

 

Now run a principal-component factor analysis in Stata (the pcf option of the factor command) to obtain the eigenvalues.

  • Eigenvalues: The eigenvalue for a given factor measures the variance in all the variables that is accounted for by that factor (so for Factor 1 it sums the variance explained in V1, V2, V3, ..., and so on for each factor). (1)

    • Low eigenvalue = the factor contributes little to explaining the variance in the variables and can be ignored

    • Here factors with an eigenvalue below 1 are ignored (the Kaiser criterion), which leaves 4 factors

 

Stata Example: (4)

. factor rrtherm gayther2 schlpray relipoli relidivi bible abortion adjmoral tradfami tolmoral lifestyl relguid biblread, pcf

(obs=737)

            (principal component factors; 4 factors retained)

  Factor     Eigenvalue     Difference    Proportion    Cumulative

------------------------------------------------------------------

     1        3.54723         2.08944      0.2729         0.2729

     2        1.45779         0.26615      0.1121         0.3850

     3        1.19164         0.08519      0.0917         0.4767

     4        1.10645         0.16162      0.0851         0.5618

     5        0.94483         0.09620      0.0727         0.6345

     6        0.84863         0.17768      0.0653         0.6997

     7        0.67095         0.00861      0.0516         0.7513

     8        0.66234         0.01525      0.0509         0.8023

     9        0.64709         0.11167      0.0498         0.8521

    10        0.53542         0.02511      0.0412         0.8933

    11        0.51031         0.01792      0.0393         0.9325

    12        0.49239         0.10746      0.0379         0.9704

    13        0.38493            .         0.0296         1.0000
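
The retention rule and the Proportion and Cumulative columns can be checked by hand. The Python sketch below copies the eigenvalues from the table above and assumes, as is the case for an analysis of a correlation matrix, that the total variance equals the number of variables (13); the variable names are illustrative.

```python
# Eigenvalues copied from the Stata output above.
eigenvalues = [
    3.54723, 1.45779, 1.19164, 1.10645, 0.94483, 0.84863, 0.67095,
    0.66234, 0.64709, 0.53542, 0.51031, 0.49239, 0.38493,
]

n_variables = len(eigenvalues)  # 13 standardized variables, each with variance 1

# Kaiser criterion: keep factors whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of the total variance explained by each factor, and the running total.
proportions = [ev / n_variables for ev in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print("factors retained:", len(retained))
print("proportion, factor 1:", round(proportions[0], 4))
print("cumulative, factors 1-4:", round(cumulative[3], 4))
```

The four retained factors together account for roughly 56% of the variance, which matches the Cumulative column in row 4.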

 

Step 3: Rotation to terminal solution 

 

3.1) Rotation

Rotation is used to improve interpretability and utility.

 

Orthogonal rotation vs. oblique rotation: (3)

  • Orthogonal rotation - keeps the factors uncorrelated while making them easier to interpret
    • Varimax method - spreads the variance from the first (largest) factor to the other, smaller factors; commonly used
    • Quartimax method - opposite of varimax; not used often
    • Equamax method - hybrid of the two; not used often

  • Oblique rotation - allows the factors to correlate, which can give a conceptually clearer picture but makes the solution harder to explain
    • Direct oblimin method
    • Promax method (used here)
      • Most recommended
      • First, the solution is rotated maximally with an orthogonal rotation
      • The orthogonal loadings are then raised to a power to drive the small loadings toward zero
      • Finally, the solution is rotated obliquely toward that simplified pattern
      • Simple structure is reached
      • Easy and quick method

 

Stata Example: (4)


. rotate, promax(3.0)

            (promax rotation)

               Rotated Factor Loadings

    Variable |      1          2          3          4    Uniqueness    Loading rank
-------------+------------------------------------------------------------------------
     rrtherm |   0.35220   -0.45493   -0.22869   -0.08927    0.47332        12
    gayther2 |   0.47267    0.03895    0.28206    0.03723    0.57587        11
    schlpray |   0.60145   -0.03276   -0.12727    0.06871    0.65833        9
    relipoli |  -0.14698   -0.92299    0.08464    0.04728    0.24303        1
    relidivi |  -0.02670   -0.88084    0.06523    0.04814    0.26397        2
       bible |   0.32669    0.00182   -0.10869   -0.52665    0.49348        10
    abortion |   0.21779   -0.20869    0.05076   -0.36013    0.61285        13
    adjmoral |  -0.08198   -0.04140    0.80104   -0.10148    0.34123        5
    tradfami |   0.79735    0.07873   -0.04639    0.06473    0.44421        6
    tolmoral |   0.20651   -0.08625    0.72051    0.04116    0.38490        7
    lifestyl |   0.69668    0.11100    0.21698    0.02228    0.47731        8
     relguid |  -0.05968    0.06701    0.02265   -0.82168    0.39178        4
    biblread |  -0.12333    0.03256    0.08085   -0.85363    0.33660        3

 

Then rank the variables manually: for each variable, find the factor on which it has the highest loading in absolute value, and rank the variables by that loading (the rank is shown in the right-most column).

 

 

3.2) Retaining the 10 variables with the highest loadings

Keep the 10 top-ranked variables from the table above and run the factor command again.
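
The manual ranking and the top-10 cut can be reproduced with a short Python sketch. The loadings are copied from the first rotated table above; the helper name `strongest` and the other variable names are hypothetical, and ranking by absolute value is the interpretation that matches the rank column.

```python
# Rotated loadings (factors 1-4) copied from the first promax output above.
loadings = {
    "rrtherm":  [0.35220, -0.45493, -0.22869, -0.08927],
    "gayther2": [0.47267, 0.03895, 0.28206, 0.03723],
    "schlpray": [0.60145, -0.03276, -0.12727, 0.06871],
    "relipoli": [-0.14698, -0.92299, 0.08464, 0.04728],
    "relidivi": [-0.02670, -0.88084, 0.06523, 0.04814],
    "bible":    [0.32669, 0.00182, -0.10869, -0.52665],
    "abortion": [0.21779, -0.20869, 0.05076, -0.36013],
    "adjmoral": [-0.08198, -0.04140, 0.80104, -0.10148],
    "tradfami": [0.79735, 0.07873, -0.04639, 0.06473],
    "tolmoral": [0.20651, -0.08625, 0.72051, 0.04116],
    "lifestyl": [0.69668, 0.11100, 0.21698, 0.02228],
    "relguid":  [-0.05968, 0.06701, 0.02265, -0.82168],
    "biblread": [-0.12333, 0.03256, 0.08085, -0.85363],
}


def strongest(row):
    """Return (factor number, absolute loading) for the largest |loading| in a row."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    return index + 1, abs(row[index])


# Rank variables by their strongest absolute loading, largest first.
ranked = sorted(loadings, key=lambda v: strongest(loadings[v])[1], reverse=True)
top10 = ranked[:10]

for rank, variable in enumerate(ranked, start=1):
    factor, size = strongest(loadings[variable])
    print(rank, variable, "factor", factor, size)
```

The three variables that fall outside `top10` are the ones dropped from the second factor command.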

 

. factor schlpray relipoli relidivi bible adjmoral tradfami tolmoral lifestyl relguid biblread, pcf

(obs=818)

            (principal component factors; 4 factors retained)

  Factor     Eigenvalue     Difference    Proportion    Cumulative

------------------------------------------------------------------

     1        2.77313         1.42808      0.2773         0.2773

     2        1.34505         0.21001      0.1345         0.4118

     3        1.13504         0.06176      0.1135         0.5253

     4        1.07328         0.21546      0.1073         0.6327

     5        0.85782         0.13578      0.0858         0.7184

     6        0.72204         0.09339      0.0722         0.7906

     7        0.62865         0.08095      0.0629         0.8535

     8        0.54770         0.03238      0.0548         0.9083

     9        0.51532         0.11334      0.0515         0.9598

    10        0.40197               .      0.0402         1.0000

 

3.3) Un-rotated Factor Matrix Cut

Now rotate the reduced 10-variable solution again.

 

. rotate, promax(3.0)

 

            (promax rotation)

               Rotated Factor Loadings

    Variable |      1          2          3          4    Uniqueness

-------------+------------------------------------------------------

    schlpray |   0.10000   -0.06717   -0.14650   -0.51329    0.67726

    relipoli |  -0.00130   -0.90470    0.01729    0.06921    0.20176

    relidivi |  -0.00401   -0.87883    0.00045   -0.04381    0.21115

       bible |   0.65210   -0.01686   -0.10814   -0.18663    0.46866

    adjmoral |   0.04488    0.00862    0.81824   -0.00959    0.31408

    tradfami |  -0.06184    0.01955   -0.03455   -0.85194    0.31964

    tolmoral |   0.01322   -0.03281    0.76272   -0.08360    0.37909

    lifestyl |  -0.04870    0.02847    0.21189   -0.76514    0.35407

     relguid |   0.74278    0.02104    0.04740    0.00595    0.44602

    biblread |   0.85764   -0.00304    0.07817    0.14983    0.30174

 

Assigning each variable to the factor on which it has the highest loading (in absolute value):

  • Factor 1:
    • bible
    • relguid
    • biblread
  • Factor 2:
    • relipoli
    • relidivi
  • Factor 3:
    • adjmoral
    • tolmoral
  • Factor 4:
    • schlpray
    • tradfami
    • lifestyl
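
The grouping above follows mechanically from the second rotated table. The Python sketch below copies those loadings and assigns each variable to the factor with its largest absolute loading; the names `rotated` and `groups` are illustrative only.

```python
# Rotated loadings (factors 1-4) copied from the second promax output above.
rotated = {
    "schlpray": [0.10000, -0.06717, -0.14650, -0.51329],
    "relipoli": [-0.00130, -0.90470, 0.01729, 0.06921],
    "relidivi": [-0.00401, -0.87883, 0.00045, -0.04381],
    "bible":    [0.65210, -0.01686, -0.10814, -0.18663],
    "adjmoral": [0.04488, 0.00862, 0.81824, -0.00959],
    "tradfami": [-0.06184, 0.01955, -0.03455, -0.85194],
    "tolmoral": [0.01322, -0.03281, 0.76272, -0.08360],
    "lifestyl": [-0.04870, 0.02847, 0.21189, -0.76514],
    "relguid":  [0.74278, 0.02104, 0.04740, 0.00595],
    "biblread": [0.85764, -0.00304, 0.07817, 0.14983],
}

groups = {}
for variable, row in rotated.items():
    # 1-based number of the factor with the largest absolute loading.
    factor = max(range(len(row)), key=lambda i: abs(row[i])) + 1
    groups.setdefault(factor, []).append(variable)

for factor in sorted(groups):
    print("Factor", factor, groups[factor])
```

The sign of a loading only shows the direction of the relationship, which is why the comparison uses absolute values.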

 

 

Step 4: Use Cronbach's Alpha to measure internal consistency of the groups

Cronbach's alpha is a measure of internal consistency. Use the Stata alpha command to calculate Cronbach's alpha (reported as the scale reliability coefficient) for each group of variables identified above.

 

Quick pointers: (3)

  • The higher the alpha, the higher the internal consistency.
  • As a rule of thumb, scales with alpha > 0.80 are normally considered to have strong internal consistency
  • Alpha can be interpreted as:
    • The correlation between the present set of variables (in the group or factor) and all other possible sets of variables measuring the same thing;
      => the higher the correlation, the more reliable the scale
    • The squared correlation between the observed score (the score on a particular scale) and the true score (the score one would have obtained on all possible items in the universe);
      => the higher the correlation, the more reliable the scale

 

Stata command:

. alpha v1 v2 v3
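
For intuition about what the coefficient measures, here is a minimal Python sketch of the usual textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), for k items. The function names and the toy item scores are invented, the sketch assumes complete cases only, and it makes no attempt to reproduce Stata's handling of missing values or reversed items.

```python
def variance(values):
    """Sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of five respondents on three items.
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 5, 5]
item3 = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 4))
```

Items that move together make the variance of the total score large relative to the sum of the item variances, which pushes alpha toward 1.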

 

Stata Example: (4)

. alpha bible relguid biblread

Test scale = mean(unstandardized items)

Average interitem covariance:     .2805805

Number of items in the scale:            3

Scale reliability coefficient:      0.6226

 

. alpha relipoli relidivi

Test scale = mean(unstandardized items)

Average interitem covariance:     .1518385

Number of items in the scale:            2

Scale reliability coefficient:      0.7612

 

. alpha adjmoral tolmoral

Test scale = mean(unstandardized items)

Average interitem covariance:     .5710242

Number of items in the scale:            2

Scale reliability coefficient:      0.4842

 

. alpha schlpray tradfami lifestyl

Test scale = mean(unstandardized items)

Average interitem covariance:     .4006161

Number of items in the scale:            3

Scale reliability coefficient:      0.6232

None of the four groups reaches the 0.80 rule of thumb: the relipoli/relidivi pair comes closest (0.7612) and the adjmoral/tolmoral pair is the weakest (0.4842).

 

 

Step 5: Scale Construction

When multiple variables capture the same concept (a factor, in our case), it is more reliable to include that one factor in the model than to include the individual variables separately, mainly because the latter can lead to multicollinearity and results that are hard to interpret. (2)

 

In scale construction, we must consider: (2)

  • What items belong in the scale?
  • How reliable is the scale?
  • Need to examine theoretical concerns
  • Need to also look at empirical results

 

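Example (4), in which Factor3 and Factor4 are numbered in the reverse order of the rotated solution above and schlpray is left out of the scale: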

. generate float Factor1= (2*(bible - 1))+(2*(relguid - 1))+(biblread - 1)

(346 missing values generated)

 

. generate float Factor2= relipoli+relidiv

(120 missing values generated)

 

. generate float Factor3= (tradfami-1)+(lifestyl-1)

(23 missing values generated)

 

. generate float Factor4= (adjmoral-1)+(tolmoral-1)

(23 missing values generated)
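
The additive scales above can be sketched in Python to show why some respondents end up with missing scores. The function `additive_scale`, the respondent records and the use of `None` for a missing answer are all illustrative assumptions; the weights mirror the Factor1 command, and the rule that one missing item makes the whole score missing mirrors how the generate commands report missing values.

```python
def additive_scale(row, weights, offset=1):
    """Weighted sum of (item - offset); None if any item is missing."""
    total = 0
    for name, weight in weights.items():
        value = row.get(name)
        if value is None:
            return None  # one missing item makes the whole score missing
        total += weight * (value - offset)
    return total


# Same weights as the Factor1 command: 2, 2 and 1.
factor1_weights = {"bible": 2, "relguid": 2, "biblread": 1}

# Hypothetical respondents; None marks a missing answer.
respondents = [
    {"bible": 3, "relguid": 2, "biblread": 4},
    {"bible": 1, "relguid": None, "biblread": 2},
]

scores = [additive_scale(r, factor1_weights) for r in respondents]
print(scores)
```

Subtracting 1 from each item makes the lowest response category contribute 0, so the scale starts at zero.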

 

 

 

 

 

 

 

Reference


(1) Wikipedia.org     

(2) http://nd.edu/~rwilliam/stats2/l23.pdf

(3) Ainsworth PSY 524 Factor Analysis [PPT]

(4) Course Materials from Quantitative Analysis II 2011. Alan Yang. Columbia-SIPA.
