Factor analysis is a broad term for multivariate statistical methods used to identify common underlying variables, called factors, within a larger set of measures. In essence, factor analysis determines which variables group together. A factor is a group of related variables representing an underlying domain or theme. Factors are indicated by shared variance (covariances) among two or more variables. As variables are reduced to factors through inter-item correlational analysis, relationships among the factors emerge in the variables they represent. The observed variables are then modeled as linear combinations of the factors.
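The "linear combinations" idea can be made concrete with a minimal sketch. All loadings and factor scores below are made-up numbers chosen only to illustrate the model, not values from any real instrument.

```python
# Illustrative sketch: each observed variable x_j is modeled as a linear
# combination of common factors plus a unique error term:
#   x_j = loading_j1 * F1 + loading_j2 * F2 + e_j
# The loadings and factor scores below are hypothetical.

loadings = {            # hypothetical factor loadings for 4 observed items
    "item1": (0.8, 0.1),
    "item2": (0.7, 0.2),
    "item3": (0.1, 0.9),
    "item4": (0.2, 0.7),
}

def observed_score(item, f1, f2, error=0.0):
    """Model an observed variable as a linear combination of two factors."""
    l1, l2 = loadings[item]
    return l1 * f1 + l2 * f2 + error

# One respondent with factor scores F1 = 1.0, F2 = -0.5 (errors set to 0):
x = {item: round(observed_score(item, 1.0, -0.5), 2) for item in loadings}
print(x)  # item1/item2 are driven mainly by F1; item3/item4 mainly by F2
```

Items that share a large loading on the same factor rise and fall together, which is exactly the shared variance that signals an underlying factor.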
Factor analysis empirically explores the interrelationships and dimensions among variables in order to cluster inter-correlated variables into smaller sets of basic factors. It reduces the number of variables and classifies variables by exploring the underlying theoretical structure(s). The process simplifies the data and eliminates redundant, unclear, and irrelevant variables.
Factor analysis is commonly used in psychometric instrument development. For example, psychological questionnaires often aim to operationalize abstract psychological constructs, with multiple empirical indicators on the questionnaire measuring each construct (in order to enhance reliability and validity). Factor analysis techniques have been used in the construction and verification of tens of thousands of psychological screening and measurement tests.
Observed variable: measured directly; also called a measured variable, an indicator, or a manifest variable.
Latent construct: measured indirectly by determining its influence on responses to observed variables; also called a factor, underlying construct, or unobserved variable.
Factor scores: estimates of the underlying latent constructs.
Eigenvalues: the amount of variance explained by each factor.
Orthogonal: at a 90-degree angle; perpendicular.
Oblique: at other than a 90-degree angle.
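The eigenvalue definition above has a simple closed form in the two-variable case, sketched here with an assumed correlation of .6 (any value would do):

```python
# Sketch: for a 2x2 correlation matrix [[1, r], [r, 1]], the eigenvalues
# are 1 + r and 1 - r.  Each eigenvalue is the amount of variance explained
# by the corresponding factor, and they sum to the number of variables.

def eigenvalues_2x2_corr(r):
    """Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    return (1 + r, 1 - r)

lam1, lam2 = eigenvalues_2x2_corr(0.6)   # assumed inter-item correlation
print(lam1, lam2)        # approximately 1.6 and 0.4
print(lam1 + lam2)       # eigenvalues sum to the number of variables (2)
```

The stronger the correlation, the more variance the first factor absorbs, which is why highly inter-correlated items collapse onto a single factor.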
The main steps in factor analysis are: assessment of the suitability of the data for factor analysis (meeting assumptions and assessing the correlation matrix), factor extraction, factor retention, and factor rotation/interpretation.
Assumptions of factor analysis:
All variables are measured on at least an ordinal scale; nominal data are not appropriate for factor analysis.
Overall sample size of 150+ or a ratio of 20:1 (cases per variable).
Random sampling; normal distribution.
Linear relations among the variables.
Few or no outliers.
Factors independent of one another (no correlation).
At least 5 salient variables for each factor.
No measurement error.
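The sample-size rules of thumb above are easy to screen for in code. The helper below is a hypothetical convenience function, not part of any statistics package:

```python
# Sketch (hypothetical helper): screen the two sample-size rules of thumb
# listed above -- overall n >= 150, or at least 20 cases per variable.

def sample_size_ok(n_cases, n_variables, min_n=150, ratio=20):
    """Return True if the data meet either sample-size rule of thumb."""
    return n_cases >= min_n or n_cases / n_variables >= ratio

print(sample_size_ok(435, 20))   # 435 cases, 20 items -> True
print(sample_size_ok(90, 10))    # 90 cases, 10 items fails both -> False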
Assessing the correlation matrix:
Correlations of r = .3 to .8. Initial communalities > .6. Bartlett's test of sphericity statistically significant at p < .05. Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy value of .6 or above. Individual measures of sampling adequacy (MSA) > .7. Together, these tests suggest that the matrix is factorable.
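Of these checks, Bartlett's test of sphericity has a compact formula that can be sketched directly: the chi-square statistic is -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom. The correlation matrix below is made up; the p-value lookup is omitted (compare the statistic to a chi-square critical value).

```python
import math

# Sketch of Bartlett's test of sphericity:
#   chi2 = -(n - 1 - (2p + 5)/6) * ln(det(R)),  df = p(p - 1)/2
# A large chi2 (significant p) means R differs from an identity matrix,
# i.e. the variables are correlated enough to be factorable.

def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def bartlett_statistic(corr, n):
    """Chi-square statistic and df for Bartlett's test of sphericity."""
    p = len(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det(corr))
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical 3-variable correlation matrix, n = 200 respondents:
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
chi2, df = bartlett_statistic(R, 200)
print(round(chi2, 1), df)   # large chi2 on 3 df -> matrix is factorable
```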
Factor extraction:
This phase condenses the items in the data matrix into a smaller number of factors. It is used to define the number of underlying dimensions. It results in an unrotated factor matrix, which contains coefficients or weights for all original items on each extracted factor. Two widely used factor extraction techniques are principal components analysis (PCA) and common factor analysis.
According to Costello and Osborne (2005), common factor analysis is preferable to principal components analysis (PCA), which is the "default" data reduction method in SPSS. Principal components analysis transforms the variables into smaller sets of linear combinations using all of the variance in the variables, and this can produce inflated estimates of the variance in the components. Common factor analysis uses only the shared variance among the items. Each indicator is typically linked to only one factor, since cross-loadings suggest poorer construct validity.
Besides the "default" PCA, SPSS offers six factor extraction methods from which to choose: unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring. Costello and Osborne recommend the common factor analysis methods of maximum likelihood extraction (for normally distributed data) or principal axis factoring (for non-normal data) over principal components analysis.
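The total-variance versus shared-variance distinction can be sketched numerically. PCA analyzes the correlation matrix with 1s on the diagonal; common factor analysis replaces the diagonal with communalities. The correlation and communality values below are assumptions for illustration, using the closed-form leading eigenvalue of a 2x2 symmetric matrix:

```python
import math

# Sketch: for a 2x2 symmetric matrix [[a, r], [r, b]] the leading
# eigenvalue is (a + b)/2 + sqrt(((a - b)/2)**2 + r**2).  PCA puts 1s
# (total variance) on the diagonal; common factor analysis puts
# communalities (shared variance only) there, so its leading eigenvalue
# is smaller -- illustrating PCA's inflated variance estimates.

def leading_eigenvalue(a, b, r):
    return (a + b) / 2 + math.sqrt(((a - b) / 2) ** 2 + r ** 2)

r = 0.6                                        # assumed inter-item correlation
pca_lam = leading_eigenvalue(1.0, 1.0, r)      # diagonal = total variance
cfa_lam = leading_eigenvalue(0.7, 0.7, r)      # diagonal = communality of .7 (assumed)
print(pca_lam, cfa_lam)   # PCA's leading eigenvalue exceeds the common-factor one
```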
Factor retention:
There are a number of techniques used to determine which factors to retain: Kaiser's criterion, the scree test, and parallel analysis. Kaiser's criterion retains all factors with eigenvalues greater than 1.0; this is considered one of the least accurate methods for deciding which factors to retain. The scree plot is a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis; the data points above the natural bend or "elbow" are retained. Generally, this graph yields good results when there is a clear break in the plot of eigenvalues at the "elbow." Costello and Osborne (2005) contend that the scree test is the best choice for researchers. Pallant (2007), on the other hand, states that parallel analysis is the best approach to deciding the number of factors. Parallel analysis compares the size of the eigenvalues with those obtained from randomly generated data of the same size; only those eigenvalues that exceed the corresponding random-data values are retained.
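Two of these retention rules are simple enough to sketch side by side. All eigenvalues below are made up; note how Kaiser's criterion keeps more factors than parallel analysis, consistent with its reputation for over-retention:

```python
# Sketch of two retention rules, using hypothetical eigenvalues.

def kaiser_retain(eigenvalues):
    """Kaiser's criterion: retain factors with eigenvalue > 1.0."""
    return [e for e in eigenvalues if e > 1.0]

def parallel_retain(eigenvalues, random_eigenvalues):
    """Parallel analysis: retain factors whose eigenvalue exceeds the
    corresponding eigenvalue from random data of the same size."""
    return [e for e, re in zip(eigenvalues, random_eigenvalues) if e > re]

observed = [6.2, 3.4, 1.2, 1.1, 0.8]   # hypothetical observed eigenvalues
random_ = [1.4, 1.3, 1.2, 1.1, 1.0]    # hypothetical random-data criteria

print(len(kaiser_retain(observed)))            # Kaiser keeps 4 factors
print(len(parallel_retain(observed, random_))) # parallel analysis keeps 2
```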
Factor rotation and interpretation:
Rotation refers to the shifting of the factors in the "factor space" to maximize the interpretability of the factor loadings. The goal of rotation is to simplify, clarify, and interpret the data by either orthogonal or oblique methods of rotation. Researchers typically evaluate rotated rather than unrotated factor loadings for the obvious reason that they are easier to interpret.
SPSS reports the rotated output in different forms: a rotated factor matrix after orthogonal rotation (varimax), or a pattern matrix after oblique rotation (oblimin) (Costello & Osborne, 2005).
If the factors are truly independent and uncorrelated, orthogonal and oblique rotation methods produce essentially identical results and interpretations, and the orthogonal (varimax) method is preferred. If the factors are related, with some correlations, the oblique (oblimin) method is preferred. Pallant (2007) suggests that the researcher start with oblimin rotation, as it provides information about the degree of correlation between the factors.
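A defining property of orthogonal rotation can be shown in a few lines: rotating a two-factor loading matrix by any angle changes the individual loadings but leaves each item's communality (sum of squared loadings) untouched, so rotation only redistributes variance between factors. The loadings and angle below are made up:

```python
import math

# Sketch: orthogonally rotate a two-factor loading matrix by angle theta.
# The loadings change, but each item's communality (sum of squared
# loadings) is preserved -- rotation redistributes variance, nothing more.

def rotate(loadings, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(l1 * c - l2 * s, l1 * s + l2 * c) for l1, l2 in loadings]

def communality(row):
    return row[0] ** 2 + row[1] ** 2

unrotated = [(0.7, 0.5), (0.6, 0.4), (0.5, -0.6)]   # hypothetical loadings
rotated = rotate(unrotated, math.radians(30))        # arbitrary 30-degree turn

for before, after in zip(unrotated, rotated):
    # the two communalities in each pair agree (up to rounding)
    print(round(communality(before), 6), round(communality(after), 6))
```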
Exploratory factor analysis versus confirmatory factor analysis
Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are two statistical approaches used to examine the internal reliability of a measure.
Exploratory factor analysis
Exploratory factor analysis (EFA) explores and summarizes the underlying correlational structure of a data set. It is used in the early stages of research to explore the interrelationships among sets of variables and to find the model. EFA is complex, has few absolute guidelines, and offers many application choices. This approach is used when the underlying structure of a data set is unknown, to determine which domains comprise a construct of interest. In EFA, the researcher explores how many factors there are, whether the factors are correlated, and which observed variables appear to best measure each factor. EFA can reduce a large set of variables to a few underlying factors.
Exploratory factor analysis is used to discover the factor structure of a measure and to examine its internal reliability. EFA is recommended when researchers have no preconceived hypotheses or prior theory about the nature of the underlying factor structure of their measure. As such, it is an inductive approach that uses the factor loadings to uncover the factor structure of the data. Since EFA is exploratory in scope, there are no inferential statistical processes. EFA is not appropriate for testing hypotheses or theories, but only for clarifying and describing relationships. EFA is subject to error and a wide variation of subjective interpretations, even with optimal data and large samples; the method is as much an "art" as it is a "science."
Exploratory factor analysis has three basic decision points: (1) deciding the number of factors, (2) choosing an extraction method, and (3) choosing a rotation method. EFA then hypothesizes the underlying construct: the latent structure (dimensions) of a set of variables.
Confirmatory factor analysis
Confirmatory factor analysis (CFA) is a set of more complex and sophisticated statistical techniques used later in the research process to test the hypotheses or theories concerning the underlying structure generated by EFA. It is a hypothesis-testing approach used to test the model. Confirmatory factor analysis tests the correlational structure of a data set against the hypothesized structure and rates the "goodness of fit." CFA tests hypotheses stating the number of factors representing the data and the items comprising each factor. In CFA, the researcher specifies a certain number of factors, which factors are correlated, and which observed variables measure each factor.
CFA seeks to determine whether the number of factors conforms to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see whether they load as predicted on the expected number of factors. A minimum requirement of confirmatory factor analysis is that the researcher hypothesizes the number of factors in the model beforehand. He or she should also posit expectations about which variables will load on which factors. The researcher seeks to determine, for instance, whether measures created to represent a latent construct really belong together.
Exploratory factor analysis uses a correlation matrix to see whether any variables are components of factors. Shared variance indicates an underlying factor. The eigenvalue shows the amount of variance (a type of effect size); factors with larger eigenvalues account for greater variance than factors with smaller eigenvalues. As discussed earlier, inferential statistics should not be used in EFA. The statistical problem is not one of testing a given hypothesis, but rather one of fitting the model to the data to decide where the common variances are. The dimensionality of a set of items emerges empirically.
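The eigenvalue-as-effect-size idea has a direct arithmetic reading: the proportion of variance a factor explains is its eigenvalue divided by the number of variables. The eigenvalues below are hypothetical:

```python
# Sketch: proportion of variance explained = eigenvalue / number of variables.

def variance_explained(eigenvalue, n_variables):
    return eigenvalue / n_variables

eigs = [6.24, 3.4]           # hypothetical eigenvalues for a 20-item scale
for e in eigs:
    print(f"{variance_explained(e, 20):.1%}")   # 31.2% and 17.0%
```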
Confirmatory factor analysis uses a variance-covariance matrix to test hypotheses with inferential statistical techniques, drawing on an advanced class of statistical techniques referred to as structural equation modeling (SEM). In SEM, the most frequently used estimation procedure is maximum likelihood estimation. CFA tests a measurement model by testing correlations (between observed and latent variables, between pairs of latent variables, and among the errors). The researcher can compare the estimated matrices representing the relationships between variables in the model to the actual matrices. The researcher specifies a hypothesis by designating certain parameters in the factor matrices; the hypothesis is confirmed to the extent that the model still fits.
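The matrix comparison at the heart of CFA can be sketched with the standard model-implied covariance formula Sigma = Lambda Phi Lambda' + Theta, where Lambda holds the loadings, Phi the factor covariances, and Theta the (diagonal) error variances. All parameter values below are made up:

```python
# Sketch: compute the model-implied covariance matrix
#   Sigma = Lambda * Phi * Lambda^T + Theta
# which CFA compares against the observed variance-covariance matrix
# to rate "goodness of fit".  All values below are hypothetical.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def implied_covariance(lam, phi, theta_diag):
    sigma = matmul(matmul(lam, phi), transpose(lam))
    for i, t in enumerate(theta_diag):
        sigma[i][i] += t                  # add diagonal error variances
    return sigma

lam = [[0.8, 0.0],      # items 1-2 hypothesized to load on factor 1,
       [0.7, 0.0],
       [0.0, 0.9],      # items 3-4 on factor 2
       [0.0, 0.6]]
phi = [[1.0, 0.3],      # factors assumed to correlate .3
       [0.3, 1.0]]
theta = [0.36, 0.51, 0.19, 0.64]   # error variances chosen so diagonals are 1.0

sigma = implied_covariance(lam, phi, theta)
print([round(v, 3) for v in sigma[0]])   # first row of the implied matrix
```

Fitting the model then amounts to choosing the free parameters so that this implied matrix reproduces the observed one as closely as possible.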
Researcher information to report
When a researcher reports the results of a factor analysis, he or she needs to include the following information: verification of assumptions; details of the method of factor extraction used; criteria used to determine the number of factors retained; type of rotation technique used; total variance explained; initial eigenvalues; eigenvalues after rotation; and a table of loadings showing all values. An example follows.
Presenting the results from factor analysis (Pallant, 2007, pp. 197-198)
The 20 items of the Positive and Negative Affect Scale (PANAS) were subjected to principal components analysis (PCA) using SPSS version 15. Prior to performing PCA, the suitability of the data for factor analysis was assessed. Inspection of the correlation matrix revealed the presence of many coefficients of .3 and above. The Kaiser-Meyer-Olkin value was .87, exceeding the recommended value of .6 (Kaiser 1970, 1974), and Bartlett's test of sphericity (Bartlett 1954) reached statistical significance, supporting the factorability of the correlation matrix.
Principal components analysis revealed the presence of four components with eigenvalues exceeding 1, explaining 31.2%, 17%, 6.1%, and 5.8% of the variance respectively. An inspection of the scree plot revealed a clear break after the second component. Using Cattell's (1966) scree test, it was decided to retain two components for further investigation. This was further supported by the results of parallel analysis, which showed only two components with eigenvalues exceeding the corresponding criterion values for a randomly generated data matrix of the same size (20 variables x 435 respondents).
The two-component solution explained a total of 48.2% of the variance, with Component 1 contributing 31.25% and Component 2 contributing 17.0%. To aid in the interpretation of these two components, oblimin rotation was performed. The rotated solution revealed the presence of simple structure (Thurstone 1947), with both components showing a number of strong loadings and all variables loading substantially on only one component. The interpretation of the two components was consistent with previous research on the PANAS scale, with positive affect items loading strongly on Component 1 and negative affect items loading strongly on Component 2. There was a weak negative correlation between the two factors (r = -.28). The results of this analysis support the use of the positive affect items and the negative affect items as separate scales, as suggested by the scale authors (Watson, Clark & Tellegen 1988).
References
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10.
O'Brien, K. (2007). Factor analysis: An overview in the field of measurement. Physiotherapy Canada, 59.
Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.
Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.