The Statistical Resources Thread

VickyRN, MSN, DNP, RN · Mar 23, 2009

T-tests and analysis of variance (ANOVA) are widely used statistical methods to compare group means. Both are parametric statistical techniques, in that these tests involve a number of assumptions, including: normally distributed population; dependent variable measured on continuous interval or ratio level; random sampling of data; observations must be independent of one another; and homogeneity of variance (population means may differ, but all populations should have the same standard deviation). The independent variable is categorical.

Both t-tests and analysis of variance (ANOVA) procedures are used to test hypotheses - by means of the null hypothesis and alternative hypothesis. The researcher asks: Does the observed variation represent a real difference between the two populations, or just a chance difference in the samples? The null hypothesis asserts that there is no difference between the population groups and that any observed variation is due to chance alone. The rival hypothesis is the alternative (research) hypothesis, which asserts that an observed effect is genuine.

Assuming that the null hypothesis is true, what is the probability of obtaining the observed value for the test statistic? Statistical significance (p value ≤ .05) is a possible finding of both the t-test statistic and F-ratio statistic. This would indicate that a result at least as extreme as the one observed would be unlikely if the null hypothesis were true. Therefore, the null hypothesis would be rejected, and the alternative hypothesis supported.

The t-test is used to test differences in means between two groups. The t-test is used when the dependent variable is a continuous interval/ratio scale variable (such as total self-esteem) and the independent variable is a two-level categorical variable (such as gender). The t-test can be used even if sample sizes are very small, as long as the variables within each group are normally distributed and the variation of scores within the two groups is equal (no reliable differences). With the t-test, the test statistic used to generate p values has a Student's t distribution with n1 + n2 - 2 degrees of freedom for two independent groups (n - 1 for a paired-samples test, where n is the number of pairs).
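As a rough illustration of the calculation described above, here is a minimal Python sketch of the pooled-variance (equal variances assumed) independent-samples t statistic. The scores and the function name `independent_t` are made up for illustration; the sketch stops at the t value and degrees of freedom and does not look up a p value.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2  # degrees of freedom for two independent groups
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df

# Hypothetical scores, not taken from any study
treatment = [12, 15, 11, 14, 13]
control = [9, 10, 8, 11, 12]
t_stat, df = independent_t(treatment, control)
print(f"t({df}) = {t_stat:.2f}")
```

The resulting t would then be compared against a Student's t distribution with the stated degrees of freedom, which statistical software does automatically.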

The statistical t-test procedure is used to determine a p-value that indicates how likely it is that the results would be obtained by chance. If there is a ≤ 5% chance of getting the observed differences by chance, the null hypothesis is rejected because a statistically significant difference was found between the two groups.

The t-test can be used with two independent groups (independent samples t-test) and when the sample is paired or dependent (paired samples t-test). Independent samples are usually two groups chosen by random selection. Dependent samples are two groups matched on some variable (such as gender or age) or the same group being tested twice (repeated measures).

The two-sample t-test simply tests whether or not two independent populations have different mean values on some measure. An example of an independent samples t-test is evaluating differences in test scores between a group of patients who were given a treatment intervention and a control group who received a placebo. An example of a paired samples t-test is computing differences in test scores on the same sample of patients using a pretest-posttest design (such as measuring pretreatment and posttreatment cholesterol levels).
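For the paired case, the test works on the differences within each pair. The sketch below uses made-up pretreatment and posttreatment cholesterol values; the function name `paired_t` and the data are assumptions for illustration only.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on matched scores."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1  # df = number of pairs minus 1

# Hypothetical cholesterol levels for the same five patients
pre = [220, 245, 210, 232, 250]
post = [210, 230, 205, 222, 240]
t_paired, df_paired = paired_t(pre, post)
print(f"t({df_paired}) = {t_paired:.2f}")
```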

Whereas statistical significance indicates how likely it is that an observed finding occurred by chance, effect size measures the strength of relationship between two variables. Effect size is a population effect and its indices are independent of sample size. The effect size statistic for the independent-samples t-test is either Cohen's d or eta squared. The effect size (d) is the difference between the two population means, divided by the estimated population standard deviation. The formula for eta squared is $\eta^2 = t^2 / (t^2 + (N_1 + N_2 - 2))$.
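Both effect size indices are simple to compute by hand. The following sketch assumes the pooled standard deviation as the estimate of the population standard deviation for Cohen's d; the data, the t value of 3.0, and the function names are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

def eta_squared_from_t(t, n1, n2):
    """Eta squared for an independent-samples t-test: t^2 / (t^2 + N1 + N2 - 2)."""
    return t ** 2 / (t ** 2 + (n1 + n2 - 2))

# Hypothetical scores and t value
group_1 = [12, 15, 11, 14, 13]
group_2 = [9, 10, 8, 11, 12]
d = cohens_d(group_1, group_2)
eta_sq = eta_squared_from_t(3.0, len(group_1), len(group_2))
print(f"d = {d:.2f}, eta squared = {eta_sq:.2f}")
```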

To ascertain how precise an estimate of an effect is (for instance, the mean), a confidence interval (CI) is formulated. The CI is constructed around a sample mean or another statistic to establish a range of values for the unknown population parameter (mean or mean difference), as well as the probability of being right (the degree of confidence for this estimate). The 95% or 99% CI is most commonly used.
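A confidence interval around a sample mean can be sketched as mean ± critical value × standard error. The example below is a simplification: it uses the normal (z) critical value, which is only a reasonable approximation for large samples; with small samples the critical value should come from the t distribution. The summary statistics and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample (normal approximation) CI for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics: M = 100, SD = 15, n = 225
low, high = mean_confidence_interval(100.0, 15.0, 225)
print(f"95% CI: {low:.2f} to {high:.2f}")
```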

When a researcher reports the results from an independent or paired-samples t-test, he or she needs to include the following information: verification of parametric assumptions; dependent variable scores; independent variable, levels; statistical data: significance, t-scores, probability, group means, group standard deviations, mean differences, confidence intervals, and effect size. Examples are below.

Presenting the results for independent-samples t-test

An independent-samples t-test was conducted to compare the sleepiness scores for males and females. There was no significant difference in scores for males (M = 31.04, SD = 2.36) and females (M = 34.53, SD = 3.22); t (588) = 1.62, p = .14 (two-tailed). The magnitude of the differences in the means (mean difference = 3.49, 95% CI: -1.80 to 1.87) was very small (eta squared = .008).

Presenting the results for paired-samples t-test

A paired-samples t test was conducted to evaluate the impact of the intervention on students' scores on the Fear of Statistics Test (FOST). There was a statistically significant decrease in FOST scores from Time 1 (M = 39.16, SD = 4.25) to Time 2 (M = 35.55, SD = 4.35), t (32) = 5.12, p

While the t-test is used to compare the means between two groups, ANOVA is a statistical procedure used to compare means between three or more groups. Analysis of variance (ANOVA), despite its name, is concerned with differences between means of groups, not differences between variances. The term analysis of variance comes from the way the procedure uses variances to decide whether the means are different.

The ANOVA statistical procedure examines what the variation (difference) is within the groups (SSw), then examines how that variation translates into variation between the groups (SSb), taking into account how many subjects there are in the groups (degrees of freedom). If the observed differences are greater than what is likely to occur by chance, then there is statistical significance.

The statistic computed in ANOVA to generate p-values is the F-ratio, the ratio of the mean of the squares between to the mean of the squares within: F = MSb/ MSw (each of the means = SS/ df). Like the t, F depends on degrees of freedom to determine probabilities and critical values. The F statistic and the p-value depend on the variability of the data within groups and the differences among the means.
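To make the F-ratio concrete, here is a small sketch that computes SSb, SSw, the two mean squares, and F for a one-way between-groups design. The three groups of scores and the function name `one_way_anova` are invented for illustration.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for three groups
scores = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_ratio, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```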

The null hypothesis for ANOVA is that the population mean (average value of the dependent variable) is the same for all groups. In other words, there are no differences among the group means. The alternative hypothesis is that the average is not the same for all groups. A significant F test means the null hypothesis is rejected - the population means are not equal. When the null hypothesis is true, the F-ratio is approximately 1. When the alternative hypothesis is true, the F statistic tends to be large.

The F test is always one-sided because any differences among the group means tend to make F large. The ANOVA F test shares the robustness of the two-sample t test.

With ANOVA, if the null hypothesis is rejected, then it is known that at least two groups are different from each other. It is not known specifically which of the groups differ. In order to determine which groups differ, post-hoc t-tests are performed using some form of correction (such as the Bonferroni correction) to adjust for an inflated probability of a Type I error (false positive conclusion).
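The Bonferroni correction mentioned above amounts to dividing the overall alpha by the number of comparisons. A minimal sketch, assuming every pair of groups is compared (the function name is hypothetical):

```python
from itertools import combinations

def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha when all pairs of groups are compared."""
    n_comparisons = len(list(combinations(range(n_groups), 2)))
    return alpha / n_comparisons, n_comparisons

adjusted, n_tests = bonferroni_alpha(0.05, 3)
print(f"{n_tests} comparisons, per-test alpha = {adjusted:.4f}")
```

With three groups there are three pairwise comparisons, so each post-hoc t-test would be judged against roughly .017 rather than .05.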

Effect size for ANOVA is determined by estimating eta squared. Eta squared is calculated by dividing the sum of squares between (SSb) by the total sum of squares (SSt) and it indicates the proportion of variance explained in ANOVA.
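Eta squared for a one-way ANOVA can be sketched directly from the sums of squares. The data and function name below are hypothetical.

```python
import statistics

def eta_squared(groups):
    """Eta squared = SS between / SS total."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total

# Hypothetical scores for three groups
example_groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
eta_sq_anova = eta_squared(example_groups)
print(f"eta squared = {eta_sq_anova:.2f}")
```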

There are several varieties of ANOVA, such as one-factor (or one-way) ANOVA or two-factor (or two-way) ANOVA. The factors are the independent variables, each of which must be measured on a categorical scale. The levels of the independent variable (factor) define the separate groups.

The one-way ANOVA is used with an interval or ratio level continuous dependent variable, and a categorical independent variable (factor) that has two or more different levels. The levels correspond to different groups or conditions. There are two different types of one-way ANOVA: between groups ANOVA (comparing two or more different groups; independent design), and repeated measures ANOVA (one group of subjects exposed to two or more conditions; within-subjects design).

An example of one-way between groups ANOVA is a research study comparing the effectiveness of four different dosage regimens of the same antidepressant medication on depression scores. A questionnaire that measures depression is given to participants in the four different intervention groups.

When a researcher reports the results from a one-way between groups ANOVA or repeated measures ANOVA, he or she needs to include the following information: verification of parametric assumptions; dependent variable scores; independent variable, levels; statistical data: significance, F-ratio scores, probability, group means, group standard deviations, mean differences, confidence intervals, effect size, and post-hoc comparisons. An example is below.

Presenting the results from one-way between groups ANOVA with post-hoc tests (Pallant, 2007, p. 248)

A one-way between groups analysis of variance was conducted to explore the impact of age on levels of optimism, as measured by the Life Orientation Test (LOT). Subjects were divided into three groups according to their age (Group 1: 29 yrs or less; Group 2: 30 to 44 yrs; Group 3: 45 yrs and above). There was a statistically significant difference at the p

References

Moore, D. S., & McCabe, G. P. (2003). Introduction to the practice of statistics (4th ed.). New York: W. H. Freeman and Company.

Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.

Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.

VickyRN, MSN, DNP, RN · Mar 23, 2009

Mixed between-within subjects ANOVA (also known as a split-plot ANOVA) combines two different types of one-way ANOVA into one study: between-groups ANOVA and within-subjects ANOVA. Thus, in a mixed-design ANOVA model, one categorical independent variable is a between-subjects variable and the other categorical independent variable is a within-subjects variable. The mixed-design ANOVA model is used to test for differences between two or more independent groups while subjecting participants to repeated measures. The dependent variable is continuous (measured at the ratio or interval level) and is measured for each group across each level of the repeated factor.

The mixed ANOVA design is unique because there are two factors, one of which is repeated. Since the mixed design employs both types of ANOVA, a brief review of between-groups ANOVA and within-subjects ANOVA is in order:

One-way between-groups ANOVA consists of different subjects or cases in each group - an independent group design. There is one categorical independent (grouping) variable with three or more levels (groups) and one continuous dependent variable.

One-way within-subjects ANOVA, also known as repeated-measures ANOVA, measures the same subjects at different points of time or under different conditions, and is a dependent group design. This type of ANOVA is used when the subjects encounter repeated measures (i.e., the same subjects are used for each treatment). All subjects participate in all conditions of the research experiment. Each subject responds to every level of the repeated factor, but to only one level of the nonrepeated factor. The participants serve as their own control because they are involved in both the treatment and control groups. The within-subjects design should only be used when the two sets of scores represent measures of exactly the same thing. Therefore exactly the same test needs to be given at both times or under both conditions to all participants.

These two different approaches, of course, could be calculated separately. Often it is more efficient to combine both types of ANOVA into one analysis and study the two factors simultaneously rather than separately. Interactions between factors can also be investigated with this mixed between-within ANOVA design.

Since this mixed-type ANOVA involves two independent variables, it is a type of two-way ANOVA. Two-way (or two factor) ANOVA is used to test the relationship between two categorical independent variables and one continuous dependent variable. With two independent variables, three hypotheses are being tested: a main effect for each independent variable and the interaction between them. Two-way ANOVA introduces a concept not known in one-way analysis: interaction. Interaction refers to the way in which a category of one independent variable combines with a category of the other independent variable to produce an effect on the dependent variable that goes beyond the sum of the separate effects. It questions whether the effect of one independent variable is consistent for all levels of a second independent variable. Interaction is a feature common in both experimental and observational studies.

An example of mixed between-within subjects ANOVA is a study investigating the impact of an intervention on participants' depression symptoms (using pre-test and post-test design), but also investigating whether the impact varies for gender (males and females). In this case, there are two independent variables: gender (between-subjects variable) and time (within-subjects variable). The researcher would perform the intervention on both groups of males and females, and then measure their depression symptoms over time (Time 1 = pre-intervention and Time 2 = after the intervention).
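One descriptive way to see what an interaction means in this gender-by-time example is to compare the change over time in each group. The cell means below are invented, and the sketch only describes the pattern; it is not a significance test.

```python
# Hypothetical mean depression scores for each gender at each time point
cell_means = {
    ("male", "pre"): 30.0, ("male", "post"): 25.0,
    ("female", "pre"): 30.0, ("female", "post"): 20.0,
}

def change_over_time(gender):
    """Post-intervention mean minus pre-intervention mean for one group."""
    return cell_means[(gender, "post")] - cell_means[(gender, "pre")]

male_change = change_over_time("male")
female_change = change_over_time("female")
# A nonzero difference between the two changes suggests the effect of time
# is not the same for both genders (a possible interaction)
interaction_contrast = male_change - female_change
print(male_change, female_change, interaction_contrast)
```

If both groups changed by the same amount, the contrast would be zero and the lines on a plot of the means would be parallel.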

Assumptions: same assumptions as with t-tests and one-way ANOVA, plus homogeneity of inter-correlations.

There are three null hypotheses in mixed between-within subjects ANOVA. F statistics and p-values are used to test hypotheses about the main effects and the interaction. An F-statistic is computed to test for between-subject effect. Another F-statistic is computed to test for within-subjects effect or time factor. This statistic indicates whether, across the groups, the dependent variable differs over time. Finally, an interaction effect is tested to determine whether group differences vary across time. The test for interaction should be examined first, since the presence of a strong interaction may influence the interpretation of main effects. Plots are a useful aid.

The effect size for mixed between-within subjects ANOVA is reported as partial eta squared, which is commonly interpreted using the following guideline values:

Small effect .01

Moderate effect .06

Large effect .14
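When only an F value and its degrees of freedom are reported, partial eta squared can be recovered with the identity partial eta squared = (F × df effect) / (F × df effect + df error). The sketch below applies it to the interaction result quoted in the example in this post, F (2, 27) = 2.03; the function name is hypothetical.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared = SS effect / (SS effect + SS error), from F and df."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)

# Interaction effect from the example: F (2, 27) = 2.03
pes = partial_eta_squared(2.03, 2, 27)
print(f"partial eta squared = {pes:.2f}")
```

The result rounds to .13, matching the value reported for that interaction, which would fall between the moderate and large guideline values.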

When a researcher reports the results from a mixed between-within subjects ANOVA, he or she needs to include the following information: verification of parametric assumptions; verification of homogeneity of inter-correlations (Box's M statistic); verification of homogeneity of variances (Levene's Test of Equality of Error Variances); interaction effect (Wilks' Lambda); dependent variable scores; independent variables, levels; statistical data: significance, F-ratio scores, probability, means measured for each group across each level of the repeated factor, group standard deviations, and effect size. An example is below:

Presenting the Results from Mixed Between-Within ANOVA (Pallant, 2007, p. 274)

A mixed between-within subjects analysis of variance was conducted to assess the impact of two different interventions (Math skills, Confidence Building) on participants' scores on the Fear of Statistics Test, across three time periods (pre-intervention, post-intervention and 3-mth follow-up). There was no significant interaction between program type and time, Wilks Lambda = .87, F (2, 27) = 2.03, p = .15, partial eta squared = .13. There was a substantial main effect for time, Wilks Lambda = .34, F (2, 27) = 26.59, p

References

Moore, D. S., & McCabe, G. P. (2003). Introduction to the practice of statistics (4th ed.). New York: W. H. Freeman and Company.

Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.

Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.

VickyRN, MSN, DNP, RN · Mar 23, 2009

Analysis of covariance (ANCOVA) is a statistical technique that blends analysis of variance and linear regression analysis. It is a more sophisticated method of testing the significance of differences among group means because it adjusts scores on the dependent variable to remove the effect of confounding variables. ANCOVA is based on inclusion of additional variables (known as covariates) into the model that may be influencing scores on the dependent variable. (Covariance simply means the degree to which two variables vary together – the dependent variable covaries with other variables.) This lets the researcher account for inter-group variation associated not with the "treatment" itself, but with extraneous factors that influence the dependent variable, the covariate(s). ANCOVA can control one or more covariates at the same time.

The purpose of ANCOVA, then, is the following: to increase the precision of comparison between groups by reducing within-group error variance; and, to “adjust” comparisons between groups for imbalances by eliminating confounding variables.

In order to accurately identify possible covariates, one needs sufficient background knowledge of theory and research in the topic area. Ideally, there should only be a small number of covariates. Covariates need to be chosen carefully and should have the following qualities:

Continuous (at interval or ratio level, such as anxiety scores) or dichotomous (such as male/ female); reliable measurement; correlate significantly with the dependent variable; linear relationship with dependent variable; not highly correlated with one another (should not overlap in influence); and relationship with dependent variable the same for each of the groups (homogeneity of regression slopes).

Each covariate should contribute uniquely to the variance. Correct analysis requires that the covariate not be influenced by the treatment, so it must be measured before the intervention is performed.

The independent variable is a categorical (nominal-level) variable.

ANCOVA tests whether certain factors have an effect on the outcome variable after removing the covariate effects. It is capable of removing the obscuring effects of pre-existing individual differences among subjects. It allows the researcher to compensate for systematic biases among the samples. The inclusion of covariates can also increase statistical power because it accounts for some of the variability.

Assumptions: same as ANOVA (normal distribution, homogeneity of variance, random sampling); relationship of the dependent variable to the covariate(s) must be linear; observations must be independent; regression lines must be parallel (homogeneity of regression slopes); residuals normally distributed with a mean of zero; and homoscedasticity. The model assumes that the data in each group are well described by straight lines that have the same slope.

An example of ANCOVA is a pretest-posttest randomized experimental design, in which pretest scores are statistically controlled. In this case, the dependent variable is the posttest scores, the independent variable is the experimental/ comparison group status, and the covariate is the pretest scores.

With ANCOVA, the F-ratio statistic is used to determine the statistical significance (p ≤ .05) of differences among group means. Partial Eta Squared is used to determine effect size:

Small .01

Medium .06

Large .138

There are both one-way and two-way analyses of covariance.

When a researcher reports the results from analysis of covariance (ANCOVA), he or she needs to include the following information: verification of parametric assumptions; verification that covariate(s) measured before treatment; verification of reliability of the covariate(s); verification that covariates are not too strongly correlated with one another; verification of linearity; verification of homogeneity of regression slopes; dependent variable scores; independent variable, levels; covariate(s); statistical data: significance, F-ratio scores, probability, means, and effect size (partial eta squared). An example is below:

Presenting the results from one-way ANCOVA (Pallant, 2007, p. 303)

A one-way between-groups analysis of covariance was conducted to compare the effectiveness of two different interventions designed to reduce participants’ fear of statistics. The independent variable was the type of intervention (math skills, confidence building), and the dependent variable consisted of scores on the Fear of Statistics Test administered after the intervention was completed. Participants’ scores on the pre-intervention administration of the Fear of Statistics Test were used as the covariate in this analysis.

Preliminary checks were conducted to ensure that there was no violation of the assumptions of normality, linearity, homogeneity of variances, homogeneity of regression slopes, and reliable measurement of the covariate. After adjusting for pre-intervention scores, there was no significant difference between the two intervention groups on post-intervention scores on the Fear of Statistics Test, F (1, 27) = .76, p = .39, partial eta squared = .03. There was a strong relationship between the pre-intervention and post-intervention scores on the Fear of Statistics Test, as indicated by a partial eta squared value of .75.

Presenting the results from two-way ANCOVA (Pallant, 2007, p. 310)

A 2 by 2 between-groups analysis of covariance was conducted to assess the effectiveness of two programs in reducing fear of statistics for male and female participants. The independent variables were the type of program (math skills, confidence building) and gender. The dependent variable was scores on the Fear of Statistics Test (FOST), administered following completion of the intervention programs (Time 2). Scores on the FOST administered prior to the commencement of the programs (Time 1) were used as a covariate to control for individual differences.

Preliminary checks were conducted to ensure that there was no violation of the assumptions of normality, linearity, homogeneity of variances, homogeneity of regression slopes, and reliable measurement of the covariate. After adjusting for FOST scores at Time 1, there was a significant interaction effect, F (1, 25) = 31.7, p

In presenting the above results, the researcher should also provide a table of means for each of the groups.

References

Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.

Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.

VickyRN, MSN, DNP, RN · Mar 23, 2009

Regression analysis is a broad term for statistical techniques used to model and analyze numerical data consisting of values of a dependent variable y (also known as the response or predicted variable) and one or more independent variables x (also known as explanatory or predictor variables). Regression analysis is based on correlation. Correlation examines the strength and direction of the linear relationship between two variables, but does not signify causation.

With linear regression models, dependent variables need to be continuous, measured at the interval or ratio level, with scores normally distributed. Categorical dependent variables are not suitable for linear regression. The independent variables used in regression can be either continuous or dichotomous.

Simple linear regression

Simple linear regression studies the relationship between a dependent variable y and a single independent variable x. In doing so, it makes predictions about the values of the response variable y based on values of the explanatory variable x. The simple linear regression equation represents a straight line when the dependent variable is plotted against the independent variable.

The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations of the data points from the fitted line. Because they represent “left-over” variation in the response after fitting the regression line, these distances are called residuals. A residual is the difference between an observed value of the dependent variable and the value predicted by the regression line (residual = observed y - predicted y).

In linear regression analysis, the term linear does not refer to this straight line, but to the way the regression coefficients occur in the regression equation. A simple linear regression line has an equation of the form y = a + bx, where x is the independent variable and y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).
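The least-squares slope and intercept for y = a + bx can be computed by hand from the data. The sketch below uses made-up x and y values and a hypothetical function name; it also computes the residuals (observed y minus predicted y), which sum to zero for a least-squares line.

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + bx."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
a, b = fit_line(x_values, y_values)
# Residual = observed y - predicted y
residuals = [yi - (a + b * xi) for xi, yi in zip(x_values, y_values)]
print(f"y = {a:.1f} + {b:.1f}x")
```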

Multiple linear regression

Multiple regression is a method of predicting a continuous dependent variable y on the basis of two or more independent variables x. It models the relationship by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.

The model for multiple linear regression, given n observations, is

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$ for $i = 1, 2, \dots, n$.

The purpose of multiple regression is to learn more about the relationship between two or more independent variables x and a dependent variable y. It allows the researcher to investigate complex relationships among three or more variables. This statistical technique can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R square), and can establish the relative predictive importance of each of the independent variables (by comparing beta weights).

Research examples using multiple regression:

Predicting infant birth weight (dependent variable) based on three predictor variables: age of mother, smoking during pregnancy, and number of prior pregnancies.

Predicting average GPA scores based on three predictor variables: average high school grades in mathematics, science, and English.

Types of multiple regression

Major types of multiple regression include standard (simultaneous), hierarchical (sequential), and stepwise.

With standard multiple regression, there are several independent variables simultaneously predicting the dependent variable. A set of variables is used to explain (as a group) the variance in a dependent variable. This is the most commonly used multiple regression analysis.

With hierarchical regression, the independent variables are entered into the equation in a specified order. Each block of independent variables is assessed in its predictive strength of the dependent variable, while controlling for the contributions of any previous blocks. Once all sets of variables are entered, the overall model is assessed. This method is more in depth.

Assumptions of multiple regression:

Variables measured without error.

Random selection from the population of interest.

Normality – residuals (observed minus predicted values) are distributed normally.

Linearity – relationship between residuals and predicted dependent variables is a straight line.

Can only show strength of relationship (correlation) between variables, not causality.

Sample size – 20 cases per independent variable or 40 cases per independent variable for stepwise regression.

Very few outliers.

Multicollinearity – no redundancy among the independent variables.

Singularity – independent variables need to be mutually exclusive. They should not be subsets or combinations of other independent variables.

Homoscedasticity – variance of residuals about predicted dependent variables same for all predicted scores.

The test statistic used to generate p-values is the F-ratio, in which variability due to regression is contrasted with residual variability. R square gives the percent of variance in the dependent variable explained by the overall model (set of independent variables as a whole), and the F test assesses whether that proportion is statistically significant.
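The link between R square and the overall F-ratio can be sketched as F = (R² / df regression) / ((1 - R²) / df residual). The values below are taken from the hierarchical regression example in this post (47.4% of variance explained, 4 predictors); the sample size of 426 is an assumption inferred from the reported residual degrees of freedom of 421, and the function name is hypothetical.

```python
def f_from_r_squared(r_squared, n_predictors, n_cases):
    """Overall F-ratio for a regression model, computed from R square."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    # Variability due to regression contrasted with residual variability
    f_ratio = (r_squared / df_regression) / ((1 - r_squared) / df_residual)
    return f_ratio, df_regression, df_residual

f_model, df_reg, df_res = f_from_r_squared(0.474, 4, 426)
print(f"F({df_reg}, {df_res}) = {f_model:.1f}")
```

The result is close to the reported F (4, 421) = 94.78; the small gap comes from using the rounded R square value.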

Effect size measures

Effect size is shown by comparing the beta values for each independent variable, in relation to the variance of the model as a whole (R square). Beta weights represent the unique contribution of each independent variable to the R square, after removing overlapping effects of all other variables.

When values of B (the unstandardized coefficient) are divided by the standard error, the result is a t-statistic, which indicates the significance of each predictor. Beta (the standardized coefficient) is the average number of standard deviations by which the dependent variable changes when the independent variable increases one standard deviation and other independent variables are held constant. The independent variable with the largest beta weight is the one that exerts the largest unique explanatory effect on the dependent variable. The betas will change if variables or interaction terms are added or deleted from the equation, but are not affected by the reordering of variables.

Research information

When a researcher reports the results from a multiple regression statistical test, he or she needs to include the following information: type of analysis (standard or hierarchical); standardized beta values if the study was theoretical, or unstandardized beta coefficients with their standard errors if the study was applied; R square change values for each step and associated probability values (with hierarchical multiple regression); sample sizes; dependent variable, predictor variables, control variables; verification of assumptions (normality, linearity, multicollinearity, and homoscedasticity); R square scores, F-statistic, p-values, significance; beta values, p-values, and significance. An example follows.

Presenting the results from hierarchical multiple regression (Pallant, 2007, p. 164):

Hierarchical multiple regression was used to assess the ability of two control measures (Mastery Scale, Perceived Control of Internal States Scale: PCOISS) to predict levels of stress (Perceived Stress Scale), after controlling for the influence of social desirability and age. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity, and homoscedasticity. Age and social desirability were entered at Step 1, explaining 6% of the variance in perceived stress. After entry of the Mastery Scale and PCOISS Scale at Step 2 the total variance explained by the model as a whole was 47.4%, F (4, 421) = 94.78, p

References

Moore, D. S., & McCabe, G. P. (2003). Introduction to the practice of statistics (4th ed.). New York: W. H. Freeman and Company.

Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.

Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.

http://www.stat.yale.edu/courses/1997-98/101/linmult.htm

http://www.stat.yale.edu/courses/1997-98/101/linreg.htm

http://faculty.chass.ncsu.edu/garson/pa765/regress.htm

http://dss.princeton.edu/online_help/analysis/regression_intro.htm#smr

VickyRN, MSN, DNP, RN · Mar 28, 2009

factor analysis is a broad term for multivariate statistical methods used to identify common underlying variables called factors within a larger set of measures. basically, factor analysis determines which variables group or go together. a factor is a group of related variables representing an underlying domain or theme. factors are indicated by shared variance (covariances) among two or more variables. as variables are reduced to factors by inter-item correlational statistical analysis, relationships between the factors begin to emerge in the variables they represent. the observed variables are then modeled as linear combinations of the factors.

factor analysis empirically explores the interrelationships and dimensions among variables to cluster inter-correlated variables into smaller sets of basic factors. it reduces the number of variables and also classifies variables by exploring the underlying theoretical structure(s). the process simplifies data and eliminates redundant variables, unclear variables, and irrelevant variables.

factor analysis is commonly used in psychometric instrument development. for example, psychological questionnaires often aim to operationalize abstract psychological constructs, with multiple empiric indicators on the questionnaire measuring each construct (in order to enhance reliability and validity). factor analysis statistical techniques have been utilized in the formation and verification of tens of thousands of psychological screening and measurement tests.

definitions

observed variable: measured directly; a measured variable, an indicator, or a manifest variable.

latent construct: measured indirectly by determining its influence to responses on observed variables; a factor, underlying construct, or unobserved variable.

factor scores: estimates of underlying latent constructs.

eigenvalues: amount of variance explained by each factor.

orthogonal: 90 degree angle, perpendicular

oblique: other than a 90 degree angle

the main steps in factor analysis are: assessment of the suitability of data for factor analysis (meeting assumptions and assessing the correlation matrices), factor extraction, factor retention, and factor rotation/ interpretation.

assumptions of factor analysis:

all variables at least on ordinal scale. nominal data not appropriate for factor analysis.

overall sample size 150+ or ratio 20:1 (cases per variable).

random sampling; normal distribution.

linear relations among the variables.

few or no outliers.

factors independent of one another (no correlation).

there should be at least 5 salient variables for each factor.

no measurement error.

assessing the correlation matrix:

correlations of r = .3 to .8. initial communalities > .6. bartlett’s test of sphericity statistically significant at p .7. together, these tests suggest that the matrix is factorable.
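As a rough illustration of how Bartlett's test of sphericity is computed, the sketch below uses the usual chi-square approximation based on the determinant of the correlation matrix. The 3 x 3 correlation matrix and the sample size of 200 are made-up values, and the helper names are not from any particular package. Only the test statistic and its degrees of freedom are computed; the p value would come from a chi-square table or a statistics package.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return (chi-square, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    chi2 = -((n_cases - 1) - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical correlation matrix for three items, 200 respondents.
corr = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
chi2, df = bartlett_sphericity(corr, 200)
print(round(chi2, 2), df)
```

If the items were completely uncorrelated (an identity matrix), the determinant would be 1 and the statistic would be 0, which is why a large, significant value supports factorability.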

factor extraction: this phase condenses items in the data matrix into a smaller number of factors. it is used to define the number of underlying dimensions. it results in an unrotated factor matrix, which contains coefficients or weights for all original items on each extracted factor. two widely used factor extraction techniques are principal components analysis (pca) and common factor analysis.

according to costello and osborne (2005), common factor analysis is preferable to principal components analysis (pca), which is the “default” data reduction method in spss. principal components analysis transforms variables into smaller sets of linear combinations with all the variance in the variables utilized. this analysis technique can produce inflated values of variance in the components. common factor analysis utilizes only the shared variance amongst the items. each indicator is typically linked to only one factor, since cross-loadings suggest poorer construct validity.

besides the “default” pca, spss has six factor extraction methods from which to choose: unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring. costello and osborne recommend the common factor analysis methods of maximum likelihood extraction (for normally distributed data) or principal axis factoring (for non-normal data), over principal components analysis.

factor retention: there are a number of techniques used to determine which factors to retain: kaiser’s criterion, scree test, and parallel analysis. kaiser’s criterion retains all factors with eigenvalues greater than 1.0. this is considered one of the least accurate methods for deciding which factors to retain. the scree plot is a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis. the factors corresponding to the data points above the natural bend or “elbow” are retained. generally, this graph yields good results if there is a clear "break" in the plot of eigenvalues at the “elbow.” costello and osborne (2005) contend that the scree test is the best choice for researchers. pallant (2007), on the other hand, states that parallel analysis is the best approach to deciding the number of factors. parallel analysis compares the size of the eigenvalues with those obtained from a randomly generated data set of the same size. only those factors whose eigenvalues exceed the corresponding values from the random data set are retained.
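The logic of parallel analysis can be sketched in a few lines of plain Python. This is a simplified illustration, not the procedure used by SPSS or by Pallant: it averages eigenvalues over a small number of random normal data sets, whereas some implementations use a percentile instead of the mean. The observed eigenvalues, the sample size, and all function names are hypothetical.

```python
import math
import random


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def correlation_matrix(data):
    """Pearson correlation matrix for a list of rows (cases x variables)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    ss = [sum(r[j] ** 2 for r in centered) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centered) / math.sqrt(ss[i] * ss[j])
             for j in range(p)] for i in range(p)]


def parallel_analysis(observed_eigenvalues, n_cases, n_sims=20, seed=0):
    """Count leading factors whose eigenvalues beat the random-data averages."""
    rng = random.Random(seed)
    p = len(observed_eigenvalues)
    totals = [0.0] * p
    for _ in range(n_sims):
        data = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n_cases)]
        eigs = jacobi_eigenvalues(correlation_matrix(data))
        totals = [t + e for t, e in zip(totals, eigs)]
    criteria = [t / n_sims for t in totals]
    retained = 0
    for observed, criterion in zip(observed_eigenvalues, criteria):
        if observed <= criterion:
            break
        retained += 1
    return retained, criteria


# Hypothetical eigenvalues from a 5-item scale completed by 100 respondents.
observed = [2.8, 1.4, 0.4, 0.25, 0.15]
retained, criteria = parallel_analysis(observed, n_cases=100)
print(retained, [round(c, 2) for c in criteria])
```

Note that Kaiser's criterion would give the same answer for these made-up eigenvalues; the two rules tend to disagree when several eigenvalues sit just above 1.0.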

factor rotation and interpretation: rotation refers to the shifting of the factors in the "factor space" to maximize the interpretation of the factor loadings. the goal of rotation is to simplify, clarify, and interpret the data by either orthogonal or oblique methods of rotation. researchers typically evaluate rotated rather than unrotated factor loadings for the obvious reason that they are easier to interpret.

spss offers a variety of factor rotation choices; the loadings are then read from the rotated factor matrix after orthogonal rotation (varimax), or from the pattern matrix after oblique rotation (oblimin) (costello & osborne, 2005).

if the factors are truly independent and uncorrelated, orthogonal and oblique rotation methods produce essentially identical results and interpretations, but the orthogonal (varimax) method is preferred. if the factors are related, with some correlations, the oblique (oblimin) method is preferred. pallant (2007) suggests that the researcher start with oblimin rotation, as it provides information about the degree of correlation between the factors.
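To make the idea of orthogonal rotation concrete, the sketch below rotates a two-factor loading matrix through a grid of angles and keeps the angle that maximizes the raw varimax criterion (the variance of the squared loadings within each factor). This is a teaching simplification under stated assumptions: real software uses an iterative algorithm, usually with Kaiser normalization, and handles more than two factors. The loadings are invented for illustration.

```python
import math


def rotate(loadings, degrees):
    """Rotate a two-factor loading matrix by the given angle (orthogonal rotation)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings (raw varimax)."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squared = [row[j] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((x - mean) ** 2 for x in squared) / p
    return total


def communalities(loadings):
    """Variance in each item explained by the two factors together."""
    return [a * a + b * b for a, b in loadings]


# Hypothetical unrotated loadings: every item loads on both factors.
unrotated = [
    [0.70, 0.50], [0.60, 0.60], [0.65, 0.55],
    [0.60, -0.50], [0.70, -0.60], [0.55, -0.55],
]

best_angle = max(range(0, 90), key=lambda d: varimax_criterion(rotate(unrotated, d)))
rotated = rotate(unrotated, best_angle)

print("best angle:", best_angle)
for row in rotated:
    print([round(x, 2) for x in row])
```

The communalities are identical before and after rotation; only the way the explained variance is shared between the two factors changes, which is what makes the rotated loadings easier to read.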

exploratory factor analysis versus confirmatory factor analysis

exploratory factor analysis (efa) and confirmatory factor analysis (cfa) are two statistical approaches used to examine the internal reliability of a measure.

exploratory factor analysis

exploratory factor analysis (efa) explores and summarizes underlying correlational structure for a data set. it is used in the early stages of research to explore interrelationships among variable sets, to find the model. efa is complex, has few absolute guidelines, and offers many application choices. this factor analysis approach is used when the underlying structure of a data set is unknown, to determine which domains comprise a construct of interest. in efa, the researcher explores how many factors there are, whether the factors are correlated, and which observed variables appear to best measure each factor. efa can reduce a large set of variables to a couple of underlying factors.

exploratory factor analysis is used to discover the factor structure of a measure and to examine its internal reliability. efa is recommended when researchers have no preconceived hypotheses or prior theory about the nature of the underlying factor structure of their measure. as such, it is an inductive approach using the factor loadings to uncover the factor structure of the data. since efa is exploratory in scope, there are no inferential statistical processes. efa is not appropriate to use for testing hypotheses or theories, but only to clarify and describe relationships. efa is subject to error and a wide variation of subjective interpretations, even with optimal data and large samples. this method is as much an “art” as it is a “science.”

exploratory factor analysis has three basic decision points: (1) deciding the number of factors, (2) choosing an extraction method, and (3) choosing a rotation method. efa then hypothesizes the underlying construct, the latent structure (dimensions) of a set of variables.

confirmatory factor analysis

confirmatory factor analysis (cfa) is a set of more complex and sophisticated statistical techniques used later in the research process to confirm the hypotheses or theories concerning the underlying structure generated by efa. it is a hypotheses testing approach, used to test the model. confirmatory factor analysis tests the correlational structure of a data set against the hypothesised structure and rates the “goodness of fit.” cfa tests hypotheses that state the number of factors representing data and the items comprising each factor. in cfa, the researcher specifies a certain number of factors, which factors are correlated, and which observed variables measure each factor.

cfa seeks to determine if the number of factors conform to what is expected on the basis of pre-established theory. indicator variables are selected on the basis of prior theory and factor analysis is used to see if they load as predicted on the expected number of factors. a minimum requirement of confirmatory factor analysis is that the researcher hypothesizes beforehand the number of factors in the model. he or she should also posit expectations about which variables will load on which factors. the researcher seeks to determine, for instance, if measures created to represent a latent construct really belong together.

statistics:

exploratory factor analysis uses a correlation matrix to see if any variables are components of factors. shared variance indicates an underlying factor. the eigenvalue shows the amount of variance (a type of effect size). factors with larger eigenvalues account for greater variance as opposed to factors with lower eigenvalues. as discussed earlier, inferential statistics should not be used in efa. the statistical problem is not one of testing a given hypothesis, but rather one of fitting the model to the data to decide where common variances are. the dimensionality of a set of items emerges empirically.

confirmatory factor analysis uses a variance-covariance matrix to test hypotheses with inferential statistical techniques, using an advanced class of statistical techniques referred to as structural equation modeling (sem). in sem, the most frequently used estimation procedure is maximum likelihood estimation. cfa tests a measurement model by testing correlations (between observed and latent variables, pairs of latent variables, and among the errors). the researcher can compare the estimated matrices representing the relationships between variables in the model to the actual matrices. the researcher specifies a hypothesis by designating certain parameters in the factor matrices. the hypothesis is confirmed to the extent that the model still fits.

researcher information to report

when a researcher reports the results from factor analysis, he or she needs to include the following information: verification of assumptions; details of the method of factor extraction used; criteria used to determine the number of factors retained; type of rotation technique used; total variance explained; initial eigenvalues; eigenvalues after rotation; and a table of loadings showing all values.

an example follows.

presenting the results from factor analysis (pallant, 2007, pp. 197-198)

the 20 items of the positive and negative affect scale (panas) were subjected to principal components analysis (pca) using spss version 15. prior to performing pca, the suitability of data for factor analysis was assessed. inspection of the correlation matrix revealed the presence of many coefficients of .3 and above. the kaiser-meyer-olkin value was .87, exceeding the recommended value of .6 (kaiser 1970, 1974) and bartlett’s test of sphericity (bartlett 1954) reached statistical significance, supporting the factorability of the correlation matrix.

principal components analysis revealed the presence of four components with eigenvalues exceeding 1, explaining 31.2%, 17%, 6.1%, and 5.8% of the variance respectively. an inspection of the screeplot revealed a clear break after the second component. using cattell’s (1966) scree test, it was decided to retain two components for further investigation. this was further supported by the results of parallel analysis, which showed only two components with eigenvalues exceeding the corresponding criterion values for a randomly generated data matrix of the same size (20 variables x 435 respondents).

the two-component solution explained a total of 48.2% of the variance, with component 1 contributing 31.2% and component 2 contributing 17.0%. to aid in the interpretation of these two components, oblimin rotation was performed. the rotated solution revealed the presence of simple structure (thurstone 1947), with both components showing a number of strong loadings and all variables loading substantially on only one component. the interpretation of the two components was consistent with previous research on the panas scale, with positive affect items loading strongly on component 1 and negative affect items loading strongly on component 2. there was a weak negative correlation between the two factors (r = -.28). the results of this analysis support the use of the positive affect items and the negative affect items as separate scales, as suggested by the scale authors (watson, clark & tellegen 1988).
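The percentages in this example can be tied back to eigenvalues with simple arithmetic. The sketch assumes the PCA was run on the correlation matrix (standardized items), in which case the total variance equals the number of items and each eigenvalue is the proportion of variance times that number. The variable names are illustrative.

```python
n_items = 20                                  # PANAS items in the example
percent_variance = [31.2, 17.0, 6.1, 5.8]     # first four components, as reported

# With standardized items, total variance = number of items,
# so eigenvalue = (percent / 100) * number of items.
eigenvalues = [pct / 100 * n_items for pct in percent_variance]

# Kaiser's criterion: keep components with eigenvalues greater than 1.0.
kaiser_retained = sum(1 for e in eigenvalues if e > 1.0)

# Variance explained by the two components kept after the scree test.
two_component_total = percent_variance[0] + percent_variance[1]

print([round(e, 2) for e in eigenvalues])
print(kaiser_retained, round(two_component_total, 1))
```

All four implied eigenvalues exceed 1, which matches the report that Kaiser's criterion alone would have kept four components, while the scree test and parallel analysis pointed to two.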

references

costello, a. b., & osborne, j. w. (2005). best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. practical assessment, research & evaluation, 10(7), 1-9.

o’brien, k. (2007). factor analysis: an overview in the field of measurement. physiotherapy canada, 59, 142-155.

pallant, j. (2007). spss survival manual. new york: mcgraw-hill education.

polit, d. f., & beck, c. t. (2008). nursing research: generating and assessing evidence for nursing practice (8th ed.). philadelphia: wolters kluwer health.

http://www2.sas.com/proceedings/sugi30/203-30.pdf

http://www.soc.iastate.edu/sapp/soc512.efa.html

VickyRN, MSN, DNP, RN · Apr 14, 2009

structural equations comprehensively represent the complex multidimensional relations among research variables in a theory. structural equation modeling (or sem) is a sophisticated class of multivariate analytic statistical techniques used to examine the underlying relationships, or structure, among variables in a model. sem allows the researcher to model, test, and reduce hypothesized relationships among a set of observed variables. sem seeks to represent hypotheses about the means, variances, and covariances of observed data in terms of parameters defined by a hypothesized underlying model.

a structural equation model implies a structure of the variance-covariance matrix of the measures. researchers test whether variables are interrelated through a set of linear relationships by examining the variances and covariances of the variables. it helps answer questions about whether sample data are consistent with the hypothesized model.
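To show what it means for a model to imply a covariance structure, here is a minimal sketch for a one-factor model with three indicators. Under that model, the implied covariance between two indicators is the product of their loadings times the factor variance, and each implied variance adds the indicator's error variance. The loadings and the sample matrix are invented numbers, and real SEM software estimates the parameters rather than taking them as given.

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Model-implied covariance matrix for a one-factor measurement model."""
    p = len(loadings)
    return [
        [
            loadings[i] * loadings[j] * factor_variance
            + (error_variances[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical standardized solution: factor variance fixed at 1,
# error variances chosen so each implied variance equals 1.
loadings = [0.8, 0.7, 0.6]
errors = [1 - l * l for l in loadings]
sigma = implied_covariance(loadings, 1.0, errors)

# Hypothetical sample correlation matrix for the same three indicators.
sample = [
    [1.00, 0.58, 0.46],
    [0.58, 1.00, 0.44],
    [0.46, 0.44, 1.00],
]

# Residuals: observed minus implied. Small residuals suggest the data
# are consistent with the hypothesized model.
residuals = [[sample[i][j] - sigma[i][j] for j in range(3)] for i in range(3)]
for row in residuals:
    print([round(x, 2) for x in row])
```

Fit statistics in SEM packages are, in essence, summaries of how large these residuals are across the whole matrix.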

sem can clearly summarize results that generate a large number of interrelated measures. variables can be treated as both independent variables and dependent variables. it allows examination of a set of relationships between one or more independent variables, either continuous or discrete, and one or more dependent variables, either continuous or discrete. independent variables are called exogenous or upstream variables; dependent or mediating variables are called endogenous or downstream variables.

sem deals with both observed and latent variables. an observed or manifest variable is a variable that can be observed directly and is measurable. a latent or unobserved variable is a variable that cannot be observed directly (such as intelligence or attitude) and must be inferred from measured variables. latent variables (or factors) are implied by the covariances among two or more measured variables. in sem, the focus is usually on latent variables, rather than on the observed variables used to measure these constructs. sem allows multiple measures to be associated with a single latent construct.

sem is a hybrid of multiple regression and factor analysis techniques, belonging to the general linear model family. sem analyzes relationships among latent variables by combining the strengths of factor analysis and multiple regression into a single model that can be tested statistically. like multiple regression, this model allows for the evaluation of direct and indirect effects of variables in a model. unlike multiple regression, sem allows all variables to be examined simultaneously, testing an entire hypothesized multivariate model. sem allows simultaneous assessment of the strength and direction of the interrelationships among multiple dependent and independent variables, examining the direct and indirect effects of one variable upon another.

structural equation modeling encompasses such diverse statistical techniques as path analysis, confirmatory factor analysis, causal modeling with latent variables, and even analysis of variance and multiple linear regression. major applications of sem include:

- causal modeling or path analysis

- confirmatory factor analysis

- second order factor analysis

- regression models

- covariance structure models

- correlation structure models

most structural equation models can be expressed as path diagrams. path diagrams are similar to flowcharts, with lines, arrows, and geometric figures. they show the way observed and unobserved variables are inter-related, as well as showing which variables cause changes in other variables. ovals or circles represent latent variables, while rectangles or squares represent measured variables. residuals are unobserved, so they are represented by ovals or circles. correlations and covariances are represented by bidirectional arrows, which represent relationships without an explicitly defined causal direction.

this is an example of a path diagram used in structural equation modeling: http://ssc.utexas.edu/consulting/tutorials/stat/amos/images/wheaton_diagram.gif

path analysis is commonly used to evaluate direct and indirect associations among observed variables. sem goes beyond the information provided by path analysis by allowing a more precise estimation of the indirect effects of independent variables on all dependent variables. sem allows researchers to test theories and assumptions directly by specifying which variables are related to other variables. that is, the researcher can test some paths (or relationships) but not others in the analysis.
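Direct and indirect effects are easy to illustrate with a three-variable path model in which X affects Y both directly and through a mediator M. With standardized path coefficients, the indirect effect is the product of the coefficients along the path and the total effect is the sum of direct and indirect effects. The coefficients below are hypothetical.

```python
# Hypothetical standardized path coefficients.
path_x_to_m = 0.5   # X -> M
path_m_to_y = 0.4   # M -> Y
path_x_to_y = 0.2   # X -> Y (direct effect)

# Indirect effect: product of the coefficients along X -> M -> Y.
indirect_effect = path_x_to_m * path_m_to_y

# Total effect: direct effect plus indirect effect.
total_effect = path_x_to_y + indirect_effect

print("direct:", path_x_to_y)
print("indirect:", round(indirect_effect, 2))
print("total:", round(total_effect, 2))
```

In this made-up case half of the total effect of X on Y runs through the mediator, something a single regression coefficient for X would not reveal.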

sem has distinct advantages over an ordinary least squares multiple regression approach. sem tests both conceptual and measurement models simultaneously, tests latent variable structure, allows for multiple measures of independent variables, adjusts for measurement error, and utilizes the measurement model to identify the errors of measurement. with sem, there is no assumption that the observed variables are measured without error. thus, sem is more flexible and realistic in that it allows for measurement error and does not require perfect reliability. sem allows researchers to examine relationships among latent variables with multiple observed variables. the researcher can evaluate the real-world scenario of observed variables' simultaneous impact on one another, without having to make artificial decisions about blocking or order of entry. the relationships among latent variables are purged of measurement error, leading to more accurate and often stronger relationships between latent variables than what would be observed using multivariate methods that consider observed variables only (such as manova or multiple regression).
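Why relationships between latent variables often look stronger than those between observed scores can be illustrated with the classical correction for attenuation. This is not what SEM software literally computes, only a simple analogue of the same idea: measurement error pulls observed correlations toward zero. The correlation and reliability values are hypothetical.

```python
import math


def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation due to measurement error."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: observed correlation and the reliability of each scale.
r_observed = 0.40
reliability_x = 0.80
reliability_y = 0.70

r_latent = disattenuated_correlation(r_observed, reliability_x, reliability_y)
print(round(r_latent, 3))
```

With perfectly reliable measures (both reliabilities equal to 1) the corrected and observed correlations coincide, which is the situation ordinary multiple regression implicitly assumes.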

sem is suited to theory testing rather than theory development. the researcher first specifies a model based on theory, then determines how to measure constructs of interest (i.e., how to operationalize these with a reliable and valid measurement instrument), collects data, and then inputs the data into the sem software package. the package fits the data to the specified model and produces the results, which include overall model fit statistics and parameter estimates. the researcher then makes modifications. all sem analyses follow a logical sequence of these five steps or processes: model specification, model identification, model estimation, model testing, and model modification.

sem is sometimes referred to as causal modeling. it allows for assessment of indirect causal paths to outcomes, as well as the testing of alternative models. sem is used to test the reasonableness of alternative hypotheses regarding the causal relationships between various measures, and their relationships to underlying dimensions or latent variables. sem can be used to analyze causal models involving latent variables. with sem, researchers can pose complex models that evaluate the direct and indirect impact of several variables on one or more outcome variables. these complex models can predict various types of outcomes. the researcher must keep in mind that correlation is not causation, even if the correlation is complex and multivariate. what causal modeling does allow is examination of the extent to which data agree or fail to agree with a model of causality.

sem models have two basic elements: a measurement model and a structural equation model. the measurement model describes the indicators (observed measures) of the latent variables. this corresponds to a confirmatory factor analysis, in which a measurement model is tested. the structural model delineates the direct and indirect effects among latent variables, specifying how the latent variables or hypothetical constructs are measured in terms of the observed variables. it also describes the measurement properties, the validities and reliabilities, of the observed variables. using sem allows the researcher greater options for estimation, including the most commonly used maximum likelihood methods along with numerous statistical indices for evaluating model fit.

when there is evidence of an adequate fit of the data to the hypothesized measured model, the theoretical causal model is tested by structural equation modeling. the structural equation model specifies the causal relationships among the latent variables and describes the underlying effects and the amount of unexplained variance. in this part of the analysis, sem yields information about the hypothesized causal parameters – that is, the path coefficients, which are presented as beta weights. the coefficients indicate the expected amount of change in the latent endogenous variable that is caused by a change in the latent causal variable. sem programs provide information on the significance of individual paths. the residual terms (amount of unexplained variance for the latent endogenous variables) can also be calculated from the sem analysis. the overall fit of the causal model to the research data can be tested by means of several alternative statistics. two such statistics are the goodness-of-fit (gfi) and the adjusted goodness-of-fit (agfi). for both indexes, a value of .90 or greater indicates a good fit of the model to the data.

three key requirements of sem are as follows: thorough knowledge of the theory; adequate assessment of statistical criteria; and parsimony (ability to predict the greatest amount of variance in the outcome variable or variables using the smallest number of predictor variables).

assumptions: any analysis in sem assumes that the model has been specified correctly, that the sample size is sufficiently large (e.g., n > 200), that there is independence of observations, that multivariate data are distributed normally, that there are linear relationships among the observed variables, and that there is an absence of highly correlated observed variables (e.g., r > .90). examining residuals after a model has been reduced helps the researcher to determine the extent to which the errors in prediction are distributed normally within acceptable ranges.

interpreting results of sem depends on the quality of the measured data and generalizability of the sample. sem allows the researcher to evaluate the importance of each independent variable in the model and to test the overall fit of the model to the data. a good fit of the specified measurement or structural model to the observed data indicates that the model is consistent with the relationships within the observed data. the researcher asks, "does it fit well enough to usefully approximate reality and to furnish a reasonable explanation of the data trends?" once the researcher obtains a model that fits well, is theoretically consistent, and provides statistically significant parameter estimates, the researcher must interpret it in the light of the research questions and then distill the results in written form for publication. the fact that the model fits the data does not necessarily imply that the model is the correct one. there may be other equivalent models that fit the data equally well. there may also be non-equivalent alternative models that fit the data better than this model. researchers should strive to test and rule out likely alternative models whenever possible.

spss does not have a built-in structural equation modeling module, but it does support an “add on” called amos (analysis of moment structures). lisrel (linear structural relations) and eqs are other statistical packages for doing sem.

research example: structural equation modeling being used to examine the hypothesized causal and correlational links among racism, chronic stress emotions, and blood pressure.

references

buhi, e. r., goodson, p., & neilands, t. b. (2007). structural equation modeling: a primer for health behavior researchers. american journal of health behavior, 31(1), 74-85.

clayton, m. f., & pett, m. a. (2008). amos versus lisrel: one data set, two analyses. nursing research, 57(4), 283-292.

hays, r. d., revicki, d., & coyne, k. s. (2005). application of structural equation modeling to health outcomes research. evaluation & the health professions, 28, 295-309.

musil, c. m., jones, s. l., & warner, c. d. (1998). structural equation modeling and its relationship to multiple regression and factor analysis. research in nursing & health, 21, 271-281.

pallant, j. (2007). spss survival manual. new york: mcgraw-hill education.

polit, d. f., & beck, c. t. (2008). nursing research: generating and assessing evidence for nursing practice (8th ed.). philadelphia: wolters kluwer health.

http://faculty.chass.ncsu.edu/garson/pa765/structur.htm

http://www.statsoft.com/textbook/stsepath.html

http://www2.gsu.edu/~mkteer/sem.html

http://ms.cc.sunysb.edu/~dsdwyer/factor.doc

http://ssc.utexas.edu/consulting/tutorials/stat/amos/