Multiple Linear Regression

    Regression analysis is a broad term for statistical techniques used to model and analyze numerical data consisting of values of a dependent variable y (also known as the response or predicted variable) and one or more independent variables x (also known as explanatory or predictor variables). Regression analysis is based on correlation. Correlation examines the strength and direction of the linear relationship between two variables, but does not signify causation.

    With linear regression models, the dependent variable needs to be continuous, measured at the interval or ratio level, with scores normally distributed. Categorical dependent variables are not suitable for linear regression. The independent variables used in regression can be either continuous or dichotomous.

    Simple linear regression

    Simple linear regression studies the relationship between a dependent variable y and a single independent variable x. In doing so, it makes predictions about the values of the response variable y based on values of the explanatory variable x. The simple linear regression equation represents a straight line when the dependent variable is plotted against the independent variable.

    The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations of each data point from the line. Because they represent “left-over” variation in the response after fitting the regression line, these deviations are called residuals. A residual is the difference between an observed value of the dependent variable and the value predicted by the regression line (residual = observed y - predicted y).

    In linear regression analysis, the term linear does not refer to this straight line, but to the way the regression coefficients occur in the regression equation. A simple linear regression line has an equation of the form y = a + bx, where x is the independent variable and y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).
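
    To make the least-squares idea concrete, here is a minimal sketch in plain Python that fits y = a + bx and computes the residuals (observed y minus predicted y). The data values and the helper name fit_line are made up for illustration and are not from any of the cited sources.

```python
# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
# Residual = observed y - predicted y.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

    With an intercept in the model, the residuals from a least-squares fit sum to zero (up to rounding), which is a quick sanity check on the arithmetic.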

    Multiple Linear Regression

    Multiple regression is a method of predicting a continuous dependent variable y on the basis of two or more independent variables x. It models the relationship by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.

    The model for multiple linear regression, given n observations, is
    $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$ for $i = 1, 2, \dots, n$.
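
    As an illustration of this model, the following sketch fits a multiple regression with three predictors by least squares using NumPy. The data are simulated, and the "true" coefficients are arbitrary assumptions chosen only so the example has something to recover.

```python
import numpy as np

# Simulated data: n observations, 3 predictors (all values are hypothetical).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([1.0, 2.0, -0.5, 0.0])  # assumed intercept, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

# Add a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(n), X])

# Least-squares estimates of the intercept and slopes.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

fitted = design @ beta_hat
residuals = y - fitted  # observed minus predicted

print("estimated coefficients:", np.round(beta_hat, 3))
```

    The estimates should land near the assumed values, with the gap reflecting the random error term in the simulation.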

    The purpose of multiple regression is to learn more about the relationship between two or more independent variables x and a dependent variable y. It allows the researcher to investigate complex relationships among three or more variables. This statistical technique can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R-squared), and can establish the relative predictive importance of each of the independent variables (by comparing beta weights).

    Research examples using multiple regression:

    Predicting infant birth weight (dependent variable) based on three predictor variables: age of mother, smoking during pregnancy, and number of prior pregnancies.

    Predicting average GPA scores (dependent variable) based on three predictor variables: average high school grades in mathematics, science, and English.

    Types of multiple regression

    Major types of multiple regression include standard (simultaneous), hierarchical (sequential), and stepwise.

    With standard multiple regression, there are several independent variables simultaneously predicting the dependent variable. A set of variables is used to explain (as a group) the variance in a dependent variable. This is the most commonly used multiple regression analysis.

    With hierarchical regression, the independent variables are entered into the equation in a specified order. Each block of independent variables is assessed for how well it predicts the dependent variable, while controlling for the contributions of any previous blocks. Once all sets of variables are entered, the overall model is assessed. This method gives a more in-depth picture, because it shows how much each block adds beyond the blocks entered before it.
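
    The block-by-block logic of hierarchical regression can be sketched as follows. This is only an illustration on simulated data: the variable names (controls, predictors), the effect sizes, and the helper r_squared are assumptions, not part of any cited analysis. The F change formula is the usual one for comparing nested models.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of a least-squares fit of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Simulated data (hypothetical): two control variables, two predictors of interest.
rng = np.random.default_rng(1)
n = 300
controls = rng.normal(size=(n, 2))    # block entered at step 1
predictors = rng.normal(size=(n, 2))  # block entered at step 2
y = (0.3 * controls[:, 0]
     + 0.8 * predictors[:, 0]
     - 0.6 * predictors[:, 1]
     + rng.normal(size=n))

# Step 1: controls only. Step 2: controls plus the predictors of interest.
r2_step1 = r_squared(controls, y)
r2_step2 = r_squared(np.column_stack([controls, predictors]), y)
r2_change = r2_step2 - r2_step1

# F change for the block added at step 2.
k_added = predictors.shape[1]
p_full = controls.shape[1] + k_added
df_resid = n - p_full - 1
f_change = (r2_change / k_added) / ((1.0 - r2_step2) / df_resid)

print(f"R-squared step 1: {r2_step1:.3f}")
print(f"R-squared step 2: {r2_step2:.3f}")
print(f"R-squared change: {r2_change:.3f}, F change({k_added}, {df_resid}) = {f_change:.2f}")
```

    The R-squared change is the share of variance explained by the second block over and above the first, which is the quantity reported at each step of a hierarchical analysis.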

    Assumptions of multiple regression:

    Variables are measured without error.
    Random selection from the population of interest.
    Normality – residuals (observed minus predicted values) are distributed normally.
    Linearity – relationship between residuals and predicted dependent variable scores is a straight line.
    Can only show strength of relationship (correlation) between variables, not causality.
    Sample size – 20 cases per independent variable, or 40 cases per independent variable for stepwise regression.
    Very few outliers.
    Multicollinearity – no redundancy among the independent variables.
    Singularity – independent variables need to be mutually exclusive. They should not be subsets or combinations of other independent variables.
    Homoscedasticity – variance of residuals about predicted dependent variable scores is the same for all predicted scores.
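
    One of these assumptions, multicollinearity, is often screened with the variance inflation factor (VIF), which regresses each independent variable on the others. The sketch below is one possible way to compute it with NumPy on simulated data. The helper name vif and the data are assumptions for illustration, and it covers only this one assumption, not the full list.

```python
import numpy as np


def vif(X):
    """VIF per column of X: 1 / (1 - R^2) from regressing it on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        # r2 close to 1 means the column is almost redundant, so the VIF blows up.
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Simulated predictors (hypothetical): x3 is nearly a copy of x1.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF for x1, x2, x3:", np.round(vifs, 1))
```

    Here x1 and x3 get very large VIFs while x2 stays near 1. Cut-offs such as "VIF above 10" are rules of thumb rather than fixed standards, so treat any threshold as a judgment call.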

    The test statistic used to generate p-values is the F-ratio, in which variability due to regression is contrasted with residual variability. R-squared gives the percent of variance in the dependent variable explained by the overall model (set of independent variables as a whole), and the significance test of R-squared shows whether that percentage is statistically significant.
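
    The contrast between regression and residual variability can be written out directly. The sketch below, on simulated data with assumed coefficients, computes the F-ratio from sums of squares and confirms that it matches the equivalent formula based on R-squared.

```python
import numpy as np

# Simulated data (hypothetical): n cases, p independent variables.
rng = np.random.default_rng(3)
n, p = 120, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.4, 0.3]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta

# Partition the total variability in y.
ss_total = ((y - y.mean()) ** 2).sum()
ss_resid = ((y - fitted) ** 2).sum()
ss_reg = ss_total - ss_resid

r2 = ss_reg / ss_total

# F-ratio: mean square due to regression over mean square residual.
ms_reg = ss_reg / p
ms_resid = ss_resid / (n - p - 1)
f_ratio = ms_reg / ms_resid

# The same statistic expressed through R-squared.
f_from_r2 = (r2 / p) / ((1.0 - r2) / (n - p - 1))

print(f"R-squared = {r2:.3f}, F({p}, {n - p - 1}) = {f_ratio:.2f}")
```

    Turning the F-ratio into a p-value needs the F distribution. If SciPy is available, scipy.stats.f.sf(f_ratio, p, n - p - 1) gives the upper-tail probability; that call is not run here to keep the sketch dependent on NumPy only.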

    Effect size measures

    Effect size is shown by comparing the beta values for each independent variable, in relation to the variance explained by the model as a whole (R-squared). Beta weights represent the unique contribution of each independent variable to the R-squared, after removing overlapping effects of all other variables.

    When an unstandardized coefficient b is divided by its standard error, the result is a t-statistic, which indicates the significance of each predictor. Beta (the standardized coefficient) is the average amount, in standard deviations, by which the dependent variable changes when the independent variable increases one standard deviation and other independent variables are held constant. The independent variable with the largest beta weight (in absolute value) is the one that exerts the largest unique explanatory effect on the dependent variable. The betas will change if variables or interaction terms are added or deleted from the equation, but are not affected by the reordering of variables.
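
    The difference between unstandardized coefficients, their t-statistics, and standardized beta weights may be easier to see in code. The following is a sketch on simulated data. The two predictors are deliberately put on very different scales, and all names and values are assumptions for illustration.

```python
import numpy as np

# Simulated data (hypothetical): two predictors on very different scales.
rng = np.random.default_rng(4)
n = 150
X = rng.normal(size=(n, 2)) * np.array([1.0, 10.0])
y = 2.0 + 1.5 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(size=n)

p = X.shape[1]
design = np.column_stack([np.ones(n), X])

# Unstandardized coefficients b (intercept first).
b, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ b

# Standard errors from the residual variance and the design matrix.
sigma2 = (resid @ resid) / (n - p - 1)
cov_b = sigma2 * np.linalg.inv(design.T @ design)
se_b = np.sqrt(np.diag(cov_b))

# t-statistic for each coefficient: b divided by its standard error.
t_stats = b / se_b

# Standardized betas: rescale each slope by sd(x) / sd(y).
beta_std = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Cross-check: refitting on z-scores gives the same betas.
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)
beta_z, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Xz]), yz, rcond=None)

print("b (slopes):       ", np.round(b[1:], 3))
print("t-statistics:     ", np.round(t_stats[1:], 2))
print("standardized beta:", np.round(beta_std, 3))
```

    The raw slopes differ by an order of magnitude because of the measurement scales, while the standardized betas put both predictors on a common footing, which is why beta weights are the ones compared for relative importance.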

    Research information

    When a researcher reports the results from a multiple regression statistical test, he or she needs to include the following information: type of analysis (standard or hierarchical); standardized beta values if the study was theoretical, or unstandardized beta coefficients with their standard errors if the study was applied; R-squared change values for each step and associated probability values (with hierarchical multiple regression); sample sizes; dependent variable, predictor variables, and control variables; verification of assumptions (normality, linearity, multicollinearity, and homoscedasticity); R-squared scores, F-statistic, p-values, and significance; and beta values with their p-values and significance. An example follows.

    Presenting the results from hierarchical multiple regression (Pallant, 2007, p. 164):

    Hierarchical multiple regression was used to assess the ability of two control measures (Mastery Scale, Perceived Control of Internal States Scale: PCOISS) to predict levels of stress (Perceived Stress Scale), after controlling for the influence of social desirability and age. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity, and homoscedasticity. Age and social desirability were entered at Step 1, explaining 6% of the variance in perceived stress. After entry of the Mastery Scale and PCOISS Scale at Step 2 the total variance explained by the model as a whole was 47.4%, F (4, 421) = 94.78, p < .001. The two control measures explained an additional 42% of the variance in stress, after controlling for age and socially desirable responding, R squared change = .42, F change (2, 421) = 166.87, p < .001. In the final model, only the two control measures were statistically significant, with the Mastery Scale recording a higher beta value (beta = -.44, p < .001) than the PCOISS Scale (beta = -.33, p < .001).


    Moore, D. S., & McCabe, G. P. (2003). Introduction to the practice of statistics (4th ed.). New York: W. H. Freeman and Company.

    Pallant, J. (2007). SPSS survival manual. New York: McGraw-Hill Education.

    Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice (8th ed.). Philadelphia: Wolters Kluwer Health.

    DSS - Introduction to Regression
    Last edit by Joe V on Jan 8, '15

  2. About VickyRN

    Joined: Mar '01; Posts: 12,046; Likes: 6,493
    Nurse Educator; from US
    16 year(s) of experience in Gerontological, cardiac, med-surg, peds


  3. by   danh3190
    In my former life as a paint research chemist we always did multiple variable experiments. They were so much more efficient in that we could see interactions among multiple factors. We could do simpler experiments to evaluate which of many possible factors had any effect at all on the response, then more detailed experiments using just those factors (e.g. catalyst, UV absorber, crosslinkers) to optimize multiple responses, (e.g. drying time, price, durability). In our field we tended to do one type of stepwise regression so it was nice to see a broader description of the process.
  4. by   anoro
    This article could have saved me weeks in that semester of grad. school & I might have actually stayed. thanks
  5. by   BBFRN
    I am so printing this out. Thanks for posting this!