STATS SESSION 3: INFERENTIAL STATISTICS (PART 2)

License: CC BY-SA 4.0 Available on Github

Changelog

  • 2025 Version for R, extended with interaction effects in regression modeling (Frans van der Sluis)
  • 2019 Original version for SPSS (Lorna Wildgaard, Haakon Lund, and Toine Bogers)

Topics

  • Correlation analysis (e.g., Pearson’s r, Spearman’s rho)
  • Chi-square tests for categorical data
  • Hierarchical multiple regression
  • Interaction effects
  • Data visualization for inferential statistics (scatterplots, bar charts)

Todo: Add resources

Introduction

The goal of this part of the lab session is to learn how to calculate correlations and how to perform Chi-square tests. In addition, it shows how to analyze questions where participants can select multiple answers. We will continue using the OPTI/QWERTY data set. As usual, try to solve the questions by yourself, with the help of your fellow students before you peek ahead. Now let’s start doing some inferential statistics!

Before beginning, ensure that you have already completed Lab 1 (Descriptive Statistics) and Lab 2 (Inferential Statistics Part 1) and loaded your dataset into R.

Introduction

The goal of this lab session is to learn how to calculate correlations and perform Chi-square tests using R. In addition, you will learn how to analyze questions where participants can select multiple answers. We will continue using the OPTI/QWERTY data set.

Remember: Try to solve the questions on your own before checking the answers at the end of this document.

2. Correlation

Correlation measures the strength of the relationship between two variables. One interesting possible relationship is between the number of messages a person sends per day and their average typing speed (WPM) on the OPTI or QWERTY keyboard. Initially, we will use our questionnaire data only. Prepare by loading questionnaire_data_R_2024.xlsx.

A good starting point for examining the relationship between two variables is to create a scatterplot. For example, to plot Age (X-axis) vs. Hours_of_computer_use_per_day (Y-axis):

library(ggplot2)

# Replace 'data' with your dataset name
ggplot(questionnaire_data, aes(x = Age, y = Hours_of_computer_use_per_day)) +
  geom_point() +
  labs(title = "Scatterplot of Age vs. Hours_of_computer_use_per_day",
       x = "Age",
       y = "Hours of Computer Use")

Next, compute the correlation. For ratio/interval variables, Pearson’s r is commonly used. For example:

cor_test_result <- cor.test(questionnaire_data$Age, questionnaire_data$Hours_of_computer_use_per_day, method = "pearson")
cor_test_result

Questions:

  1. Which correlation measure(s) are appropriate for these variables? Pearson’s r, Spearman’s rho, or Kendall’s Tau?
  2. What is the correlation between AGE and HOURS_COMPUTER_USE? Formulate the null and alternative hypotheses, calculate the correlation at α = 0.05, and report the results. What does this result mean for the relationship between these variables?

Example APA-style report: “A Pearson correlation analysis revealed that there was no significant relationship between age and hours of computer use, r(64) = .08, p = .508.”

3. Chi-square Test

The Chi-square test allows us to test whether there is a significant relationship between two ordinal/nominal variables. More specifically, it tests whether the frequencies of observed events differ significantly from what we would expect to find if all events were equally likely.

Let’s dive into our data set and investigate whether female students send more messages per dag than male students. For us to investigate this we need to check whether there are differences in the observed and expected frequencies of messages sent for different genders.

If there is no difference between the likelihood of these different categories, we would expect that there are as many female students sending a lot of messages as male students, and as many female students sending few messages just like their male counterparts. The Chi-square test tells us whether this is the case.

Step 1: Variable definition

Before running the Chi-square test, ensure your messaging variable is recoded into ordinal groups. Re-run your recoding code from Stats1 if needed:

library(dplyr)
questionnaire_data <- questionnaire_data %>%
  mutate(MessagesCategory = case_when(
    Messages_per_day <= 10 ~ "10 or less",
    Messages_per_day >= 11 & Messages_per_day <= 50 ~ "11 to 50",
    Messages_per_day > 50 ~ "More than 50"
  ))

Next, set the order of the categories:

questionnaire_data$MessagesCategory = factor(questionnaire_data$MessagesCategory, levels = c("10 or less", "11 to 50", "More than 50"), ordered = TRUE)

This turns MessagesCategory from a nominal into an ordinal scale.

Then, let us just remind ourselves of the data:

Now let’s see if there is a relationship between the amount of messages our sample sends per day and gender or is messaging equally likely regardless of gender (our null hypothesis)?

Question:
  1. Formulate the null/alternative hypotheses belonging to this research question.

Step 2: Create a Contingency Table

A contingency table displays the observed frequencies of respondents in each combination of categories. In our case, it shows how many males and females fall into each MessagesCategory (for example, “<=10”, “11-50”, etc.). This table is the foundation for performing the Chi-square test:

tab <- table(questionnaire_data$Sex, questionnaire_data$MessagesCategory)
addmargins(tab) # adds row/column sums

Step 3: Perform the Chi-square Test

The Chi-square test evaluates whether the differences in frequencies across categories are statistically significant compared to what we would expect if there were no association between gender and messaging behavior. A significant result (p < 0.05) would indicate that the observed differences in messaging frequency between genders are unlikely due to chance:

# Perform the Chi-square test
chi_test <- chisq.test(tab)
chi_test
Question:
  1. Report the results of the Chi-square test.

For example, A Chi-square test revealed no significant relationship between gender and messages sent per day, χ²(2, N = 64) = 1.10, p = .78.

Note: Ensure that there are at least 5 cases per cell in your contingency table for the test to be reliable.

Step 4 (Bonus): Visualize the Data with a Faceted Bar Chart

Although the contingency table and Chi-square test offer numerical insights, a visual display helps illustrate the differences in distributions between men and women. A faceted chart is ideal for comparing messages-per-day distributions between groups.

You can find a tutorial here. Skip to the first mention of facet_wrap. Be aware of a few differences:

  • The tutorial uses geom_bar(stat = "identity"), which requires a predefined y-value. Instead, we will use geom_bar(stat = "count"), which automatically counts observations.
  • Because of this, unlike the tutorial, we do not need to specify y in aes().
  • Consider:
    • What should be our x variable: Sex or MessagesCategory?
    • What should be our facet_wrap variable: Sex or MessagesCategory?
  • The tutorial sets a fixed fill color (fill="forest green"). Instead, you can set the fill based on MessagesCategory by adding fill = MessagesCategory inside aes(). Looks nicer :-) (And allows for color-matching between the two distributions).

4. Hierarchical Multiple Regression

It is time to address the question if the OPTI or QWERTY keyboards are in fact faster than one another if we control for a number of variables (age, number of messages sent per day, group 1 or 2).

RQ: If we control for the possible effect of age and the participants use of sending messages in their daily life, can our statistical model predict that the participants type significantly faster on the QWERTY keyboard vs the OPTI keyboard?

To address the question, we can do a hierarchical regression. There are many other types of regression models, but we do not have time to go through them all. Some models are more appropriate on some datasets than others (size and type), the purpose of your investigation and importantly the models differ in the amount of control you have over the statistical analysis. Doing a hierarchical regression, means we enter the predictor (independent) variables in steps, in an order we decide. In the first step, we use Keyboard, Group and Messages_per_day in the analysis. In the second step, we add Age. Now we can see whether our variables can explain the variance between the WPM on the keyboards.

We will answer the research question using our keyboard_data dataframe, so be sure to load that one first. In R, you can perform hierarchical regression using the lm() function and compare models using anova():

# Model 1: Predict OPTI_MEAN using GROUP and MESSDAY
model1 <- lm(WPM ~ Group + Keyboard + Messages_per_day, data = keyboard_data)
print(summary(model1))

# Model 2: Add AGE_GROUP to the predictors
model2 <- lm(WPM ~ Group + Keyboard + Messages_per_day + Age, data = keyboard_data)
print(summary(model2))

# Compare the models to see the added variance explained by AGE_GROUP
anova(model1, model2)

Step 1: Evaluating the model

Check the R Squared values in the Model Summary tables. After the variables in model1 (Keyboard, Group and Messages_per_day) have been entered, the overall model explains 20.7% of the variance in average scores (R Square). After model2 variables have also been entered (Age) and included, the model as a whole explains 21.7% of the variance in average WPM.

Question:
  1. Read the regression results. How much additional variance does model2 explain in comparison to model1? Is this difference significant following the ANOVA test?

Step 2: Evaluating each of the independent variables

Let’s see how well each of the variables contribute to our regression. To compare the relative impact of different predictors, we can standardize the coefficients:

library(lm.beta)

# Run regression and get standardized coefficients
model2_standardized <- lm.beta(model2)
summary(model2_standardized)

This extended summary table provides key information about the relationship between each predictor and the dependent variable (WPM):

  • Estimate Coefficient (B): Represents the expected change in WPM for a one-unit increase in the predictor, keeping other variables constant. The scale depends on the original units of measurement.
  • Standardized Coefficient (β): Converts predictors to z-scores, making them comparable in terms of effect size. A larger β means a stronger influence on WPM.
  • Standard Error: Measures how much the coefficient varies across samples. Smaller values indicate more precise estimates.
  • t-value: Tests how far the coefficient is from 0. Larger absolute values indicate stronger relationships.
  • p-value (Pr(>|t|)): If p < 0.05, the predictor has a statistically significant impact on WPM.
Questions:
  1. Examine the coefficients table. Do the significance levels of predictors such as Sex and Age_Group confirm or contradict earlier t-test results? Why might this be the case?
  2. Create model3 (and 4 and 5 and …) by modifying the set of predictors. Consider adding new variables (e.g., Sex) and removing insignificant ones (e.g., Group). How does this affect the model’s explanatory power?

When predictors are analyzed together in a regression model, their significance can differ from individual t-tests due to several key reasons:

  • Multicollinearity: If two predictors are correlated, part of their explanatory power overlaps. In a multiple regression, the model assigns shared variance between predictors, which can make an effect weaker or even non-significant compared to when analyzed alone in a t-test.
  • Additive Interaction Effects: A predictor might not be significant alone but contributes when combined with others. Some predictors may only show an effect in the presence of another variable (interaction effects). Conversely, adding an unnecessary variable can dilute the effect of others.

Step 3: Reporting regression results

We can report our results as so: Hierarchical multiple regression was used to assess if the average typing speed on the OPTI keyboard was effected by the participant’s age, the amount of messages they send per day in their daily life, and if they tested the OPTI keyboard before the QWERTY keyboard. Test group and messages per day were entered at step 1, explaining 9.4% of the total variance in average wpm. After entry of Age in step 2, the total variance explained by the model was 11.6%, F(3,60)=2.6, p>0.5. Age added an additional 2.2% of the variance in wpm, after controlling for group and messages sent per day. R square change=.22, F change (1,60) 1.491, p>0.05. In the final model Test Group variable was statistically significant predictor Beta=0.271, p<.05, where as Messages sent per day (Beta=-0.14, p>.05) and Age (Beta=-0.158,p >0.05) were not.

Alternatively, one can also report the coefficients table. For an example, see here.

Note: The effect of Group might not be significant, but you would still want to report it with your model1. It tells us about whether there occurred ‘transfer learning’ from QWERTY to OPTI or vice versa, which was controlled for by our within-subjects design.

Question:
  1. Report the results of your hierarchical regression. Choose between a textual or a textual + tabular description.

Step 4: Investigating interaction effects

We suspect that for our dataset it is more likely that experience with the QWERTY keyboard has a greater effect on WPM for the QWERTY keyboard than the OPTI keyboard. Our participants were used to working on the QWERTY keyboard and had most likely never worked with the OPTI keyboard before. Our experiment is biased in favour of the QWERTY keyboard.

This bias suggests that participants who frequently send messages should benefit even more when using the QWERTY keyboard. This is called an interaction effect: we expect Messages_per_day to interact with Keyboard in predicting WPM. Specifically, we hypothesize that higher message frequency improves WPM more for QWERTY than for OPTI.

In R’s formula notation, interactions are specified using the * or : operators: - A * B adds both main effects (A and B) and their interaction term (A × B). - A : B includes only the interaction term, without main effects.

Questions:
  1. Create a model5 in which include Keyboard * Messages_per_day as predictor in your formula to test for an interaction effect. Can you confirm an interaction effect?
  2. Could another experimental design create a fair test between the two keyboards?

Work through the exercises, compare your results with the examples provided, and discuss any discrepancies with your peers and instructors.

Happy analyzing!