The goal of this lab session is to learn how to perform Student’s t-tests using R. In this session, we will analyze the data from our experiment on typing speed (words per minute, WPM) using two different keyboards: the Opti keyboard and the QWERTY keyboard. The data are stored in long format in the file keyboard_data_R_2026.csv, which means that each row corresponds to one trial. Key variables include:
Before beginning, ensure that you have already completed Lab 1 (Descriptive Statistics) and loaded your dataset into R.
The data set contains data from a controlled experiment on typing speed using the Opti keyboard vs Qwerty keyboard. There are many different research questions we can investigate, e.g.:
The experiment used a within-group design for the different keyboards. All participants were asked to complete typing tasks on Opti and on Qwerty keyboards. That means that we have paired observations: typing speed in words/minute on Opti and typing speed in words/minute on Qwerty. In addition, the participants’ gender (Sex) and age (Age) were also recorded.
We start with our first research question (RQ1):
Let’s first describe our data for each keyboard. We can summarize (aggregate) our data per keyboard using the group_by and summarise functions. The code below calculates the mean and standard deviation, along with their non-parametric alternatives (the median and the median absolute deviation, MAD), for each group (Keyboard):
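library(tidyverse)  # provides %>%, group_by() and summarise()
keyboard_data %>%
  group_by(Keyboard) %>%
  summarise(
    Mean = mean(WPM, na.rm = TRUE),      # parametric measure of centre
    Std = sd(WPM, na.rm = TRUE),         # parametric measure of spread
    Median = median(WPM, na.rm = TRUE),  # non-parametric measure of centre
    MAD = mad(WPM, na.rm = TRUE)         # non-parametric measure of spread
  )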
Take a look at the output.
Since every participant completes trials on both keyboards, we have repeated measures for each participant. This means each participant contributes paired observations (one for the OPTI condition and one for the QWERTY condition).
To perform a paired t-test, we first rearrange the data to ensure that each row represents a matched pair of trials for the same participant and trial order. Then, we pivot the data into a wide format, so that each participant’s WPM on OPTI and QWERTY appear in separate columns:
library(tidyverse)
wpm_trial <- keyboard_data %>%
  select(ParticipantID, Trial_order, Keyboard, WPM) %>%
  arrange(ParticipantID, Trial_order, Keyboard) %>%  # Sort data so pairs match correctly
  pivot_wider(names_from = Keyboard, values_from = WPM)  # Pivot to wide format to match trial pairs
t.test(wpm_trial$OPTI, wpm_trial$QWERTY, paired = TRUE)
The reshaped data frame has two data columns, OPTI and QWERTY (both WPM measures). Open it in RStudio to inspect it. The t.test() function compares these columns, and paired = TRUE tells R that each pair of values (OPTI & QWERTY) belongs to the same participant-trial combination, so differences within pairs are analyzed rather than differences between two independent groups.
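To make concrete what paired = TRUE does, the following sketch uses simulated values (the vectors opti and qwerty are invented for illustration and are not the lab data). It suggests that a paired t-test is equivalent to a one-sample t-test on the within-pair differences:

```r
# Simulated WPM values for 10 hypothetical participant-trial pairs
# (illustrative numbers only, not taken from keyboard_data)
set.seed(1)
qwerty <- rnorm(10, mean = 40, sd = 8)
opti   <- qwerty + rnorm(10, mean = 3, sd = 4)

# Paired t-test on the two columns
paired_result <- t.test(opti, qwerty, paired = TRUE)

# One-sample t-test on the within-pair differences
diff_result <- t.test(opti - qwerty, mu = 0)

# Both calls yield the same t statistic, degrees of freedom and p-value
paired_result$statistic
diff_result$statistic
```

This is why the order of the rows matters: if the pairs are misaligned, the differences are computed between unrelated observations.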
A note on sample normality and t-tests: The t-test does not require the individual data points in our sample to be normally distributed. What it relies on is that the means of samples taken from the population are (approximately) normally distributed. This is known as the sampling distribution of the mean; for a paired test, the relevant quantity is the mean of the within-pair differences. Even though our sample shows positive skew, the central limit theorem tells us that with a large enough sample (a common rule of thumb is 20–30 or more) the distribution of the sample means will be approximately normal. Thus, the t-test remains reasonably valid despite the skewness in our sample data.
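The central limit theorem argument can be illustrated with a small simulation. The sketch below assumes an exponential population purely as an example of positive skew (it is not a model of the lab data), and the skewness helper is a simple moment-based estimate written for this illustration:

```r
# Simulate a positively skewed population and compare the skew of
# individual values with the skew of sample means
set.seed(42)
n_per_sample <- 30     # size of each sample
n_samples    <- 5000   # number of repeated samples

# Simple moment-based skewness estimate (helper defined for this sketch)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Individual observations drawn from the skewed population
raw_draws <- rexp(n_samples, rate = 1)

# Means of many samples of size 30 from the same population
sample_means <- replicate(n_samples, mean(rexp(n_per_sample, rate = 1)))

skewness(raw_draws)     # individual values: strongly skewed
skewness(sample_means)  # sample means: much closer to 0 (symmetric)
```

Plotting both vectors with hist() shows the same pattern visually: the sample means form a far more symmetric, bell-like distribution than the raw values.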
When reporting your t-test in APA style, include:
For example: “A paired-samples t-test indicated that WPM scores were significantly higher in the OPTI condition (M = 80.1, SD = 12.3) than in the QWERTY condition (M = 75.4, SD = 10.8), t(29) = 2.35, p = .026.” (Note: these numbers are made up for illustration.)
Tip: Negative t-values: The sign of a t-value tells us the direction of the difference in sample means, which can be difficult to interpret without further explanation: Does a negative t-value indicate Opti’s sample mean was greater or smaller than Qwerty? Therefore, it is common to indicate the direction of the mean-difference (even if nonsignificant) in some other way, such as by mentioning the sample means in the text, or by showing the sample means graphically, as in a bar chart.
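A short sketch of why the sign alone is ambiguous, using invented values (the vectors below are hypothetical, not the lab data): swapping the order of the two arguments flips the sign of t while leaving everything else unchanged.

```r
# Hypothetical WPM values for 12 participants on each keyboard
set.seed(7)
opti   <- rnorm(12, mean = 45, sd = 6)
qwerty <- rnorm(12, mean = 40, sd = 6)

t_opti_first   <- t.test(opti, qwerty, paired = TRUE)$statistic
t_qwerty_first <- t.test(qwerty, opti, paired = TRUE)$statistic

# Same magnitude, opposite sign: the sign only reflects the argument order
t_opti_first
t_qwerty_first

# Reporting the sample means makes the direction explicit
c(OPTI = mean(opti), QWERTY = mean(qwerty))
```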
Tip: To aggregate the data per participant and keyboard, use the group_by method (to group by Participant, Keyboard) and the summarize method (to extract the mean of WPM).
What we’ve done in answering RQ1 is group all participants together. But the fact that we found a statistically significant difference does not mean that this difference also exists within our subgroups. For example, is there also a significant difference between typing on the OPTI keyboard vs. the QWERTY keyboard for our two age groups, younger and older students? To be able to answer this question for the younger group, we need to temporarily de-select all older participants in our data set.
The first task is to recode your Age variable into two groups that fit your data and create an Age_group variable, dividing the respondents into a “younger” and “older” group:
Check the frequency distribution of your Age variable (see Stats1, “Frequency distribution table”) to understand its distribution.
Use R to calculate the median of Age:
age_median <- median(keyboard_data$Age, na.rm = TRUE)
print(age_median)
Based on the frequency table and the median value, decide on a cutoff that divides your sample into two roughly equal groups. In many cases, the median is a good choice.
Using the method shown in Stats1 (“Re-coding variables”), create a new variable (Age_group) that assigns participants to “younger” (if Age is less than or equal to the median) or “older” (if Age is above the median).
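As a reminder of what such a recode can look like, here is a minimal sketch on a toy data frame. The method from Stats1 is not reproduced here, so the use of dplyr's mutate and if_else is an assumption, and toy_data with its values is invented for illustration:

```r
library(dplyr)

# Toy data standing in for keyboard_data (values are invented)
toy_data <- data.frame(
  ParticipantID = c(1, 1, 2, 2, 3, 3, 4, 4),
  Age           = c(19, 19, 21, 21, 24, 24, 30, 30)
)

# Median split: at or below the median is "younger", above is "older"
age_median <- median(toy_data$Age, na.rm = TRUE)

toy_data <- toy_data %>%
  mutate(Age_group = if_else(Age <= age_median, "younger", "older"))

# Check how many rows fall into each group
count(toy_data, Age_group)
```

If many participants share the median age, the two groups can end up unequal in size, which is why it is worth checking the counts before moving on.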
Second, we need to filter our dataframe on a subgroup of younger students. In Stats1 we filtered twice: on a specific participant (ID: 5193237) and on a specific keyboard (OPTI). Use the same filter command to now filter on younger students.
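A minimal sketch of such a filter, assuming the dplyr filter function and an Age_group column coded as "younger"/"older" (toy_data and its values are invented for illustration):

```r
library(dplyr)

# Toy data standing in for keyboard_data (values are invented)
toy_data <- data.frame(
  ParticipantID = c(1, 1, 2, 2, 3, 3),
  Keyboard      = rep(c("OPTI", "QWERTY"), times = 3),
  WPM           = c(42, 38, 35, 36, 50, 44),
  Age_group     = c("younger", "younger", "older", "older", "younger", "younger")
)

# Keep only the rows that belong to younger participants
younger_data <- toy_data %>%
  filter(Age_group == "younger")

younger_data
```

The reshaping and paired t-test steps from RQ1 can then be repeated on the filtered data frame instead of the full data set.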
Now that we have answered our first research question, we can move on to our second:
RQ2: Is there a difference in WPM between males and females?
If we want to investigate differences between the genders, then we suddenly have a between-group design on our hands: you cannot be both female and male at the same time. That means that the genders form two independent groups, which in turn means that we have to use an independent-samples t-test.
To compare participants, we first need to aggregate our data at the participant level by calculating the mean WPM for each participant:
mean_wpm <- keyboard_data %>%
  group_by(ParticipantID, Sex) %>%
  summarize(Mean_WPM = mean(WPM, na.rm = TRUE), .groups = "drop")
head(mean_wpm)
Using this aggregated dataframe we can perform an independent-samples t-test. You can use the same function as before (t.test), though without the paired = TRUE argument. You will also need to change what is compared: instead of comparing WPM between the two keyboards, we now compare Mean_WPM by Sex, which can be written as the formula Mean_WPM ~ Sex.
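The formula interface of base R's t.test can be sketched as follows. The data frame toy_means and its values are invented for illustration; only the column names mirror the mean_wpm data frame described above:

```r
# Hypothetical participant-level means (invented values)
toy_means <- data.frame(
  ParticipantID = 1:10,
  Sex           = rep(c("Female", "Male"), each = 5),
  Mean_WPM      = c(41, 38, 45, 50, 36, 39, 44, 35, 47, 40)
)

# Formula interface: outcome ~ grouping variable (two independent groups)
result <- t.test(Mean_WPM ~ Sex, data = toy_means)
result

# The group means are reported in the estimate element
result$estimate
```

Note that t.test runs Welch's version of the test unless var.equal = TRUE is set, which is visible in the title of the printed output.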
Tip: For an appropriate graph, consult “Population
Pyramids (Alternative: Faceted Bar Chart in R)” from Stats1. Use the
mean_wpm dataframe, take Mean_WPM as x-axis
(in the aes definition), ~ Sex for facet_wrap.
The classic independent-samples t-test assumes that the variances within both groups are equal. This is called the ‘equality of variances assumption’. To test it, you can run Levene’s test. If the p-value is above 0.05, you can assume equal variances and use the standard (Student’s) t-test; if it is below 0.05, use Welch’s t-test, which does not assume equal variances.
library(car)
leveneTest(Mean_WPM ~ Sex, data = mean_wpm)
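If Levene’s test is significant, revisit the previous question and make sure the t-test is run with var.equal = FALSE (Welch’s t-test). Note that this is already the default in R’s t.test function; to run the classic Student’s t-test when variances can be assumed equal, set var.equal = TRUE.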
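The practical difference between the two variants shows up in the degrees of freedom. The sketch below uses simulated groups with deliberately unequal spread and unequal size (all values are invented for illustration):

```r
# Simulated participant-level means with unequal variances and group sizes
set.seed(3)
toy_means <- data.frame(
  Sex      = rep(c("Female", "Male"), times = c(8, 12)),
  Mean_WPM = c(rnorm(8, mean = 42, sd = 4), rnorm(12, mean = 40, sd = 9))
)

# Welch's t-test: the default in t.test (var.equal = FALSE)
welch <- t.test(Mean_WPM ~ Sex, data = toy_means)

# Student's t-test: pooled variance, assumes equal variances
student <- t.test(Mean_WPM ~ Sex, data = toy_means, var.equal = TRUE)

welch$parameter    # adjusted (usually fractional) degrees of freedom
student$parameter  # degrees of freedom = n1 + n2 - 2
```

When reporting the result, the degrees of freedom in t(df) should match the variant that was actually run.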
We also still need to answer our third and last research question:
RQ3: Is there a difference in typing speed between the older and younger student groups?
So far, we have used t-tests to compare two groups (e.g., WPM between OPTI and QWERTY keyboards). However, when we have more than two categories, a t-test is no longer suitable. Instead, we use Analysis of Variance (ANOVA):
Previously, we used a t-test to compare WPM between OPTI and QWERTY
keyboards. Here, we perform a one-way ANOVA to achieve a similar result,
using again our full data set (keyboard_data):
anova_keyboard <- aov(WPM ~ Keyboard, data = keyboard_data)
summary(anova_keyboard)
If the ANOVA is significant (p < 0.05), follow up with a post-hoc test (e.g., Tukey’s HSD or Bonferroni) to find which Keyboard is faster:
TukeyHSD(anova_keyboard)
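With only two groups, a one-way ANOVA and an independent-samples Student's t-test (equal variances assumed) are the same test: the F value equals the squared t value. A sketch on simulated data (toy_data is invented for illustration):

```r
# Simulated WPM values for two keyboards, treated as independent groups
set.seed(11)
toy_data <- data.frame(
  Keyboard = factor(rep(c("OPTI", "QWERTY"), each = 15)),
  WPM      = c(rnorm(15, mean = 44, sd = 7), rnorm(15, mean = 40, sd = 7))
)

# One-way ANOVA: extract the F value for Keyboard from the summary table
anova_fit <- aov(WPM ~ Keyboard, data = toy_data)
f_value   <- summary(anova_fit)[[1]][["F value"]][1]

# Student's t-test with pooled variance on the same data
t_value <- unname(t.test(WPM ~ Keyboard, data = toy_data, var.equal = TRUE)$statistic)

# F equals t squared
c(F = f_value, t_squared = t_value^2)
```

Like that t-test, this ANOVA ignores the fact that the same participants typed on both keyboards.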
Question:
In Case 1, the one-way ANOVA was more conservative than the paired t-test because it treated Keyboard as a between-subjects variable, assuming that each WPM value came from different participants. However, our study is within-subjects: each participant types on both keyboards across multiple trials.
To correctly account for this, we use a repeated-measures ANOVA, which separates between-participant variance from within-participant variance, just like a paired t-test does.
# Convert the design variables to factors (categorical variables)
keyboard_data <- keyboard_data %>%
  mutate(
    ParticipantID = factor(ParticipantID),
    Keyboard = factor(Keyboard),
    Trial_order = factor(Trial_order)  # factor, not numeric: tests differences at each trial
  )
# Fit the repeated-measures ANOVA model
anova_model <- aov(
  WPM ~ Keyboard * Trial_order + Error(ParticipantID / (Keyboard * Trial_order)),
  data = keyboard_data
)
summary(anova_model)
Explanation of the Formula Notation:
WPM ~ Keyboard * Trial_order. This follows standard formula notation in R: WPM is the dependent variable, and Keyboard * Trial_order specifies that we want to test for main effects of Keyboard and Trial_order, as well as their interaction (Keyboard:Trial_order).
Error(ParticipantID/(Keyboard * Trial_order)). This specifies ParticipantID as the subject identifier, with repeated measures across the combinations of Keyboard and Trial_order. It tells R how to account for within-subject variability (i.e., repeated measures).
Note: The conversion to factors is important here! If Trial_order is treated as an interval (continuous) scale, the ANOVA evaluates whether WPM changes linearly across trials, assuming that the effect follows a consistent upward or downward trend. As a factor, the ANOVA tests for WPM differences at each trial, which also captures non-linear patterns (e.g., rapid learning, then plateau).
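The resulting ANOVA table provides three tests: the main effect of Keyboard, the main effect of Trial_order, and the Keyboard:Trial_order interaction.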
If the interaction is significant, it suggests that one keyboard may have benefited more from repeated trials than the other, indicating a more pronounced learning (or fatigue) effect for one of the keyboards.
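To see what the Error() term contributes, consider a simplified design with a single within-subject factor and one observation per participant and keyboard. The sketch below uses simulated data (all values are invented for illustration) and suggests that the repeated-measures F value equals the squared paired t value, while the between-subjects ANOVA gives a different, typically smaller, F:

```r
# Simulated data: 12 participants, one WPM value per keyboard
set.seed(5)
n <- 12
participant_effect <- rnorm(n, mean = 0, sd = 8)  # stable individual differences

toy_data <- data.frame(
  ParticipantID = factor(rep(1:n, times = 2)),
  Keyboard      = factor(rep(c("OPTI", "QWERTY"), each = n)),
  WPM           = c(45 + participant_effect + rnorm(n, sd = 3),
                    40 + participant_effect + rnorm(n, sd = 3))
)

# Repeated-measures ANOVA: Keyboard is tested within participants
rm_fit       <- aov(WPM ~ Keyboard + Error(ParticipantID / Keyboard), data = toy_data)
rm_summary   <- summary(rm_fit)
within_table <- rm_summary[[length(rm_summary)]][[1]]  # last stratum holds the Keyboard test
f_within     <- within_table[["F value"]][1]

# Paired t-test on the same data (rows are ordered by participant within keyboard)
opti     <- toy_data$WPM[toy_data$Keyboard == "OPTI"]
qwerty   <- toy_data$WPM[toy_data$Keyboard == "QWERTY"]
t_paired <- unname(t.test(opti, qwerty, paired = TRUE)$statistic)

# Between-subjects ANOVA that ignores the pairing, for comparison
f_between <- summary(aov(WPM ~ Keyboard, data = toy_data))[[1]][["F value"]][1]

c(F_repeated = f_within, t_paired_squared = t_paired^2, F_between = f_between)
```

Separating out the between-participant variance removes the stable individual differences from the error term, which is what makes the within-subjects test more sensitive.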
Question:
As indicated, an ANOVA can also be used to compare more than two categories. We can test this using our MessagesCategory variable (you created this variable in Stats1). Re-run your recoding code if needed:
library(dplyr)
keyboard_data <- keyboard_data %>%
  mutate(MessagesCategory = case_when(
    Messages_per_day <= 10 ~ "10 or less",
    Messages_per_day >= 11 & Messages_per_day <= 50 ~ "11 to 50",
    Messages_per_day > 50 ~ "More than 50"
  ))
Perform a one-way ANOVA comparing WPM across the levels of MessagesCategory. If significant, conduct a post-hoc test (Tukey’s HSD) to determine which categories differ. Interpret the results: Is there a difference in typing speed based on how frequently people send messages?
Work through the exercises, compare your results with the examples provided, and discuss any discrepancies with your peers and instructors.
Happy analyzing!