Brewer and Unsworth (2012) reported that individuals with low episodic memory ability exhibit a larger testing effect, a finding with potentially important educational implications. We conducted two replication attempts of that study. Experiment 1 (n = 120) drew from a broad demographic sample and was conducted online, while Experiment 2 (n = 122) was conducted in the lab with undergraduate students. Both experiments demonstrated a large testing effect across the range of episodic ability in our sample, with no trend suggesting a larger testing effect for lower-ability subjects. We show that apparent differences in the distribution of episodic ability levels between our samples and that of Brewer and Unsworth provide a plausible account of the contrasting correlation results, and that, more generally, sampling from a restricted ability range can yield a positive, negative, or null correlation even if there is no difference in the effectiveness of testing for low- vs. high-ability subjects in the broader population. We discuss methodological and theoretical issues that complicate interpretation of individual differences effects in this domain, individual difference predictions of testing effect models, and educational implications.
A large body of empirical research has established that retrieval from memory during a test enhances subsequent memory for that information more than does an equivalent period of time spent restudying the same materials. This phenomenon is frequently referred to as the testing effect or retrieval practice effect. The testing effect has been repeatedly demonstrated using a wide variety of materials ranging from word pairs to lecture content (for reviews see Carpenter et al., 2008; McDaniel et al., 2007; Roediger and Karpicke, 2006). Although there has been a great deal of research into the cognitive mechanisms underlying the testing effect, the role of individual differences in cognitive abilities has only recently begun to receive attention (Bouwmeester and Verkoeijen, 2011; Brewer and Unsworth, 2012).
Much of the widespread interest in the testing effect reflects its potential for enhancing learning in applied contexts. Naturally, a conclusive finding that such enhancements are confined to a subset of individuals would be of great import. Brewer and Unsworth (2012) reported evidence suggesting just that. They had subjects complete a battery of assessments designed to measure working memory, attention control, episodic memory, and general-fluid intelligence (Unsworth & Spillers, 2010), along with a paired-associate task that served as a measure of the testing effect (study/test was compared to restudy, in a design roughly modeled after Carpenter, Pashler, & Vul, 2006). Brewer and Unsworth observed no correlation between working memory or attention control abilities and the magnitude of the testing effect. However, both the episodic memory and general-fluid intelligence constructs were negatively correlated with the testing effect; that is, low episodic memory and general-fluid intelligence scores were associated with a larger testing effect. Based on their results, Brewer and Unsworth concluded that test-enhanced learning is most effectively targeted at lower-ability students.
Brewer and Unsworth (2012) were circumspect in proffering explanations for the correlation between general-fluid intelligence and the testing effect. With regard to episodic memory, though, they advanced two potential accounts of the negative correlation with the testing effect. The first was that higher-ability subjects may be better able to use elaborative encoding in both the study/test and restudy conditions (relating to the elaborative retrieval hypothesis of Carpenter, 2009), thus reducing the size of the testing effect. The second was that lower-ability subjects may be forced to use more efficient retrieval strategies during initial testing.
The work described here focused on determining whether Brewer and Unsworth’s (2012) episodic memory results can be independently replicated. We used the same methodologies and materials (provided by the original authors). We completed two replication attempts, the first online, sampling from a general population of online experimental subjects, and the second in the laboratory, sampling from university students.
Experiment 1
In Experiment 1, we administered the four episodic memory measures (cued recall, picture source, gender source, and delayed free recall) used by Brewer and Unsworth (2012), along with the same paired-associate testing task (detailed in Carpenter et al., 2006), in the same overall order of presentation, and with the same delay interval between sessions (24 h). Aside from the online data collection (which we did not expect to cause differences in outcome; see Buhrmester et al., 2011; Crump et al., 2013), the primary difference between this experiment and Brewer and Unsworth’s design is that we dropped their ability measures for working memory, attention control, and general-fluid intelligence.