We found that the students in the module courses consistently outperformed students in the control courses on the posttest, and that the effect of treatment on posttest of the GeDI was quite large. Specifically, students in the module courses show a marked improvement on the GeDI, while scores of students in the control courses decreased after instruction. This decrease is driven primarily by poorer performance on items about key concepts (Fig. 1). In this Discussion, we begin by acknowledging the limitations of our experimental design, and also explain why these limitations do not diminish our findings. We then discuss why we believe students in the control courses did poorly, and why students in the module courses did well. Our interpretation relies on the fact that the quantity and quality of time spent engaging students in making observations, collecting data, and constructing and testing predictions through a computer simulation provides a particularly robust learning environment. In the last part of the Discussion, we propose a revision to the learning framework that others (Andrews et al. 2012) have proposed for how students learn genetic drift.
Limitations of the experimental design
It is surprising that the scores of students in the control courses dropped after instruction. Here we consider aspects of our experimental design that may have contributed to this finding. Ultimately, we conclude that these aspects did not bias our results toward this unusual discovery.
Were students in the module courses more motivated to do well on the GeDI?
Students in the control courses did not receive credit for their scores on the GeDI, whereas most of the students in the module courses did. The students in the module courses who received credit for the number of items they answered correctly might have been more motivated to work harder, and do better, on the assessment (Wise and DeMars 2005; but see Couch and Knight 2015 for an opposing point of view). We accounted for this possibility by running two additional analyses. The first compared performance within the module courses. We compared the courses in which credit was assigned for correctness and courses in which credit was assigned only for completion. We found no difference between the groups. The second analysis compared control courses to the eight module courses that assigned credit for completion. We found the same results that we did when we used the full data set. Furthermore, we found no difference in pretest scores between the control courses and module courses, suggesting that the populations were relatively similar before instruction. Therefore, we think that poorer performance on the GeDI post-instruction was unlikely to be due to differences in student motivation between treatments.
How does small class size affect our results?
The average size of the courses in the module treatment was smaller than in the control courses. Eleven of the module courses had class sizes less than 30; although three of the control courses also had class size less than 30, the other two courses in the control treatment were larger than any of the classes in the module treatment (Tables 2, 3). Since small course sizes may impact learning, some of the difference in performance between the treatments could be due to the fact that module courses were smaller. We tested for this possibility with two additional analyses. The first was a GLMM that compared the control courses to the eight module courses with class sizes greater than 30, narrowing the difference in average class size between treatments. The results of this model differed only in that the significant effect of treatment on posttest performance on the items about misconceptions was lost. It did not change the significant differences in overall performance or on items about key concepts (Fig. 1; Table 3). In a second additional analysis, we saw no discernible relationship between class size and posttest performance. The average class sizes in the module treatment were essentially identical between the nine lowest performing classes and the remaining ten classes. Therefore, we suggest that the difference between control and module treatments on the GeDI post-instruction is unlikely to be due to the smaller class sizes in the module treatments.
How does time on task affect our results?
The increase in performance across most key concepts and misconceptions may be due to the fact that students in the module courses spent more time studying genetic drift. Beyond the qualitative differences between the instructional module and common classroom activities, the module takes approximately 2 h to complete (Table 3). We did not quantify the time devoted to genetic drift in the control courses, but it is likely that students in the module courses took part in instruction on genetic drift for longer than many of the students in the control courses.
In addition to differences in time on task, the quality of time spent working actively with genetic drift also differed. We recognize that most instructors do not have the classroom time to devote to this type of prolonged active engagement and hypothesis testing, nor do they have the time to develop modules that efficiently engage students in activities like the Ferrets module. Therefore, given limitations on instructors’ time, it is reasonable to interpret the increased performance of students in the module courses as due in part to the fact that those students spent more time actively solving problems about genetic drift. A major advantage of the Ferrets module is that it already developed, and it was done so through careful implementation of thoughtful pedagogical practices (e.g., American Association for the Advancement of Science 2011; NGSS Lead States 2013; Couch et al. 2015).
Did differences in time between pre- and posttesting affect our conclusions?
The time between pre- and posttest was much longer in the control courses than in the module courses. One interpretation of this result could be that the students in module courses performed better because there was less time between their pre- and posttests. This outcome would predict that the performance among students in the control courses either did not change, or did not increase as much as they did in the module courses. However, we find that the mean performance of students in the control courses actually drops—a result that is not consistent with the explanation that time elapsed is the best explanation. We also found that the course with the longest time between pre- and posttest (mean 97 days SD 4) showed a slight increase from pre- to posttest scores. Therefore, we conclude that the time between pre- and posttest is not sufficient to explain our results.
Performance in the control courses
Although performance among students in the control and Ferrets module courses did not differ significantly on the pretest, it did differ between treatments on the posttest (Figs. 1, 2; Table 4). Students in the module courses improved on the posttest, but students in the control courses performed significantly worse on the posttest because their performance on items about key concepts decreased (Fig. 1). Students in the control courses showed a decrease in their understanding of three of the key concepts associated with genetic drift (as defined by Price et al. 2014; Table 1; Fig. 2a): (1) that genetic drift can lead to a loss in genetic variation, (2) that the effect that drift can have is governed by a population’s effective population size, and (3) that genetic drift works simultaneously with—and can overwhelm—other evolutionary processes. However, students in the control classes increased their performance on the key concept that “random sampling error happens every generation” (Price et al. 2014: 71).
We know of only one other study that looked specifically at students’ understanding of genetic drift before and after instruction (Andrews et al. 2012). In that study, introductory students answered an open-ended question that required them to consider whether genetic drift could explain a shift in genotype. In the pretest, students referred to genetic drift infrequently: only 1 % of the 85 students referred to genetic drift; even within this 1 %, their comments were so vague that they could not be evaluated. After instruction, 21 of the 122 students referred to genetic drift, but only 13 indicated some knowledge of what genetic drift actually does (Table 2 in Andrews et al. 2012).
Andrews et al. (2012) used their results to propose a framework that describes how students acquire knowledge about genetic drift through three stages, one of which is learning to recognize genetic drift as distinct from other evolutionary processes, such as natural selection, mutation, and migration (Stage 2 in Fig. 3). For example, Item 6 on the GeDI asks students whether a biologist would agree or disagree with the (incorrect) statement that “The fact that individuals that were best suited to the environment had a higher rate of survival contributed to genetic drift” (Supplementary Materials in Price et al. 2014). In our study, we find a big increase in performance on this item for students in both the control and module treatments (Fig. 2b), indicating that they are in Stage 2. Because it is so challenging for students to recognize that evolution encompasses more than natural selection (Price and Perez 2016), the fact that the students in the control courses are making this change is noteworthy.
We postulate that students in the control courses are still in Stage 2, because, even though they recognize the existence of different evolutionary processes, they continue to be confused by the distinctions between them, and they are often distracted by vocabulary. For example, students might confuse genetic drift with gene flow, perhaps because they confound the word drift with the idea of migration (Andrews et al. 2012); on Item 18 of the GeDI, students are asked whether a biologist would agree or disagree with the (incorrect) statement that “Since there was no migration there could be no genetic drift” (Supplementary Materials in Price et al. 2014). Although students in the module courses improved on this item, performance on it did not change after instruction among the students in the control classes (Fig. 2c). This indicates that some confusion over vocabulary persists through instruction.
Students in the control group may be performing worse on the items about key concepts because their understanding of genetic drift is only just developing; inaccuracies are possibly being incorporated into or already existing in their conceptual frameworks (Stage 3 in Andrews et al. 2012). The items in the GeDI about key concepts predominantly focus on how genetic drift works and the effect that it has on a population. For example, Item 1 in the GeDI asks students whether a biologist would agree or disagree with the (correct) statement that “Genetic drift is more pronounced in the [founding] island population than the [larger] mainland population in these first few generations” (Supplementary Materials in Price et al. 2014). We suggest that what students typically learn during instruction is that genetic drift has a powerful effect in founding populations. This focus on small populations, however, can lead to an incorrect conclusion that genetic drift occurs only in small populations, and students often fail to recognize that drift occurs in all real, finite populations. The fact that a misconception like this could emerge from instruction may be a natural consequence of students making sense of new ideas. Indeed, it is unlikely that students think about the situations in which genetic drift occurs before they fully understand what genetic drift is.
Price et al. (2014) suggest that items about key concepts are less difficult for students than items about misconceptions. In this study, the opposite pattern appears to hold true. One key difference between our study and theirs that might explain the opposing findings is that the testing in Price et al. (2014) was completed before instruction. It may be that, for this challenging topic, misconceptions are most difficult prior to instruction, but they are nonetheless easier to dispel then key concepts are to acquire. Moreover, all of the courses used for final testing in Price et al. (2014) were upper division courses, in which students had previously been exposed to genetic drift. It is therefore conceivable that students were already in Stage 2 of the learning framework when they took the GeDI. The students in our control courses are primarily—but not exclusively—in general biology courses (Table 2). Future work exploring the pre- and post-instruction difficulty of items about misconceptions and key concepts could help inform a new model of student learning in genetic drift.
Performance in the module courses
The effect of the Ferrets module on student learning was substantial for a short intervention (average Cohen’s d within module courses = 0.63). Students in the Ferrets module courses significantly outperformed students in the control courses because they did better on items about key concepts, suggesting that they had a better understanding of genetic drift (Figs. 1, 2; Table 4). Their performance even improved on items about key concepts and misconceptions that were not explicitly covered in the Ferrets module, generally to a greater degree than did control course students (Fig. 2c).
The Ferrets module was designed to engage students by guiding them through the construction of their own concept of genetic drift by making observations, collecting data, and making and testing predictions. While this approach is not unique to the Ferrets module, the combination of observations and experimentation with simulations is what makes the module treatment different from the controls. Our experimental design does not allow us to determine how much each of the individual practices contributed to learning. Instead, we can only offer the evidence that the multi-faceted approach to instruction in the Ferrets module supported learning better than classroom instruction alone.
Although we cannot attribute the gains in GeDI scores to specific elements of the Ferrets module, we suggest that computer simulations support learning for topics such as genetic drift. Simulations allow students to investigate population-level phenomena that span generations, like genetic drift, which are otherwise not amenable to investigation in the classroom given the time and spatial scales involved. The visualizations available in the Ferrets module enable students to observe and experiment with several aspects of drift, including the random changes in allele frequency due to sampling error, that these changes occur every generation, and that drift occurs in populations of any finite size. In support of the suggestion that visualization may help teach random processes, Meir et al. (2005) demonstrated that simulations of osmosis and diffusion decreased misconceptions about those molecular phenomena because they allowed students to directly observe the random movement of molecules. Within the Ferrets module, students can set parameters such as population size and initial allele frequency; repeatedly test the effect of varying these parameters; and make and test predictions. Since drift is a phenomenon where the starting conditions impact the outcome in a probabilistic way, repeated testing and varying of parameters may help students build understanding in a way that is difficult to do with reading, lecture, and static representations. As suggested by Windschitl and Andre (1998), simulations that allow for exploration can be effective tools to overcome misconceptions and effect conceptual change. Separating the impacts of computer-based simulations and experimentation would be a topic for future research.
Some intriguing aspects of students’ performance suggest that learning during this activity is particularly sophisticated. In the learning framework hypothesized by Andrews et al. (2012), students begin to recognize different mechanisms of evolution (Stage 2) before they learn content specific to genetic drift (Stage 3); this is the pattern that we observed in students in the control courses. Students in the module courses were increasing their understanding of both vocabulary and genetic drift during the module (Figs. 1, 2; Table 4). Students in the module courses also improved on both key concepts and misconceptions, including some that were not directly addressed in the Ferrets module’s instructions or simulations (Table 2; Fig. 2). This approach was clearly effective.
Revised hypothetical framework for learning genetic drift
As described above, Andrews et al. (2012) hypothesized three stages for learning genetic drift: (1) undeveloped concepts of evolution and genetics at the broadest level, (2) undeveloped and overlapping concepts of different evolutionary mechanisms, and (3) developing understanding about genetic drift in particular. Our results lead us to suggest to revise this framework to incorporate multiple learning pathways, rather than a linear progression through the stages (Fig. 3). The results from students in the module classes suggest that students can move from Stage 1 to either Stage 2 or Stage 3, or to both Stage 2 and Stage 3 simultaneously.
We note that Stage 1 and Stage 2 are about students’ general understanding of evolution, not specifically genetic drift. Our interpretation is that, when students, such as those in the control courses, are moving through Stage 2, they are actually expanding what they know about evolution by recognizing that many different mechanisms of evolution exist. This realization in itself is quite challenging (Price and Perez 2016). Thus, there is misalignment between what students are learning and what instructors intend to be teaching. Instructors think they are teaching genetic drift, but student thinking is revolutionized by a more basic concept that there is more to evolution than natural selection.