The first article in the collection, by Furrow and Hsu (2019), introduces educators to 14 evolution education instruments and discusses how these tools may be used to pursue evidence-based evolution education. For example, Furrow and Hsu outline how these instruments may be used to: inform the development of course learning objectives; uncover student ideas and misconceptions prior to teaching; measure student learning in an activity, module, or course; or inspire the design of activities targeting challenging concepts. The article contains numerous examples from the authors’ experiences that nicely illustrate how they have leveraged these tools to improve learning outcomes. The authors end by emphasizing that although many existing measurement instruments should be more widely used, notable conceptual gaps remain (e.g., speciation, human evolution, sexual selection, quantitative genetics, evolutionary medicine, and biodiversity). Developing instruments to fill these gaps will require collaborative efforts between evolutionary biologists and educators.
Several conceptual frameworks may be used to evaluate the quality of measurement instruments (e.g., the Standards for Educational and Psychological Testing; AERA et al. 2014; Pellegrino et al. 2016). Mead et al. (2019) employ Messick’s (1995) seminal framework to document the forms of evidence that have been used to support the inferences drawn from many commonly used evolution education measurement instruments. As Mead et al. emphasize in their review, it is the inferences drawn from measurement tools that we seek to validate, not the instruments themselves. The Standards have played an essential role in the design of measurement instruments for half a century, but they remain largely unknown to evolution educators. Mead et al. review the diverse range of evidence types that are commonly used to support claims about the meanings of instrument scores. Yet as their study reveals, many forms of evidence are lacking for evolution education instruments, and the evidence that does exist has been drawn from a narrow range of participant populations (e.g., particular universities, student levels, and demographic groups). This situation is not unique to evolution education, however: a parallel study of instrument quality in genomics and bioinformatics education found that > 90% of published studies lacked any form of supporting evidence (Campbell and Nehm 2013). Clearly, much more work is needed to expand the forms of evidence used in instrument development and validation. The analytical framework and supporting literature review that Mead et al. provide capture a valuable “snapshot” of the field as it stands today and help to guide further research on instruments and instrumentation. Most striking in their review is how often instruments have been administered in novel ways without additional evidence that the inferences drawn from the scores were valid and reliable.
Although genetic drift is an important evolutionary process, only one measurement instrument, the Genetic Drift Inventory (GeDI), has been developed to measure knowledge and misconceptions of it. Tornabene et al. (2018) explore the psychometric properties of the instrument using an updated analytical framework (Rasch analysis) and a new participant sample (undergraduates from the northeastern United States). The study reviews the conceptual weaknesses of the most commonly used framework for developing and evaluating instruments (Classical Test Theory) and highlights the comparative strengths of Item Response Theory and Rasch analysis. Tornabene et al. also explore whether the delivery of the GeDI instrument (i.e., the order of the item packets presented to students) impacts measures of performance. The study corroborates many claims advanced in the original validation study, bolstering confidence in the quality of the inferences drawn from GeDI scores. It also highlights some improvements that would further differentiate high- and low-performing students. Overall, the study illustrates that validation is an ongoing process, not one that ends with the publication of a new instrument.
Three articles—Sbeglia and Nehm (2018), Romine et al. (2018), and Barnes et al. (2019)—focus their attention on three instruments (i.e., the MATE, I-SEA, and GAENE) in order to better understand the measurement of evolution acceptance. The three articles provide an interesting study in contrasts, given that they approach the challenge of evaluating these tools from different theoretical (instrument/construct-focused vs. item-focused) and psychometric (Classical Test Theory vs. Rasch) vantage points. As such, the three studies illustrate the diverse theoretical and methodological approaches that can be used to scrutinize the inferences drawn from instrument scores.
Sbeglia and Nehm (2018) provide a detailed evaluation of the GAENE instrument. Their study advances work on the instrument in several ways. First, it uses a much larger and more diverse participant sample than the original study, in a pre-post study design. Second, it examines whether the rating scale (e.g., agree, disagree) used to elicit student responses functions as anticipated. Third, it explores the impact of race, gender, and degree plan (biology majors vs. non-majors) on the acceptance measures produced by the instrument. Corroborating recent findings by Metzger et al. (2018), the study identifies several problematic aspects of item functioning that need to be addressed. In addition, it documents significant differences in Rasch measures between races and genders, but no significant differences between students pursuing biology vs. other degree plans. Finally, the study illustrates that measuring pre-post evolution acceptance with the GAENE and the MATE does not produce different inferences about evolution acceptance magnitudes: both instruments indicate that completing an evolution-focused class was associated with a small effect.
Given that three instruments—the MATE, I-SEA, and GAENE—have been developed to measure evolution acceptance, and that varying perspectives exist regarding the benefits and drawbacks of each, Romine et al. (2018) take a fresh approach and administer all 57 items from the three instruments as a single corpus to a large student sample. Perhaps the most intriguing finding of their study is that the 38 positively worded items from across the instruments hang together as a dimension that Romine et al. refer to as the “acceptance of the truth of evolution”, while the 19 negatively worded items hang together as a dimension that they refer to as the “rejection of incredible ideas about evolution”. Romine et al. argue that this empirical distinction between acceptance of truth and rejection of incredible ideas aligns with theoretical perspectives on evolution acceptance and with conceptual change theory in science education (see Posner et al. 1982). Specifically, they argue that their acceptance of the truth of evolution dimension aligns with the intelligibility and plausibility aspects of conceptual change theory, and that the rejection of incredible ideas about evolution dimension captures students’ dissatisfaction with non-scientific ideas and the likelihood that they will find the scientifically acceptable idea fruitful. Overall, Romine et al. bring novel insights to the debate concerning the functioning of these three measurement instruments while advancing theoretical perspectives on the construct of evolution acceptance.
Taking an approach similar to that of Romine et al. (2018), Barnes et al. (2019) administered items from several evolution acceptance instruments as a single corpus. In contrast to Romine et al. (see above), however, Barnes et al. analyzed the findings from each instrument separately. By employing the same participant sample, Barnes et al. were able to explore the similarities and differences in the scores generated by the instruments and thereby examine how instrument choice impacts results and conclusions about evolution acceptance. An impressive aspect of the study is the diversity of participant background variables collected (e.g., parental education, political affiliation, religiosity, religious affiliation). The findings corroborate several themes from prior work: religiosity was a statistically significant negative predictor of evolution acceptance across all instruments; understanding of the nature of science was a statistically significant positive predictor across all instruments; and the strength and statistical significance of the relationship between evolution understanding and evolution acceptance varied depending on the instrument chosen. The study provides many insights into contradictory findings in the evolution education literature.
Beggrow and Sbeglia (2019) provide an interesting point of departure from the other articles in the collection by exploring the assessment of evolutionary knowledge and understanding across two different degree programs (anthropology and biology) within the same university. In addition to analyzing evolutionary reasoning across these educational contexts, Beggrow and Sbeglia develop new, human-focused versions of the ACORNS instrument in order to examine anthropology and biology students’ reasoning about human evolutionary change. The investigation explores an important cognitive question: it attempts to tease apart the roles of taxon familiarity and trait familiarity in measures of reasoning about evolutionary gain and loss. Their study of evolutionary reasoning about humans builds upon a growing body of work showing that the organismal contexts used to elicit student reasoning significantly impact that reasoning (Nehm 2018; Sbeglia and Nehm 2019). The study also highlights the complexity of comparing evolutionary reasoning across the two student groups, given that they differ dramatically in background characteristics. These population differences complicate the question of whether learning about evolution using human examples is an effective educational strategy.