Course description
The study took place over two consecutive years in the fall semester of the same introductory biology course at a large private university in the Northeast. In both years, the course was split into morning and afternoon lecture sections, and both sections were taught by the same instructor. In Year 1, 620 students were enrolled in the course; in Year 2, 410 students were enrolled. The course is required for students majoring in biology and related fields, but because the university does not offer a survey biology course for non-majors, many students who do not intend to major in the life sciences also enroll. The course uses the widely adopted Campbell Biology textbook (Reece et al. 2014) and, according to its syllabus, is “the first of a two-course sequence comprising a survey of essential biological concepts ranging from the molecular level to global ecology.” According to the course description, students “explore the nature of science and the diversity of organisms within a framework of major themes including the flow and regulation of energy and information within living systems, and the central and unifying concept of evolution.”
Culturally sensitive and culturally competent practices were implemented early in the course in an effort to mitigate students’ perceptions of conflict between their religious beliefs and acceptance of evolution (Barnes and Brownell 2017; Barnes et al. 2017a; Bertka et al. 2019). Such practices included instruction in the Nature of Science, including the limitation of science to evidence and claims rooted in the natural, physical, testable world, and, perhaps more importantly, explicit examples of individuals across a diversity of religious traditions who had reconciled evolutionary science with their sincerely held religious faith. For example, students were presented with and given the opportunity to discuss the statements collected by the Clergy Letter Project (Zimmerman 2010), which comprise clearly worded affirmations from Christian clergy of many denominations, as well as Jewish, Buddhist, Unitarian Universalist, and Humanist clergy, of the veracity and importance of evolution, along with unambiguous assertions that acceptance of evolution is entirely compatible with the tenets of their religious traditions. Additionally, students were presented with specific examples of evolutionary scientists who are open and practicing Catholic, Protestant, and Evangelical Christians, other scientists who are Muslims, and religious leaders of the Hindu, Buddhist, and Christian traditions who have advocated for the compatibility of evolution and their respective faiths.
Students were also presented with the Inter-Academy Panel Statement on the Teaching of Evolution (Inter-Academy Panel 2007) in which the academies of science from countries around the world expressed agreement on “evidence-based facts” which “scientific evidence has never contradicted,” including that the universe and Earth are billions of years old, that the evolution of life on Earth is also measured in billions of years, and that “all organisms living today, including humans, clearly indicate their common primordial origin.”
Evolution is used as a recurring and organizing theme throughout the course, and it is identified as such from the first day of class. Prior to the lesson at the heart of this study, students had explored evolution in the context of the origin of biological molecules, as an extension of cell theory, and as an explanation for the diversity of life. They had also been assigned to read, and had discussed in class, the Understanding Evolution website’s sections on the patterns of evolution (University of California Museum of Paleontology 2021).
The lesson
To test the impact of species context, two versions of the same phylogeny lesson were adapted for use in this study: a treatment version (human) and a control version (non-human mammals). The lesson took place over the course of a single 55-min class day. The structure of the lesson was adapted from a previously designed active learning lesson (Nelson and Nickels 2001; Additional file 1) and was typical of a normal class day. In each year, a coin flip was used to determine which section received the human version and which received the non-human mammal version. The same instructor (RDPD) taught both sections in each year to minimize any effect of instructor on student outcomes.
After a brief introduction from the instructor, students were instructed to form groups with their neighbors and work together to construct phylogenetic trees from molecular sequence data. Students were then given species names for each molecular sequence and used the phylogenies to answer questions about the different species. In the treatment version, several of the questions required students to consider humans’ placement in the phylogeny. The control version used the same questions but asked about the corresponding non-human species instead. The species in the human version of the lesson included humans, chimpanzees, gorillas, and other non-human primates. The non-human mammal version included dogs and different species of weasel, badger, and mongoose. We chose these non-human mammals because they were likely to be familiar without evoking strong feelings, which might have been the case had the phylogeny consisted mostly of domestic pets, charismatic megafauna, or species to which students may be averse. All lesson materials, including the worksheets and instructions provided to students, were identical between the treatment and control lessons other than the species names in the worksheet.
The lesson was situated in the context of students learning about phylogenetic trees in preparation for applying tree-thinking as the primary organizational scheme for exploring diversity within, and relationships among, major biological taxa. Students had some prior exposure to phylogenetic trees, including the type used in the study lesson, but they had not yet been presented with trees depicting humans in relation to other animals.
Measurements: independent variables
Students completed an online pre-class survey that was due before class began on the morning of the lesson. Students were given several days to complete this survey. The pre-class survey included instruments to measure phylogeny content knowledge and several measures of students’ affect toward the course in general: students’ level of engagement with course content, perceived relevance of the course content, and discomfort with course content. Students’ phylogeny content knowledge before the lesson was measured using the ten items from the Tree Thinking Concept Inventory (TTCI) (Gibson and Hoefnagels 2015) that were most pertinent to the lesson’s content. Previously developed instruments were modified and used to measure student engagement with course content (Richmond 1990), perceived relevance of course content (Frymier and Shulman 1995), and discomfort with course content (Barnes et al. 2020a). The prompt for each of these instruments on the pre-class survey instructed students to answer items with regard to the course (Additional file 1). As a shorthand, we refer to these course-level measures as Engagement (course), Relevance (course), and Discomfort (course). This study took place after students had been in the course for several weeks, so students had time to develop opinions about the course.
Student acceptance of human evolution was measured using the Inventory of Student Evolution Acceptance (I-SEA) (Nadelson and Southerland 2012). In Year 1, students completed the entire I-SEA once during the final week of the course. In Year 2, students completed just the human evolution portion of the I-SEA as part of the pre-class survey. Because human evolution acceptance is the primary construct of interest in this study, we only analyzed the eight items making up the human evolution portion of the I-SEA. As previous research has shown, the I-SEA can function as sub-scales for acceptance of microevolution, macroevolution, and human evolution (Sbeglia and Nehm 2019).
Measurements: dependent variables
Students were assigned an additional online survey due within three days of the phylogeny lesson. This post-class survey measured phylogeny content knowledge using the same ten TTCI items that students answered on the pre-class survey. This survey also included the same engagement, perceived relevance, and discomfort instruments, but this time the prompts instructed students to refer to their experience with the content from the phylogeny lesson. Thus, these measures capture students’ engagement with the phylogeny lesson’s content, perceived relevance of the phylogeny lesson’s content, and discomfort with the phylogeny lesson’s content. As a shorthand, we refer to these as Engagement (lesson), Relevance (lesson), and Discomfort (lesson).
An overview of the study design can be found in Fig. 1. All research activities were approved by Syracuse University IRB, protocol #18-248.
Analyses
All instruments showed high internal reliability, both for measures regarding the course content and for measures regarding the lesson. For perceived relevance, Cronbach’s alpha was 0.88 for responses about the course and 0.90 for responses about the lesson. For engagement, α = 0.86 for responses about the course and 0.88 for responses about the lesson. For discomfort, α = 0.92 for responses about the course and 0.93 for responses about the lesson. For the human evolution acceptance scale from the I-SEA, α = 0.91. Evidence of validity for all instruments was supported based on internal structure and the relationship of measurements to other variables (Additional file 1). Responses for content knowledge, engagement, perceived relevance, and discomfort were summed and treated as continuous variables for each student.
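As an illustration of the reliability statistic and sum scores reported above, the following minimal sketch computes Cronbach’s alpha from item-level responses using the standard formula. The function names and the input layout (one list of item scores per student) are assumptions made for this example; it is not the code used in the study.

```python
from statistics import variance


def sum_scores(responses):
    """Return one sum score per student (each row holds that student's item scores)."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a students-by-items table of numeric responses.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum scores))
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two students")
    # Sample variance of each item across students.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the students' sum scores.
    total_var = variance(sum_scores(responses))
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
```

Calling `cronbach_alpha` on the item responses for a single instrument (for example, the eight human evolution items of the I-SEA) would give the kind of α value reported here, provided every student in the table answered every item.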
Multiple linear regression modeling was used to test our predictions. The first prediction, that human examples will be associated with greater learning gains than non-human animal examples, was tested by modeling the effect of treatment (human or non-human animal lesson) on post-class TTCI scores. We included pre-class TTCI scores, human evolution acceptance scores (from the I-SEA), and year (Year 1 or Year 2) as control variables in this model. The effect of treatment in this model indicates whether species context resulted in significantly different gains in TTCI scores, holding pre-lesson TTCI scores, human evolution acceptance, and experimental year constant. Two additional models were run to test the associated predictions that the effect of species context on learning gains will be moderated by (1) students’ prior content knowledge and (2) students’ acceptance of human evolution. These models were the same as the main effects model described above but included interactions between (1) treatment and pre-class TTCI scores and (2) treatment and human evolution acceptance scores. The interaction in each of these models was used to determine whether any effect of species context on learning gains was moderated by either prior content knowledge or human evolution acceptance.
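The three learning-gain models can be written out explicitly as a sketch. The notation is introduced here for illustration and does not appear in the source: for student i, T is a treatment indicator (assumed coded 1 for the human lesson, 0 for the non-human mammal lesson), A is the human evolution acceptance score, and Y is a year indicator (assumed coded 0 for Year 1, 1 for Year 2).

```latex
\begin{align}
\text{Main effects:}\quad
  \mathrm{TTCI}^{\mathrm{post}}_i &= \beta_0 + \beta_1 T_i + \beta_2\,\mathrm{TTCI}^{\mathrm{pre}}_i
      + \beta_3 A_i + \beta_4 Y_i + \varepsilon_i \\
\text{Moderation by prior knowledge:}\quad
  \mathrm{TTCI}^{\mathrm{post}}_i &= \beta_0 + \beta_1 T_i + \beta_2\,\mathrm{TTCI}^{\mathrm{pre}}_i
      + \beta_3 A_i + \beta_4 Y_i
      + \beta_5 \bigl(T_i \times \mathrm{TTCI}^{\mathrm{pre}}_i\bigr) + \varepsilon_i \\
\text{Moderation by acceptance:}\quad
  \mathrm{TTCI}^{\mathrm{post}}_i &= \beta_0 + \beta_1 T_i + \beta_2\,\mathrm{TTCI}^{\mathrm{pre}}_i
      + \beta_3 A_i + \beta_4 Y_i
      + \beta_5 \bigl(T_i \times A_i\bigr) + \varepsilon_i
\end{align}
```

Under this coding, β1 in the first model is the treatment effect of interest, and β5 in the second and third models is the interaction coefficient used to assess moderation.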
The second prediction, that human examples will be associated with greater perceived relevance of the lesson content than non-human animal examples, was tested by modeling the effect of treatment on students’ relevance (lesson) scores. In this model, we included relevance (course) scores, human evolution acceptance scores, and year as control variables. The effect of treatment in this model was used to test whether there was a main effect of treatment on perceived relevance, holding students’ perceived relevance of course content, human evolution acceptance, and experimental year constant. One additional model was run to test the associated prediction that the effect of species context on perceived relevance of lesson content will be moderated by students’ acceptance of human evolution. This model was the same as the main effects model but included an interaction between treatment and students’ acceptance of human evolution. The third prediction, that human examples will be associated with greater engagement with the lesson content than non-human animal examples, and the associated prediction that this effect will be moderated by students’ acceptance of human evolution, were tested using the same two-step modeling framework described for perceived relevance, except that measures of perceived relevance of course and lesson content were replaced with measures of engagement with course and lesson content, respectively.
The fourth prediction, that human examples will be associated with greater student discomfort with the lesson content than non-human animal examples for students with lower acceptance of human evolution, was tested using logistic regression. This approach was taken based on the distribution of discomfort (course) and discomfort (lesson) scores. Discomfort scores of zero and four were extremely common, resulting in a bimodal distribution not suitable for typical linear regression (Additional file 1: Figure S1). Discomfort (lesson) scores were dichotomized so that students with a score greater than zero were coded as 1 and students with a score of zero were coded as 0. Using these dichotomized scores, logistic regression was used to model the probability that a student reported any level of discomfort with the lesson content above zero, with treatment, discomfort (course), human evolution acceptance scores, year, and an interaction between treatment and human evolution acceptance scores modeled as predictor variables. The effect of this interaction was used to test whether there was a moderating effect of human evolution acceptance on the association between species context and discomfort with the lesson content. An additional main effects model without the interaction was run to help interpret the interaction coefficient.
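The dichotomization and the structure of the logistic model described above can be sketched as follows. The function names, the 0/1 coding of treatment and year, and the ordering of predictors are assumptions made for this illustration; the coefficients would come from a fitted model and none are supplied here.

```python
import math


def dichotomize(discomfort_lesson_score):
    """Code any discomfort above zero as 1 and a score of zero as 0."""
    return 1 if discomfort_lesson_score > 0 else 0


def design_row(treatment, discomfort_course, acceptance, year):
    """Predictors for one student: intercept, four main effects, and the
    treatment-by-acceptance interaction (treatment and year assumed coded 0/1)."""
    return [1.0, treatment, discomfort_course, acceptance, year,
            treatment * acceptance]


def predicted_probability(coefficients, row):
    """Inverse logit of the linear predictor: P(any discomfort with the lesson)."""
    eta = sum(b * x for b, x in zip(coefficients, row))
    return 1.0 / (1.0 + math.exp(-eta))
```

In this layout the last coefficient is the interaction term; dropping the last element of each row gives the main effects model that was fit alongside it.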
Missing data and multiple imputation
Complete responses to all items in the pre- and post-surveys were uncommon. After accounting for issues with non-response, 145 out of 620 consenting students from Year 1 and 128 out of 410 consenting students from Year 2 provided complete case data (40.8% of the consenting students).
To handle missing data, the regression analyses described above were run in two ways: first using listwise deletion and then using multiple imputation. For the imputation, we assumed that data were missing at random (MAR) (Rubin 2004), because class grades, which were available for nearly every student, were negatively associated with data missingness. A Mann–Whitney test indicated that students with complete survey data performed significantly better in the course than those with missing data, U = 69,678, p < 0.001. Multiple imputation was run with a fully conditional specification (van Buuren 2007, 2018) in R using the mice package (van Buuren et al. 2011). Predictive mean matching (pmm) was used to calculate imputed values of all variables (Andridge and Little 2010). All variables were imputed at the level of the sum score. Because interactions were of theoretical interest, imputations were performed separately for students in the human arm and students in the animal arm of the study. One hundred datasets were imputed for each subset before being recombined into the final imputed datasets. Diagnostics for convergence and model fit were run before the imputed data were analyzed (Additional file 1: Figures S3–S14). Regression model results were pooled according to recommended guidelines (Rubin 2004).
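The pooling step can be illustrated with a minimal sketch of Rubin’s rules for a single regression coefficient. The study itself was run in R with the mice package; the Python function below is a hypothetical stand-in that shows only the arithmetic, takes one estimate and one standard error per imputed dataset, and omits the degrees-of-freedom adjustment used for inference.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching estimates and standard errors from m >= 2 datasets")
    pooled_estimate = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    # Total variance inflates the between component by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)
```

With one hundred imputed datasets, as used here, the (1 + 1/m) factor is 1.01, so the pooled variance is close to the within-imputation variance plus the between-imputation variance.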
Model estimates were similar between listwise deletion and the imputed datasets for all models. However, because of uncertainty involved in analyses with missing data, we report regression results for both methods.