
Different evolution acceptance instruments lead to different research findings

Abstract

Background

Despite widespread concern among researchers about the differential measurement of evolution acceptance, no one has systematically explored how instrument choice can impact research results and conclusions in evolution education studies. In this study, we administered six evolution acceptance instruments in a single survey to students in undergraduate biology courses at universities in Arizona, Colorado, and Utah. We then conducted parallel analyses with the same students for each of the six instruments to understand how results and conclusions may differ depending on the instrument used.

Results

We found statistically significant differences in levels of evolution acceptance across the three student populations when using a human evolution acceptance instrument, but not when using a microevolution acceptance instrument. Further, the significance and effect sizes of variables associated with evolution acceptance differed beyond sampling variation depending on the instrument used. Results from the different instruments were most often dissimilar when examining the effects of evolution understanding and of identifying as Protestant/Mormon on evolution acceptance.

Conclusions

We found that different instruments used to measure evolution acceptance sometimes led to different research results and conclusions. The extent to which variables predicted evolution acceptance depended on the instrument used to measure acceptance, which has the potential to explain over 30 years of conflicting research on the relationship between evolution acceptance and understanding. These results indicate that before researchers can determine how best to improve evolution acceptance, the evolution education community may need to articulate a consistent definition of evolution acceptance and identify a single valid and reliable instrument to quantify it, so that results can be compared across studies.

Introduction

Decades of research have resulted in little consensus on which factors are most important for student evolution acceptance (Barnes et al. 2017; Mead et al. 2018; Smith 2009a) and how best to increase evolution acceptance (Barnes and Brownell 2017; Mead et al. 2018). One explanation for this lack of consensus is that researchers use different instruments to measure evolution acceptance that were designed using different definitions of evolution acceptance (Glaze and Goldston 2015; Lloyd-Strovas and Bernal 2012; Smith 2009a). In this study, we administered a survey containing multiple evolution acceptance instruments to undergraduate students in Arizona, Colorado, and Utah. We illustrate the similarities and differences in the results from each instrument and how instrument choice can impact the results and conclusions of a study.

Background

Due to the low levels of evolution acceptance among members of the public (Gallup 2017; Pew 2013) and college students (Brem et al. 2003; Ingram and Nelson 2006; Rice et al. 2010; Walter et al. 2013), research on how to increase evolution acceptance has become one of the predominant subfields of evolution education. Despite over 30 years of research on how to improve evolution acceptance, rates of acceptance in the United States have remained relatively unchanged (Gallup 2017). Further, although at least 300 articles have been published examining evolution acceptance, little consensus has emerged regarding the relationships between different variables and evolution acceptance. While some studies have found large positive relationships between evolution acceptance and evolution understanding (Rutledge and Warden 1999; Trani 2004), others report a weak relationship between acceptance and understanding (Athanasiou and Papadopoulou 2012; Cavallo et al. 2011; Deniz et al. 2008; Großschedl et al. 2014; Nadelson and Sinatra 2009), and still others report no relationship at all (Bishop and Anderson 1990; Brem et al. 2003; Lawson 1983; Sinatra et al. 2003). Further, what best predicts evolution acceptance varies across studies; religiosity, evolution understanding, and Nature of Science (NOS) understanding have each been reported as the strongest predictor of evolution acceptance in different studies (Carter and Wiles 2014; Dunk et al. 2017; Glaze et al. 2014; Mead et al. 2018; Weisberg et al. 2018). This lack of consensus on the relationships between evolution acceptance and other variables could be one reason why we have seen so little change in evolution acceptance in the United States for 30 years. How can educators determine the best methods for increasing evolution acceptance if the research community has not reached consensus on how variables relate to evolution acceptance?

One explanation for these inconsistencies in evolution acceptance findings is that researchers measure evolution acceptance differently, and that this could lead to disparate results and conclusions. Prior to the publication of peer-reviewed evolution acceptance instruments, evolution education researchers used dozens of different evolution acceptance instruments that were usually constructed for use in a single study (Bishop and Anderson 1990; Johnson and Peeples 1987; Lawson 1983; Sinatra et al. 2003). The Measure of Acceptance of the Theory of Evolution (MATE) was published in 1999 and has steadily gained popularity in the evolution education community as a measure of evolution acceptance (Rutledge and Warden 1999). More recently, the Inventory of Student Evolution Acceptance (I-SEA; Nadelson and Southerland 2012) and the Generalized Acceptance of EvolutioN Evaluation (GAENE; Smith et al. 2016) have been published to measure evolution acceptance. The availability of different instruments means that evolution education researchers have to make a decision about the best way to measure evolution acceptance.

Evolution education researchers have expressed repeated concern about how evolution acceptance is measured (Glaze and Goldston 2015; Lloyd-Strovas and Bernal 2012; Sickel and Friedrichsen 2013; Smith 2009a). For more than 20 years, review articles have pointed to problems with the measurement of evolution acceptance. Smith et al. (1995) highlighted that individuals with different levels of Nature of Science (NOS) understanding could be confused by the wording of questions in instruments meant to measure evolution acceptance, which could lead to inflated or even erroneous correlations between NOS understanding and evolution acceptance. In a 2009 review article, Smith (2009a) also expressed concern that some instrument items meant to capture evolution acceptance may measure evolution understanding, which could lead to inflated correlations between evolution understanding and evolution acceptance. When Lloyd-Strovas and Bernal (2012) reviewed the literature on undergraduate evolution education, they found it hard to detect patterns because the instruments used to measure evolution acceptance were so different that they claimed the studies were “not comparable.” Sickel and Friedrichsen (2013) raised concerns that instruments meant to measure a respondent’s evolution acceptance include items about whether the respondent thinks that scientists accept evolution, which could lead to inflated rates of evolution acceptance in research findings. Nadelson and Southerland (2012) further expressed concern that many evolution acceptance instruments do not disentangle the role of context (e.g. evolution occurring in humans or evolution occurring in plants) in evolution acceptance. However, despite the prevalence of these concerns, the evolution education community has not reached a consensus on how we should measure evolution acceptance, and researchers continue to use a variety of instruments to measure student evolution acceptance. Further, researchers often compare conclusions from studies that use different evolution acceptance instruments (Wiles and Alters 2011; Glaze and Goldston 2015). This practice implies that conclusions are comparable across studies using different evolution acceptance instruments, but little prior research has determined whether using different evolution acceptance instruments leads to the same conclusions about evolution acceptance (Romine et al. 2018; Sbeglia and Nehm 2018).

In our literature review, we found preliminary evidence that the use of different instruments to measure evolution acceptance could be a cause of inconsistent research results in the literature. When we examined research studies exploring the relationship between evolution acceptance and evolution understanding, we found that Bishop and Anderson (1990), Sinatra et al. (2003), and Hermann (2012) all used similar measures of evolution acceptance in which students were asked the extent to which they believed/accepted evolution or thought evolution was true/credible. All three of these studies found no relationship between evolution acceptance and evolution understanding. However, in studies in which researchers used the Measure of Acceptance of the Theory of Evolution (MATE) to measure evolution acceptance, researchers consistently found a positive relationship between acceptance and understanding (Rutledge and Warden 2000; Trani 2004); to our knowledge, no study using the MATE has reported a nonsignificant relationship between evolution acceptance and understanding. However, these studies were conducted with different populations of students, so researchers cannot determine from these studies alone whether it was the evolution acceptance instruments used that led to the conflicting research findings.

Because the I-SEA and the GAENE are fairly new, very few published studies have used them to measure evolution acceptance. Nadelson and Hardy (2015) used the I-SEA to show that introductory undergraduate psychology students’ acceptance of microevolution, macroevolution, and human evolution is related to more trust in science and scientists, less conservative political orientations, and weaker religious commitment, similar to findings using other instruments (Dunk et al. 2017; Glaze and Goldston 2015). Using the I-SEA, other studies have shown that undergraduate students have higher acceptance of microevolution than of macroevolution and human evolution (Nadelson and Hardy 2015; Nadelson and Southerland 2012; Schleith 2017). In one study, researchers showed that scores on the GAENE and MATE were strongly correlated among introductory undergraduate health sciences students (Metzger et al. 2018), and one study showed that GAENE scores increased among some high school students after evolution instruction (Pobiner et al. 2018). To our knowledge, no peer-reviewed published study has reported on the relationship between evolution understanding and acceptance using either the GAENE or the I-SEA.

One goal of the current study was to explore whether inconsistent research conclusions about evolution understanding and evolution acceptance could arise because of different instruments used to measure evolution acceptance. However, we also explore whether research findings could be inconsistent when using different evolution acceptance instruments to examine relationships with Nature of Science (NOS) understanding, religiosity, religious affiliation, political affiliation, and race/ethnicity.

Research questions and methods

The overarching goal for the study was to compare research findings from different instruments that have been previously used to measure evolution acceptance. We administered six different multi-item instruments to measure evolution acceptance in a single survey to the same students and then compared the findings from each instrument using predefined criteria to determine what results and conclusions were different.

Our specific research questions were:

  1. Do various instruments lead to different conclusions about the level of evolution acceptance among populations?

  2. Do various instruments lead to different results and conclusions about the relationships between student variables (e.g., evolution understanding, religion) and levels of evolution acceptance?

Survey and administration

In the fall of 2017 and spring of 2018, we sent a survey to ~ 2300 students from nine introductory biology courses at a research-intensive university in urban Arizona, ~ 190 students in two introductory biology courses at a comprehensive institution in rural Colorado, and ~ 200 students in four introductory biology courses at a primarily undergraduate institution in suburban Utah. Data were collected mid-semester. Students were offered a small amount of extra credit for completing the survey. We measured student evolution acceptance with six evolution acceptance instruments and also collected demographic information from students. To ensure differences in results were not due to an instrument order effect, students were given the evolution acceptance instruments in a random order. Demographic questions were presented at the end of the survey. For Arizona students, we also collected data on their evolution understanding and Nature of Science (NOS) understanding. See Table 1 for a list of the data we collected for this study. The Arizona and Colorado institutional review boards approved all research in this manuscript (Protocol Numbers 00007719 and 1131916-2, respectively).

Table 1 Data collected in the current study for each student population

Evolution acceptance instruments

We administered six evolution acceptance instruments to students. Below we describe each instrument, including its prior prevalence in the literature, the motivation behind the instrument’s construction, its unique features compared to other instruments, and its validity and reliability evidence.

Measure of Acceptance of the Theory of Evolution (MATE; Rutledge and Warden 1999)

The MATE is the most popular instrument used to measure evolution acceptance in the evolution education literature (Smith et al. 2016); we found 51 studies published in academic journals that have used the MATE to measure evolution acceptance.

The MATE has 20 items and was originally designed to measure high school biology teachers’ evolution acceptance. Items on the MATE were reviewed and approved by a group of evolutionary biologists, science educators, and a philosopher of science to establish content validity (Rutledge and Warden 1999). The authors of the MATE reported that a factor analysis of their instrument revealed only one factor, which they deemed to be evolution acceptance. However, more recent analyses have suggested the MATE is multi-dimensional (Metzger et al. 2018). The authors of the MATE reported acceptable reliability with high school biology teachers (Rutledge and Warden 1999), and in a subsequent publication, Rutledge and Sadler (2007) reported acceptable reliability of the MATE with college students. High reliability of the MATE has been confirmed by multiple subsequent studies (Barone et al. 2014; Manwaring et al. 2015; Nadelson and Sinatra 2009).

Rutledge and Warden (1999) did not provide an explicit definition of “evolution acceptance” as the basis of their instrument, but did cite Schwab (1968) as justification for what they included in the MATE, in the following passage:

“because informed decisions of acceptance or rejection of a scientific theory are based on evaluations of substantive and syntactical elements of a domain, fundamental concepts of evolutionary theory and the nature of science were selected to be addressed by the MATE: the processes of evolution, the available evidence of evolutionary change, the ability of evolutionary theory to explain phenomena, the evolution of humans, the age of the earth, the independent validity of science as a way of knowing, and the current status of evolutionary theory within the scientific community.” (pg. 14)

Rutledge and Warden define what concepts are included in the definition of “evolution acceptance,” but do not define what is meant by “acceptance”. Based on the strongly agree-strongly disagree Likert-style response scale of the MATE, acceptance is implicitly defined as level of agreement with each of the concepts included in the MATE.

The MATE has been criticized for including questions that could measure evolution understanding, Nature of Science (NOS) understanding, religiosity, and perceptions of scientists’ acceptance of evolution (Smith 2009a). If items on the MATE measure these constructs in addition to evolution acceptance, then we may see inaccurate correlations between evolution acceptance and these constructs. For instance, one item on the MATE asks students if they agree that “the age of the earth is at least 4 billion years”; to be scored as an acceptor on this item, students need to draw on their understanding of the age of the Earth. A student could have an inaccurate understanding of the age of the Earth (e.g. that it is 2 billion years old) and choose “disagree” even though they accept that the Earth is old and accept evolution. The use of the MATE is often justified by its prevalence in the prior literature, even though many criticisms of this instrument have been published (Metzger et al. 2018; Nadelson and Southerland 2012; Romine et al. 2017; Smith et al. 2016).

Inventory of Student Evolution Acceptance (I-SEA: microevolution, macroevolution, and human evolution; Nadelson and Southerland 2012)

The I-SEA was published by Nadelson and Southerland in 2012 and we identified two studies published in the academic literature (Nadelson and Southerland 2012; Nadelson and Hardy 2015) and three dissertations/theses that have used the I-SEA to measure evolution acceptance.

The I-SEA is a 24-item Likert-style instrument developed to measure evolution acceptance among high school and college students. The I-SEA was constructed to provide a new measure of acceptance that addressed what the authors considered two shortcomings of prior instruments used to measure evolution acceptance. First, they argued that other instruments conflated students’ evolution acceptance with student evolution understanding, so the I-SEA was designed to not conflate these. Second, other instruments did not disaggregate microevolution from macroevolution from human evolution, so the I-SEA was developed with three distinct subscales of student acceptance of microevolution, macroevolution, and human evolution.

Nadelson and Southerland (2012) defined evolution as comprising three distinct contexts that are relevant for evolution acceptance: microevolution, defined as the results of evolution in the short term; macroevolution, defined as the results of evolution in the long term; and human evolution, defined as the evolution of the human species specifically. They defined acceptance as “the examination of the validity of the knowledge supporting the construct, the plausibility of the construct for explaining phenomenon, persuasiveness of the construct and fruitfulness or productivity of the empirical support for the construct.” (pg. 1639)

The I-SEA definition of evolution acceptance differs from those of other instruments in that it discriminates acceptance of different evolutionary contexts. Macroevolution and human evolution tend to be in direct conflict with commonly held religious beliefs in the United States, but microevolution is not (Pobiner 2016; Scott 2005). Therefore, levels of evolution acceptance are often higher for microevolution than for macroevolution and human evolution (Nadelson and Hardy 2015; Nadelson and Southerland 2012), and relationships between predictor variables and evolution acceptance may change depending on the subscale.

Exploratory and confirmatory factor analyses of the I-SEA have confirmed a three-factor structure (microevolution, macroevolution, and human evolution) for the I-SEA, and the three resulting subscales had high internal consistency coefficients (> .80). Further, content validity of the I-SEA was supported by a group of experienced biology teachers, science teacher educators, and college biology faculty members who reviewed the items on the I-SEA (Nadelson and Southerland 2012). The authors of the I-SEA recommended that the instrument could be used as three separate instruments to measure acceptance of evolution or as a single aggregate instrument. Because research findings may differ for the different I-SEA subscales, we treat the I-SEA as three distinct instruments: I-SEA microevolution acceptance, I-SEA macroevolution acceptance, and I-SEA human evolution acceptance.

Generalized Acceptance of EvolutioN Evaluation (GAENE; Smith et al. 2016)

The GAENE was published in 2016 and we have identified three studies (Metzger et al. 2018; Pobiner et al. 2018; Smith et al. 2016) published in academic journals that used the GAENE to measure evolution acceptance.

The GAENE is a 13-item Likert-style instrument originally designed to measure high school and college students’ evolution acceptance. Items on the GAENE went through an iterative construction process, with two rounds of pilot testing with students, two rounds of validation with science education experts, and two rounds of reliability, factor, and Rasch analyses, the second set of which showed acceptable reliability and validity of the GAENE with high school and college students (Smith et al. 2016).

The GAENE was constructed to provide a new measure of evolution acceptance that addressed what the authors identified as two major weaknesses of other instruments used to measure evolution acceptance. First, the authors of the GAENE built the instrument so that it would not conflate evolution understanding with evolution acceptance, and second, it was built from an explicit definition of evolution acceptance while other instruments used to measure evolution acceptance often operate from an implicit definition of evolution acceptance (Smith et al. 2016). The authors explicitly define evolution acceptance as:

“The mental act or policy of deeming, positing, or postulating that the current theory of evolution is the best current available scientific explanation of the origin of new species from preexisting species.” (pg. 8)

Items on the GAENE, unlike items on other instruments used to measure evolution acceptance, gauge the extent to which an individual is willing to advocate for evolution. For instance, students are asked to agree or disagree with statements such as, “It is important to let people know about how strong the evidence for evolution is,” and “I would be willing to argue in favor of evolution in a public forum such as a school club, church group, or meeting of public school parents,” which is different from other evolution acceptance instruments in which students are asked how much they agree with the current claims of evolutionary theory.

100-point instrument of self-defined acceptance

We constructed an evolution acceptance instrument with the purpose of mimicking the ones used by Bishop and Anderson (1990), Sinatra et al. (2003), and Hermann (2012) described in the introduction. We found five studies that used similar instruments to measure evolution acceptance. Our instrument was a composite score of three items in which students used a slider scale to indicate from 0 to 100 points: (1) “To what extent do you accept evolution?” (2) “To what extent do you believe evolution?” and (3) “To what extent do you think evolution is true?” In the study by Sinatra et al. (2003), students placed an “X” on a horizontal number line to indicate the extent to which they thought evolution was credible; in Bishop and Anderson (1990), students were asked, “Do you believe the theory of evolution to be truthful?”; and in Hermann (2012), students were asked, “To what extent do you accept (believe) evolution?” None of these three studies found a significant relationship between evolution understanding and acceptance. Like those instruments, ours provides no definition of evolution acceptance and relies only on the student’s own definition. This is the only instrument we use in this study that relies completely on the respondents’ own definition of evolution acceptance; the other instruments generally provide students with specific contexts about evolution acceptance with which to agree or disagree.

We conducted think-aloud interviews (Willis 2004) with 25 undergraduate students using the items from this instrument. We instructed students to read each question aloud, explain what they perceived the question to be asking, and then walk us through their reasoning as they answered the question. Finally, we asked students if there was anything confusing about the questions and if they had any suggestions for how the questions could be improved. Interviews indicated that students did not misinterpret the questions and answered them in a manner we would expect given their reasoning (e.g., one student answered “100” to “to what extent do you think evolution is true?” because she thought the scientific evidence was strong for evolution, while another student answered “7” because she thought there was evidence for some aspects of evolution but not others). No student found the questions confusing or in need of clarification. However, six out of 25 students did indicate that the questions were repetitive and seemed to be asking the same thing. Inter-item correlations for this instrument were very high (r = .85–.89) and the reliability coefficient was also very high (α = .95). Unlike the other evolution acceptance instruments used in this study, this instrument did not undergo formal evaluation by experts and was not published in a peer-reviewed paper dedicated to the instrument’s development. Nonetheless, it is similar to instruments used in other peer-reviewed studies published by leaders in the field of evolution education, and those studies have been highly cited and thus fairly influential in the evolution education literature (Bishop and Anderson 1990; Sinatra et al. 2003).

Demographics and other predictor variables

We collected information on race/ethnicity, parent education level, political affiliation, and religious affiliation from all students. Religiosity was also collected from all students and was defined as the extent to which students see religion as important to their identity and the extent to which they participate in religious activities. Those associated with a religious denomination and those who selected “nothing in particular” as their religious affiliation were categorized based on their agreement, on a 5-point Likert scale from strongly disagree to strongly agree, with the statements “I attend religious services regularly” and “my religion or faith is an important part of my identity” (Cohen et al. 2008). Religiosity was treated as an ordinal variable in which atheists and agnostics were categorized at the lowest end.
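Since the composite scoring is described above only at a high level, the following is a minimal sketch in Python of one way such an ordinal religiosity variable could be coded; the function name and binning are our own illustration, not the authors’ published scheme.

```python
# Hypothetical sketch of an ordinal religiosity coding consistent with the
# description above; the exact composite used in the study is not specified.
def religiosity_level(affiliation: str, attend: int, identity: int) -> int:
    """attend/identity: agreement from 1 (strongly disagree) to 5 (strongly
    agree) with "I attend religious services regularly" and "my religion or
    faith is an important part of my identity"."""
    if affiliation in ("atheist", "agnostic"):
        return 0  # atheists and agnostics occupy the lowest ordinal level
    return attend + identity  # ordinal composite; higher = more religious

print(religiosity_level("nothing in particular", attend=2, identity=3))  # -> 5
```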

To measure Arizona students’ evolution understanding, we used two subscales of the Evolutionary Attitudes and Literacy Instrument (EALS; Hawley et al. 2010). We used only the two subscales (13 items) that measure “Evolutionary Knowledge” (e.g., “In most populations, more offspring are born than can survive”) and “Evolutionary Misconceptions” (e.g., “Evolution is a linear progression from primitive to advanced species”). Students were asked to decide whether each item was true or false based on their evolution understanding, rather than responding on a strongly agree–strongly disagree Likert scale, so that students would answer based on their conceptual understanding of the scientific theory of evolution rather than their personal opinions. Students’ scores were calculated by counting the number of items answered correctly. We chose the EALS to measure evolution understanding because it has been used in other evolution education studies (Dunk et al. 2017; Short and Hawley 2015), has shown evidence of reliability and validity among college students (Hawley et al. 2010), and its items do not appear to conflate evolution acceptance with evolution understanding.
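To make the sum-correct scoring concrete, here is a minimal sketch (not the authors’ code); the item names and answer key below are hypothetical stand-ins for the 13 EALS items.

```python
# Hypothetical answer key: "Evolutionary Knowledge" items are keyed True,
# "Evolutionary Misconceptions" items are keyed False.
ANSWER_KEY = {
    "more_offspring_than_survive": True,       # knowledge item
    "evolution_is_linear_progression": False,  # misconception item
    # ... the remaining 11 of the 13 EALS items would follow
}

def score_understanding(responses: dict) -> int:
    """Return the number of true/false items answered correctly."""
    return sum(responses.get(item) == key for item, key in ANSWER_KEY.items())

print(score_understanding({"more_offspring_than_survive": True,
                           "evolution_is_linear_progression": True}))  # -> 1
```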

To measure Arizona students’ Nature of Science (NOS) understanding, we used the Rutledge and Warden (2000) modified version of the Johnson and Peeples (1987) instrument. This 20-item instrument probes students about a wide range of characteristics of science, including but not limited to: the tentative nature of science (e.g., “A fact in science is a truth that can never be changed”), the scientific method (e.g., “The initial step of the scientific method is to test a hypothesis”), the techniques of science (e.g., “To make any determinations about historic occurrences in nature, there must be direct observations”), and the limits of science (e.g., “Scientists must limit their investigations to the natural world”). Students’ scores were calculated by determining the number of items answered correctly (True-or-False response scale). We chose to use this instrument because it has been repeatedly used in the evolution education literature to measure Nature of Science (NOS) understanding and has shown large correlations with evolution acceptance in several studies (Dunk et al. 2017; Glaze et al. 2014; Rutledge and Warden 2000).

We also collected Arizona students’ cumulative end of semester college GPA from the university’s registrar.

Analyses and results

We wanted to compare research findings from different instruments when the student sample remained constant. Therefore, we only included data from students who provided answers for all questions used in the analyses, so the sample would be exactly the same across the analyses using different instruments (see Note 1). This resulted in 742 complete responses from students in Arizona, 102 complete responses from students in Colorado, and 79 complete responses from students in Utah. We used SPSS version 25 for all analyses.

The religious background and political beliefs of students in Arizona, Colorado, and Utah were notably different. Ninety-two percent of students whom we surveyed in Utah identified as Mormon/LDS, while only 1–2% of students in Arizona and Colorado identified as Mormon/LDS. Further, students in Utah scored one standard deviation higher on the measure of religiosity compared to students in Arizona and Colorado (Table 2). Students in Utah were also more likely to identify as Republican (57%) compared to students in Arizona (16%) and Colorado (16%). Past research shows that the general Mormon population has some of the lowest evolution acceptance rates (Baker et al. 2018), so we would expect a priori that the Utah student population would be less accepting of evolution compared to students in Arizona and Colorado. Arizona and Colorado students scored similarly on mean religiosity (Table 2), which was surprising because Colorado is generally rated as a less religious state compared to Arizona (Pew 2016). However, the institution sampled in Colorado was located in a rural part of Colorado, which could explain the similarity in Arizona and Colorado students’ religiosity scores. See Table 2 for a comparison of the demographics collected from all three student populations.

Table 2 Demographic characteristics of three student populations

For ease of interpretation, all scores from evolution acceptance instruments were relativized by the maximum scores so that scores from each instrument ranged from 0 to 1. From this point onwards, when we refer to instrument scores we are referring to the relativized scores. Internal consistency analyses revealed that all six evolution acceptance instruments had acceptable reliability (α = .84–.98). All alpha values for each evolution acceptance instrument by student population can be found in the Additional file 1: Table S1.
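The analyses were run in SPSS, but the relativization and reliability steps described above are straightforward to reproduce. The sketch below shows both steps for the MATE under stated assumptions: hypothetical column names, a 5-point response scale, and the third-party pingouin library for Cronbach’s alpha.

```python
import pandas as pd
import pingouin as pg  # third-party library providing cronbach_alpha

df = pd.read_csv("survey_responses.csv")          # hypothetical data file
mate_items = [f"mate_{i}" for i in range(1, 21)]  # 20 MATE item columns

# Relativize: divide the summed instrument score by the maximum possible
# score so the instrument ranges from 0 to 1 (here 20 items x 5 points).
df["mate_rel"] = df[mate_items].sum(axis=1) / (20 * 5)

# Internal consistency (Cronbach's alpha) for the instrument's items.
alpha, ci = pg.cronbach_alpha(data=df[mate_items])
print(f"alpha = {alpha:.2f}, 95% CI = {ci}")
```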

Bivariate correlations of students’ scores on the six different evolution acceptance instruments revealed that scores from the six different instruments were moderately to strongly correlated with one another in Arizona (r = .49–.83, p < .001), Colorado (r = .49–.82, p < .001), and Utah (r = .36–.79, p < .001). Correlation coefficients for each instrument by population of student can be found in the Additional file 1: Tables S2–S4.

Research question #1: do various instruments lead to different conclusions about the level of evolution acceptance among populations?

One research question that evolution education researchers often explore is whether evolution acceptance differs across students in different contexts (Short and Hawley 2015). We examined whether different evolution acceptance instruments lead to different conclusions about levels of evolution acceptance among three undergraduate student populations.

According to Levene’s test, the samples in Arizona, Colorado, and Utah did not meet the homogeneity of variance assumption needed to run one-way ANOVAs when exploring population-level differences with the MATE (Levene’s statistic = 7.95, p < .001), the GAENE (Levene’s statistic = 3.94, p = .02), the I-SEA microevolution instrument (Levene’s statistic = 4.92, p = .007), or the 100-pt scale (Levene’s statistic = 11.18, p < .001). Therefore, to determine if there were differences in evolution acceptance across the three student populations, we ran Welch’s robust tests of equality of means (Field 2009). Following Welch’s test, we then conducted Games-Howell post hoc tests that corrected for multiple comparisons (Field 2009) to identify which populations were statistically significantly different from one another. We conducted a separate test for each of the six evolution acceptance instruments as the dependent variable, with student population (AZ, CO, or UT) as the predictor variable. We consider a conclusion using one instrument to be different from a conclusion using another instrument when one instrument detects statistically significant differences between two populations and the other does not.
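We ran these tests in SPSS; for readers who want to reproduce the workflow, a minimal Python sketch follows, assuming hypothetical column names and the scipy and pingouin libraries (pingouin implements both Welch’s ANOVA and Games-Howell post hoc tests).

```python
import pandas as pd
import pingouin as pg
from scipy import stats

df = pd.read_csv("survey_responses.csv")  # hypothetical data file

# Levene's test for homogeneity of variance across the three populations.
groups = [g["mate_rel"].to_numpy() for _, g in df.groupby("population")]
levene_stat, levene_p = stats.levene(*groups)
print(f"Levene's statistic = {levene_stat:.2f}, p = {levene_p:.3f}")

# Welch's robust test of equality of means, then Games-Howell post hoc
# comparisons to identify which populations differ from one another.
print(pg.welch_anova(data=df, dv="mate_rel", between="population"))
print(pg.pairwise_gameshowell(data=df, dv="mate_rel", between="population"))
```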

Welch’s test detected differences in mean evolution acceptance scores across populations using all evolution acceptance instruments (Welch’s statistic range (2, 920) = 14.61–22.02, p < .01), except the I-SEA microevolution instrument (Welch’s statistic (2, 920) = .123, p = .88), for which student scores in AZ, CO, and UT were not deemed statistically significantly different from one another (see Fig. 1). Games-Howell post hoc tests revealed that Arizona students scored higher than Utah students on evolution acceptance when using every instrument (p < .01), except for the I-SEA microevolution instrument (p = .88). When comparing the Arizona and Colorado populations, the tests revealed higher scores among students in Arizona when using the MATE (p = .001), the I-SEA human evolution instrument (p = .039), the GAENE (p < .001), and the 100-point instrument (p = .014), but not when using the I-SEA microevolution instrument (p = .986) or the I-SEA macroevolution instrument (p = .754). When comparing Colorado and Utah students, the tests only detected differences using scores from the I-SEA human evolution instrument (p = .003), on which Colorado students scored higher. Figure 1 summarizes the results comparing mean levels of evolution acceptance across populations by evolution acceptance instrument.

Fig. 1

Results comparing mean relativized evolution acceptance scores on six different evolution acceptance instruments from students in Arizona, Colorado, and Utah using Welch’s test and Games-Howell post hoc comparison tests. Error bars represent the 95% confidence interval. Different letters represent statistically different scores among populations for each instrument. Comparisons were made across populations for a single instrument and not across instruments

Research question #1 discussion

We found that most evolution acceptance instruments, except the I-SEA microevolution and I-SEA macroevolution instruments, showed that the Arizona population had the highest evolution acceptance, so many of the instruments showed similar patterns. However, differences in evolution acceptance across populations were found with some evolution acceptance instruments but not others, indicating that the results were inconsistent across some instruments. This was most notable within the three subscales of the I-SEA. Differences in levels of evolution acceptance between populations were consistently found using the I-SEA human evolution instrument, but not the I-SEA microevolution instrument, which suggests that acceptance of human evolution varies more across student populations than acceptance of microevolution.

Students in Utah scored higher on religiosity, so we expected that there would be higher perceived conflict between evolution and these students’ religious beliefs compared to the students in Arizona and Colorado who were less religious on average. As Nadelson and Southerland (2012) argued when they created the I-SEA, religious individuals may perceive more conflict between their religious beliefs and human evolution than macroevolution or microevolution and this may explain the difference we see in results from the I-SEA microevolution and I-SEA human evolution instruments.

These results suggest that using the I-SEA subscales, and specifically separating the analyses by subscale, may give researchers a more nuanced understanding of differences in evolution acceptance across populations. Further, if researchers use evolution acceptance instruments that focus more on microevolution, they could report higher rates of evolution acceptance and could miss differences in overall evolution acceptance between populations of students. Similarly, if researchers do not ask about human evolution acceptance with their instrument, they may not identify population-level differences in evolution acceptance that would be apparent with an instrument that included questions about human evolution.

Research question #2: do various instruments lead to different results and conclusions about relationships between student variables (e.g. evolution understanding, religion) and levels of evolution acceptance?

Another common aim of evolution education research is to identify what variables are related to evolution acceptance and what positively or negatively predicts whether someone will be accepting of evolution. We examined whether researchers could get different results and come to different conclusions about what variables would predict evolution acceptance if they used different evolution acceptance instruments.

We restricted these analyses to students in Arizona, from whom we collected evolution understanding and Nature of Science (NOS) understanding data. We used linear regression analyses to explore whether variables predicted evolution acceptance differentially depending on the instrument used to measure evolution acceptance. We entered all predictor variables (evolution understanding, NOS understanding, GPA, course level, parent education level, religiosity, religious denomination, political affiliation, and race/ethnicity) into regressions with scores from each evolution acceptance instrument as the dependent variable (six regressions total; a sketch of this setup follows Table 3). Table 3 illustrates how each variable was entered into the analyses.

Table 3 Description of categorical and ordinal variables in the regression analyses
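As a concrete illustration of the six-regression setup referenced above, here is a hedged sketch using statsmodels; the column names are hypothetical, and the dummy coding produced by C(...) only loosely mirrors the coding in Table 3.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("arizona_students.csv")  # hypothetical data file

# One OLS regression per acceptance instrument (six total); categorical
# predictors are dummy coded automatically with the C(...) operator.
instruments = ["mate_rel", "isea_micro_rel", "isea_macro_rel",
               "isea_human_rel", "gaene_rel", "pt100_rel"]
rhs = ("evo_understanding + nos_understanding + gpa + course_level "
       "+ parent_education + religiosity + C(religion) "
       "+ C(political_affiliation) + C(race_ethnicity)")

models = {dv: smf.ols(f"{dv} ~ {rhs}", data=df).fit() for dv in instruments}
print(models["mate_rel"].summary())  # coefficients, CIs, adjusted R-squared
```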

To determine if study results differed across analyses, we compared the confidence intervals of the unstandardized beta coefficients for each independent variable across the six regressions and identified the cases in which the confidence intervals do not overlap. If the confidence intervals for a single independent variable do not overlap between analyses, then we deem the revealed impacts of this independent variable on evolution acceptance to be different beyond sampling variation (Schenker and Gentleman 2001) and to constitute a different “result” (Table 5). We then determined whether researchers would draw different conclusions from these data by comparing whether a variable would be deemed statistically significant across analyses, and thus whether the variable would be considered a significant predictor of evolution acceptance across analyses. If when using one instrument we would conclude that a variable was statistically significantly related to evolution acceptance, but with another instrument we would conclude that this same variable is not, we deemed that a different “conclusion” would be drawn based on the different instruments used (Table 6).
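Continuing the sketch above, the non-overlap criterion can be checked mechanically from each model’s 95% confidence intervals; the helper below (our illustration, taking the models dict from the previous sketch) flags every predictor whose intervals fail to overlap between any pair of regressions.

```python
from itertools import combinations

def nonoverlapping_results(models: dict) -> set:
    """Flag (predictor, model_a, model_b) triples whose 95% CIs for the
    unstandardized coefficient do not overlap between two regressions."""
    flagged = set()
    for (name_a, m_a), (name_b, m_b) in combinations(models.items(), 2):
        ci_a, ci_b = m_a.conf_int(alpha=0.05), m_b.conf_int(alpha=0.05)
        for term in ci_a.index.intersection(ci_b.index):
            lo_a, hi_a = ci_a.loc[term]
            lo_b, hi_b = ci_b.loc[term]
            if hi_a < lo_b or hi_b < lo_a:  # intervals do not overlap
                flagged.add((term, name_a, name_b))
    return flagged

print(nonoverlapping_results(models))  # 'models' from the sketch above
```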

Students had an average GPA of 3.3 (SD = .61), scored an average of 9 out of 13 on the understanding of evolution measure (SD = 2.14), and scored an average of 13 out of 20 on the Nature of Science (NOS) understanding measure (SD = 2.68).

Overall, the collective variables explained a statistically significant amount of variation in evolution acceptance in all six models, one per evolution acceptance instrument (range of adjusted R2 = .29–.42, p < .001). The variables explained 42% of the variation in MATE scores (F (16, 725) = 34.20, p < .001), 35% of the variation in I-SEA microevolution scores (F (16, 725) = 26.36, p < .001), 31% of the variation in I-SEA macroevolution scores (F (16, 725) = 21.82, p < .001), 35% of the variation in I-SEA human evolution scores (F (16, 725) = 26.04, p < .001), 29% of the variation in GAENE scores (F (16, 725) = 18.31, p < .001), and 31% of the variation in 100-pt evolution acceptance scores (F (16, 725) = 21.87, p < .001). All six full regression tables, with r-squared values, F-statistics, standardized and unstandardized coefficients, standard errors, and t-statistics, can be found in the Additional file 1: Tables S5–S10.

Religiosity was the strongest predictor of evolution acceptance across all instruments (β range = − .29 to − .25, p < .001), except for the I-SEA microevolution acceptance instrument where evolution understanding was the strongest predictor (β = .26, p < .001).

We found that the results across analyses were often different. We compared the confidence intervals of the unstandardized beta coefficients for each predictor variable across the six regression models. Non-overlapping confidence intervals from unstandardized beta coefficients indicated that the regression coefficients for several predictor variables were different based on the evolution acceptance instrument used. We identified non-overlapping confidence intervals for the effect of parent education, evolution understanding, Nature of Science (NOS) understanding, religiosity, and identifying as Protestant or LDS, Muslim, Republican, Hispanic, or another race/ethnicity. All calculated confidence intervals and the p-values for coefficients can be found in Table 4. A summary of cases in which confidence intervals overlap and do not overlap can be found in Table 5.

Table 4 Calculated confidence intervals and p-values for unstandardized beta coefficients from six regressions
Table 5 Evaluation of confidence intervals of unstandardized beta coefficients from predictor variables across evolution acceptance instruments used to determine if results across analyses are different

Of particular note was the differential effect of evolution understanding and religious denomination across evolution acceptance instruments. Evolution understanding had a noticeably variable effect on evolution acceptance depending on the evolution acceptance instrument used. Confidence intervals for the effect of evolution understanding on MATE scores and I-SEA microevolution scores were almost identical (CI = .015, .019), but confidence intervals for the effect of evolution understanding on I-SEA macroevolution scores (CI = .010, .014), I-SEA human evolution scores (CI = .009, .014), GAENE scores (CI = .003, .008), and 100-pt scores (CI = .004, .012) were lower, indicating that evolution understanding is a weaker predictor with these instruments. Additionally, the confidence interval for the effect of identifying as Protestant/LDS was lower for I-SEA microevolution scores (CI = − .040, − .009) than for MATE scores (CI = − .076, − .046), I-SEA macroevolution scores (CI = − .085, − .053), I-SEA human evolution scores (CI = − .092, − .056), and GAENE scores (CI = − .093, − .058). Finally, the effect of identifying as Protestant/LDS appeared to be particularly strong in predicting scores on the 100-pt scale (CI = − .186, − .127).

The differences in confidence intervals of regression coefficients indicate different study “results”, with a “result” defined as the coefficient rendered by the analyses. This differs from the colloquial usage of the term “result”, which suggests statistical significance. We consider differences in statistical significance across models more specifically as differences in the conclusions rendered from the different results (coefficients). Would we have come to different conclusions about which variables are statistically significantly related to evolution acceptance, based on different evolution acceptance instruments? To answer this, we examined whether each variable was deemed statistically significant (p < .05) in each regression model.

Whether variables were statistically significant was different for different evolution acceptance instruments; some variables were significant and positive predictors across all instruments, some variables were significant but negative predictors of acceptance across all instruments, and finally some variables were significant predictors of acceptance with some instruments but not others. The directionality of the relationships between predictor variables and acceptance was constant across all models; variables were never significant positive predictors in one model and then significant negative predictors in other models. See Table 6 for a summary of which variables were statistically significant across evolution acceptance instruments.

Table 6 Summary of the statistical significance of predictor variables across linear regressions used to determine if conclusions were different across analyses

Religiosity was a statistically significant negative predictor of evolution acceptance across all instruments. Nature of Science (NOS) understanding was a statistically significant positive predictor of evolution acceptance across all evolution acceptance instruments (Table 6).

GPA was a significant positive predictor of evolution acceptance for almost all instruments, but it was not significant for GAENE scores. Evolution understanding was a significant positive predictor of evolution acceptance across all instruments except the 100-pt instrument. Identifying as Protestant or LDS was a significant negative predictor of evolution acceptance for all instruments except the I-SEA microevolution instrument (Table 6).

Identifying as a race/ethnicity in the other race/ethnicity category was a negative significant predictor of scores from the I-SEA human evolution acceptance instrument, but no other instrument. Identifying as Catholic, other religion, Black/African American, or Hispanic was not a significant predictor of evolution acceptance scores for any evolution acceptance instrument (Table 6).

Because the effect of evolution understanding on evolution acceptance has been a topic of contention in the evolution education literature, and because the beta coefficients for this variable differed across instruments, we explored the extent to which evolution understanding scores could independently explain evolution acceptance scores across evolution acceptance instruments. We conducted simple linear regressions using evolution understanding as the sole predictor variable for each evolution acceptance instrument. Evolution understanding alone was a statistically significant predictor of evolution acceptance across all instruments (F range = 35.39–188.62, p < .001), but the amount of variance in evolution acceptance that could be explained by evolution understanding ranged from 4 to 20% across evolution acceptance instruments. See Table 7 for a summary of these regressions examining the independent effect of evolution understanding on evolution acceptance (a replication sketch follows Table 7).

Table 7 Summary of simple linear regressions exploring the independent effect of evolution understanding on evolution acceptance across evolution acceptance instruments
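These simple regressions are easy to replicate; a minimal sketch (hypothetical column names, continuing the conventions of the earlier sketches) fits evolution understanding as the sole predictor for each instrument and reports the variance explained.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("arizona_students.csv")  # hypothetical data file
for dv in ["mate_rel", "isea_micro_rel", "isea_macro_rel",
           "isea_human_rel", "gaene_rel", "pt100_rel"]:
    fit = smf.ols(f"{dv} ~ evo_understanding", data=df).fit()
    print(f"{dv}: R2 = {fit.rsquared:.2f}, "
          f"F = {fit.fvalue:.2f}, p = {fit.f_pvalue:.3g}")
```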

Research question #2 discussion

Research results and conclusions exploring relationships between variables and evolution acceptance differed depending on the instrument used to measure evolution acceptance. First, the coefficients for many predictor variables differed depending on the evolution acceptance instrument, most notably for evolution understanding and identifying as Protestant/LDS. Further, whether college GPA, course level, parent education, evolution understanding, or identifying as Protestant/LDS, Muslim, Democrat, Republican, Asian, or another race/ethnicity predicted evolution acceptance depended on the evolution acceptance instrument used. Similar to what has been reported in the literature, we saw that the strength and statistical significance of the relationship between evolution understanding and evolution acceptance varied across instruments. Why does evolution understanding predict evolution acceptance differently according to these different instruments?

First, many researchers use the MATE intending to measure only evolution acceptance, but the MATE has items that require understanding of evolution. So, a student categorized as fully “accepting evolution,” according to the MATE, needs to have an accurate understanding of evolution, which could have led to stronger relationships between understanding and accepting evolution. Our results revealed that evolution understanding had among the strongest coefficients when we used the MATE to measure evolution acceptance.

However, we found that evolution understanding had similar beta coefficients for I-SEA microevolution scores. The I-SEA was designed to avoid questions that measure evolution understanding, so the relationship between evolution understanding scores and scores seen on the I-SEA microevolution instrument is likely not due to the same conflation issues that may be occurring with the MATE (Nadelson and Southerland 2012; Smith et al. 2016). However, many people do not see conflict between their religious beliefs and microevolution (Scott 2005). Therefore, evolution understanding may be more strongly related to I-SEA microevolution scores because a higher understanding is able to impact acceptance more strongly in the absence of a belief/identity barrier. Indeed, we found that evolution understanding was more predictive of I-SEA microevolution scores than I-SEA macroevolution and human evolution scores. This reasoning aligns with prior research that shows that understanding is more related to acceptance of a topic when an identity barrier is not present (Kahan and Stanovich 2016; Weisberg et al. 2018).

For the I-SEA macroevolution, I-SEA human evolution, and GAENE, our results indicate that evolution understanding is a significant, but weaker, predictor of evolution acceptance. Further, when controlling for other variables, evolution understanding was not at all predictive of 100-pt acceptance scores. But how do these results compare with the findings from prior evolution education literature?

The results of this study have the potential to explain the contradictory findings we have seen in the evolution education literature over the last 30 years. In fact, the results we found here are strikingly similar to the findings reported in the literature with regard to evolution understanding. Sinatra et al. (2003), Bishop and Anderson (1990), and Hermann (2012), who used instruments similar to the 100-point instrument, found no relationship between evolution acceptance and evolution understanding, similar to our results in this study. However, researchers who have used the MATE to measure evolution acceptance have often found large significant relationships between acceptance and evolution understanding (Rutledge and Warden 2000; Trani 2004), which is also similar to what we found in this study. Our study, which examines these instruments in the same student population, suggests that the different patterns noted in these prior studies may not just be an artifact of different study populations.

Future research

There were several limitations to this study that warrant future research. This study was restricted to exploring differences in the measurement of evolution acceptance, but researchers studying evolution education also use very different instruments to measure evolution understanding and Nature of Science (NOS) understanding. We measured these predictor variables with instruments that have been used in several other studies, but results could differ with different instruments. For instance, research findings might differ if we had used the Conceptual Inventory of Natural Selection (CINS; Anderson et al. 2002) or the Measure of Understanding of Macroevolution (MUM; Nadelson and Southerland 2009) to measure evolution understanding instead of the EALS, because understanding of natural selection may be related to evolution acceptance differently than understanding of macroevolution (Nadelson and Southerland 2010). Additionally, research findings could have been different if we had used the Views of Nature of Science (VNOS) instrument (Abd-El-Khalick et al. 2001) or the Evolution and Nature of Science (ENOS) instrument to measure students’ NOS understanding. Future research should explore how differential measurement of constructs other than evolution acceptance may lead to different research findings.

We reported the results from multi-item evolution acceptance instruments in this manuscript. Single-item instruments, such as those used by the Gallup and Pew polls (Gallup 2017; Pew 2013), have been used in evolution education studies published in high-impact journals (Miller et al. 2006; Weisberg et al. 2018) and are widely cited as statistics for the US public’s evolution acceptance. An analysis of how different single-item instruments may lead to different research findings would be of value to the field in future studies.

We treated Likert scale scores as interval data. Although this is a common practice in the analyses of Likert scale data, we recognize that from a measurement theory perspective this assumption may not stand (Hambleton et al. 1991).

Finally, we explored how multi-item evolution acceptance instruments operate in three different undergraduate student populations, and the differences we found are likely dependent on the specific populations from which we sampled. Future studies should confirm that differences between instruments, particularly those measuring acceptance of microevolution, macroevolution, and human evolution, are present in other populations of students.

Discussion and conclusion

Evolution education researchers have many different options to measure evolution acceptance, yet this study highlights that the choice of instrument can influence the results and conclusions of a study. We found that whether and how much a variable predicted evolution acceptance was dependent on the instrument used to measure evolution acceptance. Further, we also found that in some cases, and particularly when examining the subscales of the I-SEA, whether a researcher would find differences in levels of evolution acceptance across populations of students was dependent on the evolution acceptance instrument used. These findings highlight that the diversity of instruments used to measure evolution acceptance could be an underlying reason for conflicting results in the evolution education literature, because researchers are comparing studies that used different evolution acceptance instruments.

Our study was not designed to identify the “best” instrument to measure evolution acceptance, so we cannot recommend which instrument researchers should use. However, this study does highlight the need for the evolution education community to be more specific about what is meant by “acceptance of evolution” and to take steps to improve the alignment of their research questions with their evolution acceptance instruments. Some of these instruments use specific items to elicit student acceptance of evolution, whereas others ask broad questions that allow students to use their own definitions of what it means to accept evolution. The underlying issue may be that there are many different definitions of acceptance of evolution (Smith 2009a, b; Smith et al. 1995, 2016). If different evolution acceptance instruments were built using different definitions of evolution acceptance, then these instruments were built to measure different constructs, even if researchers all use the term “evolution acceptance.” Does evolution acceptance only include microevolution, or does it also include macroevolution and human evolution (Nadelson and Southerland 2012)? To what extent does someone need to understand evolution and the Nature of Science (NOS) to accept evolution (Rutledge and Warden 1999)? To what extent does someone need to advocate for evolution to accept evolution (Smith et al. 2016)? Is it enough to simply let respondents define evolution for themselves in surveys, or do researchers need to define evolution for the respondent (Bishop and Anderson 1990; Sinatra et al. 2003)? Our study provides empirical evidence that it is important for researchers to be thoughtful about what the evolution education community actually wants to measure when measuring evolution acceptance. Different instruments will measure different aspects of evolution acceptance and may thereby influence the conclusions that researchers draw.

We encourage the evolution education community to consider these issues and reach consensus on a definition of evolution acceptance and a common instrument to measure evolution acceptance, based on this consensus definition. Much could be learned by comparing evolution acceptance across different institutions and contexts, but our study indicates that researchers may not be able to use different instruments to do so. We argue for increased dialogue across the evolution education community about the relative strengths and weaknesses of different evolution acceptance instruments. While we are hesitant to give any further recommendations, we encourage other researchers to be critical of existing instruments and not use an instrument’s popularity alone in making a decision about which instrument to use.

Notes

  1. The majority of the excluded responses had data that was missing at random.

Abbreviations

NOS: Nature of Science


Authors’ contributions

MEB helped conceptualize the study, collected and interpreted data, performed statistical analyses, and drafted the manuscript. HMD helped conceptualize the study, collected data, performed statistical analyses, and aided in revising drafts of the manuscript. EAH helped conceptualize the study, collected and interpreted data, performed statistical analyses, and aided in revising drafts of the manuscript. Yi Zheng helped to plan the analyses, reviewed the analyses and statistics for critical content, and aided in revising drafts of the manuscript. SB helped conceptualize the study, interpret the data, and drafted the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We would like to acknowledge Katelyn Cooper, Ryan Dunk, Logan Gin, and Dan Grunspan for their friendly reviews of the manuscript. We would also like to thank Gale Sinatra and Heath Ogden for their help gathering preliminary data during the early stages of the project.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

All data analyzed in this manuscript are available upon request from Sara Brownell (sara.brownell@asu.edu).

Ethics approval and consent to participate

The Institutional Review Boards of Arizona State University and the University of Northern Colorado approved the procedures for this study (ASU IRB #00007719; UNC IRB #1131916-2). Written informed consent was obtained from all participating students at the beginning of the study.

Funding

This work was supported by the National Science Foundation (Grant Numbers DGE-1311230 and 1712188), which provided graduate student and postdoctoral support for this project.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information


Corresponding author

Correspondence to Sara E. Brownell.

Additional file

Additional file 1.

Additional tables.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Barnes, M.E., Dunlop, H.M., Holt, E.A. et al. Different evolution acceptance instruments lead to different research findings. Evo Edu Outreach 12, 4 (2019). https://doi.org/10.1186/s12052-019-0096-z
