Teaching evolution in U.S. public middle schools: results of the first national survey

Despite substantial research on the teaching of evolution in the public high schools of the United States, we know very little about evolution teaching in the middle grades. In this paper, we rely on a 2019 nationally representative sample of 678 middle school science teachers to investigate how much time they report devoting to evolution and the key messages they report conveying about it, using this information to assess the state of middle school evolution education today. Throughout these analyses, we provide comparative data from high school biology teachers to serve as a baseline. We find that, compared to high school biology teachers, middle school science teachers report themselves as less well-equipped to teach evolution, devoting less class time to evolution, and more likely to avoid taking a stand on the scientific standing of evolution and creationism. We show that middle school science teachers with extensive pre-service coursework in evolution and in states that have adopted the Next Generation Science Standards are more likely to report devoting more class time to evolution. Similarly, we show that middle school teachers in states that have adopted the Next Generation Science Standards and who are newer to the profession are more likely to report themselves as presenting evolution as settled science. Our findings suggest avenues for the improvement of middle school evolution education through teacher preparation and public policy; in addition, a degree of improvement through retirement and replacement is likely to occur naturally in the coming years. More generally, our results highlight the need for further research on middle school education. Our broad statistical portrait provides an overview that merits elaboration with more detailed research on specific topics.

The middle school years are important for introducing students to many of the evolutionary concepts that they will need to master the topic in high school. (In the United States, middle schools generally serve grades 6 through 8, with students usually 11 to 14 years old, while junior high schools generally serve grades 7 through 9, with students usually 12 to 15 years old; for brevity, we will use "middle school" to abbreviate "middle or junior high school.") Indeed, the foundations for understanding evolution are woven into the Next Generation Science Standards beginning in kindergarten. Therefore, attention to how evolution is taught in the middle grades is long overdue. As we show below, middle school science teachers report devoting considerable time to evolution, so it is imperative that the quality of that instruction and the challenges faced by middle school science teachers be understood and appreciated. In this paper, we aim to take a first and important step in spurring research on evolution education in the middle grades that is comparable to that at the high school level.
To that end, the paper proceeds as follows. First, we document and summarize the meager body of relevant research that we could identify. Second, we introduce our methodology and data set, a 2019 nationally representative sample of 678 middle school science teachers. Third, we present a series of results on how much time these teachers report devoting to evolution and the key messages they report conveying to students, using this information to assess the state of middle school evolution education today. Throughout these analyses, we provide comparative data from high school biology teachers. Fourth, we offer some discussion, providing an overview of the results, a discussion of the limitations of the study and possible directions for future research, and recommendations for teacher preparation and public policy.

What do we know about evolution teaching in the middle grades?
After an extensive search, we identified only a handful of scholarly reports on evolution teaching in middle school. Nadelson and Nadelson's (2010) study of teacher attitudes toward teaching evolution included a subsample of 13 middle school teachers, but the article never breaks out this group for comparison, probably because statistical comparisons with such a small sample would be unreliable. Fowler and Meisels (2010) report results from 85 middle school teachers, part of a convenience sample of Florida NSTA members, finding that 67% of them agreed that evolution is a central principle in biology, while 40% felt that one "does not need to understand evolution in order to understand biology." Based on their review of the literature, Glaze and Goldston (2015) conclude that "elementary and middle school teachers demonstrated greater misgivings" about teaching evolution, as well as less acceptance of evolution, "than their secondary counterparts." This is a very plausible conclusion, but we were unable to find firm empirical evidence of this in the work they cite. Finally, as part of her recent dissertation, Klahn (2020) conducted structured interviews with ten middle school science teachers about evolution, finding that they favored emphasizing microevolution over macroevolution as a teaching approach, that few of them discussed human evolution, and that they were concerned about pressures from inside and outside the school.
Taken together, the existing literature suggests that many of the same themes that arise in research on high school biology teachers are present among middle school science teachers, but it is impossible to say much more than that. Indeed, even if all these studies provided precise statistical comparisons, the combined sample size would be under 110, so it is not possible to assess whether the observed teachers are representative of all middle school teachers or even of teachers in their particular locale. Thus, there is an acute need for research that uses standard questions and methods and representative samples to understand whether middle school evolution teaching differs from evolution teaching in high schools, and, if so, by how much. In the next section, we describe a study that does precisely this.

Methods
Fielded between February and May of 2019, the 2019 Survey of American Science Teachers included both a high school and a middle school sample. Results from the high school responses can be found in Plutzer, Branch, and Reid (2020), and the methods for the present study are described in detail in that report. We repeat the description of the methods here for the convenience of readers. The sample was drawn, based on investigator specifications, from a national teacher file maintained by MDR (Market Data Retrieval, a Dun and Bradstreet direct mail firm that maintains the largest mailing list of educators in the US). To ensure national coverage, national lists of 30,847 high school biology teachers and 55,001 middle school science teachers were first stratified by state and urban/suburban/other location. With the District of Columbia serving as a single stratum, this produced 151 segments. Within each segment, we selected a random sample with a high school sampling probability of 0.081 and a middle school probability of 0.046, yielding an initial set of 2503 high school biology teachers' and 2511 middle school teachers' names and addresses.
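The sampling arithmetic in this paragraph can be checked with a short sketch. The frame sizes, sampling probabilities, and initial sample counts are taken from the text; the function name stratified_sample, the rule of rounding each stratum's count to the nearest integer, and the toy frame are illustrative assumptions, not the procedure actually run on the MDR file.

```python
import random
from collections import defaultdict

# Figures reported in the text.
HS_FRAME, MS_FRAME = 30847, 55001
HS_DRAWN, MS_DRAWN = 2503, 2511

# 50 states x 3 location types, plus the District of Columbia as one stratum.
N_STRATA = 50 * 3 + 1

# Implied overall sampling probabilities (drawn / frame), rounded as reported.
hs_prob = round(HS_DRAWN / HS_FRAME, 3)
ms_prob = round(MS_DRAWN / MS_FRAME, 3)


def stratified_sample(frame, prob, seed=0):
    """Draw a simple random sample of about prob * size from each stratum.

    frame is a list of (teacher_id, stratum) pairs. Rounding the per-stratum
    count to the nearest integer is an assumption made for illustration.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for teacher_id, stratum in frame:
        by_stratum[stratum].append(teacher_id)
    drawn = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        drawn.extend(rng.sample(members, round(prob * len(members))))
    return drawn


# Hypothetical toy frame: 1000 teachers in one stratum, 500 in another.
toy_frame = [(i, "A-urban") for i in range(1000)]
toy_frame += [(i, "B-other") for i in range(1000, 1500)]
toy_sample = stratified_sample(toy_frame, 0.25)
```

The implied probabilities round to the reported 0.081 and 0.046, which suggests that the published probabilities are rounded values rather than exact design parameters.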
Replicating precisely the survey protocol used in Berkman and Plutzer's 2007 survey of high school biology teachers (Berkman et al. 2008), and consistent with best practices for mail surveys (Dillman et al. 2014), we then sent each teacher an advance prenotification letter explaining the survey and telling them that a large survey packet would arrive in a few days. The packet included a cover letter, a token pre-incentive (a $2 bill), a 12-page survey booklet, and a postage-paid return envelope. One week later a reminder postcard was sent, and a complete replacement packet (though without an incentive) two weeks after that. In the week after the replacement packet was mailed, we emailed reminders to the roughly 85% of non-responding teachers for whom we had valid emails. Two email reminders and one final postcard, saying that the study was about to close, followed.
The overall response rate was 40% for the high school sample and 34% for the middle school sample (using AAPOR response rate formula #4). To place this in context, sample surveys of teachers vary considerably in their overall response rate, ranging from the low single digits (Puhl et al. 2016; Troia and Graham 2016; Davis et al. 2017; Dragowski et al. 2016) and the mid-teens (Lang et al. 2017; Hart et al. 2017) to Department of Education survey programs that approach 70% (National Center for Education Statistics n.d., 2019; Centers for Disease Control and Prevention 2015). In that light, our response rate is at the high end of results achieved outside of government-sponsored studies. However, survey scientists have sought to discourage a heavy reliance on response rates as indicators of data quality. Indeed, scores of studies show that there is no simple relationship between response rates and Total Survey Error or response bias (e.g., Keeter et al. 2000; Groves and Peytcheva 2008; Keeter 2018), leading to a greater focus on direct measures of a sample's representativeness. To this end, we conducted a detailed non-response audit, and found that the responding teachers were broadly representative of the target population. Details are provided in the Appendix (see Tables 10, 13, and 14, and the accompanying text).
We augmented the design weights with a non-response adjustment, and we report weighted estimates throughout this report, although the unweighted results are almost always similar. Full details on the methods of contact, the non-response audit, and methods of weight calculation are provided in the Appendix.
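As a rough illustration of what augmenting design weights with a non-response adjustment can involve, the sketch below applies a generic weighting-class adjustment: each respondent's design weight (the inverse of the sampling probability) is multiplied by the ratio of sampled to responding teachers in that respondent's cell. This is an assumption for illustration only; the Appendix describes the procedure actually used, and the cell names and counts here are hypothetical.

```python
def nonresponse_adjusted_weights(sampling_prob, sampled, responded):
    """Return one weight per cell: design weight times sampled / responded.

    sampled and responded map a (hypothetical) weighting cell to counts.
    """
    design_weight = 1.0 / sampling_prob
    return {
        cell: design_weight * sampled[cell] / responded[cell]
        for cell in sampled
    }


# Middle school sampling probability from the text; cell counts are made up.
sampled = {"urban": 100, "suburban": 100}
responded = {"urban": 40, "suburban": 25}
weights = nonresponse_adjusted_weights(0.046, sampled, responded)

# Respondents in the cell with the lower response rate carry more weight,
# so the weighted respondents still represent everyone who was sampled.
weighted_total = sum(weights[c] * responded[c] for c in responded)
```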
The full pencil-and-paper questionnaire was a twelve-page booklet. In addition to the items discussed in this report, the questionnaire also included sections on the teaching of climate change, textbook selection, and additional questions about how teachers manage controversy in their classrooms. Those topics are beyond the scope of this paper.

Pre-service coursework
Middle school science teachers are expected to be generalists, with most middle school science classes covering a mixture of earth and space sciences, life sciences, chemistry, and physics. Some may also include technology or computer applications. Thus, the likelihood that any middle school teacher has extensive coursework in any one area is low. As a consequence, we expect middle school teachers to have less pre-service coursework on evolution and less in-service continuing education on evolution, as compared to high school biology teachers. (The National Survey of Science & Mathematics Education found this to be true with regard to preservice coursework in 2012 and 2018: see Smith 2020, p. 10, Table 2.8.) And that expectation is borne out by our survey, as shown in Table 1. Because the distribution of coursework for middle school teachers is quite skewed, we combine answers to two different questions and reduce to four levels of coursework (detailed breakdowns are provided in Supplementary Materials, Additional file 1: Table S1). Table 1 shows that 42% of middle school teachers reported that they had not taken even a single college-level course with any evolution content. An additional 23% reported taking just one course; for four in five of this group, this was not a course primarily focused on evolution but another science course that devoted one or more class sessions to evolution.

Understanding of the scientific consensus
Given their less extensive coursework on evolution, we expected fewer middle school teachers to be aware of the scientific consensus on evolution. That is indeed the case. We asked all teachers, "To the best of your knowledge, what proportion of scientists think that humans and other living things have evolved over time?" The Pew Research Center's (2015) survey of AAAS members estimates the correct answer as 98%. As shown in Table 2, only 55% of middle school teachers correctly answered that 81-100% of scientists think that humans evolved, in contrast to 71% in the high school sample.
In addition, we expected that the middle school science teachers would be less likely to report accepting evolution themselves. The teachers were asked about their view on human origins, using a question frequently used in general population public opinion polls. Table 3 presents the results, along with the results from a 2019 Gallup poll of the general public for reference (Brenan 2019). These results show that while middle school teachers were more likely than high school teachers to choose the creationist response, they were significantly (p < 0.001) less likely to do so than members of the general public. 2

Teaching evolution: time devoted to the topic and messages conveyed to students
Evolution is expected to be taught in middle school science classes in most states: Vazquez (2017) notes that the state science standards in all but two states mention natural selection and 37 mention evolution. Nevertheless, because middle school science classes are more general and tend to be more multidisciplinary than high school biology, and because middle school students are less capable of understanding complicated concepts like those involved in evolution than high school students, we expected to find that middle school science teachers devote less time to both general evolution and human evolution than their high school counterparts.
We asked each teacher to select his or her primary class (that with the largest enrollment) and tell us about the allocation of time devoted to various topics in that class. The question began, "Thinking about how you lay out your class for the year, please indicate how many class hours (40-50 min) you typically spend on each of the following broad topic areas." After cell biology, ecology, and human health and disease, the teachers were asked about human evolution, general evolutionary processes, and (later in the sequence) "intelligent design or creationism." 3 We first report on the two evolution topics. As Table 4 shows, only 45% of middle school science teachers reported devoting any class time to human evolution and only 60% to general evolutionary processes; this is in comparison to 78% and 91% of high school biology teachers, respectively. Combining both answers, the mean number of class hours reportedly devoted to evolution (totaling human and general) is nearly double in high school compared to middle school (17.2 class hours versus 9.1).
If we restrict comparison to those teachers who reported devoting at least one class hour to evolution, however, the difference narrows (18.6 class hours versus 14.6). That is, when middle school science teachers cover evolution at all, they spend only 22% less time on the subject than high school biology teachers do. To put it another way, assuming a 5-day week, high school biology teachers spend about 3.7 weeks on evolution on average, while middle school science teachers spend about 2.9 weeks.
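The percentage and week figures in this paragraph follow directly from the two reported means, as the short calculation below shows. The means come from the text; treating a week as five class hours (one class hour per school day) is the assumption implied by the phrase "assuming a 5-day week."

```python
HS_HOURS = 18.6  # mean class hours, high school teachers who cover evolution
MS_HOURS = 14.6  # mean class hours, middle school teachers who cover evolution
CLASS_HOURS_PER_WEEK = 5  # assumed: one class hour per day, 5-day week

# Share of the high school figure by which middle school falls short.
relative_gap = (HS_HOURS - MS_HOURS) / HS_HOURS

# Class hours converted to weeks of instruction.
hs_weeks = HS_HOURS / CLASS_HOURS_PER_WEEK
ms_weeks = MS_HOURS / CLASS_HOURS_PER_WEEK

print(f"{relative_gap:.0%} less time; {hs_weeks:.1f} vs {ms_weeks:.1f} weeks")
```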
But focusing on the class hours reportedly devoted to the topic can obscure important differences in how the science of evolution is characterized. To assess the messages that teachers convey to students, we presented a series of prompts about the themes that teachers emphasize in the overall course organization and the messages they convey to students.
Perhaps most important is the prompt, "When I teach about the origins of biological diversity (including answering student questions) … I emphasize the broad scientific consensus that evolution is fact." 4 Teachers could agree, disagree, or choose "not applicable." The results, reported in the upper panel of Table 5, show that middle school science teachers are half as likely as high school biology teachers to strongly agree. But this result is deceptive because of the large number of middle school teachers who chose "not applicable." 5 If we restrict our comparison to those who reported devoting at least one class hour to evolution and exclude those who chose "not applicable," the picture is rather different. As shown in the lower panel of Table 5, the gap in emphasis is much narrower than it appeared initially. Indeed, combining "agree" with "strongly agree" shows that (among those teaching about evolution) 82% of middle school science teachers reported emphasizing the scientific consensus that evolution is a fact, which is comparable to the 86% of high school biology teachers who reported conveying this message.

Creationism in the middle school science classroom
Table 3 Personal views on human evolution (percentages within column). Question: "Which comes closest to your views on the origin and development of human beings?" Columns: Middle school, High school, General public.
5 Eighty percent of middle school teachers selecting "not applicable" also reported spending zero hours on human evolution and general evolution. However, their avoidance of evolution is not due to evolution being completely irrelevant to their classes. Of the middle school teachers who selected "not applicable," 64% reported devoting class time to cell biology, ecology, or biodiversity, topics to which evolution is clearly relevant.

As any student of science education in the United States knows, the teaching of evolution is only half the story. For more than a century, a number of secondary science educators (such as Roger DeHart, John Freshwater, and Rodney LeVake, to name a few recent high-profile instances) have actively promoted or given credence to non-scientific alternatives to evolution in their public school classrooms. We therefore asked how many middle school science teachers currently discuss creationism in their classes. At first glance (upper panel of Table 6), middle school science teachers look a lot like high school biology teachers in terms of time devoted to creationism, including intelligent design. Overall, 7% of middle school science teachers reported spending 1-2 class hours and 8% reported spending three class hours or more on creationism, while the corresponding percentages for high school biology teachers are 9% and 4%, respectively. Yet this ignores how few middle school science teachers reported covering evolution. So we also compared high school and middle school teachers who reported devoting at least one class period to evolution. Those results, in the lower panel of Table 6, show that among this subset of teachers, slightly more middle school science teachers reported introducing creationist ideas into the classroom than their high school counterparts (19% versus 14%). But these data alone don't reveal whether these teachers are discussing creationism in order to advocate for it or to criticize it.
To better understand the content of this instruction, we posed two prompts to teachers: "I emphasize that intelligent design is a valid, scientific alternative to Darwinian explanations for the origin of species" and "I emphasize that many reputable scientists view creationism or intelligent design as valid alternatives to Darwinian theory." (These ask about the teacher making the assertion without and with appeals to scientific authority.) We report the responses to these prompts two ways: first providing the overall distribution (upper panels of Table 7) and next restricting analysis to those who did not choose "not applicable" (lower panels of Table 7). Looking at this more restricted sample, we see substantial differences. About 80% of high school teachers disagree with each statement, which suggests that when creationism is raised, whether through student questions or by the teachers themselves, they use the occasion to counter the idea that creationism is scientifically credible. However, while about 18% of high school teachers agree with the first statement, the second, or both, about 36% of middle school teachers agree with the first and about 31% agree with the second. Thus, if they cover these topics, middle school teachers are far more likely to discuss creationist ideas in ways that give them the legitimacy of science.
A clearer picture emerges when we combine the three questions about teaching emphasis into a teaching typology, along the lines of that developed by Plutzer et al. (2020). In this typology, there are four groups of teachers: those who send the message that evolution is settled science by emphasizing the broad scientific consensus on evolution while not emphasizing the scientific credibility of creationism; those who send mixed messages by emphasizing the broad scientific consensus on evolution while also emphasizing the scientific credibility of creationism; those who avoid the issue by emphasizing neither; and those who send a pro-creationist message by emphasizing the scientific credibility of creationism but not the broad scientific consensus on evolution. The top panel of Table 8 shows that many more high school biology teachers report teaching evolution as settled science than do middle school science teachers. In contrast, nearly twice as many middle school science teachers (10.5% compared to 5.8%) report sending exclusively pro-creationist messages and more than 50% more (21.4% compared to 13.8%) report sending mixed messages.
For comparison, the bottom panel of Table 8 restricts analysis to those teachers who report devoting at least one class hour to either evolution or creationism, thereby reducing the number of avoiders, but nevertheless raising concerns. Even among middle school science teachers who devote formal class time to evolution or creationism, more than one in five (21%) do not comment on the scientific standing of those views, nearly one in five (18%) convey mixed messages by endorsing both evolution and creationism, and nearly one in ten (9%) endorse creationism alone.
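The four-way typology described above amounts to crossing two yes/no indicators, which the sketch below makes explicit. The function and label names are hypothetical. The sketch also assumes that a teacher counts as emphasizing the scientific credibility of creationism if he or she agrees with either of the two creationism prompts; the text does not spell out that coding rule, so it should be read as an assumption.

```python
def teaching_type(emphasizes_consensus, agrees_id_is_valid, agrees_scientists_endorse):
    """Classify a teacher from three agree/disagree answers (as booleans)."""
    # Assumed rule: agreeing with either creationism prompt counts as
    # emphasizing the scientific credibility of creationism.
    emphasizes_creationism = agrees_id_is_valid or agrees_scientists_endorse

    if emphasizes_consensus and not emphasizes_creationism:
        return "settled science"
    if emphasizes_consensus and emphasizes_creationism:
        return "mixed messages"
    if emphasizes_creationism:
        return "pro-creationist"
    return "avoidance"


# Example: a teacher who emphasizes the consensus but also agrees that many
# reputable scientists view creationism as a valid alternative.
example = teaching_type(True, False, True)
```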

Which factors promote more and better teaching of evolution in the middle grades?
The comparison with high school biology teachers provides a general characterization but also reveals considerable diversity among middle school science teachers. Judging from their reports, some teach far more evolution than others; and some convey messages consistent with the scientific consensus while others do not. In this section, we first seek to explain the variation in hours devoted to evolution and then to explain the variation in the messages teachers convey.
We focus on a suite of variables previously identified in the literature as potentially important predictors in the teaching of evolution. These include a key policy variable: whether the teacher works in a state that has adopted the Next Generation Science Standards (NGSS: NGSS Lead States 2013), which treat evolution as a disciplinary core idea of the life sciences. Plutzer et al. (2020) concluded that the treatment of evolution in the NGSS helped to produce a significant change in the emphasis on evolution in public high school biology classrooms between 2007 and 2019. So we looked to see whether public middle school teachers in NGSS states report allocating their time differently and sending different messages than their colleagues in states that have adopted non-NGSS standards, either based on the same Framework (National Research Council 2012) on which the NGSS are based or not. We also examine two different measures of teachers' formal preparation: whether they hold a degree (undergraduate or graduate) in a scientific discipline, and the weighted sum of the number of semester-length (or quarter-length) classes they completed that focused primarily on evolution and the number of courses they completed that devoted at least one full class session to evolution (focused classes are weighted double). (The two coursework measures, if treated separately, would be highly correlated, making it difficult to disentangle their effects; the weighted sum treats them together.) Finally, we also look at teacher seniority, which can affect teaching in a number of ways. Most critically, the most senior teachers were teaching many years before the NGSS were released and may have developed teaching approaches that they are reluctant to change to correspond to the demands of newer standards.
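The weighted coursework measure described above can be written out as a one-line formula. The function and argument names are hypothetical; the only rule taken from the text is that courses focused primarily on evolution count double.

```python
def coursework_score(focused_courses, courses_with_evolution_session):
    """Weighted sum of evolution coursework; focused courses count double."""
    return 2 * focused_courses + courses_with_evolution_session


# A teacher with one evolution-focused course and two other science courses
# that each devoted at least one full class session to evolution.
score = coursework_score(focused_courses=1, courses_with_evolution_session=2)
```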
To examine the impact of all these factors on teaching, we estimate two multivariate models. The first regresses the total number of class hours devoted to evolution on these independent variables. The resulting regression slope estimates and 90% confidence intervals are reported in Fig. 1 (in which the baseline effects of omitted comparison groups are included as a convenience for interpretation).
The first group of estimated effects concerns state adoption of the Next Generation Science Standards. The results show that middle school teachers in NGSS states report devoting 2.5 more class hours to evolution than teachers in states where the standards are more loosely based on the Framework or not based on the Framework at all. This is a substantial, and statistically significant, effect.
Teachers' college-level coursework in evolution also has a statistically significant effect. The coefficients show a strong increasing trend, with even one or two prior courses having a substantial impact (increasing reported coverage by 1.9 and 4.2 class hours, respectively). For middle school science teachers, even small exposure to college-level evolutionary science seems to matter greatly. After accounting for their more extensive coursework, middle school science teachers holding a degree in science report providing more coverage as well (1.7 additional class hours), but this estimate is not statistically significant. Finally, the plot shows that teachers with more than twenty years of experience devote fewer class hours to evolution, but the estimate is far short of statistical significance.
Overall, then, it appears that state adoption of the NGSS has an important impact on the number of class hours devoted to evolution that a typical middle school student will experience. Middle school students are likely to have additional focused instruction on evolution if their teachers majored in science and if their teachers completed college coursework with even minimal evolution content.
Fig. 1 Effects of policy and formal preparation on the number of class hours devoted to evolution by middle school science teachers. Ordinary least squares regression estimates and 90% confidence intervals (accounting for sample weights and design effects, N = 596). Contrast (baseline) categories included for reference have no confidence intervals
Fig. 2 Effects of policy and formal preparation on odds of reporting teaching evolution as settled science. Binary logistic regression estimates and 90% confidence intervals (accounting for sample weights and design effects, N = 402). Contrast (baseline) categories included for reference have no confidence intervals
We next turn to model the emphasis given to evolution and creationism (as assessed by our typology). Because the typology is a non-ordered nominal variable, the appropriate model is a multinomial logistic regression model. The results of that model are reported in Supplementary Materials, Additional file 1: Fig. S1. Because the results for three of the typology outcomes were similar, we report a simpler model in Fig. 2, in which the dependent variable is coded 1 if the teacher is classified as teaching evolution as settled science and 0 otherwise. The sample here, as in the lower panel of Table 8, includes only teachers who reported devoting class time to either evolution or creationism. This coefficient plot shows the relative risk ratio (also called the odds ratio) of teaching evolution as settled science relative to all other alternatives. Markers showing ratios less than one (to the left of the red reference line) mean that the variable reduces the odds of teaching evolution as settled science; ratios over one represent positive effects. The graph also includes 90% confidence intervals around the estimates.
A notable effect revealed by this analysis concerns teacher seniority (which was not a statistically significant factor in the previous model of class hours devoted to evolution). Teachers with more than twenty years of experience are less likely to teach evolution as settled science (their odds of doing so are 49% lower than those with 10-19 years of experience). As shown in the more detailed model reported in Additional file 1: Fig. S1, this effect is primarily driven by more experienced teachers adopting avoidance strategies to navigate instruction in evolution. These teachers are slightly more likely to convey mixed messages but especially likely to convey no messages at all to students regarding the scientific standing of evolution.
Other than seniority, the patterns of effects here are similar to those predicting time devoted to evolution, though the smaller sample size means there is more uncertainty around each estimate. Most notably, the odds that middle school science teachers in NGSS states will report teaching evolution as settled science are more than 1.5 times greater than those in non-NGSS states. Formal course preparation has positive effects as well, but with sizable impacts requiring three or more courses covering evolution. In contrast to the previous model of class hours devoted to evolution, having a degree in science does not have a statistically significant effect (beyond the accompanying increase associated with the coursework required to earn the degree).
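Because odds ratios are easy to misread as changes in probability, the sketch below converts the two effects reported above into probabilities for a hypothetical baseline. The odds ratios of 0.51 (odds 49% lower) and 1.5 come from the text, with 1.5 serving as a lower bound since the text reports "more than 1.5 times." The 50% baseline probability and the function name are assumptions chosen only for illustration.

```python
def shift_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new one."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


BASELINE = 0.50  # hypothetical probability of teaching evolution as settled science

# Teachers with more than twenty years of experience: odds 49% lower.
senior = shift_probability(BASELINE, 0.51)

# Teachers in NGSS states: odds at least 1.5 times greater.
ngss = shift_probability(BASELINE, 1.5)

print(f"senior: {senior:.2f}, NGSS: {ngss:.2f}")
```

With a different baseline the same odds ratios imply different probability changes, which is why the coefficient plot is best read on the odds scale.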

Overview
Middle school can and should play a major role in promoting scientific literacy in general, and in laying the groundwork for understanding evolution, the foundational framework for modern biology, in particular. Indeed, the Next Generation Science Standards promote the introduction of basic evolutionary science in the middle grades to serve as the first stage of secondary science instruction. And yet, no previous research had ever sought to measure the extent of evolution teaching and the emphasis given to evolution and creationism in US middle schools. This paper takes a first step toward filling that void.
We find that evolution is less frequently covered as a formal class topic in middle school science classes than in general biology classes typically taken by students in the ninth or tenth grade. This is not surprising, given that middle school science classes are more general and tend to be more multidisciplinary than high school biology. However, those teachers who do cover evolution devote only slightly less time to it than do high school teachers.
Middle school science teachers are more likely than their high school counterparts to report that they promote creationism, send mixed messages about the scientific standing of evolution, or simply avoid endorsing evolution's status as settled science. Consequently, fewer than 40% of middle school science classes are led by a teacher who emphasizes that evolution is a well-established scientific fact. 6 We find that forthright teaching of evolution in the public middle school science classroom is more likely to occur when teachers themselves have strong science preparation in the form of multiple college courses that covered evolution and hold a degree in science rather than a more general education degree. We also find that instruction is both more extensive and more robust in NGSS states.

Limitations of the present study and suggested directions for future research
A limitation of the study is that the questions used in the survey were not systematically assessed for validity and reliability. This may be of especial concern with regard to those that are prima facie susceptible to multiple interpretations, such as the question assessing personal views on human evolution and the question about "the broad scientific consensus that evolution is fact" (discussed above in notes 2 and 4, respectively). Developed for Berkman and Plutzer's 2007 survey of high school biology teachers (Berkman et al. 2008), these questions were retained for purposes of comparison (as in Plutzer et al. 2020), but it would be desirable to assess their validity and reliability before using them in the future.
A further limitation of the study is that the survey did not probe as deeply as it could have in certain areas of teacher understanding, although obviously no survey can ask about everything that might be relevant. In particular, the survey did not attempt to investigate the degree to which the teachers understand evolution using standard instruments such as MATE (Rutledge and Warden 2000), as, e.g., Glaze and Goldston (2019) did with high school teachers. Similarly, the survey did not attempt to investigate the degree to which the teachers understand the nature of science, a factor correlated with understanding and acceptance of evolution (see, e.g., Lombrozo et al. 2008), as, e.g., Nehm and Schonfeld (2007) did with high school teachers. Such investigations would be worth conducting in future research.
Similarly, the survey did not probe as deeply as it could have in certain areas relevant to teacher preparation. In particular, no data were collected about licensure. 7 Different states license middle school teachers in different grade bands. For example, in Alabama, middle school teachers are licensed to teach grades 4 through 8; in North Carolina, they are licensed to teach grades 6 through 9; and Wisconsin offers both an elementary and middle school license for kindergarten through grade 9 and a middle and high school license for grades 4 through 12. Different types of licenses also often involve different requirements about teachers' content knowledge. It would therefore be interesting to investigate the connections between licensing and classroom practice with regard to evolution, although, because the effect of different licensing regimes on classroom practice is largely mediated by different approaches to pre-service teacher preparation, we expect that such a study would tend to confirm the present results.

Implications for teacher preparation and public policy
We found that the most senior middle school science teachers are those who are most likely to avoid addressing evolution's scientific status, which suggests that a degree of improvement through retirement and replacement is likely to occur naturally in the coming years. But what positive steps can be taken to improve middle school evolution education?
We found that middle school science teachers were more likely to devote more class hours to evolution and more likely to present evolution as settled science when they themselves have strong science preparation. In light of this finding, it is clear that improving evolution education at the middle school level depends on middle school science teachers acquiring a solid, scientific understanding of evolutionary biology. It is beyond the scope of the present study to address the vexed and complex question of how to do so, but we suggest that a reasonable and achievable goal would be for middle school science teachers to achieve parity, with respect to both time devoted to and emphasis on the settled status of evolution, with their high school counterparts.
It would be helpful for there to be strong incentives for teacher preparation programs to ensure that middle school science teachers learn about evolution properly. Because we found that middle school science teachers were more likely to devote more class hours to evolution and more likely to present evolution as settled science in states that adopted the NGSS, the recommendation for public policy with regard to state science standards is clear: to improve evolution education at the middle school level, adopt the NGSS or standards with a comparable treatment of evolution. Doing so will provide teacher preparation programs with the incentive to ensure that newly minted teachers are able to meet the demands of the standards. In addition, since only 27 states require that middle school teachers pass a subject-specific licensing test (National Council on Teacher Quality 2020), it seems plausible that reforms to licensure that required teachers charged with teaching evolution to demonstrate their mastery of the field of biology would similarly provide incentive to teacher preparation programs to ensure that pre-service teachers learn about evolution properly.

Conclusion
Middle school science teachers play a key role in the science education of U.S. students. Our broad statistical portrait provides an overview that merits elaboration with more detailed research on specific topics such as middle school lesson plans, professional development for middle school science teachers, teacher education curricula, and more. These topics, and many others, have been studied extensively at the high school level. It is time to pay comparable attention to the middle grades, with an eye not only to understanding but also to alleviating the challenges to the teaching of evolution. We hope that this initial study will spur further research on middle school evolution education.

Background
The 2019 Survey of American Science Teachers is the third of a series of three scientific surveys of science teachers. The first, the 2007 National Survey of High School Biology Teachers, was funded by the National Science Foundation and focused on high school biology teachers and their approach to the teaching of evolutionary biology. The second, the 2014-2015 National Survey of American Science Teachers, was conducted by Penn State with the National Center for Science Education and focused on the teaching of climate change. This second study added a sample of middle school teachers and sampled high school teachers of all four core subjects: earth science, biology, chemistry, and physics. The 2019 Survey of American Science Teachers, the third study in the series, retains a focus on high school biology teachers (from the 2007 survey) and middle school science teachers (from the 2014-2015 survey).
In order to allow valid comparisons to prior surveys, the most recent effort replicated many of the questions and adhered closely to the study design from previous waves. As a result, when examining the data from identical questions, it is possible to compare this wave's middle school sample to the middle school sample from 2014-2015, and to compare the high school biology sample to the 2007 survey and to the biology subgroup within the 2014-2015 high school sample.

Sampling
The 2019 Survey of American Science Teachers employs two stratified probability samples of science educators. The first represents the population of all science teachers in public middle or junior high schools in the United States. The second represents all biology or life science teachers in public high schools in the United States.
There is no comprehensive list of such educators. However, a direct mail marketing company, Market Data Retrieval (MDR, a division of Dun and Bradstreet), maintains and updates a database of 3.9 million K-12 educators.
MDR selected probability samples conforming to our specifications. Specifically, MDR first identified eligible schools (public middle and junior high schools, and public high schools) and then selected all middle school teachers with the job title "science teacher" and all high school teachers with the job title "biology teacher" or "life science teacher." The middle school universe contained 55,001 teachers with full name, school name and school address. From these, teachers were selected with probability 0.0455 independently from each of 151 strata defined by urbanism (city, suburb, all others) and state, with the District of Columbia being its own stratum. This resulted in a sample of 2511 middle school science teachers. The high school biology universe contained 30,847 teachers with full name, school name and school address. From these, teachers were selected with probability 0.0810 independently from each of 151 strata defined in the same way. This resulted in a sample of 2503 high school biology teachers.
Of the 5014 elements in the two samples, MDR provided current email addresses for 4150, or 82.8%.
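The stratified selection described above can be illustrated with a minimal sketch. It assumes a simple random draw of a fixed fraction within each stratum; the rounding rule, the function name `stratified_sample`, and the toy frame are assumptions made for illustration, not details of MDR's procedure. The last two lines compute the sample sizes implied by the reported universes and selection probabilities, which come to roughly 2503 and 2499, close to the reported 2511 and 2503.

```python
import random
from collections import defaultdict


def stratified_sample(frame, rate, seed=0):
    """Draw a simple random sample of round(rate * N_h) units per stratum.

    `frame` is a list of (unit_id, stratum) pairs. The rounding rule is an
    assumption of this sketch, not a detail reported in the paper.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit_id, stratum in frame:
        by_stratum[stratum].append(unit_id)

    sample = []
    for stratum in sorted(by_stratum):
        units = by_stratum[stratum]
        n_h = round(rate * len(units))  # stratum sample size
        sample.extend(rng.sample(units, n_h))  # without replacement
    return sample


# Hypothetical frame: two strata with 2,000 and 400 teachers.
frame = [(f"A{i}", ("State A", "city")) for i in range(2000)]
frame += [(f"B{i}", ("State B", "suburb")) for i in range(400)]
picked = stratified_sample(frame, rate=0.0455)

# Expected sample sizes implied by the reported universes and rates.
expected_middle = 55_001 * 0.0455
expected_high = 30_847 * 0.0810
```

Sampling independently within each stratum at a common rate keeps every state and urbanism category represented in proportion to its share of the frame.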

Questionnaire design
The questionnaires for this survey included questions employed in the 2007 National Survey of High School Biology Teachers (which focused on the teaching of evolution), and the 2014-2015 National Survey of American Science Teachers (which focused on the teaching of climate change). The survey was initially written for pencil/paper administration and, when finalized, programmed so it could be administered on the Qualtrics online survey platform.

Fieldwork
The survey design was a "push to mail" strategy in which all 5014 sampled teachers received an advance pre-notification letter, a survey packet with incentive ($2 in cash) and a postage paid return envelope, two reminder postcards and a replacement survey packet. Non-respondents for whom we had an email address then received an email invitation to complete the survey online. This included 3161 non-respondents with emails supplied by MDR, and an additional 352 collected during the non-response audit. Non-respondents then received two additional email reminders. Field dates are summarized in Table 9.

Non-response audit
Beginning on April 11, 2019, after most paper surveys had been received and logged, we identified a subsample of 700 non-respondents, and launched a detailed non-response audit on this group. The primary goal was to confirm or disconfirm their eligibility. From the time we began the audit of non-respondents, we received questionnaires from 62 of these teachers. They were removed from the audit, leaving 638 audited non-respondents.
For each person, we first searched for their school, and sought to locate a current school staff directory. If no directory was found, we searched all classroom web sites at the school, and searched the school web site for the teacher's full name and last name. If we found a match for the teacher anywhere on the school web site, that non-respondent was confirmed as eligible.
In some cases, we found a teacher in the same subject and same first name, but with a different last name. If we were able to absolutely confirm that teacher had recently changed names (e.g., their email matched the name in our list) that teacher was confirmed as eligible.
If we did not find the teacher, we did two broader web searches. First, a search with the teacher's full name and the keyword "science." In some instances, this brought up results indicating that the teacher had changed jobs or retired (e.g., information on the former teacher's LinkedIn page). These were confirmed as ineligible.
We recorded the following outcomes:
Teacher confirmed as eligible: listed on school website.
Teacher confirmed as eligible: classroom web pages identified.
Teacher confirmed as eligible: other (e.g., listed in recent news story).
Confirmed ineligible: school has current staff directory, and teacher not listed.
Confirmed ineligible: other (e.g., teacher identified as instructing in a different subject).
Unable to determine: school does not have a staff directory.
Unable to determine: school does not have functional web site.
The final results of the audit are summarized in Table 10. Thus, assuming that one quarter of the unknowns are ineligible, we estimate that 72% of all non-respondents are eligible. This is the basis for calculating the "e" component in the response rate (American Association for Public Opinion Research 2006).
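The eligibility estimate can be sketched as follows, assuming that e is the share of audited non-respondents who are either confirmed eligible or undetermined and presumed eligible. The function name and the counts are hypothetical (they are not the Table 10 values); only the one-quarter assumption for the unknowns comes from the text.

```python
def estimate_eligibility_rate(n_eligible, n_ineligible, n_unknown,
                              unknown_ineligible_share=0.25):
    """Estimate e, the share of non-respondents assumed to be eligible.

    Undetermined cases are split: `unknown_ineligible_share` of them are
    treated as ineligible and the remainder as eligible.
    """
    total = n_eligible + n_ineligible + n_unknown
    if total == 0:
        raise ValueError("audit contains no cases")
    eligible = n_eligible + (1 - unknown_ineligible_share) * n_unknown
    return eligible / total


# Hypothetical audit counts summing to the 638 audited non-respondents.
e = estimate_eligibility_rate(n_eligible=400, n_ineligible=138, n_unknown=100)
```

With these illustrative counts, the estimate is (400 + 75) / 638, or about 0.74; the paper's own counts yield the reported 72%.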

Dispositions and response rates
Every individual on the initial mailing list of 5014 names and addresses was assigned a disposition code.
A survey was considered complete if the respondent answered questions from at least two of the following three question groups: Question #1, which asked teachers how many class hours they devoted to each of nine topics (appearing on the second page of the paper questionnaire); a group of attitude questions appearing on pages 7-8 of the paper questionnaire; and a group of demographic and background variables on pages 9 and 11 of the paper questionnaire.
A survey was considered partially complete if the respondent answered at least Question #1, on how many class hours they devoted to each of nine topics. A summary of the dispositions appears in Table 11.
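The completeness rules above can be expressed as a small decision function. This is a sketch under one assumption about precedence: a return that meets the two-of-three rule is coded complete before the partial rule is checked. The function name and the label "unusable" for returns that meet neither rule are hypothetical.

```python
def classify_survey(answered_hours, answered_attitudes, answered_background):
    """Classify a returned questionnaire as complete, partial, or unusable.

    Each argument is True if the respondent answered that question group:
    class hours (Question #1), attitudes, and demographics/background.
    """
    groups_answered = sum(
        [bool(answered_hours), bool(answered_attitudes), bool(answered_background)]
    )
    if groups_answered >= 2:
        return "complete"
    if answered_hours:
        return "partial"
    return "unusable"  # treated as a non-respondent


example = classify_survey(True, False, False)
```

Here `example` is `"partial"`, because only the class-hours question was answered.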

Response rates
We utilize the response rate definitions published by the American Association for Public Opinion Research (2006). These require an estimate of the percentage of all non-respondents who are eligible, as opposed to ineligible (e.g., due to retirement), to complete the survey. This quantity, referred to as e, was estimated from a detailed audit of 638 non-respondents. Based on these dispositions we calculate the response rate (AAPOR response rate formula #4) to be 37%. This is interpreted as the percentage of all eligible respondents who submitted a usable questionnaire (complete or partially complete). Respondents who returned questionnaires that are blank or fail to qualify as partial are considered non-respondents. The details of the response rate calculation are reported in Table 12.
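For readers unfamiliar with the formula, AAPOR response rate 4 divides complete plus partial interviews by all cases known or estimated to be eligible, with cases of unknown eligibility discounted by e. The sketch below uses hypothetical disposition counts (not the Table 12 values), and the function name is an assumption.

```python
def aapor_rr4(complete, partial, refusal, non_contact, other, unknown, e):
    """AAPOR response rate 4: (I + P) / ((I + P) + (R + NC + O) + e * U).

    `unknown` is the number of cases of unknown eligibility and `e` is the
    estimated share of those cases that are eligible.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be between 0 and 1")
    numerator = complete + partial
    denominator = numerator + refusal + non_contact + other + e * unknown
    return numerator / denominator


# Hypothetical counts for illustration only.
rate = aapor_rr4(complete=300, partial=50, refusal=100, non_contact=200,
                 other=50, unknown=400, e=0.72)
```

A larger e enlarges the denominator and lowers the rate, which is why the non-response audit matters for the reported figure.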

Response rates by teacher and school characteristics
Response rates can be broken down and estimated for different groups, provided that there are data for non-respondents as well as respondents. As a result, we cannot test for differences based on questionnaire items (we lack information on seniority, degrees earned, religiosity, and so on for all non-respondents). We can, however, utilize "frame" variables and those provided by the direct mail vendor MDR. Table 13 reports on eight such comparisons.
Teacher characteristics. The response rate was somewhat lower for middle school teachers (34%) compared to high school biology teachers (40%). Using the salutations (Mr., Ms., Miss, etc.) provided in the direct mail file, we classified teachers as female, male, or gender unknown. The latter group included a small number of teachers with salutations of "Dr." or "Coach." However, the large majority had gender-ambiguous first names such as Tracy, Jamie, Kim or Chris. Men (39%) and women (38%) did not differ significantly, but we had a lower return among those whose communications could not be personalized (Dear Kim Smith rather than Dear Mr. Smith, for example). 8 The value of conducting an email follow-up to the pencil/paper survey is evident in the 39% response rate for those teachers with a valid email supplied by the vendor (those lacking an email had a 30% response rate). Note that some of these additional returns were paper surveys returned only after teachers received an email announcing the availability of a web survey.
School type. We had a somewhat lower response rate from teachers at public charter schools (31%). Note, however, that because charters still represent a tiny slice of the public school market, raising their response rate to the overall average would have only increased the number of surveys completed by charter school teachers by three or four.
School demographics. As in previous surveys, we find lower response rates from teachers working in schools with medium or large minority populations. Schools whose student bodies are more than 15% African American or more than 15% Hispanic, or more than 50% free lunch eligible, all had response rates between 30 and 33%.
Urbanism. Finally, response rates did not differ substantially by urbanism except for schools in central cities with populations exceeding 250,000. Teachers in these large school systems responded at a 30% rate.
Overall, we uncovered systematic differences. By and large these are modest in magnitude and do not introduce major distortions in the data. For example, teachers in large central city school systems constituted 12% of the teachers we recruited, and 10% of the final data set. However, since these individual differences might be additive (e.g., central city schools with many minority and school lunch-eligible students), we estimated a propensity model to assess the total impact of all factors simultaneously. Table 14 reports a logistic regression model in which the dependent variable is the submission of a usable survey (scored 1, all other dispositions scored 0, with confirmed ineligible respondents dropped from the analysis).
This model confirms most of the observed differences reported in Table 13. The odds ratio column is more intuitive and shows that the odds of returning a usable survey were 26% higher in the high school sample, 30% higher for teachers with a valid email on file, and about 26% higher when we used a gender-based salutation. Teachers at schools with sizable Black and Hispanic presence in the student body are also underrepresented (odds ratios below 1). However, after controlling for student body composition, the effects of school lunch eligibility and urbanism are diminished.
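As a reminder of how such percentages are read off a logistic regression, an odds ratio is the exponential of the corresponding coefficient, and the percentage change in odds is the odds ratio minus one. The sketch below is illustrative only; the coefficient is back-calculated from the 26% figure in the text rather than taken from Table 14, and the function names are assumptions.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    """Percentage change in the odds associated with a one-unit increase."""
    return 100.0 * (math.exp(coefficient) - 1.0)


# A coefficient of ln(1.26) corresponds to odds that are 26% higher.
high_school_coefficient = math.log(1.26)
change = percent_change_in_odds(high_school_coefficient)
```

Negative coefficients give odds ratios below 1, the pattern reported for schools with sizable Black and Hispanic enrollment.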
Propensity scores. We use this model to calculate the probability of responding for all original members of the sample. That allows us to calculate the response propensity for all respondents. Those whose characteristics make them unlikely to respond must, therefore, speak on behalf of more non-respondents. We use the inverse of the propensity as a second-stage weighting adjustment.

Weighting
Analysis weights were constructed in a two-stage process. A base weight adjusts for possible under-coverage by the sample supplier and the non-response adjustment balances the sample based on characteristics that are predictive of non-response (e.g., student body composition).
Base weight. MDR claims to have contact information for approximately 85% of all K-12 teachers, but that coverage rate can vary by grade, subject, and state.
We assume that science teachers comprise the same percentage of all middle school teachers in each state, and we assume that biology teachers constitute the same share of high school faculty in each state. It follows that the distribution across states in the MDR data base should be proportional to the number of teachers in each state. If not, adjustment is necessary to make the sample fully representative.
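We therefore constructed the following two ratios:
$$\frac{\text{Number of middle school teachers as counted by the National Center for Education Statistics}}{\text{Number of middle school teachers in MDR direct mail data base}}$$
and
$$\frac{\text{Number of high school teachers as counted by the National Center for Education Statistics}}{\text{Number of high school teachers in MDR direct mail data base}}$$
These were each standardized to have a mean of 1.0 so that ratios above 1 indicate relative under-coverage by MDR.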
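A minimal sketch of this coverage adjustment follows. It assumes the ratios are computed state by state and standardized by their unweighted mean across states; the text does not specify how the mean was taken, so that choice, the function name, and the counts are all assumptions.

```python
def coverage_weights(nces_counts, mdr_counts):
    """Return per-state coverage ratios (NCES / MDR) rescaled to mean 1.0.

    Values above 1 indicate states that are relatively under-covered
    in the vendor database.
    """
    ratios = {state: nces_counts[state] / mdr_counts[state]
              for state in nces_counts}
    mean_ratio = sum(ratios.values()) / len(ratios)
    return {state: ratio / mean_ratio for state, ratio in ratios.items()}


# Hypothetical teacher counts for two states.
nces = {"State A": 1000, "State B": 2000}
mdr = {"State A": 800, "State B": 1900}
base = coverage_weights(nces, mdr)
```

In this toy example State A is covered less completely by the vendor list than State B, so its teachers receive a base weight above 1.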
Non-response calibration. The second stage weight is based on the logistic regression model reported in Table 14. From this model, we calculated the probability of completing the survey (defined as completing a usable survey, classified as "complete" or "partial" in Table 11). The second stage non-response adjustment is simply the inverse of the response propensity, 1/π. Analysis weight (designated as final_weight in the data set) is the product of the first stage coverage adjustment and the second stage non-response adjustment, standardized so it has a mean of 1. The weights range from 0.24 to 3.23, with a standard deviation of 0.35. Ninety percent of the cases have weights between 0.55 and 1.60, indicating that weighting will have only a small impact on statistical results in comparison to unweighted analyses.
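The two-stage construction can be sketched as below, assuming each respondent has a first-stage base weight and a fitted response propensity π. The function name and the input values are hypothetical; only the product-then-standardize logic comes from the text.

```python
def final_weights(base_weights, propensities):
    """Combine base weights with inverse response propensities.

    Each raw weight is base * (1 / pi); the result is rescaled so that
    the weights have a mean of 1 across respondents.
    """
    if len(base_weights) != len(propensities):
        raise ValueError("inputs must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in propensities):
        raise ValueError("propensities must lie in (0, 1]")
    raw = [b / p for b, p in zip(base_weights, propensities)]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Hypothetical respondents: equal base weights, different propensities.
weights = final_weights([1.0, 1.0, 1.0], [0.50, 0.25, 0.40])
```

The respondent with the lowest propensity (0.25) receives the largest weight, reflecting the idea that teachers who are unlikely to respond speak on behalf of more non-respondents.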
Additional file 1: Table S1. Reported pre-service and continuing education coursework on evolution (percentages within row). Figure S1. Effects of policy and formal preparation on teacher emphasis as measured by typology class. Multinomial logistic regression estimates and 90% confidence intervals (accounting for sample weights and design effects).