Appendix
2019 Survey of American Science Teachers: materials and methods
Background
The 2019 Survey of American Science Teachers is the third in a series of three scientific surveys of science teachers. The first, the 2007 National Survey of High School Biology Teachers, was funded by the National Science Foundation and focused on high school biology teachers and their approach to the teaching of evolutionary biology. The second, the 2014–2015 National Survey of American Science Teachers, was conducted by Penn State with the National Center for Science Education and focused on the teaching of climate change. This second study added a sample of middle school teachers and sampled high school teachers of all four core subjects: earth science, biology, chemistry, and physics. The 2019 Survey of American Science Teachers, the third study in the series, retains the focus on high school biology teachers (from the 2007 survey) and middle school science teachers (from the 2014–2015 survey).
To allow valid comparisons with the prior surveys, the most recent effort replicated many of the questions and adhered closely to the study design of the previous waves. As a result, for identical questions it is possible to compare this wave’s middle school sample to the 2014–2015 middle school sample, and to compare the high school biology sample both to the 2007 survey and to the biology subgroup within the 2014–2015 high school sample.
Sampling
The 2019 Survey of American Science Teachers employs two stratified probability samples of science educators. The first represents the population of all science teachers in public middle or junior high schools in the United States. The second represents all biology or life science teachers in public high schools in the United States.
There is no comprehensive list of such educators. However, Market Data Retrieval (MDR, a division of Dun and Bradstreet), a direct mail marketing company, maintains and updates a database of 3.9 million K–12 educators.
MDR selected probability samples conforming to our specifications. Specifically, MDR first identified eligible schools (public middle and junior high schools, and public high schools) and then selected all middle school teachers with the job title “science teacher” and all high school teachers with the job title “biology teacher” or “life science teacher.”
The middle school universe contained 55,001 teachers with full name, school name and school address. From these, teachers were selected with probability 0.0455 independently from each of 151 strata defined by urbanism (city, suburb, all others) and state, with the District of Columbia being its own stratum. This resulted in a sample of 2511 middle school science teachers.
The high school biology universe contained 30,847 teachers with full name, school name and school address. From these, teachers were selected with probability 0.0810 independently from each of 151 strata defined by urbanism (city, suburb, all others) and state, with the District of Columbia being its own stratum. This resulted in a sample of 2503 high school biology teachers.
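To make the selection scheme concrete, the following Python sketch applies a fixed selection rate independently within each state-by-urbanism stratum. It is an illustration of the design described above, not the vendor’s actual procedure, and the frame column names ("state", "urbanism") are assumptions.

import numpy as np
import pandas as pd

def stratified_bernoulli_sample(frame: pd.DataFrame, rate: float, seed: int = 2019) -> pd.DataFrame:
    """Select teachers independently at `rate` within each state-by-urbanism
    stratum (with the District of Columbia as its own stratum)."""
    rng = np.random.default_rng(seed)
    parts = []
    for _, stratum in frame.groupby(["state", "urbanism"], sort=False):
        flags = rng.random(len(stratum)) < rate  # independent selection within the stratum
        parts.append(stratum[flags])
    return pd.concat(parts)

# Illustrative use: the middle school frame sampled at rate 0.0455 and the
# high school biology frame at rate 0.0810, yielding samples of roughly
# 2511 and 2503 teachers, respectively.
# ms_sample = stratified_bernoulli_sample(ms_frame, 0.0455)
# hs_sample = stratified_bernoulli_sample(hs_frame, 0.0810)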
Of the 5014 elements in the two samples, MDR provided current email addresses for 4150, or 82.8%.
Questionnaire design
The questionnaires for this survey included questions employed in the 2007 National Survey of High School Biology Teachers (which focused on the teaching of evolution), and the 2014–2015 National Survey of American Science Teachers (which focused on the teaching of climate change). A few new questions were developed to measure teachers’ perceptions of local public opinion.
The survey was initially written for pencil/paper administration and—when finalized—programmed so it could be administered on the Qualtrics online survey platform.
Fieldwork
The survey design followed a “push to mail” strategy in which all 5014 sampled teachers received an advance pre-notification letter, a survey packet with a $2 cash incentive and a postage-paid return envelope, two reminder postcards, and a replacement survey packet. Non-respondents for whom we had an email address then received an email invitation to complete the survey online.
These included 3161 non-respondents with email addresses supplied by MDR and an additional 352 addresses collected during the non-response audit.
Non-respondents then received two additional email reminders. Field dates are summarized in Table 9.
Non-response audit
Beginning on April 11, 2019, after most paper surveys had been received and logged, we identified a subsample of 700 non-respondents, and launched a detailed non-response audit on this group. The primary goal was to confirm or disconfirm their eligibility. From the time we began the audit of non-respondents, we received questionnaires from 62 of these teachers. They were removed from the audit, leaving 638 audited non-respondents.
For each person, we first searched for their school, and sought to locate a current school staff directory. If no directory was found, we searched all classroom web sites at the school, and searched the school web site for the teacher’s full name and last name. If we found a match for the teacher anywhere on the school web site, that non-respondent was confirmed as eligible.
In some cases, we found a teacher of the same subject with the same first name but a different last name. If we could confirm that the teacher had recently changed names (e.g., their email address matched the name on our list), that teacher was confirmed as eligible.
If we did not find the teacher, we conducted two broader web searches, the first pairing the teacher’s full name with the keyword “science.” In some instances, this brought up results indicating that the teacher had changed jobs or retired (e.g., information on the former teacher’s LinkedIn page). These teachers were confirmed as ineligible. We recorded the following outcomes:
Teacher confirmed as eligible—listed on school website.
Teacher confirmed as eligible—classroom web pages identified.
Teacher confirmed as eligible—other (e.g., listed in recent news story).
Confirmed ineligible—school has current staff directory, and teacher not listed.
Confirmed ineligible—other (e.g., teacher identified as instructing in a different subject).
Unable to determine—school does not have a staff directory.
Unable to determine—school does not have a functional web site.
The final results of the audit are summarized in Table 10.
Thus, of all non-respondents (and assuming ¼ of the unknowns are ineligible) we estimate that 72% are eligible. This is the basis for calculating the “e” component in the response rate (American Association for Public Opinion Research 2006).
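Written out, one reading of this calculation (with the audited counts themselves appearing in Table 10) is:

$$ e = \frac{\text{confirmed eligible} + \tfrac{3}{4}\,\text{unable to determine}}{\text{confirmed eligible} + \text{confirmed ineligible} + \text{unable to determine}} \approx 0.72 $$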
Dispositions and response rates
Every individual on the initial mailing list of 5014 names and addresses was assigned a disposition code.
A survey was considered complete if the respondent answered questions from at least two of the following three question groups: Question #1, which asked teachers how many class hours they devoted to each of nine topics (appearing on the second page of the paper questionnaire); a group of attitude questions appearing on pages 7–8 of the written questionnaire; and a group of demographic and background variables on pages 9 and 11 of the paper questionnaire.
A survey was considered partially complete if the respondent answered at least Question #1, which asked how many class hours they devoted to each of nine topics (appearing on the second page of the paper questionnaire). A summary of the dispositions appears in Table 11.
Response rates
We utilize the response rate definitions published by the American Association for Public Opinion Research (2006). These require an estimate of the percentage of non-respondents who were eligible to complete the survey, as opposed to ineligible (e.g., due to retirement). This quantity, referred to as e, was estimated from a detailed audit of 638 non-respondents. Based on these dispositions, we calculate the response rate (AAPOR response rate formula #4) to be 37%. This is interpreted as the percentage of all eligible sample members who submitted a usable questionnaire (complete or partially complete). Sample members who returned questionnaires that were blank or failed to qualify as partial are considered non-respondents. The details of the response rate calculation are reported in Table 12.
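For reference, AAPOR response rate formula #4 treats partial completes as respondents and discounts cases of unknown eligibility by e:

$$ \mathrm{RR4} = \frac{I + P}{(I + P) + (R + NC + O) + e\,(UH + UO)} $$

where I denotes completed surveys, P partial completes, R refusals, NC non-contacts, O other non-respondents, and UH + UO cases of unknown eligibility. Applied to the dispositions in Table 12 with e = 0.72, this formula should reproduce the 37% figure reported above.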
Response rates by teacher and school characteristics
Response rates can be broken down and estimated for different groups, provided that data are available for non-respondents as well as respondents. This means we cannot test for differences based on questionnaire items (we lack information on seniority, degrees earned, religiosity, and so on for non-respondents).
We can, however, utilize “frame” variables and those provided by the direct mail vendor MDR. Table 13 reports on eight such comparisons.
Teacher characteristics. The response rate was somewhat lower for middle school teachers (34%) than for high school biology teachers (40%). Using the salutations (Mr., Mrs., Miss, Ms., etc.) provided in the direct mail file, we classified teachers as female, male, or gender unknown. The latter group included a small number of teachers with salutations of “Dr.” or “Coach.” However, the large majority had gender-ambiguous first names such as Tracy, Jamie, Kim or Chris. Men (39%) and women (38%) did not differ significantly, but we had a lower return among those whose communications could not be personalized (Dear Kim Smith rather than Dear Mr. Smith, for example).
The value of conducting an email follow-up to the pencil/paper survey is evident in the 39% response rate for those teachers with a valid email supplied by the vendor (those lacking an email had a 30% response rate). Note that some of these additional returns were paper surveys returned only after teachers received an email announcing the availability of a web survey.
School type. We had a somewhat lower response rate from teachers at public charter schools (31%). Note, however, that because charter schools still represent a small share of public schools, raising their response rate to the overall average would have increased the number of surveys completed by charter school teachers by only three or four.
School demographics. As in previous surveys, we find lower response rates from teachers working in schools with medium or large minority populations. Schools whose student bodies are more than 15% African American or more than 15% Hispanic, or more than 50% free lunch eligible, all had response rates between 30 and 33%.
Urbanism. Finally, response rates did not differ substantially by urbanism except for schools in central cities with populations exceeding 250,000. Teachers in these large school systems responded at a 30% rate.
Overall, we uncovered systematic differences. By and large these are modest in magnitude and do not introduce major distortions in the data. For example, teachers in large central city school systems constituted 12% of the teachers we recruited, and 10% of the final data set. However, since these individual differences might be additive (e.g., central city schools with many minority and school lunch-eligible students), we estimated a propensity model to assess the total impact of all factors simultaneously.
Table 14 reports a logistic regression model in which the dependent variable is the submission of a usable survey (scored 1, all other dispositions scored 0, with confirmed ineligible respondents dropped from the analysis).
This confirms most of the observational differences reported in Table 13. The odds ratio column is more intuitive and shows that the odds of returning a usable survey were 26% higher in the high school sample, 30% higher for teachers with a valid email on file, and about 26% higher when we used a gender-based salutation. Teachers at schools with a sizable Black and Hispanic presence in the student body are also underrepresented (odds ratios below 1). However, after controlling for student body composition, the effects of school lunch eligibility and urbanism are diminished.
Propensity scores. We use this model to calculate the predicted probability of responding (the response propensity) for every member of the original sample. Respondents whose characteristics make them unlikely to respond must, therefore, speak on behalf of more non-respondents. We use the inverse of the propensity as a second-stage weighting adjustment.
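As an illustration of this adjustment (a sketch under assumed variable names, not the code used for the study), the following Python fragment fits a logistic regression of response on frame and vendor variables and takes the inverse of the predicted propensity as the second-stage factor:

import pandas as pd
import statsmodels.formula.api as smf

# `sample` is assumed to hold one row per sampled teacher (confirmed
# ineligibles dropped), with `responded` coded 1 for a usable survey and
# illustrative frame/vendor covariates named below.
propensity_model = smf.logit(
    "responded ~ high_school + has_email + gendered_salutation"
    " + pct_black + pct_hispanic + pct_free_lunch + large_central_city",
    data=sample,
).fit()

sample["propensity"] = propensity_model.predict(sample)   # estimated probability of responding
sample["nr_adjustment"] = 1.0 / sample["propensity"]      # inverse-propensity factor

# Respondents with low estimated propensities receive larger adjustments,
# so they speak on behalf of more of the non-respondents who resemble them.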
Weighting
Analysis weights were constructed in a two-stage process. A base weight adjusts for possible under-coverage by the sample supplier, and a non-response adjustment balances the sample on characteristics that are predictive of non-response (e.g., student body composition).
Base weight. MDR claims to have contact information for approximately 85% of all K–12 teachers, but that coverage rate can vary by grade, subject, and state.
We assume that science teachers comprise the same percentage of all middle school teachers in each state, and that biology teachers constitute the same share of high school faculty in each state. It follows that the distribution of teachers across states in the MDR database should be proportional to the number of teachers in each state. If it is not, an adjustment is necessary to make the sample fully representative.
We therefore constructed the following two ratios:
$$ \frac{\text{Number of middle school teachers as counted by the National Center for Education Statistics}}{\text{Number of middle school teachers in the MDR direct mail database}} $$
and
$$ \frac{\text{Number of high school teachers as counted by the National Center for Education Statistics}}{\text{Number of biology teachers in the MDR direct mail database}} $$
These were each standardized to have a mean of 1.0 so that ratios above 1 indicate relative under-coverage by MDR.
Non-response calibration. The second stage weight is based on the logistic regression model reported in Table 14. From this model, we calculated the probability of completing the survey (defined as submitting a usable questionnaire, classified as “complete” or “partial” in Table 11).
The second stage non-response adjustment is simply the inverse of the response propensity, 1/π.
The analysis weight (designated final_weight in the data set) is the product of the first-stage coverage adjustment and the second-stage non-response adjustment, standardized to have a mean of 1. The weights range from 0.24 to 3.23, with a standard deviation of 0.35. Ninety percent of the cases have weights between 0.55 and 1.60, indicating that weighting will have only a small impact on statistical results in comparison to unweighted analyses.
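In symbols, the construction described above can be summarized (using notation introduced here for convenience) as

$$ w_i = \frac{c_{s(i)}\,\hat{\pi}_i^{-1}}{\frac{1}{n}\sum_{j=1}^{n} c_{s(j)}\,\hat{\pi}_j^{-1}} $$

where c_s(i) is the stage-one coverage ratio for respondent i’s state and grade level, π̂_i is that respondent’s estimated response propensity from the Table 14 model, and the denominator standardizes the weights to a mean of 1 across the n respondents.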