The teaching of evolutionary theory and the Cosmos–Evidence–Ideas model

Evolutionary theory (ET), as many researchers have pointed out, is one of the cornerstones of Biology; understanding it facilitates the study of all other biological fields, since it offers general and dominant explanations for the phenomena Biology examines. This justifies the intense research activity on the teaching and learning of evolution. Various methodological approaches seek effective ways to overcome the barriers associated with the acceptance and understanding of ET. In the present research, the usefulness of the Cosmos–Evidence–Ideas (CEI) model as a tool for enhancing the effectiveness of selected activities for teaching ET is tested. Two different Teaching Learning Sequences (TLSs) were designed, implemented, and evaluated, in one of which CEI was used as a design tool. Next, the evaluation outcomes of the two TLSs were compared. Students from both groups increased their performance. This increase was slightly greater for the students who were taught evolution through the TLS designed with the CEI model. An interpretation is given for the extent of that increase in relation to the model’s characteristics, and suggestions for further improvement are included. To sum up, there are indications that the CEI model might have the potential to enhance the effectiveness of a TLS for ET when used as a design tool.


Introduction
A report by leading university professors from across Europe on science education (Osborne and Dillon 2008) acknowledges that the biggest problem in the curriculum of science teaching is the absence of general dominant explanations for the phenomena that are taught. Teaching is limited to providing, in a piecemeal way, pieces of knowledge that seem unrelated to each other, and students are not directed to relate them through a unifying theory, which would give meaning to learning by providing answers to students' big questions about how the world works. Therefore, the need to reform the curriculum for science education is highlighted, with the aim of addressing the above deficiency. Especially for Biology, the main unifying theory, which constitutes the explanatory framework for all biological phenomena, is Evolutionary Theory (ET).
Teaching ET is a difficult task for science teachers, for several reasons related to both its acceptance and its understanding. Research in cognitive psychology and science education has shown that the relationships between understanding, acceptance, belief, knowledge, and preference are complex, poorly understood, and controversial (Smith 1994; Southerland 2000), and this is the case for the acceptance of ET too. Most studies in the field link ET acceptance with knowledge (Bishop and Anderson 1990; Lawson and Worsnop 1992; Meadows et al. 2000; Sinatra et al. 2003; Scharmann 2005; Rice et al. 2011, 2015; Hermann 2012; Kahan 2015), religiosity (Plutzer and Berkman 2008; Masci 2009; Athanasiou and Papadopoulou 2012; Ha et al. 2012; Rissler et al. 2014), and trust in science (Nadelson and Hardy 2015). Teaching and comprehending the nature of science (NOS) has been proposed as a way to deal with the resistance to acceptance (Athanasiou et al. 2012; Bramschreiber 2014; Cofre et al. 2018; Nelson et al. 2019) without, however, being certain that this will also improve understanding (Crawford et al. 2005; Nehm and Schonfeld 2007). Kreher and McManus (2023) suggest cross-curricular teaching of evolution in order to improve its acceptance. Interdisciplinary teaching seems to have positive effects on understanding as well (Hanish and Eirdosh 2020; Kreher and McManus 2023). Hermann (2013) argues that Biology teachers find themselves at the intersection of a multitude of scientific, religious, political, and social factors related to the teaching of evolution. Political and religious ideologies likely prevail against science when they conflict because people choose to believe only those facts that support their worldview (Wood et al. 2012). Teachers often choose to teach evolution selectively or not at all (Nunez et al. 2012) for several reasons (Meadows et al. 2000; Sanders and Ngxola 2009; Nunez et al. 2012; Borgerding et al. 2015; Klahn 2020). The emphasis they place on teaching evolution as opposed to creationism also varies (Plutzer et al. 2020). Kampourakis (2022) points to the need to educate future teachers to distinguish two types of literacy in evolution teaching: evolution literacy, related to content knowledge and skills, and evolutionary literacy, involving socio-ethical implications of scientific knowledge. Of particular interest is the fact that in countries that have adopted the Next Generation Science Standards (NGSS: NGSS Lead States 2013), qualitative differences were found in the way teachers approach the teaching of evolution. Those with long experience spend more time teaching it, while younger ones tend to present it to their students as an established science (Branch et al. 2021).
Especially for Greek teachers, the relevant studies (Athanasiou and Papadopoulou 2012; Katakos et al. 2013) highlight limited knowledge, due to a lack of systematic teaching interventions on the subject, as the most important factor affecting the acceptance of ET. Moreover, teachers hold many alternative understandings of the concepts related to ET (Prinou et al. 2008; Venetis and Mavrikaki 2017), and this hinders both their choice to teach ET and whatever effort they intend to make to help their students overcome the conceptual barriers that make it difficult for them to understand it.
Regarding the conceptual barriers that make it difficult for teachers and students to understand ET, the most important ones are essentialism, teleology, and causality by intention (Kampourakis 2020a, 2020b). Essentialism emerges very early in childhood (Setoh et al. 2013) and is the intuitive understanding that all living things consist of a "substance" that is passed down from parents to their children and has the ability to be transformed when circumstances require it. Teleology, often seen in children's and adults' justifications of evolutionary phenomena, is the notion that all objects and living things are made for a purpose. Teleology is associated with causality by intention, according to which the purpose of things has been assigned by some intelligent entity (Zogza et al. 2009).
The foregoing conceptual barriers were found to be the primary causes of errors in student justifications of evolutionary phenomena in the historical surveys conducted by Brumby (1979, 1984), Bishop and Anderson (1990) and others (Greene 1990; Settlage 1994; Demastes et al. 1995; Nehm and Schonfeld 2007). The general scheme followed by these justifications is that a change occurring in the environment introduces mutations into the organisms, which adapt the individuals to the changed conditions, and these acquired characteristics of the individuals are passed on to the next generation. Research that followed attempted to group the interpretations by various criteria, such as the way they are created (Smith 2010) or whether they use naturalistic or creationist argumentation (Evans et al. 2010; To et al. 2017). However, the general consensus is that understanding evolutionary processes is difficult because they are largely counterintuitive (Garvin-Doxas and Klymkowsky 2008) and include many challenging concepts (Jördens et al. 2016; Tibell and Harms 2017). Understanding is driven by intuitive reasoning, which leads to misunderstandings of ideas like natural selection, the source and function of diversity, the unit of evolution, the rate and course of evolution, etc. (Evans et al. 2012; Kampourakis and Zogza 2008; Kelemen et al. 2013). Zabel and Gropengiesser (2011) pointed out that students' learning of evolutionary theory is not a linear process; it can be visualized as roaming on a landscape where each student formulates his or her own learning trajectory. Related to the above is the observation that the incorrect use of scientific terminology is commonplace, especially for novice learners (Ryan 1985; Nehm et al. 2010; Rector et al. 2013), which makes communication on evolutionary issues difficult in many ways.
Given the importance of teaching evolution and the variety of challenges it poses, as described above, various empirical studies have been carried out over the years, seeking ways to teach ET effectively. Indicatively, Jiménez-Aleixandre (1992) highlighted, as an important factor for improving the understanding of ET and the retention of knowledge, the discussion with the students about how they themselves perceive the Lamarckian ideas, in contrast to the simple comparison of general Lamarckian concepts with Darwinian theory. Banet and Ayuso (2003) used teaching methods that were explicitly grounded in the theory of conceptual change, describing them as "comprehension-based learning through action" and reporting that understanding and retention of knowledge improved. Geraedts and Boersma (2006) suggest their method, which they call "guided reinvention", as effective. Kampourakis and Zogza (2009) report positive outcomes from their teaching, which encouraged students to confront their conceptual conflicts. Perez and Gutierrez (2015) developed the "Weaving Evolutionary Thinking" method following a specific methodology in content clarification and reported encouraging results for understanding. Jördens et al. (2016) found that students' interpretations of evolutionary change appeared disjointed between levels of life organization and were significantly improved when they were given opportunities during instruction to interact with these levels and learn how they relate to each other. Doudna (2016) was able to significantly improve his students' understanding of ET using the technique of "hierarchical repetition", combined with active learning practices. Asterhan and Dotan (2018) found that students who received feedback on their erroneous responses to open-ended questions regarding natural selection outperformed control students. Pobiner et al. (2018) had positive results regarding the understanding of ET by teaching it through human examples. Nevertheless, Grunspan et al. (2021) draw attention to the contradictory results that teaching evolution through human examples has, depending on the students' diverse backgrounds. Bertka et al. (2019) concluded that respecting students' cultural and religious beliefs is an effective way to create a supportive classroom climate that will facilitate the development of learning. A more recent study (Maley and Seyedi 2022) tests using fiction to teach science. According to that research, when students are asked to apply the scientific ideas to their favorite stories, their enjoyment, engagement, and interest increase, and their assessment scores might improve.
The preceding overview leads to the conclusion that the magnitude of the difficulties that appear in the teaching and learning of ET requires multiple teaching approaches to be investigated and proposed. The present research seeks to propose an additional tool that has the potential to enhance the effectiveness of the selected teaching activities, regardless of the teaching method followed. It is based on the didactic utilization of two epistemological models for the organization and design of an effective Teaching Learning Sequence (TLS) for ET. A TLS is a short-term intervention research project that comprises teachers' guides with well-documented teaching strategies and anticipated student responses, as well as teaching-learning activities that have been empirically verified and tailored to student reasoning (Méheut and Psillos 2004; Psillos and Kariotoglou 2016). The development of a TLS presupposes the repetition of successive cycles of implementation and formative evaluation of teaching activities. The first model used for this purpose is the model of Educational Reconstruction, which was chosen as it is quite widespread as a tool for designing TLSs. Key components of the model are the clarification of content and the analysis of its educational value, research in teaching and learning, the design of learning environments, and assessment (Duit and Treagust 2003).
The second model, Cosmos–Evidence–Ideas (CEI) (Tselfes 2003), presented in Fig. 1, has been used in previous studies (Psillos et al. 2004; Kallery et al. 2009) for the epistemological analysis of teaching activities with the aim of highlighting the characteristics that contribute to their effectiveness. The model, based on Hacking's (1992) classification of laboratory entities, groups the "components" of teaching activities into three distinct entities: Cosmos (sample, data generators, devices that interact with the sample, raw data such as change graphs, photos, etc.), Evidence (data that has undergone some form of processing, e.g., estimation, reduction, analysis, interpretation), and Ideas (theoretical concepts, beliefs, questions, fundamental knowledge, theoretical models, systematic theories). These entities constantly interact among themselves and are transformed as a result of this interaction.
Specific types of activities are suggested (Kallery et al. 2009) through which these individual entities are related to each other (Table 1). That is, connections are made, which are claimed to improve understanding by directing students' attention to specific concepts, processes, and relationships that make up an integrated activity. Some of the activities are representational (R), since their dominant feature is the use of language to represent Ideas, Evidence, and the effect of Cosmos on them. Others are interventional (I), in the sense that they guide students to take action and modify a part of Cosmos, based on ideas or evidence. Psillos et al. (2004) found that a long-term, persistent empirical effort to develop rich teaching activities for a specific topic leads to the realization of the connections expected from the CEI model. This finding is taken as support for the validity of the model as a tool for predicting the efficacy of an activity. Moreover, the enrichment of activities suggests a perception of CEI as a general tool through which the deliberate combination of hands-on knowledge with theoretical modeling can lead to potentially rich activities that can be adapted to multiple contexts.
The aforementioned empirical studies, taken as a whole, arrive at design principles that shed light on particular facets of instruction that their authors consider crucial for improving understanding. Whichever of these approaches teachers choose to follow can be combined with the CEI model to further increase its effectiveness. The model treats each educational activity as a set of interactions, each of which contributes equally, alone and together with the others, to the efficacy of the activity. An instructional activity's effectiveness could be negatively impacted if some of these interactions are missing or inadequate. The benefit of the present proposal is that it offers the opportunity for prediction and focused intervention during planning in order to bridge such gaps and improve the efficiency of activities. As a result, the required iterative cycles of implementation and formative evaluation during the development process of a TLS could be reduced.
What follows is an account of how the empirical research was designed and implemented, the main purpose of which was to determine whether the CEI model could enhance students' understanding of evolution. It examines whether adopting the model during the design phase of a TLS, which is already based on effective teaching proposals, can assist students in weakening their alternative views and incorporating more scientific ideas into the arguments they are expected to provide for evolutionary phenomena. This study is part of a wider project to test the validity of the CEI model as a design tool for efficient TLSs in several biological subject areas. A related study is being conducted, but it differs from the one presented here in that it focuses on teaching the basic concepts of ecology. The core study question, whether the model is beneficial in the teaching of various biological subject areas, should be answered as a result of the findings of the two concurrent studies. The objective of the research is to propose the CEI model as a tool for the a priori description and modeling of the teaching activities of a TLS, in a way that leads to their targeted modification so as to improve their effectiveness.

Participants
Two different TLSs were implemented in two different groups of the 3rd grade of a junior high school (14-15 years old) in northern Greece, in which the first author taught in the academic year 2021-2022. The groups had a similar composition in terms of the students' performance in the natural sciences and their interest in Biology. They were the only 3rd grade classes in the school, so there was no alternative for the selection of participants. Both groups had 21 students; however, those who were absent from the initial or final assessment or from more than one teaching scenario were excluded. Thus, 15 students from one group and 18 from the other finally participated in the research.

The TLSs
Initially, a TLS of six teaching scenarios lasting one hour each was designed following the guidelines of the Educational Reconstruction model ( and sociability (Deci and Ryan 2004). Furthermore, models were used for the instruction, bearing in mind suggestions for their successful use (Goodrum 2004; Harrison and Coll 2008; Sickel and Friedrichsen 2012). The activities included in the scenarios were mainly inquiry-type, and their implementation required the cooperation of students in small or larger groups. The accompanying worksheets were structured to help the students collect and process the data and draw the intended conclusions.
The main concepts addressed by each scenario and the indicative activities through which they were approached are briefly presented in Table 2. This initial TLS (TLS 1) was analyzed according to the CEI model to identify which of the connections between the entities were included in its activities. Subsequently, the activities were modified and enriched in such a way as to include all types of promoted connections between the three entities of the model. This process resulted in the revised TLS 2. Table 2 shows the differences between the TLSs in terms of the types of connections between the entities of the model promoted by their activities. The two TLSs were implemented in parallel and independently in the two groups that participated in the research (Figs. 2, 3).
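The gap analysis described above, checking which connections a scenario's activities already promote and which are still missing, can be illustrated with a minimal sketch. It rests on loud assumptions: the actual connection types are defined in Table 1, which is not reproduced here, so the sketch simply treats every ordered pair of distinct entities as one connection type, and the activity names and their connections are invented for illustration only.

```python
from itertools import permutations

ENTITIES = ("Cosmos", "Evidence", "Ideas")

# Assumption: each ordered pair of distinct entities is one connection type.
# The real typology (Table 1 of the paper) may differ.
ALL_CONNECTIONS = frozenset(permutations(ENTITIES, 2))


def missing_connections(activity_connections):
    """Return the connection types that no activity in the scenario promotes."""
    covered = set()
    for connections in activity_connections.values():
        covered.update(connections)
    return sorted(ALL_CONNECTIONS - covered)


# Hypothetical scenario: names and connections are illustrative, not from the study.
scenario = {
    "observe organ pictures": {("Cosmos", "Evidence")},
    "interpret phylogenetic tree": {("Evidence", "Ideas")},
}

gaps = missing_connections(scenario)
for source, target in gaps:
    print(f"No activity connects {source} -> {target}")
```

Under these assumptions, the connections returned as missing are the candidates for new or enriched activities when moving from an initial TLS to a revised one.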

The evaluation of the TLSs
The evaluation of the TLSs and the comparison of their effectiveness were made using, as the main tool, a pre-post questionnaire designed for the purposes of this research. Its design was based on existing questionnaires in the same cognitive field, which have been tested for their reliability and validity (Andersson and

Table 2 Brief description of teaching scenarios
Italics describe activities added to TLS 2

Teaching scenario | Main activities | Connections (TLS 1) | Connections (TLS 2)
1 Similarities and differences | Students observe pictures of organs belonging to different organisms, learn how to interpret a phylogenetic tree, and are guided to conclude common descent using classification, cross-matching, and discussion | |

None of the previous questionnaires were appropriate for the current study, since the concepts included in the TLS differed from the concepts whose understanding was assessed using them. The questionnaire that was created included eight multiple-choice questions (1-8) and three open-ended questions (9-11). The multiple-choice questions had four possible answers, of which only one was correct and was scored one point, while the remaining answers were widely held alternative ideas of students, recorded in the literature, and scored zero (maximum score 8 points). The questions were paired by theme and probed the students' understanding of the role of chance in evolution, the role of natural selection, the unit of evolution, and the origin and importance of biodiversity (see Appendix).
The three open-ended questions were about problem-solving. In these, the students were asked to interpret evolutionary phenomena. Students' answers to the open-ended questions before and after instruction were expected to differ in two qualitative characteristics:
• In the alternative ideas on which they base the justifications of evolutionary phenomena
• In the scientific ideas they incorporate into their answers
The quantification of the answers to the open-ended questions based on the alternative ideas they include was rejected as an option because, as documented in the literature (Zabel and Gropengiesser 2011), the movement of the students towards the scientific positions regarding the understanding of ET cannot be perceived as a linear path. Rather, it can be likened to conceptual wandering, which practically means that weakening an alternative idea does not mean adopting the scientific perspective on the phenomenon under consideration. It is more likely that it will be replaced by another alternative idea, and later by another, and this will happen several times until successive conceptual conflicts shape a perception that will progressively incorporate more and more elements of the scientific interpretation of the subject.
Thus, from the above two qualitative characteristics that differentiate the students' answers, the second one was chosen as a criterion for their quantification. The number of scientific concepts that students include in their answers is indicative of their movement towards scientific positions. Even the mere mention of such a concept in the context of a question indicates that the student has somehow begun to associate it with the interpretation of the phenomenon. Sometimes this association is at its starting point; in that case, the concept is simply referred to or may be incorporated incorrectly. In other cases, the association is complete, so the concept is adequately incorporated into the justification of the evolutionary phenomenon and adds to its overall interpretation.
In conclusion, the rating scale created to quantify the answers to questions 9, 10, and 11 of the pre-post questionnaires was formed as follows:
• 1 point for each scientific concept mentioned
• +1 point when properly incorporated into the justification
• +1 point when illustrated by example
The analysis of the open-ended questions was carried out independently by two researchers using the same rating scale. The scores were compared, and when they differed, discussion followed in order for the researchers to come to an agreement on a new score.
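The rating scale above can be written out as a short scoring routine. This is only a sketch under stated assumptions: the `ConceptUse` record and the `score_answer` function are hypothetical names introduced here, the coding of each concept (mentioned, incorporated, illustrated) is assumed to have been done by a human rater beforehand, and the example point is assumed to be awarded independently of the incorporation point, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class ConceptUse:
    """One scientific concept found in a student's answer, as coded by a rater."""
    name: str
    incorporated: bool = False  # properly integrated into the justification
    illustrated: bool = False   # illustrated by an example


def score_answer(concepts):
    """Score one open-ended answer: 1 point per concept mentioned,
    +1 if properly incorporated, +1 if illustrated by example."""
    total = 0
    for concept in concepts:
        total += 1
        if concept.incorporated:
            total += 1
        if concept.illustrated:
            total += 1
    return total


# Hypothetical coded answer, for illustration only.
answer = [
    ConceptUse("natural selection", incorporated=True, illustrated=True),
    ConceptUse("mutations"),
]
print(score_answer(answer))
```

Because the score grows with the number of concepts a student uses, the scale has no fixed maximum, which matches the later remark that the maximum open-ended score depends on the students' answers.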
The questionnaire as a whole was piloted with a group of 17 students to identify conceptual, linguistic, and other types of difficulties. It was also given to a group of experts in the didactics of biology (a university professor, a PhD holder, and four PhD candidates) to evaluate its content validity. Following feedback from the student group and the expert panel, modifications were implemented to increase its validity. To test the reliability of the tool, it was given for completion to 86 students (same age, same social characteristics) who did not participate in the research. Based on their responses to the multiple-choice questions (MCQs), Cronbach's α was calculated for that type of question and found to be 0.706, which indicates a reliable instrument according to Cohen et al. (2002). Thus, it was used for the study with no further modifications, since the pilot group and the target group shared the same social and cultural characteristics.
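For readers unfamiliar with the reliability coefficient reported above, the following sketch shows the standard formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), applied to a matrix of 0/1 item scores. The study itself used SPSS; the function name `cronbach_alpha` and the toy data are assumptions made here for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per student),
    each row holding that student's score on every item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data (invented): four students, three dichotomous items that always agree.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(cronbach_alpha(consistent))
```

Items that always agree give the highest possible internal consistency, while unrelated items pull the coefficient towards zero.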
In order to evaluate the TLSs, the mean scores across all the MCQs were calculated (possible score range 0-1). In the same way, the average scores for all three open-ended questions were calculated (minimum score: zero; maximum: not fixed, as it depends on the students' answers). Students' performance was compared as follows:
• The students of the implementation group with those of the comparison group before (pre) the implementation of the TLS
• The students of the implementation group with those of the comparison group after (post) the implementation of the TLS
• Before (pre) and after (post) the TLS for the students of the comparison group
• Before (pre) and after (post) the TLS for the students of the implementation group
The above comparisons were made using independent samples and paired samples t-tests, respectively; where non-parametric tests were required, the Mann-Whitney U test and the Wilcoxon signed-rank or Sign test were used instead. The size of statistically significant differences was assessed with Cohen's d and r coefficients. Extreme values were detected by inspecting boxplots, while the normality of the sample distributions was tested with the Shapiro-Wilk test. The homogeneity of variances was judged by Levene's test, and the level of significance for conducting the tests was set at 0.05 (95% confidence level). The data were analyzed using the IBM SPSS Statistics V.29 program.
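The test-selection procedure described above can be summarized as a small decision routine. This is a sketch, not the authors' SPSS workflow: the function names are hypothetical, Levene's test and outlier inspection are left out, and the formula r = |z| / √N is assumed to be the r coefficient meant in the text, since the paper does not state which definition was used.

```python
from math import sqrt

ALPHA = 0.05  # significance level stated in the text


def choose_test(normality_p_values, paired, symmetric=True):
    """Pick a test from Shapiro-Wilk p-values.

    normality_p_values: one p-value per group (independent design) or a
    single p-value for the pre-post differences (paired design).
    symmetric: whether the differences look symmetric (paired design only).
    """
    normal = all(p > ALPHA for p in normality_p_values)
    if paired:
        if normal:
            return "paired samples t-test"
        return "Wilcoxon signed-rank test" if symmetric else "Sign test"
    return "independent samples t-test" if normal else "Mann-Whitney U test"


def effect_size_r(z, n_observations):
    """Effect size r for a rank-based test, assuming r = |z| / sqrt(N)."""
    return abs(z) / sqrt(n_observations)


# Shapiro-Wilk p-values of the kind reported in the results sections.
print(choose_test([0.240, 0.097], paired=False))
print(choose_test([0.022, 0.193], paired=False))
print(choose_test([0.598], paired=True))
```

The routine treats a design as parametric only when every relevant distribution passes the normality check, which is the rule the results sections apply when one of the two groups fails it.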

Comparison of the groups before the implementation of the TLS
An independent samples t-test was applied to investigate whether there was a difference in the average performance on the multiple-choice questions between the two groups before the TLS was implemented. A parametric test was chosen as the distributions of the two groups were normal (Shapiro-Wilk, TLS 1 group: p = 0.240 > 0.05 and TLS 2 group: p = 0.097 > 0.05) and the variances were homogeneous (Levene's test, p = 0.515). The performance of the TLS 1 group on the multiple-choice questions did not differ in a statistically significant way from that of the TLS 2 group (Table 3): mean difference = 0.028, 95% CI [−0.113, 0.169], t(31) = 0.401, p = 0.691. At the same time, it was tested whether there was a difference between the two groups in the performance averages for the open-ended questions too (see Appendix), before the implementation of the TLS. The distribution of values (Table 4) was not normal (Shapiro-Wilk, TLS 1 group: p = 0.035 < 0.05, TLS 2 group: p = 0.45 < 0.05), so the non-parametric Mann-Whitney U test was chosen. The score distributions of the two groups were not similar in shape, so mean ranks are reported. The performance of the TLS 1 group (mean rank = 16.430) on the open-ended questions did not differ in a statistically significant way from that of the TLS 2 group (mean rank = 17.470), U = 143.500, z = 0.320, p = 0.762.
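As a quick consistency check on the figures above, the U statistic can be recovered from a group's mean rank and size, since U = R − n(n + 1)/2, where R is the group's rank sum. The sketch below is an illustration added here, not part of the original analysis; the function name is hypothetical, and small deviations are expected because the mean ranks are rounded to two decimals.

```python
def u_from_mean_rank(mean_rank, n):
    """Mann-Whitney U for one group, from its mean rank and group size."""
    rank_sum = mean_rank * n
    return rank_sum - n * (n + 1) / 2


# Pre-test, open-ended questions: group sizes and mean ranks as reported.
u_tls1 = u_from_mean_rank(16.430, 15)
u_tls2 = u_from_mean_rank(17.470, 18)
print(u_tls1, u_tls2, u_tls1 + u_tls2)
```

With these inputs the TLS 2 value lands close to the reported U of 143.5, and the two U values add up to roughly 15 × 18 = 270, as they must for two groups of these sizes.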
According to the above results, the level of prior knowledge that the students had on the topics examined with the questionnaire did not differ significantly between the two groups.

Comparison of the groups after the implementation of the TLS
It was examined whether there was a difference in the average performance on the multiple-choice questions between the two groups after the implementation of the TLS. An independent samples t-test was chosen as the distributions of the two groups were normal (Shapiro-Wilk, TLS 1 group: p = 0.261 > 0.05 and TLS 2 group: p = 0.394 > 0.05) and the variances were homogeneous (Levene's test, p = 0.947). There was no statistically significant difference between the TLS 1 group's performance on the multiple-choice questions and the TLS 2 group's performance (Table 3 The existence of a difference between the two groups in the average performance on the open-ended questions, after the application of the teaching scenario, was also tested. The distribution was not normal for one of the groups (Shapiro-Wilk, TLS 2 group: p = 0.022 < 0.05, TLS 1 group: p = 0.193 > 0.05), so the non-parametric Mann-Whitney U test was chosen. The difference between the TLS 1 group and the TLS 2 group after the TLS was implemented was not statistically significant (mean rank = 13.670 vs. 19.780; U = 185.000, z = 1.827, p = 0.073).
Completing the comparison, it appears that the students' post-instruction knowledge of the topics examined with the questionnaire did not differ in a statistically significant way between the two groups.

Before and after, TLS 1 group
A paired samples t-test was then used to determine whether there was a statistically significant difference between the mean score before and after the implementation of the TLS for the comparison group, in terms of multiple-choice questions. The distribution of the differences was normal (Shapiro-Wilk, p = 0.598). The students in the TLS 1 group performed better on the multiple-choice questions after the implementation of the TLS than before (Table 3). However, the 0.025 increase in the mean is not statistically significant (95% CI [−0.098, 0.148], t(14) = 0.435, p = 0.670).
On the open-ended questions, however, the improvement was considerably larger than on the multiple-choice questions. In order to confirm this difference, the average scores before and after the implementation of the TLS were compared for the comparison group, in terms of open-ended questions. The difference in means follows a normal distribution (Shapiro-Wilk, p = 0.074). The average score of the students in the TLS 1 group after the TLS increased compared to that before the implementation of the TLS (Table 3). The increase, 0.889, is statistically significant and very large (95% CI [0.535, 1.242], t(14) = 5.394, p < 0.001, d = 1.393).
Finally, it seems that the students in the TLS 1 group improved their performance after teaching, but this improvement is statistically significant only for the open-ended questions of the questionnaire.

Before and after, TLS 2 group
Also of interest is the analysis of the average scores of the students in the TLS 2 group before and after the TLS, initially for the multiple-choice questions. It was checked whether there was a statistically significant difference between the mean score before and after the implementation of the TLS for the implementation group, in terms of multiple-choice questions. The distribution of the differences was judged not to be normal (Shapiro-Wilk, p = 0.068) but only symmetric (histogram overview), so the non-parametric Wilcoxon signed-rank test was chosen for the comparison. The median of the mean scores on the multiple-choice questions improved after the implementation of the TLS compared to before (Table 3), although this difference (0.115) was not statistically significant (z = 1.710, p = 0.087). On the open-ended questions, all 18 students of the same group improved their average scores. The statistical test chosen in this case was again the paired samples t-test, as the distribution of the differences is normal (Shapiro-Wilk, p = 0.078). The average score of the students in the TLS 2 group after the TLS increased compared to that before the implementation of the TLS (Table 3). An increase of 1.426 was observed, which was statistically significant and very large (95% CI [1.076, 1.776], t(17) = 8.606, p < 0.001, d = 2.028).
In conclusion, it appears that the level of subsequent knowledge of the students in the TLS 2 group improved for the topics examined by the open-ended questions (9-11) of the questionnaire. The same does not apply to the topics covered by the multiple-choice questions (1-8), for which the improvement was not found to be statistically significant.
To sum up, the two groups, having started from the same cognitive level, increased their performance on the open-ended questions in a statistically significant way. That increase seems to be greater in the TLS 2 group compared to the TLS 1 group (d = 2.028 and 1.393, respectively).
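The two effect sizes compared above are consistent with the common paired-samples definition of Cohen's d, the mean difference divided by the standard deviation of the differences, which equals t / √n. The sketch below recomputes them from the reported t values and group sizes; it is an illustration added here, the function name is hypothetical, and the paper does not state which variant of d SPSS was asked to report.

```python
from math import sqrt


def cohens_d_paired(t, n):
    """Cohen's d for a paired design, computed as t / sqrt(n)."""
    return t / sqrt(n)


# Reported open-ended results: t(14) = 5.394 with n = 15, t(17) = 8.606 with n = 18.
d_tls1 = cohens_d_paired(5.394, 15)
d_tls2 = cohens_d_paired(8.606, 18)
print(round(d_tls1, 3), round(d_tls2, 3))  # 1.393 2.028
```

Both values agree with the reported d to three decimals, so under this definition the larger effect for the TLS 2 group follows directly from its larger t value relative to group size.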

Qualitative differences
Focusing on the qualitative characteristics of the students' responses, we identified notable differences between the two groups. After instruction, the students in the TLS 2 group included common descent, mutations, and natural selection with a markedly higher frequency in their answers (marked with bold in Table 5). This was determined by counting the mentions of particular scientific concepts in the students' answers to the open-ended questions (Table 5). In the TLS 1 group, however, no correspondingly large increase was observed for any of the concepts identified in the set of before and after responses.

Conclusions-discussion
Summarizing the results, we find that the two groups that participated in the research started from the same cognitive level regarding the topics examined by the questionnaire, and both increased their performance, but only in the open-ended questions did that increase prove to be statistically significant. However, they ended up with no statistically significant difference in final performance, even though, judging by the effect size, the improvement of the TLS 2 group seems to be greater than that of the TLS 1 group. The qualitative differences in the final answers the students provided to the open-ended questions also reflect the difference in improvement. Students in the TLS 2 group used scientific ideas like common ancestry, mutations, and natural selection more frequently in their responses than those in the TLS 1 group. Therefore, it appears that TLS 2 students benefited more from the instruction, though not to the extent that this difference is statistically significant. The results of earlier studies on the teaching and learning of ET (Zabel and Gropengiesser 2011) and the use of the CEI model as a TLS assessment tool (Psillos et al. 2004) are consistent with the interpretation of these results, which is discussed below.
First, the finding that there was no statistically significant difference in students' performance on the multiple-choice questions of the questionnaire was partly expected. As mentioned in the assessment mode analysis, it is clear that students follow different learning paths to understand evolution through natural selection. The learning process is best likened to wandering through a conceptual landscape rather than climbing a one-dimensional ladder. Each student follows their own trajectory to reach the same point as others (Zabel and Gropengiesser 2011). Students successively overcome conceptual barriers and gradually approach a perception that is closer to the scientific point of view. This process does not begin and end with the start and end of a teaching intervention. However, the evaluation of the teaching intervention is limited to a snapshot of this learning process, which continues long after the intervention has ended. Therefore, the way the questionnaire's responses were analyzed could only detect how many students had reached the final, desired point of this learning path by the end of the intervention. Based on the above, it is not surprising that these students were few. This fact therefore cannot be considered an indication of the failure of the teaching interventions. With this in mind, the questionnaire also contained open-ended questions, allowing researchers to learn more about the students' learning trajectory and the scientific concepts they used to support their arguments. In the open-ended questions, there was a statistically significant, and indeed very large, improvement in both groups. So, the teaching interventions helped the students move closer to the scientific positions. The use of the CEI model in the design phase resulted in the improvement of an already effective TLS, which was based on effective evolution teaching proposals drawn from the literature.
The second issue that arises and needs interpretation in order to answer the research question is that the differences in the final performances of the students in the two groups were not so great as to be statistically significant, although two different teaching approaches were used. Therefore, an analysis is required of the kinds of differences between the two teaching approaches. The activities that made up the original TLS were selected without taking the CEI model into account, so the selection was random with respect to the number of CEI connections they included. Their empirically assessed teaching value was the only criterion for inclusion in the TLS. Their subsequent analysis revealed that they included several of the connections promoted by the model. In fact, in each scenario, in addition to the representational connections that are common in most activities performed in science classrooms, there are also interventional connections. So, modifying them to fully meet the model specifications involved adding some representational-type connections. This observation is of particular importance and is linked to the results of Psillos et al. (2004). They analyzed the successive versions of a TLS in the thematic area of fluids in physics that evolved in parallel with teaching trends over time.
In their first analysis of curriculum activities, they found a lack of intervention practices. This changed in all successive TLSs, which emphasized different standards of laboratory work by students. That is, as the TLS evolved, interventional connections were added to its activities. Representational connections are more likely to already exist in selected activities because of the ease with which students interact with real-world objects through their ideas. It seems that the crucial component for the effectiveness of an activity is the co-existence of interventional and representational connections between the model's entities. The distinction between representational and interventional connections is instructionally fruitful both for the analysis and for the planning of activities, since it makes it possible to identify activities within a TLS that are dominated by the interaction of human and material factors and to distinguish them from those in which interaction among people predominates (Psillos et al. 2004). Taking the above into account in order to explain the results of the present research, we find that, in the initial TLS of evolution (TLS 1), there were representational and interventional connections from the start, and this, as it turned out in retrospect, was the essential ingredient for its effectiveness, even though certain subconnections of one or the other type were absent. Thus, the differences between the two TLSs, although apparent, were not so numerous or so important, according to the model, as to affect their relative effectiveness greatly. However, even these small changes made to the original TLS using the CEI model had an impact on how effective it was. This result is an indication of the model's potency as a tool for creating efficient TLSs. It would be interesting to look into the results of similar research comparing two TLSs, one of which has fewer interventional-type connections. That research is already in progress and is the one mentioned in the methodology, which includes TLSs for basic concepts of ecology. In light of the results of that study, it may be possible to test the conclusion of the present study that the essential component that increases the effectiveness of a TLS is the simultaneous presence of representational and interventional connections in its activities. It could also be confirmed that the CEI model has the potential to increase the effectiveness of a TLS that has been created following effective teaching proposals drawn from previous empirical research.
In closing, it would be remiss not to mention the main limitations of the research. The most important was the small number of participants. To address this issue, a teacher outside the research group who teaches in a school with more children would have needed to implement the TLSs from the start. This option, for various reasons, did not exist. Another important limitation was the available time. In Greek schools, biology is taught for only one hour a week throughout junior high school, and the junior high school syllabus allots five hours to ET. This fact, combined with the special condition that, during the research, the schools were operating with alternating periods of operation and distance learning due to the COVID-19 pandemic, made things quite difficult. As a result, constant rescheduling of teaching and restructuring of the program timetable were required.

Appendix
Sample multiple-choice questions from the questionnaire used in the evaluation. The correct answers are italicised.

In a population of dogs, some individuals are tall, others are short, some are hairy, and others are hairless. This is because…
a. The dogs had to change in order to survive, so new favorable traits were developed.
b. The environment in which each dog lived caused changes in its hereditary characteristics so that it could survive.
c. Random changes in DNA and different combinations of parents' genes, which occurred in many successive generations for many years, caused the creation of new varieties.
d. There had to be dogs of different sizes and characteristics, so with the appropriate mutations, new varieties were created.

Parents pass on to their offspring…
a. All the habits they acquired in the course of their lives
b. Only the traits which were favorable during their lifetime
c. A random combination of their inherited traits
d. The characteristics that will be useful to them, depending on the environment in which they live

Examples of the analysis of the open-ended questions in the questionnaire
Example 1
Question 9 asked: "It is estimated that there are 1.5 million species of organisms that are spread over almost all parts of the planet. These organisms have important similarities but also many differences between them, in how they gain energy, how they move, how they reproduce, in their structure, etc. What do you think all these similarities and differences between organisms are due to?"
One student responded after instruction: "I think it's due to random DNA changes and different combinations of parents and genes made over many successive generations. Thus, after many years, many species emerged that have some common characteristics but also have differences." Here, the student makes reference to (a) mutations, which he characterizes as random changes in DNA (2 points: one for the mention and one for the proper incorporation), (b) genetic recombination, which he defines as various combinations of parents and genes (+ 2 points), and (c) the length of time over which evolution occurs, which he defines as many successive generations (+ 2 points). Therefore, the answer is marked with a total of 6 points.

Example 2
Question 10 asked: "The cheetah is a carnivorous mammal that can run faster than 97 km per hour when chasing its prey, while its ancestors are estimated to have only reached 32 km per hour. How would a biologist explain this evolution of the cheetah's ability to run so fast?"
Student answer: "A biologist would logically explain it in mutations. And with mutations, they managed to adapt better to the environment." Only one scientific concept is mentioned here: mutations, without properly incorporating it into the justification. In particular, intent is attributed to mutations, which is a misunderstanding. The score for the answer is 1 point.
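
The scoring rule applied in the two examples above (one point for mentioning a scientific concept and one more for incorporating it properly) can be written down as a small function. This is a sketch of the rule as it can be read from the examples, not the authors' instrument; the names `ConceptUse` and `score_response` are hypothetical, and the judgement of whether a concept is properly incorporated is assumed to be made by a human rater.

```python
from dataclasses import dataclass

@dataclass
class ConceptUse:
    """One scientific concept found in a student's answer (hypothetical structure)."""
    name: str
    properly_incorporated: bool  # rater's judgement, supplied by hand

def score_response(concept_uses):
    """One point per concept mentioned, plus one if it is properly incorporated."""
    return sum(1 + int(use.properly_incorporated) for use in concept_uses)

# Example 1: three concepts, all properly incorporated.
example_1 = [
    ConceptUse("mutations", True),
    ConceptUse("genetic recombination", True),
    ConceptUse("evolutionary time", True),
]

# Example 2: mutations mentioned, but with intent attributed to them.
example_2 = [ConceptUse("mutations", False)]

print(score_response(example_1), score_response(example_2))
```

Applied to the two examples, the function reproduces the totals given in the text: 6 points for Example 1 and 1 point for Example 2.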

Fig. 1 The Idea-Cosmos-Evidence model (Tselfes 2003)

Mitosis-Meiosis
They collect data from chromosome tabs, create and use model organisms, make predictions, and describe and discuss how the activity is related to theoretical terms,
Chinese whisperer's game, link the activity with ideas through cross-matching, justify the observed differences using theoretical terms
be birds competing for seed collection, record and process their data, draw conclusions and describe the phenomenon under study
simulate genetic drift using pom-poms, record and analyze data and make predictions
C → E, I → E, C → I, All
6 Evolution's mechanisms in action
Using a digital simulation, they test given scenarios or make new ones in order to investigate how evolution mechanisms affect population sizes
C → E, I → C, All
Wallin 2006; Price et al. 2014).

Table 2
teaching as suggested in previous studies. Examples include: highlighting the unifying role of evolution theory (To et al. 2017), creating active learning environments (Neubrand and Harms 2017), ensuring a climate of free expression of students' thoughts (Smith 1994; Basel et al. 2013; Bertka et al. 2019), emphasizing the different use of concepts in scientific and everyday speech (Da Silva et al. 2015) and satisfying basic psychological needs for autonomy, competence

Table 3
Descriptive statistics, multiple-choice and open-ended questions (TLS 1 and 2 groups)

Table 4
Range of values for open-ended questions

Table 5
Frequency of concepts mentioned in students' responses to the open-ended questions