This thesis presents a mixed-methods investigation into how native English-speaking teachers (NESTs) and non-native English-speaking teachers (NNESTs) assess a university-level English as a second language (ESL) student's oral presentation. To collect data, all faculty members and instructors currently teaching in the Departments of English and Communication Studies at Minnesota State University, Mankato (MNSU) were invited by email to participate in an online survey administered through Qualtrics Survey Software (Qualtrics). In total, 31 people consented to participate: 19 NESTs and 12 NNESTs. The survey had three parts. First, the participants watched a video-recorded oral presentation by an ESL student, scored her performance using an analytic rating scale, and provided feedback on her strengths and areas of weakness. Second, each participant rated a set of question items on which oral presentation assessment criteria they considered most or least important as raters. Finally, the participants answered five biographical questions. The data were analyzed using Qualtrics and SPSS 23. To begin the analysis, the participants were divided into two groups, group 1 (NEST) and group 2 (NNEST), on the basis of the biographical data they provided on Qualtrics. SPSS was then used to calculate descriptive statistics, namely the mean (M) and standard deviation (SD), and to conduct a t-test comparing the two groups to identify any significant difference between them.
Next, to determine inter-rater reliability, the Intraclass Correlation Coefficient procedure in SPSS was used to estimate Cronbach's alpha for (1) all the participants, (2) the NEST participants, and (3) the NNEST participants. Because this is a mixed-methods study, the qualitative data, in the form of the feedback provided by the participants, were categorized according to which assessment criteria the participants commented on the most and the least, how many participants from each group commented on them, and how many times they were acknowledged as strengths or pointed out as areas for improvement. The results of the qualitative analysis were then compared with the results of the quantitative analysis. The quantitative results revealed that although NESTs and NNESTs differed in their assessment of the oral presentation, the differences were not statistically significant. For example, the mean scores on the assessment criterion 'speaks naturally' suggested a noticeable difference between the NEST and NNEST groups, but an independent samples t-test showed no significant difference. The reliability statistics showed that the Cronbach's alpha of the scores given by all participants (0.72) and by the NEST participants (0.71) represented good inter-rater reliability, whereas the Cronbach's alpha of the scores given by the NNEST participants (0.19) showed low inter-rater reliability. The analysis of which assessment criteria the participants perceived as most or least important revealed a significant difference between the two groups in their rating of the importance of oral presentation as an effective speaking assessment tool: the NNESTs agreed that it was important, but the NESTs less so.
Finally, the analysis of the qualitative data revealed areas of both consistency and inconsistency between the feedback and the quantitative data. For example, a high number of NESTs commented on the assessment criterion regarding natural speech, and most of those comments were positive. On the other hand, most NESTs provided negative comments on 'speech volume'. This result is consistent with the quantitative result, as the highest mean score in the NEST group was received by natural speech and the lowest by speech volume. Likewise, in the NNEST group, a high number of participants commented on 'pronunciation' and 'body language', and most of those comments were negative. Few NNEST participants commented on 'eye contact', and that feedback was negative. This result is partly consistent with the quantitative data because the lowest mean score in the NNEST group was given to 'pronunciation' on the rating scale. One limitation of this study is the low number of participants; a larger group of NESTs and NNESTs should be invited in future research. The study also puts forth pedagogical implications. For instance, it discusses the need to train raters prior to the assessment process, as well as the need for NNEST raters to know how to provide constructive feedback on grammar.
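
For readers unfamiliar with the two statistics named in the abstract, the following is a minimal Python sketch of how Cronbach's alpha and an independent samples t statistic can be computed. It is an illustration only, not the SPSS procedure used in the study. The function names `cronbach_alpha` and `pooled_t`, the layout of the ratings (one row per assessment criterion, one column per rater), the equal-variance form of the t-test, and all scores shown are assumptions made for the example and are not taken from the thesis data.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    Assumed layout: one row per rated criterion, one column per rater.
    """
    k = len(ratings[0])                      # number of raters (columns)
    columns = list(zip(*ratings))            # scores grouped by rater
    sum_rater_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum_rater_var / total_var)


def pooled_t(group_a, group_b):
    """Independent samples t statistic, assuming equal variances.

    Returns the t statistic and its degrees of freedom; no p-value is computed.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical placeholder scores, invented purely for illustration.
example_ratings = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2]]
print(cronbach_alpha(example_ratings))
print(pooled_t([4, 3, 5, 2], [3, 3, 4, 2]))
```

An alpha near 1 indicates that the raters rank the criteria in a similar way, while a value near 0 indicates little agreement; the t statistic would still need to be compared against a t distribution with the returned degrees of freedom to judge significance.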


Glen Poupore

Committee Member

Sarah A. Henderson Lee

Date of Degree




Document Type



Master of Arts (MA)




Arts and Humanities

Creative Commons License

Creative Commons Attribution-NonCommercial 4.0 International License



Rights Statement

In Copyright