Focus and Effects of Peer and Machine Feedback on Chinese University EFL Learners’ Revisions of English Argumentative Essays

The present mixed-method study examined the focus and effects of peer and machine feedback on the revisions of English argumentative essays. Data were collected from 127 Chinese university EFL learners and included Draft 1, peer feedback (PF), PF-based Draft 2, machine feedback (MF), MF-based Draft 2, questionnaires, and interview recordings. The main findings were: (a) peer feedback was primarily concerned with content errors while machine feedback mainly involved language errors, (b) significant differences occurred in most types of errors between Draft 1, PF and PF-based Draft 2, and between Draft 1, MF, and MF-based Draft 2, (c) the uptake of ‘introducing a new topic in Conclusion’ was a powerful predictor of PF-based Draft 2 scores, and (d) the participants generally considered peer and machine feedback to be moderately useful. Based on the findings, some implications are discussed for how to better implement peer and machine feedback and enhance its quality.


Introduction
As an essential component of students' academic development in a second/foreign language (SL/FL), writing requires a considerable amount of time and effort since it involves higher order thinking, which makes it very challenging for many SL/FL writers (Cope et al., 2011; Dikli & Bleyle, 2014). Consequently, feedback plays a critical role in enhancing the quality of students' compositions. Nevertheless, assessing writing and providing feedback are also time-consuming and challenging. This is why, although teacher feedback is more effective (Goldstein, 2004; Hattie & Timperley, 2007; Keh, 1990; Sterna & Solomo, 2006; Vardi, 2009), machine and peer feedback have been developed and implemented in both classroom and other learning situations (Allen & Katayama, 2016; Shintani, 2015). Even though both peer review and machine feedback have proved to have positive effects on SL/FL learners' rewrites (Caulk, 1994; Hyland & Hyland, 2006; Rollinson, 1998, 2005; Topping, 1998; Yu & Lee, 2015), disagreement persists about their actual effects (Anson, 2006; Xie, Ke & Sharma, 2008). Moreover, few studies have examined peer and machine feedback simultaneously. In addition, considering that accuracy is both an important and frustrating issue in writing (Li, Link & Hegelheimer, 2015), it is worthwhile to analyze more specifically the impact of peer and machine feedback on the quality of SL/FL learners' rewrites. For these reasons, the present mixed-method study, targeting Chinese university EFL (English as a FL) learners, explored the focus and effects of peer and machine feedback on learners' rewrites of English argumentative essays.

Literature Review
Defined as the "information with which a learner can confirm, add to, overwrite, tune, or restructure information in memory, whether that information is domain knowledge, meta-cognitive knowledge, beliefs about self and tasks, or cognitive tactics and strategies" (Winne & Butler, 1994, p. 5740), feedback has long been held to facilitate the learning of SLs/FLs (Ellis, 2011; Ferris, 2010; Hattie & Timperley, 2007).

review indicated that PA was of adequate reliability and validity in a wide variety of applications and had positive formative effects on student achievement and attitudes. Ion et al.'s (2016) analyses of 637 feedback units showed that peer feedback helped students better develop the task in their writing.
In addition, trained PA can be more effective (Ellis, 2011; Kulkarni et al., 2015; Min, 2006). For example, Min (2006) examined the impact of trained responders' feedback on EFL college students' revisions in terms of revision types and quality. After a four-hour in-class demonstration of how to do peer review and a one-hour after-class reviewer-teacher conference with 18 students, the instructor-researcher collected students' first drafts and revisions, as well as reviewers' written feedback, and compared them with those produced prior to training. The results indicated that students incorporated a significantly higher number of reviewers' comments into revisions after the peer review training, and that the number of revisions with enhanced quality was significantly higher than that before the peer review training. The researcher thus concluded that trained peer review feedback could positively impact EFL students' revision types and quality of texts, supported by a subsequent study (Liu & Chai, 2009).
Moreover, peer feedback proves to be beneficial to students in other aspects (Ellis, 2011; Kurt & Atay, 2007; Lundstrom & Baker, 2009; Miao et al., 2006). Miao et al. (2006) examined peer and teacher feedback on essays of the same topic written by Chinese university EFL learners. Analyses of student texts, questionnaires, video recordings and interview transcripts revealed that peer feedback improved student autonomy, though it was adopted less often in students' rewrites. Kurt and Atay's (2007) eight-week experimental study of 86 Turkish prospective teachers (PTs) of English showed that the peer feedback group experienced significantly less writing anxiety than the teacher feedback group at the end of the study. The study also revealed that the peer feedback process helped the PTs become aware of their mistakes and helped them look at their essays from a different perspective. Lundstrom and Baker (2009) did a study with 91 university students in nine writing classes at two proficiency levels to see which was more beneficial to improving student writing: giving or receiving peer feedback. The results indicated that the givers, who focused solely on reviewing peers' writing, made more significant gains in their own writing over the course of the semester than did the receivers, who focused solely on how to use peer feedback.

Machine Feedback
As technology develops, machine feedback has become possible via computers and the internet. The technology often used for feedback on writing is Automated Writing Evaluation (AWE) software, which generates automated scores based on techniques such as artificial intelligence, natural language processing and latent semantic analysis (Philips, 2007; Shermis & Burstein, 2003; Ullmann, 2019), and provides written feedback in the form of general comments, specific comments and/or corrections (Stevenson & Phakiti, 2014). In recent years, the use of AWE systems to provide feedback in the writing classroom has steadily increased; examples include Project Essay Grader™ (PEG), e-rater, Intelligent Essay Assessor™ (IEA), and IntelliMetric™ (Stevenson & Phakiti, 2014). In China, the most widely used is www.pigai.org. While many scholars applaud AWE as a means of freeing instructors from marking assignments and enabling them to devote more time to writing instruction (Hyland & Hyland, 2006; Philips, 2007; Ullmann, 2019), others doubt whether AWE is capable of providing accurate and effective feedback (Anson, 2006).
For example, Li et al. (2015) used mixed methods to investigate how Criterion affected writing instruction and performance. Four ESL writing instructors and 70 non-native English-speaking students participated in the study. The results showed that Criterion led to increased revisions and that the corrective feedback from Criterion improved accuracy from a rough to a final draft. AbuSeileek and Abualsha'r (2014) investigated the effect of computer-mediated corrective feedback on 64 EFL learners' performance in writing over the course of eight weeks. The participants were randomly assigned to either a no-feedback control condition or a corrective feedback condition. The researchers found that students who received computer-mediated corrective feedback while writing achieved better results in their overall test scores than students in the control condition who did not receive feedback. Cheng (2017) employed a mixed-method design to investigate the impact of online automated feedback (OAF) on the quality of 138 university students' reflective journals in a 13-week EFL course. The findings showed that the experimental group outperformed the control group in the overall score of the final reflective journal and demonstrated a significant improvement in scores across reflective journals. The results of these studies show that AWE has a positive impact on the quality of students' writing, supporting those of earlier studies (Chen & Cheng, 2008; Warschauer & Ware, 2006). Ullmann's (2019) study of 76 essays showed that the automated analysis was immediate, scalable, and on average only 10% less accurate than the manual analysis.
Even so, Stevenson and Phakiti's (2014) review found little evidence for positive effects of AWE on the quality of students' rewrites based on AWE. Stevenson and Phakiti (2014) attributed this to the paucity of research, the heterogeneity of existing research, the mixed nature of research findings, and methodological issues. Other explanations are that computers do not possess human inferencing skills and background knowledge (Anson, 2006) and that AWE-generated comments primarily focus on grammar in writing (Hyland & Hyland, 2006). This may be why AWE-generated feedback is less acceptable to students than teacher feedback (Dikli & Bleyle, 2014). Dikli and Bleyle (2014) investigated the use of an AES system with 14 advanced students from various linguistic backgrounds in a college ESL writing classroom. The findings showed that the instructor provided more and better quality feedback than the AES system.

Rationale for the Study
As reviewed, there have been many studies on the results of peer and machine feedback in relation to grading and students' compositions (Bijami et al., 2013; Cho & Schunn, 2005; Gielen et al., 2010; Kulkarni et al., 2015; Lin & Yang, 2011; Rollinson, 1998, 2005; Topping, 1998; Xie et al., 2008). However, little has been said as to the focus of peer and machine feedback in educational designs (AbuSeileek & Abualsha'r, 2014). Few studies have simultaneously examined peer and machine feedback either. More insight into the nature of peer and machine feedback would indicate more clearly how technology and students could be more helpful in SL/FL writing and what kind of assistance teachers should preferably provide. For example, if technology and peers can provide useful feedback on grammar, teachers can direct their assistance more to textual coherence or content (AbuSeileek & Abualsha'r, 2014). Moreover, since writing accuracy is both an important and frustrating issue (Li et al., 2015), it is worthwhile to examine more specifically the focus and effects of peer and machine feedback on the quality of SL/FL learners' writing. For these reasons, as well as the intent to make better use of peer and machine feedback, the present study adopted mixed methods to explore the focus and effects of peer and machine feedback on Chinese university EFL learners' rewrites of English argumentative essays. To achieve this purpose, the following research questions were formulated:
(1) What is the respective focus of peer and machine feedback on students' English argumentative essays?
(2) How do peer and machine feedback impact students' rewrites of English argumentative essays?

Context
The present research was conducted in a highly accredited university in Beijing, where English reading and writing courses were compulsory for undergraduate non-English majors. Upon entering the university, all non-English majors took a standardized English placement test, the results of which put the students into three band levels (a higher band level meant higher English proficiency). Based on their band levels, the students registered in compulsory and optional English courses accordingly. The majority fell into band level 2 and were required to take the English Argumentative Reading and Writing course, which contextualized the present study. The respondents of this study were randomly selected from those registered in the course taught by the same instructor. The students met the instructor once a week for a 90-minute period and were required to write three long argumentative essays (more than 400 words) as well as a few short ones (about 100 words) during the 16-week semester. The instructor, who held a PhD in Applied Linguistics, had been publishing widely in international journals and teaching the course for five years. In class, the students and the instructor discussed the techniques related to English argumentative essay reading and writing such as text structure, statement of arguments, paragraph structure, argument-developing skills, use of evidence, cohesion and coherence, and use of references. Adopting the process approach to writing, the instructor stressed the importance of revision and encouraged students to revise their drafts of the same composition at least twice, drawing on different sources: teacher feedback, peer comments and machine feedback.
Prior to writing, a 30-minute peer review training session based on Kramer, Leggett and Mead's scheme (1995) was arranged in class. It covered both content and language errors, with more focus on content errors because students had learned English grammar systematically but had not been trained to write English argumentative essays effectively in previous schooling. Then students practiced peer review for each subsequent assigned writing task. Once a writing assignment was finished, each student sent his/her writing to the instructor, a peer, and www.pigai.org, independently. The instructor provided feedback electronically on each draft at sentence, paragraph and text levels, then gave a 25-minute summary report of the feedback and had individual discussions about the feedback when required by the students in the subsequent class; students assessed their peers' writing either electronically or on paper and had to finish within two days of receiving the writing; www.pigai.org generated feedback in both Chinese and English (namely, machine feedback in the present research) immediately upon receiving the submission. To avoid cross impact, students were required to revise their writing separately upon receiving different types of feedback.

Participants
A total of 127 (102 male and 25 female) students participated in the present study and answered the questionnaires related to their background information and perceptions of peer and machine feedback, of whom 64 were interviewed for their verbal perceptions of peer and machine feedback. Meanwhile, the first and second drafts of the same composition, as well as the peer and machine feedback, were complete for 111 students and were used for analyses. With an age range of 16-27 and an average of 19.42, the participants were from various disciplines such as civil engineering, mathematics, chemistry, and architecture. Prior to the course, they had never taken an English Argumentative Writing course.

Instruments
The collected data in the present study included interview transcripts, peer feedback (PF), machine feedback (MF), student Draft 1, PF-based Draft 2, MF-based Draft 2, and writing scores, as detailed below.
Student texts. Draft 1, peer feedback, PF-based Draft 2, machine feedback, and MF-based Draft 2 of the course's second composition on global warming were collected. Based on student consent and the completeness of both drafts, 111 compositions of each draft as well as peer and machine feedback were finally collected for analyses.
Writing scores. Each draft was rated by the instructor on a scale of 1-15 in terms of text structure, power of argumentation, coherence, grammar and use of words (Appendix I), and the scores were collected.
Perceptions of peer and machine feedback questionnaire. This 14-item Perceptions of Peer and Machine Feedback Questionnaire (PPMFQ) was self-developed to investigate students' attitudes towards peer and machine feedback in terms of their roles and usefulness in composition revisions. The questionnaire involved such issues as grammar, use of words, expression of viewpoints, and use of evidence and references, which are crucial elements of argumentative essays (Wyrick, 2008). All the items were placed on a 7-point Likert scale, ranging from 'Strongly Disagree' to 'Strongly Agree', with values of 1-7 assigned to the alternatives respectively.
Informal semi-structured interview. The informal semi-structured interview guide covered questions concerning teacher feedback, peer and machine feedback, and their advantages, disadvantages and effects on composition revisions.
The background questionnaire. The background questionnaire aimed to collect informants' personal information such as age, gender, and major.

Procedure
Data were collected, with the instructor's consent, during weeks 7-9 of the semester, when the second argumentative essay, on global warming, was assigned. To help students better understand the nature of argumentative essays, prompts on the task were provided, such as the effects of global warming on agriculture and the major cause of global warming. Draft 1 was finished and submitted to the instructor, peers and www.pigai.org online (an account was created for the class beforehand) in week 7, followed by peer feedback within two days and immediate machine feedback, respectively. Students then revised Draft 1 separately according to the peer feedback and the machine feedback they had received, and submitted the rewrites to the instructor. After being piloted with two students who had taken the same course in the previous semester, the questionnaire was slightly modified and then distributed together with a consent form to the students, who completed both in about 10 minutes during the class meeting in week 9. According to their consent forms, a total of 64 students were informally interviewed by two research assistants thereafter in week 9. Two students were interviewed together each time; the interviews were conducted mainly in Chinese, recorded, and lasted 15-20 minutes.

Data Analyses
Since a writer needs to utilize an established language system to organize and present ideas in a certain mode in writing, the present study analyzed student texts and feedback in terms of both grammar and content. For this purpose, this study categorized errors with reference to the revision scheme in Kramer et al. (1995). The scheme (see Appendix II) used in the present study covered four types of errors: content errors (nine aspects involving failure to show a controlling idea, improper topic sentence and failure to achieve paragraph coherence, etc.), mechanical errors (misspelling, punctuation, and capitalization errors), syntactical errors (errors involving tense, part of speech, article, verb, adjective/adverb degree, agreement, and case, etc.), and lexical errors (errors in word formation, word choice, collocation, and unclear expression). Draft 1, PF-based Draft 2 and MF-based Draft 2 were analyzed carefully according to the scheme to identify the errors students made in their writing. All the analyses were done by two research assistants, with an overall inter-rater coefficient of .91. Then the number of each type of error was counted for each text. The results were then analyzed via SPSS 20 to explore the distribution of and differences in different types of errors between Draft 1, peer feedback, PF-based Draft 2, machine feedback and MF-based Draft 2. To explore the effects of peer feedback on student revisions, Draft 1 and PF-based Draft 2 were compared to count and compute the uptake of peer feedback in the corresponding rewrites, as were Draft 1 and MF-based Draft 2 for machine feedback. Then, multiple regression analyses were run, with scores of PF-based and MF-based Draft 2 as the dependent variables and the uptake of peer and machine feedback on errors of different types as the independent variables.
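The comparison of error counts between drafts can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names `paired_t` and `uptake_rate`, the definition of uptake as the share of flagged errors no longer present in the rewrite, and the error counts are all hypothetical and serve only to show the arithmetic behind a paired-samples comparison.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    sd = statistics.stdev(diffs)  # sample SD of the pairwise differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def uptake_rate(flagged, remaining):
    """Assumed definition: share of flagged errors absent from the rewrite."""
    if flagged == 0:
        return None  # nothing was flagged, so uptake is undefined
    return (flagged - remaining) / flagged


# Hypothetical counts of one error type in five essays (not data from the study)
draft1 = [3, 2, 4, 1, 3]
draft2 = [1, 1, 2, 1, 1]
t, df = paired_t(draft1, draft2)
print(f"t({df}) = {t:.2f}")
print(uptake_rate(flagged=4, remaining=1))
```

A positive t here means that more errors of that type were counted in the first list than in the second, which matches the direction in which the t values are reported in the results.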
The survey data were computed via SPSS 20. The mean and standard deviation of each survey item were computed to determine how students perceived peer and machine feedback respectively. The interview recordings were first transcribed and double-checked, and then subjected to thematic content analyses by the two research assistants independently, with an inter-rater reliability of .932 (Charmaz, 2006). The themes were then generalized, counted, and supported with excerpts from the interviewees' comments. Example themes were strengths of peer feedback, weaknesses of machine feedback, and benefits of peer and machine feedback. When reporting the comments, a number was used for each interviewee for the sake of privacy and convenience.
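The item-level descriptive statistics can be sketched in the same way. The snippet below assumes responses coded 1-7 as on the questionnaire's Likert scale; the helper name `summarize_item` and the response values are hypothetical and are not taken from the survey data.

```python
import statistics


def summarize_item(responses, low=1, high=7):
    """Return (mean, sample SD) for one Likert item after a range check."""
    if any(r < low or r > high for r in responses):
        raise ValueError("response outside the Likert range")
    return statistics.mean(responses), statistics.stdev(responses)


# Hypothetical responses to a single item (not data from the study)
item_mean, item_sd = summarize_item([5, 6, 5, 7, 4, 6])
print(f"mean = {item_mean:.2f}, SD = {item_sd:.2f}")
```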

Text Analyses Results
Distribution of errors. Preliminary analyses of peer feedback showed that students commented on content errors in specific places of their peers' writing but provided only very general comments on language problems, such as 'There are lots of grammatical errors in the essay'. By contrast, www.pigai.org generated fairly specific suggestions on language problems but offered no content-related suggestions on students' writing. Consequently, further analyses of PF and PF-based Draft 2 focused on content errors while those of MF and MF-based Draft 2 focused on language errors. The errors in Draft 1, PF, PF-based Draft 2, MF, and MF-based Draft 2 were coded and counted, and then analyzed in terms of mean and standard deviation (see Table 1). As seen from Table 1, the errors with the highest mean scores in Draft 1 were SE6 (article errors) (mean = 2.67), LE2 (word choice errors) (mean = 2.13), SE2 (tense errors) (mean = 1.68), SE7 (errors of plural or singular nouns) (mean = 1.49), LE3 (collocation errors) (mean = 1.25), LE4 (unclear expressions) (mean = 1.25), SE3 (agreement errors) (mean = 1.22), SE1 (errors in part of speech) (mean = 1.19), C3 (failure to provide adequate evidence) (mean = 1.19), and ME (mechanical errors) (mean = 1.07). Peer feedback predominantly focused on content errors, barely involving syntactic errors except for such comments as "there are many tense errors in the writing" or "grammatical errors are too many" (comments like these were not counted in the final analyses in the paper because they were not specific). The means of content errors in PF ranged from 0 (C9-introducing a new topic in Conclusion) to 1.02 (C8-inconsistency between the conclusion and the main argument). On the other hand, machine feedback was solely concerned with mechanical, syntactic and lexical errors.
The errors in MF ranged from 0 (SE12-illogical comparison or ill parallelism) to 1.79 (SE3), and the errors with the highest mean scores were SE3 (agreement errors) (mean = 1.79), LE3 (collocation errors) (mean = 1.44), ME (mechanical errors) (mean = .91), SE6 (article errors) (mean = .73), and SE7 (errors of plural or singular nouns) (mean = .524).
Since PF and MF focused on certain aspects of Draft 1, most of which were incorporated into the respective rewrites, the analyses of Draft 2 focused on the type of feedback students received correspondingly. As reported in Table 1, the mean scores of content errors ranged from .02 (C9) to .54 (C3) in PF-based rewrites, and those of language errors ranged from 0 (SE12) to 2.20 (SE6) in MF-based rewrites.
Comparison of the mean scores of the errors across Draft 1, PF, and PF-based Draft 2 shows that all content errors scored the highest in Draft 1 and that most content errors scored higher in PF than in PF-based Draft 2. Paired samples t-test results (see Table 2) indicated that Draft 1 differed significantly from PF in all types of content errors except C2 (improper topic sentence/no controlling idea/no topic sentence), largely with a small or medium effect size. Namely, significantly more content errors of nearly all types existed in Draft 1 than were identified by peers. Table 2 also shows that PF differed significantly from PF-based Draft 2 in C2 (t = 3.97), C3 (failure to provide adequate evidence) (t = -2.50), C5 (lack of the power of the argument/weak arguments or evidence) (t = -2.65), C7 (failure to achieve paragraph coherence: poor organization/lack or misuse of transitional markers) (t = 3.73), C8 (inconsistency between the conclusion and the main argument) (t = 4.66), and TotalC (total content errors) (t = 3.66). That is, significantly more errors of C2, C7, C8, and TotalC were identified in PF than in PF-based Draft 2, whereas the latter had significantly more errors of C3 and C5 than the former. Meanwhile, Draft 1 had significantly more errors of C1 (failure to show a controlling idea/more than one controlling idea) (t = 5.47), C2 (t = 3.16), C3 (t = 4.10), C7 (t = 2.31), C9 (introducing a new topic in Conclusion) (t = 2.78), and TotalC (t = 5.88) than PF-based Draft 2.
A similar pattern was observed for Draft 1, MF, and MF-based Draft 2, as reported in Table 1. Mechanical errors and most syntactic and lexical errors scored the highest in Draft 1, and errors of some types scored higher in MF than in MF-based Draft 2 while it was reversed for errors of other types. Paired samples t-test results (see Table 3) demonstrated that Draft 1 differed significantly from MF in all syntactic errors except SE5 (adjective/adverb degree errors), SE12 (errors of illogical comparison or ill parallelism), SE13 (errors of sentence fragments/run-on sentence/dangling modifiers), SE14 (errors of mixed or confused expression and sentence structure), and SE15 (missing a part of the sentence), and in all lexical errors except LE1 (errors in word formation) and LE3 (errors in collocations). Namely, significantly more errors of most types were identified in Draft 1 than in MF, except SE3 (errors in agreement) and LE3. Table 3 also suggests that MF identified significantly more errors of SE3 but significantly fewer errors of SE1 (errors in part of speech), SE2 (tense errors), SE6 (article errors), SE10 (errors in word order), SE11 (errors in coordinating conjunctions and subordinating conjunctions), SE16 (overuse of a part of the sentence), TotalSE (total syntactic errors), LE2 (errors in word choice), LE4 (unclear or incomplete expressions), TotalLE (total lexical errors), and TotalE (total errors) than were found in MF-based Draft 2. In addition, Draft 1 had significantly more errors in SE2 (tense errors), SE3 (errors in agreement), SE6 (article errors), SE7 (errors in the use of plural or singular forms/uncountable nouns), SE11 (errors in coordinating conjunctions and subordinating conjunctions), SE15 (missing a part of the sentence), SE16 (overuse of a part of the sentence), TotalSE, LE2, LE3, LE4, TotalLE, and TotalE than MF-based Draft 2.
Effects of peer and machine feedback on students' rewrites. To explore the effects of peer and machine feedback on students' rewrites, multiple regression analyses were run, with PF-based and MF-based Draft 2 scores being dependent variables and the uptake of errors of different types being independent variables respectively. Regression analyses yielded no model for MF-based Draft 2 scores and one model for PF-based Draft 2 scores (see Table 4). With the change in R² being .068, C9 (introducing a new topic in Conclusion) was the only predictor (β = .261, t = 2.11, f² = .012), and it positively predicted the scores of students' rewrites based on peer feedback.
Notes: df = degrees of freedom; effect size of Cohen's f²: small = f² ≤ .02; medium = f² = .15; large = f² ≥ .35 (Cohen, 1988).
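For readers unfamiliar with the effect-size index cited from Cohen (1988), the following Python sketch shows how f² is conventionally derived from the R² of a single regression model, f² = R² / (1 - R²). The function names, the cut-off logic in `label_effect` (treating .02, .15 and .35 as lower bounds), and the R² value are illustrative assumptions; the example value is not taken from Table 4.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared for one regression model: R^2 / (1 - R^2)."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return r_squared / (1 - r_squared)


def label_effect(f2):
    """Assumed reading of the .02 / .15 / .35 benchmarks as lower bounds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


# Illustrative R^2 only (not a value reported in the study)
f2 = cohens_f2(0.20)
print(round(f2, 3), label_effect(f2))
```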

Self-reported Results
Survey results. The mean and standard deviation of each survey item concerning peer and machine feedback were computed (see Table 5), [main] claims and supporting evidence) (mean = 5.29), 6 (logic of arguing) (mean = 5.26) and 5 (statement of supporting arguments) (mean = 5.24), centering on content. The five PMFQ items with the highest means were items 1 (improved ability to use grammar) (mean = 5.56), 2 (improved ability to use vocabulary appropriately) (mean = 5.54), 14 (acceptability of machine feedback) (mean = 5.24), 13 (uptake of machine feedback) (mean = 5.33), and 9 (improved ability to use vocabulary formally) (mean = 5.08), centering on the use of expressions and grammar. These findings indicated that the students were generally moderately positive toward peer and machine feedback.
Interview results. Table 6 summarizes the interviewees' perceptions of the advantages and disadvantages of peer and machine feedback. As seen in Table 6, around 20% of the interviewees commented that peer feedback provided more communication (23.4%), more chances to learn from each other (21.3%), new perspectives (21.3%) and good advice on language use and sentence polishing (17%). According to the interviewees, peers "feel more at ease and communicate frequently when reviewing each other's writing. This helps us to understand each other's writing better" (No. 34), and could "identify problems in logic" (No. 22); peer review also enabled "me to know others' views of my writing" (No. 46), and "me to be aware of similar mistakes in my own writing" (No. 51). Meanwhile, since "we peers are at a similar English proficiency level, most peer comments are not much professional or appropriate" (No. 53), and "it is difficult for us to offer specific suggestions" (No. 35).
As seen in Table 6, machine feedback could "identify language and grammar mistakes effectively" (No. 31), and "better the sentences and format in my writing" (No. 18). However, because it was a machine, it could not "identify logical problems" (No. 10) or offer any content-related suggestions on aspects like "paragraph structure, statements of main and supporting arguments, and use of evidence" (No. 25). Moreover, the machine frequently "misidentified mistakes" (No. 31).
Probably for these reasons, 72.3% and 63.9% of the interviewees reported that peer and machine feedback was helpful to the revision of their writing, respectively. On the whole, 100% and 71.7% of the interviewees reported feeling satisfied with peer and machine feedback, respectively.
Self-reported Perceptions of Peer and Machine Feedback (N = 64)
Columns: Feedback; Advantages; Disadvantages
PF, advantages: a) more communication (11/23.4%), b) chances to learn from each other (10/21.3%), c) new perspectives (10/21.3%), d) good advice on language use and sentence polishing (8/17%), e) suggestions being very specific (6/12.8%), f) being friendly (4/8.5%), g) feeling at ease (3/6.4%).

Focus of Peer and Machine Feedback
Analyses of the data showed that peer feedback primarily focused on content errors in the present study. Although the interviewees were intermediate-advanced learners, they were not confident enough to pinpoint language problems for their peers. This was also evident in the number of content errors they identified in PF, which was significantly lower than that in Draft 1. Apart from that, this might be partly attributed to the time-consuming nature of reviewing a text, which made the participants unwilling to provide detailed and specific suggestions. Meanwhile, as discussed in Yu and Lee (2015), EFL students' group peer feedback activities are often driven and defined by their motives, which are shaped and mediated by the sociocultural context. The learning context where the instructor emphasized content more than linguistic forms of argumentative writing might be partially accountable for the participants' performance in their PF in the present study. The students thus focused more on content errors correspondingly, which, nevertheless, needs to be further explored.
The present study also revealed that machine feedback was predominantly concerned with language errors, as found in Hyland and Hyland (2006). This might be because the so-called machine, though modeled on human intelligence, could still not detect human thinking well enough to provide useful comments on the content of an essay. In addition, though it offered timely and generally accurate feedback on language problems, it mistook correct uses of grammar and expressions for errors, or provided wrong suggestions for "correctly pinpointed mistakes", "at a rather high rate" (No. 62). For example, www.pigai.org marked the part 'will in' in the sentence "It will in turn lead to the large scale release of the greenhouse gas into the atmosphere" (Writing 44, Draft 1) as wrong. This finding partially supports the view that AWE is incapable of providing accurate feedback in certain aspects (Anson, 2006). Hence, it is necessary for both instructors and learners to be cautious when utilizing machine feedback. This is especially so for learners with lower proficiency in the SL/FL, who are less likely to recognize errors wrongly identified by machines. Moreover, what types of language use are identified as errors by machines, and to what extent, needs to be further researched.

Effects of Peer and Machine Feedback
Regression analyses indicated that the uptake of 'introducing a new topic in Conclusion' was a significant predictor of the scores of students' PF-based rewrites. This might be related to the culture of writing in Chinese, which tends to bring in something new in the concluding part of an essay. This deserves attention in formal classroom teaching, and its effects need to be further researched as well. Analyses of self-reported data showed that the participants were generally positive about peer feedback, as found in the current literature (Liu & Chai, 2009; Miao et al., 2006). Apart from positively affecting students' rewrites, peer feedback offered students chances to communicate with and learn from each other, to become (more) aware of their own mistakes, and to look at their own writing from a new perspective, as found in some existing studies (Miao et al., 2006; Wang, 2014). Miao et al.'s (2006) study indicated that peer feedback helped promote student autonomy, especially in cultures which look up to teachers as authority figures.
Self-reported data indicated that the participants were generally moderately positive towards machine feedback, commenting that it was good, specific, timely, clear and convenient. This suggests that machine feedback did have positive effects on the polishing of sentences in students' rewrites, consistent with the findings of many existing studies (Cheng, 2017; Hyland & Hyland, 2006; Li et al., 2015; Philips, 2007). On the other hand, machine feedback was sometimes wrong, which frustrated the participants in the present research. Because of this, students are advised not to rely solely on machine feedback and to consult peers and/or the instructor when unsure of the comments. These findings suggest that developers of such platforms/software have to enhance their reliability and validity and pay more attention to providing content-related feedback, which is of central importance to an essay. They also indicate that EFL learners, especially low or low-intermediate learners, have to be cautious when using machine feedback. Writing instructors should remind their students of this limitation of machine feedback. Otherwise, some feedback may be misleading, and the uptake of such feedback may lead to (even worse) mistakes.
As illustrated in the present research, peer and machine feedback had positive effects on students' rewrites, but at the same time they were not satisfactory in certain aspects. For example, peer feedback was sometimes unprofessional, inappropriate, or superficial, as found in the present study. Thus, it is important to improve the quality of peer and machine feedback. As found in Yu and Lee (2015), student motives could have a direct influence on students' participation in group peer feedback activities and their subsequent revisions. It is therefore necessary to foster positive and constructive motives towards peer and machine feedback in students before they revise their first drafts. Meanwhile, if peer feedback can be done anonymously, students may feel more comfortable in providing more and better feedback on different aspects of their peers' writing, as found in Lu and Bol (2007). Students will also be able to provide better feedback as they become more proficient in the target language and as they are trained to provide peer feedback and to write (more) effectively. Integrating technology into the peer review process may also be beneficial to providing better and timely feedback (Ellis, 2011; Lin et al., 2011; Nobles & Paganucci, 2015). Nobles and Paganucci's (2015) mixed-method study of 18 high school students in a hybrid freshman English class at an independent school revealed that students perceived their writing to be of higher quality when writing with digital tools and that writing in online environments enhanced writing skill development. Kulkarni et al.'s (2015) study showed that students' final grades improved when feedback was delivered quickly, but not if delayed by 24 hours. In addition, it is equally important to train students to do peer review (Gielen et al., 2010; Liu & Carless, 2006; Rollinson, 1998). Writing instructors should familiarize students with the peer review criteria and their expectations. As put in Stanley (1992, p. 230), "it is not fair to expect that students will be able to perform these demanding tasks [peer feedback] without first having been organized practice with and discussion of the skills involved." Strategies such as engaging students with criteria and embedding peer involvement within normal course processes may help promote peer feedback (Liu & Carless, 2006). Lastly, as found in Wang's (2014) investigation of 53 Chinese EFL learners' perceptions of peer feedback on their EFL writing over time, various factors affect students' perceived usefulness of peer feedback, such as their knowledge of assigned essay topics, proficiency in the target language, attitudes, time constraints, and classroom environment. It is necessary for writing instructors to consider these factors when implementing peer feedback.

Conclusions
The present mixed-method study examined the focus and effects of peer and machine feedback on the rewrites of Chinese university EFL learners' English argumentative essays. The main findings were: (1) peer feedback was primarily concerned with content errors, while machine feedback mainly involved language errors, (2) significant differences occurred in errors of most types between Draft 1, PF and PF-based Draft 2, and between Draft 1, MF, and MF-based Draft 2, (3) the uptake of 'introducing a new topic in Conclusion' was a powerful predictor of PF-based Draft 2 scores, and (4) the participants generally considered peer and machine feedback to be moderately useful.
Although the present study yielded insightful findings, the participants were intermediate-advanced learners and the instructor was experienced in academic English writing. It is therefore worth doing further research on different types of SL/FL learners and instructors to explore the focus and effects of peer and machine feedback in more depth. For example, lower-proficiency SL/FL learners may not be able to identify all language problems and/or distinguish between errors correctly and incorrectly identified by machines; SL/FL learners with little or no training in argumentative writing may not be able to identify content errors. All these may not only lower the quality of peer feedback but also mislead learners into depending blindly on peer and machine feedback. More research on these issues with different SL/FL learner populations would help both learners and instructors to have a better understanding of peer and machine feedback. Peer and machine feedback could then be better implemented to complement teacher feedback, both to improve the quality of SL/FL learners' writing and to alleviate writing teachers' workload.

Conflict of interest statement
On behalf of all authors, the corresponding author states that there is no conflict of interest.