When it comes to plagiarism in writing assignments, college students turn to news sources and paper mills more often than secondary school students do, a comparative study of writing assignments reveals.
The white paper, “Plagiarism and the Web: A Comparison of Internet Sources for Secondary and Higher Education Students,” was released this week by iParadigms, a company that offers plagiarism detection services for educators and publishers.
The study classified 128 million content matches discovered in 33 million student writing assignments submitted to Turnitin between June 2010 and June 2011.
For both secondary and college papers, social and content-sharing sites accounted for the most content matches, with homework and academic sites close behind. Wikipedia still leads as the most popular website for text appropriation, representing 8% of all sources for secondary students and nearly 11% for college students. Yahoo!Answers came in second for both groups.
College students, not surprisingly, show proportionally more use of news sources and paper mills than secondary school students, the study revealed.
The report, which is accompanied by an infographic, is sparse on methodological detail, so I contacted Chris Harrick, vice president of marketing at Turnitin, for more. Specifically, I wanted to know how the company defines a content match and determines whether it is a valid case of plagiarism.
According to Harrick, content matches are made using the company’s proprietary algorithm, meaning that a precise definition could not be provided. The software is designed to detect patterns, not just contiguous word sequences, which allows it to identify passages where an author has edited text copied from another source.
Not everyone agrees. Economist David E. Harrington argues that Turnitin software can be gamed by what he calls “copying, cloaking, and pasting,” in which simple words and phrases are edited to obscure the original source. If students can edit well enough to escape software detection, then the true frequency of text appropriation is understated in the report.
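Turnitin’s algorithm is proprietary, so the following is only a generic sketch of one common approach to near-duplicate detection, word n-gram (“shingle”) overlap, meant to illustrate both sides of the argument: light edits leave most word patterns intact and are still caught, while heavier “cloaking” edits can drive the overlap toward zero. All of the example texts and thresholds here are hypothetical.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences ("shingles") in text, case-insensitive."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

source         = "the quick brown fox jumps over the lazy dog near the river"
lightly_edited = "the quick brown fox leaps over the lazy dog near the river"
cloaked        = "a fast brown fox leaps above a sleepy dog close to a stream"

# Changing one word breaks only the 3-grams that contain it,
# so most shingles survive and the match is still detectable.
print(overlap(source, lightly_edited))  # well above 0.5

# Systematically swapping words ("cloaking") leaves no shared 3-grams.
print(overlap(source, cloaked))         # 0.0
```

A production matcher would be far more sophisticated (stemming, synonym handling, hashing for scale), but the underlying tension Harrington points to is visible even in this toy version: any pattern-based detector has some editing threshold beyond which the match disappears.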
The report does not specify the distribution of content matches, but with 128 million content matches in 33 million papers, the overall average works out to roughly four matches per paper. We don’t know what percentage of papers contain zero matches, or what percentage are complete copies. Nor do we know how much content is typically appropriated, or what behavioral patterns the matches follow. That kind of information would help educators identify at-risk students, design better writing assignments, and train students in the appropriate use of source text.
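The “roughly four matches per paper” figure is just the quotient of the two totals the report provides, and it is worth stressing that an average says nothing about the distribution:

```python
matches = 128_000_000  # content matches reported in the white paper
papers = 33_000_000    # papers submitted between June 2010 and June 2011

# Overall average; compatible with many papers having zero matches
# and a minority having a great many.
print(round(matches / papers, 1))  # 3.9
```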
The title of the white paper, “Plagiarism and the Web,” uses the infamous “P-word,” yet the report makes no attempt to distinguish a content match that includes attribution to the original source from one that does not. Text within quotation marks and block quotes is treated like any other text, and no attempt is made to detect citations, footnotes, endnotes, or bibliographies in a paper.
For example, the following passage, though presented as a block quote and followed by a citation, would be classified as plagiarism simply because it also appears on Wikipedia and other free websites:
Was this fair paper, this most goodly book,
Made to write “whore” upon? What committed?
Othello (IV, ii)
Without verifying whether a piece of matched text lacks attribution, it is impossible to separate good scholarship from academic misconduct, which is why I am wary of using the word “plagiarism” in the context of this study.
Harrick concedes the point, but notes that sources like Yahoo!Answers, paper mills, and cheat sites are not usually the kind a student would cite as reputable.