Wikipedia is the most popular site for copying and pasting content into student papers, a new study reports, and social media and content-sharing sites are not far behind.

The report is called “Plagiarism and the Web: Myths and Realities.” It’s written by Chris Harrick, vice president of marketing for Turnitin, a popular service designed to detect content matches from other student papers, Web sites, and an entire library of academic journal and book material.

Analyzing nearly 40 million submitted high school and college papers between June 2010 and March 2011, the company detected 140 million content matches and classified them based on their source:

  1. Social Networking and Content Sharing (33.0%)
  2. Homework and Academic sites (25.0%)
  3. Paper Mills and Cheat sites (14.8%)
  4. News and Portals (13.6%)
  5. Encyclopedias (9.5%)
  6. Other (4.1%)

Overall, the user-generated encyclopedia Wikipedia ranked first as the most popular site to copy content from. It was followed by content farms such as Yahoo! Answers.

Harrick believes that digital media has created a cultural shift among our youth, who need to be educated to value originality in academic thought and writing. He writes:

A digital culture that promotes sharing, openness and re-use is colliding with one of the fundamental tenets of education – the ability to develop, organize and express original thoughts. For many students who have grown up sharing music, retweeting thoughts and downloading free software, the principle of originality in research and writing can seem antiquated.

For a study of 40 million papers and 140 million content matches, I found this report excessively lean. We are not told how much content is typically matched in student papers, the typical number of content sources per paper, or the percentage of papers that don’t suffer from any content match, nor do we have a breakdown based on education type, subject, or grade level.

More importantly, while the company is clear about distinguishing plagiarism (a deliberate attempt to claim ownership of another’s intellectual contribution) from content matching (the simple overlap of text), Harrick often equates the two in the report, and most explicitly in its title (“Plagiarism and the Web”).

The service does not detect nor determine plagiarism – it detects patterns of matching text to help instructors determine if plagiarism has occurred.

Unfortunately, stories using the “P-word” have already started showing up in the media: The Chronicle of Higher Education (“Plagiarism Goes Social”), Inside Higher Ed (“The Sources of Plagiarism”), PR Newswire (“Turnitin Debunks Myths Surrounding Plagiarism on the Web”), The Washington Post (“Study: 8 top sites for potential plagiarism”), and US News & World Report (“Plagiarists Turn to Academic Sites, Not Paper Mills”).

While I have no doubt that plagiarism is present in many of the 140 million content matches in the dataset, the study did not attempt to investigate plagiarism, which is why I take issue with the use of the “P-word” in this context. Many of the content matches may simply be attributed pieces of text (such as a quotation that is found on Wikipedia), or a block of text that is followed with a citation or footnote. Even academics charged with plagiarism find ways of attributing the match to a missing quotation mark or the loss of a few footnotes, as in the case of the late American historian Stephen Ambrose. In academia, the P-word is more offensive than the F-word, which is why it should be used carefully.

At present, the Turnitin study provides some novel and informative data about where students are getting content for their papers. For this reason alone, the report is valuable to high-school and college educators, librarians, and academic publishers who may all attempt to steer their students to more authoritative sources of content (or alternatively, start populating these sites with the content they want students to read). But without knowledge of whether the content they use has been properly attributed with a reference, it’s a stretch to make strong claims about student plagiarism.

Students, you can quote me on this: Just make sure to include a full citation.

Phil Davis

Phil Davis is a publishing consultant specializing in the statistical analysis of citation, readership, publication and survey data. He has a Ph.D. in science communication from Cornell University (2010), extensive experience as a science librarian (1995-2006) and was trained as a life scientist. https://phil-davis.com/


9 Thoughts on "The P-Word: Is Matched Text the Same as Plagiarism?"

Good post. I think students should be encouraged not to plagiarize from early in their academic careers. Often even grad students don’t have a full understanding of the meaning of the P-word and its moral implications. Raising awareness of the topic is good.

Since when is education about original thought? I thought the idea was to learn, which by definition means understanding something that someone else already understands. The originality lies with the person (it is new to them) not with the thought. Genuine originality usually only comes at an advanced stage, something many students have a hard time understanding.

But your closing point is well taken. Educators and publishers should be putting the good stuff where the students are looking, rather than trying to get the students to look where they want them to. That long-standing element of eyeball control has been swept away. To a considerable degree anyway, it being so easy to look elsewhere.

I’ll agree that some of education isn’t about original thought as much as about mastering pre-established, fundamental concepts…but that doesn’t mean education is all about memorization and regurgitation, either! Education is supposed to be about teaching the use of information–the application and synthesis of various principles as applied to new problems, both in and outside of an academic situation. If a teacher’s assignments are simply for a student to look something up and regurgitate it, then s/he’s doing the student a disservice; the ability of students to just look something up on Wikipedia, or anywhere else for that matter, is undesirable because without the ability to think, they have no way of knowing whether or not the information they are reading and regurgitating is good or bad information.

I have become quite sick of this nonsense that traditional learning is regurgitation. Students are becoming people and everything is individual to them, even if it is 400 years old. There is no such thing as reading without thinking. The education community seems to be in the grip of this wild fantasy that there is some fundamental alternative to learning as we know it, if only we could find it, or something. Maybe it is Education 2.0! Hype never dies.

By the same token, there is such a thing as scientific literacy, which means knowing how the world works, as best we understand it. This is the basic goal of science education. Original thought is not the goal because undergrads or high school students are not going to make fundamental advances in science. Sorry but no. Would that they could.

I find this interesting because as an academic librarian I use Turnitin frequently and often find that it does not do a very good job of finding Wikipedia entries. It’s better at finding other student papers and “social networking” sources that have also used the Wikipedia entry. Turnitin is not very forthcoming on exactly how its algorithms work. It certainly does not catch everything, so I would be hesitant to put too much stock in any conclusions reached using this data alone.

This reminds me of the letter posted in Nature (9 Sept 2010) with the alarmist title “Chinese journal finds 31% of submissions plagiarized”. On talking to the author, I learned that the vast majority of apparently copied material was in fact not plagiarised – the automated systems which identified copied text can’t make subjective judgements to determine the legitimacy of apparent duplication. The alarmist title had been added by Nature and the letter had been edited to increase the shock factor.

Having just read a copy of the report myself, I agree wholeheartedly that it is very short on analysis and raw data. I would like to dismiss it as a marketing instrument, but unfortunately, it hasn’t been received as such in the press. There is a potential here for some very interesting academic work on the nature of student writing in a digital and social environment. What a shame that iParadigms keep their data closed and inaccessible to the academic community that created it.
