
It all started with email. Targeted emails from friends were a revelation, and are still welcomed, but spam emails quickly became an equally familiar reality of the modern age. Entire industries have been launched to create and deliver them, with counterparts created to battle and suppress them. Law enforcement efforts to take down spammers have had limited success, and filters still deliver false positives and false negatives, often to the chagrin of the recipient. Who hasn’t missed a vital email because a spam filter trapped it in error?

Spam is an approach to reaching a large customer base, one that leverages abundance by compensating for low yield with high volume. Content farms have sprung up to essentially spam search engines, and one could argue that even scholarly publishing is tending to grow in this direction: new journals with unlimited capacity, the majority of papers going uncited and unread, and other signs of embracing abundance with abandon.

Now, the spam approach to publishing is hitting the Kindle, and in a significant way, as content farmers move into the e-book space.

The practice was apparently first examined in depth by Mike Essex in the UK, in a March 2011 post on the Koozai blog. Scraping Web sites, plagiarizing blog posts, and simply stealing content and courting objections: all are viable ways for content farms to harvest content and distribute it through the Amazon store. One “author,” Manuel Ortiz Braschi, created 2,879 eBooks in just a couple of years. As Essex points out:

. . . having an eBook on Amazon’s domain (PageRank 8), or Apple’s (PageRank 9) is a sure fire way to ensure your content gets found, and if you have a keyword rich book title it certainly will do so. . . . Even pricing an eBook at a dollar or 99p a scammer can make a 30% royalty on Amazon. This is more than the average AdWords click – the financial factor in content farm growth – and is the payout for scammers using the eBook platform to hold their content.

The story has become a little bigger and more complicated since March. As Alistair Barr wrote recently in the Globe and Mail:

Spam has hit the Kindle, clogging the online bookstore of the top-selling eReader with material that is far from being book worthy and threatening to undermine Amazon.com Inc’s publishing foray. Thousands of digital books, called ebooks, are being published through Amazon’s self-publishing system each month. Many are not written in the traditional sense.

It turns out that in the era of information abundance, creating a spam business model for long-form content isn’t significantly more difficult than creating it for short-form content. The spammers are apparently using Private Label Rights (PLR), a way of buying or acquiring rights to a book at very low cost. Once the rights are acquired, the book can be formatted and published in minutes. One such publisher decries the allegations that the Kindle store is being spammed:

I completely disagree. . . . When someone wants to purchase a book with rights such as “resell rights”, “master resale rights” or “private label rights”, they are given the right to redistribute the work. In many cases, they can redistribute the ebooks for any price they choose as well as rebrand in some cases with their own name, URL and images within the source documents and then re-render the doc files into an adobe PDF or similar document. . . . I have sold thousands of books . . . that are accompanied with re-distribution rights as outlined above. Many sites offer these books as well as many other products to include website scripts and website templates, audio books and video tutorials in redistributable packages that are built for people who want to resell them for 100% pure profit over and over again.

The number of books being sold in this manner is truly staggering. As the Globe and Mail story details:

In 2010, almost 2.8 million nontraditional books, including ebooks, were published in the United States, while just more than 316,000 traditional books came out. That compares with 1.33 million nontraditional books and 302,000 conventional books in 2009, according to Albert Greco, a publishing-industry expert at Fordham University’s business school. In 2002, fewer than 33,000 nontraditional books were published, while over 215,000 traditional books came out in the United States, Greco noted.
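
To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above, treating the hedged values ("almost", "just more than", "fewer than", "over") as round figures, so the results are approximations rather than exact statistics. The names `BOOKS`, `ratio`, and `growth` are illustrative only.

```python
# Approximate title counts as quoted above (Greco, via the Globe and Mail story).
# The source hedges several of these ("almost", "fewer than"), so treat them
# as rounded bounds, not exact counts.
BOOKS = {
    2002: {"nontraditional": 33_000, "traditional": 215_000},
    2009: {"nontraditional": 1_330_000, "traditional": 302_000},
    2010: {"nontraditional": 2_800_000, "traditional": 316_000},
}


def ratio(year):
    """Nontraditional titles per traditional title in the given year."""
    row = BOOKS[year]
    return row["nontraditional"] / row["traditional"]


def growth(kind, start, end):
    """Fractional change in title count for one category between two years."""
    return BOOKS[end][kind] / BOOKS[start][kind] - 1


if __name__ == "__main__":
    for year in sorted(BOOKS):
        print(f"{year}: {ratio(year):.2f} nontraditional titles per traditional title")
    print(f"2009 to 2010, nontraditional: {growth('nontraditional', 2009, 2010):+.0%}")
    print(f"2009 to 2010, traditional: {growth('traditional', 2009, 2010):+.0%}")
```

On these rounded figures, nontraditional output roughly doubled in a single year while traditional output grew by only a few percent, and the balance between the two flipped completely between 2002 and 2010.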

Joe Wikert interviewed Mike Essex on his blog, where Essex added this worrisome point about what else he’s found out about book spammers:

The thing that scared me the most was an email promoting a viral ebook Automatic Submitter. Anyone who purchases it gets 149,000 articles that they can use for any purpose, so they can make their own ebooks very quickly.

Part of the problem seems to be Amazon’s quick turnaround time for both publication and payment — since fast cash is the incentive spammers like best, building in delays can demotivate a spammer. For instance, Smashwords — another major e-book platform — pays only quarterly, so spammers can be caught before they get paid. Apple has a six-week process for posting a book to the iBook store. Smashwords also has an iterative process for determining ownership of content, one that requires you to keep track of things, another hassle for the fast-cash spammer crowd.

Slowing things down with an approval process is one solution proposed by Joe Wikert in a recent blog post. Another is to leverage the power of the Amazon community. While poor reviews are used to warn other customers about spam publications, perhaps an explicit “spam” flag could be used to not only warn off others but also to alert Amazon. Right now, poor reviews on spam books signal nothing more than a book somebody didn’t like.

The danger to Amazon’s reputation is significant — finding good-quality content becomes a challenge in a spam environment, and safer environments gain appeal. When readers are ripped off by spam books, it’s not good for anyone.

While spam is an approach to abundance, another approach is targeting (or, as Kevin Kelly and others call it, “prosuming”). It’s an approach Amazon famously uses for content-matching (readers who bought this also bought, etc.), and one that is currently being debated in online advertising circles.
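
For readers curious about what "also bought" matching involves at its simplest, here is a minimal sketch in Python. It is not Amazon's actual method, which is not described here; it simply counts how often two items appear in the same order. The function names `also_bought` and `recommend` and the sample orders are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical order history: each inner list is one customer's order.
SAMPLE_ORDERS = [
    ["kindle", "case", "light"],
    ["kindle", "case"],
    ["kindle", "charger"],
    ["case", "light"],
]


def also_bought(orders):
    """Map each item to a Counter of items that shared an order with it."""
    table = {}
    for order in orders:
        # set() ignores duplicate items within one order;
        # permutations() yields every ordered pair (a, b) with a != b.
        for a, b in permutations(set(order), 2):
            table.setdefault(a, Counter())[b] += 1
    return table


def recommend(table, item, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in table.get(item, Counter()).most_common(n)]


if __name__ == "__main__":
    table = also_bought(SAMPLE_ORDERS)
    print(recommend(table, "kindle", n=1))
```

Even this toy version shows why targeting depends on abundance: the more orders there are, the more meaningful the co-purchase counts become.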

Just like spam, targeting isn’t anything especially new, as Kelly writes:

Technology allows us to target the specifications of a product to a smaller and smaller group of people. First we can make Barbie dolls in the millions. Then with more flexible machinery and computer-generated target marketing we can make ethnic Barbies, in the hundreds of thousands. Then with improved market research and advanced communications we can make subculture Barbies, biker and grunge Barbies in the thousands. Eventually, with the right network technology, we can make the personal Barbie, the Barbie of you. In fact there is a company in Littleton, Colorado, that currently makes the “My Twinn” baby doll to look like the doll’s owner. The doll’s eye and hair color and hair style are matched to a photo of the child who will own it.

Targeting reduces noise by tailoring what each customer sees, drawing on data a computer system can derive from myriad signals in a user’s system and transactions. However, privacy advocates warn us that targeting could shade into something more nefarious.

Recently, Michael Rosenwald of the Washington Post undertook a project he called “Operation Track Me More,” an effort to see if exposing himself to all the ad targeting technology possible would result in a better or worse online experience. Google’s recent tagline for its ads, “There’s a perfect ad for everyone,” served as an inspiration. As he put it:

My goal: Stop the Internet from frequently showing me ads for products I don’t care about or need, such as Preparation H or Gillette’s new Venus Bikini Trimmer, which sounds positively terrifying and totally useless to me because, among other reasons better left unsaid, I only buy swimsuits that reach my knees.

Rosenwald offered himself up for ad targeting whenever he could, actively liked and disliked ads now and then, and waited for the results. It didn’t take long. As he wrote:

. . . relevant ads can grab attention. I played a little turn-it-on/turn-it-off game with AOL. I clicked over to its ad preference settings and saw what it already discerned about me: that I liked gadgets, news and other consumer products. So they had me somewhat figured out already. Then I spent a lot of time on AOL’s various Web sites studying the ads. I kept noticing one particular ad for a wireless charging gadget for cellphones. The ad followed me around AOL’s sites. I eventually clicked on it to see more. A study performed by Yahoo explains what was happening: Consumers spend 25 percent more time fixating on relevant ads than those that aren’t relevant. How’s this for spooky: Their pupil dilation actually increases 27 percent. I can’t see my own pupils, though I bet they dilated while I was following this ad around. When I went into AOL’s settings and opted out of ad targeting, the new ads that turned up were useless for me. Car insurance. Banking. LivingSocial deals I didn’t want. Leave me alone! Please show me something that will cure my volatile sneezing spasms every spring.

The beauty of targeting is that it requires abundance of both information and customers, but eliminates the spam parts of abundance for each user, making each experience — if done right — useful and manageable. The trouble with targeting is, as Eli Pariser explored in the TED talk we shared recently, it can create a filter bubble for each person, making the “you” of one role, one time, and one place essentially the baseline “you” from now on, while only using the most superficial measures of transaction — the click — to inform its refashioning of this baseline.

It seems at times that between these two extremes — overwhelming and unwelcome information onslaughts versus tightly controlled and personalized information — the age of abundance will push and pull us before finding some clever balance.

As Clay Shirky once said, “Abundance breaks everything.” Whether abundant information is targeted or spammed, it certainly seeks to break some of our boundaries.

Kent Anderson


Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.



6 Thoughts on "Spam versus Targeting — Which Approach Will Define the Age of Abundance?"

Here’s another example from Barnes and Noble, where spammers are putting together a set of Wikipedia pages about an author’s work and selling it. The problem for buyers is that when one searches on a particular author, these titles are mixed in with the real books from that author:
Barnes and Noble did fix this particular author’s issue, but it’s unclear if they’re doing the same systematically for other authors:

I don’t see this as spam, because the value added is that it might take me hours or days to track down those W pages manually, if I am doing research on the author in question. If it is mislabeled that may be false marketing or something but that does not make the content spam.

In fact, after reading this post I am thinking about the following e-book product. Tell me if it is spam, and if so why? I just developed an algorithm that finds all-and-only the core scientific literature on any given, very specific topic. Suppose I use it across the open access literature for a popular topic X, and compile the core articles into an e-book. I did not write any of the articles but I found and compiled them, in a way that would take anyone else many, many hours to do. Sounds like a real book to me.

Have to disagree, these are spam, or perhaps a better word is “scam”. A typical example is here:
This looks, for all intents and purposes, like a compilation of 3 of the author’s books, given that the title is the author’s name and the titles of 3 of his books. There’s no indication that it is just a compilation of Wikipedia pages about the author and the books. The book description is an excerpt of the author’s Wikipedia biography. To me, this is pretty clearly set up to trick people into buying it by making it appear as if it is the actual books, not material about the books.
It gets the “spam” tag because these sorts of things are usually done in bulk in an automated manner, putting out as much material as possible in order to profit from volume by catching the rare unaware victim.

Your proposed book would only be considered spam/scam if you represented it as something other than what it was. However, that’s not its largest problem. The key problem, as I see it, is that “open access” does not mean “free from copyright restrictions.” Many journals publish open access articles but retain the copyright (or that copyright is held by the author of the article). That means you can’t just reuse it and re-sell it at will. Some journals and archives use Creative Commons licenses that restrict what you can do with it (many disallow any commercial re-use). So to legally do your book, you’d need to determine the copyright status of every piece you’re including and secure the correct permissions.

We do not disagree. It is false marketing. The title, or subtitle, should be something like “Wikipedia Compendium for Author X”. But the content might be quite valuable (I spend a lot of time navigating Wikipedia).

Perhaps there is value, though likely not enough value to warrant a $14.14 price for 32 pages of text directly from Wikipedia. I do like the idea of a broader compilation and can see the value in saving a buyer the time required to put it together. But just grabbing 3 or 4 Wikipedia pages from a simple search (author name and title of 3 books) has minimal value. If you can navigate to the B&N page and search for the author or the title to buy the book, can’t you just do the same on Wikipedia and print/copy and paste the text from there?

Set the clock back about 20 years, and there was a maturing discussion in information science regarding search precision vs. search recall. Traditional search technologies (i.e., Boolean search) could be used to emphasize one or the other, but almost never both simultaneously. Once the relevance-ranking search technologies (Google et al.) emerged, then we were able to get the best of both worlds – i.e., the best results early in the search undertaking. Seems that good targeting technologies will similarly emerge, as well as smart agents that will give me not only what I’m asking for, but also more peripheral discoveries that will end up being of value to me (aka serendipity).
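
For anyone unfamiliar with the two measures mentioned above, a small Python sketch may help. Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The document IDs and the two sample queries are made up for illustration, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result set.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the topic.
RELEVANT = {"d1", "d2", "d3", "d4"}

# A narrow query returns few documents, all of them relevant.
NARROW = {"d1", "d2"}

# A broad query returns every relevant document plus a lot of noise.
BROAD = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

if __name__ == "__main__":
    print("narrow:", precision_recall(NARROW, RELEVANT))
    print("broad: ", precision_recall(BROAD, RELEVANT))
```

The narrow query scores high on precision and low on recall, and the broad query does the reverse, which is the trade-off described above.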
