Fixing Instead of Breaking, Part One - Open Citations

The mantra of the Digital Age — “Move fast and break things. Unless you are breaking stuff, you are not moving fast enough” — was coined by Mark Zuckerberg in the early days of Facebook. The quote’s underlying message had already consumed Silicon Valley and others via its shorthand equivalent, “disruption.” Disruption was deemed to be a good thing, shaking the cobwebs off the status quo and putting necessary revolutionary pressure on slow-moving enterprises and weak ideas.

What transpired from there has been akin to other disruptions of society caused by new technologies, whether the printing press or the Industrial Revolution. As with those and other convulsions, there is a beginning, a middle, and a resolution. We may be approaching the end of the beginning of this one.

After years of disruption, we have a polarized society with less scientific literacy and less cohesion, international alliances under strain, greater income inequality, tense race and sex relations, and a Doomsday Clock that has advanced a minute closer to its symbolic midnight in just over a year. We have depressed and stressed children and teens showing how “social media” is deeply antisocial, heaps of misinformation (including Facebook yet again enabling misinformation about vaccines), and a raft of other maladies that all seem somehow related to the fracturing and exploitation (the disruption) of our information environment.

At Davos last week, George Soros labeled Google and Facebook “obstacles to innovation” and described them as powerful monopolies that harm individuals, market innovation, and democracy itself.

Clearly, the disruption has not delivered a panacea as expected. Now, the tide seems to be turning, as more and more experts are saying the time has come for us to fix what has been broken.

Andrew Keen in his new book How to Fix the Future writes about this while also indicating how we can get our bearings again:

The future isn’t working. There’s a hole in it. . . . Ourselves. We are forgetting about our place, the human place, in this twenty-first century networked world. That’s where the hole is. And the future, our future, won’t be fixed until we fill it.

Yet, the momentum created by moving fast may cause some things to be broken still. One new example may be the unanticipated harm that could be created by moving fast into the realm of open citations.

Say what you will about the Impact Factor and its limitations, but one thing it doesn’t do is foster mob or swarm mentality. As currently constructed and executed, the metric does not encourage any particular article’s citations to “go viral”: it is a lagging indicator, it is opaque because of the sheer volume of citations occurring across a large number of sources, and it lacks specificity at the article level. If you look at the citation curve for nearly any journal, the Impact Factor is driven by a small percentage of highly cited articles and a larger number of moderately or minimally cited articles. Because it provides no immediate data or tools that users can access while responding to known citation patterns, the act of citation remains nearly organic, with other factors (brand, distribution, reputation) certainly contributing to some extent. Cheating does occur, but mechanisms are built in to discourage and detect it.

The approach taken with the Impact Factor can seem anachronistic when nearly all modern media is reliant on business models built on immediacy, network effects, and the particular psychological games these approaches have become known for — exploiting insecurity, groupthink, clickbait, and swarm behavior.

Recent calls to make citation data “open” could move citations into this dubious modern age, and there is a good amount of enthusiasm for the innovation. But are there potential downsides? Could open citations inadvertently foment herd mentality and swarm behavior around citations? Could it feed the dominance of top journals by reinforcing their position with fast feedback loops? Could it increasingly feed the surveillance economy that’s been built around platforms and free content? Could it entice authors and editors to find new ways to cheat their way up the ladder?

Revealing citations in more-or-less realtime may change how articles are cited, and not in a legitimate or informative way. When metrics are so visible, accessible, and responsive, they can create feedback loops that promulgate swarm behavior. Popularity becomes a self-fulfilling prophecy. Seeing that something is cited a lot might make you more likely to cite it. Algorithms and discovery services may surface such articles more often in data interfaces. It’s the availability error on a networked scale. Quality works may remain hidden under these waves of clicks and self-reinforcing awareness.

Digital Science’s latest initiative, Dimensions, injects disruption into the citation space. You get a hint of this in Roger Schonfeld’s recent post analyzing Dimensions’ launch:

. . . [Dimensions] collapses the product categories of citation database and analytics suite into a single new product category. . . . Dimensions is inclusive in terms of content coverage, rather than curated as is the case for Scopus and Web of Science. Of course, what reads to some as more inclusive can be seen by others as less rigorous selection, given the ways that citation databases have been policed to minimize exploitation of bibliometrics.

Instead of creating a bulwark against groupthink, Dimensions’ approach to citations may be more susceptible to it. The badging they are leveraging is an extension of the Altmetric badge and carries forward that badge’s limitations, as outlined in a post by Phil Davis from 2013. Now, an elaboration on the flower motif (more of a 3D hexagon) is being used to encourage display of citation data as sheer numbers based fundamentally on a competition paradigm (popularity, relative ratio, etc.), all at the article level:

#Researchers! Display citation metrics on your webpage or CV using free @DSDimensions badges developed by @altmetric. Get them today at https://t.co/ndiXYsGiSp

— Altmetric (@altmetric) January 22, 2018

These badges and their integration into web sites have the potential to be exploited. Imagine editing a journal when there is a paradigm that causes people to chase citations at the article level. I’m going to prioritize papers that cite recent articles in my journal: an author cites an article, and her paper moves to the front of the queue. We know some journals already unethically demand that authors add citations to the journal’s own articles. Once this is working at scale, we may see the incentives coalesce, and citation hacking will take off like wildfire.

At a more prosaic level, this disruption also has the potential to make highly cited articles even more highly cited, while academics may be reluctant to cite less-cited articles because they may deem them as too obscure or somehow lacking in community acceptance.

What Schonfeld described as “the exploitation of bibliometrics” cuts both ways, which seems a hallmark of the “move fast and break things” philosophy — small divisions and differences become exaggerated by the sheer speed and scale inherent in these platforms.

Metrics that claim to move authority to the crowd may actually decrease individual autonomy by concealing alternatives or preying on psychological biases in real-time, while metrics that place curatorial authority over data may actually increase the autonomy of individuals in the system by buffering or thwarting the swarm effect, leaving users with more control.

In 2015, Phil Davis analyzed initiatives Scopus took to improve its authority, writing:

. . . they [Scopus] are now willing to invest in the most costly resource in building authority – people. In a world of abundant and cheap data, there is a real and growing demand for authority.

We’ve seen the negative effects of letting crowds manage information — it’s hard to detect abuse; divisiveness and abuse seem predictable outcomes when people are given tools that add up sentiment quickly and obviously; and, the economics that make it work incentivize turning users into the products, which flips the script on the relationship most of us want with information products.

Predatory publishers are already a major, poorly managed problem in an industry that has embraced perhaps more disruption than it should. A recent story in the Guardian outlined how sketchy journals can support climate change denialism, lending it a false legitimacy. This is nothing new, but the fact that it continues and seems unstoppable is worrying. Could open citations expose citation data to some sort of predation? You can probably already imagine the algorithm designed to scrape open citation data to identify highly cited papers to add to a manuscript so it’s more likely to be accepted.

Algorithms are often loopholes in disguise, which clever people can easily exploit, in ways that are difficult to detect for technical (sneaky) and psychological (arrogance) reasons.

The model has been abused before, and this abuse continues to this day, although it has morphed. Google’s PageRank, which was built on the Impact Factor model and brought to scale, has become unrecognizable, as clever people with something at stake found ways to exploit the algorithm, game the citation paths, and tilt the playing field their way. Google has spent the better part of the last decade modifying its approach, first with what seemed like integrity and more lately with what seems like commercial opportunism. A recent search for the term “hematology journals” delivered results dominated by paid search results and predatory publishers, with OMICS leading the pack. Once people saw how the system worked, they could make it work for them.

Yielding authority to the crowd, or to some algorithm that purports to be able to moderate the crowd, can work pretty well for a time, especially in the early days when user diversity is low and a large share of the crazy or nefarious in the world has not yet interacted with the algorithms. However, if the system reaches sufficient scale, the algorithm’s limitations show, exploits become apparent, or the algorithm simply goes rogue under the stress. Google discovered this with its numerous controversies about auto-fill search terms. Facebook and Twitter learned this with Russian poaching of their systems via fake accounts. We may learn it via shady citations generated by exploits we may not yet be able to imagine. The cookbook is already posted.

Different approaches have emerged in scholarly publishing broadly over the past decade, with some journals exhibiting tinges of the “move fast and break things” philosophy with editorial standards that shed some of the perceived barriers to publication (novelty, significance, relevance) while encouraging critiques of proprietary publishing as elitist or misguided. Preprint servers are another “move fast and break things” approach that will bear further scrutiny from those who may be more in the “slow down and fix things” camp. I’ve speculated before that their presence could allow journals to resume their roles more clearly without having to drift into supporting the speedy aspects of publication.

By its very nature, the culture of scientific and scholarly publishing imposes aspects of “slow down and fix things”: retractions, corrections, and other integrity measures don’t let publishers cut the cord or move unburdened on to the next thing, the pivot, and reinvention. The closest we’ve seen a publisher come to fully abdicating responsibility is F1000 Research, which has a peer review and approval practice that still defies explanation when it comes to deposit into PubMed and potential retraction in cases of later disagreements or problems. F1000 Research is a case of a publisher wishing to behave as a disruptor, shedding certain obligations by enabling the crowd. Hailed by some as innovative, it now seems more fraught.

As an industry, we’ve specialized in “slow down and fix things” practices: curating content, selecting the best, selecting relevant content for developed audiences, and slowing things down just enough to help smart people separate the wheat from the chaff. Publishers, editors, reviewers, and librarians all developed their professional habits and reputations via this model. Over the past 10-15 years, we have tilted more and more toward the “move fast and break things” model, which makes editors, reviewers, publishers, and librarians understandably uncomfortable, moving their roles into uncertain positions relative to the information stream. That model purports to offer benefits using attractive vocabulary (open, free, democratic), but often what breaks only feels free or open or democratic for a brief period, until the shackles of uncertainty and unreliability bring an Orwellian twist to the words.

Perhaps a contra move is in order — instead of more information loosely managed with responsibility deferred so we can move fast and break things, it might be wiser to embrace high editorial standards and more active curation in order to get ahead of the fact that platform providers are going to undergo a decade or more of fixing what they’ve broken.

I believe we are seeing an essential tension between disruption and repair in the realm of information purveyance. How organizations strike the proper balance will inform a lot of what we talk about in the coming years. In the case of open citations injected into interfaces in ways that might skew behavior, we might want to pause a beat and think again about the benefits of separating action from reaction, adding smart, informed human circuit breakers, and letting some important measures lag. The care and adjudication of the data underlying the Impact Factor, despite its flaws, may hold clues about some essential firebreaks.

(Tomorrow, Part Two explores slowing down and fixing the business model elements in ways that bear on issues of integrity, governance, and revenues.)

Kent Anderson

@kanderson

Kent Anderson is the CEO of RedLink and RedLink Network, a past-President of SSP, and the founder of the Scholarly Kitchen. He has worked as Publisher at AAAS/Science, CEO/Publisher of JBJS, Inc., a publishing executive at the Massachusetts Medical Society, Publishing Director of the New England Journal of Medicine, and Director of Medical Journals at the American Academy of Pediatrics. Opinions on social media or blogs are his own.

Discussion

10 Thoughts on "Fixing Instead of Breaking, Part One — Open Citations"

The only thing open citations breaks is the business model of certain monopolies in our industry. Did you know Impact Factors are not only slow, they are routinely wrong when first issued (a correction comes later in the year)? Open citations fixes that and so much more. If you want to calculate citation metrics based on a whitelist of well-managed journals, open citations *enables* that – everybody can do the analysis with their own whitelist. If you want to consider more realistic metrics than the average that IF depends on, you can do a full statistical analysis and look at the distribution of citation data in detail. If you think longer-term citation metrics are more important – say 10-year impact in some sense – you can do that with open citations, but there is no way to do so with proprietary IF. This article is full of straw man references to closed social media platforms and the harm they do, but what we are talking about is *open* data that anybody can work with. Nefarious editors can try to game the system in one way or another, but now their actions will be fully in the open for all to see. Claiming any harm from this initiative is a real stretch. As to “fast” – this change could have been made a decade ago; rather than moving fast and breaking things, it seems like an abundance of caution has been exercised here. But yes, it will break some things, and that’s good.

By Arthur Smith
Jan 29, 2018, 7:20 AM

An interesting perspective, and exactly why this needs more thought.

You say open citations will break up monopolies (a term we use with great imprecision and rather thoughtlessly, see here: http://www.caldera-publishing.com/blog/2018/1/16/not-many-things-are-actually-monopolies), but in fact most “open” initiatives have fed industry consolidation. This is the sad paradox of such efforts — open means the competition moves from what you own to what you can build, market, sell, and commercialize, and big firms do better at all four. When the competitive advantage is ownership of data or intellectual property, the playing field is actually more level. Imagine if homes were allocated based on how quickly you could build them and sell them, not on buying and owning them. A couple of homebuilders would own nearly all the homes, rather than individuals in the market who can transact them via ownership transfer.

You mention a whitelist, and how these whitelists could be variable — you have yours, I have mine, they have theirs. How that helps anything is beyond me. Not only that, but creating and maintaining a whitelist is expensive and time-consuming and difficult, and how you have accountability is unclear. I also don’t know how competing whitelists make for more “realistic metrics” as you so claim. We’ve had an explosion in metrics over the past 5 years (there are now more ways to measure the US stock exchange than there are stocks in the exchange), scholarly publishing included — h-index, SCImago, CiteScore, IPP, SNIP, H5, Eigenfactor, Altmetric. How these have helped is unclear to me. How they are more “realistic” is unclear to me (see: https://scholarlykitchen.sspnet.org/2017/12/13/news-fits-whats-really-driving-altmetrics-top-100-articles-list/).

You downplay the effect of making article-level citations obvious and pursuable, something that needs more careful consideration, in my view. Incentives are to be treated with care, and changing them so that there is an incentive for creating citations to your own article suggests to me a poor and selfish incentive.

The main point of this post is that the time to break things just to break things is passing. The damage has been assessed, and it’s significant. Many thoughtful people are saying it’s time to stop, think, and see how we can fix things. So, it’s not encouraging when your last point is “keep breaking things.”

By Kent Anderson
Jan 29, 2018, 7:46 AM

By reading this post one would think that article-level citation information is a fairly new development to which researchers haven’t had access until recently. This is, of course, far from the truth. The Web of Science has provided access to this information for a long time (not only to Journal Impact Factors). Scopus and Google Scholar have also offered article-level citation data for over 13 years now (Google Scholar even does it for free). In addition to that, researchers in many cases have been forced to become aware (and proficient in the use) of this information because their careers depended on them.

I do not argue that placing too much importance on citation counts might be creating problems in the system, but I am curious as to why it is fine that big multinationals provide citation information (while filling their pockets with public funds), but it is a threat if it becomes an open resource.

Furthermore, the post seems to want to make the case that a more open and transparent scholarly infrastructure would in fact be harmful to the scholarly community, because it would diminish the autonomy of researchers. Researchers should not have access to detailed analytics about the information they themselves produce, because after being aware of this information they would not be able to help making questionable decisions in their work. This, to me, seems to be quite a leap of reasoning and I profoundly disagree with this paternalistic rationale. I don’t think researchers need blinders like the ones people put on horses so that they are not distracted or spooked by their surroundings. Nevertheless, any questionable practices or cases of misconduct that may occur should not be pinned on the availability of the data, but on the use that individuals make of the data.

By Alberto Martín
Jan 29, 2018, 8:42 AM

Article-level citations have been available for a long time, but they have been infrequently applied because they have been costly to license or of uneven reliability (Google Scholar). Now, with the push to “open,” there may be more enticement to put them at the article level, which could change behaviors. In addition, by making them open, the input side is more vulnerable to hackery, which is a concern if the incentives have shifted.

Imposing qualification on information is not “paternalistic” but normal and prudent. Even in a fairly well-managed information space, we have instances of members of the scholarly community doing all sorts of shady things to advance their careers — fake data, padded citation lists, honorary authorship, fake reviews. Those in place to moderate and filter the raw materials are often themselves academics, or people who are just damn good at sniffing out problems and solving them. I think it’s actually more paternalistic to think that anything generated by the scholarly community is automatically blessed with integrity and does not require management.

I’m glad you agree that overemphasis on citation counts might create problems in the system. You might also see how buffering these effects, slowing down to assess their potential downsides, and so forth, could help prevent damage from simply leaping to the binary position of “open.”

By Kent Anderson
Jan 29, 2018, 9:40 AM

I never intended to transmit the idea that the scholarly community should not be managed. What I believe is that the scholarly community should have the tools to manage itself without having to rely on proprietary data. I think that the shady behaviours you are referring to are already occurring, but there is difficulty in discovering them and calling them out precisely because research metadata is still under the tight control of these corporations (see how Elsevier is resisting publicly releasing even raw references in CrossRef). By putting these data in the open, it is very likely that an ecosystem of open tools will arise (perhaps similar to the open source software ecosystem) to, among other things, detect these improper behaviours.

By Alberto Martín
Jan 29, 2018, 10:15 AM

In my opinion, “slowing things down” usually results in needless delay. When I review a manuscript, I do it shortly before it’s due. If the editor gives me three months or a month, I just put a reminder in my calendar to complete it a few days before, which raises the question why pretend that referees are oh-so-busy that they need a long time to turn something around? Moving fast and moving slow are not dichotomous. What about a middle ground? Taking pride in being slow is no better than worshipping quickness. #virtuesignalling

By DF
Jan 29, 2018, 10:14 AM

“Slowing down” is used in a relative sense, conveying that maybe technology firms have been moving so fast that they’ve been causing damage. This is a growing perception, and many smart people are articulating that slowing down might be wise.

Slowing down doesn’t mean “slow.” It just means “slower than before.”

By Kent Anderson
Jan 29, 2018, 10:22 AM

Interesting read, Kent. I have been thinking a lot about how social media platforms have “broken” things. Not just social media, but many Silicon Valley “disruptors.” The context in which I am pondering these issues is around calls for massive change in scholarly publishing. Facebook, Twitter, Google, Uber, AirBnB did not consider the negative consequences of their actions either out of naivete or ambivalence.

I do worry that turning over the apple cart in the scholarly communications area will net serious consequences that no one is talking about. It does not mean that initiatives should not happen or community based innovation should stop, it just means that the consequences need to be considered and the larger community needs to be aware of the pros and cons before adopting an entirely new system. Sadly, the “crowd” reacts to new with jumping in the back seat of a random car whose driver is attached to an app in order to get a cheaper ride from point a to point b. It’s only after horrible experiences happen or “unintended” economic issues are uncovered that the “crowd” stops to wonder if this is a good idea. By then, it may be too late.

By Angela Cochran
Jan 29, 2018, 12:23 PM

Disruption & consolidation will always exist in a dynamic balance. If there’s more call for disruption now, it just reflects growing sentiment that the current market leaders aren’t living up to their end of the bargain and providing the integrity & innovation that the market wants.

As I see it, if you’re an existing company concerned about disruption, the best thing you can do is to figure out what about your company is ripe for disruption & use your existing resources to disrupt yourself. Easier said than done, of course, but let’s not fool ourselves into thinking we can (or in some cases, should) stop it. The market needs both forces to create the best outcome for those whom it serves.

By William Gunn
Jan 29, 2018, 2:18 PM

Quoting Clay Shirky from 2009:
http://sustainablejournalism.org/future-of-journalism/clay-shirky-on-journalisms-future-revolutions-get-worse-first

One of the things, it’s sort of the paradox of the revolutionary, when you believe that you’re undergoing a change that is really dramatic, you have to admit that your ability to predict the future is limited, and in real revolutions things get worse before they get better. If they’re not, then they’re not revolutions. That doesn’t happen, they’re not revolutions.

By David Crotty
Jan 29, 2018, 3:43 PM

The Scholarly Kitchen
