What the decentralized web can learn from Wikipedia

with Andrew Dickson & Ankur Shah Delight

In this post, we analyze Wikipedia -- a site that has achieved tremendous success and scale through crowd-sourcing human input to create one of the Internet’s greatest public goods. Wikipedia’s success is particularly impressive considering that the site is owned and operated by a non-profit organization, and that almost all of its content is contributed by unpaid volunteers.

The non-commercial, volunteer-driven nature of Wikipedia may cause developers from the “decentralized web” to question the site’s relevance. However, these differences may be merely cosmetic: IPFS, for example, has no inherent commercial model, and most of the open source projects that underlie the decentralized web are built, at least in part, by volunteers.

We believe that a site that has managed to coordinate so many people to produce such remarkable content is well worth a look as we search for solutions to similar problems in the emerging decentralized web.

To better understand Wikipedia’s success, we first survey some key features of Wikipedia’s battle-tested (to the tune of 120,000 active volunteer editors) coordination mechanisms. Next, we present some valuable high-level lessons that blockchain projects interested in human input might learn from Wikipedia’s approach. Finally, we explore vulnerabilities inherent to Wikipedia’s suite of mechanisms, as well as the defenses it has developed against the attacks that exploit them.

Wikipedia: key elements

While we cannot hope to cover all of Wikipedia’s functionality in this short post, we start by outlining a number of Wikipedia’s foundational coordination mechanisms as background for our analysis.

User and article Talk Pages

While anyone can edit an article anonymously on Wikipedia, most regular editors choose to register with the organization and gain additional privileges. As such, most editors, and all articles, have a public metadata page known as a talk page, for public conversations about the relevant user or article. Talk pages are root-level collaborative infrastructure: they allow conversations and disputes to happen frequently and publicly.

Since talk pages capture a history of each editor’s interaction -- both in terms of encyclopedia content and conversational exchanges with other editors -- they also provide the basis for Wikipedia’s reputation system.

Clear and accessible rules

If we think of the collection of mechanisms Wikipedia uses to coordinate its editors as a kind of “social protocol”, the heart of that protocol would surely be its List of Guidelines and List of Policies, developed and enforced by the community itself. According to the Wikipedia page on Policies and Guidelines:

“Wikipedia policies and guidelines are developed by the community… Policies are standards that all users should normally follow, and guidelines are generally meant to be best practices for following those standards in specific contexts. Policies and guidelines should always be applied using reason and common sense.”

For many coming from a blockchain background, such policies and guidelines will likely seem far too informal to be of much use, especially without monetary or legal enforcement. And yet, the practical reality is that these mechanisms have been remarkably effective at coordinating Wikipedia’s tens of thousands of volunteer editors over almost two decades, without having to resort to legal threats or economic incentives for enforcement.

Enforcement: Peer consensus and volunteer authority

Upon hearing that anyone can edit a Wikipedia page, no money is staked, no contracts are signed, and neither paid police nor smart contracts are available to enforce the guidelines, an obvious question is: why are the rules actually followed?

Wikipedia’s primary enforcement strategy is peer-based consensus. Editors know that when peer consensus fails, final authority rests with certain, privileged, volunteer authorities with long-standing reputations at stake.

Peer consensus

As an example, let’s consider three of the site’s most fundamental content policies, often referred to together. “Neutral Point of View” (NPOV), “No Original Research” (NOR), and “Verifiability” (V) evolved to guide editors towards Wikipedia’s mission of an unbiased encyclopedia.

If I modify the Wikipedia page for Mahatma Gandhi, changing his birthdate to the year 1472, or offering an ungrounded opinion about his life or work, there is no economic loss or legal challenge. Instead, because there is a large community of editors who do respect the policies (even though I do not), my edit will almost certainly be swiftly reverted until I can credibly argue that my changes meet Wikipedia’s policies and guidelines (“Neutral Point of View” and “Verifiability”, in this case).

Such discussions typically take place on talk pages, either the editor’s or the article’s, until consensus amongst editors is achieved. If I insist on maintaining my edits without convincing my disputants, I risk violating other policies, such as 3RR (explained below), and attracting the attention of an administrator.

Volunteer authority: Administrators and Bureaucrats

When peer consensus fails, and explicit authority is needed to resolve a dispute, action is taken by an experienced volunteer editor with a long and positive track record: an Administrator.

Administrators have a high degree of control over content, including blocking and unblocking users, editing protected pages, and deleting and undeleting pages. Because there are relatively few of them (~500 active administrators for English Wikipedia), being an administrator is quite an honor. Once a candidate is nominated, adminship is determined through discussion on the user’s nomination page, not voting, with a volunteer bureaucrat gauging the positivity of comments at the end of the discussion. In practice, candidates with more than 75% positive comments tend to pass.

Bureaucrats are the highest level of volunteer authority in Wikipedia, and are typically administrators as well. While administrators have the final say for content decisions, bureaucrats hold the ultimate responsibility for adding and removing all kinds of user privileges, including adminship. Like administrators, bureaucrats are determined through community discussion and consensus. However, they are even rarer: there are currently only 18 for the entire English Wikipedia.

Since there is no hard limit to the number of administrators and bureaucrats, promotion is truly meritocratic.

Evolving governance

Another notable aspect of Wikipedia’s policies and guidelines is that they can change over time. And in principle, changing a Wikipedia policy or guideline page is no different than changing any other page on the site.

The fluidity of the policies and guidelines plays an important role in maintaining editors’ confidence in enforcing the rules. After all, people are much more likely to believe in rules that they helped create.

If we continue to think of the policies and guidelines for Wikipedia as a kind of protocol, we would say that the protocol can be amended over time and that the governance for its evolution takes place in-protocol -- that is, as a part of the protocol itself.

Lessons for the decentralized web

Now that we have a little bit of background on Wikipedia’s core mechanisms, we will delve into the ways that Wikipedia’s approach to coordination differs from similar solutions in public blockchain protocols. There are three areas where we believe the decentralized web may have lessons to learn from Wikipedia’s success: cooperative games, reputation, and an iterative approach to “success”.

We also hope that these lessons may apply to our problem of generating trusted seed sets for Osrank.

Blockchain should consider cooperative games

Examining Wikipedia with our blockchain hats on, one thing that jumps out right away is that pretty much all of Wikipedia’s coordination games are cooperative rather than adversarial. For contrast, consider Proof of Work as it is used by the Bitcoin network. Because running mining hardware costs money in the form of electricity and because only one node can get the reward in each block, the game is inherently zero-sum: when I win, I earn a block reward; every other miner loses money. It is the adversarial nature of such games that leaves us unsurprised when concerns like selfish mining start to crop up.

As an even better example, consider Token Curated Registries (TCRs). We won’t spend time describing the mechanics of TCRs here, because we plan to cover the topic in more detail in a later post. But for now, the important thing to know is that TCRs allow people to place bets, with real money, on whether or not a given item will be included in a list. The idea is that, like an efficient market, the result of the betting will converge to produce the correct answer.

One problem with mechanisms like TCRs is that many people have a strong preference against playing any game in which they have a significant chance of losing -- even if they can expect their gains to make up for their losses over time. In behavioral psychology, this result is known as loss aversion and has been confirmed in many real-world experiments.
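To make the loss-aversion point concrete, here is a minimal sketch. The bet, the token amounts, and the `loss_weight` of 2.0 are illustrative assumptions rather than figures from any TCR or study; the only idea taken from the text is that losses are felt more strongly than equal gains.

```python
def expected_value(outcomes):
    """Plain expected value of (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)


def loss_averse_value(outcomes, loss_weight=2.0):
    """Expected value when each loss is felt `loss_weight` times as
    strongly as an equal gain (loss_weight is an assumed parameter)."""
    return sum(p * (x if x >= 0 else loss_weight * x) for p, x in outcomes)


# Hypothetical bet: 50% chance to win 120 tokens, 50% chance to lose 100.
bet = [(0.5, 120), (0.5, -100)]

print(expected_value(bet))     # 10.0: profitable on average
print(loss_averse_value(bet))  # -40.0: feels like a losing game
```

Under these assumptions, a bet that pays off on average can still look unattractive to a loss-averse participant, which is the kind of reluctance described above.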

In short, Proof of Work and TCRs are both adversarial mechanisms for resolving conflicts and coming to consensus. To see how Wikipedia resolves similar conflicts using cooperative solutions, let’s dive deeper into what dispute resolution looks like on the site.

Dispute resolution

So how does a dubious change to Mahatma Gandhi’s page actually get reverted? In other words, what is the process by which that work gets done?

When a dispute first arises, Wikipedia instructs the editors to avoid their instinct to revert or overwrite each other’s edits, and to take the conflict to the article’s talk page instead. Some quotes from Wikipedia’s page on Dispute Resolution point to the importance of the Talk pages:

“Talking to other parties is not a mere formality, but an integral part of writing the encyclopedia”

“Sustained discussion between the parties, even if not immediately successful, demonstrates your good faith and shows you are trying to reach a consensus.”

Editors who insist on “edit warring”, or simply reverting another editor’s changes without discussion, risk violating Wikipedia’s 3RR (three-revert rule) policy, which prohibits an editor from making more than three reverts on a given page within 24 hours. Editors who violate 3RR risk a temporary suspension of their accounts.
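A rule like 3RR is simple enough to express as a sliding-window check. The sketch below is a simplified model, assuming the rule is broken by a fourth revert inside any 24-hour window; the function name and inputs are our own, and Wikipedia’s actual policy includes exemptions and human judgment that are not modeled here.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)
MAX_REVERTS = 3  # assumed threshold: a fourth revert in the window breaks the rule


def violates_3rr(revert_times):
    """Return True if any 24-hour window holds more than MAX_REVERTS reverts.

    `revert_times` holds the timestamps of one editor's reverts on one page.
    """
    times = sorted(revert_times)
    start = 0
    for end, current in enumerate(times):
        # Advance the window start until it is less than 24 hours before `current`.
        while current - times[start] >= WINDOW:
            start += 1
        if end - start + 1 > MAX_REVERTS:
            return True
    return False


day = datetime(2019, 1, 1)
three = [day + timedelta(hours=h) for h in (0, 1, 2)]
four = [day + timedelta(hours=h) for h in (0, 1, 2, 3)]
print(violates_3rr(three))  # False
print(violates_3rr(four))   # True
```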

If initial efforts by the editors to communicate on the Talk Page fail, Wikipedia offers many additional solutions for cooperative coordination, including:

Editor Assistance provides one-on-one advice on how to conduct a civil, content-focused discussion from an experienced editor.
Moderated Discussion offers the facilitation help of an experienced moderator, and is only available after lengthy discussion on the article’s Talk page.
3rd Opinion matches the disputants with a third, neutral opinion, and is only available for disputes involving only two people.
Community Input allows the disputants to get input from a (potentially) large number of content experts.

Binding arbitration from the Arbitration Committee is considered the option of last resort, and is the only option in which the editors are not required to come to a consensus on their own. According to Wikipedia’s index of arbitration cases, this mechanism has been invoked only 513 times since 2004 -- a strong vote of confidence in its first-pass dispute resolution mechanisms.

A notable theme of all of these dispute resolution mechanisms is how uniformly cooperative they are. In particular, it is worth observing that in no case can any editor lose something of significant economic value, as they might, for instance, if a TCR was used to resolve the dispute.

What the editor does lose, if their edit does not make it into the encyclopedia, is whatever time and work she put into the edit. This risk likely incentivises editors to make small, frequent contributions rather than large ones and to discuss major changes with other editors before starting work on them.

“Losing” may not even be the right word. As long as the author of the unincluded edit believes in Wikipedia’s process as a whole, she may still view her dispute as another form of contribution to the article. In fact, reputation-wise, evidence of a well-conducted dispute only adds credibility to the user accounts of the disputants.

Reputation without real-world identity can work

Another lesson from Wikipedia relates to what volunteer editors have at stake and how the site’s policies use that stake to ensure their good behavior on the system.

Many blockchain systems require that potential participants stake something of real-world value, typically either a bond or an off-chain record of good “reputation”. For example, in some protocols, proof-of-stake validators risk losing large amounts of tokens if they don’t follow the network’s consensus rules. In other networks, governors or trustees might be KYC’d with the threat of legal challenge, or public disapproval, if they misbehave.

Wikipedia appears to have found a way to incentivize participants’ attachment to their pseudonyms without requiring evidence of real-world identity. We believe this is because reputation in Wikipedia’s community is based on a long-running history of small contributions that is difficult and time-consuming to fake, outsource, or automate.

Once an editor has traded anonymity for pseudonymity and created a user account, the first type of reputation that is typically considered is their “edit count”. Edit count is the total number of page changes that the editor has made during his or her history of contributing to Wikipedia. In a sense, edit count is a human version of proof-of-work, because it provides a difficult-to-fake reference for the amount of work the editor has contributed to the site.

If edit count is the simplest quantitative measure of a user’s total reputation on the site, its qualitative analog is the user talk pages. Talk pages provide a complete record of the user’s individual edits, as well as a record of administrative actions that have been taken against the user, and notes and comments by other users. The Wikipedia community also offers many kinds of subjective awards which contribute to editor reputation.

Reputable editors enjoy privileges on Wikipedia that cannot be earned in any other way -- in particular, a community-wide “benefit of the doubt”. Wikipedia: The Missing Manual’s page on vandalism and spam provides a good high-level overview, instructing editors who encounter a potentially problematic edit to first visit the author’s talk page. Talk pages with lots of edits over time indicate the author should be assumed to be acting in good faith, and notified before their questionable edit is reverted: “In the rare case that you think there's a problem with an edit from this kind of editor, chances are you've misunderstood something.”

On the other hand, the same source’s recommendations for questionable edits by anonymous editors, or editors with empty talk pages, are quite different: “If you see a questionable edit from this kind of user account, you can be virtually certain it was vandalism.”

Blockchains which adopt similar reputation mechanisms might expect to see two major changes: slower evolution of governance and sticky users. And while no public blockchains that we’re aware of have made significant use of pseudonymous reputation, it’s worth noting that such mechanisms have played a significant role in the increasing adoption of the Dark Web.

Assigning power based on a long history of user edits means that the composition of the governing class necessarily changes slowly and predictably, and is therefore less subject to the “hostile takeovers” that are a fundamental risk for many token-voting-based schemes.

Sticky users are a consequence of the slow accretion of power: experienced users tend to stick to their original pseudonym precisely because it would be time-consuming to recreate a similar level of privilege (both implicit and explicit) under a new identity.

All in all, Wikipedia’s reputation system may represent an excellent compromise between designs offering total anonymity on one hand and identity models built on personally identifying information on the other. In particular, such a system has the benefit of allowing users to accrue reputation over time and resisting Sybil attacks by punishing users if and when they misbehave. At the same time, it also allows users to preserve the privacy of their real-world identities if they wish.

Iteration over finality

Wikipedia’s encyclopedic mission, by its very nature, can never be fully completed. As such, the site’s mechanisms do not attempt to resolve conflicts quickly or ensure the next version of a given page arrives at the ultimate truth, but rather, just nudge the encyclopedia one step closer to its goal. This “iterative attitude” is particularly well-suited to assembling human input. Humans often take a long time to make decisions, change their minds frequently, and are susceptible to persuasion by their peers.

What can Radicle, and other p2p & blockchain projects, learn from Wikipedia in this regard? Up to this point, many protocol designers in blockchain have had a preference for mechanisms that achieve “finality” -- that is, resolve to a final state, with no further changes allowed -- as quickly as possible. There are often very good reasons for this, particularly in the area of consensus mechanisms. And yet, taking inspiration from Wikipedia, we might just as easily consider designs that favor slow incremental changes over fast decisive ones.

For instance, imagine a protocol in which (as with Wikipedia) it is relatively easy for any user to change the system state (e.g. propose a new trusted seed), but such a change might be equally easily reverted by another user, or a group of users with superior reputation.
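As a rough sketch of such a protocol, the toy registry below lets anyone propose a seed and lets a user, or a group of users, revert it when their combined reputation exceeds the proposer’s. The class name, the integer reputation scores, and the "sum of reputations" rule are all hypothetical choices made for illustration; they are not part of Osrank or of Wikipedia.

```python
class SeedRegistry:
    """Toy registry of trusted seeds with reputation-gated reverts."""

    def __init__(self, reputation):
        self.reputation = dict(reputation)  # user -> reputation score (hypothetical)
        self.seeds = {}                     # seed -> user who proposed it

    def propose(self, user, seed):
        """Any user may add a seed; no stake or approval is required."""
        self.seeds[seed] = user

    def revert(self, users, seed):
        """Remove `seed` if the reverting users' combined reputation
        exceeds the proposer's. Returns True when the revert succeeds."""
        proposer = self.seeds.get(seed)
        if proposer is None:
            return False
        backing = sum(self.reputation.get(u, 0) for u in set(users))
        if backing > self.reputation.get(proposer, 0):
            del self.seeds[seed]
            return True
        return False


registry = SeedRegistry({"alice": 50, "bob": 30, "carol": 25})
registry.propose("alice", "repo-x")
print(registry.revert(["bob"], "repo-x"))           # False: 30 does not exceed 50
print(registry.revert(["bob", "carol"], "repo-x"))  # True: 55 exceeds 50
```

Summing reputations is only one possible rule; it is used here because it is the simplest way to let a group outweigh a single high-reputation user.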

Or consider a protocol in which any state change is rolled out over a long period of time. In Osrank, for instance, this might mean that trusted seeds would start out as only 10% trusted, then 20% trusted one month later, and so on. While such a design would be quite different from how Wikipedia works today, it would hew to the same spirit of slow, considered change over instant finality.
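The gradual rollout described above can be sketched as a simple schedule. We assume a linear ramp that starts at 10%, adds 10 percentage points per month, and caps at 100%; the text only gives the first two steps, so the later steps, the cap, and the function name are assumptions.

```python
def trust_percent(months_elapsed, start=10, step=10, cap=100):
    """Trust level (in percent) of a seed `months_elapsed` whole months
    after it was proposed, under an assumed linear ramp."""
    if months_elapsed < 0:
        return 0  # not yet proposed
    return min(cap, start + step * months_elapsed)


for month in (0, 1, 5, 9, 12):
    print(month, trust_percent(month))
# 0 10
# 1 20
# 5 60
# 9 100
# 12 100
```

Integer percentages keep the schedule exact; a real design might prefer a different curve or a per-seed ramp rate.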

Attacks and defenses

While the previous section covered a number of ways in which Wikipedia’s mechanisms have found success up to this point, the true test of a decentralized system is how vulnerable it is to attacks and manipulation. In this section, we introduce Wikipedia’s perspective on security. We then examine some of Wikipedia’s vulnerabilities, the attacks that play upon them and the defenses the Wikipedia community has evolved.

How Wikipedia Works: Chapter 12 discusses the fact that nearly all of the security utilized by Wikipedia is “soft security”:

“One of the paradoxes of Wikipedia is that this system seems like it could never work. In a completely open system run by volunteers, why aren't more limits required? One answer is that Wikipedia uses the principle of soft security in the broadest way. Security is guided by the community, rather than by restricting community actions ahead of time. Everyone active on the site is responsible for security and quality. You, your watchlist, and your alertness to strange actions and odd defects in articles are part of the security system.”

What does “soft security” mean? It means that security is largely reactive, rather than preventative or broadly restrictive of user actions in advance. With a few exceptions, any anonymous editor can change any page on the site at any time. The dangers of such a policy are obvious, but the advantages are perhaps less so: Wikipedia’s security offers a level of adaptability and flexibility that is not possible with traditional security policies and tools.

Below, we discuss three kinds of attacks that Wikipedia has faced through the years: Bad Edits (vandalism and spam), Sybil Attacks, and Editing for Pay. For each attack we note the strategies and solutions Wikipedia has responded with and offer a rough evaluation of their efficacy.

Bad edits: Vandalism and spam

The fact that anyone with an internet connection can edit almost any page on Wikipedia is one of the site’s greatest strengths, but may also be its greatest vulnerability. Edits not in service of Wikipedia’s mission fall into two general categories: malicious edits (vandalism) and promotional edits (spam).

While Wikipedia reader/editors are ultimately responsible for the clarity and accuracy of the encyclopedia’s content, a number of tools have been developed to combat vandalism and spam. Wikipedia: The Missing Manual gives a high-level overview:

Bots. Much vandalism follows simple patterns that computer programs can recognize. Wikipedia allows bots to revert vandalism: in the cases where they make a mistake, the mistake is easy to revert.
Recent changes patrol. The RCP is a semi-organized group of editors who monitor changes to all the articles in Wikipedia, as the changes happen, to spot and revert vandalism immediately. Most RC patrollers use tools to handle the routine steps in vandal fighting.
Watchlists. Although the primary focus of monitoring is often content (and thus potential content disputes, as described in Chapter 10: Resolving content disputes), watchlists are an excellent way for concerned editors to spot vandalism.

Given the incredible popularity, and perceived respectability, of Wikipedia, it’s safe to say that the community’s defenses against basic vandalism and spam are holding up quite well overall.

Sybil attacks

Sybil attacks are endemic to the blockchain ecosystem. On Wikipedia, the accounts used to mount them are known as “Sockpuppets”: multiple handles controlled by the same person. They are usually employed when one person wants to seem like multiple editors, or wants to continue editing after being blocked.

While Sockpuppets are harder to detect in an automated fashion than vandalism and spam, there is a process for opening Sockpuppet investigations and a noticeboard for ongoing investigations. Well-thought-out sockpuppetry attacks are time-consuming both to mount and to defend against. While dedicated investigators (known as clerks) are well-suited to the task, it is impossible to know how much successful Sockpuppetry has yet to be discovered.

Hired guns: Editing for pay

Hired guns -- editors who make changes in exchange for pay -- are becoming an increasingly serious concern for Wikipedia, at least according to a 2018 Medium post, “Wikipedia’s Top-Secret ‘Hired Guns’ Will Make You Matter (For a Price)”, in which author Stephen Harrison writes,

“A market of pay-to-play services has emerged, where customers with the right background can drop serious money to hire editors to create pages about them; a serious ethical breach that could get worse with the rise of—wait for it—cryptocurrency payments.”

In the post, Harrison draws on a number of interviews he conducted with entrepreneurs running businesses in this controversial space. According to Harrison, businesses like What About Wiki operate in secret, utilize large numbers of sockpuppet accounts, and do not disclose the fact that their edits are being done in exchange for pay.

In the past, Wikipedia has prohibited all such activities and, in fact, businesses like What About Wiki violate Wikipedia’s Terms of Use -- a legally binding agreement. However, that seems to be changing. According to Harrison,

“A 2012 investigation discovered that the public relations firm Wiki-PR was editing the encyclopedia using multiple deceptive sock-puppet accounts for clients like Priceline and Viacom. In the wake of the Wiki-PR incident, the Wikimedia Foundation changed its terms of use in 2014 to require anyone compensated for their contributions to openly disclose their affiliation.”

The upshot is that, since 2014, paid editing has been allowed on the site so long as the relationship is disclosed.

And yet, major questions remain. For one thing, at least according to Harrison’s analysis, companies acting in compliance with Wikipedia’s disclosure policy represent just a small fraction of the paid editors working (illegitimately) on the site. For another, he argues that complying with Wikipedia’s policies leads to paid editors making less money, because there’s a lower chance their edits will be accepted and therefore less chance the clients will be willing to foot the bill.

This leads to a final question, which is whether paid edits can ever really be aligned with the deep values that Wikipedia holds. For instance, one of Wikipedia’s main behavior guidelines is a prohibition against editors working on a page where they have a conflict of interest. It’s hard to imagine a clearer conflict of interest than a paid financial relationship between the editor and the subject of a page.

DAOs

Wikipedia’s success is inspirational in terms of what can be accomplished through decentralized coordination of a large group of people. While we believe that the decentralized web still has many lessons to learn from the success of Wikipedia -- and we’ve tried to touch on a few in this post -- a great deal of work and thinking has already been done around how a large organization like Wikipedia could eventually be coordinated on-chain.

Such organizations are known as Decentralized Autonomous Organizations (DAOs), and that will be the topic of a future post.