Weighing Consensus – Building Truth on Wikipedia

In a recent piece in the Chronicle of Higher Education's “Chronicle Review” section, Timothy Messer-Kruse criticizes the editing practices of the Wikipedia community. He describes his attempt to correct what he understood to be a factual error in Wikipedia's article on the Haymarket affair, and argues that his experience demonstrates that Wikipedia limits the ability of expert editors, such as himself, to correct factual errors on the site. While Dr. Messer-Kruse believes his experience demonstrates Wikipedia's lack of respect for scholars, I believe it actually demonstrates that Wikipedia holds a deep respect for a collaborative scholarly process that is collectively more capable of producing “truth” than any individual scholar.

Wikipedia's privileging of the collaborative scholarly process has practical implications for how scholars should, and should not, interact with Wikipedia. Academic Wikipedia editors might have more satisfying Wikipedia editing experiences in the future if they respect this fact.

To understand how and why Wikipedia functions the way it does, we must first understand the day-to-day realities of Wikipedia's editing process. Because they are responsible for securing the free encyclopedia against vandals and other bad actors, editors are always on the lookout for certain patterns that, for them, indicate likely vandalism or mischief. For academics, the best analogy might be the way we scan student papers for patterns that indicate likely plagiarism. This sort of rough pattern recognition is deeply imperfect, and in the case of Messer-Kruse's edits, it produced a false positive. Nevertheless, just as teachers rely on rough patterns when working through giant stacks of student assignments, Wikipedia editors have a clear need to detect likely bad actors quickly.

If we look at Messer-Kruse's first interaction with the Wikipedia community, we can see some of the patterns that likely flagged him (incorrectly) as what Wikipedians sometimes call a “POV pusher”: a person with an ax to grind, looking to use Wikipedia as a free publishing platform for their own particular pet theories. He begins his engagement with a post to the article's talk page (a special Wikipedia page where editors discuss the process of creating and revising a particular article), writing:

The line in the entry that reads: “The prosecution, led by Julius Grinnell, did not offer evidence connecting any of the defendants with the bombing…” is inaccurate. The prosecution introduced much evidence linking several of the defendants to the manufacture of the bomb, the distribution of the bombs, and an alleged plot to attack the police on the evening of Tuesday, May 4. An eye-witness was put on the stand who claimed to have seen Spies light the fuse of the bomb. Police officers testified that Fielden returned their fire with his revolver. Now these witnesses and this evidence may be disputed, but it is historically wrong to claim it was not introduced. For more specific information, see http://blogs.bgsu.edu/haymarket/myth-2-no-evidence/ (http://en.wikipedia.org/w/index.php?title=Talk:Haymarket_affair&diff=prev&oldid=265725190)

By starting with a post to the talk page, Messer-Kruse follows good Wikipedia etiquette, which encourages new editors to discuss substantial changes they wish to make to pages before making them. (Those who wish to review the full record of Dr. Messer-Kruse's Wikipedia activity may do so, here.)

However, in crafting this talk message, Messer-Kruse has unintentionally engaged in rhetorical patterns that flag him as a potential bad actor in the eyes of experienced Wikipedians. His most significant error is citing a self-published source, his Bowling Green State University blog, in support of his desired changes to the article. This is, as several Wikipedia editors quickly point out, in violation of Wikipedia's reliable source guidelines.

In Messer-Kruse's defense, the tone taken by some editors in these early exchanges was a serious misstep on their part. They tended to engage in what is known as “wiki-lawyering,” simply spitting the wiki shorthand for the policy he had violated (WP:RS) at him, with little attempt to explain why he had made an error and no attempt to suggest how a compromise might be reached. On the article's talk page, these editors have since been called on the carpet for being unnecessarily hostile to newcomers, or “Biting the Newbies” in Wikipedia-speak.

Messer-Kruse, for his part, does not seem to absorb the reason why his blog is unacceptable as a source. After being directed to the reliable source policy by one editor, he retorts, “I have provided reliable sources. See my discussion of the McCormick's strike above in which I cite the primary sources for this information. By what standard are you claiming that http://blogs.bgsu.edu/haymarket/myth-2-no-evidence/ is not a 'reliable source.' It clearly cites primary sources in its rebutal of this myth. Perhaps its [sic] not 'reliable' sources you want but ideologically comfortable ones” (http://en.wikipedia.org/w/index.php?title=Talk:Haymarket_affair&diff=prev&oldid=265740457).

What Messer-Kruse is missing is how the reliable source policy allows Wikipedia to use the larger scholarly process of peer review for its own benefit. By preventing the use of self-published sources, and preferring secondary sources to primary sources, Wikipedia attempts to ensure that information has been subjected to the most rigorous review possible by scholars before being included in the encyclopedia. Does Messer-Kruse really believe that we should abandon this process, and simply allow any individual scholar to make novel claims about truth, regardless of their ability to convince scholarly peers? It is not some faceless herd of editors that Wikipedia defers to when evaluating truth-claims; it is the scholarly process itself. Even now, discussion is unfolding on the Haymarket affair article talk page concerning the larger scholarly response to Messer-Kruse's book. At issue is whether, in the eyes of the experts, this very recent book has indeed significantly revised our understanding of the Haymarket affair.

Wikipedia's policies here seem to have frustrated an attempt to add well-researched points to the encyclopedia, which is unfortunate. However, it is important to understand that Wikipedia editors are, every day, confronted by vast numbers of self-styled experts, many claiming academic credentials, who point to a blog or other self-published source that purports to upend this field or that based on a novel review of primary evidence. Climate science, evolutionary biology, and “western medicine” are all particularly common targets, though I have also witnessed claims to such unlikely discoveries as a grand unified field theory. While Messer-Kruse's claims are not outrageous, his use of a self-published source and his claim to a unique interpretation of historical events flag him, in the eyes of Wikipedia editors, as a potentially disruptive editor. They thus use the reliable source policy to defer the responsibility for deciding whether his claims are true to the larger process of scholarly peer review.

This deference may indeed, as Messer-Kruse points out, render Wikipedia resistant to change. It can also, as I have argued in my previous case study of Wikipedia's coverage of the Gaza conflict (see chapter 6 here for more detail), privilege points of view with greater access to the means of producing “reliable sources.” This is an important potential problem for Wikipedia. It is an even more critical problem for a web-using public that too often allows Wikipedia to serve as their primary, or only, source of information on a given topic. More must be done to ensure the greater visibility of minority opinions on the web, and to prevent so-called “filter bubble” effects that may prevent web users from consuming a diverse set of information sources.

However, I don't think that Messer-Kruse's critique of the “undue weight” policy of Wikipedia, which holds that Wikipedia should base its coverage on academic consensus on a given topic, is the best way of correcting for this potential problem. It is interesting to note that Messer-Kruse himself, in discussing a related edit to the Haymarket affair article, makes a sort of “undue weight” argument of his own. He argues that the article's casualty count for the McCormick riot (an event that would help set the stage for the later events at the Haymarket) should be changed because, “The claim that six men were killed at the McCormick riot is inaccurate. This claim comes from the report written for the anarchist newspaper by August Spies. Chicago cornorer's records and all the other daily newspapers finally settled on two deaths as the correct number” (http://en.wikipedia.org/w/index.php?title=Talk:Haymarket_affair&diff=prev&oldid=265729292). Here, Messer-Kruse is effectively arguing for the exclusion of a fringe opinion, in deference to the weight of consensus found in other sources.

Consensus, then, is an important mechanism by which we judge the validity of certain truth-claims. I believe that one reason academics like Messer-Kruse and Wikipedia editors may not see eye to eye is that they have been trained to evaluate consensus in very different ways. Academic training, especially in fields like history that stress the production of monographs, tends to privilege the scholar's own individual judgment of the consensus of evidence. Wikipedians, by necessity of the situation Wikipedia finds itself in, understand consensus to be an ongoing process involving a vast number of both scholarly and non-scholarly actors. Rather than asking Wikipedia to hew closer to any one academic's evaluation of “truth,” I would posit that we can more readily improve Wikipedia's accuracy and respect for evidence by engaging with and respecting this ongoing process. By offering our scholarly findings to the Wikipedia community as peers in a larger process of negotiating the truth, we have the best chance of helping to build a Wikipedia that truly reflects the fullest and best possible picture of the always fraught and diverse process of establishing what we know.

November 11, 2011

(Meta)Aggregating Occupy Wall Street

So, a few weeks ago, as the Occupy Wall Street movement started picking up steam and spreading beyond the initial occupation site at Zuccotti Park, I noticed that news about the various occupations, which was predominantly being spread via social media channels, often seemed fragmentary and hard to grasp in any sort of holistic way. This, it occurred to me, was basically an aggregation and metadata problem to be solved, so I suggested as much to a group of fellow academics with an interest in the digital humanities. Sadly, we're all busy teachers and academic professionals, and only one of us was an experienced coder, so we didn't produce the grand aggregation of public data on OWS I had imagined. We did, however, start to collect a database of tweets that will hopefully become a fruitful source for future research.

In the meantime, however, others have done what I suggested. This is the Web 2.0 version of the “procrastination principle”: if you have a good idea, just wait. Someone else will implement it. In this blog post, I attempt to make my own (very) small contribution to this process by providing an annotated list of the available aggregation projects: a sort of meta-aggregation, if you will.

OWS Aggregation Sites:

OccupationAlist is an attempt to be a single-page portal to the entirety of the media coming out of the occupy movements. It includes recent updates from the “We Are The 99 Percent” tumblr, arranged by date in a horizontal format that seems to have been inspired by something like iTunes' Cover Flow. They use foursquare check-ins to provide a visual representation of activity at occupation sites, and a map of occupation-related meetups. Recent video posts are on the right-hand side, and recent results of twitter searches of relevant hashtags sprawl across the bottom of the page. The attempt to be everything to everyone is ambitious, and I'm curious to see how they refine the site.
Occupy Together, an early hub site for online organization of the movement, provides a hand-edited daily news blog of events they believe to be significant to the movement, as well as organizing information and a directory of actions including action websites.
OccupyStream provides a handy way to access dozens of occupation LiveStream channels, which have often been the source of important citizen documentation of events as they unfold. Sadly, the site does not currently give the user any way of knowing whether a given channel is broadcasting, or even active, without clicking through to the channel. I'm not sure if the LiveStream site provides any way of doing this, but being able to see at a glance who was broadcasting live would be great.
Researchers may be interested in participating in occupyresearch, an interdisciplinary hub wiki for research projects investigating the movement.

The aggregation sites above are interesting, but I have found that the best way to keep abreast of news about the Occupy movement is via social media, especially twitter. Cultivating a good list of twitter sources is, of course, essential to using the medium effectively. Here are some useful twitter sources I have discovered:

R. Kevin Nelson's Occupy Wall Street list is New York-centric, but very good.
Andrew Katz is a Columbia J-school student who has been a great first-person source of Occupy Wall Street news, and a strong curator of messages from other occupations as well.
David Graeber is an anthropologist and anarchist theorist who played an important role in fomenting the initial Wall Street occupation.
Xeni Jardin is a boingboing contributor, and tweets prolifically on a wide variety of topics. I have, however, found her a useful retweet relay of occupation news, as well as other nerdy news items.

I'm sure I don't have a comprehensive list here. What am I missing? Let me know in the comments!

September 20, 2011

Netflix, Strategy, and the First Sale Doctrine

When I was a younger man, I used to fancy strategy wargames. I thought I was pretty good at them too, until I played Stea. Stea was a hacker's hacker, the man who first taught me Unix, a person for whom logical forms of abstraction and analysis were as natural as breathing. By the time I started my second turn of our game, I had already lost. There were better moves I could make, and worse moves, but all the moves I could make led to my losing. That, I learned, was what the art of strategy was: the practice of giving your opponent only losing moves.

To me, that's what Netflix's announcement that it will spin off its DVD-by-mail service looks like: a losing move made by a desperate player. The best analysis I've seen of why Netflix would make this seemingly counter-intuitive move argues that Netflix is intentionally throwing DVD-by-mail distribution overboard, ridding itself of the expensive baggage of distribution centers, warehouses and (paging Nicholas Carr) workers to move forward into a future dominated by digital streaming. Discs are dead. Burn the boats.

This logic makes sense, but can Netflix survive on the ground it is moving forward onto? As a distributor of physical discs, Netflix enjoyed the protection of the first sale doctrine, which holds that purchasers of books, video cassettes, DVDs, Blu-ray discs, or other tangible copies of media, have the right to do as they please with that particular copy. The first sale doctrine meant that Netflix was free to rent the same discs sold to consumers, and that publishers couldn't easily stop them from running a rental business without withholding content from the general public. In a sense, Netflix got its start by being a bit of a clever hack, leveraging the first sale doctrine and business reply mail rules to build an innovative and inexpensive way for consumers to access a vast library of video recordings.

In the streaming environment, things are different. Netflix must obtain permission from publishers to stream movies to consumers. If it wants, say, access to NBC Universal content, it has to deal with Comcast. Why should a vertically integrated entity like Comcast allow Netflix to take a piece of the action for streaming content it owns across a network it also owns large pieces of (and which it has already attempted to limit Netflix's access to)? I don't see how that equation works. All Netflix has to bring to the table here is the good will of its customers, good will it hasn't exactly been cultivating of late.

That said, they may retain me, at least, as a customer for a little while longer. The reason? They are keeping the envelopes red. The other thing I learned, all those years ago, watching Stea march his armies toward me across the game board in impeccable order, is that I am a hopeless romantic. I was too busy building beautiful bomber formations to bother with actually winning the game. As long as I can get red envelopes in the mail, I'll probably stick with Netflix (or Qwikster, or whatever) until the end of their losing game.

September 11, 2011

Computer Generated News

An article in yesterday's New York Times reports on recent advances in using software to automatically generate sports reporting. The software, created by a firm called Narrative Science, reportedly generates human-like text, and has already had one big operational success:

“Last fall, the Big Ten Network began using Narrative Science for updates of football and basketball games. Those reports helped drive a surge in referrals to the Web site from Google’s search algorithm, which highly ranks new content on popular subjects, Mr. Calderon says.”

The role of Google here cannot be stressed enough. Once again, the preferences of the search engine giant are shaping our contemporary media environment in profound ways, perhaps without much conscious reflection on our part.

My biggest anxiety in cases like this is always the one expressed by Norbert Wiener at the close of his 1948 volume Cybernetics:

“The modern industrial revolution is similarly bound to devalue the human brain at least in its simpler and more routine decisions. Of course, just as the skilled carpenter, the skilled mechanic, the skilled dressmaker have in some degree survived the first industrial revolution, so the skilled scientist and the skilled administrator may survive the second. However, taking the second revolution as accomplished, the average human being of mediocre attainments or less has nothing to sell that is worth anyone’s money to buy.”

September 10, 2011

What should the humanities be?

A while ago I wrote a blog post expressing my frustrations with the available definitions for the collection of disciplines known as “the humanities.” You can read it on TechStyle, the group blog for Georgia Tech's Brittain Fellows, here. I explained how I didn't think defining the humanities in terms of the canon of “literature,” the method of “reading,” or the advancement of “values” could adequately provide a framework for an academic discipline. Today, briefly and humbly, I would like to propose a definition I think could serve as a framework for the humanities.

The definition I propose is quite simple: the humanities are the disciplines concerned with the production, distribution, and interpretation of human readable texts.

I'm borrowing my use of the term human readable from the Creative Commons project. Creative Commons builds on the distinction, likely familiar to digital humanists and computer scientists alike, between machine readable codes, which are designed to be interpreted by a computer, and human readable codes, which are designed to be interpreted by a person. Creative Commons, for example, creates a machine readable version of their licenses, designed to permit search engines to automatically discover works that have been released with particular re-use rights, and a human readable version of their license, designed to permit “ordinary people” to understand the terms of a particular license and what these terms mean. However, Creative Commons further distinguishes the human readable version of the license from the technical legal code of the license itself. This legal code has sometimes been dubbed the “lawyer readable version.” To fully appreciate the difference between human readable and lawyer readable, you can compare the human readable version of the Creative Commons Attribution license to the full legal code of the same license.

My suggestion, then, is that the humanities should focus on texts that are human readable in the sense that Creative Commons human readable licenses are intended to be. That is to say, texts that are written to be read by a varied audience, rather than by a narrow group of professionals with intensive and explicit training in interpreting these texts. Texts that are meant to serve as contact zones, where a variety of constituencies might negotiate common understandings of shared issues.

I propose that we focus on the human readable, but not that we limit ourselves to it. Clearly the human readable is always deeply interlinked with a wide variety of other actors: legal and machine codes, media technologies, economic entities, human biology. I only suggest that we make the human readable our point of entry. I believe it is an important point of entry. After all, for all of the specialized knowledge produced by our highly technical and segmented culture, we still rely on human readable texts to build political and economic coalitions that span these specialized forms of knowledge. The science of climate change, for example, cannot affect the political and economic processes that shape the human influence on the climate without the production of human readable texts that explain the significance of the science. Furthermore, these texts do not operate in a vacuum; their reception is shaped by earlier texts.

So, that is my modest proposal. The humanities as the study of human readable texts. What do people think?

May 12, 2011

Friends Don't Let Friends CC-BY

In the last day or so it seems the old Creative Commons license debate has flared back to bright and vibrant life. Perhaps it is a rite of spring, since a bit of googling reveals that LAST spring Nina Paley and Cory Doctorow embarked on a long debate on this very issue. From where I'm sitting (on my twitter feed) there seems to be an emerging consensus in this spring's license campaign that the simple CC-BY license is the best choice for scholars and artists interested in contributing their work to a vibrant intellectual commons. This point of view seems best summarized by Bethany Nowviskie in her blog post “why, oh why, CC-BY.” In her post, Nowviskie notes that the CC-BY license allows for the broadest possible redistribution of work by clearing all possible restrictions on re-use. This openness, she argues, has several benefits: it gives her work the potential to be “bundled in some form that can support its own production by charging a fee, [and help] humanities publishers to experiment with new ways forward,” it removes the risk that her material could someday become part of the growing body of “orphaned work,” and it ensures that she isn't left behind by commercial textbooks (which, she points out, “will go on with out me”). She argues that restrictive CC clauses, like the NC clause, represent misguided attempts to retain pride of place by original creators unwilling to give over their work to the commons freely. She concludes that “CC-BY is more in line with the practical and ideological goals of the Commons, and the little contribution I want to make to it.”

I understand where Nowviskie is coming from, and I think the generous impulse she is following to make her work free to everyone, without restrictions, is an admirable one. Like her, I believe that individual authors should set their work free, as it were, and not try to exercise control over what becomes of the words, images, or sounds they produce. She is correct in asserting that CC-BY is the license that most perfectly negates the dubious property rights that individual authors are granted by copyright. Ultimately, however, I think she is wrong about the collective effect of artists and scholars releasing their work under the CC-BY license. I do not believe the CC-BY license does enough to protect and maintain a vibrant intellectual commons.

Here's why. The CC-BY license is, as Nowviskie points out, the Creative Commons license most similar to releasing one's work into the public domain. The problem is, we know what happens to an unprotected public domain in the presence of large, rapacious commercial interests with a vested interest in the production of “intellectual property”: it is sucked up, propertized, and spit back out in a form the commons can't use. Lawrence Lessig tells this story forcefully and eloquently in Free Culture, as does James Boyle in his “Second Enclosure Movement.” The example of Disney's transformation of folk and fairy tales is perhaps the clearest. The old stories that Disney based many of its early movies on were free for anyone to re-imagine; the versions Disney made (which are, for our culture, the definitive versions, thanks to Disney's ubiquitous publishing reach) are strictly controlled property. The revision of copyright law (shaped in part by Disney's lobbyists) threatens to keep these versions out of the public domain forever. There is nothing stopping a textbook publisher from scooping up Nowviskie's work (or the work of any other scholar publishing under CC-BY) and performing the same trick, producing a modified version that would be lost to the commons (and which might be in thousands of classrooms, becoming the definitive version for students). Without protection, the commons becomes fodder for commercial intellectual property producers, who take from it but give nothing back. This exploitation harms the commons in several ways: it prevents the re-use of what are often the best-known versions of works, it reinforces a system of production that insists on propertizing or otherwise monetizing content to support producers, and it may alienate creators who want to give their work to the commons but feel taken advantage of by commercial uses of their work.

For this reason, I strongly recommend that everyone use either the Share Alike (SA) clause, which requires re-users to release derivative works under the same license, or the Non-Commercial (NC) clause on their CC-licensed work. I use both, just to be sure. While some might argue that these clauses should be adopted by those who prefer them and abandoned by those who don't, depending on their personal feelings about the re-use of their work, I hold that the building of the commons is a collective endeavor, and that we must all collectively choose to prevent the enclosure of the new commons we are building together. My work is not very valuable on its own, but combined with the work of all the other contributors to the commons, it forms a body of work worth protecting from those who would take from our community without giving anything back.

PS: This blog is not clearly labelled with the CC-SA-NC license because I am in the middle of a site redesign (I had to push this post out while the debate was hot)… this blog is, however, under CC-SA-NC.

PPS: The redesign is also why everything is such a mess! Come back soon for a nicely designed site!


February 19, 2011

Watson + Capitalism = ???

Earlier this week, a piece of natural language processing software developed by IBM, dubbed Watson, decisively defeated two human opponents on the game show Jeopardy. The potential implications of this technology seem immense. Wired reports that IBM “sees a future in which fields like medical diagnosis, business analytics, and tech support are automated by question-answering software like Watson.” One of the humans Watson trounced, former Jeopardy champion Ken Jennings, mused in the same article, “'Quiz show contestant' may be the first job made redundant by Watson, but I'm sure it won't be the last.”

The question, for me, is what the larger implications of this emerging automation of intellectual work are for our political economy. What happens when we automate vast numbers of service-sector jobs, the same jobs that absorbed (some of) the workers automation had displaced from the manufacturing sector? Are we on the cusp of a moment, predicted long ago by cybernetics pioneer Norbert Wiener, when “the average human being […] has nothing to sell that is worth anyone's money to buy”?

I find the notion all too plausible. Blame my time spent reading Peter Watts. During a discussion of Watson's victory on twitter, emerging media scholar David Parry, always a bigger optimist than me, suggested a skill that may remain the unique domain of human beings. In response to a half-joking tweet in which a fellow academic questioned her own employability in a post-Watson world, Dave wrote, “well yeah, that's why we need academics who can do critical thinking, computers aren't so good at that yet.”

Critical thinking is a good thing, and indeed something computers still struggle with. However, under capitalism, meaningful critical thinking (the ability to evaluate arguments, reflect on the big-picture situation, and enact alternatives to the status quo) is exactly what has been denied the working class. Critical thinking is reserved for capital; the cognitive resources of the working class have been employed in quite a different mode, and one that machines like Watson will likely find all too easy to replicate. This is not to say, of course, that working-class people are incapable of critical thought, or that they don't employ critical thinking in their daily lives, only that this thought has not been granted economic value under capitalism.

The question we must ask, then, is what sort of shifts could be made in our political economy to accommodate technologies like Watson, and what sort we are likely to make. Could we shift our productive mode to value critical thinking by ordinary people? Will we devalue the labor of a vast cross-section of humanity, further destroying the middle class? What tactics or moves might make one shift more likely than the other?

Clearly, I don't know. What do you think?

January 2, 2011

My diss in one sentence

“This suggests that those interested in intervening in Wikipedia, or other peer-production based projects, might be better served by focusing on changing the terms of negotiation between interested parties, rather than technologically empowering individuals.”

October 5, 2010

The Rhetorics of the Information Society – Michael Jones at Future Media Fest

24 hours of video per minute

That's the rate at which digital footage is being uploaded to YouTube, according to Michael Jones' opening keynote presentation at Future Media Fest. Jones, who is Chief Technology Advocate at Google, cited the number as part of his argument that digital communication technology is becoming ever more ubiquitous. Understandably, he saw Google as playing an important role in this ubiquitous information environment.

This image of thousands of camera-phone eyes feeding days of video into Google as the minutes tick by may, for some media theorists, call to mind the Panopticon, the model prison made famous by French philosopher Michel Foucault, in which prisoners held in transparent cells around the perimeter had their every move watched by a concealed figure in a central tower. The Panopticon, Foucault explained, was designed to teach prisoners to internalize the values of their guards: because they never knew whether a guard was watching, they began to watch themselves.

Is Google a modern-day Panopticon, watching over us all and invisibly guiding (Foucault would say "disciplining") our behavior? Jones didn't think so. He went to great pains to describe Google as a passive entity. "We are your servant," he said at one point. At another, he claimed, "we don't make decisions, we take what humans do and we amplify it." As examples, he cited the ways in which Google tried to reflect the needs of its customers. He described how users of Google Maps were active participants in the process of drawing the maps that Google served, especially in developing countries. Explaining the motivations of contributors to the Google Maps project, Jones said, "they didn't want me to have their map, they wanted their friends to have their map." Finally, a questioner asked how Google could claim to reflect already existing behavior when values were always embedded in technology. Jones replied that data harvested from users was used to develop the technology itself. For example, he explained that the size of the buttons in Google's Gmail webmail service had not been set by some top-down exercise of expertise. Rather, different button sizes had been shown to different users, and the ideal size had been chosen based on data about how quickly users reacted when using each one.

All this should, of course, be taken with a grain of salt. Any time an executive of a major corporation argues that his company is basically powerless, it suggests the company has become aware of popular anxieties about its power. Certainly, this is true for Google. Jones' claims that Google is passive and reflective also seem to overlook an observation he made earlier in his presentation, when he noted that "Henry Ford changed the way cities were designed." Just as the automobile transformed the American urban landscape, leading to, among other things, the rise of the suburbs, it is difficult to imagine that a technology as powerful as search could fail to transform our patterns of behavior.

That said, I think that Jones' apology for Google makes clear some important differences between the 19th-century technology of the Panopticon and the 21st-century technology of search. In the Panopticon, a human agent stood in the tower and imposed rational, intentional values on the confined prisoners, encouraging them to adopt regimented work habits and abandon dangerous transgressions. By contrast, nothing human could possibly process the surveillance performed by Google. Just watching a single day's worth of YouTube uploads would take nearly four years of round-the-clock viewing! Instead, what seems to stand at the center of Google's apparatus of search (to the extent that there is such a thing) is something else entirely, something lashed together out of computer algorithms and pre-conscious thought. Something that adjusts buttons without us noticing and sums together collective contributions to make a map.

This should not be, in and of itself, frightening. The mastery of human consciousness was always a bit of an illusion. However, I do think we may need to reflect on whom the mechanisms of search benefit, and on what larger transformations this shift from intention to algorithm may entail.
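The estimate that a day's worth of YouTube uploads would take years to watch follows directly from the 24-hours-per-minute figure Jones cited. The short Python calculation below works it through, assuming a single viewer watching around the clock with no breaks and a 365-day year.

```python
# Figure cited in Jones' keynote: 24 hours of video uploaded per minute.
HOURS_UPLOADED_PER_MINUTE = 24
MINUTES_PER_DAY = 24 * 60

# Hours of new video arriving in a single day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY  # 34,560 hours

# Time one person would need to watch it all, viewing around the clock.
viewing_days = hours_per_day / 24  # 1,440 days
viewing_years = viewing_days / 365  # about 3.9 years

print(f"{hours_per_day:,} hours of video per day")
print(f"{viewing_days:,.0f} days, or about {viewing_years:.1f} years, of nonstop viewing")
```

The result, a little under four years for one day of uploads, makes the point concrete: no human watcher could keep pace.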

July 22, 2010

More Wikipedia Clouds

I put together two more Wikipedia word clouds, in part because I wanted an excuse to work on my Python coding skills, and in part because I enjoy word clouds as an interesting visualization. For these word clouds, I used a Python script to organize the information I scraped from the Wikipedia Zeitgeist page (see prior post for link). The resulting file listed each article's title along with the number of times it had been edited in each month it made the list. By running this file through the Wordle software, I was able to produce a word cloud that displays the titles at relative sizes determined by the number of edits each received in a single month.

The image above shows that the Wikipedia article on the Virginia Tech Massacre probably received more edits in a single month than any other English Wikipedia article. If you look closely, though (click through to the larger size on Flickr), you can see some articles, like the one on George W. Bush, represented by many smaller entries in the word cloud. These entries reflect the many months that the George W. Bush article was one of the most edited articles on the English Wikipedia, even though it was never edited nearly as many times in a single month as the Virginia Tech Massacre article.

Here is the same data, with some of the less-edited articles left out. The result is less visually impressive, but a little more legible.
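The scripts themselves don't appear here, so what follows is only a minimal Python sketch of the kind of reformatting step described above, not the actual code. It assumes each scraped entry has already been reduced to a hypothetical (month, title, edit_count) tuple, and it writes one title:count line per entry; that weighted-word layout for Wordle's input is also an assumption. Because each monthly appearance gets its own line, an article that made the list many times shows up as many separate words. The hypothetical min_edits cutoff is one simple way to leave out the less-edited articles.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (month, article_title, edit_count), one per
# appearance of an article on a monthly Wikipedia Zeitgeist list.
Record = Tuple[str, str, int]


def weighted_lines(records: Iterable[Record], min_edits: int = 0) -> List[str]:
    """Return one 'title:count' line per (month, title) entry.

    An article that made the list in several months yields several lines,
    so it appears as several separately sized words in the cloud.
    Entries with fewer than min_edits edits are dropped.
    """
    lines = []
    for _month, title, edits in records:
        if edits < min_edits:
            continue  # leave out the less-edited entries
        # A colon inside a title would clash with the assumed separator.
        lines.append(f"{title.replace(':', ' ')}:{edits}")
    return lines


def write_wordle_input(records: Iterable[Record], path: str, min_edits: int = 0) -> None:
    """Write the weighted lines to a plain-text file, one entry per line."""
    with open(path, "w", encoding="utf-8") as out:
        for line in weighted_lines(records, min_edits):
            out.write(line + "\n")
```

Raising min_edits trades visual density for legibility, much like the pared-down second cloud.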

Next, I'll modify my script to count up all the edits and display a cloud showing which titles are the most edited articles on the English Wikipedia ever!
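One way to sketch that planned modification, again in Python and again assuming the same hypothetical (month, title, edit_count) tuples, is to sum each title's monthly counts with collections.Counter and keep the largest totals. The top parameter is illustrative, not part of any original script. Because the Zeitgeist lists only record an article in months when it ranked among the most edited, these totals would be a lower bound on each article's lifetime edits.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record shape: (month, article_title, edit_count).
Record = Tuple[str, str, int]


def total_edits(records: Iterable[Record]) -> Counter:
    """Sum the monthly edit counts for each title across all months."""
    totals: Counter = Counter()
    for _month, title, edits in records:
        totals[title] += edits
    return totals


def cumulative_lines(records: Iterable[Record], top: int = 100) -> List[str]:
    """Return one 'title:total' line per article, most-edited first."""
    return [
        f"{title.replace(':', ' ')}:{total}"
        for title, total in total_edits(records).most_common(top)
    ]
```

Unlike the monthly version, each article now appears exactly once, sized by its summed edits.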

Andy Famiglietti’s Entirely Modest Web-Presence
