Computer Generated News

An article in yesterday's New York Times reports on recent advances in using software to automatically generate sports reporting. The software, created by a firm called Narrative Science, reportedly generates human-like text, and has already had one big operational success:

“Last fall, the Big Ten Network began using Narrative Science for updates of football and basketball games. Those reports helped drive a surge in referrals to the Web site from Google’s search algorithm, which highly ranks new content on popular subjects, Mr. Calderon says.”

The role of Google here cannot be stressed enough. Once again, the preferences of the search engine giant are shaping our contemporary media environment in profound ways, perhaps without much conscious reflection on our part.

My biggest anxiety in cases like this is always the one expressed by Norbert Wiener in his 1948 volume Cybernetics:

“The modern industrial revolution is similarly bound to devalue the human brain at least in its simpler and more routine decisions. Of course, just as the skilled carpenter, the skilled mechanic, the skilled dressmaker have in some degree survived the first industrial revolution, so the skilled scientist and the skilled administrator may survive the second. However, taking the second revolution as accomplished, the average human being of mediocre attainments or less has nothing to sell that is worth anyone’s money to buy.”

What should the humanities be?

A while ago I wrote a blog post expressing my frustrations with the available definitions for the collection of disciplines known as “the humanities.” You can read it on TechStyle, the group blog for Georgia Tech's Brittain Fellows, here. I explained how I didn't think defining the humanities in terms of the canon of “literature,” the method of “reading,” or the advancement of “values” could adequately provide a framework for an academic discipline. Today, briefly and humbly, I would like to propose a definition I think could serve as a framework for the humanities.

The definition I propose is quite simple: the humanities are the disciplines concerned with the production, distribution, and interpretation of human readable texts.

I'm borrowing my use of the term human readable from the Creative Commons project. Creative Commons builds on the distinction, likely familiar to digital humanists and computer scientists alike, between machine readable codes, which are designed to be interpreted by a computer, and human readable codes, which are designed to be interpreted by a person. Creative Commons, for example, creates a machine readable version of their licenses, designed to permit search engines to automatically discover works that have been released with particular re-use rights, and a human readable version of their license, designed to permit “ordinary people” to understand the terms of a particular license and what these terms mean. However, Creative Commons further distinguishes the human readable version of the license from the technical legal code of the license itself. This legal code has sometimes been dubbed the “lawyer readable version.” To fully appreciate the difference between human readable and lawyer readable, you can compare the human readable version of the Creative Commons Attribution license to the full legal code of the same license.

My suggestion, then, is that the humanities should focus on texts that are human readable in the sense that Creative Commons human readable licenses are intended to be: texts written to be read by a varied audience, rather than by a narrow group of professionals with intensive and explicit training in interpreting them; texts meant to serve as contact zones, where a variety of constituencies might negotiate common understandings of shared issues.

I propose that we focus on the human readable, but not that we limit ourselves to it. Clearly the human readable is always deeply interlinked with a wide variety of other actors: legal and machine codes, media technologies, economic entities, human biology. I only suggest that we make the human readable our point of entry. I believe it is an important point of entry. After all, for all of the specialized knowledge produced by our highly technical and segmented culture, we still rely on human readable texts to build political and economic coalitions that span these specialized forms of knowledge. The science of climate change, for example, cannot impact the political and economic processes that shape the human influence on the climate without the production of human readable texts that explain the significance of the science. Furthermore, these texts do not operate in a vacuum; rather, their reception is shaped by earlier texts.

So, that is my modest proposal. The humanities as the study of human readable texts. What do people think?

Friends Don't Let Friends CC-BY

In the last day or so it seems the old Creative Commons license debate has flared back to bright and vibrant life. Perhaps it is a rite of spring, since a bit of googling reveals that LAST spring Nina Paley and Cory Doctorow embarked on a long debate on this very issue. From where I'm sitting (on my twitter feed) there seems to be an emerging consensus in this spring's license campaign that the simple CC-BY license is the best choice for scholars and artists interested in contributing their work to a vibrant intellectual commons. This point of view seems best summarized by Bethany Nowviskie in her blog post “why, oh why, CC-BY.” In her post, Nowviskie notes that the CC-BY license allows for the broadest possible redistribution of work by clearing all possible restrictions on re-use. This openness, she argues, has several benefits: it gives her work the potential to be “bundled in some form that can support its own production by charging a fee, [and help] humanities publishers to experiment with new ways forward,” it removes the potential that her material could someday become part of the growing body of “orphaned work,” and ensures that she isn't left behind by commercial textbooks (which, she points out “will go on with out me”). She argues that restrictive CC clauses, like the NC clause, represent misguided attempts to retain pride-of-place by original creators unwilling to give over their work to the commons freely. She concludes that “CC-BY is more in line with the practical and ideological goals of the Commons, and the little contribution I want to make to it.”

I understand where Nowviskie is coming from, and I think the generous impulse she is following in making her work free to everyone, without restrictions, is an admirable one. Like her, I believe that individual authors should set their work free, as it were, and not try to exercise control over what becomes of the words, images, or sounds they produce. She is correct in asserting that CC-BY is the license that most completely negates the dubious property rights individual authors are granted by copyright. However, ultimately I think she is wrong about the collective effect of artists and scholars releasing their work under the CC-BY license. I do not believe the CC-BY license does enough to protect and maintain a vibrant intellectual commons.

Here's why. The CC-BY license is, as Nowviskie points out, the Creative Commons license most similar to releasing one's work into the public domain. The problem is, we know what happens to an unprotected public domain in the presence of large, rapacious commercial interests invested in the production of “intellectual property”: it is sucked up, propertized, and spit back out in a form the commons can't use. Lawrence Lessig tells this story forcefully and eloquently in Free Culture, as does James Boyle in “The Second Enclosure Movement.” The example of Disney's transformation of folk and fairy tales is perhaps the clearest. The old stories that Disney based many of its early movies on were free for anyone to re-imagine; the versions Disney made (which are, for our culture, the definitive versions, thanks to Disney's ubiquitous publishing reach) are strictly controlled property. The revision of copyright law (shaped in part by Disney's lobbyists) threatens to keep these versions out of the public domain forever. There is nothing stopping a textbook publisher from scooping up Nowviskie's work (or the work of any other scholar publishing under CC-BY) and performing the same trick, producing a modified version that would be lost to the commons (and which might be in thousands of classrooms, becoming the definitive version for students). Without protection, the commons becomes fodder for commercial intellectual property producers, who take from it but give nothing back. This exploitation harms the commons in several ways: it prevents the re-use of what are often the best known versions of works, it reinforces a system of production that insists on propertizing or otherwise monetizing content to support producers, and it may alienate creators who want to give their work to the commons but feel taken advantage of by commercial uses of it.

For this reason, I strongly recommend that everyone add either the Share Alike (SA) clause, which forces re-users to release derivative works under a Creative Commons license, or the Non-Commercial (NC) clause to their CC licensed work. I use both, just to be sure. While some might argue that these clauses should be adopted by those who prefer them and abandoned by those who don't, depending on personal feelings about the re-use of one's work, I hold that the building of the commons is a collective endeavor, and that we must all collectively choose to prevent the enclosure of the new commons we are building together. My work is not very valuable on its own, but combined with the work of all the other contributors to the commons, it forms a body of work worth protecting from those who would take from our community without giving anything back.

PS: This blog is not clearly labelled with the CC-SA-NC license because I am in the middle of a site redesign (I had to push this post out while the debate was hot)… this blog is, however, under CC-SA-NC

PPS: The redesign is also why everything is such a mess! Come back soon for a nicely designed site!



Watson + Capitalism = ???

Earlier this week, a piece of natural language processing software dubbed Watson, developed by IBM, decisively defeated two human opponents on the game show Jeopardy!. The potential implications of this technology seem immense. Wired reports that IBM “sees a future in which fields like medical diagnosis, business analytics, and tech support are automated by question-answering software like Watson.” One of the humans Watson trounced, former Jeopardy! champion Ken Jennings, mused in the same article, “'Quiz show contestant' may be the first job made redundant by Watson, but I'm sure it won't be the last.”

The question, for me, is what the larger implications of this emerging automation of intellectual work are for our political economy. What happens when we automate vast numbers of service sector jobs, the same jobs that absorbed (some of) the workers automation had already eliminated from manufacturing? Are we on the cusp of a moment, predicted long ago by cybernetics pioneer Norbert Wiener, when “the average human being […] has nothing to sell that's worth anyone's money to buy?”

I find the notion all too plausible. Blame my time spent reading Peter Watts. Emerging media scholar David Parry, always a bigger optimist than me, suggested a skill that may remain the unique domain of human beings during a discussion of Watson's victory on twitter. In response to a half-joking tweet in which a fellow academic questioned her own employability in a post-Watson world, Dave wrote, “well yeah, that's why we need academics who can do critical thinking, computers aren't so good at that yet.”

Critical thinking is a good thing, and indeed something computers still struggle with. However, under capitalism, meaningful critical thinking, the ability to evaluate arguments, reflect on the big-picture situation, and enact alternatives to the status quo, is exactly what has been denied the working class. Critical thinking is reserved for capital; the cognitive resources of the working class have been employed in quite a different mode, one that machines like Watson will likely find all too easy to replicate. This is not to say, of course, that working class people are incapable of critical thought, or that they don't employ critical thinking in their daily lives, only that this thought has not been granted economic value under capitalism.

The question we must ask, then, is what sort of shifts we could make in our political economy to accommodate technologies like Watson, and what sort of shifts we are likely to make. Could we shift our productive mode to value critical thinking by ordinary people? Will we devalue the labor of a vast cross-section of humanity, further destroying the middle class? What tactics or moves might make one shift more likely than the other?

Clearly, I don't know. What do you think?

My diss in one sentence

“This suggests that those interested in intervening in Wikipedia, or other peer-production based projects, might be better served by focusing on changing the terms of negotiation between interested parties, rather than technologically empowering individuals.”

The Rhetorics of the Information Society – Michael Jones at Future Media Fest

24 hours of video per minute

That's the rate at which digital footage is being uploaded to YouTube, according to Michael Jones' opening keynote presentation at Future Media Fest. Jones, who is Chief Technology Advocate at Google, cited the number as part of his argument that digital communication technology is becoming ever more ubiquitous. Understandably, he saw Google as playing an important role in this ubiquitous information environment.

This image, of thousands of camera-phone eyes feeding days of video into Google as the minutes tick by, may, for some media theorists, call to mind the Panopticon, the model prison made famous by French philosopher Michel Foucault, in which prisoners arranged in transparent cells at the perimeter had their every move watched by a concealed figure in a central tower. The Panopticon, Foucault explained, was designed to teach prisoners to internalize the values of their guards: because they never knew whether a guard was watching, they began to watch themselves.

Is Google a modern day Panopticon, watching over us all, invisibly guiding, Foucault would say “disciplining,” our behavior? Jones didn't think so. He went to great pains to describe Google as a passive entity. “We are your servant,” he said at one point. At another, he claimed, “we don't make decisions, we take what humans do and we amplify it.” As examples, he cited the ways in which Google tried to reflect the needs of its customers. He described how users of Google Maps were active participants in the process of drawing the maps that Google served, especially in developing countries. Explaining the motivations of contributors to the Google Maps project, Jones said, “they didn't want me to have their map, they wanted their friends to have their map.” Finally, in response to a questioner who asked how Google could claim to be a reflection of already existing behavior when values are always embedded in technology, Jones replied that data harvested from users was used to develop the technology itself. For example, he explained that the size of the buttons in Google's Gmail webmail service had not been designed by some top-down process of expertise; rather, different button sizes had been served to different users, and the ideal size had been determined from data collected on the users' reaction times with each variant.
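The kind of data-driven selection Jones describes can be sketched in a few lines. To be clear, this is a hypothetical illustration of the general A/B-testing idea, not Google's actual process; the variant labels, timings, and function names are all invented for the example.

```python
from statistics import mean

def pick_best_variant(logs):
    """Given a mapping from a UI variant (e.g. a button size) to a list
    of observed reaction times in milliseconds, return the variant with
    the lowest mean reaction time."""
    return min(logs, key=lambda variant: mean(logs[variant]))

# Illustrative (invented) reaction-time logs for three button sizes.
logs = {
    "28px": [412, 398, 440, 405],
    "32px": [371, 355, 388, 362],
    "36px": [390, 402, 377, 395],
}
print(pick_best_variant(logs))  # → 32px
```

In a real deployment the hard part is randomly assigning variants to a large slice of users and logging their interactions; the final selection step, though, really can be this simple.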

All this should, of course, be taken with a grain of salt. Anytime an executive officer of a major corporation argues that his company is basically powerless, it suggests the company has become aware of popular anxieties about its power. Certainly, this is true for Google. Jones' claims that Google is passive and reflective also seem to overlook an observation he made earlier in his presentation, when he noted that “Henry Ford changed the way cities were designed.” Just as the automobile transformed the American urban landscape, leading to, among other things, the rise of the suburbs, it is difficult to imagine that a technology as powerful as search could fail to transform our patterns of behavior.

That said, however, I think that Jones' apology for Google makes clear important differences between the 19th century technology of the Panopticon and the 21st century technology of search. Unlike the Panopticon, where a human agent stood in the tower and imposed rational, intentional values on the confined prisoners, encouraging them to adopt regimented work habits and abandon dangerous transgressions, no human could possibly process the surveillance performed by Google. Just to watch a single day's worth of YouTube uploads would take nearly four years of nonstop viewing! Instead, what seems to stand at the center of Google's apparatus of search (to the extent that there is such a thing) is something else entirely, something lashed together out of computer algorithms and pre-conscious thought. Something that adjusts buttons without us noticing and sums together collective contributions to make a map.
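A quick back-of-the-envelope check of that figure, taking the 24-hours-of-video-per-minute upload rate cited in the keynote at face value:

```python
# 24 hours of video uploaded per minute, 1440 minutes per day.
hours_uploaded_per_day = 24 * 24 * 60          # 34,560 hours
days_of_viewing = hours_uploaded_per_day / 24  # watching around the clock
years_of_viewing = days_of_viewing / 365
print(round(years_of_viewing, 1))  # → 3.9
```

So one day of uploads amounts to roughly four years of continuous viewing, and far longer at any humanly sustainable pace.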

This should not be, in and of itself, frightening. The mastery of human consciousness was always a bit of an illusion. However, I do think we may need to do some reflection about who the mechanisms of search benefit, and what larger transformations this shift from intention to algorithm may entail.

More Wikipedia Clouds

I put together two more Wikipedia word clouds, in part because I wanted an excuse to work on my Python coding skills, and in part because I enjoy word clouds as an interesting visualization. For these word clouds, I used a Python script to organize the information I scraped from the Wikipedia Zeitgeist page (see prior post for link). The resulting file listed the titles of articles and the number of times each article had been edited for the month(s) it had made the list. By running this file through the Wordle software, I was able to produce a word cloud that displays the titles with their relative sizes determined by the number of edits they had received in a single month.
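For the curious, the script's core logic looks roughly like the sketch below. The line format, the sample data, and the “title:weight” input convention for Wordle are simplified assumptions for illustration; my actual script, and the real Zeitgeist page markup, are messier.

```python
import re

def parse_zeitgeist(text):
    """Extract (article title, monthly edit count) pairs from lines in a
    hypothetical 'Title -- 1,234 edits' format."""
    pairs = []
    for match in re.finditer(r"^(.+?) -- ([\d,]+) edits$", text, re.MULTILINE):
        title = match.group(1)
        count = int(match.group(2).replace(",", ""))
        pairs.append((title, count))
    return pairs

def wordle_input(pairs):
    """Emit 'word:weight' lines for Wordle's advanced mode; replacing
    spaces with ~ keeps a multi-word title together as one token."""
    return "\n".join(f"{title.replace(' ', '~')}:{count}" for title, count in pairs)

# Invented sample input in the assumed format.
sample = "George W. Bush -- 1,500 edits\nVirginia Tech massacre -- 2,000 edits"
print(wordle_input(parse_zeitgeist(sample)))
```

The same parsed pairs can later be summed per title to produce the all-time cloud mentioned below.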

Most edits in one month

The image above shows that the Wikipedia article on the Virginia Tech Massacre probably has the largest number of edits in a single month for any one English Wikipedia article, though if you look closely (click through to the larger size on Flickr) you can see some articles, like the one on George W. Bush, represented by many smaller entries in the word cloud. This represents the many months that the George W. Bush article was one of the most edited articles on the English Wikipedia, even though it was never edited nearly as many times in a single month as the Virginia Tech Massacre article.

Here is the same data, with some of the less-edited articles left out. The result is less visually impressive, but a little more legible.

Most edited articles 2

Next, I'll modify my script to count up all the edits and display a cloud showing which titles are the most edited articles on the English Wikipedia ever!

Wikipedia Zeitgeist

Just for fun, here's a quick and dirty word cloud built by running the data from the most edited articles on the English Wikipedia (found here) through the IBM software that powers the website Wordle.

wikipedia zeitgeist

Fun Fact: Wikipedia's Most Edited Pages

Wikipedia maintains a massive archive of statistical data on the project. Among this data is a list of the 50 most edited pages on the English Wikipedia. Of these 50 most edited pages, all but two are pages having to do with project maintenance, such as the page that is used to notify administrators of vandalism.

Only one actual article is listed, the article for George W. Bush. The other non-project maintenance page? The talk page for the article on Barack Obama.

Bitcoin and the Libertarian Individual

Ellen Ullman, in her excellent memoir Close to the Machine, describes an odd young man she briefly took up with as an on-again, off-again lover. Among his many obsessions was the notion of creating a cryptographic currency, a wholly anonymous and independent banking system. Well, it looks like someone has gone and implemented this idea. The Bitcoin project “is a peer-to-peer network based digital currency.” It apparently derives its backing from CPU processor cycles. I'm not exactly sure how that works, but the Ron Paul-esque libertarian dreams of the creators are quite clear in their description of the project's advantages: “Be safe from the instability caused by fractional reserve banking and bad policies of central banks.”
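For readers wondering what “backing from CPU cycles” could mean: the underlying idea is proof-of-work, in which minting money requires burning computation to find a number with a rare hash. The sketch below is a toy illustration of that idea only; Bitcoin's actual protocol (block headers, double SHA-256, an adjustable difficulty target, and so on) is considerably more involved.

```python
import hashlib

def proof_of_work(data: str, difficulty: int):
    """Find a nonce such that sha256(data + nonce) begins with
    `difficulty` zero hex digits. The expected number of hashes grows
    by a factor of 16 per extra digit, which is the 'cost' in CPU cycles."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = proof_of_work("hello", 4)
print(nonce, digest[:8])  # the digest begins with four zero digits
```

Finding the nonce is expensive, but anyone can verify it with a single hash, which is what lets a peer-to-peer network agree on who did the work without a central bank.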

I'm pretty sure projects like this get something deeply wrong about the social relationships that money relies on, but I'm not sure exactly what. The individualist mindset behind all this is suspect; money relies on shared social relationships. However, the Bitcoin folks clearly imagine a set of relationships among individuals, in the form of the peer-to-peer network. It is easy to explain why peer-to-peer networks do not describe the world as it currently exists. It is more difficult to explain why attempts to build them seem inevitably to fail.