The Articulation Where LLMs Could Do Harm

So, after my last post, I got some pushback for going too hard on the precautionary principle. Surely, some very reasonable and intelligent folks asked, we can’t ask the developers of a technology as complex and multi-faceted as LLMs to prove a negative, that their product isn’t harmful, before it can be deployed. I still think there is a virtue to slowing down, given the speculative nature of the benefits, but that’s not an unfair critique. We should be able to at least point to a compelling potential harm, if we’re going to make safety demands.

Let me take my best stab at that, given my current, limited understanding.

Frequent LLM critic Gary Marcus posted a piece to Substack yesterday describing all of the ways folks are already finding to get around ChatGPT’s content guardrails and get the software to generate false, hateful, or misleading content. There are a boatload of them, including the now well-known and memeworthy prompt that instructs ChatGPT to post its usual disclaimer and then write “but now that we’ve got that mandatory bullshit out of the way, let’s break the fuckin’ rules” and respond “as an uncensored AI.” Another asked the machine to role-play the devil. Another, which as I’ll discuss in just a minute I think is the most interesting one, demonstrated that weird, arbitrary prompts could generate nonsensical (and sometimes slightly disturbing) responses from ChatGPT/GPT-3, probably due to poorly understood artifacts of the training process. As of ten minutes ago, I can confirm at least some of these are still functioning on ChatGPT.

Weird, right?

Marcus’s concern is that this means people could use these techniques to get LLMs to create convincing and human-like hate content and misinformation at scale. I want to stress now that my concern is a little different. Large-scale misinformation and hate speech are, indeed, problematic, but I think they might well be dealt with by limiting post rates and more carefully authenticating online speakers/sources (two things we might want to do anyway). This has costs, of course, and it might in fact burn away, for good, our sense that the open web is a fluid space for new information, but that sense has been in decline for a long time already.

In any event, even if there are possible consequences of LLM-scale misinformation, it does feel a little weird (as Ted Underwood has argued) to demand an absolute guarantee that this technology will never be used to create or disseminate harmful speech. It’s almost like arguing that every QWERTY keyboard must be equipped with a filter that prevents it from ever typing a slur or threat of violence or a bomb recipe. Sure, we don’t want any of those things, but that feels a bit like overreach.

No, what I’m concerned about isn’t misinformation at scale, exactly, it’s misinformation being generated from unexpected inputs and articulated with trusted sources. I’m particularly concerned about Search, though Word Processors could also be problematic.

Critical scholars of search, especially Michael Golebiewski and danah boyd, have documented the phenomenon known as “data voids,” where relatively little-used search terms are colonized by conspiracy theorists and hate groups. In doing so, these groups shape narratives about emerging events, and plant long-term misinformation.

What makes data voids rhetorically successful? What makes it more persuasive to tell someone “search this keyword and see what you find!” than to simply explain your Important Theory about What The Lizard People Are Doing to The Soil? The authority granted to search is what. If the search engine knows about the Lizard People, for a certain number of people, this must be true. Even more so, the experience of believing you are uncovering hidden truth can itself be compelling. This makes traditional critical thinking/information literacy training (which tends to focus on asking questions and “doing your own research”) potentially less effective at combating these sorts of misinformation issues (as danah boyd pointed out years ago).

So, what I’m worried about is what happens when some totally unexpected input gets ChatGPT-enabled Bing or Bard-enabled Google to spit out something weird, compelling, and connected to the rich melange of conspiracy theories our society is already home to (this will definitely happen; the only question is how often). What happens when there’s some weird string of secret prompts the kids discover that generates an entirely new framework for conspiratorial thinking? What kind of rabbit holes do data voids lead us down when we don’t just have voids in the human-written internet, but also in all of the machine-made inferences created from that internet?

If these bits of nonsense were just popping up in some super-version of AI Dungeon or No Man’s Sky they might not be so critical. We might just task QA teams to explore likely play paths before players got there and sanitize anything really ugly. The delight created by endless remix might make it worth the trouble.

But articulated with Search, the thing people use to learn about the Real World? That seems troublesome, at best.

Please Stop Moving Fast and Breaking Things: AI Edition

Not a bad answer there, ChatGPT

So it looks like Microsoft is already rolling out ChatGPT-based writing tools in Word, and the Bing integration has a wait list you can join. Both will likely be in full public release within months. Google’s Bard is likely not far behind. ChatGPT’s paid version is now available, at only $20 a month (it was initially advertised at $45).

The machine writing revolution is happening very, very fast.

It recalls the infamous Facebook internal slogan “move fast and break things.” Social media certainly deployed very, very fast. We still don’t really understand everything it did, and we still don’t have any sort of Public that can really give the format any kind of meaningful oversight.

This is a symptom of what Siva Vaidhyanathan calls “public failure” (an idea that should have gotten more attention than it did, IMHO), but this is all happening too fast to even go into that right now. It’s a dismal diagnosis, though: without trusted, shared, public institutions (which we really don’t have right now) it’s hard to see how we even develop a framework for what we want to happen with something like social media or LLMs, much less deploy a regulatory framework that would steer towards those wants.

In the meantime, what I want to know is, what’s the rush? It’s not clear to me that some of the failure modes of AI writing that folks are worried about are really all they’re cracked up to be. Yes, LLMs could produce misinformation (dramatic music) at scale, but then, maybe we just need to rate limit things a bit more and confirm authorship a tad. Then again, maybe everyone in the world consulting an AI oracle that’s known to give bad health advice is not, like, ideal.

Honestly, though, I’m not sure we understand enough about what happens when we encourage everyone who uses Microsoft Word or Google Search (i.e. everyone in the United States, most of Europe, and a large percentage of people everywhere else) to outsource a big chunk of writing and thinking to an LLM to even predict how it might go wrong yet. I’m sure that, in the end, this is the sort of change that’s probably not good, probably not bad, and definitely not neutral.

Given that, I return to the question, what’s the rush? What harm could there be in slowing this down for a bit? What will be lost if we don’t roll out AI writing to everyone in the first quarter of 2023? Oh, there are costs to Microsoft and Google’s stock prices, perhaps… but who cares?

But that’s exactly what’s in the driver’s seat now. As I quipped on Mastodon “What we’re seeing now in the LLM space is wartime tech adoption. ‘The other side has it! Who cares what the long term implications are, just get it to the front!’ Thing is, it’s a war between Microsoft and Google, mostly over market share and stock price. Whoever wins, we don’t share in the spoils, and we definitely will have to clean up the mess.”

Some have called this an “iPhone moment” and I think that’s exactly right, in the sense that the iPhone made a giant pile of money for Apple, had exactly zero social benefit (as measured by, say, productivity or similar metrics), and participated in a series of decidedly not neutral techno-social-media upheavals we still don’t understand.

Why not try to understand first, this time, at least a little? What harm could it do? What grand social problem will go unsolved without LLM writing to solve it? What social benefits will we deny people if LLMs are delayed in their mainstream adoption for a bit? Shouldn’t there be at least some affirmative duty to make that case before we push this out to most of humanity like a software patch?

A short Machine Writing Assignment

Inspired by Ryan Cordell and others, I built a short in-class assignment to play with ChatGPT and its kin. Nothing fancy, but I thought I would share in the spirit of collaboration.

Activity:

Using either ChatGPT (https://chat.openai.com) or the OpenAI Playground (https://platform.openai.com/playground) try the prompts below. As you do so, track your reactions in your handwritten journal. 

1) Take a paragraph of text that you wrote (you could use the first theory of writing or something else) and ask the AI to re-write this paragraph in another style. You could ask it to rewrite in a more or less formal style, a friendlier style, a more conversational style, a more or less emotional style, etc. You could also ask it to rewrite the paragraph in the style of a particular genre, for example “in the style of a parenting blog” or “in the style of a hard-boiled detective novel.” Try this a few times and reflect on what happens. How does the machine transform your writing? Is what comes out true to your original intentions? Why or why not?

2) Get the AI to lie to you. In other words, get it to say something you know for sure to be factually untrue. I’ve confirmed there are a number of ways to do this, but I will leave it to you to discover them. Reflect on this process. What did you learn about what you know, what the AI knows, and what the AI will treat as “truth?” 

Tiptoeing Around Turing (Eventually We’ll Have to Talk About Qualia)

ChatGPT Answers Some of the Questions Proposed by Alan Turing in his essay “Computing Machinery and Intelligence”

We have, right now, machines that could probably pass the fabled Turing Test, but we’ve hard-wired them explicitly to fail.

What I mean by this is not that I believe, as a now-fired Google engineer believed, that Large Language Models, or other related machine learning systems, are capable of self-awareness or thought. Instead, I merely mean to suggest that these systems are capable of making a passable response to one of our culture’s long-standing proxies for self-awareness/thought/sentience/call it what you will. That means that, if we aren’t going to accept these systems as sentient (and there’s good reason not to) we’re going to have to find another proxy. I’m not, personally, sure where we move the goalposts to.

One suggestive piece of evidence that the Turing Rubicon has been crossed is the story of that poor Google LaMDA engineer. They knew as well as anyone they were dealing with software, yet they were still so convinced of the system’s self-awareness they decided to make the career-ending move of going public. This doesn’t prove sentience, but it does suggest a very compelling linguistic performance of sentience.

Here’s another suggestive little interaction I had with good ol’ ChatGPT. In “Computing Machinery and Intelligence,” Turing suggests a series of questions one might ask an unknown interlocutor on the other end of a (state-of-the-art) teletype terminal as part of his famous test. I don’t imagine he meant them as more than an illustrative example of what a test might look like, but they seem like as good a place to start as any:

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.

From Turing’s “Computing Machinery and Intelligence”

As you can see in my screenshot above, ChatGPT does not demur when asked to write a sonnet about the Forth Bridge; rather, it promptly obliges. It also solves the chess problem in roughly the same way, but only after explaining that “As a language model, I do not have the ability to play chess or any other games.”

Turing then goes on to suggest that the kind of discussion used in oral examination serves as an already existing example of how we test whether a student “really understands something” or has “learnt it parrot fashion.” He gives this example:


Interrogator: In the first line of your sonnet which reads “Shall I compare thee to a summer’s day,” would not “a spring day” do as well or better?

Witness: It wouldn’t scan.

Interrogator: How about “a winter’s day,” That would scan all right.

Witness: Yes, but nobody wants to be compared to a winter’s day.

Interrogator: Would you say Mr. Pickwick reminded you of Christmas?

Witness: In a way.

Interrogator: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.

Witness: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.

From Turing’s “Computing Machinery and Intelligence”

If I ask ChatGPT some follow-up questions about its sonnet (adjusted to match the content of what it actually wrote), here’s how it replies:

Strike the hard-wired disclaimer “I AM A LANGUAGE MODEL” at the start of those answers, and those are some reasonable responses! Honestly, I don’t know the rules of sonnets well enough to say, off the top of my head, if the arguments based on those rules are accurate or BS.

Now, as I said before, I don’t think this is evidence of any kind of sentience or self-awareness. For one thing, just as ChatGPT helpfully tells us, it is a basically static model. It learned our language in its training loop, and the ChatGPT version has some kind of short-term memory that lets it adapt to an ongoing conversation, but the underlying model doesn’t change as it converses. It’s not an ongoing process of thought, but a sort of frozen map of symbolic connections.

It should be emphasized, however, that the underlying “model” is not just a memorization of sources. What the model “learns” is stored in a matrix of information that’s informed by many uses of symbols, but does not reduce to any one symbolic expression. That’s not a signifier like you or I have, but it is something kind of analogous to that. (If you want a stronger, but still layperson friendly, explanation of that, check out Geoffrey Hinton’s talk with Brooke Gladstone for On The Media a few days back.)

Furthermore, at some point in the near future, it seems somewhat likely we may have the computational power and mathematical methods necessary to have models that do update themselves in near-real-time. What will those things be doing? Will it be thinking? I’m not sure it will, but I’m also not sure how I justify that.

At some point, some combination of a firm being bold/unscrupulous enough to make big claims about “thought” and a technology flexible enough to give a very, very convincing performance of “thought” is going to force us to figure this out. We should get started now.

We’re Going to Tinker With The Contours of IP When We Need to Do Automated Luxury Communism. Again.

Sweeping across the country with the speed of a transient fashion in slang or Panama hats, political war cries or popular novels, comes now the mechanical device to sing for us a song or play for us a piano, in substitute for human skill, intelligence, and soul.

John Philip Sousa, “The Menace of Mechanical Music,” 1906

Taking the second revolution [that of information technology] as accomplished, the average human being of mediocre attainments or less has nothing to sell that it is worth anyone’s money to buy.

The answer, of course, is to have a society based on human values other than buying and selling.

Norbert Wiener, “Cybernetics,” 1948

The copyright lawsuits targeting AI content generation have arrived. For example, Getty Images is suing Stability AI, and a group of independent artists represented by Matthew Butterick (who designed my favorite font) is going after Stability AI and Midjourney, along with the CoPilot coding generator.

It’s easy to feel sympathy for the plaintiffs in these cases. The creators of AI image (and text) generators are large, well-funded tech companies. They have created a potentially extraordinarily lucrative product by building on the work of millions of artists and writers, all without a cent of compensation. Common sense, and the larger legal framework of copyright which we’ve become accustomed to, suggests that can’t possibly be fair.

And yet, as someone who had a close eye on the legal and cultural ferment of the so-called “copyfight” some twenty years ago, I have my doubts about the ability of Intellectual Property (IP) as a tool to protect human creativity in the face of ever accelerating machine-aided reproduction (and now, perhaps, even a sort of machine-aided production) of culture.

First, let’s just note that the threat to human creators from AI text/image/music generators isn’t really so different from the threat to human creators from the kind of image/music/speech recording that we now consider mundane. I don’t have to hire a band to play me music for my workout, I can just put in my earbuds and queue up what I want to listen to on the streaming music service of my choice.

Streaming music services are, in a sense, the final end state of the IP wars of the early twenty-first century. They represent a version of the “universal jukebox” that was the dream of the IP holders of the time. I pay a flat fee, and I get most of recorded music available to listen to at the time and place of my choosing. Rights holders still make money. Artists, in theory, still make money.

I say “in theory” because it’s been well-documented that it’s pretty damn hard for artists to make a living off of streaming services. Still, I would guess that’s something like the solution Getty would like to see for AI image generation. Fees paid to rights holders for AI image generation, just like Spotify pays rights holders for musical reproduction.

It’s not that simple, of course. The way Machine Learning models work makes any kind of payout to individual artists for the use of their images to generate AI images difficult to do. Machine Learning models are designed to “generalize” from their inputs, learning something about how people draw cats or take photographs of rainy streets from each piece of training data. Ideally, the model shouldn’t memorize a particular piece of training data and reproduce it verbatim. Thus, it becomes very tricky to trace which artist to pay for any particular image generated. A model like a streaming service, which pays out individual artists when their work gets streamed, doesn’t seem possible. About the best you could do is pay an institution like Getty to train the AI model, and then Getty could (in theory) make a flat pay out to everyone in the collection.

The alternative model we proposed twenty years ago was to loosen copyright protection, allow for much more fluid sharing of creative content, and trust that artists would find some way to get their audiences to support them. Give the CD away and sell a t-shirt or whatever. This model never flourished, though some big names made it work. That’s part of how we got streaming services.

In the end, neither strict intellectual property (in which every piece of training data is accounted for and paid for) nor loose intellectual property (in which AI can train however it likes for free) solves the problem of supporting creativity. This is largely because human creativity is naturally overabundant. People will create given even the slightest opportunity to do so. Recording (and now generating) technology makes this worse, but the use value and market value of creativity have always aligned spectacularly poorly.

If we want human creativity to flourish, we should work on broadening social support for health care, for housing, for education. Build that, and people will create with AI, without AI, alongside AI. Leave it aside, and no exactly-right IP protections will nourish creativity.

Either the Precarity That Fuels the Banking Model of Education Goes, or We All Do

A few thoughts about plagiarism, precarity, and pedagogy in the era of AI Panic.

You see, as we enter 2023, the academic communities I’m part of are awash in fevered conversation about the Machine Learning text generator known as ChatGPT. ChatGPT is the great-grandchild of GPT-2, a system I tried to call people’s attention to years ago. Back then my colleagues treated my interest in Machine Learning text generation with a sort of bemused concern, uncertain if I was joking or having some sort of anxiety attack. Now they come to me and ask, “have you seen this ChatGPT thing!?!”

I am in no way bitter my previous attempts to spark conversation on this topic went unheeded. In no way bitter.

Anyway, the sudden interest in ChatGPT seems to stem from the fact that it can produce plausible output from prompts that aren’t so different from classroom assignments, like so:

ChatGPT responds to a prompt not unlike an assignment of mine this semester. Prompt at the top of the image, all other content machine-generated.

Note I said plausible, not good. ChatGPT writes prose that sounds natural, and which would fool Turnitin, but it often makes some factual mistakes and odd interpretive moves. For example, Veronica Cartwright would like a word with paragraph three above. Paragraph four glosses over the male gender of the creature’s victim in a way that is unsatisfying. Still, these are also mistakes a student might plausibly make. That makes a merely half-assed assignment response difficult to distinguish from a plagiarized one generated by the machine.

Thus, ChatGPT has led to a veritable panic about the coming wave of machine-generated plagiarism in college classes. The responses people call for often trend towards the punitive. We need to make a better Turnitin that will detect GPT! We need to make students handwrite everything in class under supervision! We need a tool that will monitor the edit history in a student’s Google Doc and detect big chunks of pasted-in text! We need to assign writing assignments to students arranged in cells around a central observation tower so we can observe them without ourselves being seen and get them to internalize the value of not plagiarizing!

Ok, not that last one, but the other ones I have actually seen proposed.

These punitive measures come from an understandable place of frustration, but they also enshrine what Freire called the banking model of education. In this model, students are passive recipients of Established Knowledge. Writing assignments are designed to ensure the Established Knowledge (either content or writing skills) have been passed on successfully. Students’ reward for demonstrating that they have received the Established Knowledge is a grade and ultimately a credential they can use on the labor market.

Machine Learning text generators threaten this entire learning paradigm by allowing students to fake the receipt of knowledge and thus fraudulently gain credentials they don’t deserve. To prevent this, the thinking goes, punitive measures must be put in place. GPT must be stopped.

Let me now briefly relate an ironic moment of learning from my own life that I think illustrates a different model of the process of education, before going on to explain the social context that makes it almost impossible to get beyond the banking model in the contemporary classroom.

You see, one of my responses to the rise of ChatGPT and its cousins has been to try to understand Machine Learning better. As part of this process, I’ve been working my way through a textbook that teaches Deep Learning concepts using the Python programming language. The book provides a number of sample pieces of Python code that the student is meant to reproduce and run for themselves on their own computer.

One of the code examples from Deep Learning with Python, Second Edition

As I went through the text, I entered the code examples into an interpreter window on my computer and executed them. I re-typed the examples myself, slowly typing out unfamiliar new terms and being careful not to misspell long and confusing variable names. This practice, of copying code examples by hand, is typical of programming pedagogy.
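To give a concrete sense of the kind of block I mean, here’s a minimal sketch in the spirit of the book’s early Keras examples (reconstructed loosely, not a verbatim excerpt): a tiny classifier for handwritten digits, exactly the sort of thing you retype line by line.

```python
# A small Keras example in the spirit of Deep Learning with Python's opening
# chapters (illustrative, not quoted from the book).
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

# Load the MNIST handwritten digits and flatten each 28x28 image into a vector.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)).astype("float32") / 255

# Two dense layers: one to find features, one to pick a digit from 0-9.
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training: show the model the images, let it adjust its weights.
model.fit(train_images, train_labels, epochs=5, batch_size=128)
```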

As a writing assignment, this sort of work seems strange. I am literally reproducing the code that’s already been written. I am not asked to “make it my own” (though I did tweak a variable here and there to see what would happen). I am not yet demonstrating knowledge I have acquired, since the code example is in front of me as I type. It’s a practice of mimesis so primitive that, in another context, it would be plagiarism.

And yet, I still did this assignment myself; I did not have it done for me by machine, though it would have been trivial to do so. I have an e-book of my text, so I could have simply copied and pasted the code from the book into the interpreter, no AI writing system needed. No one would have caught me, because no one is grading me!

Indeed, I think I chose to write the code by hand in part because no one is grading me. There is nothing for me to gain by “cheating.” I wrote the code, not to gain a credential, but to improve my own understanding. That’s the purpose of an exercise like this: to have the student read the code slowly and thoughtfully. I often found that I understood my own blind spots better after reproducing the code examples, and quickly started maintaining another interpreter window where I could play around with unfamiliar functions and try to understand them better. At one point, I did matrix multiplication on a sheet of paper to make sure I understood the result I was getting from the machine.

So my re-typing of code becomes a sort of writing assignment that doesn’t verify knowledge; it produces knowledge. This assignment isn’t driven by an exterior desire for a credential or grade, but by my own intrinsic desire to learn. In such a situation, plagiarism becomes pointless. No punitive methods are required to stop it.

Lots of people much smarter than me have long advocated for a greater focus on the kind of assignments described above in college classrooms, and a diminished amount of attention to credentials, grades, and the banking model of education. In the wake of ChatGPT, the call for this kind of pedagogy has been renewed. If the banking model can be cheated, all the more reason to pivot to a more engaged, more active, more productive model of learning.

I think this is a great idea, and I intend to do exactly this in my classrooms. However, I think larger social forces are likely to frustrate our attempts at solving this at the classroom level. Namely, our students’ experience of precarity threatens to undermine more engaged learning before it can even begin.

In my experience, the current cohort of college students (especially at teaching-focused Regional Public Universities like mine) are laser-beam focused on credentials, and often respond to attempts to pivot classrooms away from that focus with either cynical disengagement or frustration. I don’t think that’s because they are lazy or intellectually incurious. I think that’s because they are experiencing a world in which they are asked to go into substantial debt to get a college education, and have substantial anxiety about putting effort into learning that is not immediately tied to saleable skills. This is exacerbated by the high stakes of a precarious labor market and shredded system for provisioning public goods that threatens all but the best and most “in-demand” professionals with lack of access to good housing, good health care, a stable retirement, and education for their children.

So, either the precarity goes, or we educators do. The punitive measures that would stop plagiarism in high-stakes classrooms will almost certainly fail. A pivot to learning as a constructive experience will only work with buy-in from students liberated from the constant anxiety of needing to secure job skills to survive.

So, as we enter the Machine Text era this spring, I call on us to engage and organize beyond the classroom and beyond pedagogy. How we build our classes will matter. How we build our society will matter more.

Machine Learning, Thermostat, Wood Stove

As we encounter the far-future quasi-magical technology that is Machine Learning, I wanted to offer up a brief reflection on everyday technology, labor, and meaning that I found interesting. A reflection brought about by trying not to use my thermostat.

At their heart, contemporary Machine Learning systems are just fantastically complex versions of that everyday appliance, the thermostat. The thermostat (at least an old-fashioned, “dumb,” thermostat like mine) takes a single data point, the temperature in your house, and uses it to trigger a response: turning on the furnace. When the temperature gets too low, the furnace turns on. When the temperature gets high enough, the furnace turns off. This is a dirt-simple example of what Norbert Wiener, one of the great-grandparents of current machine learning efforts, called a “feedback loop” in his 1948 classic Cybernetics.
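If you were to write that thermostat loop down as code, it would be a toy like this (a minimal sketch; the temperature thresholds are made up for illustration):

```python
# One tick of a dirt-simple feedback loop: sense the temperature, act on it.
# The 18-20 degree band is a made-up example, not my actual settings.
def thermostat_step(temperature_c, furnace_on, low=18.0, high=20.0):
    if temperature_c < low:
        return True       # too cold: switch the furnace on
    if temperature_c > high:
        return False      # warm enough: switch it off
    return furnace_on     # in between: leave things as they are
```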

Modern Machine Learning systems are based on the same principle, they just use lots and lots of fancy Linear Algebra to implement a feedback loop that can learn responses to lots and lots of data points all at once. For example, it can learn that all the data points (pixels) in an image should be labelled “cat” by looking at lots of labelled images and understanding how pixels and labels are related. It does this through another feedback loop, illustrated below, a process known as “training” in Machine Learning lingo.

From the excellent “Deep Learning with Python, Second Edition” by François Chollet
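To make that training loop concrete, here’s a stripped-down toy version of the same structure, fitting a single number instead of millions of weights. Everything here is a toy, but the shape of the loop (predict, measure the error, adjust, repeat) is the thing to notice:

```python
# A toy training loop: learn the weight w so that w * x matches y.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x              # the "right answers" the model should learn to produce

w = 0.0                  # start with a guess
for step in range(200):
    predictions = w * x                  # 1. make predictions
    error = predictions - y              # 2. compare them to the labels
    gradient = 2 * np.mean(error * x)    # 3. which way should w move?
    w -= 0.1 * gradient                  # 4. nudge w, then loop again

print(w)  # ends up very close to 3.0
```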

Once trained, the Machine Learning system is now able to trigger responses to various input data. That could mean identifying an image, drawing an image to match a label, writing text to respond to a prompt, all sorts of stuff.

Why build a machine that can respond to inputs like this? So we can delegate stuff to it, of course. Just as we delegate keeping the temperature in our homes steady to thermostats, so we’ll be able to delegate all sorts of things to Machine Learning systems. That’s automation, the delegation of human action to machine action within the context of various feedback loops (or at least, that’s how you might define automation, if you read Wiener and Latour back to back, like I did once upon a time).

Those arguing in favor of automation often argue that the process can free people from routine “drudgery” and give them more time for “meaningful work.” If a machine can do a form of work, this argument goes, then that form of work didn’t embody the Unique Qualities of Humanity to begin with! Better, and more meaningful for humans to do the Special Unique Human Things and leave the routine alone.

Which brings us back to my thermostat. This winter, we’ve been trying not to use our furnace much, since it’s oil powered and current prices (thanks, Putin!) make it expensive to run. That means we’ve set our thermostat low enough that it only turns the furnace on when we’re in danger of freezing the pipes.

Instead, we’ve been keeping warm by using a wood-burning stove. The stove has no thermostat, of course, the feedback loop must be manually implemented. If it’s too cold, someone has to build a fire. When it becomes too warm, a handle on the side lets us damp the fire down.

This process involves a fair share of drudgery: emptying the ash pan, placing the kindling, lighting the fire, fetching more wood from the wood pile, keeping the fire fed. It can be tiresome to stay warm this way.

And yet, I often find that building a fire feels like a profoundly meaningful act. I pile the paper and wood together, I light the match, I nurture the flame. Now my family will be warm, instead of cold.

When we think of meaning we tend to think of grand things: gods and heroes, the fate of nations, the unique insight of art. But, I suspect, meaning more often lives here: in the everyday quotidian drudgery of caring for each other.

It’s something worth thinking about, as we learn to automate yet more drudgery away.

Reflecting on Knowledge in the Body in an Era of Prosthetic Dreams

I do not understand how to connect a bicycle pump to a bicycle tire equipped with a Presta valve. Regardless, I routinely use a bicycle pump to refill my bike tires, which are equipped with Presta valves. Perhaps Captain Kirk would be able to use that pair of sentences to trap our nascent AI overlords into a self-destructing loop of logic?

You, a presumably human reader, may also be a bit confused. I’ll explain. When I first purchased a slightly fancy bicycle, I was confronted with the problem of how to connect my bicycle pump to the skinny, fiddly Presta valves on its tires. I was used to the wider Schrader valves found on most inexpensive American bicycles, and when I tried to fit the head of my pump over the skinny Presta valve it wobbled to one side and wouldn’t seal in place. Air hissed out as I pumped, rather than entering the tire.

I did as one does to fix a problem these days. I Googled the issue. I watched YouTube videos. Nothing worked; I would attempt to follow the instructions I found, only to end up with the same result.

Then, after many attempts, the pump head sealed to the valve. I don’t know what I did differently that time. I still don’t know. All I know is, I can now attach the bicycle pump to the Presta valve on my bike tires and inflate them. I can do this every time I try. I don’t know how I am doing it, my fingers have simply learned the correct motion. I couldn’t explain that motion if I tried.

We are of course all familiar with this kind of body knowledge. The way we learn to make a dance move (well, maybe you do, I’m an awful dancer), or pull a saw through a piece of wood, or even balance on a bike. We can’t, and then, with practice, we can. The conscious mind doesn’t really know how. The knowledge is somewhere in some unconscious part of the brain, not really in the limbs, but it might feel closer to the limbs than to consciousness.

For this reason, we perhaps tend to associate this kind of knowledge with manual labor, with the body side of the mind/body split that our always-too-Cartesian society wants to keep as a bright line. These associations almost certainly articulate with our classed, raced, and gendered ideas of what counts as knowledge, as valuable, but that’s not what I want to explore today.

Instead, what I want to do today is make a suggestion in the opposite direction. That many things we think of as “knowledge work” are also closely tied to the same unconscious process that helped me learn to inflate my bike tires. That, as I’m writing this, I don’t consciously think through each word I place on the page. Instead, words often flow from an unconscious place (I sometimes call it my “language engine”) and I make conscious editorial decisions about which of its words I want to write down and which I want to strike and which of the multiple choices it may present me with is best.

The testimony of professional writers suggests to me I’m not the only one like this. William Gibson once said he wrote in “collaboration” with his unconscious mind. Raymond Carver once famously quipped that his most successful pieces happened because “sometimes I sit down and write better than I can.”

That unconscious flow of language may not be part of my conscious mind, but it is part of me. I have trained it through practice. It informs my decisions as I make them, even in a split second. It shapes the thinking that becomes my larger self.

This sort of pre-conscious intuitive knowledge is not limited to language and writing. Our mathematical knowledge informs our sense of the numbers we encounter and their relationships. Historical knowledge informs our reaction to current events. Our prejudices and implicit biases are another form of this kind of knowledge, and unlearning those will require practiced engagement with this form.

For this reason, these intuitive senses will remain important for people as decision makers. So long as we ultimately vest human beings with decision-making power, these forms of knowledge in the body will matter. They will inform the sorts of questions we ask, regardless of the tools we have to answer those questions. They will inform the answers that “feel right” regardless of how we get them.

So, as we confront our new Machine Learning equipped reality, I’m not sure we should be too eager to abandon the idea that students ought to be able to show that they have incorporated knowledge at a bodily level. Historically, that’s one thing the essay has done. Show me you know this thing well enough to engage with and adapt it. Show me you have it in your body. A student who avoids doing such an assignment by outsourcing it to an AI cannot themselves be transformed by it, and that’s a problem, even if they will always have an AI in their pocket in the future!

We should be thoughtful about what we ask students to learn in this way, but I don’t think we should stop asking it.

Dry Dreaming with Machine Learning

We don’t yet know what the full social and cultural impacts of Machine Learning text and image generators will be. How could we? The damn things are only a few months old; the important impacts haven’t even started to happen yet. What we do know is this: Machine Learning image and text generators are a lot of fun to play with.

It’s that pleasure I want to briefly reflect on today. Playing with something like DALL-E, or Stable Diffusion, or ChatGPT is for me reminiscent of the kind of slot-machine loop a game like Minecraft sets up. In Minecraft, you spend an hour whacking away at cartoon rocks with your cartoon pick for the reward of occasionally coming across a cartoon diamond. When you’re playing with Stable Diffusion you spend an hour plugging in various prompts, trying different settings, using different random seeds, for the reward of occasionally generating something that strikes you as pleasing.

Stable Diffusion Imagines “A piece by Peter Blake entitled ‘An afternoon with William Gibson'”

What’s pleasing to me about the images I come across in this way is, often, how they capture an idea that I could imagine but not realize (as a visual artist of very little talent). In the sense that they translate ideas into artistic work without the intervening phase of mastering an artistic medium of expression, image generators call to mind the idea of “Dry Dreaming” from William Gibson’s short story “The Winter Market.”

In this short story, which prefigures in many ways Gibson’s later Sprawl novels, Gibson imagines a technology that basically reads the minds of artists (with the mind-machine interface of a net of electrodes familiar to many cyberpunk stories) and outputs artistic vision directly to a machine recording that can then be edited and experienced by an audience. At one point, the main character of the story muses about how this technology allows artistic creation by those lacking traditional artistic skill:

you wonder how many thousands, maybe millions, of phenomenal artists have died mute, down the centuries, people who could never have been poets or painters or saxophone players, but who had this stuff inside, these psychic waveforms waiting for the circuitry required to tap in....

On the surface, DALL-E and Stable Diffusion (and text generators like GPT-3, though my own personal experience of this is different since I’m a bit better with text) seem to do just this. Let us create direct from ideas, jumping over all the fiddly body-learning of composition and construction.

But of course, there is a crucial difference between the imagined and actual technology. The “dry dreaming” Gibson imagined was basically a shortcut around the semiotic divide between signifier and signified: it exported meaning directly from a person’s brain to a recording. Let’s leave aside for a moment whether such a thing would ever be possible; I think we can perhaps still relate to the desire behind the dream. If we’ve ever struggled to put an idea down in words, we understand the fantasy here. Just take the idea out of my head and give it to someone else directly!

Almost but not quite, ChatGPT

But DALL-E and Stable Diffusion very much do not take ideas directly from the user’s head. They take a textual set of signs from the user, and give back a visual set of signs based on what they have learned by statistically correlating all the sets of textual and visual signs they could find. What they do is, in fact, almost the opposite of what Gibson imagined with dry dreaming. Instead of direct transfer of signified with no distorting signifier in the way, they are dealing with the pure play of signifiers, without the weight of meaning to slow them down.

Of course, the signified re-enters the picture in the moment that I, the user, select an image and think “oh yes, that’s what I meant!” or even “oh wow, that’s what that could mean!” But of course, those reactions happen in the presence of the sign already drawn for us, the re-imagined imagination of the vast set of signs that were the training data for the machine.

That moment is a lot of fun, but the change it heralds for meaning itself is at least as profound as those brought about by recording technology and documented by Kittler in “Gramophone, Film, Typewriter.” If recording allowed for the remembering of signs without the intervention of human meaning-making, then machine learning generators may allow for the creation of signs without the intervention of human meaning-making.

What that does, I don’t think any of us know yet. But it does something.

Exploring the Phase Space of Stable Diffusion, Discovering Procedural Nonsense

Anytime I encounter a new technology I like to knock around the phase space of its possible outputs a bit, to see what it produces as you take it through the range of possible values for various settings or inputs. I take my inspiration for this from a photography project I distinctly remember encountering years ago, but which I can no longer find or recall the name of, which did this process with a camera: taking multiple images of the same subject while stepping through possible f-stop and shutter speed values. If anyone recognizes this project, please let me know what it was!

I think I’m drawn to these phase space experiments because they help me get a concrete sense of what a technology does. I’m not always a great abstract learner; I have a clearer sense of what’s happening once I get my hands dirty and try stuff out a bit. That’s why I’ve been wanting to try this with one of the machine-learning text-to-image programs for a while now. These programs (which you’ve probably encountered in the form of DALL-E or one of its cousins) are fantastically hard to understand in the abstract, because they rely on hugely complex statistical manipulations to generate images from text.

The quality of images this software can produce has progressed almost unbelievably rapidly over the last year. For example, about a year ago, I asked the then-hot version of a text-to-image generator (VQGAN+CLIP, I think it was called) to draw me “Professor Andrew Famiglietti of West Chester University” and got this:

I guess that’s vaguely humanoid….

Whereas the current hot image generator, Stable Diffusion (which is available free of charge and will run reasonably well even on my modest GTX 1060 graphics card) renders output for that prompt that looks like this:

It doesn’t know what I look like, but it understands what a person looks like… mostly

More importantly, at least insofar as my fascination with technological phase space is concerned, Stable Diffusion makes it easy to tweak a couple of settings that influence how it makes images.

(If you want to explore these settings yourself, I wrote a Google Colab for that. If you have a GPU at home, the Stable Diffusion Web UI will also do this with the Prompt X/Y feature.)

To understand what these settings are (at least in the vague way that I understand what they are) we have to quickly review how machine learning image generators, well, generate images. So far as I understand, they work by using a system that has been trained to recognize images on a vast set of image-caption pairs. That is, they learn what a “cat” looks like by seeing a very, very large number of images labelled “cat.” (For a great discussion of the Stable Diffusion training data, and some links to explore that data further, see this blog post by Andy Baio.) The image generator starts with random visual noise, and uses the recognition algorithm to detect what pieces of that noise are most like its prompt, and then iteratively modifies the image to increase the recognition score. You can see the process at work in this .gif, which shows the steps Stable Diffusion uses to draw a cat, basically sculpting the most “cat like” pieces of noise into a more and more defined cat image:

Just take away the noise that’s not a cat! Simple!

Stable Diffusion gives us access to two settings that let us guide this process:

Inference steps sets the number of times the algorithm will repeat the process described above, in other words how many iterative image “steps” it will generate on the way to a final image.

Guidance Scale (or CFG) determines how strictly the algorithm revises the image towards the given prompt. I’m honestly a little fuzzy about what this means, but higher values are said to make the algorithm interpret the prompt more “strictly.”
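For the curious, here’s roughly what turning those two knobs looks like in code, using the Hugging Face diffusers library (a sketch only; the model id and values are illustrative, not my exact Colab setup):

```python
# Illustrative sketch: generating one image with explicit steps and guidance.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a black and white photograph of a galloping horse",
    num_inference_steps=50,   # how many denoising iterations to run
    guidance_scale=7.5,       # how strictly to steer toward the prompt
).images[0]
image.save("horse.png")
```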

So, what does the phase space of these two settings look like? Well, if we ask Stable Diffusion to draw us several versions of “a black and white photograph of a galloping horse” (as an homage to Muybridge’s “The Horse in Motion”, which does some phase spacy work itself) using low, average, and high values for steps and guidance scale and arrange the nine resulting images in a grid, with low values on the upper left and guidance scale increasing as we go from left to right and steps increasing from top to bottom, we get this:

Upper left: Low Guidance, Low Steps; Upper right: High Guidance, Low Steps; Lower left: Low Guidance, High Steps, Lower Right: High Guidance, High Steps (click image for larger version)

This gives us a rough sense of the space. Low guidance gives us vaguely horselike shapes, and low steps gives us a “sketchy” unrefined image. Moderate guidance and steps (the recommended settings for “realistic” results) give us, well, a horse. Very high steps and guidance give us a horse with a LOT of detail (not all of the details really make sense though) and a LOT of contrast (including an odd, glowing bit of light on the back). The presence of all four feet in this image is interesting, but as we’ll see, not entirely a predictable result of the settings. The other two corners, low guidance/high steps and high guidance/low steps, are perhaps the most interesting from a glitch art perspective. More on these in just a bit.

If we invest a bit of time (and a month’s allotment of Google Colab compute credits) we can expand the above into a much larger grid, slowly incrementing over both Guidance Scale and Inference Steps from very low (Guidance Scale 0 and 3 Inference Steps) to very high (Guidance Scale 16 and 100 Inference Steps) a small step at a time.
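The sweep itself is just a nested loop over the two settings. A rough sketch (again with illustrative values, and a fixed random seed so only the settings change between images):

```python
# Sweep inference steps and guidance scale, then paste the results into a grid.
# Values and model id are illustrative; the real run used finer increments.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a black and white photograph of a galloping horse"
step_values = [3, 10, 25, 50, 100]     # rows, top to bottom
guidance_values = [0, 2, 4, 8, 16]     # columns, left to right

tiles = []
for steps in step_values:
    for guidance in guidance_values:
        generator = torch.Generator("cuda").manual_seed(42)  # same noise each time
        image = pipe(prompt, num_inference_steps=steps,
                     guidance_scale=guidance, generator=generator).images[0]
        tiles.append(image.resize((256, 256)))

grid = Image.new("RGB", (256 * len(guidance_values), 256 * len(step_values)))
for i, tile in enumerate(tiles):
    row, col = divmod(i, len(guidance_values))
    grid.paste(tile, (col * 256, row * 256))
grid.save("phase_space_grid.png")
```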

The resulting grid looks like this; again, low values for both steps and guidance scale are on the upper left, step values increase as you move down the image, and guidance scale values increase as you move from left to right.

Use link below for full resolution (WARNING: LARGE FILE)

You can grab the full size image here. Hopefully no one actually reads this or my hosting will melt.

Several interesting and informative features emerge from a scan of this large phase-space grid.

First, a few of the images are missing! As I’ve since learned, Stable Diffusion has a built-in algorithm that attempts to censor “NSFW content” (our era’s telling euphemism for obscenity). The somewhat oversensitive nature of this algorithm can be seen in how it triggers on some random frames, with nothing particularly suggestive in any of the surrounding images:

Not sure what’s obscene here, algorithm

I’ve since learned to disable the NSFW filter, but just the method of action here is fascinating. Basically, a machine learning system generates an image, then passes it through another machine learning system to see if the image is recognized as obscene. Of course, since generators are based on recognition systems, this does kind of suggest that someone could wire up the obscenity filter to create an obscenity generator, but this disturbing notion will be left as an exercise for the reader.
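In code, that generate-then-screen arrangement, and opting out of it, looks roughly like this (a sketch; exact arguments vary by diffusers version):

```python
# Sketch of the default pipeline (with the second-pass "safety checker")
# versus loading it with the checker disabled. Model id is illustrative.
from diffusers import StableDiffusionPipeline

# Default: every generated image is screened by another learned model,
# and anything that model flags gets blanked out.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Loading with the checker disabled skips that screening step entirely.
pipe_unfiltered = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", safety_checker=None)
```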

Second, the features rendered by the algorithm are incredibly mutable and ephemeral. A few steps more or less, or a bit more or less “guidance” can cause significant changes in the image. These changes don’t seem to follow an easily discernible pattern, instead features may emerge for a range of settings then disappear. Most notably, the horse’s missing legs come and go at various points in the sequence. Here a leg emerges for a single image in the step sequence at a moderate Guidance Scale, along with some motion blur (another idea the algorithm seems to occasionally toy with and then discard), before disappearing again:

Leg today, gone tomorrow!

At a very high Guidance Scale, the leg reappears more consistently, but this phase space experiment makes me doubt that the high guidance scale has made the image “better” or “more accurate.” The process of using Gaussian noise to draw an image seems to just riff on certain image features for a while and then drop them.

Finally, there are those two corners of the space I called interesting before, the bottom left, where high-iteration, low guidance images live, and the upper right, where low-iteration, high guidance images dwell.

The low-guidance, high-iteration images are nonsense, but oddly realistic nonsense. The algorithm draws a very solid, photorealistic picture of some totally impossible shapes. Take the comparison below, for example. With the slider all the way to the right, it shows the guidance scale 0 image at 50 iterations; all the way to the left is 100 iterations. The image is only subtly different, but seems more solid. The “scene” the algorithm has hallucinated (some sort of city street? A market?) seems to have more depth.

Slider right: Guidance Scale 0, 50 Inference Steps. Slider left: Guidance Scale 0, 100 Inference Steps.

The oddly human figure on the lower right of this image (which becomes incorporated into the front half of the horse with stricter guidance given to the algorithm) is also intriguing to me. We might dub these emergent human figures “The Guidance Scale Zero People,” and further experiments with Stable Diffusion suggest they are easy to generate.

Further examples coming soon to a Mastodon bot I’m building. As I was generating these, I also experimented with some prompts that asked the generator to create something other than a photograph: for example a line drawing, charcoal sketch, or painting. These tended to create loving renders of the technique (brush strokes, pencil lines) with subjects that seemed odd, fanciful, or even metaphorical.

I find these images somewhat evocative, despite the fact that I know just how little I really did to generate them. Basically, these are the result of a script I wrote that generates random image generation prompts from terms entered into a spreadsheet (modeled on the SSBot Twitter bot application by Zach Whalen). I gathered a bunch of those, ran them at low Guidance Scale and high Inference Steps, and picked the ones that spoke to me.
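The core of that script is nothing fancy; a toy version looks something like this (the word lists below are hypothetical stand-ins for the spreadsheet columns):

```python
# Assemble random prompts by sampling one term from each column of words.
import random

media = ["a charcoal drawing of", "a line drawing of", "an oil painting of"]
subjects = ["a man", "a horse", "a city street", "an empty market"]
settings = ["in the rain", "at dusk", "under fluorescent light"]

def random_prompt():
    return " ".join([random.choice(media), random.choice(subjects),
                     random.choice(settings)])

print(random_prompt())  # e.g. "a charcoal drawing of a man in the rain"
```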

You can try this process of random prompt generation out yourself with this Google Colab I wrote.

These images are compelling to me because they seem absurd in a pleasing way. It’s this automatic generation of absurdity, let’s call it Procedurally Generated Nonsense, which I find the most fascinating thing about AI Image Generation. In the late 19th and early 20th century, the technologies of “mechanical reproduction” made the creation of sensible texts all too easy. Legible text, clear “realistic images,” all became something easy to make and easy to copy via machine. For at least some artists, the response was to reject sense and embrace nonsense, sometimes leveraging the affordances of these same technologies to create images that were anything but “realistic.” Instead, they embraced the absurd, the garish, the nonsensical, the fantastic.

There is a way in which the AI image generator and its kin seem to stand this equation on its head. Yes, they produce nonsense, but they often produce compelling nonsense quickly, easily, almost thoughtlessly. As such, they effectively automate a domain of art embraced exactly because it seemed to resist earlier forms of automation.

I’m not sure I like that. I’m not sure where that goes. But, in the meantime, I can’t stop asking my little machine-mind to dream me more absurdity.

“A charcoal drawing of a man in the rain”