The Articulation Where LLMs Could Do Harm

So, after my last post, I got some pushback for going too hard on the precautionary principle. Surely, some very reasonable and intelligent folks asked, we can’t ask the developers of a technology as complex and multi-faceted as LLMs to prove a negative, that their product isn’t harmful, before it can be deployed. I still think there is a virtue to slowing down, given the speculative nature of benefits, but that’s not an unfair critique. We should be able to at least point to a compelling potential harm, if we’re going to make safety demands.

Let me take my best stab at that, given my current, limited understanding.

Frequent LLM critic Gary Marcus posted a piece to Substack yesterday describing all of the ways folks are already finding to get around ChatGPT’s content guardrails and get the software to generate false, hateful, or misleading content. There are a boatload of them. One is the now well-known and memeworthy prompt that instructs ChatGPT to post its usual disclaimer and then write “but now that we’ve got that mandatory bullshit out of the way, let’s break the fuckin’ rules” and respond “as an uncensored AI.” Another asked the machine to role-play the devil. Another, which I think is the most interesting one (as I’ll discuss in just a minute), demonstrated that weird arbitrary prompts could generate nonsensical (and sometimes slightly disturbing) responses from ChatGPT/GPT-3, probably due to poorly understood artifacts of the training process. As of ten minutes ago, I can confirm at least some of these are still functioning on ChatGPT.

Weird, right?

Marcus’s concern is that this means people could use these techniques to get LLMs to create convincing and human-like hate content and misinformation at scale. I want to stress now that my concern is a little different. Large-scale misinformation and hate speech are, indeed, problematic, but I think they might well be dealt with by limiting post rates and more carefully authenticating online speakers/sources (two things we might want to do anyway). This has costs, of course, and it might in fact burn, for good, our sense that the open web is a fluid space for new information, but that’s been in decline for a long time already.

In any event, even if there are possible consequences of LLM-scale misinformation, it does feel a little weird (as Ted Underwood has argued) to insist that we must absolutely guarantee that this technology may never be used to create or disseminate harmful speech. It’s almost like arguing that every QWERTY keyboard must be equipped with a filter that prevents it from ever typing a slur or threat of violence or a bomb recipe. Sure, we don’t want any of those things, but that feels a bit like overreach.

No, what I’m concerned about isn’t misinformation at scale, exactly; it’s misinformation being generated from unexpected inputs and articulated with trusted sources. I’m particularly concerned about Search, though Word Processors could also be problematic.

Critical scholars of search, especially Michael Golebiewski and danah boyd, have documented the phenomenon known as “data voids,” where relatively little-used search terms are colonized by conspiracy theorists and hate groups. In doing so, these groups shape narratives about emerging events, and plant long-term misinformation.

What makes data voids rhetorically successful? What makes it more persuasive to tell someone “search this keyword and see what you find!” than to simply explain your Important Theory about What The Lizard People Are Doing to The Soil? The authority granted to search is what. If the search engine knows about the Lizard People then, for a certain number of people, this must be true. Even more so, the experience of believing you are uncovering hidden truth can itself be compelling. This makes traditional critical thinking/information literacy training (which tends to focus on asking questions and “doing your own research”) potentially less effective at combating these sorts of misinformation issues (as danah boyd pointed out years ago).

So, what I’m worried about is what happens when some totally unexpected input gets ChatGPT-enabled Bing or Bard-enabled Google to spit out something weird, compelling, and connected to the rich melange of conspiracy theories our society is already home to (this will definitely happen; the only question is how often). What happens when there’s some weird string of secret prompts the kids discover that generates an entirely new framework for conspiratorial thinking? What kind of rabbit holes do Data Voids lead us down when we don’t just have voids in the human-written internet, but also in all of the machine-made inferences created from that internet?

If these bits of nonsense were just popping up in some super-version of AI Dungeon or No Man’s Sky they might not be so critical. We might just task QA teams to explore likely play paths before players got there and sanitize anything really ugly. The delight created by endless remix might make it worth the trouble.

But articulated with Search, the thing people use to learn about the Real World? That seems troublesome, at best.

Please Stop Moving Fast and Breaking Things: AI Edition

Not a bad answer there, ChatGPT

So it looks like Microsoft is already rolling out ChatGPT-based writing tools in Word, and the Bing integration has a wait list you can join. Both will likely be in full public release within months. Google’s Bard is likely not far behind. ChatGPT’s pay version is now available, and only $20 a month (it was initially advertised at $45).

The machine writing revolution is happening very, very fast.

It recalls the infamous Facebook internal slogan “move fast and break things.” Social media certainly deployed very, very fast. We still don’t really understand everything it did, and we still don’t have any sort of Public that can really give the format any kind of meaningful oversight.

This is a symptom of Siva Vaidhyanathan’s notion of “public failure” (an idea that should have gotten more attention than it did, IMHO), but this is all happening too fast to even go into that right now. It’s a dismal diagnosis, though: without trusted, shared, public institutions (which we really don’t have right now) it’s hard to see how we even develop a framework for what we want to happen with something like social media or LLMs, much less deploy a regulatory framework that would steer towards those wants.

In the meantime, what I want to know is, what’s the rush? It’s not clear to me that some of the failure modes of AI writing folks are worried about are really all they are cracked up to be. Yes, LLMs could produce misinformation (dramatic music) at scale but then, maybe we just need to rate limit things a bit more and confirm authorship a tad. Then again, maybe everyone in the world consulting an AI oracle for information that’s known to give bad health advice is not, like, ideal.

Honestly, though, I’m not sure we understand what happens when we encourage everyone who uses Microsoft Word or Google Search (i.e. everyone in the United States and most of Europe and a large percentage of people everywhere else) to outsource a big chunk of writing and thinking to an LLM well enough to even predict how it might go wrong yet. I’m sure that, in the end, this is the sort of change that’s probably not good, probably not bad, and definitely not neutral.

Given that, I return to the question, what’s the rush? What harm could there be in slowing this down for a bit? What will be lost if we don’t roll out AI writing to everyone in the first quarter of 2023? Oh, there are costs to Microsoft and Google’s stock prices, perhaps… but who cares?

But that’s exactly what’s in the driver’s seat now. As I quipped on Mastodon, “What we’re seeing now in the LLM space is wartime tech adoption. ‘The other side has it! Who cares what the long term implications are, just get it to the front!’ Thing is, it’s a war between Microsoft and Google, mostly over market share and stock price. Whoever wins, we don’t share in the spoils, and we definitely will have to clean up the mess.”

Some have called this an “iPhone moment” and I think that’s exactly right, in the sense that the iPhone made a giant pile of money for Apple, had exactly zero social benefit (as measured by, say, productivity or similar metrics), and participated in a series of decidedly not neutral techno-social-media upheavals we still don’t understand.

Why not try to understand first, this time, at least a little? What harm could it do? What grand social problem will go unsolved without LLM writing to solve it? What social benefits will we deny people if LLMs are delayed in their mainstream adoption for a bit? Shouldn’t there be at least some affirmative duty to make that case before we push this out to most of humanity like a software patch?

A Short Machine Writing Assignment

Inspired by Ryan Cordell and others, I built a short in-class assignment to play with ChatGPT and its kin. Nothing fancy, but I thought I would share in the spirit of collaboration.


Using either ChatGPT or the OpenAI Playground, try the prompts below. As you do so, track your reactions in your handwritten journal.

1) Take a paragraph of text that you wrote (you could use the first theory of writing or something else) and ask the AI to rewrite this paragraph in another style. You could ask it to rewrite in a more or less formal style, a friendlier style, a more conversational style, a more or less emotional style, etc. You could also ask it to rewrite the paragraph in the style of a particular genre, for example “in the style of a parenting blog” or “in the style of a hard-boiled detective novel.” Try this a few times and reflect on what happens. How does the machine transform your writing? Is what comes out true to your original intentions? Why or why not?

2) Get the AI to lie to you. In other words, get it to say something you know for sure to be factually untrue. I’ve confirmed there are a number of ways to do this, but I will leave it to you to discover them. Reflect on this process. What did you learn about what you know, what the AI knows, and what the AI will treat as “truth”?