We don’t yet know what the full social and cultural impacts of Machine Learning text and image generators will be. How could we? The damn things are only a few months old; the important impacts haven’t even started to happen yet. What we do know is this: Machine Learning image and text generators are a lot of fun to play with.
It’s that pleasure I want to briefly reflect on today. Playing with something like DALL-E, or Stable Diffusion, or ChatGPT is for me reminiscent of the kind of slot-machine loop a game like Minecraft sets up. In Minecraft, you spend an hour whacking away at cartoon rocks with your cartoon pick for the reward of occasionally coming across a cartoon diamond. When you’re playing with Stable Diffusion you spend an hour plugging in various prompts, trying different settings, using different random seeds, for the reward of occasionally generating something that strikes you as pleasing.
What’s pleasing to me about the images I come across in this way is, often, how they capture an idea that I could imagine but not realize (as a visual artist of very little talent). Because they translate ideas into artistic work without the intervening phase of mastering an artistic medium of expression, image generators call to mind the idea of “Dry Dreaming” from William Gibson’s short story “The Winter Market.”
In this short story, which prefigures in many ways Gibson’s later Sprawl novels, Gibson imagines a technology that basically reads the minds of artists (via the mind-machine interface of a net of electrodes familiar from many cyberpunk stories) and outputs artistic vision directly to a machine recording that can then be edited and experienced by an audience. At one point, the main character of the story muses about how this technology allows artistic creation by those lacking traditional artistic skill:
you wonder how many thousands, maybe millions, of phenomenal artists have died mute, down the centuries, people who could never have been poets or painters or saxophone players, but who had this stuff inside, these psychic waveforms waiting for the circuitry required to tap in....
On the surface, DALL-E and Stable Diffusion (and text generators like GPT-3, though my own personal experience of these is different, since I’m a bit better with text) seem to do just this. They let us create directly from ideas, jumping over all the fiddly body-learning of composition and construction.
But of course, there is a crucial difference between the imagined and actual technology. The “dry dreaming” Gibson imagined was basically a shortcut around the semiotic divide between signifier and signified: it exported meaning directly from a person’s brain to a recording. Let’s leave aside for a moment whether such a thing would ever be possible; I think we can still relate to the desire behind the dream. If we’ve ever struggled to put an idea down in words, we understand the fantasy here. Just take the idea out of my head and give it to someone else directly!
But DALL-E and Stable Diffusion very much do not take ideas directly from the user’s head. They take a textual set of signs from the user, and give back a visual set of signs based on what they have learned by statistically correlating all the sets of textual and visual signs they could find. What they do is, in fact, almost the opposite of what Gibson imagined with dry dreaming. Instead of direct transfer of signified with no distorting signifier in the way, they are dealing with the pure play of signifiers, without the weight of meaning to slow them down.
Of course, the signified re-enters the picture in the moment that I, the user, select an image and think “oh yes, that’s what I meant!” or even “oh wow, that’s what that could mean!” But those reactions happen in the presence of the sign already drawn for us, the re-imagined imagination of the vast set of signs that were the training data for the machine.
That moment is a lot of fun, but the change it heralds for meaning itself is at least as profound as those brought about by recording technology and documented by Kittler in “Gramophone, Film, Typewriter.” If recording allowed for the remembering of signs without the intervention of human meaning-making, then machine learning generators may allow for the creation of signs without the intervention of human meaning-making.
What that does, I don’t think any of us know yet. But it does something.