Ok, so, I had the AI image generator Stable Diffusion XL generate 100s of “selfies” of US Presidents. Let me explain.
But, before I even start on that, let me state that I don’t intend this as any sort of endorsement of AI image generators as a technique. I understand how problematic they are for artists. My goal here is to understand the tool, not to celebrate it (though I do find some of its glitchy output quite pleasing sometimes). One reason I chose US Presidents for this project is that, as public figures of the US government, at least the figures I’ll be representing here are already somewhat “public domain.”
So, we know that image generators are able to do a fair amount of remix work, translating subjects from one style into another; that’s how you make something like Jodorowsky’s Tron. I was curious to learn more about this process of translation. How well, and how reliably, could an image generator take a subject that never appeared in a given genre and represent that subject in that genre? How would it respond when asked to represent a subject in an anachronistic genre? Would it matter if the subject asked for had many different representations in the training data or just a few? Which genre conventions would the system reach for to communicate the requested genre?
I also wanted to get beyond cherry-picking single images and get a slightly larger sample of images I could use to start to get a sense of trends. I was less interested in what one could, with effort and repetition, get the tool to do, and more in what its affordances were: what it would tend to encourage or favor “by default,” as it were.
So I decided to take a stab at making a run of many images using the recent XL version of the popular Stable Diffusion AI image generator, mostly because it’s something I can download and run locally on my own machine, and because it’s incorporated into the Hugging Face Diffusers library, which makes scripting with it easy enough for… well, an English Professor!
I decided to use US Presidents as subjects for the series, because they are a series of fairly well-known, well-documented people spanning 230-odd years of history. That meant I could pick a recent image genre and guess that most of them would not be represented in this genre in the training data (it’s not impossible some human artist’s take on “Abraham Lincoln taking a selfie” is in the data set, but “Franklin Pierce taking a selfie?” I doubt it). The system would have to translate them into it. At the same time, some Presidents have vastly more visual culture devoted to them than others, both because of relative fame and because recent presidents live in an era with more visual recording. I was curious to know if I could learn anything about how this difference in training data might influence the results I got from the generator. Would it be more adept at translating subjects it had more visual data about?
Also, the logic of “I’m looking for my keys here where the light is better” applies. A list of US presidents was easy to find online and drop into a CSV file for scripting.
I went with the “selfie” genre because we know it’s one that image generators can do fairly well. There have already been some great critiques of how image generators apply the cultural conventions of the “selfie” genre in anachronistic and culturally inappropriate ways. I was curious to see how the “selfie smile” and other selfie genre conventions might pop up in anachronistic images, and to look for patterns in how these genre conventions appeared.
So I ran off a series of 10 selfies each of all 44 unique presidents (sorry Grover Cleveland, no double dipping) using the prompt “A selfie taken by [President’s Name].” I also asked for “A portrait of [President’s Name]” using the same random seed, to see how that compared. I also asked for “An official photograph of [President’s Name] descending the stairs of Air Force One,” but that prompt mostly revealed that Stable Diffusion rather struggles to represent aircraft.
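For the curious, here is a rough sketch of what a run like this can look like in Python. This is not my actual script, just a minimal reconstruction under some loud assumptions: the inline CSV sample, the `name` column, the seed numbering, the output file names, and the helper names `build_jobs` and `render` are all made up for illustration. The only real library calls are in `render`, which uses the Diffusers `StableDiffusionXLPipeline` and assumes a CUDA GPU and the public `stabilityai/stable-diffusion-xl-base-1.0` checkpoint.

```python
import csv
import io

# Hypothetical stand-in for the presidents CSV; the real file and its
# column name are assumptions made for this sketch.
SAMPLE_CSV = """name
George Washington
Franklin Pierce
Abraham Lincoln
"""

# Two of the prompt templates described above.
TEMPLATES = {
    "selfie": "A selfie taken by {name}",
    "portrait": "A portrait of {name}",
}
IMAGES_PER_PRESIDENT = 10


def build_jobs(csv_text, per_president=IMAGES_PER_PRESIDENT):
    """Return one job per (president, seed, genre).

    The selfie and the portrait for a given president share a seed so
    the two genres can be compared. The seed numbering is an assumption.
    """
    jobs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for idx, row in enumerate(reader):
        name = row["name"].strip()
        for i in range(per_president):
            seed = idx * per_president + i
            for genre, template in TEMPLATES.items():
                jobs.append({
                    "name": name,
                    "genre": genre,
                    "seed": seed,
                    "prompt": template.format(name=name),
                })
    return jobs


def render(jobs, out_dir="out"):
    """Generate and save one image per job (needs diffusers, torch, a GPU)."""
    import os
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")
    os.makedirs(out_dir, exist_ok=True)
    for job in jobs:
        # A seeded generator makes each image reproducible.
        generator = torch.Generator("cuda").manual_seed(job["seed"])
        image = pipe(prompt=job["prompt"], generator=generator).images[0]
        slug = job["name"].replace(" ", "_")
        image.save(os.path.join(out_dir, f"{slug}_{job['genre']}_{job['seed']}.png"))


jobs = build_jobs(SAMPLE_CSV)
print(len(jobs), "jobs; first prompt:", jobs[0]["prompt"])
```

Building the list of jobs first and rendering second means you can check the prompts and seeds before committing to hours of GPU time; calling `render(jobs)` is the slow part.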
I’ve taken a look through the results, and while I think my sample size is still very small, I think I see some things I’d like to count up and look for trends with. I think I’ll do this slowly, one president a day for the next few months, and post what I see in each example on Bluesky/Mastodon as I go. In particular, I’m curious about a couple of trends I think I notice in the images.
First, I’m curious about how the media forms that Stable Diffusion associates with “selfie” seem to change over time. For example, for the first few US presidents, the usual result for “selfie” looks like a painting (with the exception of a few odd, photorealistic hypermodern breakthroughs).
(Left: Typical painting style Washington “selfie” Right: Washington cosplay uncanny valley horror thing)
However, by the time you get to John Quincy Adams and Andrew Jackson, the “selfies” appear frequently as if they were early photographs (perhaps daguerreotypes) rather than paintings, while the “portraits” remain paintings. This despite the fact that (so far as I can tell from a bit of googling) only a handful of photographs were taken of either man, and those were taken very late in life.
Also, note the simulated wear at the corners of that image. There seems to be a lot of that in the various “selfies”: simulated wear and cracks, simulated tape holding them to simulated albums. The “portraits,” in contrast, tend to have frames. I’m curious to see if there are trends there. Does the machine simulate “age” in the images of older subjects, even when asked to simulate an anachronistic genre? It doesn’t always (see Washington above), so is there a trend in how frequently that happens?
Second, I’m curious to see how the anachronistic genre conventions of the selfie are applied across time. So, while fans of Calvin “Silent Cal” Coolidge will be thankful to see he has NOT been rendered with a “selfie smile”…
… some elements of “selfie style,” sometimes mashed up with period media artifacts, do break through, as in this image where Woodrow Wilson’s arm extends to the corner of the image frame, holding up a small, light smartphone-sized camera that inexplicably also shoots black and white film with a noticeable grain and a depth of field measured in microns.
Or this one, where a phone is mashed up with period camera hardware to make some kind of dieselpunk accessory for a Harry Truman perhaps being played by Eugene Levy:
At first glance it seems like these style moves become more common the closer you get to the present, even though they don’t really make sense until 2007 or so.
So, those are my first-pass instincts. I’m going to take a closer look at each and do a count, and see what I can see. Stay tuned on Bluesky and Mastodon.