AI Formative Feedback: Quick Thoughts and Concrete Advice

So, there’s been a lot of buzz about AI as a source of formative feedback in student writing. This has only gotten more intense since the GPT-4o demo, which was clearly being pitched as a student learning aid.

Marc Watkins has a pretty great piece out this morning where he responds to some of this. Go read it if you haven’t. He does a great job of unpacking and calling attention to the larger labor situation that might make AI generated formative feedback both attractive and extremely problematic.

I want to take a slightly different angle than Marc on this. I totally agree that we must NOT reach for automated teaching. It’s unfair to our students and potentially disastrous for education. I also agree that the way students may find AI generated feedback useful as a way of “fixing” their writing fundamentally points to how students have been taught to (mis)understand their writing as “broken.” As Marc puts it, “the problem is writing doesn’t need to be solved or fixed. […] When you teach writing to learn, you don’t frame unrealized ideas, poorly worded sentences, or clunky mechanics as problems that are ticked off a to-do list to fix.”

However, while I absolutely agree that we should discourage the use of AI feedback in ways that reinforce rigid definitions of “fixed” and “broken” writing, I would suggest we also imagine a slightly different use case. Namely, that of a writer who understands their rhetorical situation and who has accurately diagnosed a “problem” they need to “fix” in that particular situation. Such a writer might be able to employ AI feedback to help them solve that problem, and that might be OK.

For example, I’ve been told by my students that my assignment instructions can sometimes be hard to follow. I’ve been resisting for a while, but this summer might be the year I finally ask ChatGPT for help with that. I know my purpose, and I know my audience. Some pointers from the robot (who does tend to write clearly, even when it’s wrong) might not hurt.

Here’s another, more concrete example from my focus groups. I already discussed this in my big results post, but I’ll quote it again for clarity:

I do personally use AI a lot for sentence structure. I’ve always been told that my writing is super wordy. So I will put it into ChatGPT and say like restructure this sentence or write this sentence in a way that sounds academic and like is clear and concise and like gets the point across without, you know, being a run on or whatever. But, and while I do think, I do like that it’s a tool that’s able to help me do that. Do I wish I could do it myself with my own brain and figure it out myself? Yes. But I have the tool there. So it’s, and it’s kind of like, I’ve heard it, like I’ve heard in that sense, like I’ve heard that my sentences are wordy for so long. I’ve tried to correct it, consistently heard it. It’s like, at this point I’m like, I’ve worked on it. I’ve tried like all throughout high school before I used AI. So now it’s just like, at this point it’s a tool for me bettering my writing in the sense that it’s, it might be something that I’m not able to do. I’m just not able to learn to structure sentences in that kind of way without just like letting my creative side take hands.

There’s something really poignant about how the student sees their “creative side” as something they must somehow corral and control here, and I think that speaks a lot to Marc’s point about the way we tell students their writing is broken.

That said, this student is also clearly a thoughtful writer who has paid attention to the feedback they have been getting. They have identified a writing problem. They want to fix it, and this tool can help. I don’t think they are entirely mistaken to want that!

So, how could a writing instructor provide this student with the ability to shift from thinking of writing as “broken” or “fixed” and towards thinking of it in the context of a rhetorical situation? More specifically, how could they help them think critically and reflectively about their “wordiness” and make informed choices about when and how to “control” it? They might:

  1. Help the student think about the situation their writing is responding to (the fancy word is exigence), and ideas they need to communicate in that situation. What “words” are necessary to respond to that situation? Which might be superfluous?
  2. Help the student consider the needs of their audience, and the impact they want to have on this audience. Which words maximize that impact? Which distract from it?
  3. Help the student think about how to choose a genre that is appropriate to this situation, and the sorts of word choices typically made in this genre.
  4. Help the student think critically about which words they might want to keep, and which they might want to lose, given 1-3 above.

I would stress that what we really want is for the student to internalize that process, so they can do it thoughtfully with or without machine help.

Those of you who teach writing in higher ed are probably screaming WE ALREADY DO THAT about now. Yes! I know, but we need to redouble our focus and align how we talk about writing with students (and our admin and fellow faculty), how we assign writing tasks to students, and how we assess writing assignments to match!

GPT-4o Won’t Work As a “Companion”, But Not For The Reasons You Think

So, OpenAI has decided what it really wants to do is make its AI voice assistant more of a “companion.” To put it one way:

The gendering of this companion has already been widely remarked upon, and I won’t take it up here mostly because others have already dealt with the implications of “AI Girlfriends” really well and I don’t have much to add. I will, however, just drop this here:

This movie was not about robots; it was about sexism. We’ve now come full circle, and it’s about sexism in the design of actual robots.

What I want to talk about is the reason I think this probably doesn’t work. The surprise is, I don’t think it’s principally about the limits of the technology. I don’t think it’s because nothing that lacks the ability to think and feel the way a person does could ever be a source of companionship. Fictional characters are companions all the time! I think it’s instead about the tension between “assistant” (especially a machine assistant) and “companion.”

(I’m going to dance around a particular dialectic by Hegel here. I don’t know it well, and I’m not sure it’s relevant, and I don’t have time to figure that out.)

Namely, an ideal mechanical assistant is a sort of transparent intermediary for the will of the user. It does what we want it to do. Of course, Bruno Latour would tell us this ideal is impossible (all technologies are mediators), but if your goal is to create such an assistant you make certain design choices.

Specifically, you make that assistant as empty of any kind of simulated “inner life” as possible. You want the machine to just do what it is told. You don’t want it to override the user’s desires with its own. This is especially true if you hail from a TESCREAL ideology and your principal concern about AI ethics is that the thing might become superhuman and take over.

This results in an AI companion that is profoundly empty. A mere reflection of the user’s desires. You can see this perhaps most clearly in a moment of the GPT-4o demo (around minute 24) when the demo hosts show off GPT-4o’s ability to read facial expressions (marketed as “detecting emotion”). After a genuinely hilarious moment when GPT-4o has some kind of caching error that causes it to suggest the host is a block of wood, it “correctly” identifies his exaggerated, forced smile as “happiness” and asks “care to share the source of those good vibes?”

There are two things in this exchange that point to how an assistant isn’t a companion. First, GPT-4o does NOT guess that the host is putting on a smile but might actually feel some other way. Perhaps a bit nervous; he seems nervous to me, especially after his machine calls him a wooden table. GPT-4o could have guessed this; one does not need an actual inner life to detect the visual signals of a fake smile. OpenAI could have trained it on those. They chose not to, because no one wants an assistant second-guessing their performed emotions. If the boss says they are happy, the assistant agrees!

Second, when GPT-4o asks “care to share the source of those good vibes?” everyone laughs, but they don’t answer it. Why would they? Throughout the whole demo, the machine has been introducing itself saying “hey! I’m great/fabulous/wonderful/terrific today, how are you?” It always asks about the state of the user, never says anything more than a generic happy adjective about itself. Of course, this is a genuine reflection of the machine’s lack of self, but it also makes for a profoundly one-sided and even creepy conversation. Why does it want to know why I am happy? What is it looking for?

A GPT-4o that occasionally offered up some tidbit of how its “day” was going, based on search requests, would be HILARIOUS (“I’m honestly a little concerned about the number of people asking about flu symptoms in Berlin right now”) but we all understand why they don’t dare implement that.

So as long as GPT-4o is going to be an assistant, it will never really succeed as a companion. Could an AI system function in this way? I’m not sure. Technologically it seems maybe possible, but economically/socially I’m not sure it works.

Could an AI system perform an inner life? I don’t see why, fundamentally, it couldn’t. Characters in fiction have no real inner lives, but they convince us they do all the same. Adrian Tchaikovsky thinks that the author’s inner life is the source of our sense that the character has an inner life, that our relationships with characters are a kind of surrogate relationship with authors. He suggests this might be why procedurally generated worlds like Minecraft and No Man’s Sky ultimately feel so empty.

I respect Mr. Tchaikovsky’s fiction a great deal, but I’m not sure he’s right about this. In the end, our experience of a character’s inner life is entirely communicated through signs on the page. Oftentimes, the best characters are told through signs that hint at some inner experience they don’t completely explain. The reader is left to fill in those details, to imagine that inner life on their own.

In the realm of plot, it is easy to point to examples where human authors successfully left intriguing clues readers filled in with meaning that the authors themselves were unable to successfully resolve. Take the disappointing end to Clarke’s “Rama” series, which started with so many tantalizing hints, and closed with “the spaceships were made by God?” (Seriously?)

So too, I think you could probably train a machine learning system to leave hints about an inner life it does not have. To play a character. Such a character might be an interesting companion!

But economically and socially, I’m not sure it works. Would playing a character be a sustainable business model? Would it get enough engagement to pay the bills? Would it collect enough data, and data of the right, actionable type, to wrap marketing around? It might, or it might not.

Socially, the question is, would people accept a machine companion enough out of their control to be a compelling character? Or would they want to “customize” it, make it transparent to their desires, and thus kill it?

I don’t know! We’ll probably find out!

AI Focus Groups – Summary of Initial Findings

A couple of weeks back, I ran a series of focus group conversations with undergraduate students from my institution as part of an IRB approved study of undergraduate use of and attitudes towards Generative AI.

This post is a quick-and-dirty first pass draft of what I think I am seeing in the data, after an initial read. I have a lot more work to do to fully review what’s in the data and make sense of it by putting it into conversation with secondary sources. Whatever I draft for submission for formal publication will likely not totally agree with what I say here, and whatever comes out the other side of peer review will almost certainly agree less. Still, in the spirit of sharing data and writing to learn, I wanted to put this draft together.

The data discussed here were collected from five focus group sessions held over the course of two days. The mean number of focus group participants was four, with the smallest group having only two participants and the largest having six. Focus group sessions were about 40 minutes long. Participants were recruited from across campus via posters, faculty announcements, and my students (who were co-researchers) recruiting peers. Participants were rewarded with donuts and the chance to win a gift card to a local coffee shop.

While the sample size of this set of focus groups was somewhat small, the themes discussed by respondents were still revealing, and suggest broad trends in how students are thinking about generative AI that should be investigated in further research. In addition, I was struck by the depth of thought some students had put into these issues as we spoke. Many of them had complex, nuanced thoughts about AI related issues, and in some cases encounters with AI had led them to do meaningful reflection about education and writing.

During the focus group, students were asked about their uses of AI in educational spaces, their encounters with it in social spaces, and how they anticipated AI impacting their working lives after graduation. The findings below map onto those sections of our conversation.

(Note: I will use they/them pronouns for all student respondents, to better protect the anonymity of the small set of focus group participants.)

Students are ambivalent about AI

This may not seem to be much of a finding, but I think it’s a useful perspective that complicates the emerging narrative of young people as “AI Natives” that threatens to reproduce the same mistakes of the “digital natives” discourse of the early 21st century. While my sample size of students was small, and the makeup of that sample may have been skewed towards students who were either AI enthusiasts or skeptics (posters offered students the chance to have their “voice heard” if they found AI “exciting” or “frightening”), the conversations we had showed students to have a variety of positions about AI use. Just as significant, even fairly enthusiastic students tended to have some concerns or uncertainty about AI.

A small subset of students was self-consciously critical of AI, somewhat (but not entirely) uninterested in engaging with it, and concerned about its impact on creativity. Asked to give a single word that sprang to mind when they thought of Generative AI, one student responded “scary,” another “problematic,” and a third “nonsense.” One concerned student expressed their exasperation with omnipresent AI and concern about its creative impact like this: “I think we’re gonna see an influx of people trying to use A.I. in every field I’ve noticed that even like on job application sites they’re just like, hi I’m the site A.I. and I’m like you don’t need to be here […] I’m worried about like the influx especially in literature we’re gonna see of people using A.I. to just kind of like slap dash together a book and then try to sell it to like make a few dollars.” However, in keeping with the theme of ambivalence, this same student then mentioned that they found AI useful as a transcription tool, helping them use voice transcription to produce written documents while avoiding typing-induced hand cramps, and could see the broader positive implications of this kind of transcription technology for accessibility.

However, these AI “conscientious objectors” were a small minority of the respondents. Many more reported being interested in AI as a tool to accomplish a variety of tasks in and out of the classroom. As I discuss in detail below, some of these uses they saw as legitimate, and others they understood as cheating. However, even these interested students had some ambivalent feelings about the technology. One reported using AI as a study aid, and sometimes as a writing aid, but later confessed that they were concerned about their classmates’ use of AI:

Respondent: but I go to classes with, like, people who are going to be nurses, doctors. They all use AI for everything, and that’s a little bit concerning.
Professor Famiglietti: Okay. Why is that concerning to you?
Respondent: Because those are our future healthcare professionals.
Professor Famiglietti: And what about the AI use is concerning to you?
Respondent: That they’re not learning.

Another student, who reported using AI as part of their tabletop gaming hobby (one of the more common hobby uses of AI reported) and who was open to other uses, still expressed concern about larger commercial applications of generative AI: “I know that only five other people on the planet are going to see [my hobby uses of AI] and I’m not going to make any money from it but I’ve also seen even like the company that makes [tabletop games] like using AI generated art and stuff like that for profit and that I have an issue with.”

Students are getting mixed and confusing messages from Faculty about AI

Students found little in terms of faculty guidance to help them resolve their ambivalent feelings about AI. Many reported that faculty advice seemed confused, inconsistent, or hastily put together. One student quipped, “multiple times I’ve had teachers bring up the syllabus and like the bolded section you can tell they added in like three days ago that says, ‘don’t use AI to write your papers you can use it to brainstorm maybe I’ll allow it.’” Others reported that some teachers were “pro-AI” but that most were “very anti-AI, and they were like, ‘If you use this at all, I will know.’” In fact, a blanket ban on AI use seemed to be the most common form of faculty advice students reported being given. Their responses to such a ban, as described further below, ranged from fear to disregard.

Given how many professors engaged in a blanket ban of AI, it’s perhaps unsurprising that those professors who engaged with AI assignments also sometimes confused students. One student, asked why they made a face while recalling an assignment that asked them to make an outline with ChatGPT replied “I don’t know, I wouldn’t expect a teacher to encourage that.” Another student, a vocal AI skeptic, was dubious about an assignment they had been given that asked them to use AI, saying “I personally didn’t like it, I just feel like it was following the trends of let’s make this interesting for students since AI is a big thing right now.”

Other students were satisfied with the quality of some guidance they had gotten from professors, but sometimes this guidance contained inaccuracies. One student had been told that they could use ChatGPT to summarize a movie just by asking it for a summary of the movie by title (in fact, this is the kind of prompt ChatGPT will frequently hallucinate in response to). Another had been accurately told that ChatGPT would not always correctly summarize published fiction, but inaccurately told that the reason for this was that “there’s certain copyrights [ChatGPT] can’t break.” (In fact, ChatGPT was trained on significant amounts of copyrighted text.)

While the most frequent direct advice students got from professors about ChatGPT use was a simple “don’t,” in other cases they reported being told that using AI tools as “brainstorming,” research, or pre-writing aids was legitimate (it should be noted, however, that they did not report being given much guidance about how to use them in such a mode). In some cases, they reported being given the option to use ChatGPT as a tool to prepare for open-book exams, though not given much guidance about what sorts of questions the tool does or does not answer well or how to adapt the tool’s information into their own answer:

Respondent: I have had two teachers […] who are very pro-AI, but both in the sense of not copying, but using it for ideas. […] I had a class, and [the professor] gave us [a set of] questions that would be on the test, and the test was [a subset] of those questions. And [the professor] very verbally encouraged us to use ChatGPT, and AI, and the internet, together.
Professor Famiglietti: Like, ask it those test questions, and see what it told you, as a way of preparing for the test?
Respondent: Yeah, but she said, ’cause we were allowed to. It was open note, so we could write down all the answers, but the only thing she said about it was: your answer has to be unique, and it can’t be copy and paste. It can’t be, what’s the word?
Professor Famiglietti: Like, just verbatim?
Respondent: Yeah.

Students have developed their own guidelines to distinguish legitimate AI use from cheating

In the last example above, the student reported ultimately being satisfied with the open-book test they were able to use ChatGPT to prepare for, since they “enjoyed” the class it was assigned for and the learning they were doing for the class (which was for their major) was ultimately intrinsically important to them. As they put it, “I knew that I actually have to know the stuff,” and so they worked to synthesize ChatGPT-sourced ideas with ideas from their textbook and other sources to arrive at answers that they reported felt like their own.

Leaving aside for a moment how accurate the ChatGPT-sourced information in that example might have been (depending on how broad the questions were, it could have been OK or quite bad), the larger takeaway here is that the student had their own sense of what made for a legitimate ChatGPT use in an educational environment. Namely, that a use that enabled student learning, rather than replacing it, was a legitimate one. This was echoed by another student, who described their own use of ChatGPT to explain chemistry problems they had been assigned (an AI use they had developed on their own) this way: “I think that you have to learn from it. Like, for example, my chemistry problems. If I could plug in, you know, the the question and it give me the right answer, which it doesn’t because it doesn’t know how to do math. But if I just like copy and pasted that without, you know, giving a care in the world to how to do the problem, then that’s not learning.” Interestingly, this student was planning on a career in education, and while they were themselves an avid AI user, they were concerned about how their students might use AI to circumvent learning on their assignments.

Other students used slightly different terms to describe something similar. Some said that AI use was legitimate if you still put in “work” or “critical thinking.” They admitted, however, that the line between legitimate uses and cheating was not always clear. Often, they suggested that others might have more trouble discerning this than themselves.

Many students see AI as valuable for getting “unstuck”

Students often reported using AI as a way of getting “unstuck,” a use they saw as a legitimate application of the technology. This could take a variety of forms. Sometimes, it was as a kind of brainstorming or pre-writing aid. This is a form of use I have previously been somewhat dubious about, but after discussions with students I think it might have more value than I initially gave it credit for. One student described how they used AI to brainstorm paper topics:

Respondent: when i get an assignment where it’s like oh you can write anything you want that kind of paralyzes me because I’m like oh i can write anything and then i’m just kind of stuck […] with this like i can go into the AI and just be like hey can you just help and then like it’ll kind of you know throw out ideas but you know then you know it will you know help me kind of I guess realize the thing that I want to write about
Professor Famiglietti: can you tell me more about that idea of helping you find the thing that you want to write about, can you tell me more about what you mean by that?
Respondent: one of the things that i like is that it’s not like you’re just getting like one prompt, like one result, at a time like you can have like a really nice list of like possible topics for a paper and you can like read through them and think well like well that one looks interesting that one doesn’t you know that one might be interesting but maybe instead of focusing on this aspect i could instead focus on that one and so yeah like it just becomes like a brainstorming thing where again I’m not asking the AI to write the paper for me but I’m using it to kind of give me that first initial push to get me into the writing process

This respondent is already doing some sophisticated thinking about the output they are getting from AI. They might benefit from instruction that would help them think critically about that output and situate it as a machine generated response, rather than as a human dialogue.

Other students reported using AI to help them get “unstuck” when facing formal parts of writing they had trouble with, such as introductory paragraphs or transitional sentences. In one particularly telling exchange, a student reflected on how they had used AI to fix a “problem” that others had identified in their writing, excessive wordiness, that they felt unable to solve on their own:

I do personally use AI a lot for sentence structure. I’ve always been told that my writing is super wordy. So I will put it into ChatGPT and say like restructure this sentence or write this sentence in a way that sounds academic and like is clear and concise and like gets the point across without, you know, being a run on or whatever. But, and while I do think, I do like that it’s a tool that’s able to help me do that. Do I wish I could do it myself with my own brain and figure it out myself? Yes. But I have the tool there. So it’s, and it’s kind of like, I’ve heard it, like I’ve heard in that sense, like I’ve heard that my sentences are wordy for so long. I’ve tried to correct it, consistently heard it. It’s like, at this point I’m like, I’ve worked on it. I’ve tried like all throughout high school before I used AI. So now it’s just like, at this point it’s a tool for me bettering my writing in the sense that it’s, it might be something that I’m not able to do. I’m just not able to learn to structure sentences in that kind of way without just like letting my creative side take hands.

The student’s ambivalence about the tool use and their writing is particularly interesting here. They express some positive feelings about having a “tool that’s able to help me” but they also seem a little regretful when they wish they could “do it myself with my own brain and figure it out myself.” Their inability to meet the demands made on their writing, even after sustained effort, clearly bothers them. This isn’t a student reaching for a tool out of laziness, but out of a carefully considered and thoughtful assessment of their own needs.

On the other hand, the student suggests they believe their “problem” with wordiness stems from an inability to keep their “creative side” under control. There’s a hint here about how “formal” writing might, in some cases, be more easily mastered by machines, and how we might need to abandon certain forms of formality if we want to encourage authentic student writing in an era of Generative AI.

When students choose to cheat, they do so to avoid what they see as “busywork” or work they were not prepared for

The most extreme way students might use AI to get “unstuck” is by using it to cheat on assignments. Students reported that they had been universally warned not to do this, but that nonetheless, it was something that happened. When asked what would drive someone to cheat in this way, they often suggested that they might engage in such cheating if confronted with an assignment they did not feel they had been adequately prepared for. For example, one student said “I had a class where I had to write a seven-page paper. In my high school experience, I had never written that long of a paper. So for me, coming up with that many things to talk about was a little bit challenging. So I asked ChatGPT to give me like a very basic and vague outline of like this specific topic and things to branch off of.” Another reported using ChatGPT to complete an assignment because they felt tired and overwhelmed: “the one assignment I did use it on for like this one paper it was on a topic that was well knowledgeable in and i like had all my personal research, but I was like really, really tired at the time. And it was like, due that night. I just needed to get something on the page. So I like asked it to like, you know, write me a paragraph about like, general like research on this one thing. I said like, you know, write me a paragraph.”

In other cases, students reported they might use ChatGPT to complete assignments they perceived as “busywork” that did not contribute to their learning. One student suggested they might use it to complete “worksheets” that were meant to practice skills or ideas they felt they had already mastered: “I’m talking about the worksheets that, like, you learn the subject and then they’re like, okay, practice it. And, like, say you got it down already and you still have to do the thing. It’s just like, okay. I can either do this or my paper in English. And then get that done quick.”

When students choose to cheat, they already have sophisticated methods for defeating automated detection

Students tended to describe cheating as something someone else, not themselves, would do, but nonetheless, some of them were well acquainted with methods that a prospective cheater might use to cover their tracks, especially from automated systems designed to detect AI writing (which they almost universally reported being threatened with). Take this exchange from one focus group, which I excerpt here pretty extensively because I think it’s a telling conversation:

Professor Famiglietti: Okay. Okay. Is that what other folks are getting in their classes, too? Like, does it seem like it’s mostly, you’re nodding, so it’s mostly like a blanket ban for you guys? Other than these couple of professors who’ve tried these experiments, mostly it’s just a ban.
Multiple Respondents: Yeah.
Professor Famiglietti: And yet, despite that fact, when we talked earlier, everybody was like, oh, yeah, but everyone’s using it anyway. Like, square that circle for me. Like, why is that? Despite the fact that folks are banning it, why aren’t they succeeding, do you think?
Respondent 1: I think that there are specific words that you can use in the AI to make sure that it can’t be traced back to AI. So they can’t really tell if you’re using it anyway.
Professor Famiglietti: So you think there are prompting techniques that make it more difficult to make the output traceable back to AI?
Respondent 1: If you say, write me something, make it 100% AI undetectable, it will make it undetectable. And they won’t be able to trace it back to AI.
Professor Famiglietti: So I just want to know what folks are doing. Have other folks run into techniques or tactics like this specific one that we just heard about for, like, making things, like, less detectable, like, methods that are supposed to, [Respondent 4] you are nodding your head.
Respondent 2: There’s, like, checkers online. AI checkers.
Professor Famiglietti: Oh, so you run it through the AI checker until it passes.
Respondent 3: And, like, Quillbot. Quillbot will reword things for you to make sure that it can’t be traced back.
Professor Famiglietti: Paraphrasers. Yeah. Okay. Those are nothing new. Right? Those are things that folks have used for a long time. So where did we find out about these techniques?
Respondent 2: I would say, like, the teacher was saying, like, I could find out, like, I’ll figure out if you used AI.
Professor Famiglietti: Yeah.
Respondent 2: Through the checker.
Professor Famiglietti: Oh, so the fact that they told you about the AI checker, so then you know what the checker is.
Respondent 2: Maybe if they didn’t let us know? But yeah, but they told us we could find out.
Professor Famiglietti: Okay. So because you knew about the checker because they told they threatened you with it, then you’re like, oh, well, I’ll just go see if I can get it to pass through. And do folks find that with prompting? You can get one of these checkers to say it’s no longer detectable after you tweak your prompts a little bit.
Respondent 2: Try to put your own words in it as well.
Professor Famiglietti: What do you mean by that?
Respondent 2: Like, don’t just take everything that AI gave you. Like, try to use your own words to make it kind of yours.
Professor Famiglietti: Oh, so integrate your own language and the AI’s language. Do other folks try to do that as well?
Respondent 3: Yeah, I mean, AI has a consistent pattern where it’ll use the same words over and over in every paragraph, like “additionally,” it’ll just repeat the same words over and over. AI also uses really big words so if they’re words we wouldn’t use, I feel like it would just look better to change it.

To summarize, these students describe using the following techniques to obscure AI generated writing from automated detection: prompt engineering (writing prompts to encourage the LLM to respond with something other than its default tone), automated paraphrase (tools like Quillbot take text and swap out words for synonyms using a simple dictionary look up), manual close paraphrase, and using software that detects AI writing to scan output until it passes undetected.

The exact prompts and paraphrase techniques named by students in this exchange may or may not be effective. (My experience makes me dubious that a prompt asking an LLM to make its output “100% AI undetectable” will work.) However, the broader methods they name have been shown to be effective in defeating AI detection software in peer reviewed research.

While this was the only exchange that delved into the details of obscuring AI writing during the focus groups, the broad response by students suggests that these techniques are fairly widely distributed. Policing AI writing via automated detection, as I have suggested previously, may soon catch more honest users than intentional cheaters. After all, the student who used ChatGPT to correct “wordy” sentences would be turning in language authored by the LLM, which might well trigger an automated detection system. Cheaters using the obfuscation methods described above would not.

Student engagement with AI as a creative tool remains tentative

I was curious if, during these discussions, I would uncover evidence of a broad remix culture style movement using AI to engage in creative activity. While my sample was quite small, I didn’t uncover much of this. Three students reported using AI to generate materials for tabletop games, such as character portraits. One student claimed to have used AI to generate an audio deepfake of a secondary school official as “a prank.”

The only other creative activity involving AI that students reported engaging with was “messing with” AI agents like Snapchat’s MyAI (more about this below), attempting to get them to say things they “weren’t supposed to say.” They reported that this activity had been short-lived, and ended soon after the introduction of the tool.

Students see AI driven “deepfakes” as a potential problem in their social media environments

While students did not report engaging in creative activity using AI, they did report encountering creative AI activity on their social media feeds, particularly on TikTok. This often took the form of memes engaging with the idea, discussed above, of “messing with” AI agents and attempting to elicit unexpected responses from them.

In many cases, however, the AI driven creativity students are encountering seems to be driven by mimicry of human artists. For example, one student reported encountering “leaks” of the then upcoming Taylor Swift album they suspected were not leaks: “with so many artists having like new music coming out that we don’t know what it’s gonna sound like yet. Like people can make like leaks of it by just faking it the whole time. […] I’ve seen a couple people doing like songs like that are supposedly gonna be like on Taylor Swift’s new album, and like some of them I think are real leaks, but there are some that it’s like you can’t really tell if it’s real or not.”

Students found it concerning that they often couldn’t tell whether the content they were presented with was “real or not.” They worried that faked video and audio could be used to incriminate someone. In their experience, such “fakes” could be posted and distributed without consequence:

Respondent 1: I think the voice thing is probably the scariest thing for me ’cause I saw a clip recently of, you know, Caitlin Clark, she’s like a famous basketball player. There was like a clip of her and it was an AI-generated voice over of her saying like a crazy slur ’cause she’s white. And it was like really out of pocket.
Professor Famiglietti: Oh, God.
Respondent 1: I knew it was AI, but I was like that’s, like people can get in a lot of trouble for that kind of stuff.
Professor Famiglietti: All right, [Respondent 2] I see you nodding, what do you think?
Respondent 2: Yeah, I’ve also seen artists that felt like people will make them say really awful things and be like, oh, they’re this kind of person and it’s like not, as long as it’s like a process to prove it’s wrong, even, like people can’t generally tell.
Professor Famiglietti: So I’m curious to know, do you feel like you see this kind of content frequently? Is this something that’s like not uncommon at all or is it just every now and again?
Respondent 2: Like more frequently now, like it’s becoming more frequent.
Professor Famiglietti: When you see it, do you feel like, oh, this is only gonna be up for 10 minutes and then somebody’s gonna pull it down or does it tend to stick around?
Respondent 3: It usually stays up. I mean, unless you’re going for an artist who really doesn’t want you to be doing that, like some people probably have gotten sued and stuff, but for the most part, those videos stay up.

While students did not report much anxiety that this kind of content would target them personally, they did express frustration and concern about the proliferation of such material in their social media feeds.

Students are not convinced of the value of AI participants in social media spaces or AI Advertisements

As suggested above, students were not much taken by the Snapchat “MyAI,” though all reported that they had encountered it. Introducing the topic (which emerged out of preliminary conversations with students as a way to start talking about AI in social spaces) tended to elicit groans. One student complained “It was just kind of stupid. Like, when it first came out, and, like, you made your, like, avatar, and then you, like, say weird stuff into it and, like, just, like, kind of play with it. And then everyone kind of got bored of it, I’d say.” Others reported that MyAI had allegedly done things they found “creepy,” like identifying their location even though they believed they were not sharing their location with Snapchat. Others speculated that MyAI had been made “intentionally annoying” to encourage people to purchase Snapchat Plus, which they believed would allow a user to disable MyAI.

While a recent Adobe blog post breathlessly promises that “half of consumers surveyed are more inclined to shop with brands that use generative AI on their website and 58 percent believe generative AI has already improved their online shopping,” some of my focus group participants extended their frustration with AI mimicry in social media to include its use in advertisements. One related a recent experience at a nearby mall, where they had encountered an AI generated ad on “one of those giant electronic billboards in the center of the mall,” and reported feeling “like, are you kidding me? Because you could really tell that it was A.I. generated. A lot of the lines were meshing together. There was like a lot of irregular shapes where there weren’t supposed to be where like one.” They went on to say, “People are looking at recent advertisements and being like, hey, I think an A.I. generator was used for these.” Another student then chimed in to compare this experience to the recent, infamous, AI generated “Willy Wonka Chocolate Factory Experience.”

Students believe AI will impact jobs, but many believe they can cultivate AI-proof skills

Students, especially in creative fields, have considerable anxiety about the impact of generative AI on the job market. As one put it in their final remarks to the focus group, “it’s a bummer […] it’s really like a shame knowing like how many people pursue creative fields wanting to tell their story and make their mark and everything and then this influx of AI products means it’s like not impossible but it’s getting closer to being harder to do that in a reasonable way which is a shame.” Another expressed this worry: “part of what I’m interested in is digital marketing, and that could honestly disappear. With AI, since it is all digital pretty much anyway, and with what they’re able to put out, and with how much better it gets every day, digital marketing could honestly not even be an option for me in a couple years.”

However, another student in the same focus group responded that they had more hope they would be able to find a uniquely human niche in their industry. In their words, “I wanna go into writing and editing. To an extent, that stuff can be done by, it is done by AI now, but I’m more into the creative aspect of creative writing, so I’d like to think that people in that community will still value a person doing that […] so definitely some aspects could be taken over by AI, but I’d like to think the job itself would still be there.” Other students expressed similar hopes that “uniquely human” job skills, like caretaking and empathy, might weather AI’s entry into the job market.

Conclusion and Next Steps

While my study had a small sample size, I still think there are some suggestive findings here. The students who participated in these focus groups were neither the lazy AI abusing cheaters students are sometimes accused of being, nor were they the savvy “AI natives” they are sometimes hailed as. They were people trying to navigate their lives in this moment of rapid AI emergence like the rest of us. They were often thoughtful and insightful about the challenges they were facing, but like the rest of us, they were also often overwhelmed, scared, misinformed, and confused.

From a classroom perspective, the way students talked about the importance of intrinsic motivation for learning in an era where AI based tools could be used to cheat was telling. So too was the long list of tools they had available to defeat automated policing of written work. These both tend to reinforce my already existing suspicion that engaging with students, clearly defining learning goals, and connecting to their intrinsic motivation will be more successful than deploying automated surveillance to defeat AI tool based plagiarism.

But that’s just one small piece of a fascinating set of conversations. There is a lot to continue unpacking here, and I will be doing that over the course of the next few months as I shape this into something ready for publication. I’ll also be gathering further data of this kind in the fall, having more conversations with undergraduates, and possibly faculty as well.

Finally, my own students, who were co-researchers with me on these focus groups, will be turning in their own analyses of this data to me tomorrow. I intend to ask the authors of particularly interesting takes for their permission to run their work here as “guest posts” on my blog, so look for those over the coming weeks.

Asterisms and AI: Pattern, Prediction, Meaning

What we, as a species, did with the stars tens of thousands of years ago is what we’re doing with AI today. We’re looking for patterns and trying to leverage them to make meaning. Now, as then, it’s going to take time to figure out which patterns are meaningful and which are noise.

At least ten thousand years ago, human beings were using the sky to predict the seasons. Knowing how to read the sun, moon, and stars let ancient hunter-gatherer people know when game animals would migrate and when foraged plants would be in season. Later, agricultural peoples would develop sophisticated astronomy to track seasons and anticipate when to plant crops, when to harvest, etc.

While some broad patterns were certainly useful, like the phases of the moon, being able to delve deeper into more subtle patterns clearly had payoffs for this sort of prediction. For example, by 1600 BCE Bronze Age people in what’s now Germany were using this device to remember when to add a leap month to their lunar calendar to bring it back into sync with the solar year. When the star cluster we know as the Pleiades aligns with a spring crescent moon, as depicted, time for a leap month!

The Nebra Sky Disk, image courtesy Wikipedia.

Of course, ancient peoples did not limit their astronomy-based predictions to agricultural purposes. They extended them into almost everything. Fortune-telling based on the stars is a long-running practice. Astrology doesn’t work, of course, but one can understand why people might have believed that it would. If understanding patterns in the sky helps to predict the future in the sense of predicting seasons, and more detailed study of more subtle patterns improves the predictive power of those patterns, then perhaps, if you look closely enough and discern still more subtle patterns, you can predict other things too.

Today the sorts of subtle and sophisticated astronomical calculation that would have taken the ancients decades of work and careful study can be done in fractions of a second by even the simplest computer. You can download a friendly Python package and, with just a modest amount of programming skill, predict eclipses or planetary positions into the distant future. If you’re not a coder, that’s OK; there’s a free astronomy website that’s just as good.
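To give a sense of how little work this takes now, here’s a minimal sketch using the Skyfield library. The particular ephemeris file, planet, and date are illustrative choices on my part, not the only options.

```python
# A minimal sketch of modern astronomical prediction with the Skyfield
# library (pip install skyfield). The ephemeris file and the date chosen
# here are illustrative assumptions.
from skyfield.api import load

ts = load.timescale()
eph = load('de421.bsp')  # JPL ephemeris covering roughly 1900-2050

earth, mars = eph['earth'], eph['mars']

# Where will Mars appear in Earth's sky on New Year's Day, 2045?
t = ts.utc(2045, 1, 1)
ra, dec, distance = earth.at(t).observe(mars).radec()

print(ra)        # right ascension
print(dec)       # declination
print(distance)  # distance from Earth
```

Longer-range ephemeris files (and Skyfield’s eclipse routines) push this much further, but even this handful of lines answers a question that once took generations of careful observation.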

The reason computers are so good at astronomical prediction is that stars and planets are quite mathematically simple, all told. In space, a smidgen of Newtonian Physics goes a long way. Orbits whirl on in the serene order of gravity and momentum. The small wobbles of the precession of the Earth’s axis and the proper motion of stars are easy enough to account for. The future can be known for a very long time indeed without even having to resort to Einstein’s more sophisticated model of gravity (and, once you account for that, you can know the astronomical future almost forever).

Here on Earth, however, things are not so predictable. Piloting spacecraft has been a task we could relegate almost entirely to automated systems, even when those automated systems were less capable than a graphing calculator, but getting a computer to drive a car on a human highway increasingly looks like nuclear fusion: always just a few more years away.

But while getting computers to understand and predict our messy, Earthbound world remains a work in progress, it is true that machines have gotten much better at this lately. Computer vision was once so bad that you could safely use a simple image recognition task as a CAPTCHA to prevent automated systems from buying up movie tickets or submitting online reviews. Now I can grab a free image recognition library from huggingface, cobble together a few lines of Python (mostly by modifying the sample code provided with the library) that ask it to scan an image for “bees” and get results like this, which show where it detected “bees” (just described exactly so, with one plain English word) in an image of my hive entrances:

It’s not perfect, of course: it misses the bee in the dark corner and conflates two bees near each other into one big mega-bee, but it’s pretty good! Especially since this is an image captured of bees just going about their day, wandering by the camera at various angles and speeds.
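For the curious, the “few lines of Python” look something like the sketch below. This is a reconstruction of the kind of script I describe, not the exact one: the particular checkpoint and file name are assumptions.

```python
# A rough sketch of the kind of script described above: zero-shot object
# detection via the Hugging Face transformers pipeline. The checkpoint
# (OWL-ViT) and the image file name are illustrative assumptions.
from transformers import pipeline
from PIL import Image

detector = pipeline("zero-shot-object-detection",
                    model="google/owlvit-base-patch32")

image = Image.open("hive_entrance.jpg")

# Ask for "bee" in plain English; no bee-specific training required.
results = detector(image, candidate_labels=["bee"])

for r in results:
    # Each result has a label, a confidence score, and a bounding box.
    print(r["label"], round(r["score"], 2), r["box"])
```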

The messiness of human language is also increasingly something machines can parse. My example above demonstrates a small piece of that, since the library I used was capable of connecting the English word “bee” to the bees in my image without having to be told what a “bee” was. Large Language Models, like the (infamous) ChatGPT, build on this, recognizing/predicting appropriate written responses to natural language prompts:

A lot of folks will immediately notice that the poem above is not terribly good, but that’s not the point. The point is that the language model was able to parse “educational rhyming poem about bees” and respond in an appropriate way.
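Reproducing that kind of demo programmatically takes about as much code as the bee detector. Here is a sketch using the OpenAI Python client; the model name and prompt wording are illustrative assumptions, not a record of how the poem above was actually generated.

```python
# A sketch of asking a large language model for a poem like the one above,
# via the OpenAI Python client. The model name is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": "Write an educational rhyming poem about bees."},
    ],
)

print(response.choices[0].message.content)
```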

Frankly, I think demonstrating that ability of the model to parse and respond to language is all LLM chatbots were ever meant to do. They are a technology demo, and the large number of attempts we see to use them as ersatz persons because they generate sort of human-like language are doomed to fail.

But that argument will have to wait for another post; for now, it’s that predictive ability that’s interesting. Just as the ancients found stars could predict the seasons, so too the deep learning/machine learning/artificial intelligence techniques responsible for both the image detection and language generation methods I demonstrate above find predictive patterns in more earthly sets of data. This is already happening. Researchers are using these techniques to discover new methods for treating cancer. Digital humanists are using them to explore the idea of suspense in literature (among other things).

But of course, not all predictions are meaningful. The predictions of astrology take the correct observation that stars predict seasons and draw from it the incorrect inference that, if we just learn EVEN MORE about the motions of stars and planets we can predict the events of our daily lives. So too, the attempts to use machine learning techniques to predict anything and everything often veer into futile attempts to make conclusions from insufficient data (as in the attempt to use quasi-magic AI weapons detection tech on the Philly SEPTA) or to use the “scientific” reasoning of AI to justify inhuman decisions made by people (as in the IDF’s use of the so-called LAVENDER system to target thousands of supposed combatants in Gaza).

It’s not necessarily going to be immediately clear in every case what’s a genuine, meaningful machine learning prediction, and what’s not. I would, as an initial instinct, suggest that meaningful predictions are more likely to be limited in scope and tied closely to well-formed and carefully constructed research questions. These are exactly the sorts of predictions that don’t always get press! The tech sphere loves “scale,” and it loves machines that promise to do everything.

Still, over the next few years, we’re going to have to work out how to distinguish between astronomy and astrology in AI.

Where You Come From Is Gone: Why Our Anti-human AI Moment Needs Donna Haraway

“Where you come from is gone, where you thought you were going to never was there, and where you are is no good unless you can get away from it.”

Flannery O’Connor, Wise Blood

“But basically machines were not self-moving, self-designing, autonomous. They could not achieve man’s dream, only mock it. They were not man, an author to himself, but only a caricature of that masculinist reproductive dream. To think they were otherwise was paranoid. Now we are not so sure.”

Donna Haraway, Cyborg Manifesto

There is a desire, in our current AI haunted moment, to defend “the human.” To engage with AI writing tools, image generation tools, or even deep learning methods of any kind, is criticized as potentially abandoning an essential humanity. It’s seen as giving up what makes us “really human,” namely the crafting of meaning that we share with other humans.

For example, during my podcast conversation with our teaching and learning center about generative AI and teaching, my co-guest, Professor Justin Rademaekers, worried about losing some of our humanity when we use AI to do the work of writing:

I think it’s important to be critical to ask ourselves what labor are we circumventing when we use A.I. to do writing. Is it labor that’s perfectly fine to circumvent or are we somehow stepping around an important part of being human? Right. And language exchange is, for me, I guess I’m biased. But that’s the heart of humanity, and being human.

Justin Rademaker, ODLI On Air “Generative A.I. in Teaching with Dr. Famiglietti & Dr. Rademaekers (Part 2)”

In another, higher profile podcast conversation with New York Times columnist Ezra Klein, novelist Adrian Tchaikovsky used the metaphor of Minecraft to suggest that the human crafting of fictional text and fictional worlds creates a human connection between author and reader that algorithmically generated content can’t replicate:

Minecraft uses procedurally-generated landscapes. […]And this is amazing. It’s just this whole world and no one else has ever seen this world. It’s only me and it’s incredible. […] at the same time, it’s kind of meaningless, because it is just being thrown […] together by an extremely sophisticated algorithm. But basically if you compare it to a world in a game that’s been crafted, there is a difference. And that world — the crafted world — will be a lot smaller, because you can’t just go on forever because obviously every inch of it has taken human work.

Adrian Tchaikovsky, The Ezra Klein Show, February 23 2023

Interestingly, this elevation of symbolic production as something essentially human is often paired with a hierarchy of that same symbolic production, with the very best “art” at the top, and other human symbol production somewhere below. For example, in an earlier moment in the podcast quoted above, Ezra Klein asserts:

What ChatGPT, what DALL-E-2, and what all the similar programs are able to do in terms of writing text and making images, they are able to make quite remarkable art, or stories, or essays. And in most cases, it will fall short of the best of what humans can do, but it can also go far beyond what most humans can do.

Ezra Klein, The Ezra Klein Show, February 23 2023 (Emphasis Mine)

While these are just podcast sources, the concerns expressed aren’t so far off the heavier debate of intention, meaning, and AI found in more scholarly venues.

I don’t disagree with the authors above that we want people to remain engaged with writing and thinking and writing as thinking. It’s the sense that we can define a “most human” activity and link it to symbol production that I want to push back against, and that I think Haraway helps us think past. The desire to set a boundary around “the human” and stabilize it is understandable, given the many forms of precarity that surround contemporary human existence. However, as I see these defenses of the human/inhuman border spring up in response to AI, I’m always reminded of this passage from Haraway’s Cyborg Manifesto (which seems in our current moment of both AI writing and the struggle for trans liberation more prescient than ever).

The relationships for forming wholes from parts, including those of polarity and hierarchical domination, are at issue in the cyborg world. Unlike the hopes of Frankenstein’s monster, the cyborg does not expect its father to save it through a restoration of the garden—that is, through the fabrication of a heterosexual mate, through its completion in a finished whole, a city and cosmos. The cyborg does not dream of community on the model of the organic family, this time without the oedipal project. The cyborg would not recognize the Garden of Eden; it is not made of mud and cannot dream of returning to dust.

Donna Haraway, Cyborg Manifesto

Here, Haraway warns us against any singular, essential definition of identity. She’s particularly interested in avoiding essentialist definitions of gender, of course, but given the “Manifesto’s” extended reflection on how the boundaries between human and machine and human and animal are broken down by (then) contemporary cybernetics and biological science, I think Haraway would be equally dubious of any singular definition of an essential human activity. Especially one that might be ranked, with some examples of symbol production (those deemed “art”) held up as “more human” than others.

Don’t misunderstand me. I’m not arguing that human beings and AI systems are in any way interchangeable. Like Haraway, I want us to resist the “informatics of domination” and imagine new cyborg futures. However, like Haraway, I think we must first let go of the comforting illusion that there is a clearly defined “human” that we can defend and return to.

The Coming Inversion

Right now, if you’re a college instructor using automated methods to check for AI generated plagiarism on your assignments, you’re mostly catching the sloppiest cheaters and letting more sophisticated ones through. What’s worse, very shortly you will probably be accusing honest students who engage with AI tools in ways they believe to be good faith, while missing intentional cheaters entirely. Here’s why.

For starters, a variety of research shows that automated detection of AI writing is relatively easy to spoof. One study, famous for finding that AI plagiarism detection algorithms were biased against “non-native English writers,” also found that merely asking ChatGPT to re-write its response with the prompt “Elevate the provided text by employing literary language” caused detection rates of AI generated text to fall from 70% to 3.3%. Another, more theoretical, investigation of automated methods for detecting AI generated writing notes that even sophisticated methods of detection may be defeated by automated paraphrasing tools. In particular, they find that even methods designed to defeat paraphrasing can be defeated by recursive paraphrasing. They conclude that “For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier.”

What does this mean, practically, for a college instructor in the classroom right now? It means the only plagiarists an automated detector can catch are the sloppiest kind. The ones who typed “write me a paper about Moby Dick” into ChatGPT and simply copy-pasted the results into a Word document. I would posit all of these students knew they were doing the wrong thing, and at least some may have made a hasty mistake after being pressed for time.

Meanwhile, more sophisticated and intentional cheaters can readily find methods designed to defeat detection. Automated paraphrasing (where a computer does a relatively primitive form of automatic word replacement) is a well known tool, and I saw plagiarists in my classes trying to use it to disguise material copy-pasted from Wikipedia or Google search results before ChatGPT was a thing (the ones I caught, alas, were probably the sloppy ones). Others may find “prompt engineering” methods designed to defeat detection on TikTok or elsewhere.

However, if we look down the road a few months (keeping in mind my adage about any utterance about what AI will be doing after about this afternoon), this situation gets even worse. Honest students will be likely to use generative AI in ways that may trigger automated AI writing detection. That’s because Apple, Google, and Microsoft continue to work on integrating generative AI into their everyday product lineups. The official integration of AI-based writing into tools like Microsoft Word and Google Docs isn’t 100% rolled out yet, but it’s already easy to access. This, for example, is the screen you see if you choose “Get Add Ons” in Google Docs right now:

Meanwhile on the homepage of widely used (and heavily advertised) computer-based grammar aid Grammarly, we can find the tool’s makers pitching their product by promising to provide “an AI writing partner that helps you find the words you need⁠—⁠to write that tricky email, to get your point across, to keep your work moving.”

I have little doubt that students, honest students, will avail themselves of these tools as they come online. When I talk to students about what they think of AI tools (as I did this week to begin my Intro to Research Writing class) and stress that I’m curious and just want to hear their honest thoughts, they tend to report being very impressed by the text the tools produce. Some of them know the tools may produce incorrect information (though many others conflate them with search engines, an idea I hope to disabuse them of), but they generally say that tools like ChatGPT are good at producing “professional” sounding language (even if it might be a little “robotic”) and at figuring out how to organize arguments “correctly.”

Some of this is doubtless due to students framing writing too heavily in rote classroom forms like the five-paragraph essay, habits my writing classes have always been designed to break and will now have to work doubly hard against. But I don’t think that’s all of it. My own experimentation with ChatGPT suggests it can be fairly nimble at emulating genre features.

Furthermore, my own lived experience with writing tools makes me think it’s not unreasonable that people might come to depend on help from the tool to achieve certain formal features in writing. I can hardly spell anything without auto-correct anymore. When I need to use a complex word I don’t use frequently, I often drop it into Google to get a dictionary definition (preventing me from, for example, confusing “venal” and “venial”).

So, we should expect text written by honest students to increasingly contain at least some AI generated language over the course of the next year or two. I don’t claim for a moment that this is an unalloyed good; there’s a real risk of people losing their sense of authentic voice and thought as that happens! That’s something I think we’ll need to address as teachers, as I’ll discuss in just a bit! However, given the vast commercial interest in making these tools available, and the real problems they may solve for students, I don’t think we can expect students not to engage with them to help them rephrase clunky language, sort out difficult arguments, or perform other writing tasks.

Students who intentionally want to cheat, meanwhile, will have access to ever-simpler methods of defeating instructors’ ability to automatically detect that they typed “computer write my essay” into a text box and used the result. Building a ChatGPT-based “tool” that would automatically apply some clever prompt engineering to inputs to try to obfuscate that the output was written by ChatGPT would be trivial to do. I could stand up something in an afternoon, and so could you with a bit of copy-pasting of publicly available code (or maybe get GPT to write the code for you!). More advanced techniques, using automated paraphrasing or perhaps fine-tuning a model on an existing database of student writing (to get around the fact that Turnitin’s detection methods probably hinge on detecting typical student writing as much as detecting AI writing), would be more involved to set up, but once set up and offered as a “service” under some plausible excuse, easy to use.
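To be concrete about just how low that bar is, here’s the kind of wrapper I mean. This is only a sketch, using the same pre-1.0 openai Python library as the scripts later in this post, and the restyling prompt is the one from the study cited above; a “service” would simply hide a call like this behind a text box.

import openai

def obfuscate(ai_text):
    # Sketch of a trivial detector-evasion wrapper: take AI-generated text and
    # ask the model to restyle it, a move the research above found sharply
    # lowers automated detection rates.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user",
             "content": "Elevate the provided text by employing literary language:\n\n" + ai_text},
        ],
    )
    return response["choices"][0]["message"]["content"]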

So, where does that leave us, as instructors? Back where we started, with John Warner’s exhortation to Put Learning at the Center. Leaving our teaching constant and trying to use automated tools to police our way out of the problem is doomed to fail. Worse, it’s doomed to accuse the honest and miss those trying to intentionally cheat. In doing so, it will only underline that we’re not teaching writing relevant to the writing environment our students find themselves in.

That, ultimately, is what we must do, if teaching writing is to survive at all: rebuild our curriculum to focus on the skills that won’t be going away just because ChatGPT can write a boilerplate essay. Skills like writing as thinking, information literacy, genre analysis, rhetorical awareness and more. These are skills we have been teaching for a long time, of course, but too often they have been buried under assignments designed to produce a familiar artifact our colleagues in other departments would recognize as “college writing.” They must be uncovered and moved to the center of what we do!

AI Genre Mashup Assignment

During the fall of 2023, I assigned an AI powered Genre Mashup assignment as part of my 100 level First-Year Writing class. The assignment strove to use the ability of ChatGPT to quickly emulate various textual genres as a way to help students notice the composition choices authors made when writing for one genre or another.

The Assignment

First: choose one of the scenarios or topics from the list below:

Scenario/Topic
An announcement warning of dangerous weather in the area
A parable story demonstrating good moral behavior
A description of the forces that lead up to the War of 1812
A report about a recent town council meeting
A request for a one week extension on a recent assignment
A scene where two star-crossed lovers meet for the very first time
A speech by the king of the elves, calling on good folk to defend his kingdom from orcs
A scene where a hardboiled detective confronts a femme fatale
A description of a calm and uplifting scene from nature
A scene where down and out computer hackers defeat an evil corporation

Next choose one of the genres from the list below. Try to choose a genre that matches the topic/scenario:

Style/Genre
Harlequin romance
Cyberpunk Science Fiction
High Fantasy
Noir Mystery
Sonnet
History Textbook
Newspaper Article
Public Service Announcement
Email to a Professor
Passage from the Bible

Then head over to ChatGPT and ask it to write your chosen topic/scenario in the chosen style. For example, you might ask it to “Write a speech by the king of the elves, calling on good folk to defend his kingdom from orcs in the style of high fantasy.” You would get output like this (DON’T STOP HERE, THERE ARE MORE STEPS):

Next go back to the table of genres and choose one that does NOT match the topic, so, to stick with my example I might choose “Public Service Announcement”

Now ask ChatGPT to write the same scenario/topic with this mismatched style. So I would ask “Write a speech by the king of the elves, calling on good folk to defend his kingdom from orcs in the style of a public service announcement.” And get output like this (THE MOST IMPORTANT STEP HAPPENS NEXT): 

Finally: copy and paste the output of BOTH ChatGPT prompts into a word processor (Word, G Docs, whatever) document. Below the pasted in content, write a short (200-400 word) reflection on the following: 

1) What did you notice about the techniques used by ChatGPT to emulate the requested genre? What did the software do to write something that “sounded” like High Fantasy, or a newspaper article, or Noir Mystery?

2) Do you think it captured the techniques typical to this genre well? Why or why not? 

3) How did the techniques used by ChatGPT to emulate each of the two genres you selected differ? What was different about how these two passages were written?

4) How does the mismatch between the selected scenario and genre show up in the second example you generated? What about this example might seem funny, weird, or just wrong, and why?
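If you want to pre-generate a few matched and mismatched pairs to demo in class, a short script along these lines will do it. This is a sketch, not part of the student-facing assignment; it assumes the same pre-1.0 openai library used in the experiments below, and the scenario and genre strings are just examples pulled from the lists above.

import openai

def generate(scenario, genre):
    # Ask ChatGPT to render the scenario in the requested genre, exactly as
    # students do by hand in the assignment above.
    prompt = "Write " + scenario + " in the style of " + genre + "."
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

scenario = ("a speech by the king of the elves, calling on good folk "
            "to defend his kingdom from orcs")
matched = generate(scenario, "high fantasy")
mismatched = generate(scenario, "a public service announcement")
print(matched + "\n\n---\n\n" + mismatched)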

50 Ways of Looking at the Same Prompt

Or maybe just one way of looking? Let’s see.

I decided to run a simple prompt based on one of my assignments through some LLMs multiple times and see what happened. In particular, I was interested in seeing what sorts of common patterns might emerge from the responses. In part, this was inspired by scoring student responses to this same assignment (which I permitted students to use ChatGPT to complete, so long as they acknowledged and documented their use) and noticing what seemed to be common patterns in the work submitted by students who had used ChatGPT.

What I Did

My method for this experiment was simple: I wrote a very basic Python script that submitted the same prompt to the ChatGPT model via the API 50 times and saved each response to a text file, like so:

import openai

def chat_response(state):
    # Send the accumulated message chain to the chat completions endpoint.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=state,
    )
    return response

for i in range(50):
    # Save each response to its own numbered text file.
    text_file = open("GPT_Onion/4article" + str(i) + ".txt", "w", encoding="utf-8")
    GPT_instructions = "Write an article for The Onion about something relevant to West Chester University students. This article must begin with a headline appropriate to an Onion article and be 200-300 words long"
    message_chain = [
        {"role": "system", "content": "You are a helpful assistant"},
    ]
    message_chain.append({"role": "user", "content": GPT_instructions})
    response = chat_response(message_chain)
    text_file.write(response['choices'][0]['message']['content'] + "\n")
    text_file.close()

The prompt was taken more or less verbatim from my assignment.

Why 50 responses? Because my first attempt tried to generate 100 responses and timed out halfway through! But seriously, I have no idea what a representative sample of this sort of output would be. If I was looking for patterns in a corpus of a thousand student responses, or ten thousand, or a million, there are statistical techniques that would let me choose a good representative sample (don’t ask me what these are off the top of my head, I just know they exist and I could find them if I needed them). But how big is the “sample” of latent LLM responses? How many responses could the machine generate? How do I know if the patterns I am seeing are representative of how the machine behaves or just a fluke random run of something I happened to stumble upon?

¯\_(ツ)_/¯

I was able to make 50 easily, and read through 50 in a reasonable span of time. There are a couple of patterns that seem interesting, even at this small sample size. The others are worth thinking about in a rough-and-ready way but aren’t conclusive.

I read through the 50 articles generated by ChatGPT and coded them for main topics. I also noted examples where the response returned seemed very similar to a prior response, and I noted what named people were in each response.

I then repeated the generation step using GPT-4 and quickly skimmed those responses for main topics and named people.

This experiment cost me $1.70, the vast majority of that being the $1.43 I spent on GPT-4 API responses.

What Was in the Articles ChatGPT and GPT4 Wrote

The outputs from both ChatGPT and GPT4 seemed to show some repeated patterns in the content they produced. The content produced by GPT4, however, seemed somewhat less repetitive in terms of strict form, with repetition more frequently happening on a thematic level.

Just for fun, I went back to that old DH standby, the word cloud, and visualized the output I got from both LLMs. Here’s the result from the ChatGPT articles:

As you can see from the word cloud above, ChatGPT seems to have a very particular idea of what a “common” surname for a student/faculty member in the US looks like. In addition to “Thompson” it liked “Stevenson,” “Johnson,” and “Watson.” In fact, of the 48 named people in the sample, 36 had some sort of “-son” surname.
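As an aside, a cloud like this takes only a few lines to produce from the saved article files. Here’s a minimal sketch using the open-source wordcloud package (my actual plotting code may have differed):

from wordcloud import WordCloud

# Concatenate the 50 saved ChatGPT articles and render a single word cloud.
text = ""
for i in range(50):
    with open("GPT_Onion/4article" + str(i) + ".txt", encoding="utf-8") as f:
        text += f.read() + "\n"

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
cloud.to_file("GPT_Onion/chatgpt_cloud.png")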

The presence of the word “Time” in the word cloud probably reflects the frequent use of time travel as a comedic trope in the generated articles. Seven of fifty (14%) invoked the time-travel theme, according to my hand count. Twelve articles of fifty (24%) invoked the idea of “discovery” (also present on the word cloud), in which students either “discover” something obvious about campus (for example, an article ChatGPT titled “West Chester University Students Shocked to Discover Library Contains Actual Books”) or something unexpected (“West Chester University Student Discovers Multiverse in Local Laundromat Dryer”).

Not present on the word cloud is the theme of student laziness, which appeared in sixteen of the fifty ChatGPT articles (32%), by my count. This somewhat abstract theme was rarely explicitly invoked, but clearly informs the humor of articles like “West Chester University Students Discover Time Travel Portal in Campus Library, Use it to Attend Classes From Their Dorm Rooms,” “West Chester University Students Request Permission to Skip All Classes and Still Graduate on Time,” and “West Chester University Student Discovers How to Freeze Time Between Classes, Uses Extra Time to Binge-watch Netflix Series.” (That first example is the trifecta: discovery, time travel, and laziness.)

At least four of the ChatGPT-generated articles were almost exact duplicates of one another, with extremely similar headlines and content. For example: “West Chester University Student Discovers Time Travel, Uses Ability to Attend Zero 8 a.m. Classes” and “West Chester University Student Discovers Time Travel, Uses It to Avoid 8 a.m. Classes.” In addition to the similar titles, they begin with similar opening sentences. The first opens:

West Chester, PA – In a groundbreaking development that has professors baffled and the administration scrambling, West Chester University student Derek Thompson has reportedly unlocked the secret to time travel, enabling him to avoid the dreaded early morning classes that plague his peers.

And the second begins:

West Chester, PA—In a groundbreaking discovery that has left the scientific community and West Chester University faculty scratching their heads, local student Max Thompson reportedly stumbled upon the secret of time travel—solely for the purpose of avoiding those dreaded 8 a.m. classes.

They then proceed with roughly equivalent paragraphs, similar quotes, etc. If these had been turned in by two students independently, I would have assumed plagiarism, either from each other or a common source.

GPT4, in contrast, was not nearly so formulaic. Here’s the word cloud!

It did not, for example, name every character “Thompson.” However, as the cloud above suggests, it did have an inordinate fondness for squirrels. Fourteen of fifty articles (28%) were about squirrels in some capacity (“WCU Squirrel Population Demands Representation in Student Government,” “Climate Crisis Hits West Chester University: Squirrels Reportedly Hoarding Cool Ranch Doritos,” “Local Squirrel Ascends to Presidency of West Chester University”). Many of these focused on the idea of a squirrel attaining a leadership position on campus.

The themes of discovery and student laziness were less prominent in this sample, but still present, with ten and seven examples respectively. GPT4 also wrote several (six) articles that satirized the high cost of college, a topic ChatGPT hadn’t engaged with. One, entitled “West Chester University Declares Majors Irrelevant; Students Now Just Majoring in Debt” was particularly cutting. It imagines the university president (correctly identified by GPT4) “explaining, ‘We figured, why not prep our students for the most reliable outcome of their academic journey? Crushing financial burden.'” and “The notorious ‘Rammy’ statue was promptly replaced with a huge cement dollar sign, and the radio station WCUR’s playlist was updated with “Bills, Bills, Bills” by Destiny’s Child on a loop.”

There were no near duplicate articles in this sample. While two articles had almost identical headlines (“West Chester University Debuts New Major: Advanced Procrastination Studies” and “West Chester University Announces Revolutionary New Major: Advanced Procrastination Studies”) the underlying articles treated the theme presented in the title quite differently.

Using ChatGPT to Analyze ChatGPT

After hand-tagging themes in the articles generated, I wrote a script that fed the articles back into ChatGPT and asked it to extract titles, themes, and named people. I was curious to see how well the software would do at this analytic task.

import time

import openai

def chat_response(state):
    # Send the accumulated message chain to the chat completions endpoint.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=state,
    )
    return response

outfile = open("GPT_Onion/4topics.csv", "a", encoding="utf-8")

for i in range(50):
    # Read back each generated article and ask the model to extract
    # its headline, topics, and named people.
    text_file = open("GPT_Onion/4article" + str(i) + ".txt", "r", encoding="utf-8")
    GPT_instructions = text_file.read()
    message_chain = [
        {"role": "system", "content": "You are a helpful agent that extracts headlines, main topics, and named people from short articles you are given. For each text you read, return the following separated by semi-colons: 1) the article's headline 2) a list of one to three word topics that describes the main themes of the article separated by commas 3)a list of named people found in the article, separated by commas. Only return this semi-colon separated list and nothing else. Base your response only on the text you are given."},
    ]
    message_chain.append({"role": "user", "content": GPT_instructions})
    response = chat_response(message_chain)
    outfile.write(str(i) + ";" + response['choices'][0]['message']['content'] + "\n")
    text_file.close()
    print(i)
    time.sleep(2)
outfile.close()

The results here were really interesting. ChatGPT did a perfect job extracting titles (which were consistently marked) and also named people. I actually used its extracted names to compute the percentage of “-son” surnames above. Finding and extracting “named people” is a non-trivial data analysis task, and it absolutely nailed it on the first try with a very simple prompt. No hallucinations were observed in this run of 50 examples.

The topics extracted were less satisfying, but they weren’t inaccurate. It often picked up on the theme of “discovery” which I tagged, but not always. For example, it listed only “library, students, books” as topics for “West Chester University Students Shocked to Discover Library Contains Actual Books.” It never listed “laziness” as a topic. However, this more abstract topic was only really visible, even to me, after comparing multiple articles, and I wasn’t able to have ChatGPT consider all the articles at once without running out of context window.
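As one concrete example of putting the extracted data to work, here’s roughly how the “-son” surname count above can be computed from the semicolon-separated file the script writes. A sketch, assuming the model stuck to the requested index;headline;topics;names format:

# Count "-son" surnames among the named people ChatGPT extracted.
son_count = 0
total = 0
with open("GPT_Onion/4topics.csv", encoding="utf-8") as f:
    for line in f:
        fields = line.strip().split(";")
        if len(fields) < 4:
            continue  # skip lines the model didn't format as requested
        names = [n.strip() for n in fields[3].split(",") if n.strip()]
        for name in names:
            total += 1
            if name.split()[-1].lower().endswith("son"):
                son_count += 1

print(son_count, "of", total, "named people have a '-son' surname")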

What Does it All Mean?

Here’s the TL;DR:

Basically, it looks like multiple responses to a common prompt converge around common themes, both for ChatGPT and GPT4. Probably a little basic prompt engineering, perhaps even automated with mad-lib style prompt modifiers, would shake that up a bit; that’s something I want to test.
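What I have in mind is something as simple as the sketch below: randomly bolting a modifier onto the base prompt before each API call. The modifier list here is purely illustrative; I haven’t tested these.

import random

base_prompt = ("Write an article for The Onion about something relevant to "
               "West Chester University students. This article must begin with a "
               "headline appropriate to an Onion article and be 200-300 words long")

# Mad-lib style modifiers, appended at random to nudge each generation away
# from the model's default themes (time travel, laziness, squirrels...).
modifiers = [
    "Do not write about time travel or student laziness.",
    "The central joke should involve a specific academic department.",
    "Write the article in the voice of a bewildered parent.",
    "The article should be about campus dining.",
]

def modified_prompt():
    return base_prompt + " " + random.choice(modifiers)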

From a pragmatic, teaching point of view, generating a sample of 20-50 responses to a prompt from ChatGPT or GPT4 to see what themes it reaches for might be informative. Not that those themes would diagnose plagiarism all that conclusively, since the themes LLMs use are likely not so unlike those students might use (though in prior runs of this prompt, students were much less likely to write parodies of student laziness than the GPTs were). However, it might give you a sense of the “AI common sense” that you might then want to engage with, push back against, complicate, or push past.

From the point of view of understanding machine writing, it’s interesting to see the recurrence of themes, ideas, terms, and sometimes even whole structures in the responses generated. I’ll probably run off some further examples, especially in GPT4, to see if I get more “collisions” where the LLM generates very similar responses to the same prompt.

From the perspective of trying to understand where LLMs go next, I think the contrast between the somewhat formulaic (and rarely funny) “Onion articles” generated by the LLM and its huge success doing content processing work (like identifying named people and topics) is informative. I continue to think that LLMs’ ability to process text will be more important than their ability to compose it in the long run.

Let’s Explore the Latent Space with Presidents

Ok, so, I had the AI image generator Stable Diffusion XL generate hundreds of “selfies” of US Presidents. Let me explain.

But, before I even start on that, let me state that I don’t intend this as any sort of endorsement of AI image generators as a technique. I understand how problematic they are for artists. My goal here is to understand the tool, not to celebrate it (though I do sometimes find its glitchy output quite pleasing). One reason I chose US Presidents for this project is that, as public figures of the US government, at least the figures I’ll be representing here are already somewhat “public domain.”

Richard Milhous Nixon snaps a selfie with the little known 1970 Samsung Galaxy Mini

So, we know that image generators are able to do a fair amount of remix work, translating subjects from one style into another; that’s how you make something like Jodorowsky’s Tron. I was curious to learn more about this process of translation. How well, and how reliably, could an image generator take a subject that never appeared in a given genre and represent that subject in that genre? How would it respond when asked to represent a subject in an anachronistic genre? Would it matter if the subject asked for had many different representations in the training data or just a few? Which genre conventions would the system reach for to communicate the requested genre?

I also wanted to get beyond cherry picking single images and get a slightly larger sample of images I could use to start to get a sense of trends. I was less interested in what one could, with effort and repetition, get the tool to do, and more in what its affordances were. What it would tend to encourage or favor “by default,” as it were.

So I decided to take a stab at making a run of many images using the recent XL version of the popular Stable Diffusion AI Image generator, mostly because it’s something I can download and run locally on my own machine, and because it’s incorporated into the Huggingface Diffusers library, which makes scripting with it easy enough for… well, an English Professor!

I decided to use US Presidents as subjects for the series because they are a set of fairly well-known, well-documented people spanning 230-odd years of history. That meant I could pick a recent image genre and guess that most of them would not be represented in this genre in the training data (it’s not impossible some human artist’s take on “Abraham Lincoln taking a selfie” is in the data set, but “Franklin Pierce taking a selfie?” I doubt it). The system would have to translate them into it. At the same time, some Presidents have vastly more visual culture devoted to them than others, both because of relative fame and because recent presidents live in an era with more visual recording. I was curious to know if I could learn anything about how this difference in training data might influence the results I got from the generator. Would it be more adept at translating subjects it had more visual data about?

Also, the logic of “I’m looking for my keys here where the light is better” applies. A list of US presidents was easy to find online and drop into a CSV file for scripting.

I went with the “selfie” genre because we know it’s one that image generators can do fairly well. There have already been some great critiques of how image generators apply the cultural conventions of the “selfie” genre in anachronistic and culturally inappropriate ways. I was curious to see how the “selfie smile” and other selfie genre conventions might pop up in anachronistic images, and to look for patterns in how these genre conventions appeared.

A rough simulacrum of Dwight D. Eisenhower extends his arm to take a big smiling selfie…

So I ran off a series of 10 selfies each of all 44 unique presidents (sorry, Grover Cleveland, no double dipping) using the prompt “A selfie taken by [President’s Name].” I also asked for “A portrait of [President’s Name]” using the same random seed, to see how that compared. I also asked for “An official photograph of [President’s Name] descending the stairs of Air Force One,” but that prompt mostly revealed that Stable Diffusion rather struggles to represent aircraft.

The fact that that isn’t a very good representation of Woodrow Wilson is the LEAST of this image’s problems.
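For anyone who wants to reproduce a run like this, here’s a minimal sketch of the Diffusers code involved. I’m assuming the public stabilityai/stable-diffusion-xl-base-1.0 checkpoint and a CUDA GPU; my actual script read the full list of presidents from a CSV and also generated the “portrait” and “Air Force One” variants.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

presidents = ["George Washington", "Franklin Pierce"]  # in practice, all 44 from a CSV

for name in presidents:
    for i in range(10):
        # Fixing the seed here means the "portrait" prompt can later be run on
        # the same starting noise, making the two genres easier to compare.
        generator = torch.Generator("cuda").manual_seed(i)
        image = pipe(prompt="A selfie taken by " + name, generator=generator).images[0]
        image.save(name.replace(" ", "_") + "_selfie_" + str(i) + ".png")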

I’ve taken a look through the results, and while my sample size is still very small, I think I see some things I’d like to count up and look for trends in. I think I’ll do this slowly, one president a day for the next few months, and post what I see in each example on Bluesky/Mastodon as I go. In particular, I’m curious about a couple of trends I think I notice in the images.

First, I’m curious about how the media forms that Stable Diffusion associates with “selfie” seem to change over time. For example, for the first few US presidents, the usual result for “selfie” looks like a painting (with the exception of a few odd, photorealistic, hypermodern breakthroughs).

(Left: Typical painting style Washington “selfie” Right: Washington cosplay uncanny valley horror thing)

However, by the time you get to John Quincy Adams and Andrew Jackson, the “selfies” appear frequently as if they were early photographs (perhaps daguerreotypes) rather than paintings, while the “portraits” remain paintings. This despite the fact that (so far as I can tell from a bit of googling) only a handful of photographs were taken of either man, and those were taken very late in life.

A faux-photograph of Andrew Jackson.

Also, note the simulated wear at the corners of that image. There seems to be a lot of that in the various “selfies”: simulated wear and cracks, simulated tape holding them to simulated albums. The “portraits,” in contrast, tend to have frames. I’m curious to see if there are trends there. Does the machine simulate “age” in the images of older subjects, even when asked to simulate an anachronistic genre? It doesn’t always (see Washington above), so is there a trend in how frequently that happens?

Second, I’m curious to see how the anachronistic genre conventions of the selfie are applied across time. So, while fans of Calvin “Silent Cal” Coolidge will be thankful to see he has NOT been rendered with a “selfie smile”…

Sedate Coolidge is sedate

… some elements of “selfie style,” sometimes mashed up with period media artifacts, do break through, as in this image where Woodrow Wilson’s arm extends to the corner of the image frame, holding up a small, light, smartphone-sized camera that inexplicably also shoots black and white film with a noticeable grain and a depth of field measured in microns.

Daguerro-phone?

Or this one, where a phone is mashed up with period camera hardware to make some kind of dieselpunk accessory for a Harry Truman perhaps being played by Eugene Levy:

Honestly, a phone with that much lens might be cool…

At first glance it seems like these style moves become more common the closer you get to the present, even though they don’t really make sense until 2007 or so.

So, those are my first pass instincts. Going to take a closer look at each and do a count, see what I can see. Stay tuned on Bluesky and Mastodon.

Writing, AI, and Mortality

In an interview with the New York Times this morning, Joyce Carol Oates suggests that the written word provides a form of immortality, one worth making sacrifices in the moment to achieve:

People are seduced by the beauty of the close-at-hand, and they don’t have the discipline or the predilection or the talent, maybe, to say: “I’m not going to go out tonight. I’m not going to waste my time on Twitter. I’m going to have five hours and work on my novel.” If you did that every day, you’d have a novel. Many people say, “I’m going to pet my cat” or “I’m with my children.” There’s lots of reasons that people have for not doing things. Then the cats are gone, the children move away, the marriage breaks up or somebody dies, and you’re sort of there, like, “I don’t have anything.” A lot of things that had meaning are gone, and you have to start anew. But if you read Ovid’s “Metamorphoses,” Ovid writes about how, if you’re reading this, I’m immortal.

It is this sense that, by writing things down, we might achieve for our memories and minds the kind of immortality offered to our bodies by our genes, that perhaps so closely ties the written word to our sense of identity.

This identity connection, then, may also be one of the things that makes us so apprehensive about machines that can write. If my meaning, my memory, is difficult to distinguish in written form from symbols inscribed by a thoughtless computer process, then how will anything of my being survive in writing?

Of course, for writing to be in any sense alive, it must have a reader. Otherwise it’s just dead marks on a page. The reader, though, has to reconstruct meaning for themselves, and in a sense they always do it wrong. All meaning making is a form of translation, and while that doesn’t mean the author’s meaning must always be completely erased (good translations exist), it also means the author’s meaning is never fully revived. Perhaps that is what Barthes meant by the “death of the author.” Ovid is wrong. The reader revives something, but Ovid stays dead.

However, there is an even more dire argument against the notion that writing might help us overcome the horrific ephemeral nature of existence and transcend time and mortality. Namely, most writing is as ephemeral as anything else. It may find a reader or two in the moment it is produced, but then it fades into obscurity and is never read again. Ovid is, in terms of written work, the WWII bomber returning to base with bullet holes showing all the places an epic poem can be shot through by time and still survive. The very, very rare author who spans millennia. Ovid had many contemporaries, some may have even been stars in their day. They are as gone now as anything else.

How long will Joyce Carol Oates last? Who knows. Possibly a very long time! But, she has already done better than the vast majority of her peers. If the internet has taught us anything, it has taught us that there are more people in the world eager to write than there are people to read all the words those eager authors would produce.

So then, let us let go of the notion that writing is immortality, and along with it our desire to have our Authorial Intent recovered in some future time. Let us not worry about AI washing away the words that would have let us live forever. They were always already scrawled on a beach at the edge of the surf. They were going to be washed away, like the rest of us. Make peace with that.

If you want to transcend the measly portion that is our little human lifespan and touch generations to come, let me suggest another approach. Plant a long-lived fruit or nut tree. In the northeast US, where I am, apples and walnuts are good choices; both will run for centuries. A hickory will be around for a very long time, if you want to be a bit less mainstream. If you are lucky enough to live where olives will grow, one of those will last millennia. You could be more immortal than Ovid with an olive, if everything breaks your way.

Pens down. Get planting.
