The Coming Inversion

Right now, if you’re a college instructor using automated methods to check for AI-generated plagiarism on your assignments, you’re mostly catching the sloppiest cheaters and letting more sophisticated ones through. What’s worse, very shortly you will probably be accusing honest students who engage with AI tools in what they believe to be good faith, while missing intentional cheaters entirely. Here’s why.

For starters, a variety of research shows that automated detection of AI writing is relatively easy to spoof. One study, famous for finding that AI plagiarism detection algorithms were biased against “non-native English writers,” also found that merely asking ChatGPT to rewrite its response with the prompt “Elevate the provided text by employing literary language” caused detection rates of AI-generated text to fall from 70% to 3.3%. Another, more theoretical, investigation of automated methods for detecting AI-generated writing notes that even sophisticated methods of detection may be defeated by automated paraphrasing tools. In particular, its authors find that even methods designed to withstand paraphrasing can be defeated by recursive paraphrasing. They conclude that “For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier.”

What does this mean, practically, for a college instructor in the classroom right now? It means the only plagiarists an automated detector can catch are the sloppiest kind: the ones who typed “write me a paper about Moby Dick” into ChatGPT and simply copy-pasted the results into a Word document. I would posit that all of these students knew they were doing the wrong thing, and that at least some may have made a hasty mistake after being pressed for time.

Meanwhile, more sophisticated and intentional cheaters can readily find methods designed to defeat detection. Automated paraphrasing (where a computer does a relatively primitive form of automatic word replacement) is a well-known tool, and even before ChatGPT was a thing I saw plagiarists in my classes trying to use it to disguise material copy-pasted from Wikipedia or Google search results (the ones I caught, alas, were probably the sloppy ones). Others may find, on TikTok or elsewhere, “prompt engineering” methods designed to defeat detection.

However, if we look down the road a few months (keeping in mind my adage about any utterance about what AI will be doing after about this afternoon), this situation gets even worse. Honest students will likely use generative AI in ways that may trigger automated AI writing detection. That’s because Apple, Google, and Microsoft continue to work on integrating generative AI into their everyday product lineups. The official integration of AI-based writing into tools like Microsoft Word and Google Docs isn’t 100% rolled out yet, but it’s already easy to access. This, for example, is the screen you see if you choose “Get Add Ons” in Google Docs right now:

Meanwhile on the homepage of widely used (and heavily advertised) computer-based grammar aid Grammarly, we can find the tool’s makers pitching their product by promising to provide “an AI writing partner that helps you find the words you need⁠—⁠to write that tricky email, to get your point across, to keep your work moving.”

I have little doubt that students, honest students, will avail themselves of these tools as they come online. When I talk to students about what they think of AI tools (as I did this week to begin my Intro to Research Writing class) and I stress that I’m curious and just want to hear their honest thoughts, they tend to report being very impressed by the text that the tools produce. Some of them know the tools may produce incorrect information (though many others conflate them with search engines, a notion I hope to disabuse them of), but they generally say that tools like ChatGPT are good at producing “professional” sounding language (even if it might be a little “robotic”) and at working out how to organize arguments “correctly.”

Some of this is doubtless due to students framing writing too heavily in rote, classroom forms like the five-paragraph essay, a habit my writing classes were always designed to break and will now work doubly hard to break. But I don’t think that’s all of it. My own experimentation with ChatGPT suggests it can be fairly nimble at emulating genre features.

Furthermore, my own lived experience with writing tools makes me think it’s not unreasonable that people might come to depend on help from the tool to achieve certain formal features in writing. I can hardly spell anything without auto-correct anymore. When I need to use a complex word I don’t use frequently, I often drop it into Google to get a dictionary definition (preventing me from, for example, confusing “venal” and “venial”).

So, we should expect text written by honest students to increasingly contain at least some AI-generated language over the course of the next year or two. I don’t claim for a moment this is an unalloyed good. There’s a real risk of people losing their sense of authentic voice and thought as that happens! That’s something I think we’ll need to address as teachers, as I’ll discuss in just a bit! However, given the vast commercial interest in making these tools available, and the real problems they may solve for students, I don’t think we can expect students not to engage with them to help them rephrase clunky language, sort out difficult arguments, or perform other writing tasks.

Students who intentionally want to cheat, meanwhile, will have access to ever-simpler methods for keeping instructors from automatically detecting that they typed “computer write my essay” into a text box and used the result. Building a ChatGPT-based “tool” that automatically applies some clever prompt engineering to inputs, to obscure the fact that the output was written by ChatGPT, would be trivial. I could stand up something in an afternoon, and so could you with a bit of copy-pasting of publicly available code (or maybe by getting GPT to write the code for you!). More advanced techniques, such as automated paraphrasing or fine-tuning a model on an existing database of student writing (to get around the fact that Turnitin’s detection methods probably hinge on detecting typical student writing as much as detecting AI writing), would be more involved to set up but, once set up and offered as a “service” under some plausible excuse, easy to use.

So, where does that leave us, as instructors? Back where we started, with John Warner’s exhortation to Put Learning at the Center. Leaving our teaching constant and trying to use automated tools to police our way out of the problem is doomed to fail. Worse, it’s doomed to accuse the honest and miss those trying to intentionally cheat. In doing so, it will only underline that we’re not teaching writing relevant to the writing environment our students find themselves in.

That, ultimately, is what we must do, if teaching writing is to survive at all: rebuild our curriculum to focus on the skills that won’t be going away just because ChatGPT can write a boilerplate essay. Skills like writing as thinking, information literacy, genre analysis, rhetorical awareness and more. These are skills we have been teaching for a long time, of course, but too often they have been buried under assignments designed to produce a familiar artifact our colleagues in other departments would recognize as “college writing.” They must be uncovered and moved to the center of what we do!

AI Genre Mashup Assignment

During the fall of 2023, I assigned an AI-powered Genre Mashup assignment as part of my 100-level First-Year Writing class. The assignment used ChatGPT’s ability to quickly emulate various textual genres to help students notice the composition choices authors make when writing for one genre or another.

The Assignment

First: choose one of the scenarios or topics from the list below:

An announcement warning of dangerous weather in the area
A parable story demonstrating good moral behavior
A description of the forces that led up to the War of 1812
A report about a recent town council meeting
A request for a one week extension on a recent assignment
A scene where two star-crossed lovers meet for the very first time
A speech by the king of the elves, calling on good folk to defend his kingdom from orcs
A scene where a hardboiled detective confronts a femme fatale
A description of a calm and uplifting scene from nature
A scene where down-and-out computer hackers defeat an evil corporation

Next, choose one of the genres from the list below. Try to choose a genre that matches the topic/scenario:

Harlequin Romance
Cyberpunk Science Fiction
High Fantasy
Noir Mystery
History Textbook
Newspaper Article
Public Service Announcement
Email to a Professor
Passage from the Bible

Then head over to ChatGPT and ask it to write your chosen topic/scenario in the chosen style. For example, you might ask it to “Write a speech by the king of the elves, calling on good folk to defend his kingdom from orcs in the style of high fantasy.” You would get output like this (DON’T STOP HERE, THERE ARE MORE STEPS):

Next, go back to the list of genres and choose one that does NOT match the topic. To stick with my example, I might choose “Public Service Announcement.”

Now ask ChatGPT to write the same scenario/topic in this mismatched style. So I would ask, “Write a speech by the king of the elves, calling on good folk to defend his kingdom from orcs in the style of a public service announcement,” and get output like this (THE MOST IMPORTANT STEP HAPPENS NEXT):

Finally: copy and paste the output of BOTH ChatGPT prompts into a word processor (Word, G Docs, whatever) document. Below the pasted-in content, write a short (200-400 word) reflection on the following:

1) What did you notice about the techniques used by ChatGPT to emulate the requested genre? What did the software do to write something that “sounded” like High Fantasy, or a Newspaper Article, or Noir Mystery?

2) Do you think it captured the techniques typical to this genre well? Why or why not? 

3) How did the techniques used by ChatGPT to emulate each of the two genres you selected differ? What was different about how these two passages were written?

4) How does the mismatch between the selected scenario and genre show up in the second example you generated? What about this example might seem funny, weird, or just wrong, and why?
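
For instructors who would rather hand out ready-made prompt pairs than have students compose them, the two prompting steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names build_prompt and mashup_prompts are hypothetical, the sentence pattern is copied from the example prompts in the assignment, and nothing here calls ChatGPT itself. Students would still paste the resulting text into the chat window.

```python
# Illustrative sketch only: build_prompt and mashup_prompts are hypothetical
# helper names, not part of the original assignment.


def build_prompt(scenario: str, genre_phrase: str) -> str:
    """Combine a scenario and a genre phrase into one ChatGPT prompt."""
    return f"Write {scenario} in the style of {genre_phrase}."


def mashup_prompts(scenario: str, matched: str, mismatched: str) -> tuple[str, str]:
    """Return the (matched, mismatched) prompt pair for one scenario."""
    # The assignment depends on the second genre NOT fitting the scenario,
    # so at minimum the two genres must be different.
    if matched.lower() == mismatched.lower():
        raise ValueError("The second genre must differ from the first.")
    return build_prompt(scenario, matched), build_prompt(scenario, mismatched)


# The worked example from the assignment text.
scenario = ("a speech by the king of the elves, calling on good folk "
            "to defend his kingdom from orcs")
first, second = mashup_prompts(
    scenario, "high fantasy", "a public service announcement"
)
print(first)
print(second)
```

The code only checks that the two genres differ. Judging whether a genre truly matches or mismatches a scenario is left to the student, since noticing that fit is the point of the exercise.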