We have, right now, machines that could probably pass the fabled Turing Test, but we’ve hard-wired them explicitly to fail.
What I mean by this is not that I believe, as a now-fired Google engineer believed, that Large Language Models or other related machine learning systems are capable of self-awareness or thought. Instead, I merely mean to suggest that these systems are capable of making a passable response to one of our culture’s long-standing proxies for self-awareness/thought/sentience/call it what you will. That means that, if we aren’t going to accept these systems as sentient (and there’s good reason not to), we’re going to have to find another proxy. I’m not, personally, sure where we move the goalposts to.
One suggestive piece of evidence that the Turing Rubicon has been crossed is the story of that poor Google LaMDA engineer. They knew as well as anyone that they were dealing with software, yet they were still so convinced of the system’s self-awareness that they decided to make the career-ending move of going public. This doesn’t prove sentience, but it does suggest a very compelling linguistic performance of sentience.
Here’s another suggestive little interaction I had with good ol’ ChatGPT. In “The Imitation Game,” Turing suggests a series of questions one might ask an unknown interlocutor on the other end of a (then state-of-the-art) teletype terminal as part of his famous test. I don’t imagine he meant them as more than an illustrative example of what a test might look like, but they seem like as good a place to start as any:
Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
From Turing’s “The Imitation Game”
As you can see in my screenshot above, ChatGPT does not demur when asked to write a sonnet about the Forth Bridge; rather, it promptly obliges. It also solves the chess problem in roughly the same way, but only after explaining that “As a language model, I do not have the ability to play chess or any other games.”
Turing then goes on to suggest that the kind of discussion used in oral examinations serves as a kind of already-existing example of how we test whether a student “really understands something” or has “learnt it parrot fashion.” He gives this example:
Interrogator: In the first line of your sonnet which reads “Shall I compare thee to a summer’s day,” would not “a spring day” do as well or better?
Witness: It wouldn’t scan.
Interrogator: How about “a winter’s day,” That would scan all right.
Witness: Yes, but nobody wants to be compared to a winter’s day.
Interrogator: Would you say Mr. Pickwick reminded you of Christmas?
Witness: In a way.
Interrogator: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.
Witness: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.
From Turing’s “The Imitation Game”
If I ask ChatGPT some follow-up questions about its sonnet (adjusted to match the content of what it actually wrote), here’s how it replies:
Strike the hard-wired disclaimer “I AM A LANGUAGE MODEL” at the start of those answers, and those are some reasonable responses! Honestly, I don’t know the rules of sonnets well enough to say, off the top of my head, whether the arguments based on those rules are accurate or BS.
Now, as I said before, I don’t think this is evidence of any kind of sentience or self-awareness. For one thing, just as ChatGPT helpfully tells us, it is a basically static model. It learned our language during its training loop, and the ChatGPT version has some kind of short-term memory that lets it adapt to an ongoing conversation, but the underlying model is basically static: not an ongoing process of thought, but a sort of frozen map of symbolic connections.
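In case a sketch helps make that concrete, here’s a tiny toy illustration in Python. To be clear, this is my own simplification and has nothing to do with OpenAI’s actual code; the point is just that the “memory” inside a conversation can live entirely in a transcript that gets re-fed, each turn, to a model whose weights never change.

```python
# Toy sketch (my own illustration, not OpenAI's actual architecture or code).
# The "short-term memory" of a chat session can be nothing more than
# re-feeding the whole transcript to a frozen model each turn; the model's
# weights never change while you talk to it.

def frozen_model(prompt: str) -> str:
    # Stand-in for the trained network: fixed at chat time, no learning here.
    return f"[a reply conditioned on {len(prompt)} characters of context]"

def chat_turn(history: list[str], user_message: str) -> str:
    history.append(f"User: {user_message}")
    # All the "memory" lives in this growing transcript, not in the model.
    reply = frozen_model("\n".join(history))
    history.append(f"Assistant: {reply}")
    return reply

conversation: list[str] = []
print(chat_turn(conversation, "Write me a sonnet about the Forth Bridge."))
print(chat_turn(conversation, "Would 'a spring day' do as well in the first line?"))
```

Nothing in that loop ever teaches the model anything new; delete the transcript and the “memory” is gone.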
It should be emphasized, however, that the underlying “model” is not just a memorization of sources. What the model “learns” is stored in a matrix of information that’s informed by many uses of symbols but does not reduce to any one symbolic expression. That’s not a signifier like you or I have, but it is something roughly analogous to one. (If you want a stronger, but still layperson-friendly, explanation of that, check out Geoffrey Hinton’s talk with Brooke Gladstone for On The Media a few days back.)
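If you want a (very) rough picture of what “stored in a matrix of information” might mean, here’s another toy example of my own. It’s a drastic simplification: real models learn dense vectors through training rather than counting raw co-occurrences like this. But the idea that a word’s “meaning” ends up as a pattern of numbers shaped by many contexts, rather than a stored quotation, comes through.

```python
# Toy illustration only: a word's "meaning" as a vector shaped by the company
# it keeps across many uses, rather than any single memorized sentence.
from collections import Counter

corpus = [
    "winter day cold snow",
    "summer day warm sun",
    "christmas winter snow",
]

vocab = ["winter", "summer", "day", "cold", "warm", "snow", "sun", "christmas"]

def context_vector(word: str) -> list[int]:
    # Count how often `word` shows up alongside each vocabulary word.
    counts: Counter = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return [counts[v] for v in vocab]

print(context_vector("winter"))  # [0, 0, 1, 1, 0, 2, 0, 1]
print(context_vector("summer"))  # [0, 0, 1, 0, 1, 0, 1, 0]
```

The vector for “winter” isn’t any sentence from that little corpus; it’s a trace left by all of them at once.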
Furthermore, at some point in the near future, it seems somewhat likely that we’ll have the computational power and mathematical methods necessary to build models that do update themselves in near-real-time. What will those things be doing? Will it be thinking? I’m not sure it will, but I’m also not sure how I’d justify that.
At some point, some combination of a firm being bold/unscrupulous enough to make big claims about “thought” and a technology flexible enough to give a very, very convincing performance of “thought” is going to force us to figure this out. We should get started now.