The Road Ahead

Cracking the Turing Test

We started the last chapter with Turing’s definition of intelligence (or thinking, to be precise) by emphasizing that Turing didn’t expect his imitation game to be implemented as an objective test of “intelligence” for computers. Yet, over the years it has been interpreted as such by myriad computer experts and non-experts alike, and many attempts have been made to operationalize it. It doesn’t help that Turing himself predicted that by the year 2000 we would have computers that could, in principle, fool 30% of the judges in such a setup. Many took this as an objective milestone for claiming we had reached strong AI.

Any concrete implementation of the imitation game runs into the practical issues of human gullibility and bias, which make it almost impossible to select judges who are guaranteed not to fall for cheap tricks. These issues alone explain every occasion before 2022 on which someone claimed to have beaten the Turing Test.

However, starting in 2023, for the first time we have technology that is eerily close to what many people would consider a worthy contender for the imitation game: large language models (LLMs). Some of the wildest claims about modern LLMs like GPT-4 seem to imply that the most powerful of these models are capable of human-level reasoning, at least in some domains.

But can modern language models pass the Turing Test? Again, this is hard to evaluate objectively because there are so many implementation details to get right. But I, and most other experts, don’t believe we are there yet. For all their incredible skills, LLMs fail in predictable ways, allowing any sufficiently careful and well-informed judge to detect them. So no, my bet is that current artificial intelligence can’t yet beat the imitation game, at least not in the spirit originally proposed by Turing.

In any case, there’s no GPT-4 out there consistently tricking humans into believing it is one of us.

Or is it? And how would we know?