
Mainstream media coverage of AI titans like ChatGPT often gives the impression that they're awesome game changers: already virtually flawless and reliable.
It would be foolish, one gathers, not to jump wholeheartedly on the AI bandwagon, or to have any doubts at all about the blessings that AI can confer.
And, to be sure, ChatGPT makes an excellent first impression. Ask it a detailed question, and it will typically provide a thoughtful and thought-provoking response within seconds. Its texts are full of well-organized and apparently accurate information.
Some acquaintances have nothing but praise for the program and have incorporated it into their daily routines.
So it's all the more surprising that it recently showed some unexpected weaknesses.
Earlier this year, as reported in the German news magazine Der Spiegel, software engineer Robert Caruso arranged for a chess match between ChatGPT and a chess program being run on a 1977 Atari 2600.
The old-timer had a mere 128 bytes of RAM. When playing chess, it could look only one to two moves ahead.
Before the match got under way, ChatGPT assured Caruso that it understood how to play chess, and it boasted that it would be looking 10 to 15 moves ahead.
But the experiment soon turned into a rout. Atari won with breathtaking ease, and its opponent never had a chance.
ChatGPT made one serious blunder after another, sometimes mistaking a bishop for a rook and, on other occasions, failing to recognize a threatened pawn fork.
Overall, it had trouble following the flow of the game, even after it was supplied with standard chess notation.
A few years ago, I helped coach chess at Manchester-Gate Elementary, and none of the students there ever made so many egregious mistakes in a single game.
So how did this happen? How did a 1970s David bash and smash a cutting-edge 21st-century Goliath?
It turns out that, although ChatGPT can draw on vast digital resources and deftly use probabilities to formulate its outputs, it's not that adept at following rules and engaging in logical thinking.
Dedicated chess programs, such as IBM's Deep Blue, are designed along different lines and can defeat even the best human players.
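A rough sense of what "looking ahead by following the rules" means can be conveyed in a few lines of code. The sketch below is purely illustrative: it performs a fixed-depth minimax search over a toy take-away game (take 1, 2, or 3 counters; whoever takes the last one wins), not chess, and it has nothing to do with the Atari cartridge's or Deep Blue's actual code. Still, it shows the basic mechanism a rule-based game engine relies on: enumerate the legal moves, apply the rules, and score positions several moves deep.

def legal_moves(counters):
    # All rule-abiding moves: take 1, 2, or 3 counters (never more than remain).
    return [take for take in (1, 2, 3) if take <= counters]

def minimax(counters, depth, maximizing):
    # Score a position by searching `depth` plies (half-moves) ahead.
    if counters == 0:
        # The player who just moved took the last counter and wins.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # search horizon reached; call the position even
    scores = [minimax(counters - take, depth - 1, not maximizing)
              for take in legal_moves(counters)]
    return max(scores) if maximizing else min(scores)

def best_move(counters, depth):
    # Pick the move whose resulting position scores best for the side to move.
    return max(legal_moves(counters),
               key=lambda take: minimax(counters - take, depth - 1, False))

if __name__ == "__main__":
    # From 6 counters, a four-ply lookahead finds the winning move:
    # take 2, leaving the opponent a multiple of 4.
    print(best_move(6, depth=4))

Nothing in that little search is statistical: every line of play is generated mechanically from the rules themselves, which is precisely what a language model, predicting one plausible-looking word at a time, does not do.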
But a failure to play decent chess is just one reason why ChatGPT and other AI programs need to be considered critically, at best as works in progress.
The data that they train on hail from all parts of the digital landscape, including the dubious and unsavory parts.
To date, some AI offerings, including ChatGPT, have at times exhibited racist tendencies as well as gender bias.
Also, as author Karen Hao notes in Empire of AI, an ongoing tension has existed in OpenAI, the company that produces ChatGPT, between the desire to build a safe system with ethical guardrails and the drive to bring profitable products to the public before its competitors can.
Given that unavoidable conflict in AI companies, it's not clear that the first goal, developing AI products that serve humanity's best interests, will always prevail.
And then there's the matter of AI's occasional inclination to fabricate, or "hallucinate," as industry insiders call it. ChatGPT and similar programs have provided falsehoods now and then instead of actual facts.
Given such circumstances, it seems best to remain cautious about the actual merits and promise of this new technology. It needs to be used prudently, and its responses need to be verified.
Sharing valid and trustworthy information with others is a serious matter.
Let's face it: It's a lot more than just a game.