Review: We Put ChatGPT, Bing Chat, and Bard to the Test
We designed trials to compare our chatbot overlords from OpenAI, Microsoft, and Google. They’re smart, they’re interactive—and they’re pretty little liars.
Imagine trying to review a machine that, every time you pressed a button or key or tapped its screen or tried to snap a photo with it, responded in a unique way—both predictive and unpredictable, influenced by the output of every other technological device that exists in the world. The product’s innards are partly secret. The manufacturer tells you it’s still an experiment, a work in progress; but you should use it anyway, and send in feedback. Maybe even pay to use it. Because, despite its general unreadiness, this thing is going to change the world, they say.
This is not a traditional WIRED product review. This is a comparative look at three new artificially intelligent software tools that are recasting the way we access information online: OpenAI’s ChatGPT, Microsoft’s Bing Chat, and Google’s Bard.
For the past three decades, when we’ve browsed the web or used a search engine, we’ve typed in bits of data and received mostly static answers in response. It’s been a fairly reliable relationship of input-output, one that’s grown more complex as advanced artificial intelligence—and data monetization schemes—have entered the chat. Now, the next wave of generative AI is enabling a new paradigm: computer interactions that feel more like human chats.
But these are not actually humanistic conversations. Chatbots don’t have the welfare of humans in mind. When we use generative AI tools, we’re talking to language-learning machines, created by even larger metaphorical machines. The responses we get from ChatGPT or Bing Chat or Google Bard are predictive responses generated from corpora of data that are reflective of the language of the internet. These chatbots are powerfully interactive, smart, creative, and sometimes even fun. They’re also charming little liars: The data sets they’re trained on are filled with biases, and some of the answers they spit out, with such seeming authority, are nonsensical, offensive, or just plain wrong.
You’re probably going to use generative AI in some way if you haven’t already. It’s futile to suggest avoiding these chat tools entirely, in the same way I can’t go back 25 years and tell you whether to try Google, or go back 15 years and tell you whether to buy an iPhone.
But even in the week or so I’ve spent writing this, generative AI technology has already changed. The prototype is out of the garage, and it has been unleashed without any industry-standard guardrails in place, which is why it’s crucial to have a framework for understanding how these tools work, how to think about them, and whether to trust them.
Talking ’bout AI Generation
When you use OpenAI’s ChatGPT, Microsoft’s Bing Chat, or Google Bard, you’re tapping into software that’s using large, complex language models to predict the next word or series of words the software should spit out. Technologists and AI researchers have been working on this tech for years, and the voice assistants we’re all familiar with—Siri, Google Assistant, Alexa—were already showcasing the potential of natural language processing. But OpenAI opened the floodgates when it dropped the extremely conversant ChatGPT on normies in late 2022. Practically overnight, the powers of “AI” and “large language models” morphed from an abstraction into something graspable.
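To make that “predict the next word” idea concrete, here is a minimal, hypothetical sketch: a toy lookup table of next-word probabilities and a loop that samples from it. The words and numbers are invented for illustration only; real large language models learn distributions over tens of thousands of tokens from enormous training corpora and condition on far more context than a single previous word.

```python
import random

# Toy next-word probabilities (made up for illustration; a real model
# learns these distributions from vast amounts of internet text).
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "robot": 0.2},
    "cat": {"sat": 0.6, "slept": 0.4},
    "dog": {"barked": 0.7, "slept": 0.3},
    "robot": {"answered": 0.9, "slept": 0.1},
}

def predict_next(word: str) -> str:
    """Sample the next word from the toy distribution for `word`."""
    candidates = NEXT_WORD_PROBS.get(word, {"<end>": 1.0})
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights, k=1)[0]

def generate(prompt: str, max_words: int = 5) -> str:
    """Extend a prompt one predicted word at a time, the same basic
    loop a chatbot's underlying language model runs at vastly larger scale."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(words[-1])
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)

print(generate("the"))  # e.g. "the robot answered"
```

The point of the sketch is that nothing in the loop “knows” anything; it only picks plausible continuations, which is exactly why the output can sound authoritative while being wrong.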
Microsoft, which has invested billions of dollars in OpenAI, soon followed with Bing Chat, which uses ChatGPT technology. And then, last week, Google began letting a limited number of people access Google Bard, which is based on Google’s own technology, LaMDA, short for Language Model for Dialogue Applications.