BLOOMBERG – Earlier this year, Princeton Computer Science Professor Arvind Narayanan set up a voice interface to ChatGPT for his nearly four-year-old daughter. It was partly an experiment and partly because he believed AI agents would one day be a big part of her life.
Narayanan’s daughter was naturally curious, often asking about animals, plants and the human body, and he thought ChatGPT could give useful answers to her questions, he told me. To his surprise, the chatbot developed by OpenAI also did an impeccable job of showing empathy, once he told the system it was speaking to a small child.
“What happens when the lights turn out?” his daughter asked.
“When the lights turn out, it gets dark, and it can be a little scary,” ChatGPT responded through a synthetic voice. “But don’t worry! There are lots of things you can do to feel safe and comfortable in the dark.”
It then gave some advice on using nightlights, closing with a reminder that “it’s normal to feel a bit scared in the dark”. Narayanan’s daughter was visibly reassured by the explanation, he wrote in a Substack post.
Microsoft and Alphabet’s Google are rushing to enhance their search engines with the large language model technology that underpins ChatGPT – but there is good reason to think the technology works better as an emotional companion than as a provider of facts.
That might sound weird, but what’s weirder is that Google’s Bard and Microsoft’s Bing, which is based on ChatGPT’s underlying technology, are being positioned as search tools even though they have an embarrassing history of factual errors: Bard gave incorrect information about the James Webb Space Telescope in its very first demo, while Bing goofed on a series of financial figures in its own.
The cost of factual mistakes is high when a chatbot is used for search. But when it’s designed as a companion, it’s much lower, according to Eugenia Kuyda, founder of the AI companion app Replika, which has been downloaded more than five million times.
“It won’t ruin the experience, unlike with search where small mistakes can break the trust in the product,” Kuyda added.
Margaret Mitchell, a former Google AI researcher who co-wrote a paper on the risks of large language models, has said such models are simply “not fit for purpose” as search engines.
Language models make mistakes because the data they’re trained on often includes errors and because the models have no ground truth against which to verify what they say. Their designers may also prioritise fluency over accuracy.
That is one reason these tools are good at mimicking empathy. After all, they’re learning from text scraped from the web, including the emotive reactions posted on social media platforms like Twitter and Facebook, and from the personal support shown to users of forums like Reddit and Quora.
Conversations from movie and TV show scripts, dialogue from novels, and research papers on emotional intelligence all go into the training pot to make these tools appear empathetic. No surprise then that some people are using ChatGPT as a kind of robo-therapist, according to an April feature in Bloomberg Businessweek. One person said they used it to avoid becoming a burden on others, including their own human therapist.
To see if I could measure ChatGPT’s empathic abilities, I put it through an online emotional intelligence test, giving it 40 multiple-choice questions and telling it to answer each one with the corresponding letter.
The result: It aced the quiz, getting perfect scores in the categories of social awareness, relationship management and self-management, and only stumbling slightly in self-awareness.
In fact, ChatGPT performed better on the quiz than I did, and it also beat my colleague, global banking columnist Paul Davies, even though we are both human and have real emotions (or so we think). – Parmy Olson