People trust legal advice generated by ChatGPT more than a lawyer – new study [theconversation.com]:
People who aren't legal experts are more willing to rely on legal advice provided by ChatGPT than by real lawyers – at least, when they don't know which of the two provided the advice. That's the key finding of our new research, which highlights some important concerns about the way the public increasingly relies on AI-generated content. We also found the public has at least some ability to identify whether the advice came from ChatGPT or a human lawyer.
AI tools like ChatGPT and other large language models (LLMs) are making their way into our everyday lives. They promise to provide quick answers, generate ideas, diagnose medical symptoms, and even offer concrete advice on legal questions.
But LLMs are known to create so-called "hallucinations" [theconversation.com] – that is, outputs containing inaccurate or nonsensical content. This means there is a real risk associated with people relying on them too much, particularly in high-stakes domains such as law. LLMs tend to present advice confidently, making it difficult for people to distinguish good advice from decisively voiced bad advice.
We ran three experiments on a total of 288 people. In the first two experiments, participants were given legal advice from both a lawyer and ChatGPT and asked which they would be willing to act on. When people didn't know whether the advice had come from a lawyer or an AI, we found they were more willing to rely on the AI-generated advice. This means that if an LLM gives legal advice without disclosing its nature, people may take it as fact and prefer it to expert advice from lawyers – possibly without questioning its accuracy.
Even when participants were told which advice came from a lawyer and which was AI-generated, we found they were willing to follow ChatGPT just as much as the lawyer.
One reason LLM-generated advice may be favoured, as we found in our study, is that LLMs use more complex language. Real lawyers, by contrast, tended to use simpler language but more words in their answers.
The third experiment investigated whether participants could distinguish between LLM-generated and lawyer-generated content when the source was not revealed to them. The good news is that they can – but only just.
In our task, random guessing would have produced a score of 0.5, while perfect discrimination would have produced a score of 1.0. On average, participants scored 0.59, indicating performance that was slightly better than random guessing, but still relatively weak.
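The article doesn't say exactly how that 0-to-1 score was computed. A common choice for a discrimination index where 0.5 means chance and 1.0 means perfect separation is the area under the ROC curve (AUC); the sketch below assumes that kind of scoring, using hypothetical participant ratings of "how likely is this to be AI-written?".

```python
# Illustrative sketch only: assumes an AUC-style discrimination index,
# where 0.5 is chance and 1.0 is perfect separation of AI-written from
# lawyer-written advice. The study's actual scoring method may differ.

def discrimination_score(ai_ratings, lawyer_ratings):
    """Probability that a randomly chosen AI-written item is rated as
    'more likely AI' than a randomly chosen lawyer-written item
    (ties count as half) - equivalent to the area under the ROC curve."""
    wins = 0.0
    for a in ai_ratings:
        for l in lawyer_ratings:
            if a > l:
                wins += 1.0
            elif a == l:
                wins += 0.5
    return wins / (len(ai_ratings) * len(lawyer_ratings))

# Hypothetical example: "how likely is this AI-written?" ratings on a 1-7 scale
ai_items = [5, 6, 4, 5]       # advice actually written by ChatGPT
lawyer_items = [4, 3, 5, 4]   # advice actually written by lawyers
print(discrimination_score(ai_items, lawyer_items))  # 0.8125 on this toy data
```

A score near 0.5 on such a measure means participants' judgements barely separated the two sources, which is why 0.59 counts as only slightly better than chance.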