Last week, OpenAI published tips for educators in a promotional blog post that shows how some teachers are using ChatGPT as an educational aid, along with suggested prompts to get started. In a related FAQ, they also officially admit what we already know: AI writing detectors don't work, despite frequently being used to punish students with false positives.
In a section of the FAQ titled "Do AI detectors work?", OpenAI writes, "In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content."
In July, we covered in depth why AI writing detectors such as GPTZero don't work, with experts calling them "mostly snake oil."
[...]
That same month, OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.
(Score: 1) by lars_stefan_axelsson on Wednesday September 13 2023, @07:24PM
Quite, all these terms have clear and rigid statistical definitions. Accuracy is (TP + TN) / (TP + TN + FP + FN). With T(rue), F(alse), N(egative), and P(ositive) respectively. (Remember 'Positive/Negative' is the test result, and 'True/False' says whether that result agrees with the real world.)
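To make the standard definition concrete, here is a minimal Python sketch. The function name `accuracy` and the example counts are made up for illustration and do not come from any real detector.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases the classifier got right.

    tp/tn: positives/negatives the test labelled correctly.
    fp/fn: positives/negatives the test labelled incorrectly.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples: accuracy is undefined")
    return (tp + tn) / total


# Hypothetical counts, chosen only to show the arithmetic.
example = accuracy(tp=20, tn=60, fp=15, fn=5)
print(example)
```

Only the two "true" cells go in the numerator; all four cells go in the denominator.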
So 26% accuracy really is quite shit as far as binary classification results go. And it doesn't tell the whole story, especially in cases with a large class imbalance. In those cases even a 99% accuracy can be completely unusable. It also depends on what the cost of e.g. a FP is compared to a FN, and so on.
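A small Python sketch of the class imbalance point, under invented assumptions: 1,000 essays of which only 10 are AI-written, and a useless "detector" that labels everything as human. The helper names `confusion_counts` and `ratio` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn); True means 'AI-written' (the positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def ratio(num: int, den: int) -> Optional[float]:
    """num / den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


# Invented data: 10 AI-written essays, 990 human-written ones.
actual = [True] * 10 + [False] * 990
# A detector that never flags anything.
predicted = [False] * 1000

tp, tn, fp, fn = confusion_counts(actual, predicted)
acc = ratio(tp + tn, tp + tn + fp + fn)  # share of all essays labelled correctly
recall = ratio(tp, tp + fn)              # share of AI essays actually caught
precision = ratio(tp, tp + fp)           # share of flagged essays that are AI

print(f"accuracy={acc}, recall={recall}, precision={precision}")
```

With these made-up numbers the do-nothing detector scores 99% accuracy while catching none of the AI-written essays, which is why recall and precision get reported alongside accuracy.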
Hence all the other measures that have been proposed and are used: https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers
Stefan Axelsson