SoylentNews is people

posted by hubie on Tuesday September 12 2023, @07:31AM
from the burn-it-all-down dept.

https://arstechnica.com/information-technology/2023/09/openai-admits-that-ai-writing-detectors-dont-work/

Last week, OpenAI published tips for educators in a promotional blog post that shows how some teachers are using ChatGPT as an educational aid, along with suggested prompts to get started. In a related FAQ, they also officially admit what we already know: AI writing detectors don't work, despite frequently being used to punish students with false positives.

In a section of the FAQ titled "Do AI detectors work?", OpenAI writes, "In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content."

In July, we covered in depth why AI writing detectors such as GPTZero don't work, with experts calling them "mostly snake oil."
[...]
That same month, OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.


This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 1) by lars_stefan_axelsson (3590) on Wednesday September 13 2023, @07:24PM (#1324478)

    Quite, all these terms have clear and rigid statistical definitions. Accuracy is (TP + TN) / (TP + TN + FP + FN), with T(rue), F(alse), P(ositive), and N(egative) respectively. (Remember 'Positive/Negative' is the test result, and 'True/False' is whether that result agrees with the real world.)
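As a minimal sketch, the standard definition of accuracy (correct predictions divided by all predictions) can be written as a small function. The function name and the counts below are made up for illustration and do not come from OpenAI's classifier.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases where the test result matched reality."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Only the 'true' cells count as correct, whatever the test said.
    return (tp + tn) / total


# Hypothetical counts, for illustration only.
example = accuracy(tp=40, tn=45, fp=5, fn=10)
print(example)
```

Note that FP and FN appear only in the denominator: every wrong call, in either direction, lowers the score by the same amount.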

    So 26% accuracy really is quite shit as far as binary classification results go. And it doesn't tell the whole story, especially in cases with a large class imbalance. In those cases even a 99% accuracy can be completely unusable. It also depends on what the cost of e.g. an FP is compared to an FN, and so on.
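A rough illustration of the class imbalance point, assuming a hypothetical pile of 10,000 essays of which only 100 are AI-written. All counts are invented for the example. Both "detectors" below score 99% accuracy, yet one never flags anything and the other is wrong half the time it accuses someone.

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Of everything flagged as AI, how much really was AI?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything that really was AI, how much got flagged?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical detector A: always answers "human".
always_human = metrics(tp=0, tn=9900, fp=0, fn=100)

# Hypothetical detector B: catches half the AI essays,
# but also flags 50 innocent students.
detector = metrics(tp=50, tn=9850, fp=50, fn=50)

for name, m in (("always_human", always_human), ("detector", detector)):
    print(name, m)
```

Accuracy alone cannot tell these two apart, which is why precision, recall, and the relative cost of an FP versus an FN matter when the classes are lopsided.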

    Hence all the other measures that have been proposed and are used: https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers [wikipedia.org]

    --
    Stefan Axelsson