posted by hubie on Tuesday September 12 2023, @07:31AM
from the burn-it-all-down dept.

https://arstechnica.com/information-technology/2023/09/openai-admits-that-ai-writing-detectors-dont-work/

Last week, OpenAI published tips for educators in a promotional blog post that shows how some teachers are using ChatGPT as an educational aid, along with suggested prompts to get started. In a related FAQ, they also officially admit what we already know: AI writing detectors don't work, despite frequently being used to punish students with false positives.

In a section of the FAQ titled "Do AI detectors work?", OpenAI writes, "In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content."

In July, we covered in depth why AI writing detectors such as GPTZero don't work, with experts calling them "mostly snake oil."
[...]
That same month, OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.


Original Submission

 
  • (Score: 2) by aafcac on Tuesday September 12 2023, @04:11PM (5 children)

    by aafcac (17646) on Tuesday September 12 2023, @04:11PM (#1324242)

    Presumably it classified 26% of the samples correctly, which includes both true positives and true negatives. So the other 74% were either false positives or false negatives. Which is to say it's absolute rubbish and you'd likely be better off just flipping a coin.

  • (Score: 3, Informative) by maxwell demon on Tuesday September 12 2023, @05:03PM (4 children)

    by maxwell demon (1608) on Tuesday September 12 2023, @05:03PM (#1324264) Journal

    If 26 percent of the results were right and the rest were wrong, you could make a detector with 74 percent accuracy by simply always outputting the opposite. That's what the parent was telling you.

    The absolute worst accuracy you can have is 50 percent, which amounts to guessing: anything below that can be inverted into something better than a coin flip.
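
    A minimal sketch of that inversion argument in Python (the 26% figure is from the article; the random labels and the detector itself are hypothetical, purely for illustration):

    import random

    random.seed(0)
    truth = [random.choice([True, False]) for _ in range(10_000)]  # True = AI-written
    # Hypothetical detector that is right only 26% of the time overall
    detector = [t if random.random() < 0.26 else not t for t in truth]
    inverted = [not d for d in detector]  # always output the opposite answer

    def accuracy(pred):
        return sum(p == t for p, t in zip(pred, truth)) / len(truth)

    print(f"detector accuracy: {accuracy(detector):.2f}")  # ~0.26
    print(f"inverted accuracy: {accuracy(inverted):.2f}")  # ~0.74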

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by aafcac on Tuesday September 12 2023, @08:51PM (3 children)

      by aafcac (17646) on Tuesday September 12 2023, @08:51PM (#1324300)

      Probably not. That would depend upon how many AI-written articles and how many human-written ones you've got, because it would be 26% of the AI articles and 26% of the human-written articles being correctly identified. Without more information, you have no way of knowing whether betting for or against the algorithm in a given situation makes sense. If you assume it's half and half, then yes, that's probably a fair position to take. But once either AI-written or human-written articles make up 3/4 of the total, that strategy produces worse results, and well before that point it's barely better than flipping a coin.

      • (Score: 2) by maxwell demon on Wednesday September 13 2023, @04:42AM (2 children)

        by maxwell demon (1608) on Wednesday September 13 2023, @04:42AM (#1324367) Journal

        No, it's elementary logic. It doesn't even depend on what this is about. If there are only two possible answers and one answer is wrong, then the opposite answer is right. And if the original answer is wrong 74% of the time, then the opposite answer is right 74% of the time. This is completely independent of what the algorithm is, or what question it answers.

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by aafcac on Wednesday September 13 2023, @04:15PM (1 child)

          by aafcac (17646) on Wednesday September 13 2023, @04:15PM (#1324438)

          It's not elementary logic, it's elementary statistics, and the answer definitely does depend upon the proportion of the samples that are AI-written versus human-written. Both false positives and false negatives are false results; that 74% includes both, and you don't know which one any given error is. If 99% of the samples are from humans and you've got an accuracy rate of 26%, assuming the remainder are by AI is going to be wrong nearly every time. If 99% are by AI, then you get the result correct nearly 100% of the time for the same basic reason. However, if it's a 75/25 split, then you get roughly that split in both the samples identified as AI-generated and those identified as human-generated. The upshot is that you need to separate the false positives from the true positives and the false negatives from the true negatives.

          In other words, the logic may suggest the proportions are irrelevant, but those figures drastically affect how likely that strategy is to be correct, and the breakdown between false positives and false negatives can vary significantly depending upon the underlying population.
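
          A quick sketch of that base-rate effect, assuming (hypothetically, as posited above) that the detector is right 26% of the time on each class, and applying Bayes' rule to ask how often a "flagged as AI" verdict is actually correct:

          def p_ai_given_flagged(prevalence, per_class_acc=0.26):
              # Bayes' rule: chance a flagged article really is AI-written,
              # assuming the detector is right per_class_acc of the time on BOTH classes
              tp = prevalence * per_class_acc              # AI-written, correctly flagged
              fp = (1 - prevalence) * (1 - per_class_acc)  # human-written, wrongly flagged
              return tp / (tp + fp)

          for prev in (0.01, 0.25, 0.50, 0.75, 0.99):
              print(f"prevalence {prev:4.0%}: P(AI | flagged) = {p_ai_given_flagged(prev):.3f}")

          With 1% AI-written articles a flag is almost always wrong (~0.4% correct); at a 50/50 split it's right 26% of the time; at 99% AI it's right ~97% of the time.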

          • (Score: 1) by lars_stefan_axelsson on Wednesday September 13 2023, @07:24PM

            by lars_stefan_axelsson (3590) on Wednesday September 13 2023, @07:24PM (#1324478)

            Quite, all these terms have clear and rigid statistical definitions. Accuracy is (TP + TN) / (TP + TN + FP + FN), with T(rue), F(alse), P(ositive), and N(egative) respectively. (Remember 'Positive/Negative' is the test result, and 'True/False' is whether that result matches the real world.)

            So 26% accuracy really is quite shit as far as binary classification results go. And accuracy alone doesn't tell the whole story, especially in cases with a large class imbalance; in those cases even 99% accuracy can be completely unusable. It also depends on what the cost of e.g. a FP is compared to a FN, and so on.

            Hence all the other measures that have been proposed and are used: https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers [wikipedia.org]
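
            A minimal sketch of those definitions in Python (the counts are hypothetical, chosen to show how class imbalance can hide a useless classifier behind 99% accuracy):

            def metrics(tp, fp, tn, fn):
                total = tp + fp + tn + fn
                return {
                    "accuracy":  (tp + tn) / total,
                    "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
                    "recall":    tp / (tp + fn) if (tp + fn) else float("nan"),
                }

            # Hypothetical corpus: 10,000 articles, only 100 (1%) AI-written.
            # A "detector" that simply never flags anything:
            print(metrics(tp=0, fp=0, tn=9900, fn=100))
            # accuracy = 0.99, recall = 0.0 -- 99% "accurate", catches nothing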

            --
            Stefan Axelsson