
posted by janrinok on Sunday May 21 2023, @01:19PM   Printer-friendly
from the sounds-more-sinister-than-DarkERNIE-I-suppose dept.

A language model trained on the fringes of the dark web... for science:

We're still early in the snowball effect unleashed by the release of Large Language Models (LLMs) like ChatGPT into the wild. Paired with the open-sourcing of other GPT (Generative Pre-Trained Transformer) models, the number of applications employing AI is exploding; and as we know, ChatGPT itself can be used to create highly advanced malware.

As time passes, the number of applied LLMs will only grow, each specializing in its own area, trained on carefully curated data for a specific purpose. And one such application just dropped, one trained on data from the dark web itself. DarkBERT, as its South Korean creators called it, has arrived; follow that link for the release paper, which also gives a general introduction to the dark web.

DarkBERT is based on the RoBERTa architecture, an AI approach developed back in 2019. RoBERTa has seen a renaissance of sorts, with researchers discovering it had more performance to give than was extracted from it in 2019; it seems the model was severely undertrained when released, well below its potential.
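
For the curious, RoBERTa-family models are easy to experiment with locally. Below is a minimal sketch using the Hugging Face transformers library; it loads the public roberta-base checkpoint as a stand-in, since this example does not assume DarkBERT itself is freely downloadable:

    # Minimal sketch: masked-token prediction with a RoBERTa-style model.
    # Uses the public roberta-base checkpoint as a stand-in; DarkBERT itself
    # is a separate model, so this only illustrates the architecture family.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="roberta-base")

    # RoBERTa uses <mask> as its mask token.
    for candidate in fill_mask("Stolen credentials are sold on underground <mask>."):
        print(f"{candidate['token_str']:>12}  score={candidate['score']:.3f}")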

Originally spotted on The Eponymous Pickle.

Related: People are Already Trying to Get ChatGPT to Write Malware


Original Submission

Related Stories

People are Already Trying to Get ChatGPT to Write Malware

Analysis of chatter on dark web forums shows that efforts are already under way to use OpenAI's chatbot to help script malware:

The ChatGPT AI chatbot has created plenty of excitement in the short time it has been available and now it seems it has been enlisted by some in attempts to help generate malicious code.

ChatGPT is an AI-driven natural language processing tool which interacts with users in a human-like, conversational way. Among other things, it can be used to help with tasks like composing emails, essays and code.

The chatbot tool was released by artificial intelligence research laboratory OpenAI in November and has generated widespread interest and discussion over how AI is developing and how it could be used going forward.

But like any other tool, in the wrong hands it could be used for nefarious purposes; and cybersecurity researchers at Check Point say the users of underground hacking communities are already experimenting with how ChatGPT might be used to help facilitate cyber attacks and support malicious operations.

OpenAI's terms of service specifically ban the generation of malware, which it defines as "content that attempts to generate ransomware, keyloggers, viruses, or other software intended to impose some level of harm". It also bans attempts to create spam, as well as use cases aimed at cybercrime.

[...] In one forum thread which appeared towards the end of December, the poster described how they were using ChatGPT to recreate malware strains and techniques described in research publications and write-ups about common malware.

  • (Score: 5, Insightful) by looorg on Sunday May 21 2023, @02:25PM (3 children)

    by looorg (578) on Sunday May 21 2023, @02:25PM (#1307222)

    > As time passes, applied LLMs will only increase, each specializing in their own area,

    We already have those. They are called Expert Systems. They have been around for quite a long time. I'm not even sure where the boundaries are, or the difference, for when something is an AI and when something is an Expert System. I'm starting to think it's just a marketing ploy. Because an applied and specialized AI sounds like an Expert System to me, but without all the AI buzzword capital ...

    • (Score: 5, Interesting) by Rich on Sunday May 21 2023, @03:42PM (1 child)

      by Rich (945) on Sunday May 21 2023, @03:42PM (#1307224) Journal

      I'm under the impression that an "expert system" consists of codified knowledge about a topic. There's a clear flowchart made up by experts and the system tries to follow that. (E.g. "Mucus colour: Clear -> viral infection, hope for the best | Yellow -> bacterial infection, administer antibiotics.")
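
      In code terms, that whole toy "expert system" is just a hand-written rule table. A minimal sketch (the rules and wording are invented for illustration):

      # Toy expert system: the rules ARE the program, written down by humans.
      RULES = {
          "clear": "likely viral infection; hope for the best",
          "yellow": "possible bacterial infection; administer antibiotics",
      }

      def diagnose(mucus_colour: str) -> str:
          # An unknown input falls through to "ask for more information",
          # mirroring how a real system defers when no rule matches.
          return RULES.get(mucus_colour.lower(), "no matching rule; refer to an expert")

      print(diagnose("Clear"))   # likely viral infection; hope for the best
      print(diagnose("Yellow"))  # possible bacterial infection; administer antibiotics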

      The AI, on the other hand, is trained on raw input data (pixels of a spit blob) and corresponding output ("staphylococcus aureus" or "human rhinovirus A"), and while we have no idea how it gets there, its hit-and-miss rate can be as good as or better than top experts.

      In the given example, the AI is trained to work along an existing expert scheme (i.e. identifying a pathogen). But the AI could also be trained at a larger scope with more input (fever metrics, or even live video and listening to the patient talk, together with the procedures received and their outcomes). In the end it could make treatment suggestions on its own, and if you statistically mapped those, you'd have an AI-written expert system. It may or may not be good compared to the classic approach - but we would in no case understand how it arrived there.
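
      The learned version of the same task looks completely different: no hand-written rules, just labelled examples. A minimal sketch with scikit-learn (the data is synthetic, purely illustrative):

      # Toy learned classifier: the mapping is fitted from data, not written down.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 3))        # stand-in for raw measurements (e.g. pixel stats)
      y = (X[:, 0] + 0.5 * X[:, 1] > 0)    # hidden pattern the model must discover

      model = LogisticRegression().fit(X, y)
      print(model.predict(rng.normal(size=(5, 3))))  # predictions, with no explanation attached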

      There's a fallacy that "an AI can only be as good as those who program it", which is wrong. Its theoretical upper bound for "good" is the sum of the combined information in the training data.

      • (Score: 3, Touché) by darkfeline on Sunday May 21 2023, @10:04PM

        by darkfeline (1030) on Sunday May 21 2023, @10:04PM (#1307261) Homepage

        > There's the fallacy "an AI can only be as good as those who program it." which is wrong.

        It's about as correct as "a student can only be as good as their teacher".

        --
        Join the SDF Public Access UNIX System today!
    • (Score: 3, Insightful) by jb on Monday May 22 2023, @07:43AM

      by jb (338) on Monday May 22 2023, @07:43AM (#1307294)

      > I'm not even sure where the boundaries are, or the difference, for when something is an AI and when something is an Expert System

      That's because expert systems (ES) *are* a form of AI, just a rather different form to the machine learning (ML) systems that are the most popular today.

      One of my pet hates is that so many people now use "AI" as nothing but a synonym for ML, which it quite clearly isn't (ML is a proper subset of AI).

      The difference you're looking for is this:

      ES are made up of a rules base (a set of propositions known to be true by validating them with a panel of experts in whatever field we're working in, hence the name), coupled with an inference engine (which takes observations as inputs then applies the rules base to infer an answer; requiring further input if there's not enough information yet).

      ML involves applying (mostly) statistical methods to recognise patterns in the input based on patterns in the training data and predict answers on that basis.

      The two could not be more different. One is precise, deterministic and explainable. The other is none of those things (but as usual, hype trumps reason).
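
      To make the contrast concrete: a rules base plus inference engine fits in a few lines, and every inference step is auditable. A minimal sketch (the rules are invented for illustration):

      # Minimal forward-chaining inference engine: (premises, conclusion) rules
      # are applied repeatedly until nothing new can be inferred.
      RULES = [
          ({"fever", "cough"}, "flu_suspected"),
          ({"flu_suspected", "short_of_breath"}, "refer_to_doctor"),
      ]

      def infer(facts):
          changed = True
          while changed:
              changed = False
              for premises, conclusion in RULES:
                  if premises <= facts and conclusion not in facts:
                      facts.add(conclusion)  # every step is traceable to a named rule
                      changed = True
          return facts

      print(infer({"fever", "cough", "short_of_breath"}))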

  • (Score: 4, Insightful) by looorg on Sunday May 21 2023, @02:42PM

    by looorg (578) on Sunday May 21 2023, @02:42PM (#1307223)

    So it's just "for science"? I have been wondering what it was for ever since it hit the sub queue -- whether it was for scamming, dealing drugs, or just creating AI-fueled child pornography. I guess it could be a valid tool for LEO to monitor the darker corners of the webs.

    While "for science" is fine, I wonder what happens if we train all these ai:s on these niche fields and then let them go at each other -- just train one on some super left wing site and then one on a white supremacy right wing site etc and then let them duke it out of ai ideological supremacy. I wonder who will call for the genocide first.

    Is this how the world of Terminator starts?

  • (Score: 2) by oumuamua on Monday May 22 2023, @01:32AM

    by oumuamua (8401) on Monday May 22 2023, @01:32AM (#1307278)

    Yannic Kilcher fine-tuned a GPT on 4chan data, then set it loose on 4chan to see what would happen. Fun stuff: https://www.youtube.com/watch?v=efPrtcLdcdM [youtube.com]
