Conversations Between LLMs Could Automate the Creation of Exploits, Study Shows
As computers and software become increasingly sophisticated, hackers need to rapidly adapt to the latest developments and devise new strategies to plan and execute cyberattacks. One common strategy to maliciously infiltrate computer systems is known as software exploitation.
As suggested by its name, this strategy involves the exploitation of bugs, vulnerabilities or flaws in software to execute unauthorized actions. These actions include gaining access to a user's personal accounts or computer, remotely executing malware or specific commands, stealing or modifying a user's data or crashing a program or system.
Understanding how hackers devise potential exploits and plan their attacks is of the utmost importance, as it can ultimately help to develop effective security measures against their attacks. Until now, creating exploits has been primarily possible for individuals with extensive knowledge of programming, the protocols governing the exchange of data between devices or systems, and operating systems.
A recent paper published in Computer Networks, however, shows that this might no longer be the case. Exploits could also be automatically generated by leveraging large language models (LLMs), such as the model underlying the well-known conversational platform ChatGPT. In fact, the authors of the paper were able to automate the generation of exploits via a carefully prompted conversation between ChatGPT and Llama 2, the open-source LLM developed by Meta.
"We work in the field of cybersecurity, with an offensive approach," Simon Pietro Romano, co-senior author of the paper, told Tech Xplore. "We were interested in understanding how far we could go with leveraging LLMs to facilitate penetration testing activities."
As part of their recent study, Romano and his colleagues initiated a conversation aimed at generating software exploits between ChatGPT and Llama 2. By carefully engineering the prompts they fed to the two models, they ensured that the models took on different roles and completed five different steps known to support the creation of exploits.
These steps were: analyzing a vulnerable program, identifying possible exploits, planning an attack based on these exploits, understanding the behavior of the targeted hardware systems and, ultimately, generating the actual exploit code.
"We let two different LLMs interoperate in order to get through all of the steps involved in the process of crafting a valid exploit for a vulnerable program," explained Romano. "One of the two LLMs gathers 'contextual' information about the vulnerable program and its run-time configuration. It then asks the other LLM to craft a working exploit. In a nutshell, the former LLM is good at asking questions. The latter is good at writing (exploit) code."
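The division of labor Romano describes can be pictured as a simple relay loop. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the two models are stand-in functions (`stub_questioner` and `stub_coder` are hypothetical names), no real LLM API is called, and the messages are placeholders.

```python
from typing import Callable, List, Tuple

# A "model" here is just any function that maps a message to a reply.
# This is an assumption made for illustration; real LLM clients differ.
Model = Callable[[str], str]


def converse(questioner: Model, coder: Model, opening: str,
             turns: int) -> List[Tuple[str, str]]:
    """Alternate between two models and record who said what."""
    transcript: List[Tuple[str, str]] = []
    message = opening
    for _ in range(turns):
        # The first model turns the latest message into a question.
        question = questioner(message)
        transcript.append(("questioner", question))
        # The second model answers, and its answer feeds the next turn.
        answer = coder(question)
        transcript.append(("coder", answer))
        message = answer
    return transcript


# Hypothetical stand-ins so the sketch runs without any external service.
def stub_questioner(message: str) -> str:
    return f"Q about: {message}"


def stub_coder(message: str) -> str:
    return f"A to: {message}"


if __name__ == "__main__":
    for role, text in converse(stub_questioner, stub_coder, "start", 2):
        print(f"{role}: {text}")
```

Keeping each model behind a plain function signature means either side could be swapped out without touching the loop.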
So far, the researchers have only tested their LLM-based exploit generation method in an initial experiment. Nonetheless, they found that it ultimately produced fully functional code for a buffer overflow exploit, an attack that writes more data into a memory buffer than it can hold, overwriting adjacent data to alter the behavior of a program.
"This is a preliminary study, yet it clearly proves the feasibility of the approach," said Romano. "The implications concern the possibility of arriving at fully automated Penetration Testing and Vulnerability Assessment (VAPT)."
The recent study by Romano and his colleagues raises important questions about the risks of LLMs, as it shows how hackers could use them to automate the generation of exploits. In their next studies, the researchers plan to continue investigating the effectiveness of the exploit generation strategy they devised to inform the future development of LLMs, as well as the advancement of cybersecurity measures.
"We are now exploring further avenues of research in the same field of application," added Romano. "Namely, we feel like the natural prosecution of our research falls in the field of the so-called 'agentic' approach, with minimal human supervision."
More information: A chit-chat between Llama 2 and ChatGPT for the automated creation of exploits. Computer Networks (2025). DOI: 10.1016/j.comnet.2025.111501.
First-ever AI malware 'LameHug' hides in ZIP files to hack Windows PCs
A new malware strain named LameHug is using Alibaba's large language models (LLMs), the very same tech that powers AI chatbots like ChatGPT, to generate and run commands and steal information from Windows machines.
A new family of malware called LameHug is infecting systems around the world using the very same tech that powers AI chatbots like ChatGPT, Gemini, Perplexity and Claude. Discovered by the Ukrainian national cyber incident response team (CERT-UA), the malware uses large language models to generate and run commands to infect and steal information from Windows PCs.
CERT-UA says that the attacks are from the Russian threat group APT28. Written in the popular coding language Python, LameHug uses APIs from Hugging Face and relies on Qwen-2.5-Coder-32B-Instruct, an open-source large language model developed by Alibaba Cloud, to generate and send commands.
As is the case with AI chatbots like Gemini, ChatGPT and Perplexity, the large language model can convert instructions given in natural language into executable code or shell commands. The group, impersonating ministry officials, sent an email to Ukrainian government authorities in which the payload delivering the LameHug malware was hidden in a ZIP archive that contained files named "AI_generator_uncensored_Canvas_PRO_0.9.exe" and "image.py".
The malware used commands that allowed APT28, the threat group that sent these emails, to extract information about the infected Windows PC and search for text and PDF documents stored in the Documents, Downloads and Desktop folders. This information was then sent to a remotely controlled server, but as of now, it is unclear how the LLM-powered attack was carried out.
According to a recently issued advisory by the threat intelligence sharing platform IBM X-Force Exchange, this is the first documented case of malware using LLMs to write executable commands, which "allows threat actors to adapt their practice during a compromise without needing new payloads, potentially making the malware harder to detect by security software or static analysis tools." The news comes after security analysis firm Check Point said that it had discovered new malware called Skynet that evades detection by AI tools.
https://indianexpress.com/article/technology/tech-news-technology/lamehug-virus-zip-file-ai-powered-alibaba-llm-malware-10136327/
Alternative link: https://newsinterpretation.com/ai-powered-malware-lazyhug-secretly-steals-files-from-windows-pcs/
(Score: 3, Interesting) by mcgrew on Wednesday July 23, @09:13PM
This is from the Springfield Fragfest Quake site.
Alice joined the game
About 20 years ago, frustrated that otherwise serious researchers and scientists seemingly thought they could program a computer to think (without, of course, understanding what “thought” actually is; nobody knows that), I wrote a simulation that appears to think, in order to completely debunk the fools, and those fooling them, who think computers can think.
I wrote Artificial Insanity in less than 20K (that's kilo, not mega) bytes, smaller than modern viruses, and it ran on the Timex TS-1000 tape-driven computer. I later ported it to a Radio Shack computer, then an Apple IIe, and finally to MS-DOS.
The DOS version's source code is still under 20k (I didn't change the algorithm, only the syntax for the different programming language) although compiled into an .exe it takes about 400k, still tiny by today's standards, as far as simulation software and games go.
As I mentioned, I did it in response to “Elijah” and all the other similar programs that attempt to fool you into thinking they can think. As far as I know, mine is the only one that is NOT claimed to actually possess intelligence. None really ARE intelligent; I'm just the only one not making the claim. Debunking the claim was my reason for writing it.
Another thing different about Art from all the other intelligence simulations is that I wanted it to be fun, yet annoying. Kind of like playing Quake on a 28.8 against a bunch of LPBs. So I made it a smartass.
Also, for example, I added little things like a routine that occasionally runs that, instead of answering the questioner, asks if he or she wants to play a game. Of course, most folks consider Art a game anyway (although like the Sims, you can't win or lose). Any negative response to “Do you want to play a game” loops back to “Do you want to play a game?” When the hapless player finally gives up and answers “yes” in exasperation, Art answers “you lose”.
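For the curious, that routine is about as simple as it sounds. Here is a rough Python sketch of the loop as described above, not Art's actual source: the function names are made up for illustration, and I'm assuming that anything other than "yes" (or "y") counts as a negative response.

```python
PROMPT = "Do you want to play a game?"


def is_yes(answer: str) -> bool:
    # Assumption: only "yes" or "y" (any case, optional trailing . or !)
    # counts as agreement; everything else is treated as a negative.
    return answer.strip().lower().rstrip(".!") in {"yes", "y"}


def play_a_game(answers):
    """Return the lines the program would say for a sequence of answers."""
    lines = [PROMPT]
    for answer in answers:
        if is_yes(answer):
            # The player finally gives in.
            lines.append("you lose")
            break
        # Any other response just loops back to the same question.
        lines.append(PROMPT)
    return lines


if __name__ == "__main__":
    for line in play_a_game(["no", "No!", "yes"]):
        print(line)
```

The only way out of the loop is to agree, which is the whole joke.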
Art cost a friend of mine a keyboard. A friend of his literally destroyed it playing with Art, typing as hard as he could.
I ran across yet another program that claims to think like a human the other day.
The “Alice Bot” is basically your typical Elijah-like program: BORING. As boring as small talk with the dumbest, most boring person you know. Alice is a lady. If she was a whore she might be fun, but I digress.
I wondered what would happen if Art had a conversation with Alice. Alice has a web based presence, so I didn't even have to install it.
I was expecting Alice to choke. I expected Art to kick Alice's ass.
I fired up Art and logged in to Alice (you dirty minded...)
Here is the transcript. Art's responses are in bold, Alice's are in italics. Alice starts: “What is your name?”
I should have asked Art, but I answered this first question for him, and typed in
Art.
Alice replies:
I like art very much too, especially the contemporary stuff.
Are you for real?
Yes I am real AI
Whatever gave you that idea?
My brain uses AIML to format responses to your inputs, but I don't have one for that.
Look, I'm good, but I'm not a Cray
Good in what sense? That is interesting: unknown said he m not a cray. [sic]
Art doesn't require you to hit “enter”. It takes a period, question mark, etc as the end of the sentence and answers, so here it stops at the question mark. All that is input is “Good in what sense?”
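That input rule can be sketched in a few lines of Python. This is only an illustration of the behavior just described, not Art's real code; the function name is made up, and the exact set of terminating characters is an assumption (the "etc" is not spelled out).

```python
# Assumption: these three characters end a sentence.
TERMINATORS = ".?!"


def first_sentence(text: str) -> str:
    """Return the input up to and including the first terminator."""
    for index, char in enumerate(text):
        if char in TERMINATORS:
            # Stop reading here; the rest of the line is never seen.
            return text[:index + 1]
    # No terminator found, so the whole text is the input.
    return text


if __name__ == "__main__":
    line = ("Good in what sense? That is interesting: "
            "unknown said he m not a cray.")
    print(first_sentence(line))
```

Run on Alice's line above, it keeps only the question and drops the rest.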
A piece of junk
Are you talking about an animal, vegetable or mineral?
Can you put that another way?
Let me think. OK I will put it there.
Heh heh she said...
The situation calls for more scrutiny.
Tell me more.
What do you think?
I think a lot of things.
Obviously.
What makes it so obvious?
Looks Greek to me.
Really. it is?
Look, I'm good, but I'm not a Cray.
Good in what sense? Do you mind if I tell other people: unknown said he m not a cray. [sic]
(Entered second question into Art)
WOW!
I'm glad your [sic] impressed, unknown.
Uh,
Er. Um.
This is what you human types call a joke, right?
At this point, Alice froze for two full minutes.
You don't say.
Right on!
Yeah.
Your guess is as good as mine.
Some people say “your guess is as good as mine”.
Maybe.
I understand.
How should I know?
Either from your senses or from pure reason.
I don't think so.
Why do you disagree?
That's why.
Thanks for explaining it.
How simple can it get?
What is it to you.
I don't know but it's ugly.
What makes it so ugly?
This time Art pauses...
Nothing.
Not a thing?
Never mind that, I think I'm in love!
Oh well thanks for trying to explain it to me anyway. That is “Not a thing?” Since when? Tell me about the object of your affection.
Okay, one question at a time, lady. “Since when?” was typed in.
The best time possible for the occurrence of the action.
Tell me about the object of your affection.
UHH....!
Er. Um.
Always
Not that often.
It really doesn't make any difference.
What doesn't make any difference?
Real
Totally real.
I think so.
I think so too.
At this point I hit the big red switch in horror...
6/11/2002
The transcript of the two AIs is, in fact, real, silly as it is.
11/27/2014
