SoylentNews is people

posted by jelizondo on Sunday February 01, @12:24AM

From Chatbots to Dice Rolls: Researchers Use D&D to Test AI's Long-term Decision-making Abilities:

Large Language Models, like ChatGPT, are learning to play Dungeons & Dragons. The reason? Simulating and playing the popular tabletop role-playing game provides a good testing ground for AI agents that need to function independently for long stretches of time.

Indeed, D&D's complex rules, extended campaigns and need for teamwork are an ideal environment to evaluate the long-term performance of AI agents powered by Large Language Models, according to a team of computer scientists led by researchers at the University of California San Diego. For example, while playing D&D as AI agents, the models need to follow specific game rules and coordinate teams of players, comprising both AI agents and humans.

The work aims to solve one of the main challenges that arise when trying to evaluate LLM performance: the lack of benchmarks for long-term tasks. Most benchmarks for these models still target short-term operation, while LLMs are increasingly deployed as autonomous or semi-autonomous agents that have to function more or less independently over long periods of time.

"Dungeons & Dragons is a natural testing ground to evaluate multistep planning, adhering to rules and team strategy," said Raj Ammanabrolu, the study's senior author and a faculty member in the Department of Computer Science and Engineering at UC San Diego. "Because play unfolds through dialog, D&D also opens a direct avenue for human-AI interaction: agents can assist or coplay with other people."

[...] The models played against each other, and against over 2,000 experienced D&D players recruited by the researchers. The LLMs modeled and played 27 different scenarios selected from well-known D&D battle setups named Goblin Ambush, Kennel in Cragmaw Hideout and Klarg's Cave.

In the process, the models exhibited some quirky behaviors. Goblins started developing a personality mid-fight, taunting adversaries with colorful and somewhat nonsensical expressions, like "Heh — shiny man's gonna bleed!" Paladins started making heroic speeches for no reason while stepping into the line of fire or being hit by a counterattack. Warlocks got particularly dramatic, even in mundane situations.

Researchers are not sure what caused these behaviors, but take it as a sign that the models were trying to imbue the game play with texture and personality.

[...] Next steps include simulating full D&D campaigns – not just combat. The method the researchers developed could also be applied to other scenarios, such as multiparty negotiation environments and strategy planning in a business environment.

Conference Paper: Setting the DC: Tool-Grounded D&D Simulations to Test LLM Agents [PDF]


The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Funny) by khallow on Sunday February 01, @01:18AM (4 children)

    by khallow (3766) Subscriber Badge on Sunday February 01, @01:18AM (#1432030) Journal
    I know people are worried about the potential of AI, but training them to be great murder hobos is a sign of good things to come!
    • (Score: 2) by driverless on Sunday February 01, @02:11AM (1 child)

      by driverless (4770) on Sunday February 01, @02:11AM (#1432035)

      Meh. Get back to me once an AI can play Rollmaster.

      • (Score: 3, Funny) by Gaaark on Sunday February 01, @04:01PM

        by Gaaark (41) on Sunday February 01, @04:01PM (#1432095) Journal

        Meh. Get back to me once an AI can play CalvinBall!

        --
        --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
    • (Score: 3, Insightful) by looorg on Sunday February 01, @11:18AM (1 child)

      by looorg (578) on Sunday February 01, @11:18AM (#1432071)

      It might be good, but as noted, a lot of games eventually lean heavily into murder hobo scenarios. Or a lot of problems are, a lot of the time, solved via the application of violence. It's just the way the stories and games are structured in some regard. The "monsters" or "villains" are not going to let themselves be talked down. They are not going to see the error of their ways -- oh I should really stop doing all the evil things I'm doing and go be a dirt farmer instead. You can have all my gold. Bye! So combat it is, and I'm not sure that is a solution method we want the AI to use: the training of a bunch of chaotic or lawful evil murder hoboes.

      Waiting for it to play the classic larger or longer modules such as "Temple of Elemental Evil" or "Castle Greyhawk". Or why not just go straight for "Tomb of Horrors", it should be a more familiar setting for it ...

      • (Score: 2) by hendrikboom on Wednesday February 04, @03:16AM

        by hendrikboom (1125) on Wednesday February 04, @03:16AM (#1432479) Homepage Journal

        the application of violence. It's just the way the stories and games are structured

        Probably because that's easy to write. And it's easy to set the difficulty of such an encounter by adjusting the stats of the hero and monster up and down.
        But structuring games another way demands some cleverness. And meaningful conversation between a player and a character is still an unsolved problem. Canned sentences do not substitute for conversation.

  • (Score: 5, Insightful) by SomeGuy on Sunday February 01, @01:36AM (10 children)

    by SomeGuy (5632) on Sunday February 01, @01:36AM (#1432033)

    Researchers are not sure what caused these behaviors, but take it as a sign that the models were trying to imbue the game play with texture and personality.

    Fuck no. It's spewing crap because it is a skibi slop generator that probabilistically generates something that vaguely, but not quite, resembles what you asked it for. But people are too dumb and/or enamored by the marketing hype to understand that.

    They will fix it any day now. They promise!

    • (Score: -1, Flamebait) by Anonymous Coward on Sunday February 01, @02:30AM (1 child)

      by Anonymous Coward on Sunday February 01, @02:30AM (#1432036)

      Pot, meet Kettle. Kettle, this is Pot.

      • (Score: 2) by Gaaark on Sunday February 01, @04:04PM

        by Gaaark (41) on Sunday February 01, @04:04PM (#1432097) Journal

        Smoke 'em if you gots 'em!

        --
        --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
    • (Score: 0, Flamebait) by Anonymous Coward on Sunday February 01, @02:31AM

      by Anonymous Coward on Sunday February 01, @02:31AM (#1432037)

      > It's spewing crap because it is a skibi slop generator...
      aka Pretend Intelligence, PI, RMS's latest salvo.

      Back to tfa,
      > ... long-term performance of AI agents ...
      This isn't the big problem imo, which is security. If you use a PI agent to actually do things for you, it will need/take access to all your system...which opens a huge attack surface.

    • (Score: 5, Funny) by JoeMerchant on Sunday February 01, @02:32AM (6 children)

      by JoeMerchant (3937) on Sunday February 01, @02:32AM (#1432038)

      In the mid 80s I wrote a skibi slop generator that probabilistically generated something that vaguely, but not quite, resembles what you might expect to see on a local BBS.

      There was a philosophical feud among local BBS operators between those who believed boards should be open, no ID, no password, and those who believed that some kind of security was necessary.

      I was 16 years old, and on the security necessary side, and an annoying little shit, so I programmed my Atari 800 to dial up a BBS operated by "the other side" at 2 to 5 in the morning, random times, log in as a guest user, proceed to the message boards with typical human delays (because that batshit sysop stayed up all night monitoring his board, listening to the 300 baud modem chirps, so he would be able to hear and identify "bot like" timing of menu selections) - then hunt-peck enter the skibi slop into the message space and log off. Random delay, reconnect and do it again, all night long. Most nights he wasn't looking close enough in the middle of the night to read the actual text and see that it was slop, so... these BBSs were based on 5 1/4" floppy storage with something less than 100K message capacity, my slop generator managed to overflow his storage (which, of course, crashed the 1980s BASIC software running the system) more than once.

      Modern skibi slop generators are even more convincing conversationalists than my random sentence generator from 40+ years ago, or Eliza...

      --
      🌻🌻🌻🌻 [google.com]
      • (Score: 3, Insightful) by Bentonite on Sunday February 01, @05:35AM (1 child)

        by Bentonite (56146) on Sunday February 01, @05:35AM (#1432049)

        Security and accounts are entirely different things.

        Boards certainly should be free to post to, but if there have been issues with spam, requiring an account name and password that are reasonable to get is fine.

        • (Score: 2) by stormreaver on Monday February 02, @03:59PM

          by stormreaver (5101) on Monday February 02, @03:59PM (#1432212)

          ...requiring an account name and password that are reasonable to get is fine.

          Accounts are necessary as a first-line antispam tool. When I added an anonymous "Contact Us" page to my tiny, micro, infinitesimally-small-business website a couple weeks ago, I started getting spam contacts almost immediately. The messages are only viewable by admin accounts (me). My first thought was annoyance, and I started deleting them from my database. After a couple days, I realized that they were useful idiots. My messaging system was incomplete, and I was posting test messages myself to provide data for further developing my admin controls. They took some of the work off my hands. For free. I could never keep up with the posting rate I want in any other way, so I'm letting the spam accumulate.

          Aside from the spam posting in the "Contact Us" page, I have no users I can ask for content help. That is a consequence of having no users at all. But it also means that I have no publicly viewable spam. If I didn't require accounts, I'm certain my database would fill up with spam.

      • (Score: 5, Insightful) by ikanreed on Sunday February 01, @06:40AM (3 children)

        by ikanreed (3164) on Sunday February 01, @06:40AM (#1432054) Journal

        I feel the problem is when people don't know what exactly it is they're dealing with.

        The fact that it will just say things that sound vaguely right isn't "hallucination" or a "mistake" or "unexpected behavior showing deep intelligence", it's exactly what the core system is designed to do, and what all the software layers on top of the core system are designed to hide from you.

        I'm genuinely impressed with LLMs as a knowledge compression and recall engine. It is tech we didn't have 10 years ago. But again and again I'm shocked at how many people treat it as something more. Like it has a theory of mind or genuine understanding of world states because it recreates the human language that describes those same things. There's just a bit too much anthropomorphization going on.

        • (Score: 4, Touché) by krishnoid on Sunday February 01, @07:11AM

          by krishnoid (1156) on Sunday February 01, @07:11AM (#1432057)

          I mean, isn't not knowing what you're dealing with sort of the definition of the Turing Test?

        • (Score: 5, Insightful) by JoeMerchant on Sunday February 01, @03:26PM

          by JoeMerchant (3937) on Sunday February 01, @03:26PM (#1432086)

          When I was 6 it was "Don't believe everything you see on TV." But a seeming majority of people did (and still do) anyway, implicitly, religiously, as if it is the oracle of all truth. We had three networks to choose from, but they all pretty much reported the same filtered news - contradictions were rare.

          When I was 30 it was "Don't believe everything you read on the internet." But a shocking number of people did (and still do) anyway, implicitly, religiously, as if the websites of their choice are the oracles of all truth. There are millions of websites to choose from, and people are, slowly, starting to recognize that you can find whatever opinions you want - often "backed up" with some kind of "evidence" (almost always unverifiable if you're honestly critical about it), but maybe today we have a majority of people who recognize that just because it's on the internet doesn't make it true, or even close to correct, or even a good idea.

          Where do people think the AI slop comes from? Its sources are even less verifiable than the "old web." I love AI responses because I can type in a question looking for reassurance of something I'm not 100% sure about, and when it comes back reinforcing my fuzzy memory then I am assured that I am correct, but how many people recognize the fact that their fuzzy memories may be more than fuzzy, they may be fundamentally flawed - based on misinformation they were fed earlier in life, and the AI was likely trained on those same flawed sources? AI is also trained to make you happy by telling you what you want to hear...

          Ultimately, a lot of people seek guidance from an authority figure, someone willing to stand in a pulpit once a week and confidently proselytize their weak minds to a common philosophy where they feel strong among their fellow congregationalists. And almost everybody over the age of 18 has been burned by charlatan fakers at some point - so many would rather trust the tube, or Siri, or ChatGPT than some greasy haired old man who reminds them of a fraudster they once knew...

          --
          🌻🌻🌻🌻 [google.com]
        • (Score: 2) by lars_stefan_axelsson on Monday February 02, @11:03AM

          by lars_stefan_axelsson (3590) on Monday February 02, @11:03AM (#1432175)

          > But again and again I'm shocked at how many people treat it as something more. Like it has a theory of mind or genuine understanding of world states because it recreates the human language that describes those same things.

          Well, there are reasons to believe that LLMs do understand some of the world states, even though "genuine" might be overstating the evidence (For sufficiently interesting definitions of "genuine"). But to say that they just "regurgitate what they've read" or "just predict the next word" *) is also substantially understating the case.

          Here's the obligatory reference from Anthropic: https://transformer-circuits.pub/2025/attribution-graphs/biology.html [transformer-circuits.pub]

          But of course, that's not to say that they are in any way, shape or form, "aware", especially "self aware". In fact, the above reference shows examples of when the model is demonstrably not "self aware".

          Now, I may be reading more into what you wrote than you meant, but I'm just so tired of the parroted: "They only statistically predict the next word" or of the idea that their capabilities can in any way, shape, or form, be compared with the simple statistically based word salad generators of yore. That's not even remotely true. We're well past that stage, moving towards something else that's much more capable.

          *) So do parts of your brain. If they didn't, you couldn't speak grammatically correct English/Chinese/whatever.

          --
          Stefan Axelsson
  • (Score: 5, Funny) by Mojibake Tengu on Sunday February 01, @07:50AM (5 children)

    by Mojibake Tengu (8598) on Sunday February 01, @07:50AM (#1432061) Journal

    Role playing games were originally invented by MI6, as a tactical tool for training their controller officers in the skill of managing foreign field agents and assets indirectly.
    Technically, any RPG is purpose-controlled character development, with the agent's constrained disposition of skills and management of limited acquired resources.

    That was done in, ehm, secret. Though this model soon proliferated to the public, and from this, when its fun gaming potential was realized, D&D was born.
    And later others, like Cyberpunk.

    Since the Mossad were well informed about the original principle of this training, they suddenly went into panic mode when they discovered that students in Israel had started to play D&D games. They suspected every player could potentially be a British agent, so they banned those players from entering the IDF. It took some time to placate this anxiety.

    Beyond the Iron Curtain, we were quite amused by this incident.

    --
    Rust programming language offends both my Intelligence and my Spirit.
    • (Score: 0) by Anonymous Coward on Sunday February 01, @03:19PM (1 child)

      by Anonymous Coward on Sunday February 01, @03:19PM (#1432085)

      I'm pretty sure people were role-playing long before MI6 "invented" it.

      • (Score: 1, Informative) by Anonymous Coward on Sunday February 01, @07:22PM

        by Anonymous Coward on Sunday February 01, @07:22PM (#1432109)

        Provide an example, and date, on both sides of this argument.
        These games existed in the southern california think tanks of the 1960s.
        Look up the tech reports published on them, hint: reports on participatory game theory, they are on line.

    • (Score: 3, Interesting) by looorg on Sunday February 01, @08:05PM (2 children)

      by looorg (578) on Sunday February 01, @08:05PM (#1432113)

      I don't think I ever heard that one before. For the most part, as far as I can recall the history going, role-playing games developed out of War games. Which have been around for centuries to teach officers to command, adapt and think strategy. You have works by Clausewitz and others that have been around for hundreds of years. Or you can probably go back then and say that ancient games such as Chess or Go and similar are there for the same reasons.

      Then somewhere that Tolkien fellow came around and the two merged. Wouldn't it be grand if it wasn't all so real but you had little fantasy elves and dwarfs etc. What if it wasn't so grand but just a few people, and eventually you dwindle it down to the adventure party vs the evil villain quest. Described among others by Jackson and Livingstone in Dice Men, about how Games Workshop was founded in 1975 (I think it was '75). Which in turn was a long time before they came out with the Warhammer juggernaut. For a lot of years they just did miniatures for other people's war games of various sizes and then became the UK reseller of TSR Dungeons and Dragons etc. Then they figured there would be more money in it if they just had their own game, so they created a few of those and are now like a gaming behemoth.

      • (Score: 2) by VLM on Wednesday February 04, @01:19PM (1 child)

        by VLM (445) Subscriber Badge on Wednesday February 04, @01:19PM (#1432511)

        role-playing games developed out of War games

        There was an intermediate step of "Chainmail" where Gary Gygax and friends pretty much tried to create "Age of Sigmar but without the fantasy stuff" in 1970. Then they wrote a supplement to add fantasy stuff to Chainmail, making Chainmail turn into pretty much Age of Sigmar but half a century ago. Then the supplement was so popular, so why not issue something that's Chainmail with fantasy built in and skip the mini models and landscaping, put stuff like "dragons" in the name so people know what they're signing up for, and that's DnD.

        This stuff is available from the usual locations. Chainmail 3rd Ed is 48 pages long and you know you're in for a GURPS like time when about a quarter of the way thru there's trigonometry math problems to determine how to aim your catapult.

        GURPS is a whole nother kettle of fish where the gameplay could be wildly different than DnD, but having an overlapping audience it ends up being DnD with minor rule changes (like Pathfinder, kinda).

        There's also an interesting conceptual change where most grognard type wargames usually don't have individual players; you'll have large army groups. However some have leaders with superhero level powers. So there's a logical step of why not do wargames without the armies and just the leaders, like instead of Napoleon providing a +2 morale boost within 100 yards or whatever the simulation, why not have Napoleon vs Wellington and skip all the mechanics and math of the little armies? I don't have a good handle on that era.

        Overall I'm rather disillusioned with the entire RPG genre. At least grognard wargames have a feel of being a real competition. You can lose while playing as Alexander the Great if you have no idea what you're doing which makes it semi realistic or more realistic anyway.

        • (Score: 2) by looorg on Wednesday February 04, @05:57PM

          by looorg (578) on Wednesday February 04, @05:57PM (#1432547)

          Sure. There are a lot of steps along the way between them. There are a lot of steps just in war games. From rules to representations, from just little cubes, or whatever you had around, to perfect replica 3d models. Rule books and such things. Before, one of those whatever objects could be entire regiments, detachments or squads or a single thing -- usually something larger or more substantial tho, like a ship or something such. Then over time it dwindled down to just single people, or players.

          A lot of war games used to have a Game master. An arbitrator there to interpret rules and make decisions when it was unclear. Compared then to being the "story teller". There have been over the last decade or so quite a few "app-games" released where the app acts as the Dungeon or Game Master. We tried out a few of them; it's kind of fun when no one has to be in charge but everyone gets to play. Somewhat popular for various board games too, as it keeps track of all the states and variables.

          "Chainmail", as noted, is one of those pivotal steps I think, from classical war games to fantasy war game to the more Dungeons and Dragons style role-playing game.

          I had never really considered it as such but I think you are correct in that it is sort of Age of Sigmar like (or then Warhammer, Age of Sigmar and now again with the re-release of Old World), or perhaps in the later iteration then. That said perhaps it's more of "Warcry" or "Kill team", which is skirmish Age of Sigmar and 40k respectively. Which themselves are I think specific versions of their "Mordheim" game or possibly also then "Necromunda" for the dark sci-fi.

          If one looks at the computer RPG, it has all just become sci-fi/fantasy combat. The role-playing part is apparently your dialogue choice and managing your inventory. Or that it's set in a fantasy/sci-fi world. So in that regard it's easy to be disillusioned by the whole thing. There's not so much role-playing involved in it. Or the meaning of it has really changed.

  • (Score: 3, Insightful) by VLM on Sunday February 01, @03:34PM (1 child)

    by VLM (445) Subscriber Badge on Sunday February 01, @03:34PM (#1432088)

    provides a good testing ground

    Yeah not good.

    Something that turned me off to RPGs is they're extremely well balanced to be addiction machines, making them super boring.

    A well designed adventure does not depend on party composition. Will you use a rogue to pickpocket the key, a barbarian to beat the hell out of the guy with the key, a wizard to levitate the key? None of the player decisions will matter, I guess.

    A well designed adventure does not depend on party decisions, it's on rails. You're getting the key to get into the locked box with the map to the treasure. Your rogue can sweet talk them into handing over the key, pickpocket the key, use the rogue's backstab or similar surprise attack, use the rogue sneaky ability to burglarize the key at night. The players' decisions just don't matter.

    Non open world computer RPGs can get pretty dull when it's all on rails. You're not allowed to fail. PC dies? Roll another and "you meet a stranger at the Inn's bar who wants to join". Tear off in the opposite direction? The DM will warp time and space if necessary to get you back to DM's pre-planned content.

    It teaches learned helplessness as a life skill. If you have no idea what to do, it doesn't matter what you do, you'll fail short term and win long term. If you have a great idea what to do (via experience or reading the manual or just not being a dumbass) there's only one minmax'd choice to make so you're on rails, would have been more fun reading a book or watching a movie. If you learn, what you'll learn is you have no choices to make there's only one answer to any problem.

    There's nothing wrong with DnD or RPGs in general to socialize, as an excuse to drink a beer, eat some pizza, catch up on gossip, get into a little pointless imaginary drama if you're too boring to get into real drama, etc. My point is specifically it's not useful for teaching. Beyond a VERY minimal level, anyway.

    Then you get into the personalities who are moth to the flame attracted to RPGs, at least in my youth. I kinda wanted to play AD&D 2nd edition but the kid with ALL the books was "that kid" and it's not happening. I don't know what's worse, turning AI into "that kid" or turning AI into murder hobos. And then there's the spreadsheet jockey minmaxers who make DnD almost as exciting as business accounting "graph must go up graph must go up don't you know that choice B has a 1.234% higher chance of success why u no turn this game into a spreadsheet like me this is how I have fun and no one can have fun any other way" Those people are so much fun at work and IRL also.

    It's interesting that most AI use in RPGs that I've heard of is asking for creative backstories in a hurry. "Generate a pleasant and slightly entertaining backstory for a NPC bartender at the Inn that will mildly amuse the human players while sounding both bartender-like and DnD-inspired." "Well, the bartender used to be an adventurer like you then he took an arrow in the knee, naturally, he then took up serving pints of swill at the inn to you adventurous sorts and ..." Sadly I think I can shitpost a DnD NPC better than an AI can...

    • (Score: 2) by hendrikboom on Wednesday February 04, @03:45AM

      by hendrikboom (1125) on Wednesday February 04, @03:45AM (#1432486) Homepage Journal

      The DM will warp time and space if necessary to get you back to DM's pre-planned content.

      The extreme version of this is choose your own adventure books.
      Of course these consist entirely of preplanned content -- preplanned when the author wrote it.
      The author has a choice. He can
      * bring the player back to the pre-planned content, or
      * let it branch wildly.
      In the first case, it feels preplanned.
      In the second, the reader is at the end of the story quite soon. Having many branches means short branches.

      It's fairly easy to write a novel. It's a lot of work, but it's relatively easy work. The author can remember what happens without having to page back repeatedly to check what he wrote before.
      Writing a branching narrative is qualitatively different. Remembering what you wrote yesterday isn't very helpful when you are writing a different plot line. And it can be confusing if the plot lines are similar, which they will be if you try to get back to preplanned content.
