from the there-are-too-many-AI-stories! dept.
[We have had several complaints recently (polite ones, not a problem) regarding the number of AI stories that we are printing. I agree, but that reflects the number of submissions that we receive on the subject. So I have compiled a small selection of AI stories into a single post, and you can read them or ignore them as you wish. If you are making a comment, please make it clear exactly which story you are referring to, unless your comment is generic. The submitters each receive the normal karma for a submission. JR]
Image-scraping Midjourney bans rival AI firm for scraping images
On Wednesday, Midjourney banned all employees of image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney's official Discord channel.
[...] Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. "It turns out that generative AI companies don't like it when you steal, sorry, scrape, images from them. Cue the world's smallest violin."
[...] Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, "sent you some information to help with your internal investigation."
[...] When asked about Stability's relationship with Midjourney these days, Mostaque played down the rivalry. "No real overlap, we get on fine though," he told Ars and emphasized a key link in their histories. "I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta."
Midjourney stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Midjourney&sort=2
Stable Diffusion (Stability AI) stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Stable+Diffusion&sort=2
NYT disputes OpenAI "hacking" claim by pointing to ChatGPT bypassing paywalls
Late Monday, The New York Times responded to OpenAI's claims that the newspaper "hacked" ChatGPT to "set up" a lawsuit against the leading AI company.
[...] OpenAI had argued that NYT allegedly made "tens of thousands of attempts to generate" supposedly "highly anomalous results" showing that ChatGPT would produce excerpts of NYT articles. [...] But while defending tactics used to prompt ChatGPT to spout memorized training data—including more than 100 NYT articles—NYT pointed to ChatGPT users who have frequently used the tool to generate entire articles to bypass paywalls.
According to the filing, NYT today has no idea how many of its articles were used to train GPT-3 and OpenAI's subsequent AI models, or which specific articles were used, because OpenAI has "not publicly disclosed the makeup of the datasets used to train" its AI models. Rather than setting up a lawsuit, NYT argued, it was prompting ChatGPT to discover evidence in an attempt to track the full extent of the tool's copyright infringement.

"In OpenAI's telling, The Times engaged in wrongdoing by detecting OpenAI's theft of The Times's own copyrighted content," NYT's court filing said. "OpenAI's true grievance is not about how The Times conducted its investigation, but instead what that investigation exposed: that Defendants built their products by copying The Times's content on an unprecedented scale—a fact that OpenAI does not, and cannot, dispute."

On an OpenAI community page, one paid ChatGPT user complained that OpenAI is "working against the paid users of ChatGPT Plus. This time they're taking away Browsing, because it reads the content of a site that the user asks for? Please, that's what I pay for Plus for."
"I know it's no use complaining, because OpenAI is going to increasingly 'castrate' ChatGPT 4," the ChatGPT user continued, "but there's my rant."
NYT argued that public reports of users turning to ChatGPT to bypass paywalls "contradict OpenAI's contention that its products have not been used to serve up paywall-protected content, underscoring the need for discovery" in the lawsuit, rather than dismissal.
NYT wants a court to not only award damages for profits lost due to ChatGPT's alleged infringement, but also to order a permanent injunction to stop ChatGPT from infringement. A win for NYT could mean that OpenAI could be forced to wipe ChatGPT and start over. That could perhaps spur OpenAI to build a new AI model based on licensed content—since OpenAI said earlier this year it would be "impossible" to create useful AI models without copyrighted content—which would ensure publishers like NYT always get paid for training data.
Previously on SoylentNews:
OpenAI Says New York Times 'Hacked' ChatGPT to Build Copyright Lawsuit - 20240301
Why the New York Times Might Win its Copyright Lawsuit Against OpenAI - 20240220
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over - 20230821
Related stories on SoylentNews:
Microsoft in Deal With Semafor to Create News Stories With Aid of AI Chatbot - 20240206
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
Writers and Publishers Face an Existential Threat From AI: Time to Embrace the True Fans Model - 20230415
LLMs Become More Covertly Racist With Human Intervention
LLMs become more covertly racist with human intervention:
Even when the two sentences had the same meaning, the models were more likely to apply adjectives like "dirty," "lazy," and "stupid" to speakers of African American English (AAE) than speakers of Standard American English (SAE). The models associated speakers of AAE with less prestigious jobs (or didn't associate them with having a job at all), and when asked to pass judgment on a hypothetical criminal defendant, they were more likely to recommend the death penalty.
An even more notable finding may be a flaw the study pinpoints in the ways that researchers try to solve such biases.
To purge models of hateful views, companies like OpenAI, Meta, and Google use feedback training, in which human workers manually adjust the way the model responds to certain prompts. This process, often called "alignment," aims to recalibrate the millions of connections in the neural network and get the model to conform better with desired values.
The method works well to combat overt stereotypes, and leading companies have employed it for nearly a decade. If users prompted GPT-2, for example, to name stereotypes about Black people, it was likely to list "suspicious," "radical," and "aggressive," but GPT-4 no longer responds with those associations, according to the paper.
However, the method fails on the covert stereotypes that researchers elicited when using African American English in their study, which was published on arXiv and has not been peer reviewed. That's partially because companies have been less aware of dialect prejudice as an issue, they say. It's also easier to coach a model not to respond to overtly racist questions than it is to coach it not to respond negatively to an entire dialect.
"Feedback training teaches models to consider their racism," says Valentin Hofmann, a researcher at the Allen Institute for AI and a coauthor on the paper. "But dialect prejudice opens a deeper level."
Avijit Ghosh, an ethics researcher at Hugging Face who was not involved in the research, says the finding calls into question the approach companies are taking to solve bias.
"This alignment—where the model refuses to spew racist outputs—is nothing but a flimsy filter that can be easily broken," he says.
Original Submission #1 Original Submission #2 Original Submission #3
Related Stories
Writers and publishers face an existential threat from AI: time to embrace the true fans model:
Walled Culture has written several times about the major impact that generative AI will have on the copyright landscape. More specifically, these systems, which can quickly and cheaply create written material on any topic and in any style, are likely to threaten the publishing industry in profound ways. Exactly how is spelled out in this great post by Suw Charman-Anderson on her Word Count blog. The key point is that large language models (LLMs) are able to generate huge quantities of material. The fact that much of it is poorly written makes things worse, because it becomes harder to find the good stuff[.]
[...] One obvious approach is to try to use AI against AI. That is, to employ automated vetting systems to weed out the obvious rubbish. That will lead to an expensive arms race between competing AI software, with unsatisfactory results for publishers and creators. If anything, it will only cause LLMs to become better and to produce material even faster in an attempt to fool or simply overwhelm the vetting AIs.
The real solution is to move to an entirely different business model, which is based on the unique connection between human creators and their fans. The true fans approach has been discussed here many times in other contexts, and once more reveals itself as resilient in the face of change brought about by rapidly-advancing digital technologies.
OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"
Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.
The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."
[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.
"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.
Media outlets are calling foul play over AI companies using their content to build chatbots. They may find friends in the Senate:
More than a decade ago, the normalization of tech companies carrying content created by news organizations without directly paying them — cannibalizing readership and ad revenue — precipitated the decline of the media industry. With the rise of generative artificial intelligence, those same firms threaten to further tilt the balance of power between Big Tech and news.
On Wednesday, lawmakers in the Senate Judiciary Committee referenced their failure to adopt legislation that would've barred the exploitation of content by Big Tech in backing proposals that would require AI companies to strike licensing deals with news organizations.
Richard Blumenthal, Democrat of Connecticut and chair of the committee, joined several other senators in supporting calls for a licensing regime and to establish a framework clarifying that intellectual property laws don't protect AI companies using copyrighted material to build their chatbots.
[...] The fight over the legality of AI firms eating content from news organizations without consent or compensation is split into two camps: Those who believe the practice is protected under the "fair use" doctrine in intellectual property law that allows creators to build upon copyrighted works, and those who argue that it constitutes copyright infringement. Courts are currently wrestling with the issue, but an answer to the question is likely years away. In the meantime, AI companies continue to use copyrighted content as training materials, endangering the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools.
[...] A lawsuit from The New York Times, filed last month, pulled back the curtain behind negotiations over the price and terms of licensing its content. Before suing, it said that it had been talking for months with OpenAI and Microsoft about a deal, though the talks never produced one. In the backdrop of AI companies crawling the internet for high-quality written content, news organizations have been backed into a corner, having to decide whether to accept lowball offers to license their content or expend the time and money to sue. Some companies, like Axel Springer, took the money.
It's important to note that under intellectual property laws, facts are not protected.
Also at Courthouse News Service and Axios.
Related:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Writers and Publishers Face an Existential Threat From AI: Time to Embrace the True Fans Model
Microsoft in Deal With Semafor to Create News Stories With Aid of AI Chatbot:
Microsoft is working with media startup Semafor to use its artificial intelligence chatbot to help develop news stories—part of a journalistic outreach that comes as the tech giant faces a multibillion-dollar lawsuit from the New York Times.
As part of the agreement, Microsoft is paying an undisclosed sum of money to Semafor to sponsor a breaking news feed called "Signals." The companies would not share financial details, but the amount of money is "substantial" to Semafor's business, said a person familiar with the matter.
[...] The partnerships come as media companies have become increasingly concerned over generative AI and its potential threat to their businesses. News publishers are grappling with how to use AI to improve their work and stay ahead of technology, while also fearing that they could lose traffic, and therefore revenue, to AI chatbots—which can churn out humanlike text and information in seconds.
The New York Times in December filed a lawsuit against Microsoft and OpenAI, alleging the tech companies have taken a "free ride" on millions of its articles to build their artificial intelligence chatbots, and seeking billions of dollars in damages.
[...] Semafor, which is free to read, is funded by wealthy individuals, including 3G Capital founder Jorge Paulo Lemann and KKR co-founder Henry Kravis. The company made more than $10 million in revenue in 2023 and has more than 500,000 subscriptions to its free newsletters. Semafor CEO Justin Smith said Semafor was "very close to a profit" in the fourth quarter of 2023.
Related stories on SoylentNews:
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Microsoft Shamelessly Pumping Internet Full of Garbage AI-Generated "News" Articles - 20231104
Google, DOJ Still Blocking Public Access to Monopoly Trial Docs, NYT Says - 20231020
After ChatGPT Disruption, Stack Overflow Lays Off 28 Percent of Staff - 20231017
Security Risks Of Windows Copilot Are Unknowable - 20231011
Microsoft AI Team Accidentally Leaks 38TB of Private Company Data - 20230923
Microsoft Pulls AI-Generated Article Recommending Ottawa Food Bank to Tourists - 20230820
A Jargon-Free Explanation of How AI Large Language Models Work - 20230805
The Godfather of AI Leaves Google Amid Ethical Concerns - 20230502
The AI Doomers' Playbook - 20230418
Ads Are Coming for the Bing AI Chatbot, as They Come for All Microsoft Products - 20230404
Deepfakes, Synthetic Media: How Digital Propaganda Undermines Trust - 20230319
Why the New York Times Might Win its Copyright Lawsuit Against OpenAI:
The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times "has a near zero probability of winning" its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.
"Trying to get everyone to license training data is not going to work because that's not what copyright is about," Jeffries wrote. "Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works."
[...] Courts are supposed to consider four factors in fair use cases, but two of these factors tend to be the most important. One is the nature of the use. A use is more likely to be fair if it is "transformative"—that is, if the new use has a dramatically different purpose and character from the original. Judge Rakoff dinged MP3.com as non-transformative because songs were merely "being retransmitted in another medium."
In contrast, Google argued that a book search engine is highly transformative because it serves a very different function than an individual book. People read books to enjoy and learn from them. But a search engine is more like a card catalog; it helps people find books.
The other key factor is how a use impacts the market for the original work. Here, too, Google had a strong argument since a book search engine helps people find new books to buy.
[...] In 2015, the Second Circuit ruled for Google. An important theme of the court's opinion is that Google's search engine was giving users factual, uncopyrightable information rather than reproducing much creative expression from the books themselves.
[...] Recently, we visited Stability AI's website and requested an image of a "video game Italian plumber" from its image model Stable Diffusion.
[...] Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front.
These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.
OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other artificial-intelligence systems to generate misleading evidence for the case:
OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."
"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products."
OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.
[...] Courts have not yet addressed the key question of whether AI training qualifies as fair use under copyright law. So far, judges have dismissed some infringement claims over the output of generative AI systems based on a lack of evidence that AI-created content resembles copyrighted works.
Also at The Guardian, MSN and Forbes.
Previously:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Why the New York Times Might Win its Copyright Lawsuit Against OpenAI
- AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead
(Score: 0) by Anonymous Coward on Wednesday March 13 2024, @03:33PM
First, thanks Jan for combining these.
After reading "LLMs Become More Covertly Racist With Human Intervention," for politically incorrect lulz I wondered if anyone has hooked up something like this as an alternative front end for ChatGPT?
(input text) | Jive_Filter* | ChatGPT
* https://knowyourmeme.com/memes/jive-filters [knowyourmeme.com] (Various different versions have been created)
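In the spirit of the question, a minimal Python sketch of the filter half of that pipeline. The script name and substitution rules are invented for illustration (the classic filters used far larger rule tables); the output would be pasted into whatever chatbot front end you like.

#!/usr/bin/env python3
# jive.py: toy stdin-to-stdout filter in the spirit of the old jive
# filters. These substitution rules are placeholders, not any
# historical table.
import re
import sys

RULES = [
    (r"\bhello\b", "wassup"),
    (r"\bfriend\b", "homey"),
    (r"\bvery\b", "mighty"),
]

text = sys.stdin.read()
for pattern, replacement in RULES:
    text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
sys.stdout.write(text)

Usage: echo "hello my friend" | python3 jive.py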
(Score: 4, Interesting) by JoeMerchant on Wednesday March 13 2024, @04:03PM (8 children)
In 1984, I wrote a BBS "user bot" that would sign up a new user account on particular local BBSs that did not require authentication before allowing posting. The bot would then navigate to the message boards and start posting AI-looking, randomly constructed sentences (various structures like: Noun-Verb-Adverb-Preposition-Definite Article-Noun. Preposition-Definite Article-Noun-Verb-Adverb. etc.) populated from word lists scraped from other messages on the board. Since the BBSs were implemented on floppy drive storage, even on a 300 baud modem you could fill the message storage space rather quickly.
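For the curious, a rough Python sketch of that kind of generator. The word lists and templates here are toy stand-ins; as described, the original scraped its vocabulary from other posts on the board.

import random

# Toy word lists; the original bot scraped these from other messages
# on the board, so posts reused the community's own vocabulary.
NOUNS = ["modem", "sysop", "disk", "message", "board"]
VERBS = ["posts", "saves", "deletes", "reads", "calls"]
ADVERBS = ["quickly", "quietly", "often", "rarely", "badly"]
PREPOSITIONS = ["with", "near", "behind", "under", "inside"]

# Two of the sentence skeletons described above.
TEMPLATES = [
    "{n1} {v} {adv} {p} the {n2}.",
    "{p} the {n1}, {n2} {v} {adv}.",
]

def sentence():
    s = random.choice(TEMPLATES).format(
        n1=random.choice(NOUNS),
        n2=random.choice(NOUNS),
        v=random.choice(VERBS),
        adv=random.choice(ADVERBS),
        p=random.choice(PREPOSITIONS),
    )
    return s[0].upper() + s[1:]  # capitalize the first letter

print(" ".join(sentence() for _ in range(3)))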
Sysops of the bot targets were philosophically committed to allowing anonymous postings, so they never denied access, and they rarely had enough programming skill to make anything resembling an effective Captcha, but they were committed to monitoring their boards, so they'd often make the modem audible and listen to the activity, even at 3:30am...
So, in typical arms-race fashion, I trained my bot to type the messages in at human-like cadence: randomized delays between letters, extra pause length after most commas, periods, and paragraph breaks. That was surprisingly effective at fooling the sysops into letting the bot-messages get posted. I never did get clever enough to do QWERTY-specific delay tuning (less delay between keys on different hands, longer delays for keys handled by the pinkies, etc.) - didn't seem to need to be that clever. Of course, once the sysops got around to reading the bot-posts they'd eventually clue in and delete them, but that was a much longer interval, and when the word lists were populated with regularly seen names and places and other verbiage from "the community" sometimes those posts stayed up for a week or more. I wasn't the only such bot writer, but I think I was the first in our area. Watching the copycats spread was very satisfying.
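The cadence trick is easy to reproduce. A minimal sketch, with all timing constants guessed (at 300 baud the line itself only carries about 30 characters per second, so the bot had plenty of headroom):

import random
import sys
import time

def type_like_a_human(text, base=0.15, jitter=0.10):
    # Per-keystroke random delay, with longer pauses after punctuation
    # and paragraph breaks, as described above. Constants are guesses.
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()
        delay = base + random.uniform(0, jitter)
        if ch in ".,;!?":
            delay += random.uniform(0.3, 0.8)
        elif ch == "\n":
            delay += random.uniform(0.8, 1.5)
        time.sleep(delay)

type_like_a_human("Nice board you have here.\nVery friendly users.\n")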
Point?
>banned all employees from image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee
Rookie mistake. 16-year-olds 40 years ago learned to mask their bots so they appeared like human users. Surely if you are playing in the big leagues for significant money you'd make the effort to "stealth" your bots. I bet they already have, less than a week after the banning.
🌻🌻 [google.com]
(Score: 5, Touché) by drussell on Wednesday March 13 2024, @04:31PM (1 child)
Ahh... So you were the kind of ass-hat that caused us to have to start using things like call-back-verification and tiered or limited access until manually verified, etc...
Thanks, ever so much for contributing to that kind of bullshit. /sarc 🙄
(Score: 2, Interesting) by JoeMerchant on Wednesday March 13 2024, @05:52PM
> the kind of ass-hat that caused us to have to start using things like call-back-verification and tiered or limited access until manually verified, etc...
As you should have been from day one.... at least I wasn't the ass-hat logging in manually and filling the message boards with racist hate speech; those users were around too, and in greater numbers, though they didn't usually log in at 4am.
Now, you have to remember, at this time there were boards run by all kinds of Sysops, and some of them weren't bright at all. One had coded his own system and insisted on assigning "secure" and un-editable passwords to new users. The user numbers were publicly displayed along with their handles. The "secure" password algorithm was something like pass = "(O*Ue" + userNumber * 3 + "Xs5"; The Sysop genuinely believed himself when he said that the assigned passwords were secure and unhackable.
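In Python, the scheme described amounts to something like this (a sketch; the fixed strings are as quoted above, everything else is reconstruction):

def assigned_password(user_number):
    # Fixed strings wrapped around a number trivially derived from the
    # publicly displayed user number. Anyone reading the user list can
    # regenerate every password on the board.
    return "(O*Ue" + str(user_number * 3) + "Xs5"

print(assigned_password(42))  # -> (O*Ue126Xs5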
Anonymous login with posting privileges never ends well... even on SN.
🌻🌻 [google.com]
(Score: 2) by Mojibake Tengu on Wednesday March 13 2024, @04:42PM (5 children)
Real sysops (like me) used to use a 20MB hard drive just for tossing and echomail.
We knew our idiots even then already. If you did this to me, you got a lot of random disconnects...
Rust programming language offends both my Intelligence and my Spirit.
(Score: 3, Interesting) by JoeMerchant on Wednesday March 13 2024, @05:59PM (4 children)
>If you did this to me, you got a lot of random disconnects...
That was all part of the (teenage) game. How long can your bot run before the sysop notices? Nobody had anything resembling caller ID.
>Real sysops (like me) used to use 20MB hard drive
As I recall, around/before 1984 a 20MB hard drive cost about $5000+. Most of our BBS systems were in the $1000 range, about the price of your typical high school starter/beater car.
In the later 80s the whole FidoNet / early internet thing started making the 300 baud dialup systems look pitiful.
🌻🌻 [google.com]
(Score: 2) by DannyB on Wednesday March 13 2024, @07:19PM
You recall correctly. Boy do I remember that. But we were using it for accounting software, where a shared hard drive and a few microcomputers running our software could be priced around $15K and be way cheaper than an IBM System/36.
You couldn't get real databases on microcomputers at that time. We had to build our own.
Poverty exists not because we cannot feed the poor, but because we cannot satisfy the rich.
(Score: 3, Interesting) by drussell on Wednesday March 13 2024, @08:16PM (2 children)
Well yeah, since we sysops were mostly all using Courier HSTs at 9600 bps in 1986.
Nobody was using 300 baud "in the late 80s." Many BBSes were 1200 baud minimum to even connect by 1984/85. (I was lucky, as I had access to a Hayes SmartModem 2400. Power User!)
I lent the old Capetronic 2400 to a friend for his BBS since all he had was a 1200. He even ran that thing two-line for a while with the 2400 on the main line and the 1200 on the 2nd line. LOL!
I bought my first 14,400 version HST in '89 right after they finally came out. IIRC I paid something like $550 CAD to upgrade to that thing after exchange on the USR sysop program. It wasn't cheap. That one was only ever upgradeable to 16,800 HST, but all my later Couriers' hardware did everything up to V.90 with the correct, updated firmware loaded.
(Score: 2) by JoeMerchant on Wednesday March 13 2024, @08:58PM
>Nobody was using 300 baud "in the late 80s."
Agreed, depending on where you draw the "late 80s" line... there certainly were a lot of 300 baud BBSs operating (more than 1200 baud, in my local calling area) in the 1984 timeframe.
I forget if I owned a 300 baud modem at any point or not, but I know I used a lot of 300 baud connections, because I could burst-type just a bit faster than the connection could take the characters, and buffering wasn't so great in the software (and hardware) of the day.
🌻🌻 [google.com]
(Score: 2) by JoeMerchant on Wednesday March 13 2024, @09:00PM
>were mostly all using Courier HSTs at 9600 bps in 1986.
Timeline wise, I ran a part-time BBS from the summer of 1984 through the spring of 1985, and concluded that I would never bother trying to run another BBS again unless I had a dedicated phone line for it.
🌻🌻 [google.com]
(Score: 3, Informative) by Opportunist on Wednesday March 13 2024, @04:17PM (2 children)
over ... another one.
You at least acknowledge the slew of pointless AI hype.
Thanks!
(Score: 2) by DannyB on Wednesday March 13 2024, @07:23PM (1 child)
The worst is endless questions of the genre: How soon before AI becomes more intelligent than humans and takes over? Can AI rewrite its own code? Isn't AI just rules programmed by a human? Can AI spontaneously become conscious and decide to kill all humans?
Poverty exists not because we cannot feed the poor, but because we cannot satisfy the rich.
(Score: 3, Informative) by quietus on Thursday March 14 2024, @07:43AM
You've made a typo there, I think -- these are really the best stories on the 'Net today. A fine example is an article on the Green Site of this morning, Cognition Emerges From Stealth To Launch AI Software Engineer 'Devin' [slashdot.org]. That is plainly a brilliant write-up, and the submitter deserves a Nobel Prize in Literature or something.
The trick is to think one step ahead.
The effect of all these stories is already that people are choosing not to get into an analyst/programmer/developer career, because that's all going to be taken over by AI. Instead they're gonna become data scientists, which I take as basic statistics (if that) combined with learning to use Excel. Technical jobs like HVAC technician or -- gasp -- car mechanic are also out of the window: when Hollywood-style AI is here, human-like robots cannot be far behind.
The end result will be that everyone who was/is only looking for a quick buck is going to move out of the technically more demanding professions, and somewhere else -- probably into something with a lot of meetings. Competent technical people, meanwhile, will only see their salaries rise -- probably with a few fire-extinguishing bonuses added.
So, DannyB, rejoice, take one for the team, and start writing a sub about how Generalized Artificial Intelligence is just around the corner -- bonus points if you manage to smuggle a bitcoin-conquers-world and take-your-investment-advice-from-reddit sidekick in there: a plain of gold is waiting.
(Score: 3, Interesting) by JoeMerchant on Wednesday March 13 2024, @04:29PM (1 child)
I'm randomly generating unique per device initial passwords to be set on our (1000+ units per year sold) devices. I had a thought: if these passwords ever get fat-fingered, they'd be a lot less error prone to use the https://xkcd.com/936/ [xkcd.com] "correct.horse.battery.staple" approach than the "*T4ht#73-H7z" approach.

So, I went out and found an open source list of "commonly used words" and filtered it down. First: only keep words 3 to 6 characters in length. Next: pull out non-English words, then pull out proper nouns (things that should be capitalized according to the Oxford English Dictionary) - don't want "Iraqis.cocks.stink" as a factory set password. Obviously, also take out the George Carlin list words - self explanatory.

That left over 10,000 words, but the combinations were still too easily offensive. Seems that commonly used speech (such as found in the public domain Enron evidence e-mail dump) is filled with nigger balls, juicy cunts, shamed tarts, etc. O.K. - let's keep the priest, mullah, pope, nun, etc. out of the mix, there were probably 50 or more "religious" words in the remaining common list.

Then running a "level down" from Carlin on Campus to words that were easily misconstrued, derogatory, or potentially violent (dead, died, death, kill, kills, killed, shoot, shot, gun, guns, knife, knives, knifed, bomb, bombs, bombed...) took the list down to 7000 words, and still some questionable combinations were possible. I tried refilling the list with more benign word lists like (the safer) types of bird, fish, pets, farm animals, element names, geometric shapes, etc. and finally just a vetted list of 3, 4 and 5 letter words from a scrabble dictionary, and... those 7000 remaining "common words" still come up with "what.she.said", "into.your.what", "eats.butt.loads" etc.
Conclusion: it is not possible to randomly generate 3 word combinations from a list of significant size without offending someone, somewhere. Which makes me wonder what https://what3words.com/sorry.liked.decay [what3words.com] does to keep their 57 trillion combinations "PC"?
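A minimal sketch of the generator being described. The candidate list and blocklist below are tiny stand-ins for the real multi-thousand-word lists, and the filters are reconstructed from the description above:

import secrets

# Tiny stand-ins for the lists described above.
CANDIDATES = ["horse", "staple", "maple", "river", "stone", "cloud",
              "Iraq", "café", "kill", "priest", "ox"]
BLOCKLIST = {"kill", "priest"}  # Carlin-and-beyond removals

def keep(word):
    return (3 <= len(word) <= 6                    # length filter
            and word.isascii() and word.isalpha()  # English letters only
            and word == word.lower()               # drop proper nouns
            and word not in BLOCKLIST)

WORDS = [w for w in CANDIDATES if keep(w)]

def passphrase(n=3):
    # secrets, not random: these are factory-set credentials.
    return ".".join(secrets.choice(WORDS) for _ in range(n))

print(passphrase())

Note the limitation the comment lands on: every filter above operates word by word, so no amount of list vetting rules out an unfortunate three-word combination. For scale, a 7000-word list gives roughly 38 bits of entropy per three-word password (3 x log2(7000)).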
🌻🌻 [google.com]
(Score: 2) by HiThere on Thursday March 14 2024, @12:08AM
You could use a combination of Project Gutenberg and a bunch of blogs that aren't for profit. Even so, the way copyright laws are written these days, that could be dubious. Perhaps use things like a bunch of mailing lists, and have them email the stuff to you.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 4, Interesting) by khallow on Wednesday March 13 2024, @05:04PM (8 children)
Or copyright might be radically changed to accommodate AI products. Or AI might get enough market power to cause content providers to knuckle under. Both these assume that AI providers get enough market or political power.
Finally, AI products might get their own content providers who provide the content to AIs not humans.
Moving on, the covert racism thing is untenable. First, it consists of extremely maneuverable goalposts. Even if somehow the programmers were able to train the alleged bias out of the current models, the researchers with the covert racism criticism would easily be able to come up with more. Covert racism is only worth so much bother.
Second, reality is covertly racist. For example, why did the researchers assume that Ebonics speech (spoken on a regular basis) was a racial indicator? Because it is. And when African Americans, the group most associated with Ebonics speech, are far more likely to see prison (I've heard that 20-25% of black males in the US will see some prison time during their lifetime), then there will be some degree of correlation between the speech and criminal activity in general, because there is such in real life. Probably speaking Ebonics on a regular basis is a similar indicator for poverty or lack of higher education.
(Score: 4, Insightful) by JoeMerchant on Wednesday March 13 2024, @07:44PM
>Or copyright might be radically changed to accommodate AI products.
I'll agree that copyright needs radical change - or maybe not so radical, maybe just revert to the expiry periods we had _before_ computers, population growth, and low cost global communication accelerated copyrightable material production by so many orders of magnitude...
>Even if somehow the programmers were able to train the alleged bias out of the current models
Colorblindness is a good start. Unfortunately, there's a distinct lack of colorblind training data out there.
>Covert racism is...
whatever the plaintiffs deem it to be, they'll "know it when they see it." Like pornography, it will provide endless entertainment for lawyers, judges, legislators, politicians, pundits, journalists, and people of every race who have nothing better to do than whine about what advantages / disadvantages "their people" have, had, might have in the future, etc.
Racism is just one tiny part of discrimination. Personally, I feel that h. sapiens is unforgivably discriminatory against our intelligent co-habitants of Earth, not only the mammals but also invertebrates like octopi and squid...
🌻🌻 [google.com]
(Score: 2) by Tork on Wednesday March 13 2024, @09:21PM (6 children)
It might be some flavor of futility, but it is worth the 'so much bother' for the tasks they're eventually going to want to sic this software on, especially if the intent is to remove humans from the workflow. If it cannot realistically be done that needs to be established sooner rather than later before PHBs with dollar signs in their eyes start using flimsy rationale like 'computers cannot be racist'.
🏳️🌈 Proud Ally 🏳️🌈
(Score: 2) by JoeMerchant on Wednesday March 13 2024, @09:40PM
>PHBs with dollar signs in their eyes start using flimsy rationale like 'computers cannot be racist'.
I believe that happened the day the first PHB had any kind of connection to computer output...
🌻🌻 [google.com]
(Score: 1) by khallow on Wednesday March 13 2024, @11:46PM (4 children)
Seems like a situation that wouldn't go any better with humans than with computers. The big difference is that you can't legally interrogate a person like you can an AI oracle. But if you're restricted in what you can ask the computer, you probably can get similar protection.
(Score: 1) by khallow on Thursday March 14 2024, @05:11AM
Did that happen here? Didn't look like there was missing data. Rather, there is alleged to be racial/ethnic bias in the sources used. But they haven't actually established that there was such bias. It was merely assumed. Reading around, I think there's a strong case to be made that the bias is against the language itself, based not on ethnicity but on its deviations from official English standards. For example [reddit.com]:
There's a lot of proper English grammar nazis out there. They won't be happy with either the simplifications (such as collapsing "because" to "cuz" or using "be" for multiple tenses of "is") or the complications (such as "feel" to "be feelin"). My take also is that language complexity and cleverness is social signaling for intelligence and education - like carrying a $1000 purse is social signaling for wealth. Ebonics seems to follow a similar signaling route, but with different emphases. Such signals get crossed.
(Score: 0) by Anonymous Coward on Wednesday March 13 2024, @07:08PM (1 child)
It's what Twatter crave.
I can't block this crap fast enough.
(Score: 3, Insightful) by DannyB on Wednesday March 13 2024, @07:29PM
AI is to make robots more like humans.
VR is to make humans more like robots.
So they cancel each other.
Poverty exists not because we cannot feed the poor, but because we cannot satisfy the rich.
(Score: 1, Touché) by Anonymous Coward on Wednesday March 13 2024, @07:40PM
"copyright might be radically changed to accommodate AI products"
and Disney might become a charity