The Association for Computing Machinery has a post by George Neville-Neil of FreeBSD fame comparing LLMs to drunken plagiarists [acm.org]:
Before trying to use these tools, you need to understand what they do, at least on the surface, since even their creators freely admit they do not understand how they work deep down in the bowels of all the statistics and text that have been scraped from the current Internet. The trick of an LLM is to use a little randomness and a lot of text to guess the next word in a sentence. Seems kind of trivial, really, and certainly not a measure of intelligence that anyone who understands the term might use. But it's a clever trick and does have some applications.
[...] While help with proper code syntax is a boon to productivity (consider IDEs that highlight syntactical errors before you find them via a compilation), it is a far cry from SEMANTIC knowledge of a piece of code. Note that it is semantic knowledge that allows you to create correct programs, where correctness means the code actually does what the developer originally intended. KV can show many examples of programs that are syntactically, but not semantically, correct. In fact, this is the root of nearly every security problem in deployed software. Semantics remains far beyond the abilities of the current AI fad, as is evidenced by the number of developers who are now turning down these technologies for their own work.
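The "trick" KV describes is, at its core, temperature-controlled sampling over next-word probabilities. Here is a minimal sketch of that idea (not any actual model's code; the scores and words are invented for illustration):

    import math
    import random

    def sample_next_word(scores: dict[str, float], temperature: float = 0.8) -> str:
        """Draw the next word from model scores using temperature-scaled
        softmax sampling -- the "little randomness" in KV's description.
        Higher temperature flattens the distribution; lower sharpens it."""
        # Softmax: exponentiate the temperature-scaled scores.
        weights = {w: math.exp(s / temperature) for w, s in scores.items()}
        total = sum(weights.values())
        # Walk the cumulative distribution to a randomly chosen point.
        r = random.uniform(0.0, total)
        cumulative = 0.0
        for word, weight in weights.items():
            cumulative += weight
            if cumulative >= r:
                return word
        return word  # guard against floating-point rounding at the tail

    # Hypothetical scores a model might assign after "The cat sat on the".
    print(sample_next_word({"mat": 2.5, "sofa": 1.1, "moon": -0.3}))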
He continues by pointing out that LLMs are not only built on plagiarism, but are also unable to provide useful annotations in the comments or otherwise address the semantics of the code they swipe.
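KV's syntax-versus-semantics distinction is easy to demonstrate with a toy example of our own (not one drawn from the article): the function below is valid Python that any syntax checker will happily accept, yet it does the opposite of what its author intended, which is exactly the class of error behind many of the security holes KV mentions.

    def is_privileged(user: str) -> bool:
        # Intended semantics: True only for "admin" or "root".
        # Actual semantics: the right operand of `or` is the non-empty
        # string "root", which is always truthy, so every caller is
        # treated as privileged.
        return bool(user == "admin" or "root")  # intended: user in ("admin", "root")

    print(is_privileged("admin"))    # True, as intended
    print(is_privileged("mallory"))  # also True -- the semantic bug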
Previously:
(2024) Make Illegally Trained LLMs Public Domain as Punishment [soylentnews.org]
(2024) The Open Secret Of Open Washing [soylentnews.org]
(2023) A Jargon-Free Explanation of How AI Large Language Models Work [soylentnews.org]
(2019) AI Training is *Very* Expensive [soylentnews.org]
… and many more.