AI techniques in medical imaging may lead to incorrect diagnoses:
A team of researchers, led by the University of Cambridge and Simon Fraser University, designed a series of tests for medical image reconstruction algorithms based on AI and deep learning, and found that these techniques result in myriad artefacts, or unwanted alterations in the data, among other major errors in the final images. The effects were typically not present in non-AI based imaging techniques.
The phenomenon was widespread across different types of artificial neural networks, suggesting that the problem will not be easily remedied. The researchers caution that relying on AI-based image reconstruction techniques to make diagnoses and determine treatment could ultimately do harm to patients. Their results are reported in the Proceedings of the National Academy of Sciences.
"There's been a lot of enthusiasm about AI in medical imaging, and it may well have the potential to revolutionise modern medicine: however, there are potential pitfalls that must not be ignored," said Dr Anders Hansen from Cambridge's Department of Applied Mathematics and Theoretical Physics, who led the research with Dr Ben Adcock from Simon Fraser University. "We've found that AI techniques are highly unstable in medical imaging, so that small changes in the input may result in big changes in the output."
A typical MRI scan can take anywhere between 15 minutes and two hours, depending on the size of the area being scanned and the number of images being taken. The longer the patient spends inside the machine, the higher resolution the final image will be. However, limiting the amount of time patients spend inside the machine is desired, both to reduce the risk to individual patients and to increase the overall number of scans that can be performed.
Using AI techniques to improve the quality of images from MRI scans or other types of medical imaging is an attractive possibility for solving the problem of getting the highest quality image in the smallest amount of time: in theory, AI could take a low-resolution image and make it into a high-resolution version. AI algorithms 'learn' to reconstruct images based on training from previous data, and through this training procedure aim to optimise the quality of the reconstruction. This represents a radical change compared to classical reconstruction techniques that are solely based on mathematical theory without dependency on previous data. In particular, classical techniques do not learn.
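To make the learned-versus-classical distinction concrete, here is a deliberately tiny contrast on 1-D signals: a least-squares map fitted to previous data stands in for the trained network, against a fixed interpolation that involves no training at all (everything here is invented for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    def make_pair(n=32):
        # A smooth random "image" and its undersampled measurement.
        hi = np.convolve(rng.random(n), np.ones(4) / 4, mode="same")
        return hi[::2], hi

    X, Y = zip(*(make_pair() for _ in range(500)))
    X, Y = np.array(X), np.array(Y)

    # "Learned" reconstruction: a linear map fitted to previous data,
    # a stand-in for the deep networks the article describes.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Classical reconstruction: fixed interpolation, no training involved.
    def classical(lo, n=32):
        return np.interp(np.arange(n), np.arange(0, n, 2), lo)

    lo, hi = make_pair()
    print("learned   error:", np.linalg.norm(lo @ W - hi))
    print("classical error:", np.linalg.norm(classical(lo) - hi))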
[...] The researchers are now focusing on providing the fundamental limits to what can be done with AI techniques. Only when these limits are known will we be able to understand which problems can be solved. "Trial and error-based research would never discover that the alchemists could not make gold: we are in a similar situation with modern AI," said Hansen. "These techniques will never discover their own limitations. Such limitations can only be shown mathematically."
Journal Reference
Vegard Antun, Francesco Renna, Clarice Poon, et al., "On instabilities of deep learning in image reconstruction and the potential costs of AI" [$], Proceedings of the National Academy of Sciences (2020). DOI: 10.1073/pnas.1907377117
(Score: 2) by Rosco P. Coltrane on Wednesday May 13 2020, @02:24PM (3 children)
When my doctor is drunk, he tells me bullcrap. Honestly, I go back to see him because he's a laugh, I've known him for years, and I'm not a hypochondriac. But he really should lay off the sauce: I can't count the number of times he's told me I may have something serious when even I could tell I didn't. Would an AI do better than him when he's boozed up? Hell yeah...
Both real and artificial intelligence are fuzzy logic systems. By definition, they give false positives and false negatives. The two criteria that matter are:
- Which has the lowest percentage of false positives or negatives.
- Which has the lowest percentage of false positives or negatives that lead to devastating, life-changing consequences - i.e. whether the real or the artificial doctor more often tells healthy patients they have two months left to live. A toy comparison is sketched below.
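A toy version of that comparison, with every number invented:

    def rates(tp, fp, tn, fn):
        # False positive rate: healthy patients wrongly flagged.
        # False negative rate: sick patients wrongly cleared.
        return fp / (fp + tn), fn / (fn + tp)

    for name, counts in (("human doctor", dict(tp=85, fp=40, tn=860, fn=15)),
                         ("AI reader",    dict(tp=90, fp=55, tn=845, fn=10))):
        fpr, fnr = rates(**counts)
        print(f"{name}: false positive rate {fpr:.1%}, false negative rate {fnr:.1%}")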
(Score: 1, Touché) by Anonymous Coward on Wednesday May 13 2020, @02:47PM
Thanks for that fascinating anecdote with the all-important HELL YEAH proof I needed. I'll be happy to use the AI on your kids next time they get sick.
(Score: 1, Touché) by Anonymous Coward on Wednesday May 13 2020, @02:58PM (1 child)
Artificial stupidity is just as incurable as the natural kind.
(Score: 3, Touché) by c0lo on Wednesday May 13 2020, @03:08PM
False. Counter-example: Ethanol-Fuelled.
Correction
https://www.youtube.com/watch?v=aoFiw2jMy-0
(Score: 2) by looorg on Wednesday May 13 2020, @02:52PM (1 child)
Clearly they need some professional human input -- perhaps a "cancer-captcha" or whatnot -- please click all the images with cancer and then you get some kind of reward.
(Score: 1, Insightful) by Anonymous Coward on Wednesday May 13 2020, @03:00PM
The funny thing is you think you're joking.
(Score: 3, Interesting) by Anonymous Coward on Wednesday May 13 2020, @03:08PM (11 children)
Literally, automated lying.
These things are fine and dandy in artsy contexts, but slipping them into anything related to natural sciences is crazy at best, malicious at worst.
(Score: 2) by JoeMerchant on Wednesday May 13 2020, @04:23PM (5 children)
These things have been outperforming humans since the 1990s. Literally, in 1998, I stood in Rockville, MD at an FDA show-and-tell day with my crazy medical device, between a knee-repair drill-and-epoxy system on one side and an automated cancer screener on the other. The automated system had better false positive and false negative scores than the humans who were routinely performing the screenings at the time - and, better, it was more consistent: humans have good days and bad days, so if you tested a human on a good day, their actual performance on the job was measurably, and often significantly, worse.
As with all things, before you do something like getting a lung removed, you want a second, third - maybe fourth - opinion on the reading, additional readings, multiple modalities of confirmation, etc. You don't just run out and make major medical decisions off of what one glance at a diagnostic image tells you.
If you want to slip on the tinfoil hat: these systems can be tweaked to err on the side of minimal false negatives, at the expense of additional false positives. That's an expense mostly borne by the patients, since the other side of the relationship gets reimbursed for every additional test and procedure performed. Still - the automated systems are more consistent than human readers, so if such undesirable biases are present, they can simply be tuned out of the system (by those in control) instead of having to retrain a bunch of arrogant, overpaid sons of bitches (aka radiologists). This is why it's important to put "good" people in control, the BEST people, and keep the Cheetos where they belong: in an entertainment snack bowl.
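The tweaking described above usually amounts to moving a decision threshold on the system's output score; a minimal sketch with invented scores shows the trade:

    import numpy as np

    rng = np.random.default_rng(2)
    sick = rng.random(1000) < 0.1                   # 10% of cases are positive
    scores = np.where(sick, rng.normal(0.7, 0.15, 1000),
                            rng.normal(0.4, 0.15, 1000))

    # Lowering the threshold trades false negatives for false positives.
    for threshold in (0.6, 0.5, 0.4):
        flagged = scores >= threshold
        missed = np.sum(sick & ~flagged)            # false negatives
        alarms = np.sum(~sick & flagged)            # false positives
        print(f"threshold {threshold}: {missed} missed cases, {alarms} false alarms")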
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/
(Score: 2) by captain normal on Wednesday May 13 2020, @05:08PM (4 children)
" ...small changes in the input may result in big changes in the output..."
Garbage in, garbage out. If you use AI programs to flag a potential problem, that is one thing. But in my opinion a trained radiologist should always check the output.
"It is easier to fool someone than it is to convince them that they have been fooled" Mark Twain
(Score: 2) by JoeMerchant on Wednesday May 13 2020, @05:22PM
The problem is, we're doing too much screening for the trained radiologists to handle. If they'd open up training for more radiologists, I'd agree - but the present system asks too much of them, and even when radiologists have all the time in the world to examine an image, they're going to miss stuff that the AI will catch.
AI for screening - hope it doesn't produce too many false negatives. Human radiologist confirmation before costly and especially irreversible interventions. Using meat bags to screen all the routine stuff just doesn't make sense: the AI is too good and the humans are too flaky when you look at large volumes of work.
As for this article, speeding up MR scans using AI to "fill in the gaps" - that all depends... if you're in a 3T magnet and you could get away with the kind of image that your average strip-mall 0.3T permanent-magnet imager provides, sure... interpolate away, you'll still get a better image in way less than half the time. If, on the other hand, you really need the normal 3T resolution (and, truth be told, for most things you want even better than normal 3T resolution), then this interpolation is just a sharper-looking kind of blurry. It's important that anyone interpreting the image knows it's been "sharpened up by AI" - it's probably a great deal easier to interpret than the blocky, blurry thing you'd get without the AI sharpening, but it's nowhere near as diagnostically valuable as the full-resolution image it's trying to look like.
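The difference between "sharper looking" and "more information" is easy to demonstrate with plain interpolation; an AI differs only in making cleverer guesses about the missing samples (toy 1-D signal, numbers invented):

    import numpy as np

    rng = np.random.default_rng(3)
    full = rng.random(256)               # the "full-resolution" acquisition
    low = full[::4]                      # fast scan: a quarter of the samples

    # The interpolated result looks smooth and plausible, but the structure
    # between the acquired samples is invented, not measured.
    filled = np.interp(np.arange(256), np.arange(0, 256, 4), low)

    err = np.linalg.norm(filled - full) / np.linalg.norm(full)
    print(f"relative error vs. the true full-resolution signal: {err:.2f}")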
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/
(Score: 2) by darkfeline on Wednesday May 13 2020, @09:23PM (1 child)
> small changes in the input may result in big changes in the output
Uh, of course that's true. The tiniest speck can mean the difference between healthy and cancer. "If we add random specks, the AI gives drastically different results, hence the AI must be bad."
Join the SDF Public Access UNIX System today!
(Score: 2) by captain normal on Thursday May 14 2020, @04:55AM
No, not bad, just not quite ready for prime time.
"It is easier to fool someone than it is to convince them that they have been fooled" Mark Twain
(Score: 0) by Anonymous Coward on Wednesday May 13 2020, @09:53PM
Maybe they need to take the pictures when it is sunny, rather than cloudy or night. We all know how that worked for the Patriot Missile system.
(Score: 2) by DeathMonkey on Wednesday May 13 2020, @05:27PM (4 children)
Thanks for the reminder that SN is more of an anti-science forum than a science forum.
(Score: 1) by khallow on Wednesday May 13 2020, @06:43PM (2 children)
A single troll from an AC confirms DeathMonkey's biases - "Thanks for the reminder". It's interesting how big a part of the problem the most vocal pro-science posters are.
The first thing I wondered when reading this story is whose AI was causing the problem. Since the problem is claimed to be "widespread", it becomes a matter of what general principle is causing it. The AC claims to think it's an irreversible flaw of all AI. I suspect rather that it's a characteristic of scale invariance - that's widespread, and it creates the potential for noise (or, worse, deliberately introduced) artifacts that look like important features and throw off the AI program.
Any diagnostic tool/observer, whether human or AI, needs to be robust to moderate perturbations of images and the like. Perhaps existing or near-future neural nets can be trained on these more difficult cases to improve their robustness? If that can be done, then this is merely fixing a training oversight, with no serious long-term consequence for the use of these tools.
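A minimal sketch of that retraining idea: augment the training set with perturbed copies of each input, keeping the target fixed, so that small input changes stop mattering (a toy linear "model" on invented data; real systems would do this with a neural network):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.random((200, 16))            # training inputs
    y = X @ rng.random(16)               # training targets

    def fit(X, y):
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    # Robustness-oriented augmentation: each input also appears with a
    # small perturbation and the same target.
    X_aug = np.vstack([X, X + 0.05 * rng.standard_normal(X.shape)])
    w_robust = fit(X_aug, np.concatenate([y, y]))

    x = X[0]
    print("clean input:    ", x @ w_robust)
    print("perturbed input:", (x + 0.05 * rng.standard_normal(16)) @ w_robust)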
(Score: 2) by DeathMonkey on Wednesday May 13 2020, @07:28PM (1 child)
No, a pattern I have been observing since this site was created has led me to that conclusion.
(Score: 1) by khallow on Wednesday May 13 2020, @07:36PM
(Score: 0) by Anonymous Coward on Wednesday May 13 2020, @07:32PM
Your failure at reading comprehension is noted as well.
In the interest of science, which part(s) of "AI could take a low-resolution image and make it into a high-resolution version. AI algorithms 'learn' to reconstruct images based on training from previous data" are too hard for you to understand?
(Score: 0) by Anonymous Coward on Wednesday May 13 2020, @03:27PM (6 children)
This is like looking at patterns in the clouds. Sure, maybe you'll see boobs but that doesn't make God a pervert.
Seriously. AI is a pattern-matching system. The fewer details you have, the more garbage comes out. AI doesn't give your camera infinite resolution!
https://www.youtube.com/watch?v=I_8ZH1Ggjk0
(Score: 0) by Anonymous Coward on Wednesday May 13 2020, @03:45PM
But imagine if it could. The future now. That's why we at $company believe in the power of YOU to take us on a journey. Together.
(Score: 2) by All Your Lawn Are Belong To Us on Wednesday May 13 2020, @04:22PM (3 children)
Yeah, but this isn't just about image interpretation, where pattern matching is important. This is about image construction: trying to take shortcuts in what should be a deterministic process, namely gathering data and sequencing it to become information in the form of images. The end goal is not just making patients more comfortable by reducing bore time (a laudable goal) but also performing more scans per day and increasing profitability (which has its own upside for patients, but is more questionable). This really isn't a place for a machine to make judgement calls to interpolate data, when that interpolation actually ends up yielding false data.
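For reference, the deterministic part is essentially an inverse Fourier transform of the measured k-space data; the trouble begins when lines are skipped to save scan time and something has to fill them in. A toy illustration:

    import numpy as np

    image = np.zeros((64, 64))
    image[20:44, 20:44] = 1.0            # toy "anatomy"
    kspace = np.fft.fft2(image)          # what the scanner measures

    # Fully sampled: reconstruction is deterministic and essentially exact.
    exact = np.abs(np.fft.ifft2(kspace))

    # Undersampled (every other line skipped to halve scan time): the gaps
    # must be filled somehow - zeros here, learned guesses in the AI schemes.
    fast = kspace.copy()
    fast[1::2, :] = 0
    approx = np.abs(np.fft.ifft2(fast))

    print("fully sampled error:", np.linalg.norm(exact - image))
    print("undersampled error: ", np.linalg.norm(approx - image))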
This sig for rent.
(Score: 4, Interesting) by JoeMerchant on Wednesday May 13 2020, @04:38PM (2 children)
What matters is the actual results.
Run the proposed system in parallel with the accepted systems. Measure changes in outcomes, decide if the benefits outweigh the costs.
If additional deaths show up in the costs column, the system should never get accepted. If people end up having to come back for a second scan 5% of the time, but throughput is increased 50%, that's probably a win overall.
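Using those same hypothetical numbers, the arithmetic looks like this:

    baseline = 100                   # useful scans per day, normal protocol

    capacity = baseline * 1.5        # fast protocol: 50% more slots per day
    useful = capacity / 1.05         # 5% of patients consume a second slot

    print(f"normal protocol: {baseline} useful scans/day")
    print(f"fast protocol:   {useful:.0f} useful scans/day")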
I worked with a 3D scanning system that got direct data off of two intersecting planes, and these were of course the two planes of primary concern, but being a 3D system, there was additional volume off those planes that was still of some interest. We could interpolate cylinders around the line of intersection between the planes and get what would be a reasonable projection of what should be happening there, but not anything that was measured directly.

There was a lot of debate about how to present this information, because on the one hand the interpolation made it much easier to visualize when treatment was sufficient and could be stopped before unnecessary collateral damage occurred. On the other hand, interpolation is "made up" information, not measured by the system directly but inferred from the measurements being made on the planes.

Ultimately, the whole thing was put on the shelf and the "old" system continues to be used now, 7 years after the advanced system was developed (different owners, different risk appetites), but in the interim we decided that the interpolated information should be shown but clearly labeled as such, to avoid any misconception on the part of the surgeons that they were looking at actual measurements.
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/
(Score: 2) by All Your Lawn Are Belong To Us on Wednesday May 13 2020, @06:17PM (1 child)
Not quite sure what you mean by "in parallel." This would be hard to do Level 1 randomized controlled trials on, unless you're prepared to say, "Yeah, sorry your father died of that tumor. If we'd only seen it in the initial scans we could have excised it, but that AI extrapolator just wasn't working correctly and your father was in the test group instead of the control group." One can't, AFAICT, take the images, have the raw images interpreted, then have the raw data reconstructed and interpreted again as a reconstruction, and compare the interpretations, because the essence is that a different (faster) sampling rate is used in the AI-generated reconstruction run. One could theoretically do two runs, one after the other at the different rates, but a big confounding variable is patient movement (even minor inadvertent movement) during the scans, resulting in different data. Plus you run into an ethical question about having the patient be in the test significantly longer than normal. (At least there isn't an ionizing radiation concern, as there would be with CT or fluoro.)
So we end up with the methodology these researchers used: take the raw data, introduce extremely small anomalies into it, and then see whether the reconstruction algorithms distort that known data significantly, in a way an interpreter might not be able to recognize.
I'm not saying the study was correct, just that this is what they seem to have done and found. Maybe this does open the door to more active testing of the current algorithms. (And that's the other aspect... this certainly reads like these are currently-in-use algorithms being tested. So maybe there is a need to raise concerns now rather than wait.)
This sig for rent.
(Score: 2) by JoeMerchant on Wednesday May 13 2020, @06:48PM
The sequential scans should not be a major concern to the patients - they've already shown up, been prepped, etc. If they happen to be claustrophobic, then they should ethically be excused from the double-scan process - there's no need to include claustrophobics in the study, even though they are among the major beneficiaries of the fast scan process - and differentially, the fast scan should be better on squirmy subjects.
If the sequential scans show an acceptable level of interpretation agreement, then there is no reason to believe that a fast scan alone would be any worse than a fast scan obtained sequentially with a traditional scan.
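"Acceptable level of interpretation agreement" can be quantified over the paired readings, e.g. with raw agreement plus Cohen's kappa to correct for chance (readings invented):

    import numpy as np

    # Paired reads of the same patients: 1 = finding reported, 0 = clear.
    traditional = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
    fast_scan   = np.array([1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

    agreement = np.mean(traditional == fast_scan)

    # Cohen's kappa discounts the agreement expected by chance alone.
    p_pos = traditional.mean() * fast_scan.mean()
    p_neg = (1 - traditional.mean()) * (1 - fast_scan.mean())
    chance = p_pos + p_neg
    kappa = (agreement - chance) / (1 - chance)

    print(f"raw agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")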
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/
(Score: 3, Insightful) by DeathMonkey on Wednesday May 13 2020, @05:29PM
Imagine that, observing things to learn about them!
(Score: 2) by HiThere on Wednesday May 13 2020, @04:15PM (1 child)
What they're trying to do is take a low-resolution image and turn it into a high-resolution image without errors. That cannot be done. That the AI makes different guesses than the doctor about how to "fill in the blanks" doesn't make the doctor right. He's more likely to be right because he puts it into a larger context - which is the answer to "how to improve the AI" - but he's inventing the missing details too, and can be wrong.
One can interpolate missing data based on experience, but that interpolation can always be wrong.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 3, Interesting) by JoeMerchant on Wednesday May 13 2020, @04:40PM
Of course, the probability distribution of who's right more often depends entirely on the AI and the doctor you are comparing.
The one thing I will say in favor of AI is that it's more consistent and repeatable, 7 days a week, 24 hours a day, and much less likely to rush a diagnosis that's requested when there's a hot date scheduled at the same time...
Ukraine is still not part of Russia. Glory to Ukraine 🌻 https://www.pravda.com.ua/eng/news/2023/06/24/7408365/