posted by mrpg on Monday May 05, @10:18PM
from the use-lha dept.

Arthur T Knackerbracket has processed the following story:

Diallo says he made a 1MB file that decompresses into 1GB to disable bots trying to break into his system. He also has a 10MB-to-10GB compressed file for bots with more resources, ensuring that their memory is overwhelmed by this massive archive.
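
As a rough illustration (not Diallo's exact recipe), a file like that can be built with any gzip library by compressing a long run of identical bytes; the sizes and file name in this sketch are assumptions:

    import gzip

    OUT_PATH = "bomb_1g.gz"      # illustrative output name
    TOTAL = 1024 ** 3            # 1 GB of uncompressed data
    CHUNK = 1024 * 1024          # write in 1 MB chunks so memory stays flat

    zeros = b"\x00" * CHUNK
    with gzip.open(OUT_PATH, "wb", compresslevel=9) as f:
        for _ in range(TOTAL // CHUNK):
            f.write(zeros)
    # A gigabyte of zeros deflates to roughly a megabyte on disk, which is
    # the asymmetry the defence relies on.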

Here is how Diallo's defensive system works: when he detects an offending bot, his server returns a 200 OK response and then serves up the zip bomb. The response's metadata tells the bot that it has received a compressed file, so the bot opens it in an attempt to scrape as much information as possible. Since the file is at least 1GB when unpacked, it overwhelms the memory of most simple bots, and even some advanced ones. If Diallo faces a more capable scraper with a few gigabytes of memory, he feeds it the 10GB zip bomb, which will most likely crash it.
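
A minimal sketch of the serving side under the same assumptions: the is_bad_bot() check and the file path are placeholders for illustration, not Diallo's actual detection logic, which is described on his blog.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    BOMB_PATH = "bomb_1g.gz"  # file from the sketch above (assumption)

    def is_bad_bot(headers) -> bool:
        # Placeholder heuristic; a real deployment would use IP reputation,
        # request patterns, robots.txt violations, and so on.
        ua = headers.get("User-Agent", "").lower()
        return "scrapy" in ua or "python-requests" in ua

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if is_bad_bot(self.headers):
                with open(BOMB_PATH, "rb") as f:
                    payload = f.read()
                # 200 OK plus Content-Encoding: gzip invites the client to
                # inflate the payload as it reads the response body.
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Encoding", "gzip")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)
            else:
                body = b"<html><body>Hello</body></html>"
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()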

If you want to try this system for yourself, Diallo outlines how to create your own bot-targeting zip bomb on his blog. He notes that you should be careful when doing so, as you can potentially self-detonate (i.e., accidentally open the zip bomb) and crash your own server. Zip bombs are also not 100% effective, since there are ways to detect and disregard them. But for most simple bots, this should be more than enough to freeze the machine running the scraper and take it out, at least until the system is restarted.


Original Submission

  • (Score: 5, Funny) by psa on Monday May 05, @11:46PM (3 children)

    by psa (220) on Monday May 05, @11:46PM (#1402857) Homepage

    It's very kind of people like this to provide sanity-checking tests to make tomorrow's bots more robust and intelligent about how they scrape content. This reminds me of a previous article about people creating navigational traps. An excellent way, really, to ensure that bot makers produce proper unique-content checks, depth and breadth checks on the site graph, and intelligent detection of when links are actually worth following.

    This is, after all, exactly the service that bug finders, app store audits, and even heuristic antivirus provide.

    • (Score: 3, Touché) by Anonymous Coward on Monday May 05, @11:58PM

      by Anonymous Coward on Monday May 05, @11:58PM (#1402859)

      Just the usual cat and mouse, but it's going from Tom and Jerry to Itchy and Scratchy Land. Please turn off your flash!

    • (Score: 1, Interesting) by Anonymous Coward on Tuesday May 06, @02:29AM (1 child)

      by Anonymous Coward on Tuesday May 06, @02:29AM (#1402872)

      Indeed, and one wonders whether they have done this and tested this scenario. Or if they're even downloading large binary attachments.

      Has this been validation-tested? Does crawling slow after they hit a large zip? Does it slow more than the proportional download size? Has there been any testing at all, prior to large binary blobs being offered to the world?

      Has there been any attempt to create a maximally-large dictionary for maximal memory-consumption while decompressing, or is this solely about the written-to-disk size of the extracted object?

      What is this idiot actually accomplishing? Does any part of this seem more advanced than a middle schooler?

      • (Score: 2, Touché) by Anonymous Coward on Tuesday May 06, @12:36PM

        by Anonymous Coward on Tuesday May 06, @12:36PM (#1402899)

        > Indeed, and one wonders whether they have done this and tested this scenario. Or if they're even downloading large binary attachments.
        >
        > Has this been validation-tested? Does crawling slow after they hit a large zip? Does it slow more than the proportional download size? Has there been any testing at all, prior to large binary blobs being offered to the world?
        >
        > Has there been any attempt to create a maximally-large dictionary for maximal memory-consumption while decompressing, or is this solely about the written-to-disk size of the extracted object?
        >
        > What is this idiot actually accomplishing? Does any part of this seem more advanced than a middle schooler?

        Try reading the developer's linked blog post [idiallo.com], then get back to us.

  • (Score: 2) by VLM on Tuesday May 06, @01:17AM (2 children)

    by VLM (445) Subscriber Badge on Tuesday May 06, @01:17AM (#1402862)

    I wonder what various backup systems would think about this. I think the guy is safe but it might get interesting if they ever do a restore. Maybe.

    • (Score: 1, Informative) by Anonymous Coward on Tuesday May 06, @02:14AM (1 child)

      by Anonymous Coward on Tuesday May 06, @02:14AM (#1402871)

      Shouldn't be a problem for most backup/restore systems.

      Many malware scanning systems on the other hand might try to unpack it. Most shouldn't crash though, since zip bombs are ancient stuff.

      That said, it might be new to the generation of noobs taking over...

      • (Score: 4, Funny) by PiMuNu on Tuesday May 06, @11:40AM

        by PiMuNu (3823) on Tuesday May 06, @11:40AM (#1402895)

        Back in the noughties a work colleague found out about zip bombs and decided to see if it would take down our email server by sending himself a zip bomb on email. It was about the same time that they started scanning incoming email for viruses, often hidden in zip files. He didn't tell the IT people about his informal test however, with predictable results...

  • (Score: 5, Interesting) by Mojibake Tengu on Tuesday May 06, @05:32AM

    by Mojibake Tengu (8598) on Tuesday May 06, @05:32AM (#1402878) Journal

    That's a fine way to fight. Appreciated. Kolmogorov complexity of data is a dreadful weapon against scrapers and observers, if used properly (very big data, relatively low complexity). Unzip algo is even still polynomial and finite.

    Since leechers mostly use rented clouds, their resources (memory/CPU time) are very limited or costly. Making them burn money means pain for them.

    Also, I have already been informed about NP-sized xml/html produced by funny webassembly, inevitably crashing all browsers. Total recursion FTW!
    Not a construct of mine, but I predicted it coming long ago, and it's certainly within reach of everyone out there.

    Catastrophically bad ideas (like WASM) deserve catastrophically big punishment.

    --
    Rust programming language offends both my Intelligence and my Spirit.

  • (Score: 4, Interesting) by pkrasimirov on Tuesday May 06, @10:11AM (1 child)

    by pkrasimirov (3358) Subscriber Badge on Tuesday May 06, @10:11AM (#1402888)

    When spam emerged back in the day, a novel idea arose: decode the incoming data before the transfer is complete, and once malicious content is detected (spam, bot, attack) degrade the connection instead of just closing it: traffic shaping, packet loss, forced retries, repeated SSL handshakes, generally wasting the sender's resources. Of course it takes custom coding and, for some things, sudo privileges. Not sure how many bots can handle that.
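
    A crude, hypothetical sketch of the idea at the application layer: once a request is judged malicious, drip a useless response out slowly instead of closing the socket. Real traffic shaping or injected packet loss would happen lower in the stack, hence the sudo.

        import socket
        import time

        def tarpit(conn: socket.socket, duration: float = 300.0) -> None:
            """Keep a suspect client busy by trickling out a useless response."""
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
            deadline = time.time() + duration
            try:
                while time.time() < deadline:
                    conn.sendall(b".")   # one byte every few seconds
                    time.sleep(5)
            except OSError:
                pass                     # client gave up or timed out
            finally:
                conn.close()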

    • (Score: 1, Interesting) by Anonymous Coward on Tuesday May 06, @11:08AM

      by Anonymous Coward on Tuesday May 06, @11:08AM (#1402892)

      Reminds me of the old "Lad Vampire" [en-academic.com] page. It would continuously reload images from scam pages until they stopped responding. Back when kilobytes were expensive and upload bandwidth was metered.
      I had a slow but unlimited connection and I used to leave it running. Good times.
