The BBC reports that the UK-based Internet Watch Foundation is sharing hash lists with Google, Facebook, and Twitter to prevent the upload of child abuse imagery:
Web giants Google, Facebook and Twitter have joined forces with a British charity in a bid to remove millions of indecent child images from the net. In a UK first, anti-abuse organisation Internet Watch Foundation (IWF) has begun sharing lists of indecent images, identified by unique "hash" codes. Wider use of the photo-tagging system could be a "game changer" in the fight against paedophiles, the charity said. Internet security experts said images on the "darknet" would not be detected.
The IWF, which works to take down indecent images of children, allocates to each picture it finds a "hash" - a unique code, sometimes referred to as a digital finger-print. By sharing "hash lists" of indecent pictures of children, Google, Facebook and Twitter will be able to stop those images from being uploaded to their sites.
(Score: 3, Interesting) by lentilla on Tuesday August 11 2015, @12:10PM
I've played with creating a unique identifier for images and found it to be surprisingly effective.
This article [linux-mag.com] describes the strategy well. Basically, one scales each image down to a four-by-four-pixel image, which produces 48 bytes of "hash" (three bytes per pixel, one each for red, green and blue). Candidate images are then compared against that hash, within a fuzz factor for each of those 48 values.
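A minimal sketch of that scaling step, assuming the image has already been decoded into a height-by-width grid of (R, G, B) tuples (a real implementation would use an image library such as Pillow to decode the file first; the function name and grid parameter here are my own):

```python
def tiny_hash(pixels, grid=4):
    """Scale `pixels` down to a grid x grid image by block averaging,
    returning grid*grid*3 byte values (48 values for the default 4x4)."""
    h = len(pixels)
    w = len(pixels[0])
    out = []
    for gy in range(grid):
        for gx in range(grid):
            # Which block of source pixels this grid cell covers.
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            n = (y1 - y0) * (x1 - x0)
            for c in range(3):  # average each of R, G, B separately
                total = sum(pixels[y][x][c]
                            for y in range(y0, y1)
                            for x in range(x0, x1))
                out.append(total // n)
    return out  # 48 small integers: one per channel per grid cell
```

Block averaging is what makes the result stable: every source pixel contributes to exactly one cell, so small per-pixel changes wash out in the mean.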
I was so surprised that such a tiny amount of data could so effectively encapsulate an entire image. It's a hash of course (a one-way mathematical function), so one can't reproduce the image from the hash... but one can certainly identify "identical" images with a very high degree of accuracy.
Putting a border around an image, transcoding into different formats, changing resolutions and modifying EXIF data all fail to escape this detection mechanism. So I don't believe you are correct when you say "this will not really prevent anything, except for technically inept people". The moment a new image is entered into that database, it becomes poisoned.
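The fuzz-factor comparison is what lets the scheme survive those modifications: two hashes match if every one of the 48 channel values is within some tolerance of its counterpart. An illustrative version (the threshold of 10 is an arbitrary choice for this sketch, not a value from the article):

```python
def hashes_match(h1, h2, fuzz=10):
    """Fuzzy equality for two 48-value hashes: every channel value
    must be within `fuzz` of its counterpart in the other hash."""
    return (len(h1) == len(h2)
            and all(abs(a - b) <= fuzz for a, b in zip(h1, h2)))
```

For example, uniformly brightening an image shifts every averaged value by roughly the same small amount, so the modified copy still matches, while a genuinely different image almost certainly drifts outside the tolerance on many values at once.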
As for hash collisions - yes - they will occur. I'd hope this simply raises a flag and a human goes and checks it out. (And how thankful I am that this is not my job.)