Every year, millions of images, videos and posts that allegedly contain terrorist or violent extremist content are removed from social media platforms like YouTube, Facebook, or Twitter. A key force behind these takedowns is the Global Internet Forum to Counter Terrorism (GIFCT), an industry-led initiative that seeks to "prevent terrorists and violent extremists from exploiting digital platforms."
[...] Hashes are digital "fingerprints" that companies use to identify and remove content from their platforms. Because a hash is essentially unique to a given file, it allows specific content to be identified easily. When an image is identified as "terrorist content," its hash is computed and entered into a database, allowing any future upload of the same image to be flagged automatically.
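The matching scheme described above can be sketched in a few lines of Python. This is a hypothetical toy using an exact cryptographic hash (SHA-256); the names `fingerprint`, `flagged_hashes`, and `is_flagged` are illustrative and not part of any real platform's system:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest acting as a content 'fingerprint'."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical shared database: it stores only hashes, never the content itself.
flagged_hashes = {fingerprint(b"flagged-image-bytes")}

def is_flagged(upload: bytes) -> bool:
    """Check an upload against the hash database at upload time."""
    return fingerprint(upload) in flagged_hashes
```

Note that an exact hash only matches byte-identical files; as the comments below discuss, real content-matching systems rely on fuzzier techniques.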
This is exactly what the GIFCT initiative aims to do: share a massive database of hashes of alleged 'terrorist' content, contributed voluntarily by companies, among the members of its coalition. The database stores only the 'hashes', or unique fingerprints, of allegedly terrorist, extremist, or violent content, not the content itself. GIFCT members can then use the database to check in real time whether content that users want to upload matches material in the database. While that sounds like an efficient approach to the challenging task of correctly identifying and taking down terrorist content, it also means that a single database may determine what is permissible speech and what is taken down, across the entire Internet.
Countless examples have shown that it is very difficult for human reviewers, and impossible for algorithms, to consistently get the nuances of activism, counter-speech, and extremist content itself right. The result is that many instances of legitimate speech are falsely categorized as terrorist content and removed from social media platforms. Because the GIFCT database is shared so widely, any mistaken classification of a video, picture, or post as 'terrorist' content echoes across social media platforms, undermining users' right to free expression on several platforms at once. That, in turn, can have catastrophic effects on the Internet as a space for memory and documentation.
(Score: 1, Insightful) by Anonymous Coward on Saturday August 29 2020, @02:28AM (4 children)
"until... the hashers get wind of these kinds of tweaks being made and make their hashes less sensitive to insignificant changes."
That's not how hashes work. Change a single byte and the hash is completely different. What you describe would need image comparison, not hash comparison.
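The avalanche effect this comment describes is easy to see directly. A minimal Python sketch, changing one byte of the input and counting how many hex digits of a SHA-256 digest differ:

```python
import hashlib

# Two inputs differing by a single appended byte.
h1 = hashlib.sha256(b"same image bytes").hexdigest()
h2 = hashlib.sha256(b"same image bytes.").hexdigest()

# Count differing hex digits; with a cryptographic hash,
# nearly every position changes (the "avalanche effect").
diff = sum(x != y for x, y in zip(h1, h2))
print(h1)
print(h2)
print(diff)
```

On average roughly 15/16 of the 64 hex digits differ, so exact-hash matching indeed fails on even trivially altered images.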
(Score: 0) by Anonymous Coward on Saturday August 29 2020, @06:50PM (3 children)
That is how cryptographic hashes work. There are other kinds of hash families around, and fuzzy matching is a big one.
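A toy illustration of a fuzzy (perceptual) hash, assuming a tiny grayscale "image" represented as a flat list of pixel values. This is an average-hash sketch for illustration only; real systems such as pHash or PhotoDNA resize and transform the image first and are far more robust:

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is above the mean.
    Small brightness tweaks rarely flip bits, so near-duplicates hash close together."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

original  = [10, 200, 30, 220, 15, 210, 25, 230]
tweaked   = [12, 198, 33, 219, 15, 212, 24, 231]  # small per-pixel tweaks
different = [200, 10, 220, 30, 210, 15, 230, 25]  # inverted pattern

h_orig, h_tweak = average_hash(original), average_hash(tweaked)
# Near-duplicates stay within a small Hamming distance; a threshold decides a match.
print(hamming(h_orig, h_tweak))
print(hamming(h_orig, average_hash(different)))
```

Matching then means "Hamming distance below a threshold" rather than exact equality, which is what lets these hashes survive insignificant changes.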
(Score: 0) by Anonymous Coward on Sunday August 30 2020, @04:25AM (1 child)
Wouldn't extreme lossy compression work better for that than taking hashes?
(Score: 2) by hendrikboom on Sunday August 30 2020, @10:53AM
Hashing reduces the number of pairwise comparisons you have to do.
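The point about reducing comparisons can be sketched in Python: instead of comparing every pair of items (O(n²)), one pass groups items by hash key, and only items in the same bucket ever need comparing. A hypothetical toy using exact hashes for simplicity:

```python
import hashlib
from collections import defaultdict

items = [b"cat", b"dog", b"cat", b"bird", b"dog"]

# One pass: bucket item indices by their hash.
buckets = defaultdict(list)
for i, item in enumerate(items):
    buckets[hashlib.sha256(item).hexdigest()].append(i)

# Any bucket with more than one index holds duplicates.
duplicates = [idxs for idxs in buckets.values() if len(idxs) > 1]
print(duplicates)
```

With a fuzzy hash like min-Hash, the same bucketing idea (locality-sensitive hashing) groups *near*-duplicates, so the expensive image comparison runs only within small candidate buckets.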
(Score: 2) by hendrikboom on Sunday August 30 2020, @10:51AM
For example, there's Near Duplicate Image Detection: min-Hash and tf-idf Weighting [ox.ac.uk].
Found it after following the link in a Stack Overflow page linked from a Google search for "fuzzy image hashing".
No doubt there are other techniques.
-- hendrik