Every year, millions of images, videos and posts that allegedly contain terrorist or violent extremist content are removed from social media platforms like YouTube, Facebook, or Twitter. A key force behind these takedowns is the Global Internet Forum to Counter Terrorism (GIFCT), an industry-led initiative that seeks to "prevent terrorists and violent extremists from exploiting digital platforms."
[...] Hashes are digital "fingerprints" of content that companies use to identify and remove content from their platforms. They are essentially unique, and allow for easy identification of specific content. When an image is identified as "terrorist content," its hash is computed and entered into a database, allowing any future uploads of the same image to be easily identified.
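A minimal Python sketch of exact-match hashing may make the mechanism concrete. It assumes SHA-256 as the fingerprint function, which is a simplification: deployed systems commonly use perceptual hashes so that resized or re-encoded copies still match. The names `fingerprint`, `flag`, `is_flagged` and `flagged_hashes` are hypothetical, and the byte string stands in for real image data.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest used as the content's 'fingerprint'."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical database: it stores fingerprints only, never the content itself.
flagged_hashes = set()

def flag(data: bytes) -> None:
    """Record the fingerprint of content judged to be 'terrorist content'."""
    flagged_hashes.add(fingerprint(data))

def is_flagged(data: bytes) -> bool:
    """Check an upload against the database by fingerprint alone."""
    return fingerprint(data) in flagged_hashes

original = b"stand-in for image bytes"
flag(original)
print(is_flagged(original))            # True: an identical re-upload matches
print(is_flagged(original + b"\x00"))  # False: one extra byte changes the hash
```

The last line shows why exact cryptographic hashes are brittle for this job, and why perceptual hashing is typically preferred in practice.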
This is exactly what the GIFCT initiative aims to do: Share a massive database of alleged 'terrorist' content, contributed voluntarily by companies, amongst members of its coalition. The database collects 'hashes', or unique fingerprints, of alleged 'terrorist', or extremist and violent content, rather than the content itself. GIFCT members can then use the database to check in real time whether content that users want to upload matches material in the database. While that sounds like an efficient approach to the challenging task of correctly identifying and taking down terrorist content, it also means that one single database might be used to determine what is permissible speech, and what is taken down—across the entire Internet.
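The sharing model described above can be sketched as several platforms consulting one common hash store at upload time. This is a toy model under stated assumptions: `SharedHashDatabase`, `Platform`, `contribute`, `upload` and the platform names are all hypothetical, and SHA-256 again stands in for whatever hash function a real system uses. It illustrates the article's concern that a single contributed entry, correct or not, takes effect on every member at once.

```python
import hashlib
from dataclasses import dataclass, field

def fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest standing in for a content hash."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class SharedHashDatabase:
    """Hypothetical shared store: holds hashes only, never the content."""
    hashes: set = field(default_factory=set)

    def contribute(self, data: bytes) -> None:
        # A member voluntarily adds the hash of content it has classified.
        self.hashes.add(fingerprint(data))

    def matches(self, data: bytes) -> bool:
        return fingerprint(data) in self.hashes

@dataclass
class Platform:
    name: str
    db: SharedHashDatabase

    def upload(self, data: bytes) -> bool:
        """Return True if the upload is accepted, False if blocked by a hash match."""
        return not self.db.matches(data)

shared = SharedHashDatabase()
members = [Platform(n, shared) for n in ("PlatformA", "PlatformB", "PlatformC")]

# One member contributes a hash, e.g. of a misclassified documentation video.
video = b"stand-in for a documentation video"
shared.contribute(video)

# Every member now rejects the same file.
print({p.name: p.upload(video) for p in members})
# {'PlatformA': False, 'PlatformB': False, 'PlatformC': False}
```

Because the members share one store rather than each keeping its own list, there is no point at which an individual platform re-examines the classification before the block applies.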
Countless examples have proven that it is very difficult for human reviewers—and impossible for algorithms—to consistently get the nuances of activism, counter-speech, and extremist content itself right. The result is that many instances of legitimate speech are falsely categorized as terrorist content and removed from social media platforms. Due to the proliferation of the GIFCT database, any mistaken classification of a video, picture or post as 'terrorist' content echoes across social media platforms, undermining users' right to free expression on several platforms at once. And that, in turn, can have catastrophic effects on the Internet as a space for memory and documentation.
(Score: 2) by hendrikboom on Saturday August 29 2020, @02:09PM (4 children)
Aren't there UTF-8 encodings for EBCDIC symbols? And even for the symbols in all the variants of EBCDIC that have appeared over the ages?
(Score: 2) by HiThere on Saturday August 29 2020, @08:49PM (3 children)
If you encode EBCDIC in some other coding, it's no longer EBCDIC. EBCDIC specifies a bit pattern to symbol correspondence.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2) by hendrikboom on Sunday August 30 2020, @12:55AM (2 children)
Correct.
But EBCDIC also consists of a number of different character sets. Converting from some popular EBCDIC variants (such as the one for the TN print chains) to other character codes was difficult before Unicode and UTF-8.
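A minimal Python sketch of the kind of conversion being discussed, assuming Python's built-in `cp037` and `cp500` codecs, which implement two EBCDIC code pages. The same bytes decode to different characters depending on which variant you assume, and going through Unicode is what makes the final step to UTF-8 straightforward. The helper name `ebcdic_to_utf8` is hypothetical, and the TN print-chain character set is not covered here.

```python
def ebcdic_to_utf8(data: bytes, codepage: str) -> bytes:
    """Decode EBCDIC bytes with the given code page, then re-encode as UTF-8."""
    text = data.decode(codepage)   # EBCDIC bit patterns -> Unicode characters
    return text.encode("utf-8")    # Unicode characters -> UTF-8 bytes

raw = bytes([0x4A, 0x5A])  # two bytes as they might sit in an EBCDIC file

# The same bit patterns mean different symbols in different EBCDIC variants,
# so you have to know which variant produced the data before converting it.
for codepage in ("cp037", "cp500"):
    print(codepage, repr(raw.decode(codepage)), ebcdic_to_utf8(raw, codepage))

# Going the other way: ordinary letters get their EBCDIC bit patterns.
print("Hello".encode("cp037"))  # b'\xc8\x85\x93\x93\x96'
```

This also bears out the earlier point: once the text is re-encoded as UTF-8 it is no longer EBCDIC, and only the characters, not the original bit patterns, survive the trip.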
(Score: 2) by HiThere on Sunday August 30 2020, @03:41AM (1 child)
I always wanted to use the TN print train.
(Score: 2) by hendrikboom on Sunday August 30 2020, @10:41AM
I got to use the TN print chain for my PhD thesis back in 1974. I think mine may have been one of the first PhD theses at the university to have been produced by a document compiler.
Nowadays of course I'd use a laser printer with TeX or some other free software.
-- hendrik