
SoylentNews is people

posted by chromas on Friday August 28 2020, @04:50PM
from the operation-google-2:-electric...google-fu dept.

One Database to Rule Them All: The Invisible Content Cartel that Undermines the Freedom of Expression Online:

Every year, millions of images, videos and posts that allegedly contain terrorist or violent extremist content are removed from social media platforms like YouTube, Facebook, or Twitter. A key force behind these takedowns is the Global Internet Forum to Counter Terrorism (GIFCT), an industry-led initiative that seeks to "prevent terrorists and violent extremists from exploiting digital platforms."

[...] Hashes are digital "fingerprints" of content that companies use to identify and remove content from their platforms. They are essentially unique, and allow for easy identification of specific content. When an image is identified as "terrorist content," it is tagged with a hash and entered into a database, allowing any future uploads of the same image to be easily identified.

This is exactly what the GIFCT initiative aims to do: Share a massive database of alleged 'terrorist' content, contributed voluntarily by companies, amongst members of its coalition. The database collects 'hashes', or unique fingerprints, of alleged 'terrorist', or extremist and violent content, rather than the content itself. GIFCT members can then use the database to check in real time whether content that users want to upload matches material in the database. While that sounds like an efficient approach to the challenging task of correctly identifying and taking down terrorist content, it also means that one single database might be used to determine what is permissible speech, and what is taken down—across the entire Internet.
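The lookup described above can be sketched roughly as follows. This is only an illustration: it uses SHA-256 exact-match hashing from Python's standard library as a stand-in, since the article does not say which hashing scheme GIFCT members actually use. The names `fingerprint`, `HashDatabase`, and `shared_db` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that serves as the content's fingerprint.

    Assumption: SHA-256 is a stand-in; the real scheme is not named in the article.
    """
    return hashlib.sha256(content).hexdigest()


class HashDatabase:
    """Hypothetical shared database that stores fingerprints, not the content itself."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def flag(self, content: bytes) -> str:
        """Record the fingerprint of content that a member has flagged."""
        digest = fingerprint(content)
        self._hashes.add(digest)
        return digest

    def matches(self, content: bytes) -> bool:
        """Check at upload time whether this content was flagged before."""
        return fingerprint(content) in self._hashes


shared_db = HashDatabase()

# One member flags a file; every member consulting the database now blocks it.
flagged_upload = b"bytes of a flagged image"
shared_db.flag(flagged_upload)

print(shared_db.matches(flagged_upload))            # the same bytes uploaded again
print(shared_db.matches(flagged_upload + b"\x00"))  # the same bytes plus one extra byte
```

In this sketch a single `flag` call decides the outcome for every later `matches` call, whichever platform makes it, which is the propagation effect the article goes on to criticize. An exact-match hash like this one also stops matching as soon as a single byte changes.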

Countless examples have proven that it is very difficult for human reviewers—and impossible for algorithms—to consistently get the nuances of activism, counter-speech, and extremist content itself right. The result is that many instances of legitimate speech are falsely categorized as terrorist content and removed from social media platforms. Due to the proliferation of the GIFCT database, any mistaken classification of a video, picture or post as 'terrorist' content echoes across social media platforms, undermining users' right to free expression on several platforms at once. And that, in turn, can have catastrophic effects on the Internet as a space for memory and documentation.


  • (Score: 2) by hendrikboom on Saturday August 29 2020, @02:09PM (4 children)

    Aren't there UTF-8 encodings for EBCDIC symbols? And even for the symbols in all the variants of EBCDIC that have appeared over the ages?

  • (Score: 2) by HiThere on Saturday August 29 2020, @08:49PM (3 children)

    If you encode EBCDIC in some other coding, it's no longer EBCDIC. EBCDIC specifies a correspondence between bit patterns and symbols.

    --
    Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 2) by hendrikboom on Sunday August 30 2020, @12:55AM (2 children)

      Correct.

      But EBCDIC also consists of a number of different character sets. Converting from some popular EBCDIC variants (such as the one for the TN print chains) to other character codes was difficult before Unicode and UTF-8.
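      A rough sketch of the point about variants, using Python's standard codecs. Assumptions: `cp037` and `cp500` are two EBCDIC code pages that ship with Python and serve here only as stand-ins for "different EBCDIC character sets"; neither is the TN print chain set, and the sample string is made up.

```python
# Two EBCDIC code pages available in Python's standard codecs.
# Assumption: they stand in for "EBCDIC variants"; neither is the TN set.
VARIANT_A = "cp037"
VARIANT_B = "cp500"

text = "A[1]"  # hypothetical sample; the brackets are where these variants disagree

# The same symbols map to different bit patterns in each variant.
bytes_a = text.encode(VARIANT_A)
bytes_b = text.encode(VARIANT_B)
same_bytes = bytes_a == bytes_b

# Decoding with the wrong variant's table silently yields different symbols.
misread = bytes_a.decode(VARIANT_B)

# Going through Unicode gives one common target: decode with the correct
# variant, then encode as UTF-8.
utf8_from_a = bytes_a.decode(VARIANT_A).encode("utf-8")
utf8_from_b = bytes_b.decode(VARIANT_B).encode("utf-8")

print("same EBCDIC bytes in both variants:", same_bytes)
print("wrong-table decode preserved the text:", misread == text)
print("UTF-8 results agree:", utf8_from_a == utf8_from_b)
```

      The conversion only works when the source variant is known, because the bytes themselves do not say which table they belong to.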

      • (Score: 2) by HiThere on Sunday August 30 2020, @03:41AM (1 child)

        I always wanted to use the TN print train.

        • (Score: 2) by hendrikboom on Sunday August 30 2020, @10:41AM

          I got to use the TN print chain for my PhD thesis back in 1974. I think mine may have been one of the first PhD theses at the university to have been produced by a document compiler.

          Nowadays of course I'd use a laser printer with TeX or some other free software.

          -- hendrik