
SoylentNews is people

posted by janrinok on Sunday October 02 2016, @07:46AM
from the we-don't-know-what-we-know dept.

In the age of Big Data, automated systems can track societal events on a global scale. These systems code and collect vast stores of real-time "event data"—happenings gleaned from news articles covering everything from political protests to ecological shifts around the world.

In new research published Thursday in the journal Science, Northeastern network scientist David Lazer and his colleagues analyzed the effectiveness of four global-scale databases and found they are falling short when tested for reliability and validity.

[...] The fully-automated systems studied were the International Crisis Early Warning System, or ICEWS, maintained by Lockheed Martin, and Global Data on Events Language and Tone, or GDELT, developed and run out of Georgetown University. The others were the hand-coded Gold Standard Report, or GSR, generated by the nonprofit MITRE Corp., and the Social, Political, and Economic Event Database, or SPEED, at the University of Illinois, which uses both human and automated coding.

First the researchers tested the systems' reliability: Did they all detect the same protest events in Latin America? The answer was "not very well." ICEWS and GDELT, they found, rarely reported the same protests, and ICEWS and SPEED agreed on just 10.3 percent of them.
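One way to make the reliability question concrete is to treat each database as a set of event keys and measure how much two sets overlap. The sketch below is illustrative only: the (country, date) key, the sample events, and the choice of Jaccard overlap as the agreement measure are all assumptions, not the method used in the Science paper.

```python
def agreement(events_a, events_b):
    """Share of distinct events reported by either source that both report.

    This is the Jaccard overlap of the two event sets (an assumed measure).
    """
    a, b = set(events_a), set(events_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical event keys: (country code, ISO date)
source_a = [("BR", "2013-06-17"), ("AR", "2013-06-18"), ("CL", "2013-06-20")]
source_b = [("BR", "2013-06-17"), ("MX", "2013-06-19")]

print(f"agreement: {agreement(source_a, source_b):.1%}")
```

Real event matching is harder than exact key equality, since two systems may record the same protest with slightly different dates or locations, so a figure like this would depend heavily on the matching rule.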

Next they assessed the systems' validity: Did the protest events reported actually occur? Here they found that only 21 percent of GDELT's reported events referred to real protests. ICEWS' track record was better, but the system reported the same event more than once, jacking up the protest count.
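The validity check and the double-counting problem can be sketched the same way. In this hypothetical example, `verified` stands in for a hand-checked set of events that really happened; the record format and the sample data are assumptions made for illustration, not taken from the study.

```python
from collections import Counter


def validity(records, verified):
    """Fraction of reported records whose event key is in the verified set."""
    if not records:
        return 0.0
    return sum(1 for r in records if r in verified) / len(records)


def duplicate_count(records):
    """Number of records beyond the first report of each event key."""
    counts = Counter(records)
    return sum(n - 1 for n in counts.values())


# Hypothetical reported records: (country code, ISO date); one event appears twice
reported = [
    ("BR", "2013-06-17"),
    ("BR", "2013-06-17"),
    ("AR", "2013-06-18"),
    ("PE", "2013-06-21"),
]
# Hypothetical hand-verified events
verified = {("BR", "2013-06-17"), ("AR", "2013-06-18")}

print("validity:", validity(reported, verified))
print("raw count:", len(reported), "distinct:", len(set(reported)))
print("duplicates:", duplicate_count(reported))
```

The gap between the raw count and the distinct count is the kind of inflation described for ICEWS: a system can point at real events and still overstate how many there were.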

Vast reams of stock market data are analyzed every millisecond of every day, yet still nobody can predict which way the market will go...



 
  • (Score: 2) by gidds (589) on Monday October 03 2016, @05:02PM (#409512)

    The case of stock markets is a little different, though, as those have massive levels of feedback and second-guessing.  All the participants are trying to guess what all the other participants are going to do.

    (That's why the only ways to get ahead are (A) know something other participants don't know — which the markets really try to avoid, and is often illegal — and (B) be lucky.)

    Yes, political and social events are also influenced by others, but not to anywhere near the same level, or at the same speed.

    As to the real story, I wonder if this is just a case of GIGO (garbage in, garbage out).  It looks like all those databases are based on news reports. As one of the quotes in the article says, "If something doesn't get reported in a newspaper or a similar outlet, it will not appear in any of these databases, no matter how important it really is."  And similarly, if the reporting isn't complete and accurate, then neither humans nor automated systems will be able to classify the events accurately.

    And that's the thing about Big Data.  Sometimes the data quality matters more than the data size.
