
SoylentNews is people

posted by Fnord666 on Tuesday December 06 2016, @09:07AM   Printer-friendly
from the is-a-love-of-statistics-a-normal-distribution? dept.

Over on the NPR blog 13.7: Cosmos and Culture, contributor Adam Frank has written a commentary on how he learned to love statistics.

What I loved about physics were its laws. They were timeless. They were eternal. Most of all, I believed they fully and exactly determined everything about the behavior of the cosmos.

Statistics, on the other hand, was about the imperfect world of imperfect equipment taking imperfect data. For me, that realm was just a crappy version of the pure domain of perfect laws I was interested in. Measurements, by their nature, would always be messy. A truck goes by and jiggles your equipment. The kid you paid to do the observations isn't really paying attention. The very need to account for those variations made me sad.

Now, however, I see things very differently. My change of heart can be expressed in just two words — Big Data. Over the last 10 years, I've been watching in awe as the information we have been inadvertently amassing has changed society for better and worse. There is so much power, promise and peril for everyone in this brave new world that I knew I had to get involved. That's where my new life in statistics began.


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by maxwell demon on Tuesday December 06 2016, @09:22AM

    by maxwell demon (1608) on Tuesday December 06 2016, @09:22AM (#437596) Journal

    What I loved about physics were its laws. They were timeless. They were eternal.

    Like the second law of thermodynamics. Which is explained in statistical mechanics using statistics.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by snufu on Tuesday December 06 2016, @06:45PM

      by snufu (5855) on Tuesday December 06 2016, @06:45PM (#437933)

      The only 'pure' law is uncertainty. The only constant is change.

  • (Score: 3, Funny) by Dunbal on Tuesday December 06 2016, @09:55AM

    by Dunbal (3515) on Tuesday December 06 2016, @09:55AM (#437599)

    I stand by an axiom which has been true since the dawn of computing. Garbage in, garbage out.

  • (Score: 3, Insightful) by Anonymous Coward on Tuesday December 06 2016, @10:05AM

    by Anonymous Coward on Tuesday December 06 2016, @10:05AM (#437600)

    Everyone knows that 90% of people hate statistics and the other half know that they're totally made up.

    • (Score: 4, Insightful) by looorg on Tuesday December 06 2016, @11:57AM

      by looorg (578) on Tuesday December 06 2016, @11:57AM (#437625)

      I'm not sure if I should mod this as funny, insightful or informative. I'm sure it would average out to something significant eventually.

      • (Score: 1, Funny) by Anonymous Coward on Tuesday December 06 2016, @02:52PM

        by Anonymous Coward on Tuesday December 06 2016, @02:52PM (#437720)

        That's some deviant thinking right there.

      • (Score: 1, Interesting) by Anonymous Coward on Tuesday December 06 2016, @10:54PM

        by Anonymous Coward on Tuesday December 06 2016, @10:54PM (#438106)

        I'm not sure if I should mod this as funny, insightful or informative.

        What this needs is a Yogi Berra [aladat.com] moderation.

  • (Score: 3, Funny) by riT-k0MA on Tuesday December 06 2016, @10:29AM

    by riT-k0MA (88) on Tuesday December 06 2016, @10:29AM (#437606)

    "Don trust any statistics you haven forged yourself"

    • (Score: 2) by choose another one on Tuesday December 06 2016, @10:43AM

      by choose another one (515) Subscriber Badge on Tuesday December 06 2016, @10:43AM (#437608)

      Don Corleone? Which haven?

      [Don't trust any apostrophes you haven't typed yourself]

      • (Score: 1) by pTamok on Tuesday December 06 2016, @11:11AM

        by pTamok (3042) on Tuesday December 06 2016, @11:11AM (#437612)

        Were they stainless statistics they wouldn't rust no matter who had forged them.

        • (Score: 3, Funny) by gidds on Tuesday December 06 2016, @02:15PM

          by gidds (589) on Tuesday December 06 2016, @02:15PM (#437702)

          Were they stainless statistics

          No, that's the problem — they were created with Big Iron!

          --
          [sig redacted]
    • (Score: 2) by looorg on Tuesday December 06 2016, @11:59AM

      by looorg (578) on Tuesday December 06 2016, @11:59AM (#437626)

      Other people's statistics or data are the best and the worst at the same time. They're usually free to use, but they never really show exactly what you want them to ...

  • (Score: 1, Informative) by Anonymous Coward on Tuesday December 06 2016, @12:20PM

    by Anonymous Coward on Tuesday December 06 2016, @12:20PM (#437631)

    The link to the coursera class this is an advert for contains a description of hypothesis testing that it calls significance testing. Also I couldn't find the name Bayes, or really any hint of historical content at all. I would not recommend such a class.
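    A minimal Bayes'-rule calculation, the sort of content the commenter found missing from the course, can be sketched as follows; all the probabilities here are invented for illustration:

    ```python
    # Bayes' rule: P(disease | positive test) from a prior and test accuracy.
    # All numbers are hypothetical, chosen only to show the base-rate effect.
    prior = 0.01           # P(disease)
    sensitivity = 0.95     # P(positive | disease)
    false_positive = 0.05  # P(positive | no disease)

    # Total probability of a positive test, then the posterior.
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    posterior = sensitivity * prior / p_positive
    print(f"P(disease | positive) = {posterior:.3f}")  # ~0.161, despite a 95%-sensitive test
    ```

    Even with an accurate test, a rare condition yields a surprisingly low posterior, which is exactly the kind of historical, Bayesian perspective the commenter wanted from the course.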

  • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @01:07PM

    by Anonymous Coward on Tuesday December 06 2016, @01:07PM (#437645)

    The election was correctly predicted by Michael Moore instead of Nate Silver and all the other experts. Similar with Brexit, although I don't know if 538 had a dog in that race.

    So what, that's just an election? That's supposed to be an easy problem for forecasters. They have tons of data and many motivated people who have spent careers looking at it.

    • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @01:53PM

      by Anonymous Coward on Tuesday December 06 2016, @01:53PM (#437673)

      No. The error was the sampling method.
      When you sample 3,000 people nationwide, you are measuring the popular vote, and that was not wrong. But that is not how US elections work.
      When you sample at least 100 people in each Electoral College district (so 53,800 in total), then you are modeling the right information.

      Remember:
      1) There are liars, damned liars, and then statisticians.
      2) 83.5% of statistics are made up on the spot.
      3) A statistician can support ANY claim with ANY data set.
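      The national-versus-district sampling point can be sketched with invented numbers: a national sample can estimate the popular vote correctly while still missing the state-by-state outcome that decides the election. The "states" below are hypothetical, not real 2016 data.

      ```python
      # Hypothetical electorate: (name, electoral votes, share of all voters,
      # true support for candidate A). Numbers are illustrative only.
      states = [
          ("Bigstate",   20, 0.50, 0.56),  # large, leans A
          ("Midstate",   15, 0.30, 0.48),  # leans B
          ("Smallstate", 14, 0.20, 0.47),  # leans B
      ]

      # National popular vote for A: support weighted by voter share.
      popular_a = sum(share * support for _, _, share, support in states)

      # Electoral result: each state's votes go to whoever carries it.
      electoral_a = sum(ev for _, ev, _, support in states if support > 0.5)
      electoral_b = sum(ev for _, ev, _, support in states if support < 0.5)

      print(f"A's popular-vote share: {popular_a:.3f}")          # A wins the popular vote...
      print(f"Electoral votes: A={electoral_a}, B={electoral_b}")  # ...but loses the election
      ```

      A perfectly accurate national poll would report A ahead, yet B wins, which is the commenter's point about modeling the right quantity.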

      • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @01:55PM

        by Anonymous Coward on Tuesday December 06 2016, @01:55PM (#437677)

        Except that many of the forecasters were taking the electoral college into account when they were calling the race.

        • (Score: 1, Troll) by BK on Tuesday December 06 2016, @02:25PM

          by BK (4868) on Tuesday December 06 2016, @02:25PM (#437707)

          The problem was the sampling method and weighting strategy. Pollsters don't ask 1,000 people something and then publish the raw percentages. They massage the numbers... if they have too many or too few black transsexuals or hispanic women or skinheads or coal miners or unemployed asian computer programmers, they 'fix it'.

          They try to adjust for turnout, but there is no way to _know_ the correct weights for samples before the election shows you what actual turnout is going to be. In the end, the adjustments probably tell you more about the pollsters than anything else.
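          The weighting issue can be sketched with invented numbers: the same raw responses yield different estimates depending on which turnout weights the pollster assumes. The group names and figures below are hypothetical.

          ```python
          # Toy post-stratification: respondents from two demographic groups,
          # reweighted under different assumed turnout shares. All numbers invented.
          sample = {            # group -> (respondents, fraction supporting X)
              "urban": (700, 0.60),
              "rural": (300, 0.40),
          }

          # Raw (unweighted) estimate straight from the sample.
          raw = sum(n * p for n, p in sample.values()) / sum(n for n, _ in sample.values())

          def weighted(turnout):
              # Reweight each group's support by an assumed turnout share.
              return sum(turnout[g] * p for g, (_, p) in sample.items())

          print(f"raw sample estimate:   {raw:.3f}")
          print(f"assumed 50/50 turnout: {weighted({'urban': .5, 'rural': .5}):.3f}")
          print(f"assumed 40/60 turnout: {weighted({'urban': .4, 'rural': .6}):.3f}")
          ```

          The estimate moves with the assumed turnout weights, which is why the adjustments can say more about the pollster's assumptions than about the electorate.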

          --
          ...but you HAVE heard of me.
      • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @02:07PM

        by Anonymous Coward on Tuesday December 06 2016, @02:07PM (#437691)

        2) 83.5% of statistics are made up on the spot.

        Of course statistics are 54.68% more credible for each digit after the decimal point.

    • (Score: 1, Insightful) by Anonymous Coward on Tuesday December 06 2016, @05:07PM

      by Anonymous Coward on Tuesday December 06 2016, @05:07PM (#437843)

      > was correctly predicted by Michael Moore instead of Nate Silver and all the other experts.

      You weren't paying attention.
      Nate Silver consistently said there was a ~30% chance of Trump winning.
      I read Silver's site multiple times per day and he got a lot of shit by actual partisans for saying that.

      http://www.mediaite.com/online/nate-silver-warns-media-against-dangerous-assumption-trump-isnt-really-closing-in-on-hillary/ [mediaite.com]
      http://www.businessinsider.com/nate-silver-hillary-clinton-donald-trump-election-prediction-2016-11 [businessinsider.com]
      https://fivethirtyeight.com/features/trump-is-just-a-normal-polling-error-behind-clinton/ [fivethirtyeight.com]

  • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @01:53PM

    by Anonymous Coward on Tuesday December 06 2016, @01:53PM (#437674)

    The computer diagnosed a woman with a rare disorder that no one else guessed. That nailed it for me.

    • (Score: 2) by Osamabobama on Tuesday December 06 2016, @06:38PM

      by Osamabobama (5842) on Tuesday December 06 2016, @06:38PM (#437928)

      I must have missed that story the first time around; is it also about the US presidential election?

      --
      Appended to the end of comments you post. Max: 120 chars.
  • (Score: 2, Interesting) by shrewdsheep on Tuesday December 06 2016, @04:12PM

    by shrewdsheep (5215) on Tuesday December 06 2016, @04:12PM (#437799)

    Coming from the traditional viewpoint of biostatistics, so-called "big data" analysis is seen with quite some concern. I share some of these concerns, although I see some positive aspects of "big data" as well. The main criticism goes as follows: the underlying distribution of the data is often unknown for big data sets. This can make models estimated from such data fail to generalize. One famous example is the Google Flu Trends prediction model, which was based on search terms; at some point the model started to fail as the underlying distribution changed. Another problem is that of correlation in the data. If data points are not independent, this needs to be taken into account; otherwise the models are simply invalid. The problem is that many "big data" analysts are simply unaware of these issues.
    On the other hand, recent successes in image classification centered around deep learning methods demonstrate that the correlation issue might be circumvented by observing enough (if you observe everything relevant for the prediction, it no longer matters whether the data are independent). Still, the last word on deep learning is not in yet, as almost no theory exists and the methods are unidentifiable and heuristic at the moment (something the traditionalist would not dare to touch). At least one Google paper punched some bigger holes in their own image classifier.
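    The distribution-shift failure mode mentioned above (in the spirit of Google Flu Trends) can be sketched as follows; the data and the simple proportional model are invented for illustration:

    ```python
    # Toy distribution drift: a model fit when searches tracked illness keeps
    # being applied after something else (e.g. media coverage) inflates searches.

    def fit_slope(xs, ys):
        # Least-squares slope through the origin: flu_rate ~ slope * search_rate.
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

    # Training era: search volume is proportional to true flu incidence.
    train_searches = [1.0, 2.0, 3.0, 4.0]
    train_flu      = [0.5, 1.0, 1.5, 2.0]
    slope = fit_slope(train_searches, train_flu)   # 0.5

    # Later era: a news scare doubles searches with no change in actual flu.
    new_search, true_flu = 6.0, 1.5
    predicted = slope * new_search
    print(f"predicted flu rate: {predicted:.2f}, actual: {true_flu:.2f}")
    ```

    The fitted relationship was fine for the training distribution but over-predicts badly once the link between searches and illness drifts, which is the generalization failure described above.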

    • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @06:10PM

      by Anonymous Coward on Tuesday December 06 2016, @06:10PM (#437902)

      The problem is that many "big data" analysts are simply unaware of these problems.

      That's because all you need to call yourself a "Data Scientist" is to be able to claim credit for "some" math courses, and have the ability to run complex statistical tests in R without it throwing an error. You'll see there is very very little "science" required of "Data Scientists".