SoylentNews is people

posted by Fnord666 on Tuesday December 06 2016, @09:07AM
from the is-a-love-of-statistics-a-normal-distribution? dept.

Over on the NPR blog 13.7 Cosmos and Culture, contributor Adam Frank has written a commentary on how he learned to love statistics.

What I loved about physics were its laws. They were timeless. They were eternal. Most of all, I believed they fully and exactly determined everything about the behavior of the cosmos.

Statistics, on the other hand, was about the imperfect world of imperfect equipment taking imperfect data. For me, that realm was just a crappy version of the pure domain of perfect laws I was interested in. Measurements, by their nature, would always be messy. A truck goes by and jiggles your equipment. The kid you paid to do the observations isn't really paying attention. The very need to account for those variations made me sad.

Now, however, I see things very differently. My change of heart can be expressed in just two words — Big Data. Over the last 10 years, I've been watching in awe as the information we have been inadvertently amassing has changed society for better and worse. There is so much power, promise and peril for everyone in this brave new world that I knew I had to get involved. That's where my new life in statistics began.


  • (Score: 2, Interesting) by shrewdsheep (5215) on Tuesday December 06 2016, @04:12PM (#437799)

    From the traditional viewpoint of biostatistics, so-called "big data" analysis is viewed with considerable concern. I share some of these concerns, although I see positive aspects of "big data" as well. The main criticism goes as follows: the underlying distribution of big data sets is often unknown, so models estimated from such data can fail to generalize. One famous example is Google's flu prediction model, which was based on the search terms people used; at some point the model started to fail because the underlying distribution changed. Another problem is correlation in the data. If data points are not independent, this needs to be taken into account; otherwise the models are simply invalid. The problem is that many "big data" analysts are simply unaware of these problems.
    On the other hand, recent successes in image classification built on deep learning methods demonstrate that the correlation issue might be circumvented by observing enough (if you observe everything relevant for the prediction, it no longer matters whether the data are independent or not). Still, the jury is out on deep learning: almost no theory exists as yet, and for the moment these are unidentifiable, heuristic methods (something a traditionalist would not dare to touch). At least one Google paper punched some big holes in Google's own image classifier.
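The point about non-independent data can be made concrete with a small simulation. This is a minimal sketch, not anything from the comment: it assumes an AR(1) process (each point is `phi` times the previous point plus fresh Gaussian noise) as a stand-in for correlated data, and the function names and settings (`phi=0.9`, `n=1000`, 300 replications) are hypothetical, illustrative choices. It compares the textbook standard error of the mean, s/sqrt(n), which assumes independent observations, with the actual spread of the sample mean across many simulated data sets.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """Generate n points of a stationary AR(1) process x_t = phi * x_{t-1} + e_t,
    with innovations e_t ~ N(0, 1); the marginal variance is 1 / (1 - phi**2)."""
    # Start from the stationary distribution so there is no burn-in transient.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def naive_se(xs):
    """Standard error of the mean assuming independent observations: s / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def simulate(n=1000, phi=0.9, reps=300, seed=1):
    """Return (average naive SE, actual SE of the mean across replications)."""
    rng = random.Random(seed)
    means, naive = [], []
    for _ in range(reps):
        xs = ar1_series(n, phi, rng)
        means.append(statistics.fmean(xs))
        naive.append(naive_se(xs))
    # The actual sampling spread of the mean, measured across independent data sets.
    actual_se = statistics.stdev(means)
    return statistics.fmean(naive), actual_se


if __name__ == "__main__":
    naive, actual = simulate()
    print(f"naive SE (assumes independence): {naive:.4f}")
    print(f"actual SE across replications:   {actual:.4f}")
    print(f"ratio: {actual / naive:.2f}")
```

For large n, standard AR(1) theory predicts that the naive figure is too small by a factor of roughly sqrt((1 + phi) / (1 - phi)), about 4.4 for phi = 0.9, so confidence intervals built on it would be far too narrow. Calling `simulate(phi=0.0)` gives independent data, and the two numbers should then agree closely.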

  • (Score: 0) by Anonymous Coward on Tuesday December 06 2016, @06:10PM (#437902)

    The problem is that many "big data" analysts are simply unaware of these problems.

    That's because all you need to call yourself a "Data Scientist" is to be able to claim credit for "some" math courses, and have the ability to run complex statistical tests in R without it throwing an error. You'll see there is very very little "science" required of "Data Scientists".