SoylentNews is people

posted by azrael on Wednesday July 30 2014, @05:29PM
from the install-this-census-trojan dept.

In 2013 the data from an Internet census was released anonymously, along with a report describing the methodology. The trouble with this data is that it was gathered using a botnet that abused default passwords.

The purpose of this paper is to shed light on these and related questions and put the contributions of this anonymous Internet census study into perspective. Indeed, our findings suggest that the released data set is real and not faked, but that the measurements suffer from a number of methodological flaws and also lack adequate meta-data information. As a result, we have not been able to verify several claims that the anonymous author(s) made in the published report. In the process, we use this study as an educational example for illustrating how to deal with a large data set of unknown quality, hint at pitfalls in Internet-scale measurement studies, and discuss ethical considerations concerning third-party use of this released data set for publications.

The authors also discuss the ethical considerations for this study, and for doing Internet measurements in general. Their conclusion, however, is that ethical guidelines for such measurements do not yet exist, and that studies like this one show they are very much needed (also to figure out how to deal with this kind of anonymous data).

This discussion has been archived. No new comments can be posted.
  • (Score: 4, Interesting) by d on Wednesday July 30 2014, @08:12PM

    by d (523) on Wednesday July 30 2014, @08:12PM (#75664)

    As I already said on the other website - apparently the researchers didn't analyze OS fingerprints at all. There is some metadata that the original researcher(s) forgot to remove (as well as a lot more mess). Service fingerprints are interesting as well. I did a lot of research on this data set and I have to say that while messy, this is also a really amazing data set. This article is IMHO biased.

    • (Score: 0) by Anonymous Coward on Thursday July 31 2014, @10:22AM

      by Anonymous Coward on Thursday July 31 2014, @10:22AM (#75845)

      As said on the other web site, this statement is apparently not true. They focused on the ICMP data set but also looked into others, in particular the service probes that you mentioned. One of their validation sets is using that data set.

      • (Score: 2) by d on Thursday July 31 2014, @10:29AM

        by d (523) on Thursday July 31 2014, @10:29AM (#75851)

        Okay, but it's service fingerprints, not OS fingerprints. If they looked at the data format that is there, they could get much more out of the set. They'd also find more mess.

        • (Score: 0) by Anonymous Coward on Thursday July 31 2014, @01:29PM

          by Anonymous Coward on Thursday July 31 2014, @01:29PM (#75902)

          Oh I see. Sorry, then I confused the data set you mentioned with the others. I will have a look at the OS fingerprints one. That seems interesting.

  • (Score: 0) by Anonymous Coward on Thursday July 31 2014, @09:43AM

    by Anonymous Coward on Thursday July 31 2014, @09:43AM (#75839)

    +1, NMAP worked, i can has 2014 data tor pls?> k, thx =]