In 2013 the data of an Internet census was released anonymously, with an accompanying report describing the methodology. The trouble with this data is that it was gathered using a botnet that abused default passwords.
The purpose of this paper is to shed light on these and related questions and put the contributions of this anonymous Internet census study into perspective. Indeed, our findings suggest that the released data set is real and not faked, but that the measurements suffer from a number of methodological flaws and also lack adequate meta-data information. As a result, we have not been able to verify several claims that the anonymous author(s) made in the published report. In the process, we use this study as an educational example for illustrating how to deal with a large data set of unknown quality, hint at pitfalls in Internet-scale measurement studies, and discuss ethical considerations concerning third-party use of this released data set for publications.
The authors also discuss the ethical considerations for this study, and for doing Internet measurements in general. Their conclusion, however, is that ethical guidelines for such measurements do not yet exist, and that studies like this one show they are very much needed (also to figure out how to deal with this kind of anonymous data).
(Score: 2) by d on Thursday July 31 2014, @10:29AM
Okay, but it's service fingerprints, not OS fingerprints. If they looked at the data format that is there, they could get much more out of the set. They'd also find more mess.
(Score: 0) by Anonymous Coward on Thursday July 31 2014, @01:29PM
Oh, I see. Sorry, then I confused the data set you mentioned with the others. I will have a look at the OS fingerprints one. That seems interesting.