SoylentNews is people

posted by n1 on Wednesday August 20 2014, @01:11PM
from the internet-never-forgets dept.

Researchers from Columbia University have developed a system called XRay that aims to make it more transparent why particular adverts appear on the web.

The researchers have developed XRay, a new tool that reveals which data in a web account, such as emails, searches, or viewed products, are being used to target which outputs, such as ads, recommended products, or prices. They will be presenting the prototype, which is designed to make the online use of personal data more transparent, at USENIX Security on August 20. The researchers have posted the open source system, as well as their findings, online for other researchers interested in studying how web services use personal data to leverage and extend.

“Today we have a problem: the web is not transparent. We see XRay as an important first step in exposing how websites are using your personal data,” says Geambasu, who is also a member of Columbia’s Institute for Data Sciences and Engineering’s Cybersecurity Center.

We live in a “big data” world, where staggering amounts of personal data—our locations, search histories, emails, posts, photos, and more—are constantly being collected and analyzed by Google, Amazon, Facebook, and many other web services. While harnessing big data can certainly improve our daily lives (Amazon offerings, Netflix suggestions, emergency response Tweets, etc.), these beneficial uses have also generated a big data frenzy, with web services aggressively pursuing new ways to acquire and commercialize the information.

“It’s critical, now more than ever, to reconcile our privacy needs with the exponential progress in leveraging this big data,” says Chaintreau, a member of the Institute for Data Sciences and Engineering’s New Media Center. Geambasu adds, “If we leave it unchecked, big data’s exciting potential could become a breeding ground for data abuses, privacy vulnerabilities, and unfair or deceptive business practices.”

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by zafiro17 (234) on Wednesday August 20 2014, @01:36PM (#83520)

    Good but useless quote: "If we leave it unchecked, big data’s exciting potential could become a breeding ground for data abuses, privacy vulnerabilities, and unfair or deceptive business practices." WTF, that's the situation today! I like the idea of a tool that shows which data sources have been used and what factors have gone into showing you an advert. But it sounds impossible. It would require, at a minimum, changes to legislation - some of it international - requiring data hubs to provide data they are not currently inclined to provide, and potentially some sort of work rationalizing or standardizing parameters. The latter means governments going in and telling private sector companies how to do their business, the work of a good regulator. Governments are bad at this sort of thing: slow, inefficient, not that smart.

    I love the idea, but this duck is dead in the water before it even gets a chance to flap its wings.

    --
    Dad always thought laughter was the best medicine, which I guess is why several of us died of tuberculosis - Jack Handey
    • (Score: 2) by Boxzy (742) on Wednesday August 20 2014, @02:02PM (#83532)

      Any tool I can use to poison the well and obfuscate my online presence is a good thing, so I'm all for it. As unlikely as it is, I would grab a tracking blocker with both hands. Maybe a counterpart to Adblock, that randomly clicks on ads to create a false profile?

      --
      Go green, Go Soylent.
      • (Score: 2) by nitehawk214 (1304) on Wednesday August 20 2014, @02:37PM (#83541)

        That is actually a fantastic idea. Introduce noise into the system, and then even for the things you do that can be tracked despite your best efforts at ad and script blocking, it becomes difficult to tell anything about you.

        Perhaps even create fake social media profiles and use those against sites with twitter and facebook script integration.

        --
        "Don't you ever miss the days when you used to be nostalgic?" -Loiosh
        • (Score: 0) by Anonymous Coward on Wednesday August 20 2014, @05:42PM (#83621)

          The self-destructing cookies add-on [mozilla.org] is kind of like that.
          It lets any site set cookies on your browser, but it deletes them a few seconds after you leave the site.

          That gives you whatever useful functionality there might be in the cookie, like a login session, but it causes the trackers to build a new profile each time you hit a new site or even return to the same site. It is far from perfect, but combine it with a VPN that lets you switch IP addresses and a user-agent spoofer that lets you pretend to be different browsers on different OSes (one hour you are using an iPad, another hour you are using a WinXP box, the next hour it is a MacOS box, etc.) and you end up injecting enough noise to make it hard to pin down your identity.
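
          The effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `Tracker` class, the `browse` helper, and the example site names are hypothetical, and the tracker here recognizes a browser by cookie alone, ignoring IP addresses and user-agent strings (which is why the VPN and spoofer are suggested as well).

```python
import itertools


class Tracker:
    """Toy tracker that recognizes a browser only by its cookie (an assumption)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.profiles = {}  # tracker id -> list of sites seen under that id

    def observe(self, cookies, site):
        # Reuse the id stored in the cookie jar; mint a new one if it is gone.
        tracker_id = cookies.get("tracker_id")
        if tracker_id is None:
            tracker_id = next(self._ids)
            cookies["tracker_id"] = tracker_id
        self.profiles.setdefault(tracker_id, []).append(site)


def browse(sites, tracker, self_destruct):
    """Visit each site in turn, optionally wiping cookies after every visit."""
    cookies = {}
    for site in sites:
        tracker.observe(cookies, site)
        if self_destruct:
            cookies.clear()  # stands in for deleting cookies after leaving


sites = ["news.example", "shop.example", "news.example"]

plain = Tracker()
browse(sites, plain, self_destruct=False)

cleaned = Tracker()
browse(sites, cleaned, self_destruct=True)

# One long profile without cleanup, one short profile per visit with it.
print(len(plain.profiles), len(cleaned.profiles))
```

          With cookies kept, every visit lands in a single profile; with cookies wiped after each visit, the toy tracker starts a fresh profile every time, even on a return to the same site.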

  • (Score: 3, Insightful) by MrGuy (1007) on Wednesday August 20 2014, @02:50PM (#83548)

    There's no metadata served with an ad that tells you WHY that ad was served, nor that tells you what information in what profiles was available to the ad server. There's nothing to "analyze" by just looking at an ad. You don't know.

    What this approach ACTUALLY is, is an ATTEMPT to reverse engineer what the companies tracking you MIGHT be doing. In other words, it's an attempt to reverse (among others) Doubleclick's ad serving model and try to project why it MIGHT have chosen to show you an ad.

    And how do we do this, given the ad servers and their algorithms are closed? Why, by TRACKING YOU! What this tool does is replicate every potentially underhanded tracking technique out there. It reads your e-mails. Tracks your clickstreams. Follows your IP. Tracks every single thing you do. It extracts (as well as it can) the semantic content of all your messages, and the themes and keywords of every site you visit or search you perform. And then it looks at every single ad you see.

    What does it do with all this data, the exact "spying on you" that it's worried the advertisers are doing? Why, it shares it with a central server, of course. That server can then pick apart all kinds of aspects of what each user has done in the past, what you've clicked and visited recently, and then cross-correlate the ads served across all users and determine some things that are LIKELY in the advertisers' profile.

    Yes, there's some benefit to this - if users who have sent an e-mail on a particular topic see a certain ad disproportionately frequently, we can be fairly confident the advertising system in question has access to your e-mail message contents. That's a useful thing.
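
    The "disproportionately frequently" comparison can be sketched as a simple lift calculation. This is a minimal illustration of the general idea, not XRay's actual method; the `Observation` schema, the `ad_lift` helper, and the topic and ad names are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    """One account: topics it e-mailed about and ads it saw (hypothetical schema)."""
    topics: frozenset
    ads: frozenset


def ad_lift(observations: Iterable[Observation], topic: str, ad: str) -> float:
    """Return P(ad | topic) / P(ad | no topic).

    A value well above 1 means the ad shows up disproportionately for
    accounts that mentioned the topic. It is evidence, not proof, of targeting.
    """
    observations = list(observations)
    with_topic = [o for o in observations if topic in o.topics]
    without_topic = [o for o in observations if topic not in o.topics]
    if not with_topic or not without_topic:
        raise ValueError("need accounts both with and without the topic")

    rate_with = sum(ad in o.ads for o in with_topic) / len(with_topic)
    rate_without = sum(ad in o.ads for o in without_topic) / len(without_topic)
    if rate_without == 0:
        return float("inf") if rate_with > 0 else 1.0
    return rate_with / rate_without


# Made-up data purely for illustration.
observations = [
    Observation(frozenset({"mortgage"}), frozenset({"loan_ad"})),
    Observation(frozenset({"mortgage"}), frozenset({"loan_ad", "shoe_ad"})),
    Observation(frozenset({"mortgage", "travel"}), frozenset({"shoe_ad"})),
    Observation(frozenset({"travel"}), frozenset({"shoe_ad"})),
    Observation(frozenset({"travel"}), frozenset({"loan_ad"})),
    Observation(frozenset({"cooking"}), frozenset({"shoe_ad"})),
    Observation(frozenset({"cooking"}), frozenset()),
]

print(round(ad_lift(observations, "mortgage", "loan_ad"), 2))
```

    A ratio like this only covers the inputs someone thought to record, so it cannot rule out an unrecorded factor driving both the topic and the ad.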

    What it CAN'T do is truly reverse engineer the algorithm concerned. We can state with some probability that certain algorithms likely use certain data. We can't get close to the goal of "Why did I see this ad?" These are complex algorithms, made up of complex profiles, which include data that was collected before you install XRay, so you'll never reverse engineer all of it. And the approach suffers heavily from the fact that you can only do statistics on the variables you look for - there's a large "omitted variable bias" in any results.

    But to make things worse, what they're doing is assembling the Holy Grail of Tracking Databases. They're useless without it. Their tool tracks everything an advertiser MIGHT want to know (regardless of whether they actually use it), so that they can build a dataset to do their analysis on. That dataset is the motherlode of a profile database - it needs to be (by design) as good as or better than anything an advertiser will have. Can you imagine the value of that dataset (and the risk of it being stolen)? There's no way to anonymize this data - that would defeat the whole purpose.

    Make no mistake - joining XRay is volunteering to submit every bit of personal data you own to be part of someone's academic research dataset. I'm glad they've agreed to at least share some of the results with us. But I'm going to pass.