posted by janrinok on Monday March 02 2015, @02:24PM
from the and-that's-the-truth dept.

Google's search engine currently uses the number of incoming links to a web page as a proxy for quality, determining where it appears in search results. So pages that many other sites link to are ranked higher. This system has brought us the search engine as we know it today, but the downside is that websites full of misinformation can rise up the rankings, if enough people link to them.

A Google research team is adapting that model to measure the trustworthiness of a page, rather than its reputation across the web. Instead of counting incoming links, the system – which is not yet live – counts the number of incorrect facts within a page. "A source that has few false facts is considered to be trustworthy," says the team (arxiv.org/abs/1502.03519v1). The score they compute for each page is its Knowledge-Based Trust score.

The software works by tapping into the Knowledge Vault, the vast store of facts that Google has pulled off the internet. Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.
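As a rough illustration of the idea in the summary above, here is a toy sketch of a trust score computed as the share of a page's checkable facts that agree with a reference store. This is not the researchers' actual model, which the summary does not spell out. The function name `toy_trust_score`, the triple format, and the sample facts are all assumptions made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fact format: (subject, predicate, object)
Triple = Tuple[str, str, str]


def toy_trust_score(
    page_facts: List[Triple],
    reference: Dict[Tuple[str, str], str],
) -> Optional[float]:
    """Return the fraction of a page's checkable facts that match the reference.

    A fact is "checkable" only if the reference store has an entry for its
    (subject, predicate) pair. Pages with nothing checkable get None rather
    than a made-up score.
    """
    checkable = [(s, p, o) for (s, p, o) in page_facts if (s, p) in reference]
    if not checkable:
        return None
    correct = sum(1 for (s, p, o) in checkable if reference[(s, p)] == o)
    return correct / len(checkable)


# Stand-in for a store of widely agreed facts (illustrative data only).
reference = {
    ("France", "capital"): "Paris",
    ("Apollo 11", "landing_year"): "1969",
}

page_a = [("France", "capital", "Paris"), ("Apollo 11", "landing_year", "1969")]
page_b = [("France", "capital", "Paris"), ("Apollo 11", "landing_year", "never")]
page_c = [("purple unicorn", "made_of", "coconut")]  # nothing to check against

score_a = toy_trust_score(page_a, reference)
score_b = toy_trust_score(page_b, reference)
score_c = toy_trust_score(page_c, reference)
```

In this sketch a page making claims the reference store knows nothing about gets no score at all instead of being penalised. That is a design choice of the example, not something the summary says about the real system.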

http://www.newscientist.com/article/mg22530102.600-google-wants-to-rank-websites-based-on-facts-not-links.html

  • (Score: 5, Insightful) by snick on Monday March 02 2015, @02:34PM

    by snick (1408) on Monday March 02 2015, @02:34PM (#151877)

    Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.

    How is it possible for a web page to contradict a fact that is unanimously agreed upon by web pages?

    • (Score: 3, Insightful) by ikanreed on Monday March 02 2015, @04:07PM

      by ikanreed (3164) Subscriber Badge on Monday March 02 2015, @04:07PM (#151923) Journal

      By generally being "accurate" outside the thing they're trying to correct.

      And this piece is hella misleading too. Google's stated approach for a while now has been combining the results from several different ranking algorithms, including the traditional link based pagerank, content analysis, trust rankings, and a few others.

      This is probably just one new (or reweighted) input to that algorithm, and won't dramatically change anything.

      • (Score: 2) by SlimmPickens on Monday March 02 2015, @11:15PM

        by SlimmPickens (1056) on Monday March 02 2015, @11:15PM (#152206)

        This is probably just one new (or reweighted) input to that algorithm, and won't dramatically change anything.

        I didn't read the whole paper, just the abstract and conclusion, plus a bit of searching. It doesn't appear to say much, but there is:

        We discuss new research opportunities for improving it and using it in conjunction with existing signals such as PageRank (Section 5.4.2).

        5.4.2 calls it "orthogonal" compared to pagerank but mainly discusses ways of avoiding triviality with basically no discussion of integration.

  • (Score: 5, Insightful) by Anonymous Coward on Monday March 02 2015, @02:42PM

    by Anonymous Coward on Monday March 02 2015, @02:42PM (#151881)

    Seems to me like the most common and the most effective lies on the net are not fabrications, but half-truths.
    A website can be 100% factual and still lying its ass off. Without strong AI they'll never be able to automate detection of that.

    • (Score: 5, Insightful) by Sir Finkus on Monday March 02 2015, @02:47PM

      by Sir Finkus (192) on Monday March 02 2015, @02:47PM (#151883) Journal

      Pretty much all HUMANS have trouble with this kind of thing too. The task looks pretty daunting.

      • (Score: 1, Interesting) by Anonymous Coward on Monday March 02 2015, @03:57PM

        by Anonymous Coward on Monday March 02 2015, @03:57PM (#151915)

        Even if an AI can tell the difference how can a human tell that the AI is right?

        • (Score: 0) by Anonymous Coward on Monday March 02 2015, @04:14PM

          by Anonymous Coward on Monday March 02 2015, @04:14PM (#151929)

          An AI that powerful should be able to convince the humans. You may not exactly know that the AI is right, but you'll believe it.

          • (Score: 3, Funny) by bob_super on Tuesday March 03 2015, @12:59AM

            by bob_super (1357) on Tuesday March 03 2015, @12:59AM (#152249)

            I for one, would like to welcome our new AI prophets...

      • (Score: 0) by Anonymous Coward on Monday March 02 2015, @05:27PM

        by Anonymous Coward on Monday March 02 2015, @05:27PM (#151968)

        > Pretty much all HUMANS have trouble with this kind of thing too.

        Not true. Yes, humans without much domain knowledge are easily subject to that kind of manipulation.
        But, it is a matter of knowing what you don't know. The ignorant don't even realize that counter-facts exist.

        But intelligence + expertise immunizes a person from that kind of deception.

        • (Score: 2) by Sir Finkus on Monday March 02 2015, @05:52PM

          by Sir Finkus (192) on Monday March 02 2015, @05:52PM (#151988) Journal

          But intelligence + expertise immunizes a person from that kind of deception.

          I disagree. Even "experts" are fooled all the time, especially if the conclusions of the claim correspond with their own expectations. It's one of the reasons things like peer review and reproducibility are so important for science.

          • (Score: 1, Informative) by Anonymous Coward on Monday March 02 2015, @06:24PM

            by Anonymous Coward on Monday March 02 2015, @06:24PM (#152005)

            > Even "experts" are fooled all the time

            For a definition of "all" that means on rare occasion.

            > It's one of the reasons things like peer review and reproducibility are so important for science.

            That's an entirely different situation, that's about catching mistakes, not about deliberate deception.
            Sure there is the occasional bad actor, but if they were a common problem we would be overwhelmed with crap rather than the occasional high-profile embarrassment.

  • (Score: 3, Insightful) by Anonymous Coward on Monday March 02 2015, @02:47PM

    by Anonymous Coward on Monday March 02 2015, @02:47PM (#151884)

    If Google decides on the trustworthiness of information, then how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?

    And BTW, what if I want to specifically search for something I know not to be true? Say, I'd like to look specifically at pages that claim the moon landings have not happened (let's say I'm making a study about that belief and want to visit some such sites in order to analyse their argumentation patterns, or something like that). Wouldn't that mean that I'm out of luck if I try to find those pages with Google?

    • (Score: 4, Insightful) by Sir Finkus on Monday March 02 2015, @03:30PM

      by Sir Finkus (192) on Monday March 02 2015, @03:30PM (#151904) Journal

      If Google decides on the trustworthiness of information, then how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?

      Kind of like now? Google delists results all the time.

    • (Score: 5, Informative) by q.kontinuum on Monday March 02 2015, @04:08PM

      by q.kontinuum (532) on Monday March 02 2015, @04:08PM (#151925) Journal

      what if I want to specifically search for something I know not to be true?

      Use [yandex.com] one [bing.com] of [qwant.com] the [gigablast.com] other [blekko.com] search [ixquick.com]-engines...

      For me, currently Google gives the best results. Nevertheless I try to use other search engines as well, just so I won't miss if any of them gets better.

      --
      Registered IRC nick on chat.soylentnews.org: qkontinuum
      • (Score: 2) by SlimmPickens on Monday March 02 2015, @11:31PM

        by SlimmPickens (1056) on Monday March 02 2015, @11:31PM (#152211)

        A bit OT, but who remembers the old searchlores.org? It was the first web page I ever visited aside from the search engine that got me there. I returned many times.

        PS, if anyone has a complete copy of it they're willing to share, you'll get respect for life!

        • (Score: 2) by Reziac on Tuesday March 03 2015, @05:23PM

          by Reziac (2489) on Tuesday March 03 2015, @05:23PM (#152621) Homepage

          Now that you mention it, I have most or all of the original searchlores.org's visible site archived (if there were invisible portions, not so much), but it'd be on one of the hard disks currently residing in a shoebox. Remind me in a few months, after I get the mess from this cross-country move thing squared up. Tho some 15+ years later I don't imagine it's particularly current.

          --
          And there is no Alkibiades to come back and save us from ourselves.
          • (Score: 2) by SlimmPickens on Wednesday March 04 2015, @09:35PM

            by SlimmPickens (1056) on Wednesday March 04 2015, @09:35PM (#153253)

            No not current, I think the philosophy is still valuable however, and the nostalgia!

            • (Score: 2) by Reziac on Wednesday March 04 2015, @09:56PM

              by Reziac (2489) on Wednesday March 04 2015, @09:56PM (#153258) Homepage

              That's kinda why I archived it, as I recall... just because! it was already going out of date, but still, an icon of its era, and all that.

              --
              And there is no Alkibiades to come back and save us from ourselves.
              • (Score: 2) by SlimmPickens on Wednesday March 04 2015, @11:02PM

                by SlimmPickens (1056) on Wednesday March 04 2015, @11:02PM (#153278)

                You may have noticed, but it refused my wget attempt ;)

                • (Score: 2) by Reziac on Thursday March 05 2015, @01:49AM

                  by Reziac (2489) on Thursday March 05 2015, @01:49AM (#153337) Homepage

                  I doubt I used anything more sophisticated than Netscape!

                  --
                  And there is no Alkibiades to come back and save us from ourselves.
                  • (Score: 2) by SlimmPickens on Thursday March 05 2015, @04:23AM

                    by SlimmPickens (1056) on Thursday March 05 2015, @04:23AM (#153375)

                    Yikes, I got wget working with a user agent but it pulled a lot more than just searchlores (interesting forum ;). I have no idea if I have the whole searchlores, the folder I've got is 69,385,866 k
                    Hope that connection doesn't cost you much ;(

                    • (Score: 2) by Reziac on Thursday March 05 2015, @04:41AM

                      by Reziac (2489) on Thursday March 05 2015, @04:41AM (#153389) Homepage

                      Hmm. We must not be talking about the same one?
                      I was thinking of Fravia's old site; found a copy (no idea if it's complete):
                      http://www.woodmann.com/searchlores/ [woodmann.com]

                      --
                      And there is no Alkibiades to come back and save us from ourselves.
                      • (Score: 2) by SlimmPickens on Thursday March 05 2015, @05:03AM

                        by SlimmPickens (1056) on Thursday March 05 2015, @05:03AM (#153399)

                        That's the one, all the links I clicked seemed to work so I think I have most of it, however I did notice a few 404's in the output from wget. A lot of other interesting stuff came through in the process.

                        • (Score: 2) by Reziac on Thursday March 05 2015, @06:16AM

                          by Reziac (2489) on Thursday March 05 2015, @06:16AM (#153417) Homepage

                          Might be stuff linked that wasn't back-when. I might have to take another look myself.

                          I hadn't even thought of it since Fravia passed away, so thanks for the reminder!

                          --
                          And there is no Alkibiades to come back and save us from ourselves.
        • (Score: 1, Informative) by Anonymous Coward on Wednesday March 04 2015, @09:00PM

          by Anonymous Coward on Wednesday March 04 2015, @09:00PM (#153239)
    • (Score: 0, Disagree) by Anonymous Coward on Monday March 02 2015, @05:28PM

      by Anonymous Coward on Monday March 02 2015, @05:28PM (#151969)

      how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?

      You mean ... I don't know ... something negative about Google? Or Google's AI? Or a bill proposed in Congress that benefits Google?

      You can't spell "Ain't good" without "AI" ;-)

    • (Score: 2) by Nuke on Tuesday March 03 2015, @12:03AM

      by Nuke (3162) on Tuesday March 03 2015, @12:03AM (#152230)

      As another example, I was recently looking for how many people believe that Bill Gates invented computers, or at least invented personal computers. A hell of a lot, actually. So many that it is possible that the rankings engine could end up believing such falsehoods itself and down rate the truth.

  • (Score: 5, Insightful) by Ryuugami on Monday March 02 2015, @02:54PM

    by Ryuugami (2925) on Monday March 02 2015, @02:54PM (#151886)

    Facts the web unanimously agrees on are considered a reasonable proxy for truth.

    There's no way this could go wrong.

    --
    If a shit storm's on the horizon, it's good to know far enough ahead you can at least bring along an umbrella. - D.Weber
    • (Score: 2, Interesting) by Anonymous Coward on Monday March 02 2015, @05:32PM

      by Anonymous Coward on Monday March 02 2015, @05:32PM (#151973)
      "Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as 'confident facts', to which Google's model ascribes a more than 90 per cent chance of being true." - google. 90% is pretty low confidence for facts unless you want plausible deniability that any propaganda promoted is seen as merely coincidental, and the other 1.3 billion presumably have an even lower confidence and are still being called facts.

      Predicted google response to a search for TPP in the future... The TPP is a job creation bill to help all citizens of all stimulate the economies of all, opposed only by pirates and ne'er-do-wells. With the leaked info ranked about 4987598437598... nothing else to see here, move along Citizen.

    • (Score: 2) by Common Joe on Tuesday March 03 2015, @02:34PM

      by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Tuesday March 03 2015, @02:34PM (#152513) Journal

      That depends. A few edits to Wikipedia [xkcd.com] and my website could be very truthful about anything I want.

  • (Score: 4, Insightful) by Jaruzel on Monday March 02 2015, @02:59PM

    by Jaruzel (812) on Monday March 02 2015, @02:59PM (#151888) Homepage Journal

    What's a website?

    Is it a factual archive on specific topics, is it a web shop selling unique items, is it a forum full of contradictory opinions, is it a blog of images of cats in mittens?

    All of the above are valid, and there are many more types to boot. How does 'fact checking' work on the bespoke web shop? If the shop is the only shop on the web selling say, purple unicorns made out of coconuts, because there is no such thing anywhere else, does Google then decide it's a 'lie' and demote the site accordingly?

    Google have too much power in this regard, they are ultimately driving the internet into 10 or so mega-sites all with much higher search ranking than everyone else. If you are a small-to-medium vendor trying to sell online, you might as well just forget it.

    (Yes, I have a personal beef with Google over this.)

    -Jar

    --
    This is my opinion, there are many others, but this one is mine.
    • (Score: 2) by SlimmPickens on Monday March 02 2015, @11:41PM

      by SlimmPickens (1056) on Monday March 02 2015, @11:41PM (#152217)

      It doesn't fact-check everything, the paper specifically talks about handling triviality. It's something that gets used in conjunction with pagerank.

      Of course there are problems, but obviously we can improve on what we've got, because it's not like there's no bullshit on the first page of results as it is.

  • (Score: 4, Insightful) by TWX on Monday March 02 2015, @02:59PM

    by TWX (5124) on Monday March 02 2015, @02:59PM (#151889)

    I can still see this being a problem for forums. Someone posts a question. A dozen people attempt to answer the question, but only some of those answers are good. The good ones are identified and discussed, but the bad ones, abandoned threads of the discussion if you will, keep the page with good information from being weighted properly for its usefulness.

    --
    IBM had PL/1, with syntax worse than JOSS...
    and everywhere the language went, it was a total loss.
  • (Score: 3, Interesting) by VLM on Monday March 02 2015, @03:06PM

    by VLM (445) on Monday March 02 2015, @03:06PM (#151894)

    so it's kinda like stack exchange without the deletionist jerks, OK interesting.

    All I need to do is create two pages, one with "the zeros of the Riemann zeta function all have real part one half" and the other page contradicting, and wait for the wisdom of google to indicate which is correct via rank.

    One thing that concerns me is the almighty GOOG might implement something like an idea futures market where a 98% odds of "correct" are implemented by displaying 98 "yes" pages and 2 "no" pages.

    Also I wonder about poorly formed questions GOOG is famous for asking "How is babby formed" and "magnets how the F do they work"

    • (Score: 0) by Anonymous Coward on Monday March 02 2015, @05:33PM

      by Anonymous Coward on Monday March 02 2015, @05:33PM (#151974)

      Hey, that's a great idea. I can set up web pages to help me pick what to wear each day, what to have for lunch, which laws to break, which side of conspiracy theories to believe, and even heads or tails while wasting hours on end flipping a coin. Who needs a Magic 8-Ball when you've got Google?

  • (Score: 5, Interesting) by Thexalon on Monday March 02 2015, @03:17PM

    by Thexalon (636) on Monday March 02 2015, @03:17PM (#151897)

    I can guarantee you that if this is implemented there will be efforts by well-funded organizations to flood the web with pages from a diversity of domains that all agree with a desired position, whether that position is "$product is absolutely wonderful!" or "$religion is the only true faith" or "$politician spoke the truth". Since (apparent) reality is now determined by who can put their version on the most domain names, those with enough money targeting an issue with even more money at stake will be able to turn lies into "truths".

    The fact in question doesn't even have to be all that controversial to be unreliable: Once I was researching the mythology of Orpheus, and I noticed that while lots of websites claimed a particular part of his story there didn't seem to be an ancient Greek source for this claim, and in fact every single one of them pointed to a single fairly modern book on the subject, which itself cited no sources whatsoever. As you can imagine, I considered the claim completely unproven and treated it accordingly, but I would hardly have been surprised to see lots of people treating it as fact, and Google definitely would have treated it as fact.

    --
    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
    • (Score: 5, Interesting) by GreatAuntAnesthesia on Monday March 02 2015, @03:32PM

      by GreatAuntAnesthesia (3275) on Monday March 02 2015, @03:32PM (#151906) Journal

      > those with enough money targeting an issue with even more money at stake will be able to turn lies into "truths".

      Money, or whoever has the biggest botnet.

      Either way, this is the most worrying aspect of the proposed system. I'm not one of those MS shills who takes every opportunity to paint Google as the next Satan, but what Google have created here (and maybe it wasn't their intention) is very close to a 1984-style constant editing and re-editing of reality.

      He who controls the present controls the past. He who controls the past controls the future.

    • (Score: 2) by jmorris on Monday March 02 2015, @08:05PM

      by jmorris (4844) on Monday March 02 2015, @08:05PM (#152074)

      I too, as a member of a political minority, worry about the abuse of this new tech. But I'm not nearly as worried as most here.

      flood the web with pages from a diversity of domains

      Like everyone currently does with gaming of current Google PageRank? All this will do is cause the SEO community to observe the changes and learn how to keep on gaming them, pretty much what they have been doing every day since Google.com went live.

      No, the threat will come from within Google itself. As they taint the results, something they are already quite adept at. Their news stream is almost as one sided as MSNBC or the DNC homepage already. This threatens to make the entirety of Google search results as useless. At which point my response will be to consider it useless and look for an alternative that IS useful. The only free market solution.

  • (Score: 2) by scruffybeard on Monday March 02 2015, @06:10PM

    by scruffybeard (533) on Monday March 02 2015, @06:10PM (#151996)

    Discuss. (BTW, the only correct answer is Picard. Google will now enforce this.)

    • (Score: 0) by Anonymous Coward on Monday March 02 2015, @08:39PM

      by Anonymous Coward on Monday March 02 2015, @08:39PM (#152100)

      Discuss. (BTW, the only correct answer is Picard. Google will now enforce this.)

      How the hell did you get modded up for this? You just broke the internet!!!

    • (Score: 0) by Anonymous Coward on Monday March 02 2015, @09:02PM

      by Anonymous Coward on Monday March 02 2015, @09:02PM (#152114)

      Picard.

      But Spock beats out Data pretty handily.

    • (Score: 4, Interesting) by marcello_dl on Monday March 02 2015, @10:47PM

      by marcello_dl (2685) on Monday March 02 2015, @10:47PM (#152186)

      Which reminds me of a haiku review of Star Trek TNG:

      Teleport beams across the autumn night
      the future is full of marvels
      still no cure for baldness.

      P.S. John Koenig eats any trekkie captain for breakfast.

  • (Score: 2, Informative) by GoonDu on Wednesday March 04 2015, @12:01PM

    by GoonDu (2623) on Wednesday March 04 2015, @12:01PM (#152977)

    Well, at the risk of letting Google determine what is true or not, at least Gawker is gonna tank for this: http://dailycaller.com/2015/03/03/gawker-to-be-penalized-by-new-fact-based-search-algorithm-says-google/ [dailycaller.com]

    >If websites include information contradictory to the Knowledge Vault, their Knowledge-Based Trust score suffers — and in the case of Gawker and others, they suffer significantly.

    >Under the classic search results system, Gawker ranks in the top 15 percent of Google search results. Under Knowledge-Based Trust — which has yet to go live — Gawker falls to the bottom 50 percent of Knowledge-Based Trust scored websites, according to the report.

    >“In other words, they are considered less trustworthy than half of the websites,” Google researchers wrote.

    >Among the other “gossip” sites to rank in the top 15 percent of classic search rankings, but bottom 50 percent of Knowledge-Based Trust scores, are Yahoo! OMG!, TMZ, E! Online, People, and USMagazine.