Google's search engine currently uses the number of incoming links to a web page as a proxy for quality, determining where it appears in search results. So pages that many other sites link to are ranked higher. This system has brought us the search engine as we know it today, but the downside is that websites full of misinformation can rise up the rankings, if enough people link to them.
A Google research team is adapting that model to measure the trustworthiness of a page, rather than its reputation across the web. Instead of counting incoming links, the system – which is not yet live – counts the number of incorrect facts within a page. "A source that has few false facts is considered to be trustworthy," says the team (arxiv.org/abs/1502.03519v1). The score they compute for each page is its Knowledge-Based Trust score.
The software works by tapping into the Knowledge Vault, the vast store of facts that Google has pulled off the internet. Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.
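As a rough illustration of the idea in the summary above, here is a minimal Python sketch that scores a page by the share of its checkable facts that agree with a fact store. This is an assumption-laden simplification and not the probabilistic model from the paper: `KNOWLEDGE_BASE`, `trust_score`, the (subject, predicate, value) triple format, and the sample pages are all hypothetical names made up for the example.

```python
from typing import Dict, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, value)

# Hypothetical stand-in for a fact store such as the Knowledge Vault.
KNOWLEDGE_BASE: Dict[Tuple[str, str], str] = {
    ("Apollo 11", "landing_year"): "1969",
    ("Earth", "natural_satellites"): "1",
}


def trust_score(
    triples: Iterable[Triple],
    kb: Dict[Tuple[str, str], str] = KNOWLEDGE_BASE,
) -> Optional[float]:
    """Return the fraction of checkable facts on a page that match kb.

    Facts whose (subject, predicate) is unknown to kb are skipped, and a
    page with no checkable facts gets None instead of a score.
    """
    checked = 0
    correct = 0
    for subject, predicate, value in triples:
        expected = kb.get((subject, predicate))
        if expected is None:
            continue  # nothing to compare against, so do not count it
        checked += 1
        if value == expected:
            correct += 1
    return correct / checked if checked else None


# Hypothetical pages, each reduced to the facts extracted from it.
accurate_page = [
    ("Apollo 11", "landing_year", "1969"),
    ("Earth", "natural_satellites", "1"),
]
mixed_page = [
    ("Apollo 11", "landing_year", "1969"),
    ("Earth", "natural_satellites", "2"),
]
shop_page = [("Purple coconut unicorn", "price", "19.99")]

print(trust_score(accurate_page))  # 1.0
print(trust_score(mixed_page))     # 0.5
print(trust_score(shop_page))      # None
```

Returning `None` for a page with nothing checkable is a choice made for this sketch: it keeps "no evidence" separate from "evidence of errors" rather than treating an unverifiable page as untrustworthy.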
(Score: 5, Insightful) by snick on Monday March 02 2015, @02:34PM
How is it possible for a web page to contradict a fact that is unanimously agreed upon by web pages?
(Score: 3, Insightful) by ikanreed on Monday March 02 2015, @04:07PM
By generally being "accurate" outside the thing they're trying to correct.
And this piece is hella misleading too. Google's stated approach for a while now has been combining the results from several different ranking algorithms, including the traditional link based pagerank, content analysis, trust rankings, and a few others.
This is probably just one new (or reweighted) input to that algorithm, and won't dramatically change anything.
(Score: 2) by SlimmPickens on Monday March 02 2015, @11:15PM
> This is probably just one new (or reweighted) input to that algorithm, and won't dramatically change anything.
I didn't read the whole paper, just the abstract and conclusion, plus a bit of searching. It doesn't appear to say much, but there is:
> We discuss new research opportunities for improving it and using it in conjunction with existing signals such as PageRank (Section 5.4.2).
5.4.2 calls it "orthogonal" compared to pagerank but mainly discusses ways of avoiding triviality with basically no discussion of integration.
(Score: 5, Insightful) by Anonymous Coward on Monday March 02 2015, @02:42PM
Seems to me like the most common and the most effective lies on the net are not fabrications, but half-truths.
A website can be 100% factual and still lying its ass off. Without strong AI they'll never be able to automate detection of that.
(Score: 5, Insightful) by Sir Finkus on Monday March 02 2015, @02:47PM
Pretty much all HUMANS have trouble with this kind of thing too. The task looks pretty daunting.
Join our Folding@Home team! [stanford.edu]
(Score: 1, Interesting) by Anonymous Coward on Monday March 02 2015, @03:57PM
Even if an AI can tell the difference how can a human tell that the AI is right?
(Score: 0) by Anonymous Coward on Monday March 02 2015, @04:14PM
An AI that powerful should be able to convince the humans. You may not exactly know that the AI is right, but you'll believe it.
(Score: 3, Funny) by bob_super on Tuesday March 03 2015, @12:59AM
I for one, would like to welcome our new AI prophets...
(Score: 0) by Anonymous Coward on Monday March 02 2015, @05:27PM
> Pretty much all HUMANS have trouble with this kind of thing too.
Not true. Yes, humans without much domain knowledge are easily subject to that kind of manipulation.
But, it is a matter of knowing what you don't know. The ignorant don't even realize that counter-facts exist.
But intelligence + expertise immunizes a person from that kind of deception.
(Score: 2) by Sir Finkus on Monday March 02 2015, @05:52PM
> But intelligence + expertise immunizes a person from that kind of deception.
I disagree. Even "experts" are fooled all the time, especially if the conclusions of the claim correspond with their own expectations. It's one of the reasons things like peer review and reproducibility are so important for science.
(Score: 1, Informative) by Anonymous Coward on Monday March 02 2015, @06:24PM
> Even "experts" are fooled all the time
For a definition of "all" that means on rare occasion.
> It's one of the reasons things like peer review and reproducibility are so important for science.
That's an entirely different situation, that's about catching mistakes, not about deliberate deception.
Sure there is the occasional bad actor, but if they were a common problem we would be overwhelmed with crap rather than the occasional high-profile embarrassment.
(Score: 3, Insightful) by Anonymous Coward on Monday March 02 2015, @02:47PM
If Google decides on the trustworthiness of information, then how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?
And BTW, what if I want to specifically search for something I know not to be true? Say, I'd like to look specifically at pages that claim the moon landings have not happened (let's say I'm making a study about that belief and want to visit some such sites in order to analyse their argumentation patterns, or something like that). Wouldn't that mean that I'm out of luck if I try to find those pages with Google?
(Score: 4, Insightful) by Sir Finkus on Monday March 02 2015, @03:30PM
> If Google decides on the trustworthiness of information, then how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?
Kind of like now? Google delists results all the time.
(Score: 5, Informative) by q.kontinuum on Monday March 02 2015, @04:08PM
Use [yandex.com] one [bing.com] of [qwant.com] the [gigablast.com] other [blekko.com] search [ixquick.com]-engines...
For me, currently Google gives the best results. Nevertheless I try to use other search engines as well, just so I won't miss if any of them gets better.
Registered IRC nick on chat.soylentnews.org: qkontinuum
(Score: 2) by SlimmPickens on Monday March 02 2015, @11:31PM
A bit OT, but who remembers the old searchlores.org? It was the first web page I ever visited aside from the search engine that got me there. I returned many times.
PS, if anyone has a complete copy of it they're willing to share, you'll get respect for life!
(Score: 2) by Reziac on Tuesday March 03 2015, @05:23PM
Now that you mention it, I have most or all of the original searchlores.org's visible site archived (if there were invisible portions, not so much), but it'd be on one of the hard disks currently residing in a shoebox. Remind me in a few months, after I get the mess from this cross-country move thing squared up. Tho some 15+ years later I don't imagine it's particularly current.
And there is no Alkibiades to come back and save us from ourselves.
(Score: 2) by SlimmPickens on Wednesday March 04 2015, @09:35PM
No not current, I think the philosophy is still valuable however, and the nostalgia!
(Score: 2) by Reziac on Wednesday March 04 2015, @09:56PM
That's kinda why I archived it, as I recall... just because! it was already going out of date, but still, an icon of its era, and all that.
(Score: 2) by SlimmPickens on Wednesday March 04 2015, @11:02PM
You may have noticed, but it refused my wget attempt ;)
(Score: 2) by Reziac on Thursday March 05 2015, @01:49AM
I doubt I used anything more sophisticated than Netscape!
(Score: 2) by SlimmPickens on Thursday March 05 2015, @04:23AM
Yikes I got wget working with a user agent but it pulled a lot more than just searchlores (interesting forum ;). I have no idea if I have the whole searchlores, the folder I've got is 69,385,866 k
Hope that connection doesn't cost you much ;(
(Score: 2) by Reziac on Thursday March 05 2015, @04:41AM
Hmm. We must not be talking about the same one?
I was thinking of Fravia's old site; found a copy (no idea if it's complete):
http://www.woodmann.com/searchlores/ [woodmann.com]
(Score: 2) by SlimmPickens on Thursday March 05 2015, @05:03AM
That's the one, all the links I clicked seemed to work so I think I have most of it, however I did notice a few 404's in the output from wget. A lot of other interesting stuff came through in the process.
(Score: 2) by Reziac on Thursday March 05 2015, @06:16AM
Might be stuff linked that wasn't back-when. I might have to take another look myself.
I hadn't even thought of it since Fravia passed away, so thanks for the reminder!
(Score: 1, Informative) by Anonymous Coward on Wednesday March 04 2015, @09:00PM
Try here:
http://www.woodmann.com/searchlores/ [woodmann.com]
http://www.woodmann.com/fravia/ [woodmann.com]
(Score: 2) by SlimmPickens on Wednesday March 04 2015, @09:34PM
awesome, cheers
(Score: 0, Disagree) by Anonymous Coward on Monday March 02 2015, @05:28PM
> how can we have a guarantee that information that Google doesn't like won't "magically" be rated less trustworthy and thus vanish from the searches of top pages?
You mean ... I don't know ... something negative about Google? Or Google's AI? Or a bill proposed in Congress that benefits Google?
You can't spell "Ain't good" without "AI" ;-)
(Score: 2) by Nuke on Tuesday March 03 2015, @12:03AM
As another example, I was recently looking for how many people believe that Bill Gates invented computers, or at least invented personal computers. A hell of a lot, actually. So many that it is possible that the rankings engine could end up believing such falsehoods itself and down rate the truth.
(Score: 5, Insightful) by Ryuugami on Monday March 02 2015, @02:54PM
> Facts the web unanimously agrees on are considered a reasonable proxy for truth.
There's no way this could go wrong.
If a shit storm's on the horizon, it's good to know far enough ahead you can at least bring along an umbrella. - D.Weber
(Score: 2, Interesting) by Anonymous Coward on Monday March 02 2015, @05:32PM
Predicted google response to a search for TPP in the future... The TPP is a job creation bill to help all citizens of all stimulate the economies of all, opposed only by pirates and ne'er-do-wells. With the leaked info ranked about 4987598437598... nothing else to see here, move along Citizen.
(Score: 2) by Common Joe on Tuesday March 03 2015, @02:34PM
That depends. A few edits to Wikipedia [xkcd.com] and my website could be very truthful about anything I want.
(Score: 4, Insightful) by Jaruzel on Monday March 02 2015, @02:59PM
What's a website?
Is it a factual archive on specific topics, is it a web shop selling unique items, is it a forum full of contradictory opinions, is it a blog of images of cats in mittens?
All of the above are valid, and there are many more types to boot. How does 'fact checking' work on the bespoke web shop? If the shop is the only shop on the web selling, say, purple unicorns made out of coconuts, because there is no such thing anywhere else, does Google then decide it's a 'lie' and demote the site accordingly?
Google have too much power in this regard, they are ultimately driving the internet into 10 or so mega-sites all with much higher search ranking than everyone else. If you are a small-to-medium vendor trying to sell online, you might as well just forget it.
(Yes, I have a personal beef with Google over this.)
-Jar
This is my opinion, there are many others, but this one is mine.
(Score: 2) by SlimmPickens on Monday March 02 2015, @11:41PM
It doesn't fact-check everything, the paper specifically talks about handling triviality. It's something that gets used in conjunction with pagerank.
Of course there are problems, but obviously we can improve on what we've got, because it's not like there's no bullshit on the first page of results as it is.
(Score: 4, Insightful) by TWX on Monday March 02 2015, @02:59PM
I can still see this being a problem for forums. Someone posts a question. A dozen people attempt to answer the question, but only some of those answers are good. The good ones are identified and discussed, but the bad ones, abandoned threads of the discussion if you will, break the page with good information from being weighted properly for its usefulness.
IBM had PL/1, with syntax worse than JOSS...
and everywhere the language went, it was a total loss.
(Score: 3, Interesting) by VLM on Monday March 02 2015, @03:06PM
So it's kinda like Stack Exchange without the deletionist jerks. OK, interesting.
All I need to do is create two pages, one with "the zeros of the Riemann zeta function all have real part one half" and the other page contradicting, and wait for the wisdom of google to indicate which is correct via rank.
One thing that concerns me is the almighty GOOG might implement something like an idea futures market where a 98% odds of "correct" are implemented by displaying 98 "yes" pages and 2 "no" pages.
Also I wonder about poorly formed questions GOOG is famous for asking "How is babby formed" and "magnets how the F do they work"
(Score: 0) by Anonymous Coward on Monday March 02 2015, @05:33PM
Hey, that's a great idea. I can set up web pages to help me pick what to wear each day, what to have for lunch, which laws to break, which side of conspiracy theories to believe, and even heads or tails while wasting hours on end flipping a coin. Who needs a Magic 8-Ball when you've got Google?
(Score: 5, Interesting) by Thexalon on Monday March 02 2015, @03:17PM
I can guarantee you that if this is implemented there will be efforts by well-funded organizations to flood the web with pages from a diversity of domains that all agree with a desired position, whether that position is "$product is absolutely wonderful!" or "$religion is the only true faith" or "$politician spoke the truth". Since (apparent) reality is now determined by who can put their version on the most domain names, those with enough money targeting an issue with even more money at stake will be able to turn lies into "truths".
The fact in question doesn't even have to be all that controversial to be unreliable: Once I was researching the mythology of Orpheus, and I noticed that while lots of websites claimed a particular part of his story there didn't seem to be an ancient Greek source for this claim, and in fact every single one of them pointed to a single fairly modern book on the subject, which itself cited no sources whatsoever. As you can imagine, I considered the claim completely unproven and treated it accordingly, but I would hardly have been surprised to see lots of people treating it as fact, and Google definitely would have treated it as fact.
The only thing that stops a bad guy with a compiler is a good guy with a compiler.
(Score: 5, Interesting) by GreatAuntAnesthesia on Monday March 02 2015, @03:32PM
> those with enough money targeting an issue with even more money at stake will be able to turn lies into "truths".
Money, or whoever has the biggest botnet.
Either way, this is the most worrying aspect of the proposed system. I'm not one of those MS shills who takes every opportunity to paint Google as the next Satan, but what Google have created here (and maybe it wasn't their intention) is very close to a 1984-style constant editing and re-editing of reality.
He who controls the present controls the past. He who controls the past controls the future.
(Score: 2) by jmorris on Monday March 02 2015, @08:05PM
I too, as a member of a political minority, worry about the abuse of this new tech. But I'm not nearly as worried as most here.
> flood the web with pages from a diversity of domains
Like everyone currently does with gaming of current Google PageRank? All this will do is cause the SEO community to observe the changes and learn how to keep on gaming them, pretty much what they have been doing every day since Google.com went live.
No, the threat will come from within Google itself, as they taint the results, something they are already quite adept at. Their news stream is almost as one-sided as MSNBC or the DNC homepage already. This threatens to make the entirety of Google search results just as useless. At which point my response will be to consider it useless and look for an alternative that IS useful. The only free market solution.
(Score: 2) by scruffybeard on Monday March 02 2015, @06:10PM
Discuss. (BTW, the only correct answer is Picard. Google will now enforce this.)
(Score: 0) by Anonymous Coward on Monday March 02 2015, @08:39PM
How the hell did you get modded up for this? You just broke the internet!!!
(Score: 0) by Anonymous Coward on Monday March 02 2015, @09:02PM
Picard.
But Spock beats out Data pretty handily.
(Score: 4, Interesting) by marcello_dl on Monday March 02 2015, @10:47PM
Which reminds me of a haiku review of Star Trek TNG:
Teleport beams across the autumn night
the future is full of marvels
still no cure for baldness.
P.S. John Koenig eats any trekkie captain for breakfast.
(Score: 2, Informative) by GoonDu on Wednesday March 04 2015, @12:01PM
Well, at the risk of letting Google determine what is true or not, at least Gawker is gonna tank for this: http://dailycaller.com/2015/03/03/gawker-to-be-penalized-by-new-fact-based-search-algorithm-says-google/ [dailycaller.com]
>If websites include information contradictory to the Knowledge Vault, their Knowledge-Based Trust score suffers — and in the case of Gawker and others, they suffer significantly.
>Under the classic search results system, Gawker ranks in the top 15 percent of Google search results. Under Knowledge-Based Trust — which has yet to go live — Gawker falls to the bottom 50 percent of Knowledge-Based Trust scored websites, according to the report.
>“In other words, they are considered less trustworthy than half of the websites,” Google researchers wrote.
>Among the other “gossip” sites to rank in the top 15 percent of classic search rankings, but bottom 50 percent of Knowledge-Based Trust scores, are Yahoo! OMG!, TMZ, E! Online, People, and USMagazine.