SoylentNews is people

posted by n1 on Saturday August 09 2014, @04:09AM
from the substandard-quality-assurance dept.

Computer scientists in China have developed an algorithm that uses Bayesian statistics to rank Wikipedia articles by quality automatically.

The notion of finding evidence based on an analysis of probabilities was first described by the 18th-century mathematician and theologian Thomas Bayes. Pierre-Simon Laplace then used Bayesian probabilities to pioneer a new statistical method. Today, Bayesian analysis is commonly used to assess the content of emails, estimate the probability that a message is spam, and filter it from the user's inbox if that probability is high.
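
As a rough illustration of the spam-filtering idea, here is a minimal naive Bayes sketch. It is not taken from any particular mail filter: the function name `spam_probability`, the word likelihoods, and the prior are all made-up values chosen for the example, and the words are assumed to be independent of one another.

```python
import math

def spam_probability(words, prior_spam, likelihoods):
    """Return P(spam | words) via Bayes' rule, assuming independent words.

    likelihoods maps a word to (P(word | spam), P(word | not spam)).
    Words missing from the table are ignored.
    """
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        if word in likelihoods:
            p_given_spam, p_given_ham = likelihoods[word]
            log_spam += math.log(p_given_spam)
            log_ham += math.log(p_given_ham)
    # Normalize in a numerically stable way.
    top = max(log_spam, log_ham)
    spam_weight = math.exp(log_spam - top)
    ham_weight = math.exp(log_ham - top)
    return spam_weight / (spam_weight + ham_weight)

# Hypothetical numbers, for illustration only.
likelihoods = {"prize": (0.5, 0.05)}
posterior = spam_probability(["hello", "prize"], prior_spam=0.4, likelihoods=likelihoods)
should_filter = posterior > 0.9  # the threshold is an arbitrary choice
```

A filter built this way would move a message out of the inbox only when the posterior exceeds whatever threshold the operator picks.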

Han and Chen have now used a dynamic Bayesian network (DBN) to analyze the content of Wikipedia entries in a similar manner. They apply multivariate Gaussian distribution modeling to the DBN analysis, which yields a distribution over the quality of each article so that entries can be ranked. Very low-ranking entries could be flagged for editorial attention to raise their quality. By contrast, high-ranking entries could be marked as the definitive version so that they are not subsequently overwritten with lower-quality information.
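
The summary does not give the details of Han and Chen's model, so the following is only a simplified sketch of the general idea of ranking articles from a Gaussian model of their features. It is not a dynamic Bayesian network and not the authors' method. The features (thousands of words, number of references), the two quality levels, the training data, the diagonal covariance, and the uniform prior are all assumptions made for illustration.

```python
import math

def fit_gaussian(rows):
    """Estimate per-feature mean and variance (diagonal covariance)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    variances = [
        max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)  # floor avoids zero variance
        for d in range(dims)
    ]
    return means, variances

def log_density(x, means, variances):
    """Log of a multivariate Gaussian density with diagonal covariance."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )

def quality_score(x, models):
    """Posterior-weighted quality level, assuming a uniform prior over levels."""
    logs = {level: log_density(x, *params) for level, params in models.items()}
    top = max(logs.values())
    weights = {level: math.exp(lp - top) for level, lp in logs.items()}
    total = sum(weights.values())
    return sum(level * w for level, w in weights.items()) / total

# Made-up training data: [thousands of words, number of references].
training = {
    0: [[0.2, 1], [0.4, 2], [0.3, 0]],     # hypothetical low-quality articles
    1: [[3.0, 40], [4.5, 55], [3.8, 35]],  # hypothetical high-quality articles
}
models = {level: fit_gaussian(rows) for level, rows in training.items()}

articles = {"stub": [0.3, 1], "featured": [4.0, 45]}
ranking = sorted(articles, key=lambda name: quality_score(articles[name], models), reverse=True)
```

Under this sketch, entries at the bottom of `ranking` would be the candidates to flag for editorial attention.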

The team tested its algorithm on sets of several hundred articles, comparing the computer's automated quality assessment with assessment by a human user. The team reports that the algorithm outperforms a human user by up to 23 percent in correctly classifying the quality rank of a given article in the set. Using a computerized system to provide a quality standard for Wikipedia entries would avoid the need to have people classify each entry subjectively. It could thus raise the standard and provide a basis for an improved reputation for the online encyclopedia.

Abstract: http://www.inderscience.com/offer.php?id=64056

 
  • (Score: 2) by kaszz (4211) on Saturday August 09 2014, @11:38AM (#79282)

    Now the state may rank which articles are most likely to upset people about being screwed all their lives, and subsequently censor and auto-edit any entry. Progress!

    Still, it is interesting work that may have uses, like finding articles in need of improvement and comparing versions of more mature articles. But the possibility of abuse is there too.
