from the substandard-quality-assurance dept.
Computer scientists in China have developed an algorithm that uses Bayesian statistics to rank Wikipedia articles automatically by quality.
The idea of weighing evidence through an analysis of probabilities was first described by the 18th-century mathematician and theologian Thomas Bayes. Pierre-Simon Laplace later used Bayesian probabilities to pioneer a new statistical method. Today, Bayesian analysis is commonly used to assess the content of emails: it estimates the probability that a message is spam and, if that probability is high, filters the message out of the user's inbox.
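To make the spam case concrete, here is a minimal sketch of Bayes' rule applied to a single word. The function name, the probabilities, and the 0.9 filtering threshold are all invented for illustration; a real filter combines evidence from many words and learns its probabilities from labeled mail.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Bayes' rule: P(spam | word) from a prior and two likelihoods."""
    # Total probability of seeing the word in any message (spam or not).
    evidence = (p_word_given_spam * prior_spam
                + p_word_given_ham * (1.0 - prior_spam))
    return p_word_given_spam * prior_spam / evidence


# Hypothetical numbers: 40% of all mail is spam, and the word appears
# in 25% of spam messages but in only 1% of legitimate ones.
p_spam = posterior_spam(prior_spam=0.4,
                        p_word_given_spam=0.25,
                        p_word_given_ham=0.01)

# Assumed policy: filter the message when the posterior exceeds 0.9.
should_filter = p_spam > 0.9
```

With these made-up inputs the posterior is 0.1 / 0.106, or roughly 0.94, so the message would be filtered.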
Han and Chen have now used a dynamic Bayesian network (DBN) to analyze the content of Wikipedia entries in a similar manner. They apply multivariate Gaussian distribution modeling to the DBN analysis, which gives them a distribution of the quality of each article so that entries can be ranked. Very low-ranking entries could be flagged for editorial attention to raise their quality. By contrast, high-ranking entries could be marked in some way as the definitive entry so that they are not subsequently overwritten with lower-quality information.
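The summary does not say which features or parameters Han and Chen use, so the following is only a rough sketch of how a multivariate Gaussian can turn article features into a ranking. It is not their DBN: it ignores the dynamic, over-time part entirely. It assumes two hypothetical features per article scaled to the range 0 to 1, and two invented quality classes that share one covariance matrix.

```python
import math


def gaussian_logpdf_2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^-1 (x - mean), 2x2 inverse written out.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (quad + math.log(det)) - math.log(2.0 * math.pi)


# Hypothetical class parameters (not taken from the paper).
SHARED_COV = ((0.02, 0.005), (0.005, 0.03))
HIGH_MEAN = (0.8, 0.7)   # assumed typical features of a high-quality entry
LOW_MEAN = (0.3, 0.2)    # assumed typical features of a low-quality entry


def quality_score(features):
    """Log-likelihood ratio: positive means closer to the high-quality class."""
    return (gaussian_logpdf_2d(features, HIGH_MEAN, SHARED_COV)
            - gaussian_logpdf_2d(features, LOW_MEAN, SHARED_COV))


def rank_articles(articles):
    """Return article names sorted from highest to lowest quality score."""
    return sorted(articles, key=lambda name: quality_score(articles[name]),
                  reverse=True)


# Made-up articles with made-up feature values.
articles = {
    "article_b": (0.55, 0.40),
    "article_c": (0.25, 0.15),
    "article_a": (0.85, 0.75),
}
ranked = rank_articles(articles)
```

Entries at the bottom of `ranked` would be the candidates for editorial attention, and those at the top the candidates for marking as definitive.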
The team has tested its algorithm on sets of several hundred articles, comparing the computer's automated quality assessment with assessment by a human user. The algorithm outperforms a human user by up to 23 percent in correctly classifying the quality rank of a given article in the set, the team reports. Using a computerized system to provide a quality standard for Wikipedia entries would avoid the need to have people classify each entry subjectively. It could thus improve the standard and provide a basis for an improved reputation for the online encyclopedia.