New Algorithm Simplifies the Categorization of Large Amount of Data

posted by n1 on Saturday June 28 2014, @09:51AM

from the /dev/null-grouping dept.

lhsi writes:

A new algorithm has been published that simplifies grouping data sets together according to their similarity, sometimes referred to as Cluster Analysis [CA].

Data sets can be imagined as "clouds" of data points in a multidimensional space. These points are generally differently distributed: more widely scattered in one area and denser in another. CA is used to identify the denser areas efficiently, grouping the data in a certain number of significant subsets on the basis of this criterion. Each subset corresponds to a category.
"Think of a database of facial photographs ", explains Alessandro Laio, professor of Statistical and Biological Physics at SISSA. "The database may contain more than one photo of the same person, so CA us used to group all the pictures of the same individual. This type of analysis is carried out by automatic facial recognition systems, for example".
"We tried to devise a more efficient algorithm than those currently used, and one capable of solving some of the classic problems of CA", continues Laio.

"Our approach is based on a new way of identifying the centre of the cluster, i.e., the subsets", explains Alex Rodrigez, co-author of the paper. "Imagine having to identify all the cities in the world, without having access to a map. A huge task", says Rodriguez. "We therefore identified a heuristic, that is, a simple rule or a sort of shortcut to achieve the result".
To find out if a place is a city, we can ask each inhabitant to count his "neighbours", in other words, how many people live within 100 metres from his house. Once we have this number, we then go on to find, for each inhabitant, the shortest distance at which another inhabitant with a greater number of neighbours lives. "Together, these two data", explains Laio, "tell us how densely populated is the area where an individual lives and the distance between individuals who have the most neighbours. By automatically cross-checking these data, for the entire world population, we can identify the individuals who represent the centres of the clusters, which correspond to the various cities". "Our algorithm performs precisely this kind of calculation, and it can be applied to many different settings", adds Rodriguez.

Abstract: http://www.sciencemag.org/content/344/6191/1492

This discussion has been archived. No new comments can be posted.

New Algorithm Simplifies the Categorization of Large Amount of Data | Log In/Create an Account | Top | 11 comments | Search Discussion

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

SoylentNews

SoylentNews is people

Navigation

Sections

SoylentNews

Log In

New Algorithm Simplifies the Categorization of Large Amount of Data

it's racist profiling! it's racist profiling! (Score: 0) by Anonymous Coward on Saturday June 28 2014, @10:18AM

Re:it's racist profiling! Re:it's racist profiling! (Score: 4, Insightful) by c0lo on Saturday June 28 2014, @10:39AM

Re:it's racist profiling! Re:it's racist profiling! (Score: 1, Informative) by Anonymous Coward on Saturday June 28 2014, @01:46PM

Re:it's racist profiling! (Score: 3, Insightful) by opinionated_science on Saturday June 28 2014, @03:36PM

Re:it's racist profiling! (Score: 2) by Geotti on Saturday June 28 2014, @09:48PM

Considerations Considerations (Score: 2) by meisterister on Saturday June 28 2014, @03:41PM

Re:Considerations (Score: 3, Interesting) by c0lo on Sunday June 29 2014, @12:14AM

Re:Considerations Re:Considerations (Score: 2, Interesting) by TGV on Sunday June 29 2014, @05:19AM

Re:Considerations (Score: 2) by opinionated_science on Sunday June 29 2014, @02:36PM

A new spin? (Score: 0) by Anonymous Coward on Saturday June 28 2014, @03:58PM

algorithm to find different groups of preferences? (Score: 3, Interesting) by TheLink on Saturday June 28 2014, @06:18PM

SoylentNews

SoylentNews is people

Navigation

Sections

SoylentNews

Log In

Related Links

New Algorithm Simplifies the Categorization of Large Amount of Data

it's racist profiling! it's racist profiling! (Score: 0) by Anonymous Coward on Saturday June 28 2014, @10:18AM

Re:it's racist profiling! Re:it's racist profiling! (Score: 4, Insightful) by c0lo on Saturday June 28 2014, @10:39AM

Re:it's racist profiling! Re:it's racist profiling! (Score: 1, Informative) by Anonymous Coward on Saturday June 28 2014, @01:46PM

Re:it's racist profiling! (Score: 3, Insightful) by opinionated_science on Saturday June 28 2014, @03:36PM

Re:it's racist profiling! (Score: 2) by Geotti on Saturday June 28 2014, @09:48PM

Considerations Considerations (Score: 2) by meisterister on Saturday June 28 2014, @03:41PM

Re:Considerations (Score: 3, Interesting) by c0lo on Sunday June 29 2014, @12:14AM

Re:Considerations Re:Considerations (Score: 2, Interesting) by TGV on Sunday June 29 2014, @05:19AM

Re:Considerations (Score: 2) by opinionated_science on Sunday June 29 2014, @02:36PM

A new spin? (Score: 0) by Anonymous Coward on Saturday June 28 2014, @03:58PM

algorithm to find different groups of preferences? (Score: 3, Interesting) by TheLink on Saturday June 28 2014, @06:18PM