SoylentNews is people

posted by martyb on Wednesday August 15 2018, @01:02PM   Printer-friendly
from the won't-you-be-my-neighbor? dept.

The nearest neighbor problem asks where a new point fits into an existing data set. A few researchers set out to prove that there was no universal way to solve it. Instead, they found such a way.

If you were opening a coffee shop, there's a question you'd want answered: Where's the next closest cafe? This information would help you understand your competition.

This scenario is an example of a type of problem widely studied in computer science called "nearest neighbor" search. It asks, given a data set and a new data point, which point in your existing data is closest to your new point? It's a question that comes up in many everyday situations in areas such as genomics research, image searches and Spotify recommendations.

And unlike the coffee shop example, nearest neighbor questions are often very hard to answer. Over the past few decades, top minds in computer science have applied themselves to finding a better way to solve the problem. In particular, they've tried to address complications that arise because different data sets can use very different definitions of what it means for two points to be "close" to one another.

Now, a team of computer scientists has come up with a radically new way of solving nearest neighbor problems. In a pair of papers, five computer scientists have elaborated the first general-purpose method of solving nearest neighbor questions for complex data.

  • (Score: 2) by wonkey_monkey (279) on Wednesday August 15 2018, @06:37PM (#721880)

    A few researchers set out to prove that there was no universal way to solve it.

    What about simply testing every point in turn and seeing which one is closest?

    It didn't say anything about it being efficient...
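The "test every point in turn" idea can be sketched in a few lines of Python. This is only an illustration of a linear scan, not the method from the papers; the function names and the sample points are made up for the example. Passing the distance function as a parameter shows why the definition of "close" matters: the same data gives a different answer under a different metric.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance."""
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    """Grid distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(
    query: Point,
    points: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
) -> Point:
    """Linear scan: compare the query against every point, keep the closest."""
    if not points:
        raise ValueError("points must not be empty")
    return min(points, key=lambda p: distance(query, p))


# Hypothetical data: two existing cafes and a proposed new location.
cafes = [(3.0, 3.0), (0.0, 5.0)]
new_shop = (0.0, 0.0)

print(nearest_neighbor(new_shop, cafes, euclidean))  # (3.0, 3.0)
print(nearest_neighbor(new_shop, cafes, manhattan))  # (0.0, 5.0)
```

The scan does one distance computation per stored point for every query, so it always works but gets slow on large data sets, which is the efficiency caveat above.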

    systemd is Roko's Basilisk
  • (Score: 3, Informative) by fyngyrz (6567) on Wednesday August 15 2018, @11:09PM (#721952)

    As TFS indicated, data sets can differ significantly on what "near" is.

    For instance, geographically near doesn't mean near by road. The other side of Ft. Peck Lake here is only a couple miles away from where I am right now... but requires almost 200 miles of driving over some of the worst roads you can imagine if you don't have a boat. You need a 4WD, a great suspension, lots of power, and should bring shovels, traction pads, and emergency food.

    On the plus side, you might find a T. Rex fossil, or a Triceratops, or a duckbill, or something of equal interest, while digging your 4WD out. Again. :) Not that you're allowed to pick up the big 'saur remains, but there are large geodes (I have brought back a couple of 300 lb-plus ones), and invertebrate fossils are fair game (and some of those can be huge as well). That makes (most) boats impractical; large rocks in a boat are a very bad idea where large waves are not uncommon. Way too easy to get "that sinking feeling."

    The way I (as a user of such data in my applications) have approached this is to triage first by radius, using lat/long-blocked data, and then do a second-level pass by road, using road path data. It can still be quite a tricky process, especially when several means of access are considered and the interim paths have to comply with the means of transport chosen.
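A rough sketch of that two-pass idea, under loud assumptions: the place names, coordinates, and road lengths below are invented for illustration (they are not real Ft. Peck data), the radius filter uses a plain haversine distance rather than any lat/long blocking scheme, and the road pass is ordinary Dijkstra over a small hand-written graph. It does not model multiple means of transport.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def road_distances(graph, source):
    """Dijkstra: shortest road distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def nearest_by_road(origin, coords, graph, candidates, radius_km):
    """Pass 1: keep candidates within radius_km. Pass 2: rank them by road."""
    shortlist = [c for c in candidates
                 if haversine_km(coords[origin], coords[c]) <= radius_km]
    dist = road_distances(graph, origin)
    reachable = [c for c in shortlist if c in dist]
    return min(reachable, key=dist.get) if reachable else None


# Hypothetical places: (latitude, longitude) in degrees.
coords = {
    "home": (48.00, -106.40),
    "far_shore": (48.03, -106.40),  # a few km away across the water
    "town": (48.00, -106.00),       # roughly 30 km away in a straight line
}
# Hypothetical road lengths in km; no direct road from home to far_shore.
roads = {
    "home": {"town": 35.0},
    "town": {"home": 35.0, "far_shore": 280.0},
    "far_shore": {"town": 280.0},
}

places = ["far_shore", "town"]
by_air = min(places, key=lambda p: haversine_km(coords["home"], coords[p]))
by_road = nearest_by_road("home", coords, roads, places, radius_km=50.0)
print(by_air, by_road)  # far_shore town
```

The radius pass is cheap and only trims the candidate list; the answer comes from the road pass, which is why the two metrics can disagree so sharply.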

    Quite aside from the classical search problem, it's not an easy hill to climb. So to speak.