SoylentNews Comments | Universal Method to Sort Complex Information Found

Universal Method to Sort Complex Information Found

posted by martyb on Wednesday August 15 2018, @01:02PM

from the won't-you-be-my-neighbor? dept.

edIII writes:

The nearest neighbor problem asks where a new point fits in to an existing data set. A few researchers set out to prove that there was no universal way to solve it. Instead, they found such a way.

If you were opening a coffee shop, there's a question you'd want answered: Where's the next closest cafe? This information would help you understand your competition.
This scenario is an example of a type of problem widely studied in computer science called "nearest neighbor" search. It asks, given a data set and a new data point, which point in your existing data is closest to your new point? It's a question that comes up in many everyday situations in areas such as genomics research, image searches and Spotify recommendations.
And unlike the coffee shop example, nearest neighbor questions are often very hard to answer. Over the past few decades, top minds in computer science have applied themselves to finding a better way to solve the problem. In particular, they've tried to address complications that arise because different data sets can use very different definitions of what it means for two points to be "close" to one another.
Now, a team of computer scientists has come up with a radically new way of solving nearest neighbor problems. In a pair of papers, five computer scientists have elaborated the first general-purpose method of solving nearest neighbor questions for complex data.

Original Submission

This discussion has been archived. No new comments can be posted.

Universal Method to Sort Complex Information Found | Log In/Create an Account | Top | 16 comments | Search Discussion

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

(1)

The two papers - PDFsThe two papers - PDFs (Score: 5, Informative) by c0lo on Wednesday August 15 2018, @01:12PM (1 child)

by c0lo (156) on Wednesday August 15 2018, @01:12PM (#721757) Journal

Data-Dependent Hashing via Nonlinear Spectral Gaps [ilyaraz.org]
and
Hölder Homeomorphisms and Approximate Nearest Neighbors [ilyaraz.org]
Happy reading.

--
https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
- Re:The two papers - PDFs(Score: 1) by khallow on Sunday August 19 2018, @01:35AM
  
  by khallow (3766) on Sunday August 19 2018, @01:35AM (#723234) Journal
  
  There's some serious machinery hiding in those papers. What's interesting about this is that the machinery appears new. For example, I've half-heartedly traced back the use of "non-linear spectral gaps" and "expanders" to a 2007 paper [projecteuclid.org] by Vincent Lafforgue (which is at present impenetrable to me since my command of French is, shall we say, very distant from perfect?). To move from an abstract paper in normed Banach spaced to an application in computer science (even if it's a bit on the theoretical side) in around 11 years is a pretty impressive display of cross-disciplinary activity.
  
  Parent
Hasn't Google solved this?Hasn't Google solved this? (Score: 1, Interesting) by Anonymous Coward on Wednesday August 15 2018, @02:12PM (5 children)

by Anonymous Coward on Wednesday August 15 2018, @02:12PM (#721779)

and if not, why not
- Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 2) by takyon on Wednesday August 15 2018, @02:18PM (3 children)
  
  by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Wednesday August 15 2018, @02:18PM (#721781) Journal
  
  Why solve when you can buy?
  
  --
  [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  
  Parent
  - Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 0) by Anonymous Coward on Wednesday August 15 2018, @06:06PM (2 children)
    
    by Anonymous Coward on Wednesday August 15 2018, @06:06PM (#721868)
    
    Why buy when you can steal?
    
    Parent
    - Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 0) by Anonymous Coward on Wednesday August 15 2018, @07:03PM (1 child)
      
      by Anonymous Coward on Wednesday August 15 2018, @07:03PM (#721889)
      
      Why steal when you can control?
      
      Parent
      - Why, o why(Score: 2) by fyngyrz on Wednesday August 15 2018, @10:56PM
        
        by fyngyrz (6567) on Wednesday August 15 2018, @10:56PM (#721950) Journal
        
        Why control when you can simply disenfranchise?
        
        Parent
- Re:Hasn't Google solved this?(Score: 4, Informative) by Non Sequor on Wednesday August 15 2018, @02:40PM
  
  by Non Sequor (1005) on Wednesday August 15 2018, @02:40PM (#721789) Journal
  
  Well, the graph theory version of the problem actually has intractable cases, so-called expander graphs that are both highly connected (you have to remove a lot of vertices or edges to split the graph in two pieces) and with low average degree (most vertices aren’t connected to many others).
  
  --
  Write your congressman. Tell him he sucks.
  
  Parent
I figured this out years agoI figured this out years ago (Score: 5, Funny) by Kalas on Wednesday August 15 2018, @02:41PM (4 children)

by Kalas (4247) on Wednesday August 15 2018, @02:41PM (#721791)

When I find I'm having too much trouble sorting data sets I just send it all to /dev/null.
It may not make the boss too happy but it damn sure relieves my stress, at least until the next job hunt begins. :P
- Re:I figured this out years agoRe:I figured this out years ago (Score: 4, Funny) by shrewdsheep on Wednesday August 15 2018, @03:35PM (1 child)
  
  by shrewdsheep (5215) on Wednesday August 15 2018, @03:35PM (#721821)
  
  I suggest you pipe to /dev/random next time. For starters, you safe the world from global warming by providing free random numbers (they are not sorted, you said) to others. If they data is from HR or marketing, they can even be used for cryptographic purposes (undistinguishable from pure white noise). Finally, your boss will promote you for having gotten rid of his incriminating emails while simultaneously having created marketable property.
  
  Parent
  - Re:I figured this out years ago(Score: 0) by Anonymous Coward on Wednesday August 15 2018, @05:36PM
    
    by Anonymous Coward on Wednesday August 15 2018, @05:36PM (#721858)
    
    Shouldn't you wipe incriminating emails with a cloth instead?
    
    Parent
- Re:I figured this out years ago(Score: 3, Funny) by requerdanos on Wednesday August 15 2018, @05:01PM
  
  by requerdanos (5997) on Wednesday August 15 2018, @05:01PM (#721845) Journal
  
  When I find I'm having too much trouble sorting data sets I just send it all to /dev/null.
  If /dev/null is fast I will use it... You turn it on and it scales right up.
  
  Parent
- Re:I figured this out years ago(Score: 4, Funny) by jasassin on Wednesday August 15 2018, @10:32PM
  
  by jasassin (3566) <jasassin@gmail.com> on Wednesday August 15 2018, @10:32PM (#721941) Homepage Journal
  
  hen I find I'm having too much trouble sorting data sets I just send it all to /dev/null.
  It may not make the boss too happy but it damn sure relieves my stress, at least until the next job hunt begins. :P
  That technique really speeds up backups also!
  
  --
  jasassin@gmail.com GPG Key ID: 0xE6462C68A9A3DB5A
  
  Parent
That's important research(Score: 4, Funny) by bob_super on Wednesday August 15 2018, @05:25PM

by bob_super (1357) on Wednesday August 15 2018, @05:25PM (#721853)

Remember, folks: You need to optimize finding the nearest neighbor, so you know where to build your wall. Everyone else, you can handle with tariffs and insults.
Presumably I'm missing something...Presumably I'm missing something... (Score: 2) by wonkey_monkey on Wednesday August 15 2018, @06:37PM (1 child)

by wonkey_monkey (279) on Wednesday August 15 2018, @06:37PM (#721880) Homepage

A few researchers set out to prove that there was no universal way to solve it.
What about simply testing every point in turn and seeing which one is closest?
It didn't say anything about it being efficient...

--
systemd is Roko's Basilisk
- Yes, you are(Score: 3, Informative) by fyngyrz on Wednesday August 15 2018, @11:09PM
  
  by fyngyrz (6567) on Wednesday August 15 2018, @11:09PM (#721952) Journal
  
  As TFS indicated, data sets can differ significantly on what "near" is.
  For instance, geographically near doesn't mean near by road. The other side of Ft. Peck Lake here is only a couple miles away from where I am right now... but requires almost 200 miles of driving over some of the worst roads you can imagine if you don't have a boat. You need a 4WD, a great suspension, lots of power, and should bring shovels, traction pads, and emergency food.
  On the plus side, you might find a T. Rex fossil, or a Triceratops, or a duckbill, something of equal interest, while digging your 4WD out. Again. :) Not that you're allowed to pick up the big 'saur remains, but there are large geodes (I have brought back a couple of 300 lb-plus ones), and invertebrate fossils are fair game (and some of those can be huge as well.) Which makes (most) boats impractical; large rocks in a boat are a very bad idea where large waves are not uncommon. Way too easy to get "that sinking feeling."
  The way I (as a user of such data in my applications) have approached this is to first triage by a radius using lat/long blocked data, then do a second level pass by road using road path data. It can still be quite a tricky process, especially when various means of access are considered, and the interim paths chosen have to comply with the various means of transport chosen.
  Quite aside from the classical search problem, it's not an easy hill to climb. So to speak.
  
  Parent

Moderator Help

SoylentNews

SoylentNews is people

Navigation

Sections

SoylentNews

Universal Method to Sort Complex Information Found

The two papers - PDFsThe two papers - PDFs (Score: 5, Informative) by c0lo on Wednesday August 15 2018, @01:12PM (1 child)

Re:The two papers - PDFs(Score: 1) by khallow on Sunday August 19 2018, @01:35AM

Hasn't Google solved this?Hasn't Google solved this? (Score: 1, Interesting) by Anonymous Coward on Wednesday August 15 2018, @02:12PM (5 children)

Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 2) by takyon on Wednesday August 15 2018, @02:18PM (3 children)

Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 0) by Anonymous Coward on Wednesday August 15 2018, @06:06PM (2 children)

Re:Hasn't Google solved this?Re:Hasn't Google solved this? (Score: 0) by Anonymous Coward on Wednesday August 15 2018, @07:03PM (1 child)

Why, o why(Score: 2) by fyngyrz on Wednesday August 15 2018, @10:56PM

Re:Hasn't Google solved this?(Score: 4, Informative) by Non Sequor on Wednesday August 15 2018, @02:40PM

I figured this out years agoI figured this out years ago (Score: 5, Funny) by Kalas on Wednesday August 15 2018, @02:41PM (4 children)

Re:I figured this out years agoRe:I figured this out years ago (Score: 4, Funny) by shrewdsheep on Wednesday August 15 2018, @03:35PM (1 child)

Re:I figured this out years ago(Score: 0) by Anonymous Coward on Wednesday August 15 2018, @05:36PM

Re:I figured this out years ago(Score: 3, Funny) by requerdanos on Wednesday August 15 2018, @05:01PM

Re:I figured this out years ago(Score: 4, Funny) by jasassin on Wednesday August 15 2018, @10:32PM

That's important research(Score: 4, Funny) by bob_super on Wednesday August 15 2018, @05:25PM

Presumably I'm missing something...Presumably I'm missing something... (Score: 2) by wonkey_monkey on Wednesday August 15 2018, @06:37PM (1 child)

Yes, you are(Score: 3, Informative) by fyngyrz on Wednesday August 15 2018, @11:09PM