Stories
Slash Boxes
Comments

SoylentNews is people

Submission Preview

Link to Story

Undergraduate Upends a 40-Year-Old Data Science Conjecture

Accepted submission by RamiK at 2025-02-11 11:59:35
Software

In a 1985 paper, the computer scientist Andrew Yao, who would go on to win the A.M. Turing Award, asserted that among hash tables with a specific set of properties, the best way to find an individual element or an empty spot is to just go through potential spots randomly — an approach known as uniform probing. He also stated that, in the worst-case scenario, where you’re searching for the last remaining open spot, you can never do better than x. for 40 years, most computer scientists assumed that Yao’s conjecture was true.

Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table — one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)2 — far faster than x. This result directly contradicted Yao’s conjecture.
...
“It’s not just that they disproved [Yao’s conjecture], they also found the best possible answer to his question,” said Sepehr Assadi of the University of Waterloo.
...
In addition to refuting Yao’s conjecture, the new paper also contains what many consider an even more astonishing result. It pertains to a related, though slightly different, situation: In 1985, Yao looked not only at the worst-case times for queries, but also at the average time taken across all possible queries. He proved that hash tables with certain properties — including those that are labeled “greedy,” which means that new elements must be placed in the first available spot — could never achieve an average time better than log x.
...
They showed that it did not by providing a counterexample, a non-greedy hash table with an average query time that’s much, much better than log x. In fact, it doesn’t depend on x at all. “You get a number,” Farach-Colton said, “something that is just a constant and doesn’t depend on how full the hash table is.” The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves.

( https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/ [quantamagazine.org] )


Original Submission