The Rust programming language continues to grow in popularity and now developer platform GitHub has used it to build its new code-focused search engine, Blackbird.
Instead of perusing forums for answers, GitHub wants users to use its search engine, which is currently in beta.
[...] "At first glance, building a search engine from scratch seems like a questionable decision. Why would you do that? Aren't there plenty of existing, open source solutions out there already? Why build something new?" writes GitHub's Timothy Clem.
His short answer is that GitHub hasn't found success using general text search products to power code search.
"The user experience is poor, indexing is slow, and it's expensive to host. There are some newer, code-specific open source projects out there, but they definitely don't work at GitHub's scale," he writes.
[...] The Rust-written custom search engine, Blackbird, is more efficient and gives GitHub "substantial storage savings via deduplication and guarantees a uniform load distribution across shards", according to Pavel Avgustinov, VP of software engineering at GitHub.
He argues GitHub's scale means it can't use a Unix 'grep' (global regular expression print) for search. In effect, it would be too slow when considering the possibility of processing hundreds of terabytes of code in memory. Queries would take too long.
(Score: 3, Funny) by turgid on Saturday February 11, @10:50AM (5 children)
Well if Microsoft says we should all be using Rust, I suppose I'd better start using it. For years they told us all we should be using C++.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 0) by Anonymous Coward on Saturday February 11, @03:33PM (4 children)
That was before they decided that Windows 10 would be the last Windows ever.
(Score: 2) by turgid on Saturday February 11, @04:37PM (3 children)
Is the new one going to be written in Rust?
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 3, Funny) by Freeman on Monday February 13, @02:52PM (2 children)
Nah, it's too rusty. They need something more shiny to peddle.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by turgid on Monday February 13, @03:06PM (1 child)
And I suppose they didn't invent Rust so they can't control it, can they? Perhaps they'll create a Rust++ or a Rust# which is slightly incompatible and broken in subtle ways but PHB-friendly so millions of Microsoft developers the world over will be forced to learn it.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 3, Funny) by Freeman on Monday February 13, @05:59PM
Rust# has a nice ring too. Just try not to get tetanus from using it.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2, Informative) by Anonymous Coward on Saturday February 11, @01:29PM (17 children)
Hasn't this bunch heard of indexing and caching? You don't have to support the fancier regex stuff. Prefix and suffix indexing has been around for ages.
Also indexing hundreds of TB of text isn't that much nowadays for a global level company. Just a few dozen computers and hundreds of SSDs should do it.
FWIW there are some papers out there on fast regex indexing/search.
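As a rough illustration of the prefix and suffix indexing mentioned above (and not a description of how Blackbird or any GitHub system works), one sorted list of tokens can answer prefix queries by binary search, and a second sorted list of reversed tokens can answer suffix queries the same way. The class name `PrefixSuffixIndex` and the sample tokens are made up for this sketch.

```python
import bisect


class PrefixSuffixIndex:
    """Toy index: sorted tokens for prefix lookups, sorted reversed tokens for suffix lookups."""

    def __init__(self, tokens):
        unique = set(tokens)
        self._forward = sorted(unique)
        # A suffix of a token is a prefix of the reversed token.
        self._backward = sorted(t[::-1] for t in unique)

    @staticmethod
    def _range(sorted_items, prefix):
        # Binary search finds the first candidate; matches are contiguous after it.
        lo = bisect.bisect_left(sorted_items, prefix)
        hi = lo
        while hi < len(sorted_items) and sorted_items[hi].startswith(prefix):
            hi += 1
        return sorted_items[lo:hi]

    def with_prefix(self, prefix):
        return self._range(self._forward, prefix)

    def with_suffix(self, suffix):
        matches = self._range(self._backward, suffix[::-1])
        return sorted(t[::-1] for t in matches)


idx = PrefixSuffixIndex(["read_file", "read_line", "write_file", "reader", "open"])
print(idx.with_prefix("read"))
print(idx.with_suffix("_file"))
```

A real code search index would also have to map each token back to the files and positions where it occurs, which this sketch leaves out.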
(Score: 3, Informative) by turgid on Saturday February 11, @01:55PM (4 children)
GitHub is Microsoft, remember. They're not the sharpest implements in the box.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 0) by Anonymous Coward on Saturday February 11, @03:34PM (3 children)
SQL Management Studio takes ages to launch nowadays. Teams search often doesn't search - it finds stuff but you can't click on it and go to the message and the context.
Maybe Microsoft's ChatGPT stuff will give about as useless/inaccurate results but with more human-like prose.
(Score: 0) by Anonymous Coward on Saturday February 11, @06:57PM (2 children)
> And that explains the dismal state of Windows (8, 10, 11).
Was there a time when Windows was not in a dismal state?
(Score: 1, Informative) by Anonymous Coward on Sunday February 12, @10:56AM
In contrast Desktop Linux has been dismal for ages. With developers wasting time on stuff like "wobbly windows".
(Score: 2) by maxwell demon on Sunday February 12, @11:25AM
Yes, before they started development, Windows was in the best state any Microsoft operating system has ever been in: Nonexistence.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 4, Interesting) by HiThere on Saturday February 11, @02:26PM (10 children)
My first take was that it was the work of some Rust fan who wanted to prove that the language had some merit. Well, it has, but so have Haskell and Erlang. I surveyed the languages out there for my current project and even tried a couple. I decided on C++. Rust didn't make the first cut. It's not that I couldn't do it in Rust (I'm not sure), it's that I didn't like it. (The one I liked was D, but C would actually have been best except that I need hash tables.)
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 3, Touché) by Rosco P. Coltrane on Saturday February 11, @03:23PM (1 child)
With such impartial and technically-grounded arguments, I'm totally convinced!
(Score: 4, Informative) by HiThere on Saturday February 11, @08:58PM
No single datapoint will be decisive. It depends on your project, your existing skills, and how much you feel like trying another language. But I saw little in Rust to recommend it over Go, D, or Erlang. Each of those would be better than Rust for some projects, and worse for others. C would have been best, because libraries generated in C are the most portable, but I needed a hash table, and didn't want to code my own or depend on some other (non-standard) library. (Also the glib hash table was...well, the documentation was difficult to parse. If I really need C I can rewrite later. It's not quite a trivial rewrite, as I use vector quite a lot, but pretty easy. But I'd probably use a hash table from one of my reference books rather than add an external library, even that one.)
Rust? What languages can easily import libraries written in Rust? They may exist, or even be common, but I haven't run across references to them, which indicated to me that they'd be poorly documented.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 3, Interesting) by RS3 on Saturday February 11, @03:42PM (6 children)
I haven't worked with hash tables in programming yet. You mean like this?
https://www.tutorialspoint.com/data_structures_algorithms/hash_data_structure.htm [tutorialspoint.com]
(Score: 3, Informative) by HiThere on Saturday February 11, @09:03PM (5 children)
Hash tables: in Python the name is dictionary (dict), in C++ it's unordered_map. I don't remember what it's called in Java, but it's there. Most modern languages come with hash tables built in, but C dates from back when RAM was really precious. But I don't know why they haven't added them to the standard library.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
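For readers who, like the parent poster, haven't worked with hash tables: the sketch below shows roughly what a built-in dictionary does under the hood, using separate chaining. It is a teaching example only. The class name `ChainedHashTable`, the initial bucket count, and the resize threshold are arbitrary choices for illustration, and real implementations such as Python's dict are organized differently.

```python
class ChainedHashTable:
    """Minimal hash table with separate chaining; a teaching sketch, not a dict replacement."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]
        self._size = 0

    def _bucket(self, key):
        # The hash picks a bucket; keys that collide share the same list.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        # Grow when chains get long on average, so lookups stay fast.
        if self._size > 2 * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in pairs:
            self._bucket(key).append((key, value))

    def __len__(self):
        return self._size


table = ChainedHashTable()
table.put("rust", 2015)
table.put("c", 1972)
print(table.get("rust"), table.get("d", "unknown"))
```

The same structure ports to C as an array of linked lists plus a hash function, which is the part the C standard library does not supply.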
(Score: 2) by RS3 on Saturday February 11, @09:07PM (4 children)
Would these [thoughtco.com] help?
(Score: 2) by HiThere on Sunday February 12, @12:24AM (3 children)
Not really, as I don't want to depend on an external library, but I've bookmarked that link because it may be what I want for some other project.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2) by RS3 on Sunday February 12, @12:49AM (2 children)
I wouldn't either. My thought was get some open source, check it over, and use it in your project, or compile it to your own static or dynamic library. I think? Maybe? Unless you're bound in some situation where you can't use open source. BTW, I don't do enough programming so maybe I'm completely off base here, but AFAIK, most C functions are in some kind of library, right? https://en.wikipedia.org/wiki/C_standard_library [wikipedia.org]
maybe a little cleaner format: https://en.wikibooks.org/wiki/C_Programming/Standard_libraries [wikibooks.org]
https://en.cppreference.com/w/cpp/header [cppreference.com]
(Score: 2) by HiThere on Sunday February 12, @01:47AM (1 child)
Ah. OK. I plan to GPL the code after it's done, so that's not a problem. But I've already got source, I'd just need to adapt it.
But C++ is nearly as good as C for this purpose, as most things can take C++ libraries. (I could switch the vectors out pretty easily, as they're all vector. So if I need to do the conversion it's no big thing. I'd need to switch fstream to FILE*, and things like that.)
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2) by RS3 on Sunday February 12, @02:36AM
Yay, he gets me! I'm terrible at concisely verbalizing my ideas.
The link I included to the open-source hash stuff was straight C, I believe (too tired to look...)
BTW, I like your tagline (did I mention that before?) My first comments on /. more than 20 years ago were about my worries with the trouble javascript can cause. It has far too much power / ability / functionality (hence most malware comes through javascript functionality...)
(Score: 2) by JoeMerchant on Saturday February 11, @04:38PM
Nevermind the drag-drop language for children jokes...
Was this "from scratch" effort really clean room coding from requirements, or did they run a C++ -> Rust translator on the existing code base?
Ukraine is still not part of Russia. Glory to Ukraine🌻 https://news.stanford.edu/2023/02/17/will-russia-ukraine-war-end
(Score: 2) by Freeman on Monday February 13, @06:03PM
"Fast" regex indexing/search, sounds like an oxymoron. You need something to take 10x longer and be 10x harder to figure out what you did last time. Just do it in RegEx.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by Rosco P. Coltrane on Saturday February 11, @02:55PM (1 child)
That's surprising. They've been flogging their new acquisition so hard lately, it's hard to believe they came up with a search product that doesn't use it.
(Score: 1) by psa on Sunday February 12, @02:28AM
Yet. Doesn't use AI, yet. I'm sure it's on the roadmap.
(Score: 2) by turgid on Monday February 13, @06:35PM
Don't forget the Hammerite [hammerite.co.uk] to cover all those holes the borrow checker can't reach.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].