While people are starting to understand the importance of privacy it is a major hurdle to get them to select a different search engine.
Search engines eat resources like crazy, so operating costs are non-negligible.
Some sites (including e.g. github) use a whitelist in robots.txt, blocking new crawlers.
The amount of spam, link-farms, referrer-linking, etc. is beyond your worst nightmare.
Returning good results takes a long time to fine-tune.
Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns.
Buying search results from other search engines is impossible until you have least x million searches/month. Getting x million searches/month is impossible unless you buy search results from other search engines (or sink a lot of cash into making it yourself).
So what do you soylentils think can be done to increase privacy for ordinary users, search-engine-wise ?
There is Gigablast which is on GitHub that did implement an open source search engine.
Findx used the gigablast open-soure-search-engine. In hindsight that was a mistake. Email me if you want details. I'm not going to rant about it here.
Regarding sharding: Yes, you have to shard. You also have to build in the assumptions that shards will fail, so you need redundancy and a way to deal with inconsistencies.
(Score: 3, Informative) by isj on Thursday November 22 2018, @05:34PM
Findx used the gigablast open-soure-search-engine. In hindsight that was a mistake. Email me if you want details. I'm not going to rant about it here.
Regarding sharding: Yes, you have to shard. You also have to build in the assumptions that shards will fail, so you need redundancy and a way to deal with inconsistencies.