While people are starting to understand the importance of privacy, it is still a major hurdle to get them to switch to a different search engine.
Search engines eat resources like crazy, so operating costs are non-negligible.
Some sites (GitHub, for example) use a whitelist in robots.txt, which blocks new crawlers.
The amount of spam, link-farms, referrer-linking, etc. is beyond your worst nightmare.
Returning good results takes a long time to fine-tune.
Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns.
Buying search results from other search engines is impossible until you have at least x million searches/month. Getting x million searches/month is impossible unless you buy search results from other search engines (or sink a lot of cash into building the index yourself).
So what do you soylentils think can be done to increase privacy for ordinary users, search-engine-wise?
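To make the robots.txt whitelist point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content is an illustrative assumption (it is not copied from GitHub or any real site), and the crawler name NewPrivacyBot is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative whitelist-style robots.txt (an assumption, not a real site's file):
# two established crawlers are allowed, everyone else is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"
    for agent in ("Googlebot", "NewPrivacyBot"):  # second name is hypothetical
        print(agent, can_crawl(agent, url))
```

A well-behaved new crawler falls through to the catch-all rule and is refused, so it cannot index the site unless the site owner adds it to the list.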
(Score: 2) by eravnrekaree on Thursday November 22 2018, @05:12PM (1 child)
The problem with a run-of-the-mill MySQL setup is that anything using a single monolithic SQL server would crash and burn under the load. It would need to be highly distributed; the term is "scale out": hundreds of SQL server instances across a farm. That means a lot of load balancing, mirroring, slicing, distribution, etc., in a kind of mesh architecture that avoids a single point of load. Sharding has been used to deal with these sorts of things as well. SQL itself may not even be the big problem; the big problem is the implementations of it, which usually revolve around a single monolithic server (or a few) and where replication is primitive. Most of the scale-out research, however, has gone into NoSQL databases.
There is also Gigablast, an open-source search engine that is available on GitHub.
(Score: 3, Informative) by isj on Thursday November 22 2018, @05:34PM
Findx used the Gigablast open-source-search-engine. In hindsight that was a mistake. Email me if you want details. I'm not going to rant about it here.
Regarding sharding: Yes, you have to shard. You also have to build in the assumption that shards will fail, so you need redundancy and a way to deal with inconsistencies.
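A toy sketch of what that can look like, assuming hash-based shard selection, a fixed number of replicas per shard, and caller-supplied version numbers to resolve inconsistencies. All class and method names here are hypothetical; this is not how Findx or Gigablast does it, just one simple way to illustrate the idea in memory.

```python
import hashlib


class Replica:
    """One copy of a shard; `up` simulates whether the node is reachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (version, value)


class ShardedStore:
    """Hypothetical in-memory store: keys are hashed to a shard, and each
    shard is held by several replicas so one failure does not lose data."""

    def __init__(self, num_shards=4, replicas_per_shard=2):
        self.shards = [
            [Replica(f"shard{s}-replica{r}") for r in range(replicas_per_shard)]
            for s in range(num_shards)
        ]

    def _shard_for(self, key):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def _live(self, key):
        live = [r for r in self._shard_for(key) if r.up]
        if not live:
            raise RuntimeError(f"all replicas down for key {key!r}")
        return live

    def put(self, key, value, version):
        # Write to every reachable replica; unreachable ones become stale.
        for replica in self._live(key):
            replica.data[key] = (version, value)

    def get(self, key):
        live = self._live(key)
        answers = [r.data[key] for r in live if key in r.data]
        if not answers:
            return None
        newest = max(answers, key=lambda a: a[0])  # highest version wins
        # Read repair: bring stale or empty live replicas up to date.
        for replica in live:
            if replica.data.get(key) != newest:
                replica.data[key] = newest
        return newest[1]
```

Writes go to whichever replicas are reachable, so a replica that was down comes back stale. Reads pick the highest version among the live replicas and repair the rest. A real system would also need to generate the versions itself, handle concurrent writers, and re-replicate data when a node is lost for good.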