SoylentNews is people

posted by takyon on Wednesday November 21 2018, @04:00PM
from the found-and-lost dept.

The privacy-oriented search engine Findx has shut down: https://privacore.github.io/

The reasons cited are:

  • While people are starting to understand the importance of privacy it is a major hurdle to get them to select a different search engine.
  • Search engines eat resources like crazy, so operating costs are non-negligible.
  • Some sites (including e.g. github) use a whitelist in robots.txt, blocking new crawlers.
  • The amount of spam, link-farms, referrer-linking, etc. is beyond your worst nightmare.
  • Returning good results takes a long time to fine-tune.
  • Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns.
  • Buying search results from other search engines is impossible until you have at least x million searches/month. Getting x million searches/month is impossible unless you buy search results from other search engines (or sink a lot of cash into making it yourself).
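
To illustrate the robots.txt whitelist point in the list above, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URL, the `NewCrawlerBot` name and the `can_crawl` helper are all made up for illustration; they are not copied from github or any other real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt (an assumption, not a real file):
# named crawlers are allowed, every other user agent is disallowed.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"
    # "NewCrawlerBot" stands in for any crawler that is not on the whitelist.
    for agent in ("Googlebot", "bingbot", "NewCrawlerBot"):
        print(agent, can_crawl(WHITELIST_ROBOTS_TXT, agent, url))
```

A well-behaved new crawler falls through to the `User-agent: *` rule and is shut out, while the already established crawlers are let in by name.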

So what do you soylentils think can be done to increase privacy for ordinary users, search-engine-wise?

Disclaimer: I worked at Findx.


  • (Score: 3, Interesting) by ledow on Wednesday November 21 2018, @04:58PM (2 children)

    Agreed.

    No idea who these people are and they're already dead in the water.

    "While people are starting to understand the importance of privacy it is a major hurdle to get them to select a different search engine." - sure, if I've never heard of you.
    "Search engines eat resources like crazy, so operating costs are non-negligible." - depends on what you're doing. If you're just indexing the Internet, a rack or two of servers and a decent backbone connection in a datacentre would be a start. That's far from prohibitively expensive but, hey, maybe you could have asked people to help you out and had software they could run to contribute to your indexing efforts, etc. - you know, like collaboration, open-source, projects like Streetmap, etc.
    "Some sites (including e.g. github) use a whitelist in robots.txt, blocking new crawlers." - Sure, that's a problem - for any search engine whatsoever. So don't index github and also DON'T use their services, like you are for your shutdown message!
    "The amount of spam, link-farms, referrer-linking, etc. is beyond your worst nightmare." - Probably. That's what graph analysis is for; you only need to suck in the HTML and index it.
    "Returning good results takes a long time to fine-tune." - Welcome to the problem of search engines... it's not about plucking data from a database, but about finding human-relevant data within it. Can't be solved by brute-force alone, so why would you try?
    "Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns." - Advertising is just one way to monetise. If you were relying on it, you picked a really bad business model.
    "Buying search results from other search engines is impossible until you have at least x million searches/month. Getting x million searches/month is impossible unless you buy search results from other search engines (or sink a lot of cash into making it yourself)." - Why on earth would you buy other people's data, or expect millions of searches a month as a nobody that no-one's heard of?

    Honestly, if you were going to do this, really seriously, then I'd have had SETI@Home-like software that people could (voluntarily) run on their desktops or servers, which would help index sites (at the very least URL 1 contains links to URLs 2, 3 and 4 and contains these keywords - that's a ton of processing and network bandwidth saved right there) and report back to a central server. That server would almost certainly be an Elastic Cloud or similar service so it could grow with demand and cost nothing when idle (and I suspect their servers were doing far more indexing than ever serving results). That would have to have a business model that is not "we'll stick adverts in and hope for the best" (especially when claiming to be privacy-conscious!). And then I would spend the rest of the time/effort/money getting the word out on geeky sites, into places like the Linux and open-source communities, and try to get deals with TorBrowser and similar to just be a simple "other search engine" in some way.

    These people set up a minuscule search engine a year ago between three of them and expect to be inside Firefox and challenging the big boys with zero money... it's just laughable.
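
The volunteer-side report described in this comment ("URL 1 contains links to URLs 2, 3 and 4 and contains these keywords") could look roughly like the Python sketch below. It is only an illustration built on the standard library; the `LinkAndTextParser` and `build_report` names, the report layout and the crude keyword rule are assumptions, not anything Findx or SETI@Home actually used.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and word counts from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = Counter()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Crude keyword rule (an assumption): ASCII words of 3+ letters.
            self.words.update(re.findall(r"[a-z]{3,}", data.lower()))


def build_report(url, html, top_n=5):
    """Build the small report a volunteer client would send to the server."""
    parser = LinkAndTextParser(url)
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "links": sorted(set(parser.links)),
        "keywords": [word for word, _ in parser.words.most_common(top_n)],
    }
```

The central server would then receive a short list of links and keywords per page instead of the full HTML, which is where the claimed bandwidth saving would come from.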

  • (Score: 1, Insightful) by Anonymous Coward on Wednesday November 21 2018, @05:24PM

    Your idea is interesting but could be used to manipulate results very easily. Client-controlled crawling is very tricky to get just right. Even things like SETI had cheaters, and there is nothing to gain there but 'internet points'.

    You would have to be very careful to make it so your results would be verified in some way (double the crawling of each site and a verification). Using a cloud service is a decent idea. But you need to be careful early as the thing could spider quite quickly (meaning your costs are large as you point out). You also need to look out for SEO tricks, such as having 2 servers look like 100 servers serving 10 million pages that point to 1 site to make it look important. You need to watch out for malicious actors. They have honeypots built to bog down spiders, because they have had issues with them in the past. Also, that they lead off with censorship would not sit well with privacy-conscious people either. It is a form of manipulation most people in that crowd will violently react to. Basically they missed their audience. It would be like having a store that sells the most amazing drinks in the world, but you refuse to make them. Everything else on your menu is pretty much the same as everyone else's. You have nothing new to offer and people will shop with what they know. Poof, you are out of business, all because you for some reason decided you know what your customers want, even though they came in every day saying 'hey make me this drink'. Now that is not always a good plan. But when your audience is saying "I want X" and you tell them "nope only have Y too bad" they *will* go elsewhere.
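
The "double the crawling of each site and a verification" idea at the start of this comment could be sketched in Python roughly as follows: hand each URL to more than one volunteer and accept a report only when enough independent clients agree on it. The function names, the report layout and the `min_agreeing` threshold are assumptions for illustration. Real pages can change between two crawls, so a production system would probably need a fuzzier comparison than an exact hash match.

```python
import hashlib
import json
from collections import defaultdict


def report_digest(report: dict) -> str:
    """Hash a report in canonical JSON form so equal reports compare equal."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def accept_reports(submissions, min_agreeing=2):
    """Keep one report per URL, but only if enough distinct clients agree.

    submissions: iterable of (client_id, report), where report["url"]
    identifies the crawled page. Returns {url: accepted_report}.
    """
    votes = defaultdict(lambda: defaultdict(set))  # url -> digest -> client ids
    by_digest = {}
    for client_id, report in submissions:
        digest = report_digest(report)
        votes[report["url"]][digest].add(client_id)
        by_digest[digest] = report

    accepted = {}
    for url, digests in votes.items():
        # Pick the version reported by the largest number of distinct clients.
        digest, clients = max(digests.items(), key=lambda item: len(item[1]))
        if len(clients) >= min_agreeing:
            accepted[url] = by_digest[digest]
    return accepted
```

A single client that lies about a page is outvoted, and a page that only one client has crawled stays unaccepted until a second report arrives. It does nothing against many colluding clients, which is part of why this is tricky to get right.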

    But like I said they needed to 'hit the streets' as they call it in the advertising world. They needed to tell others that they even existed. Dropping a note on SD and Hacker News once 2 years ago does not count. It means posting a lot about it. Blogging about it. What sort of challenges are you having? What sort of tech stack are you using and why? Are you building your own or just gluing something together? Getting your blogs picked up by the typical news aggregators. Tell the world why you are special. Tossing up a web page does not mean people know about you. You know about you, but no one else does. When you work for a largish company you usually do not have to worry about such things. But if you are a small company, you personally do or you hire someone to do just that.

    "it's just laughable"
    Their business plan was not great. But hopefully they 'fail upwards'. Meaning they learned what not to do and maybe some things to do. Most businesses fail. Use that for your next venture. Good luck!

  • (Score: 3, Insightful) by Pino P on Thursday November 22 2018, @01:45AM

    "While people are starting to understand the importance of privacy it is a major hurdle to get them to select a different search engine." - sure, if I've never heard of you.

    If you were running that business, how would you have advertised?

    That's what graph analysis is for, you only need suck in the HTML and index it.

    I thought Google LLC still had the exclusive license to the PageRank patent.

    "Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns." - Advertising is just one way to monetise. If you were relying on it, you picked a really bad business model.

    If you were running that business, how would you have raised revenue instead?