
posted by takyon on Wednesday November 21 2018, @04:00PM   Printer-friendly
from the found-and-lost dept.

The privacy-oriented search engine Findx has shut down: https://privacore.github.io/

The reasons cited are:

  • While people are starting to understand the importance of privacy it is a major hurdle to get them to select a different search engine.
  • Search engines eat resources like crazy, so operating costs are non-negligible.
  • Some sites (including e.g. github) use a whitelist in robots.txt, blocking new crawlers.
  • The amount of spam, link-farms, referrer-linking, etc. is beyond your worst nightmare.
  • Returning good results takes a long time to fine-tune.
  • Monetizing is nearly impossible because advertising networks want to know everything about the users, going against privacy concerns.
  • Buying search results from other search engines is impossible until you have least x million searches/month. Getting x million searches/month is impossible unless you buy search results from other search engines (or sink a lot of cash into making it yourself).
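The whitelist-in-robots.txt problem can be seen with Python's standard urllib.robotparser. This is a minimal sketch; "NewBot" is a hypothetical name standing in for any newly launched crawler:

```python
from urllib.robotparser import RobotFileParser

# A whitelist-style robots.txt: a known bot is allowed, everyone else is not.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The established crawler gets in; the newcomer is locked out.
print(parser.can_fetch("Googlebot", "https://example.com/page"))  # True
print(parser.can_fetch("NewBot", "https://example.com/page"))     # False
```

A site using such a whitelist is effectively invisible to any new search engine, no matter how well-behaved its crawler is.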

So what do you soylentils think can be done to increase privacy for ordinary users, search-engine-wise?

Disclaimer: I worked at Findx.



 
  • (Score: 3, Interesting) by isj on Thursday November 22 2018, @12:06AM (1 child)


    I'm just saying that matching them at [google's] own game is not trivial.

    I agree. That doesn't mean that there isn't room for improvement in google's results.

    Examples I can think of I encountered in my work at Findx:

    Bias toward shops
    If you search for a single word, e.g. "plasterboard", the google results will have a strong bias toward shops where you can buy it. No reviews. No building codes. No evaluations by consumer organisations. So if you search for a single word, google assumes you want to buy stuff.
    Still vulnerable to SEO
    An acquaintance noticed that google never showed links to where you could buy the cheapest plasterboards. So apparently the sites with SEO and link farms made it to page 1 every time, while the most useful link for the user was buried on page 3. There isn't much quality difference between plasterboards, so wouldn't the cheapest be the "best" result?
    Handling of compound words
    I noticed that google's handling of compound words isn't that great. They claim they solved "the Swedish problem" (their name for the compound-word challenge) in 2006. But I recently saw a newspaper front page with a new compound word in an article link, and the article itself had the compound word in a different inflection. Google did have the main article crawled (verified by searching for other unique words from it), but couldn't find it using the compound word. It only started working after 3 days. I'm not sure what is going on there, but I suspect that analyzing compound words and generating inflections is done offline and in batch, and there is some lag there. If you're curious, it was the Danish word "smølfedomptør".
    Old documents ignored?
    I noticed that findx could find an old usenet post that google couldn't find. It was a 10-year-old post made available on a webpage. No clue why google didn't find it. So google apparently doesn't crawl everything, or it drops old documents.
    Apparently doesn't use third-party quality indicators
    When looking to buy something, google apparently doesn't use third-party quality seals/approvals/badges (at least we couldn't find any indication that it does). Many countries have consumer organisations that award badges to well-behaved webshops. That is a useful ranking parameter.
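    A trust badge could enter ranking as one more signal blended into the score. This is purely an illustrative sketch; the weight and field names are invented, and Findx's actual ranker is not public:

    ```python
    # Hypothetical ranking sketch: boost pages carrying a consumer-organisation
    # trust badge. Weights here are made up for illustration.
    def rank_score(text_relevance: float, has_trust_badge: bool,
                   badge_weight: float = 0.15) -> float:
        score = text_relevance
        if has_trust_badge:
            score += badge_weight * text_relevance  # relative boost for badged shops
        return score

    results = [
        {"url": "https://shop-a.example", "relevance": 0.80, "badge": True},
        {"url": "https://shop-b.example", "relevance": 0.82, "badge": False},
    ]
    # The slightly less relevant but badge-holding shop wins the tie-break.
    results.sort(key=lambda r: rank_score(r["relevance"], r["badge"]), reverse=True)
    ```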

    One more note on compound words: if you want to handle Danish/Norwegian/German/Swedish/Icelandic/Finnish/Russian (and to some extent Italian) you have to deal with compound words. Findx solved it for Danish using a morphological dictionary (STO [cst.ku.dk]). I did some (incomplete) analysis of Danish webpages, and it seemed that up to 10-30% of the unique words were compound words made on the spot. So you can never have a complete dictionary for languages that easily form compounds, and you have to deal with them in some other way.
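    The dictionary-based approach described above can be sketched as a recursive decompounder. This uses a tiny hand-made word list rather than the STO morphological dictionary, and ignores linking morphemes and inflection, so it is only a toy:

    ```python
    # Toy dictionary; a real system would use a morphological dictionary like STO.
    DICTIONARY = {"smølfe", "domptør", "plaster", "board"}

    def split_compound(word, min_part=3):
        """Return the parts of `word` if it decomposes into dictionary words, else None."""
        if word in DICTIONARY:
            return [word]
        # Try every split point; recurse on the tail.
        for i in range(min_part, len(word) - min_part + 1):
            head, tail = word[:i], word[i:]
            if head in DICTIONARY:
                rest = split_compound(tail, min_part)
                if rest:
                    return [head] + rest
        return None

    print(split_compound("smølfedomptør"))  # ['smølfe', 'domptør']
    ```

    The point of the 10-30% figure above is that `DICTIONARY` can never be complete for whole compounds, which is why splitting them into known parts is necessary at all.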

  • (Score: 2) by bobthecimmerian on Sunday November 25 2018, @03:41PM


    Thanks for the detailed response. Everything you wrote makes sense. For what it's worth, I'm sorry FindX failed. I too was unaware of it, and I had tried Yacy and Searx and a few other options that have since disappeared.