SoylentNews is people

posted by jelizondo on Friday April 03, @04:43PM

40 Google features to find exactly what you need, the alternative search engines that do things Google won't, and the reference desk framework underneath all of it:

Most of us search Google the same way we always have: type a few words, scroll, click something that looks close enough, and hope. For a while, that worked. Google handed us a list of links and let us take it from there.

What's happening now is something different. A 2024 study by SparkToro found that nearly 60% of Google searches end without anyone clicking through to a website, and the trend has accelerated since. By February 2026, Ahrefs found that queries triggering AI Overviews now see a 58% reduction in clicks. Google has been systematically inserting itself between you and the original source, answering questions with AI-generated summaries before you ever reach the page those answers came from. The results you do see are filtered through an algorithm that weighs your search history, your location, and the billions of dollars advertisers have spent to appear for particular queries. Two people searching identical phrases on the same day can get meaningfully different results without either of them knowing it. And because Google controls roughly 90% of the world's search traffic, most people have no frame of reference for what a less mediated search experience would even look like.

The search bar replaced the reference desk without replacing the skills behind it: knowing how to ask a question precisely, understanding how information is organized and who funds it, knowing the difference between a primary source and a summary of one. The assumption was that the technology made all of that unnecessary, which suited Google; a user who can't navigate information independently is a user who keeps coming back to be guided.

The search bar you already have is more capable than that arrangement requires you to know. With the right syntax, it becomes a precision instrument: narrow by domain, by date, by file type, by exact phrase. We can pull up archived pages, surface open file directories, and even find what people said in forums instead of what brands want us to find. None of it requires a new tool or a paid account. The capability has been there the whole time.
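As a rough illustration of that syntax, here is a minimal Python sketch that assembles a query from the commonly used operators (`site:`, `filetype:`, quoted phrases, `-term`, `after:`/`before:`). The helper names `build_query` and `search_url` are made up for this example, and how strictly any engine honors each operator is something to verify by trying it.

```python
from urllib.parse import urlencode


def build_query(terms, exact=None, site=None, filetype=None,
                exclude=(), after=None, before=None):
    """Join plain terms and search operators into one query string."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')            # exact phrase
    if site:
        parts.append(f"site:{site}")          # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type
    parts.extend(f"-{word}" for word in exclude)  # drop hits containing word
    if after:
        parts.append(f"after:{after}")        # date as YYYY-MM-DD
    if before:
        parts.append(f"before:{before}")
    return " ".join(parts)


def search_url(query):
    """Percent-encode the query into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(["sysctl"], exact="tcp keepalive", site="kubernetes.io",
                    exclude=["pinterest"], after="2024-01-01")
print(query)
print(search_url(query))
```

The same query string can be pasted into other engines that accept these operators; only the base URL in `search_url` is Google-specific.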

Google is constantly interpreting you. It swaps in synonyms, personalizes results based on your history, and decides what you probably meant rather than returning what you typed. Most of the time that interpretation is invisible. These tools are how you override it.
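One way to switch some of that interpretation off is to quote every word and request Google's Verbatim mode. The sketch below assumes the `tbs=li:1` URL parameter, which is what the Verbatim option under Tools has been observed to set; it is not a documented, stable interface and may change. The function name `verbatim_url` is made up for this example.

```python
from urllib.parse import urlencode


def verbatim_url(words):
    """Quote each word and request Verbatim results (assumed tbs=li:1)."""
    query = " ".join(f'"{word}"' for word in words)
    # Assumption: tbs=li:1 selects Verbatim; this is undocumented behavior.
    params = {"q": query, "tbs": "li:1"}
    return "https://www.google.com/search?" + urlencode(params)


print(verbatim_url(["tcp", "keepalive"]))
```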

Anybody have any tips or pointers to add to this?



  • (Score: 5, Insightful) by mcgrew on Friday April 03, @05:19PM (7 children)

    by mcgrew (701) <publish@mcgrewbooks.com> on Friday April 03, @05:19PM (#1438828) Homepage Journal

    It's not only the minus operator, which weeded out hits containing the term with the - in front of it; a lot of other very useful search tools have been missing for a very long time.

    Google's heyday was 2005. They have been less and less useable ever since. If I was forty again I'd write a new search engine.

    --
    Are the Republicans really in favor of genocide, or are they just cowards terrified of terrorist twit Trump?
    • (Score: 5, Informative) by corey on Friday April 03, @11:32PM (2 children)

      by corey (2202) on Friday April 03, @11:32PM (#1438848)

      I use Startpage (https://www.startpage.com), and much prefer it to Google and DuckDuckGo. It also listens when you use the - operator, and quotes around things. I find DuckDuckGo tends to go “yeah I reckon I know what you are asking for”, and spits back results that are all the same. Like there’s a layer of interpretation there before it searches the website database for keywords. Whereas Startpage tends to give me the raw results without the it-thinks-it-knows-what-I’m-talking-about crap.

      • (Score: 2) by Bentonite on Saturday April 04, @03:09AM

        by Bentonite (56146) on Saturday April 04, @03:09AM (#1438858)

        Startpage is a frontend for Google - it's mostly okay when it isn't demanding JavaScript execution.

        It seems the least sucky search engine frontend is now, ironically, https://4get.ca/ [4get.ca], as everything works without JavaScript.

        Now if only the captcha didn't have a ⅓ chance of asking the user to seriously select pictures of proprietary software.

      • (Score: 2) by driverless on Saturday April 04, @07:57AM

        by driverless (4770) on Saturday April 04, @07:57AM (#1438872)

        Same here. And for the OP's:

        Anybody have any tips or pointers to add to this?

        "Don't use Google" would be a start. In that regard StartPage is a disenshittified Google, no AI slop, no product placement, no ads, just search results.

    • (Score: 5, Interesting) by Bentonite on Saturday April 04, @03:04AM (3 children)

      by Bentonite (56146) on Saturday April 04, @03:04AM (#1438855)

      Writing a search engine isn't even required anymore - any decent database contains text search functions.
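As a minimal sketch of that point, SQLite's full-text extension can be driven from Python's standard `sqlite3` module. The table name `pages`, the helper names, and the sample documents are all made up for this illustration, and it assumes the local SQLite build ships FTS5 (with FTS4 as a fallback).

```python
import sqlite3


def build_index(docs):
    """Create an in-memory full-text index from (url, body) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url, body)")
    except sqlite3.OperationalError:
        # Assumption: builds without FTS5 usually still have FTS4,
        # which accepts the same simple MATCH queries used below.
        conn.execute("CREATE VIRTUAL TABLE pages USING fts4(url, body)")
    conn.executemany("INSERT INTO pages (url, body) VALUES (?, ?)", docs)
    return conn


def search(conn, terms):
    """Return the URLs of rows that contain every word in `terms`."""
    rows = conn.execute("SELECT url FROM pages WHERE pages MATCH ?", (terms,))
    return sorted(url for (url,) in rows)


docs = [
    ("https://example.org/a", "A robots file can set a crawl delay for polite spiders"),
    ("https://example.org/b", "Static pages degrade gracefully without scripts"),
    ("https://example.org/c", "Spiders waste time calling cgi programs"),
]
index = build_index(docs)
print(search(index, "crawl delay"))
print(search(index, "spiders"))
```

The database does the tokenizing and matching; what it does not do is fetch the pages in the first place.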

      The issue with hosting a search engine is the crawling - there are billions of websites now, many sites will block even well-behaved crawlers, and many sites don't have any text on the page without JavaScript execution (which means that if you don't have an entire datacentre full of servers, you won't have the processing resources to index those websites - but it seems the proper solution is not to index those websites).

      • (Score: 4, Interesting) by Unixnut on Saturday April 04, @11:20AM

        by Unixnut (5779) on Saturday April 04, @11:20AM (#1438878)

        but it seems the proper solution is not to index those websites

        Agreed, I'd prefer a search engine that only indexed websites that either didn't use JavaScript, or were designed properly to degrade gracefully when JS is not available. Avoiding JS will filter out a large chunk of the slop that ruins the web currently, not to mention the reduced requirements of your web crawler, as you mentioned.

      • (Score: 2) by mcgrew on Sunday April 05, @04:48PM (1 child)

        by mcgrew (701) <publish@mcgrewbooks.com> on Sunday April 05, @04:48PM (#1438969) Homepage Journal

        The spider is the honest webmaster's friend. Putting up a robots file on a page you want seen is stupid, as stupid as only allowing javascript. Any coder worth his salt wouldn't have a site like that, and no one should want to visit one.

        There are, these days, billions of sites, as you say. Most of them are not worth visiting.

        --
        Are the Republicans really in favor of genocide, or are they just cowards terrified of terrorist twit Trump?
        • (Score: 2) by Bentonite on Monday April 06, @05:30AM

          by Bentonite (56146) on Monday April 06, @05:30AM (#1439037)

          A robots.txt file is still useful and not stupid on a website you want to be seen, as it lets you specify which pages are and aren't useful to crawl, along with a crawl delay.

          In robots.txt you would disallow pages with dynamically generated information of only temporary use (like the IP address the page was accessed from), folders like cgi-bin (it just wastes processing power for a spider to be calling cgi programs without arguments), a folder that IP-blocks scrapers when they scrape it, and the folder full of GNUzip bombs.

          Yes, robots.txt allows implementing limited LLM scraper mitigations: those scrapers ignore robots.txt and so walk into the mitigations, while the mitigations are excluded in robots.txt and therefore legitimate spiders continue to crawl the website without trouble.

          `Crawl-delay: ` is also useful for websites hosted by computers with limited resources and/or a limited connection, where fetching >10 pages/second would stop the site from working correctly.

          Note that some spiders ignore crawl-delay, like Google's (which is known for randomly and aggressively fetching >10 pages/second from websites).
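To make the rules above concrete, here is a small sketch using Python's standard `urllib.robotparser` to show how a well-behaved spider would read such a file. The paths `/whoami` and `/trap/`, the user agent `ExampleBot`, and the host `example.org` are hypothetical stand-ins, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt along the lines described above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /whoami
Disallow: /trap/
"""


def load_rules(text):
    """Parse robots.txt text the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for path in ("/index.html", "/cgi-bin/test", "/trap/bomb.gz"):
    allowed = rules.can_fetch("ExampleBot", "https://example.org" + path)
    print(path, "allowed" if allowed else "disallowed")
print("seconds between requests:", rules.crawl_delay("ExampleBot"))
```

A scraper that never consults these rules is exactly the one that ends up in the excluded folders.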

  • (Score: 5, Insightful) by hendrikboom on Friday April 03, @05:31PM (1 child)

    by hendrikboom (1125) on Friday April 03, @05:31PM (#1438830) Homepage Journal

    When do I not click through to a website? When *none* of the sites Google sends me to answer my query. In this case the AI summary usually doesn't either.

    • (Score: 2) by VLM on Saturday April 04, @04:07PM

      by VLM (445) on Saturday April 04, @04:07PM (#1438890)

      True but I bet the followup search stats are FAR more interesting.

      This morning I searched something like "K8S TCP default keepalive" and didn't click a single returned result but I see there are three super specific options I can choose from that go in a securityContext and eventually searching on them you end up clicking on

      https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ [kubernetes.io]

      And there are exactly one zillion possible sysctls you can set in K8S, although many of them require weird enough combos of K8S version and linux kernel version and sometimes have odd security requirements.

      Google being a time waster rather than a time saver, this leads down rabbit holes that are not useful. It's interesting that you've only been able to set net.ipv4.tcp_rmem and net.ipv4.tcp_wmem for about a year in a K8S cluster. I would guess you'd usually want small windows for latency reasons and less wasted memory ("buffer bloat elimination"), but this is approaching the limits of stuff I'd want to get involved with and has nothing to do with the timeout issue I was looking into this morning. Which ended up being not network related anyway, it just looked that way at first glance (actually was IO limited LOL and isn't even a 'real' problem)
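For anyone following the same trail, here is a hedged sketch of what sysctls in a pod-level securityContext look like, built as a Python dict and dumped as JSON (which kubectl accepts as well as YAML). The comment above does not name the three options, so the keepalive sysctl names below are an assumption, and the pod name, image, and values are purely illustrative; the linked Kubernetes page is the place to check which sysctls your cluster version allows.

```python
import json


def pod_with_keepalive(name, image, time_s, interval_s, probes):
    """Build a Pod manifest that sets TCP keepalive sysctls for the pod."""
    # Assumption: these three sysctls are the ones meant in the comment.
    sysctls = [
        {"name": "net.ipv4.tcp_keepalive_time", "value": str(time_s)},
        {"name": "net.ipv4.tcp_keepalive_intvl", "value": str(interval_s)},
        {"name": "net.ipv4.tcp_keepalive_probes", "value": str(probes)},
    ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Sysctls live in the pod-level securityContext; values are strings.
            "securityContext": {"sysctls": sysctls},
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical pod name and image, illustrative timing values.
manifest = pod_with_keepalive("keepalive-demo", "example.org/demo:latest", 600, 60, 5)
print(json.dumps(manifest, indent=2))
```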

  • (Score: 5, Interesting) by Snotnose on Friday April 03, @11:07PM

    by Snotnose (1623) Subscriber Badge on Friday April 03, @11:07PM (#1438847)

    Everything TFA mentions was available 25 years ago. In fact, there was an O'Reilly book that told you about this stuff. For whatever reason Google disabled a lot of the functionality 10-15 years ago. Has Google re-enabled all these logical operators?

    --
    Trump's Grave will be the world's most popular open air toilet.
  • (Score: 3, Informative) by namefags_are_jerks on Saturday April 04, @03:37AM

    by namefags_are_jerks (17638) on Saturday April 04, @03:37AM (#1438860)

    Articles that faff on for 400 words and still give none of the information they claim they'll be divulging do worse than 40% at getting a click.

  • (Score: 5, Informative) by Mojibake Tengu on Saturday April 04, @06:04AM

    by Mojibake Tengu (8598) on Saturday April 04, @06:04AM (#1438868) Journal

    For ordinary people, clicking through Google search to websites also means getting annoying consent popups instead of immediate content, then getting flooded by Google ads. So demotivating.
    Not to mention offensive humanity checks by clouds.

    World Wide Web is disgusting, near to useless now.
    A failed case of originally promising technology, totally screwed by corporations on both client and server side.

    --
    Rust programming language offends both my Intelligence and my Spirit.