Google Search Document Leak Reveals Inner Workings of Ranking Algorithm

posted by hubie on Friday May 31, @09:41PM
News

An Anonymous Coward writes:

SEO situation on fire: Google's weighting parameter list leaked

https://searchengineland.com/google-search-document-leak-ranking-442617
https://github.com/yoshi-code-bot/elixir-google-api/commit/d7a637f4391b2174a2cf43ee11e6577a204a161e

Highlights:

  • Change history: Google apparently keeps a copy of every version of every page it has ever indexed, meaning Google can "remember" every change ever made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.

  • Google stores author information associated with content and tries to determine whether an entity is the author of the document.

  • Google measures the average weighted font size of terms in documents (avgTermWeight) and anchor text.

  • It's likely the internal documents were accidentally included in a code review and pushed live from Google's internal code base, where they were then discovered.
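
To make the change-history highlight concrete, here is a minimal Python sketch of "store every version, but only consult the last 20". The figure of 20 is the one reported above; the names PageHistory, record, archive and recent are invented for illustration and do not come from the leaked documents, and nothing here claims to reflect how Google actually implements it.

```python
from collections import deque
from dataclasses import dataclass, field

# 20 is the number reported in the highlights; all names below are hypothetical.
MAX_VERSIONS_FOR_LINKS = 20


@dataclass
class PageHistory:
    """Hypothetical store: a full archive plus a bounded window of recent versions."""

    # Every snapshot ever recorded for the URL.
    archive: list = field(default_factory=list)
    # Only the most recent snapshots, as a link analysis step might consult.
    recent: deque = field(
        default_factory=lambda: deque(maxlen=MAX_VERSIONS_FOR_LINKS)
    )

    def record(self, snapshot: str) -> None:
        self.archive.append(snapshot)
        # A deque with maxlen silently drops its oldest entry once full.
        self.recent.append(snapshot)


history = PageHistory()
for i in range(25):
    history.record(f"version-{i}")
```

After 25 recorded versions, the archive still holds all 25, while the recent window holds only the newest 20; the five oldest have dropped out of it.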

See Also: https://gizmodo.com/google-search-seo-leak-gatekeeps-internet-1851508410

  • (Score: 2) by SomeRandomGeek on Friday May 31, @10:13PM

    by SomeRandomGeek (856) on Friday May 31, @10:13PM (#1358905)

    So, we have a list of 14,000 variables that are used in page ranking, but not how those variables are weighted. As a software developer myself, I immediately suspect that the vast majority of those factors aren't actually used, or are used in highly situational cases. They probably built algorithms to score a page on all these dimensions, and then chose a few dimensions that are actually useful in providing a good result. But did they rip out all of the other dimensions? No, of course not. Better to leave them in, because you never know what might be helpful in conjunction with some new dimension that you are building next year.
    So, when the article says that Google is considering something, they have no idea. They know that Google has the capability to rank searches on that something. But maybe they don't actually use it, for any of a hundred reasons.

  • (Score: 2) by Rosco P. Coltrane on Friday May 31, @10:34PM

    by Rosco P. Coltrane (4757) on Friday May 31, @10:34PM (#1358908)

    But it's kind of like anything that would be Earth-shattering to data brokers, marketers, advertisers or lawyers: I know I should care because it impacts me and it's almost certainly going to be bad news for decent people with a decent way of earning a living, but reading about them never fails to bring back a slight feeling of nausea.

