Google data leak: how Google search rankings are structured


Internal documentation for Google’s API Content Warehouse has leaked online. It contains links to private repositories and internal pages on Google’s corporate website. The 2,500-page document sheds light on the inner workings of Google’s search engine, and SEO specialists are actively studying the data.

According to the documentation, Google tracks more than 14,000 features that can be used to rank sites. Clicks and post-click behavior, i.e. behavioral factors, play an important role. SEO specialists who have studied the new data conclude that clicks and site navigation are now much more important, while content and links matter less.

Google confirmed the authenticity of the documents by commenting as follows:

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”

Google search ranking algorithms

This leak, along with other leaks and recent testimony in the U.S. Department of Justice antitrust case, has shed light on many aspects of Google’s ranking algorithms that call into question some of the company’s public statements. Here are a few key points that diverge from what Google has said about its ranking methods and are of great interest to SEO professionals:

  • User Behavior as a key ranking component: NavBoost uses click data to raise or lower a site’s ranking. It analyzes clicks on search results using parameters such as goodClicks, badClicks, lastLongestClicks, and more. This lets Google see which results best satisfy users and which pages should rank higher. NavBoost also accounts for pogo-sticking (quickly returning to the search results after clicking a result that does not satisfy the query). Click length (time spent on a page) is analyzed to help judge a page’s usefulness and relevance.
  • Use of Chrome browser data: The leak revealed that Google collects extensive data on user behavior that is used to evaluate pages and domains. For example, Google can use the number of clicks on pages in the Chrome browser to determine the most popular URLs on a site, which affects the creation of Sitelinks.
  • Site Whitelisting: Google maintains whitelists of travel, COVID, and election-related sites. This lets Google control search results for controversial or potentially problematic queries, ensuring that only verified and trustworthy sources are shown.
  • Domain Authority: Google has repeatedly stated that it does not use the Domain Authority metric in its algorithms. However, leaked data has shown that there is a siteAuthority metric that is used by Q* to assess site authority. This indicates that there is an internal equivalent to the Domain Authority metric.
  • “Sandbox”: Google has claimed that there is no “sandbox” and that new sites face no special restrictions. However, the leak mentions a hostAge attribute that is used to “sandbox fresh spam”. This indicates that Google does apply some form of sandboxing to new or suspicious sites.
  • Data from EWOK: EWOK is Google’s internal search quality rating platform, where human raters review search results pages and score them on criteria such as relevance, usefulness, and trustworthiness of the source. Data from these quality raters can be used to directly influence page rankings.
  • Consideration of brand size: Popular and well-known brands are prioritized for ranking. Google uses various ways to identify and rank brands, including brand size, which is determined not only by the site itself but also by mentions of that site on the internet (even without links).
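To make the click signals above more concrete, here is a minimal Python sketch of how clicks could be labeled and aggregated per URL. It is not Google’s implementation: the leak names signals such as goodClicks, badClicks, and lastLongestClicks but does not say how they are computed, so the `Click` record, the `classify`, `click_counts`, `last_long_click`, and `click_score` functions, and both dwell-time thresholds are hypothetical. Pogo-sticking is modeled simply as a quick return to the results page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds: the leak names the click signals, not how they are computed.
SHORT_CLICK_SECONDS = 10.0
LONG_CLICK_SECONDS = 60.0


@dataclass
class Click:
    url: str
    dwell_seconds: float    # click length: time spent on the page
    returned_to_serp: bool  # True if the user went back to the search results


def classify(click: Click) -> str:
    """Label one click as 'good', 'bad', or 'neutral' (illustrative rules only)."""
    if click.returned_to_serp and click.dwell_seconds < SHORT_CLICK_SECONDS:
        return "bad"  # pogo-sticking: a quick bounce back to the results
    if click.dwell_seconds >= LONG_CLICK_SECONDS:
        return "good"
    return "neutral"


def click_counts(clicks: list[Click]) -> dict[str, dict[str, int]]:
    """Count good, bad, and neutral clicks per URL."""
    counts: dict[str, dict[str, int]] = {}
    for click in clicks:
        per_url = counts.setdefault(click.url, {"good": 0, "bad": 0, "neutral": 0})
        per_url[classify(click)] += 1
    return counts


def last_long_click(session: list[Click]) -> Optional[str]:
    """URL of the last long click in a session (a hypothetical reading of lastLongestClicks)."""
    for click in reversed(session):
        if click.dwell_seconds >= LONG_CLICK_SECONDS:
            return click.url
    return None


def click_score(per_url: dict[str, int]) -> float:
    """Net satisfaction in [-1, 1]: (good - bad) / total clicks."""
    total = sum(per_url.values())
    return (per_url["good"] - per_url["bad"]) / total if total else 0.0
```

In a real system a score like this would be one input among many; the sketch only shows the idea that a quick bounce counts against a result while a long final click counts for it.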

Additional important points

  • Date matters: Google actively associates dates with content using bylineDate (the date stated on the page), syntacticDate (a date extracted from the URL or title), and semanticDate (a date derived from the page content).
  • Original content and keywords: Short content is scored for originality, which affects its ranking. Page titles that match user queries also remain an important factor.
  • Font size matters: Google tracks the average weighted font size of terms in documents and links, which also affects ranking.
  • Homepage PageRank counts for all pages: Each document is associated with the PageRank of its site’s homepage. Homepage PageRank and siteAuthority are probably used as proxies for new pages until their own PageRank has been calculated.
  • Google may be deliberately demoting small sites: Google has a special flag marking a site as a “small personal site”. The documentation does not define such sites, but the flag would make it easy for Google to rank them up or down.
  • Indexing tier impacts link value: A metric called sourceType reflects the relationship between where a page is indexed and how valuable it is. For reference, Google’s index is divided into tiers: the most important, regularly updated, and accessible content is stored in flash memory, less important content on solid-state drives, and irregularly updated content on regular hard drives. The higher the tier, the more valuable a link from that page. Pages considered “fresh” are also considered high quality. This partly explains why links from high-ranking pages and news pages deliver better ranking results.
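The tier and PageRank points above can be combined in a short Python sketch of how a link’s value might depend on where its source page is indexed. The three tiers come from the text; the numeric `TIER_WEIGHT` values, the `SourcePage` record, and the `link_value` function are hypothetical, as is the simple rule of falling back to the homepage PageRank while a page has none of its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    FLASH = "flash"  # most important, regularly updated content
    SSD = "ssd"      # less important content
    HDD = "hdd"      # irregularly updated content


# Hypothetical weights: the leak only implies that higher tiers pass more value.
TIER_WEIGHT = {Tier.FLASH: 1.0, Tier.SSD: 0.6, Tier.HDD: 0.3}


@dataclass
class SourcePage:
    tier: Tier
    homepage_pagerank: float
    pagerank: Optional[float] = None  # None until the page's own PageRank is computed


def link_value(source: SourcePage) -> float:
    """Value passed by a link from `source` (illustrative formula only)."""
    # New pages borrow their homepage PageRank as a proxy.
    base = source.pagerank if source.pagerank is not None else source.homepage_pagerank
    return base * TIER_WEIGHT[source.tier]
```

Multiplying by a tier weight is just one way to express “the higher the tier, the more valuable the link”; the leak does not describe the actual relationship.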

Demotion in Google’s ranking algorithms

Demotion is a decrease in the position of web pages in search results due to certain factors that negatively affect their quality or relevance. The leaked data revealed that Google uses many different algorithmic mechanisms to demote pages. Here are some of them:

  • Anchor Mismatch – when a link does not match the target site it points to, the link is demoted in ranking calculations.
  • SERP Demotion – a signal indicating potential user dissatisfaction with a page, likely measured by clicks.
  • Nav Demotion – applies to pages with awkward navigation or a poor user experience.
  • Exact Match Domains Demotion – a dedicated mechanism for demoting exact match domains (e.g. buy-cheap-shoes.com) that don’t provide quality content.
  • Product Review Demotion – the documentation gives no details, but it is likely related to the 2023 product reviews update.
  • Location Demotion – there are signs that “global” pages may be demoted, suggesting that Google tries to associate pages with locations and rank them accordingly.
  • Porn demotion – demotion for displaying pornography.
  • Other link demotions – demotions triggered by other link-related signals.
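Since a page can trip several of these demotions at once, one simple way to picture them is as multiplicative penalties on a ranking score. The Python sketch below assumes exactly that; the factor values, the flag names, and `apply_demotions` are hypothetical, because the leak names the demotions but not their strength or how they combine.

```python
# Hypothetical penalty factors: the leak names these demotions, not their strength.
DEMOTION_FACTORS: dict[str, float] = {
    "anchor_mismatch": 0.9,
    "serp": 0.8,
    "nav": 0.85,
    "exact_match_domain": 0.7,
    "product_review": 0.8,
    "location": 0.9,
    "porn": 0.1,
}


def apply_demotions(score: float, flags: set[str]) -> float:
    """Multiply the score by the factor of every demotion flag set on the page.

    Raises KeyError for an unknown flag so typos are not silently ignored.
    """
    for flag in sorted(flags):  # sorted for a deterministic order of float operations
        score *= DEMOTION_FACTORS[flag]
    return score
```

Under this model, demotions compound: with the factors above, SERP and Nav demotions together take a score of 10.0 down to 6.8.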

Ranking system architecture

Below are the functions of Google’s various systems and how they relate to one another, listed by their internal names.

Crawling

  • Trawler is a web crawling system. It has a crawl queue, maintains crawl rates, and understands how often pages change.

Indexing

  • Alexandria is the main indexing system.
  • SegIndexer – a system that places documents by tier in the index.
  • TeraGoogle – a secondary indexing system for documents stored long-term on disk.

Rendering

  • HtmlrenderWebkitHeadless is a rendering system for JavaScript pages.

Processing

  • LinkExtractor – Extracts links from pages.
  • WebMirror – Canonicalization and duplication management system.

Ranking

  • Mustang – The primary system for scoring, ranking, and serving results.
  • Ascorer – The primary ranking algorithm, which ranks pages before any re-ranking adjustments.
  • NavBoost – a re-ranking system based on click logs of user behavior.
  • FreshnessTwiddler – A system for re-ranking documents based on their freshness.

Service

  • Google Web Server (GWS) is the server that Google’s frontend interacts with. It receives the payloads of data to display to the user.
  • SuperRoot is the brain of Google Search, which sends messages to Google’s servers and manages the post-processing system to re-rank and present results.
  • SnippetBrain – A system that generates snippets for search results.
  • Glue – A system for pulling together universal results based on user behavior.
  • Cookbook – A system for generating signals.

What are Twiddlers?

Twiddlers are re-ranking functions that run after the main ranking algorithm, Ascorer. A Twiddler can adjust a document’s information retrieval (IR) score or change its rank, and can also impose category restrictions on the results.

Presumably, all functions with the Boost suffix operate within the Twiddler framework. Here are some of the Boosts described in the documentation:

  • NavBoost
  • QualityBoost
  • RealTimeBoost
  • WebImageBoost 
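The Twiddler idea, a chain of adjustments applied after the main ranking pass, can be sketched in Python. Everything here is hypothetical: `Result`, `nav_boost`, `category_limit`, `by_score`, and `rerank` are illustrative names, not Google’s code, and the click-derived `nav` factor is assumed to be precomputed. The sketch covers both kinds of Twiddler mentioned above: one that rescales scores and one that imposes a category restriction.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Result:
    url: str
    ir_score: float  # score from the main ranking pass (Ascorer in the leak)
    category: str = "web"
    boosts: dict[str, float] = field(default_factory=dict)


# A Twiddler takes the current ranked list and returns an adjusted one.
Twiddler = Callable[[list[Result]], list[Result]]


def nav_boost(results: list[Result]) -> list[Result]:
    """Rescale each IR score in place by a precomputed click-based factor (default 1.0)."""
    for r in results:
        r.ir_score *= r.boosts.get("nav", 1.0)
    return results


def category_limit(category: str, max_count: int) -> Twiddler:
    """Build a Twiddler that keeps at most max_count results of one category."""
    def twiddle(results: list[Result]) -> list[Result]:
        kept: list[Result] = []
        seen = 0
        for r in results:  # results arrive sorted, so the best ones are kept
            if r.category == category:
                if seen >= max_count:
                    continue
                seen += 1
            kept.append(r)
        return kept
    return twiddle


def by_score(results: list[Result]) -> list[Result]:
    return sorted(results, key=lambda r: r.ir_score, reverse=True)


def rerank(results: list[Result], twiddlers: list[Twiddler]) -> list[Result]:
    """Sort by IR score, then apply each Twiddler in turn, re-sorting after each."""
    results = by_score(results)
    for twiddle in twiddlers:
        results = by_score(twiddle(results))
    return results
```

With this setup, a result with a strong click signal can overtake one that scored higher in the first pass, and the category limit caps how many results of one type survive.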

The factors outlined above shed light on how Google ranks sites. Note that this picture may change as new details emerge: the documentation was published only recently, and experts are still studying it in detail.

Около Арбитраж

