Canonicals
Table of Contents
- What Is a Canonical URL? - 1 min
- What Is rel=”canonical”? - 1 min
- How Does Google Choose A Canonical URL? - 1 min
- Why Are Canonical Tags Important? - 3 mins
- How Are Canonical Tags Different To A Redirect? - 2 mins
- Canonical Best Practices - 5 mins
- How Do You Know Which URL Google Considers The Canonical? - 1 min
- How Can You Analyse Canonicals? - 2 mins
- Final Thoughts - 1 min
A rel=”canonical” can be used to help reduce duplicate content, and provide a hint to the search engines about which version of a URL you want in the search results.
In this guide we take you through the fundamentals, how you should use canonicals, as well as best practices and more advanced uses.
What Is a Canonical URL?
Google and other search engines try to index a single version of a page.
If a page can be found under multiple URLs, they will pick one to index and rank in the search results. This chosen primary version of the page is known as the ‘canonical’ URL.
Google attempts to choose the single most representative URL from a set of duplicate or similar URLs to be the canonical URL based upon a variety of signals.
What Is rel=”canonical”?
A rel=”canonical” link element (also known as a ‘canonical tag’) can be used in the HTML or HTTP header to help indicate which URL is the preferred version to the search engines.
So, if there are duplicate or similar pages, you can include a canonical tag on them to tell Google which is the main version to index. Google treats it as a hint rather than a directive when deciding which version should be used in the search results.
You can add an HTML link tag or rel=canonical HTTP header in your page response for all duplicate pages, pointing to the canonical page.
Code Example
A canonical link element should be placed in the HTML head of the document:
<link rel="canonical" href="https://www.screamingfrog.co.uk/learn-seo/canonicals/" />
Or a rel=”canonical” can also be placed in the HTTP Header and looks like this:
Link: <https://www.screamingfrog.co.uk/learn-seo/canonicals/>; rel="canonical"
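As a rough illustration of how both forms can be read programmatically, the sketch below pulls canonical URLs out of an HTML document and out of a Link header value using only the Python standard library. The names CanonicalParser, canonicals_from_html and canonicals_from_link_header are hypothetical helpers written for this guide, and the header parsing is deliberately simplified rather than a complete implementation of the Link header syntax.

```python
import re
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonicals_from_html(html):
    """Return all canonical hrefs found in an HTML string."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals


def canonicals_from_link_header(header_value):
    """Return targets of a Link header value that carry rel="canonical".

    Simplified assumption: parameters of one link contain no commas.
    """
    found = []
    for target, params in re.findall(r"<([^>]*)>([^,]*)", header_value):
        if re.search(r'rel\s*=\s*"?[^";]*\bcanonical\b', params, re.IGNORECASE):
            found.append(target)
    return found


html = ('<html><head><link rel="canonical" '
        'href="https://www.screamingfrog.co.uk/learn-seo/canonicals/" />'
        '</head></html>')
header = '<https://www.screamingfrog.co.uk/learn-seo/canonicals/>; rel="canonical"'

print(canonicals_from_html(html))
print(canonicals_from_link_header(header))
```

Both calls should report the same single URL for the examples above. Returning a list rather than one value makes it easy to spot pages that declare more than one canonical.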
How Does Google Choose a Canonical URL?
If Google discovers the same page under multiple URLs, it will pick a single canonical URL to index.
There are a number of ways you can indicate a preferred canonical URL:
- Canonical Link Elements – URLs specified in rel=canonical via link elements in the HTML or HTTP header are used as a hint.
- XML Sitemap – URLs included in XML Sitemaps are seen as another hint that they are more likely to be canonical versions than those not included.
- 301 Redirects – 301 redirects inform Google that it should pass indexing and link signals to the new URL, and are a much stronger signal.
Additionally, Google will use other signals to pick a canonical, such as PageRank, page quality, whether the page is served over HTTPS or HTTP and more.
Often a combination of canonicals, XML Sitemaps and redirects will be used by a website. They will ideally be consistent in specifying the canonical version of a URL.
Why Are Canonical Tags Important?
Canonicals help reduce duplicate content and ensure the correct URLs appear in the search results for users.
You might be thinking that you don’t have any duplicate pages, so why are they needed?
You can’t always control how people might link to a website such as with tracking or social parameters and IDs added.
Websites also often contain duplicate and similar pages. These can come from faceted URLs with parameters for sorting, ordering and filtering, products appearing in multiple categories or URLs resolving under different versions, such as with or without a trailing slash to name but a few.
Additionally, there are valid reasons sometimes to have multiple versions of a page, such as separate AMP URLs, or a mobile website.
Canonicals can act as a safeguard to the unpredictability of how URLs can be found, as well as helping in scenarios where there are genuine multiple versions of the same page.
There are three key reasons why canonicals are crucial:
1) They Help the Correct URLs Show in the Search Results
If you don’t specify a canonical, Google will choose one for you, which can lead to unwanted behaviour. Setting a canonical allows you to tell Google which version you want to show to users.
2) Reduce Duplicate Content & Consolidate Link Signals
Multiple similar versions of a URL can mean indexing and link signals are split between them. Canonicals provide a way to consolidate them to a single preferred URL. This can help in ranking and reduce unpredictability.
3) Improve Crawl Prioritisation
Google will spend less time crawling pages which are not canonical versions. This helps them focus crawl budget on existing or new canonical pages.
How Are Canonical Tags Different to a Redirect?
Redirects move users to a different URL in a browser, which a canonical doesn’t do. A canonical is something that only search bots see.
A 301 redirect can pass indexing and link signals to the target URL, in a similar way to a canonical.
However, a 301 redirect is a much stronger signal than a canonical, which is a hint. Google recommend using 301 redirects when you want to get rid of existing duplicate pages that are no longer required. For example, redirects are the better choice for:
- Setting the scheme to HTTPS from HTTP
- Setting URL versions, such as redirecting uppercase characters to lowercase, or trailing slash consistency
- Changing domains
- Changing URLs or pages
Ideally, canonicals should only be used to pass indexing signals to another page when the pages need to exist, or redirects can’t be used. They should also be used as a safeguard to set the correct canonical URL across all pages.
Canonical Best Practices
Implementing canonicals is generally quite simple, and should ideally obey the following best practices:
Canonicals Should Contain Canonical URL Versions
This seems obvious, but canonicals should contain the canonical URL versions. This means the URL specified in a canonical should be crawlable and indexable. It should not be blocked by robots.txt, redirect, return an error page, or be canonicalised to another URL.
Every URL specified in a canonical should have a 200 response and be the primary canonical URL version.
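The checks above can be sketched as a small function over crawl data. The CrawledUrl record and its fields are assumptions made for this example (they stand in for whatever a crawler reports about a canonical target), and the noindex flag is included as one plausible reading of "indexable".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawledUrl:
    """Hypothetical record of what a crawler found at a URL."""
    url: str
    status: int
    blocked_by_robots: bool = False
    noindex: bool = False              # assumption: a noindex page is not indexable
    canonical: Optional[str] = None    # canonical declared on that URL, if any


def canonical_target_issues(target: CrawledUrl) -> List[str]:
    """List the reasons a URL is a poor canonical target (empty if none)."""
    issues = []
    if target.blocked_by_robots:
        issues.append("blocked by robots.txt")
    if 300 <= target.status < 400:
        issues.append("redirects")
    elif target.status != 200:
        issues.append(f"returns {target.status}")
    if target.noindex:
        issues.append("noindex")
    if target.canonical and target.canonical != target.url:
        issues.append("canonicalised to another URL")
    return issues


good = CrawledUrl("https://example.com/shoes/", 200,
                  canonical="https://example.com/shoes/")
moved = CrawledUrl("https://example.com/old-shoes/", 301)
chained = CrawledUrl("https://example.com/shoes-b/", 200,
                     canonical="https://example.com/shoes/")

for page in (good, moved, chained):
    print(page.url, canonical_target_issues(page))
```

An empty list means the target passes every check in this sketch, while each string names one problem to fix.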
Every Page Should Have a Canonical Set
Every page should have a canonical link element or HTTP header with a preferred canonical set.
Canonical Pages Should Have Self-Referencing Canonicals
A canonical page should have self-referencing canonicals. This means if tracking parameters are added, or the URL is available under different versions, then the primary version will be indexed.
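To see why a self-referencing canonical helps with tracking parameters, the sketch below strips a few common tracking parameters from a requested URL and compares the result with the canonical declared on the page. The parameter list (utm_ prefixes, fbclid, gclid) and the function names are assumptions for illustration only, and a real site may need a different list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; adjust for your own site.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return the URL without tracking parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_covers_request(requested_url, declared_canonical):
    """True if the declared canonical matches the request once tracking is removed."""
    return strip_tracking(requested_url) == declared_canonical


requested = "https://example.com/shoes/?utm_source=newsletter&fbclid=abc123"
declared = "https://example.com/shoes/"
print(strip_tracking(requested))
print(canonical_covers_request(requested, declared))
```

Here the tracked URL and the clean URL serve the same page, and the self-referencing canonical points both at the clean version.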
Canonicalise Duplicate Pages When They Have To Exist
Don’t use canonicals to remove all duplicate or near duplicate content. If a duplicate page doesn’t need to exist, then remove it, update any internal linking to the canonical, and 301 redirect the non-canonical version to the canonical URL instead.
This is a much stronger signal to the search engines.
If pages need to exist, then using a canonical to consolidate the page is a viable option. If the pages do not need to exist, then canonicals should not be used to correct the problem.
Avoid Multiple Canonicals or Mixing Implementations
You shouldn’t have more than one canonical set on a page, and avoid having two implementations such as a canonical link element and canonical HTTP header.
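A simple way to picture this check: given the canonicals found in the HTML and those found in the HTTP header for one page, report any conflicts. The function name and the issue wording below are hypothetical, and the two input lists are assumed to have been extracted already.

```python
from typing import List


def canonical_conflicts(html_canonicals: List[str],
                        header_canonicals: List[str]) -> List[str]:
    """Describe conflicts between the canonicals declared for a single page."""
    issues = []
    if len(html_canonicals) > 1:
        issues.append("multiple canonical link elements")
    if len(header_canonicals) > 1:
        issues.append("multiple canonical HTTP headers")
    if html_canonicals and header_canonicals:
        issues.append("both link element and HTTP header used")
    if len(set(html_canonicals) | set(header_canonicals)) > 1:
        issues.append("canonicals disagree")
    return issues


print(canonical_conflicts(["https://example.com/a/"], []))
print(canonical_conflicts(["https://example.com/a/"], ["https://example.com/b/"]))
```

The first call reports no issues, while the second flags both the mixed implementation and the disagreement between the two URLs.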
Use Absolute Rather Than Relative Paths
While using relative URLs in canonicals is valid, Google recommend using absolute rather than relative paths in canonicals to avoid any mistakes.
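The kind of mistake relative paths invite can be shown with Python's urllib.parse, which resolves a relative reference against the page URL in the same general way a browser would. The example URLs and the is_absolute helper are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href):
    """True only if the href has both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


page_url = "https://example.com/blog/post/"

# A valid relative canonical resolves against the page URL.
print(urljoin(page_url, "../canonical-post/"))

# A canonical missing its scheme is treated as a relative path,
# which produces a URL the author almost certainly did not intend.
print(urljoin(page_url, "example.com/canonical-post/"))

print(is_absolute("https://example.com/canonical-post/"))
print(is_absolute("example.com/canonical-post/"))
```

Writing the full absolute URL removes any dependence on how the reference happens to be resolved.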
Avoid Canonicalising Paginated Pages
Paginated pages which contain additional content, such as products or articles, should not be canonicalised to the first page in the series. They should have self-referencing canonicals, as they are unique pages, rather than duplicates.
Setting their canonical to the first page in the series could lead Google to stop crawling the paginated pages, which could in turn affect whether PageRank is passed through the links on them.
Internally Link to Canonical URL Versions
In a perfect world, you should only link internally to canonical versions of URLs across a website, rather than to duplicate versions, to avoid sending mixed signals.
Ensure Canonicals Are Accurate in the Raw HTML
Google prefer canonicals to be specified in the raw HTML, rather than in the rendered HTML post JavaScript execution. While Google renders virtually every page it indexes, it might not always see the canonical supplied in the rendered version.
Don’t Use Fragment URLs
Don’t specify a fragment URL as canonical, as Google generally doesn’t support fragment URLs. Annotations that include a fragment are actually ignored completely by Google.
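A quick check for this can be written with urllib.parse. The helper name and example URLs below are hypothetical.

```python
from urllib.parse import urldefrag, urlsplit


def has_fragment(url):
    """True if the URL carries a non-empty #fragment."""
    return bool(urlsplit(url).fragment)


canonical = "https://example.com/guide/#best-practices"

print(has_fragment(canonical))

# urldefrag returns the URL without its fragment, plus the fragment itself.
clean_url, fragment = urldefrag(canonical)
print(clean_url)
```

Flagging the fragment, rather than silently stripping it, keeps the decision about the intended canonical with whoever maintains the page.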
Use Separate Link Annotations for Hreflang, Lang, Media, & Type Attributes
rel=”canonical” annotations that suggest alternate versions of a page are ignored. Specifically, annotations with hreflang, lang, media, and type attributes are not used for canonicalization by Google. So use separate link elements for these if required.
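The rule above can be approximated by sorting a page's canonical link elements into those that carry none of the listed attributes and those that do. This is a sketch using the standard library HTML parser. The class name is hypothetical and it only mirrors the attribute list given in this guide, not Google's actual processing.

```python
from html.parser import HTMLParser

# Attributes named above that cause a canonical annotation to be ignored.
IGNORED_ATTRS = {"hreflang", "lang", "media", "type"}


class CanonicalLinkAudit(HTMLParser):
    """Separates canonical links into usable and ignored ones."""

    def __init__(self):
        super().__init__()
        self.usable = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        if "canonical" not in (attr.get("rel") or "").lower().split():
            return
        # Attribute names are lowercased by HTMLParser before they reach us.
        if IGNORED_ATTRS.intersection(attr):
            self.ignored.append(attr.get("href"))
        else:
            self.usable.append(attr.get("href"))


html = (
    '<link rel="canonical" href="https://example.com/page/">'
    '<link rel="canonical" hreflang="en" href="https://example.com/en/page/">'
)
audit = CanonicalLinkAudit()
audit.feed(html)
print("usable:", audit.usable)
print("ignored:", audit.ignored)
```

In this example only the first link element would count as a canonical hint, and the hreflang information belongs in its own rel=”alternate” link element.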
Include Canonical URLs in XML Sitemaps
Google consider URLs in XML Sitemaps as a hint to canonical URL versions. So ensure only canonical URL versions are included.
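One way to audit this is to parse the sitemap and flag any listed URL whose declared canonical points elsewhere. The sketch below uses the standard sitemap namespace. The canonical_of mapping is an assumption standing in for crawl data (page URL to declared canonical), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value listed in a URL sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


def non_canonical_in_sitemap(xml_text, canonical_of):
    """Return sitemap URLs whose declared canonical is a different URL.

    canonical_of maps a page URL to its declared canonical; URLs missing
    from the mapping are assumed to be canonical.
    """
    return [url for url in sitemap_urls(xml_text)
            if canonical_of.get(url, url) != url]


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/shoes/</loc></url>
  <url><loc>https://example.com/shoes/?sort=price</loc></url>
</urlset>"""

canonical_of = {
    "https://example.com/shoes/": "https://example.com/shoes/",
    "https://example.com/shoes/?sort=price": "https://example.com/shoes/",
}

print(non_canonical_in_sitemap(sitemap, canonical_of))
```

Any URL this returns is a candidate for removal from the sitemap, since listing it sends a signal that conflicts with its own canonical.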
Specify a Canonical When Using Hreflang
Google recommends setting a canonical page when there are multiple pages in the same language. Hreflang doesn’t stop URLs from being folded together and canonicalised, it simply helps show the right URL to the right audience under search.
Use rel=”alternate” & rel=”amphtml” To Indicate Mobile & AMP Equivalents
If the canonical page has a mobile or AMP variant, add a rel=”alternate” or rel=”amphtml” link pointing to the mobile or AMP version of the page respectively.
How Do You Know Which URL Google Considers the Canonical?
Use the URL Inspection tool in Google Search Console to understand which page Google considers to be the canonical version.
Remember, canonicals are a hint rather than a directive. Google may decide to pick another URL as the canonical.
In the screenshot above, you can see both the user-declared canonical for our SEO Spider page, and the Google selected canonical (which are the same).
If you don’t have access to Google Search Console, you can also run searches in Google for the URL (using site: queries) to see which version Google returns under search. This can be a little less reliable.
How Can You Analyse Canonicals?
To analyse canonicals for the best practices listed above, you can use a variety of tools.
Browsers
You can right click and ‘view page source’ in Chrome to see the original HTML of the page and view a canonical link element.
You can right click and ‘inspect element’ to see the canonical within the rendered HTML. If your website is reliant on JavaScript, ideally this would match the canonical in the original HTML.
If the website uses rel=”canonical” in the HTTP header, then you can right click and ‘inspect’ in Chrome to bring up Dev Tools. You can then open the ‘Network’ tab, click on the URL in question and view the ‘Headers’ issued.
SEO Spider
Our Screaming Frog SEO Spider software can help you view and analyse your canonical tags and HTTP headers in bulk. Just download the tool, and crawl 500 URLs for free (or more with a licence) and click on the ‘Canonicals’ tab.
This will display all pages and their specified canonicals, which can be reviewed. It automatically flags any pages which have a different canonical URL set to the page (‘canonicalised’ URLs), or that are missing a specified canonical, have multiple canonicals set, or a canonical that is non-indexable.
Please read our tutorial on How to Audit Canonicals using The SEO Spider.
Final Thoughts
Specifying canonicals can be crucial for Google and other search engines ranking the correct URLs in the search results and reducing unpredictability.
Keeping to the best practices outlined above should help both search engines and users see the right pages.