If your site has pages that can’t be crawled by search engines, your website may not be indexed correctly, if at all. If your website does not appear in the index, it cannot be found by users.
Ensuring that search bots can crawl your website and collect data from it correctly means search engines can accurately place your site on the SERPs and you can rank for those all-important keywords.
There are a few things you need to consider when looking for crawlability issues:
Indexation errors
Robots.txt errors
Sitemap issues
Optimizing the crawl budget
Identifying indexation issues
Priority: High
Ensuring your pages are indexed is imperative if you want to appear anywhere on Google.
The simplest way to check how your site is indexed is to head to Google Search Console and open the Coverage report. Here, you can see exactly which pages are indexed, which have warnings, and which are excluded and why:
Coverage report in Google Search Console
Note that pages will only appear in the search results if they are indexed without any issues.
If your pages are not being indexed, there are a number of issues that may be causing this. We will take a look at the top few below, but you can also check our other guide for a more in-depth walkthrough.
Checking the robots.txt file
Priority: High
The robots.txt file is arguably the most straightforward file on your website, but it is also something that people consistently get wrong. It lets you advise search engines on how to crawl your site, and it is easy to make mistakes when doing so.
Most search engines, especially Google, generally abide by the rules you set out in the robots.txt file. So if you accidentally tell a search engine not to crawl certain URLs, or even your entire site, that is what will happen.
Here is what a robots.txt file that tells search engines not to crawl any pages looks like:
Disallowing search engines via robots.txt
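In text form, a disallow-all robots.txt file is typically just two lines. This is a minimal sketch using the standard wildcard user agent; the file in the screenshot above may contain additional rules:

# Applies to all crawlers
User-agent: *
# Blocks crawling of every URL on the site
Disallow: /

Removing the Disallow rule, or leaving its path empty (Disallow: with nothing after it), lets search engines crawl the site again.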
Often, these instructions are left in the file even after the site goes live, preventing the site from being crawled. This is one of those rare cases where an easy fix can have a big impact on your SEO.
You can also check whether a single page is accessible and indexed by typing the URL into the Google Search Console search bar. If it’s not indexed yet and it’s accessible, you can “Request Indexing.”
Requesting indexing in Google Search Console
The Coverage report in Google Search Console can also let you know if you're blocking certain pages in robots.txt despite them being indexed:
Pages blocked via robots.txt in Google Search Console
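If you want to spot-check individual URLs against your live robots.txt outside of Search Console, a short script can help. Below is a minimal sketch using Python's standard urllib.robotparser module; the domain and page paths are placeholders, so substitute your own:

from urllib.robotparser import RobotFileParser

# Placeholder robots.txt location; replace with your own domain
robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()  # fetch and parse the live robots.txt

# Placeholder URLs to test against the rules that apply to Googlebot
for url in ["https://www.example.com/", "https://www.example.com/blog/post"]:
    allowed = robots.can_fetch("Googlebot", url)
    print(url, "crawlable" if allowed else "blocked by robots.txt")

Keep in mind that this only checks crawl rules; it will not tell you whether a page is actually indexed.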
Recommended reading: Robots.txt and SEO: Everything You Need to Know
Robots meta tags
Priority: High
A robots meta tag is an HTML snippet that tells search engines how to crawl or index a certain page. It’s placed into the <head> section of a webpage and looks like this:
<meta name="robots" content="noindex" />
The noindex directive is the most common one. As you may have guessed, it tells search engines not to index the page. We also often see the following robots meta tag on pages across whole websites:
<meta name="robots" content="max-snippet:-1, max-image-preview:large, max-video-preview:-1" />
This tells Google to use any of your content freely on its SERPs. The Yoast SEO plugin for WordPress adds this by default unless you add noindex or nosnippet directives.
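To audit which robots meta tag a given page actually serves, you can pull it straight from the HTML. Here is a rough sketch that assumes the third-party requests and beautifulsoup4 packages are installed and uses a placeholder URL:

import requests
from bs4 import BeautifulSoup

# Placeholder URL; replace with the page you want to check
url = "https://www.example.com/some-page"

html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Look for <meta name="robots" ...> in the page's HTML
tag = soup.find("meta", attrs={"name": "robots"})
if tag and tag.get("content"):
    print("robots meta tag:", tag["content"])
else:
    print("No robots meta tag found; the page is indexable by default.")

Note that this only inspects the HTML source; the same directives can also be delivered through the X-Robots-Tag HTTP response header.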