Not surprisingly, website crawl errors such as Soft 404s often go unnoticed by marketers in favour of less technical aspects of SEO. We often discuss topics such as creating quality content and building authority for a website's domain, while it's all too easy for marketers to throw technical SEO issues back over the fence and onto the desk of their web developer.


However, while many web developers know exactly how to build an aesthetically pleasing website, they are often oblivious to how the site is served to web crawlers such as Googlebot, Google's web crawler. This article will focus on one particular category of crawl error, one that, if left unresolved, can hugely reduce the number of pages search engines such as Google crawl and index in their search results: 'Soft 404' errors.

What's a Soft 404 Error?


When relaunching a site with a new internal structure and new URL structure, we use 301 redirects to tell search engines that each page has permanently moved to a new URL. That URL should be the new version of the old page; failing to redirect will result in the common 404 error. Sometimes, even with 301 redirects in place, you will still see 'Soft 404' errors. The best practice for redirects can be summed up as follows:

“A page about cats should be redirected to a new page about cats.”

For small sites, it is not hard to set manual redirects in the .htaccess file. Large sites may need thousands of pages redirecting, which is usually done most efficiently with URL rewrite rules that map the old URL format to the new one as accurately as possible. Occasionally, these rules will not match with 100% accuracy, and particular pages may be redirected to broader, higher-level categories; that is, the page about cats gets redirected to the new page about animals for lack of a better target.
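To make this concrete, here is a minimal sketch of what both approaches might look like in an Apache .htaccess file. Every path and pattern below is a hypothetical placeholder; the exact rules will depend entirely on your old and new URL structures.

```apache
# Enable mod_rewrite for the pattern-based rule below.
RewriteEngine On

# Manual one-to-one 301 redirects, practical for small sites: the
# page about cats points to the new page about cats.
Redirect 301 /old-site/cats.html /animals/cats/
Redirect 301 /old-site/dogs.html /animals/dogs/

# A pattern-based rewrite rule for large sites: map every URL of the
# form /old-site/<topic>.html to /animals/<topic>/ in a single rule.
RewriteRule ^old-site/([a-z0-9-]+)\.html$ /animals/$1/ [R=301,L]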


In most cases, this redirect will work as intended, although it may not pass 100% of ranking signals to the target page. Sometimes, though, Google will not honour the redirect because it considers the target page irrelevant to the original, and this triggers a Soft 404 error: the former URL is no longer live, AND the redirected page is not returning a 404 error.


To fix this, you may want to build upon the logic used in the URL rewrite rules, or in some cases set manual redirects for individual pages where the rewrite logic is unable to make the match.

A common mistake that causes Soft 404s is redirecting hundreds of old pages to the homepage or a similar high-level category page within the site. The content of all those old pages can't possibly live on a single page. If there is no new target page on the same topic, then it is perfectly acceptable to let the old URL return a 404 error, as sketched below. An excellent overview of acceptable redirects can be found in the Google Webmaster Guidelines.
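As a rough sketch of the right behaviour, again assuming Apache and hypothetical paths: when a removed page has no equivalent replacement, simply don't redirect it, or explicitly mark it as gone.

```apache
# Don't do this - funnelling every retired URL into the homepage is
# a classic trigger for Soft 404 reports:
# RedirectMatch 301 ^/old-site/.* /

# If a page has been removed for good and has no replacement, you
# can say so explicitly: "Redirect gone" returns an HTTP 410 (Gone)
# status, which search engines treat much like a 404.
Redirect gone /old-site/discontinued-topic.html
```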

The Problem with Soft 404 Errors

If your website returns an HTTP status code other than 404 (or 410) for a non-existent page, it can negatively impact the website's performance in organic search. Firstly, by failing to serve a 404 status code, your site is telling search engines that there's a real page at the URL they're trying to access. As a result, the URL you've deleted (which has no content) will continue to be crawled and indexed, wasting precious crawl budget.

Crawl budget is the idea that Google allocates only a limited amount of time to crawling a site before it ends the process and moves on. Google doesn't want to spend endless time crawling content on the same website, so it makes sense to set a time limit on each crawl before moving on to another site.

Sticking with the idea of crawl budget: if a site has a large proportion of Soft 404 errors, those pages will still be crawled, and crawling these non-existent pages eats into the crawl budget allotted to the website. Because of the time Googlebot spends crawling Soft 404s, your other URLs may not be discovered as quickly or crawled as often, reducing the visibility of the relevant content on your site. It should come as no surprise, then, that when Soft 404 errors are fixed, a website's performance in organic search results tends to improve.

To show how you'd evaluate the extent of a Soft 404 issue, let's take a look at an example of a site that is displaying some Soft 404 errors in Google Webmaster Tools. In the case below, we see 439 Soft 404 errors being reported for the website in question. This may well set alarm bells ringing, but we first need to place that figure in the right context.


To do this, you'll want to establish how many pages the website has that you want Google to crawl and index. We'd take a look at the XML sitemap for the site in question, which is a good indicator of how many pages a website has.
We can see that this website has around 4,200 pages, so the 439 Soft 404 errors now start to look a little less alarming. Still, at over 10% of the site's total pages (439 / 4,200 ≈ 10.5%), those Soft 404 errors will be wasting a considerable part of the crawl budget assigned to this website. In this case, Google will be spending too much time crawling URLs that simply don't exist.
How Do I Resolve These Issues?
Google only lets you export a maximum of 1,000 URLs from Webmaster Tools. In the example above, there are under 1,000 errors being recorded, so they can all be downloaded directly from Google Webmaster Tools. Once you've exported the list, you'll need to assess why those pages are being reported as Soft 404s, as Google provides only limited information on the URLs it highlights as 'Soft 404s'.
In most cases, you will find that the website is serving a 200 (OK) status code on pages that display a 'page not found' message. The first thing you should do, therefore, is run a selection of the Soft 404 URLs through an HTTP status code checker such as httpstatus.io to see which status codes those pages are actually returning.
The example domain below was displaying a 404 page to the user trying to access it, but when we checked the response code using an HTTP status code checker, it returned an HTTP 200 response. This is a classic example of a Soft 404 error: the HTTP response code tells search engine robots that the page exists and should be crawled, yet there is no real content on the page the server returns.

The other issue we encounter when diagnosing the cause of Soft 404 errors is improper 301/302 redirects. Some webmasters redirect all deleted pages to the website's homepage rather than serving a 404 error, which is rarely relevant and sends confusing signals to search engine robots. The key point to grasp here is that deleted pages should only be redirected to a direct replacement; if a direct replacement doesn't exist, then you should serve a custom 404 error page that presents alternative options or products to the user.
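Assuming Apache once more, a hedged sketch of the two acceptable outcomes for a deleted page might look like this (all paths are hypothetical):

```apache
# 1) A direct replacement exists: 301 the old URL straight to it.
Redirect 301 /shop/blue-widgets.html /products/widgets/blue/

# 2) No direct replacement exists: let the URL fall through to a
# custom error page that offers alternatives to the user. Because
# ErrorDocument points at a local path, Apache still answers the
# request with a genuine HTTP 404 status code.
ErrorDocument 404 /errors/not-found.html
```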

I have highlighted an example of inappropriate redirects triggering Soft 404 errors below. The webmaster is using a 302 redirect to send anyone trying to access a deleted page to a custom 404 page, one which doesn't serve an HTTP 404 status code.


This will hugely affect how search engines crawl the site in question, as the 302 redirect tells them to look elsewhere for pages that have been deleted. If a search engine robot follows those instructions, it will eventually be served an HTTP 200 (OK) status code for a page that displays a 404 error message, which is a whole other level of bad practice.
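For illustration only, the anti-pattern looks something like the sketch below (hypothetical paths again); both directives are deliberately commented out because neither should ever reach a live server.

```apache
# Anti-pattern 1: a 302 sends requests for a deleted page to a
# "404 page" that itself answers with HTTP 200.
# Redirect 302 /deleted-page.html /404-page.html

# Anti-pattern 2: giving ErrorDocument a full URL makes Apache issue
# a 302 redirect to the error page rather than serving it in place,
# producing the same Soft 404 chain.
# ErrorDocument 404 http://www.example.com/404-page.html
```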

You should never use redirects to serve a 404 error page. Instead, return an HTTP 404 response code whenever a page you have deleted or removed from your website is requested. This will prevent your site from triggering a huge number of Soft 404 errors, and will ensure search engines only crawl and index the pages you want to rank.