Why Duplicate Content is Bad for SEO Health
Search engines have a limited amount of resources with which to crawl and understand a website. So when a bot like Googlebot is crawling a website to see what its pages are about, the bot may choose not to waste its crawl budget crawling or indexing pages that look like multiple versions of the same page.
Search Engine Journal lists duplicate content as a factor that can “significantly reduce your site’s crawling potential.” When describing best practices for managing your URL inventory, Google explains that eliminating duplicate content allows bots to “focus crawling on unique content rather than unique URLs.” If search engines aren’t crawling, indexing, or showing your pages in the search results, this will inevitably affect your website’s ability to drive traffic.
You might be thinking, “we have never just duplicated a page, so we probably don’t have duplicate content.” However, duplicate content includes more than exact duplicates of pages. Google describes duplicate content as “substantive blocks of content” that are “appreciably similar.” This means that two or more pieces of content with a lot of differences between them can still be considered duplicate content, which can affect those pages’ ability to appear in search and drive traffic to the site.
Pruning out duplicate content is a pretty common SEO task, and there are lots of ways to do it. When we come across duplicate content at UpBuild, this is the process my team uses to deal with it.
Step 1 – Gather Your Content.
Tool to use: Screaming Frog.
In this step, you extract the duplicate content from your site and place it into a Google Sheet.
Essentially, you are organizing your site. So who better to turn to than Marie Kondo for advice on how to begin? We have already decided to focus on reducing duplicate content. From here, Marie would recommend laying everything out in front of you.
Fire up Screaming Frog and open a new Excel or Google Sheet.
- Gathering all of the content takes just a few steps, outlined in this Screaming Frog Guide to checking for duplicate content, which will show you how to configure your crawl and filter your data.
- Now, export the crawl data and drop it into that Excel or Google Sheet.
- Add a column where you can make decisions. Create a drop-down menu for these cells with the following list: Consolidation Page, Consolidate Into, 301 Redirect, 410 Status Code, Keep, or Use a Canonical Tag (more on what these mean in Step 3).
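If you prefer to script this step, the same setup — take the Screaming Frog export and add an empty Decision column to fill in later — can be sketched in a few lines of Python. The file contents and page names below are invented for illustration; Screaming Frog's exports do include an Address column for each URL.

```python
import csv
import io

# The decision options from the drop-down list in Step 1.
DECISIONS = {
    "Consolidation Page", "Consolidate Into", "301 Redirect",
    "410 Status Code", "Keep", "Use a Canonical Tag",
}

def add_decision_column(export_csv_text):
    """Parse a Screaming Frog export (as CSV text) and append an empty
    'Decision' field to every row, ready to be filled in during Step 3."""
    reader = csv.DictReader(io.StringIO(export_csv_text))
    rows = []
    for row in reader:
        row["Decision"] = ""  # later set to one of DECISIONS
        rows.append(row)
    return rows

# Hypothetical two-row export for illustration.
sample = (
    "Address,Title 1\n"
    "https://snakesarecool.com/snake-breeds,Snake Breeds\n"
    "https://snakesarecool.com/snake-types,Snake Types\n"
)
rows = add_decision_column(sample)
```

From here you can write the rows back out to CSV and import them into your sheet, with the drop-down validation applied to the Decision column.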
Step 2 – Pull in Relevant Data.
Tools to use: Moz’s Link Explorer, Google Search Console, and Google Analytics
Take a look at the quality and quantity of inbound links and evaluate that content’s value for your users.
We want this data to provide insight into each page’s value to help us decide what decision to make for each page. Here are some aspects to look at when assessing the value of a page:
- Inbound Links: Using a tool like the Moz Link Explorer, you can look at how many inbound links there are to a given URL. After clicking for more information, you can see inbound link metrics that provide insight into the value those links add to the page.
- Google Search Console Data: It is always helpful to look at Google Search Console data so you can see how many impressions and clicks a URL is getting from search.
- Google Analytics Data: Pull in data from Google Analytics (or your analytics tool of choice) to see what kind of traffic the page is getting — from organic search as well as other channels — and how well that traffic converts.
- The page’s content: Sometimes, for a variety of reasons, a page may not be performing well but still has valuable information that could answer users’ questions. In the case of duplicate content, it makes sense that this content might not be performing as well. Make sure to check out the content of each page before making decisions.
To save time leveraging all of the data above, we suggest exporting the metrics and importing them into your sheet next to their respective URLs using a formula like a VLOOKUP function.
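If you would rather join the exports programmatically than with spreadsheet formulas, the lookup a VLOOKUP performs — match each URL to its row in another data set — can be sketched in plain Python. Every URL and metric value below is made up for illustration:

```python
# Hypothetical metrics, keyed by URL, as they might come out of each export.
link_metrics = {  # e.g. from Moz Link Explorer
    "/snake-breeds": {"inbound_links": 15, "domain_authority": 42},
    "/snake-types": {"inbound_links": 3, "domain_authority": 42},
}
search_data = {  # e.g. from Google Search Console
    "/snake-breeds": {"clicks": 120, "impressions": 4100},
    "/snake-types": {"clicks": 8, "impressions": 950},
}

def merge_metrics(urls, *sources):
    """For each URL, pull its row from every source dict and merge them —
    the same key-based lookup a VLOOKUP does in a spreadsheet."""
    merged = {}
    for url in urls:
        row = {}
        for source in sources:
            row.update(source.get(url, {}))  # URLs missing from a source stay blank
        merged[url] = row
    return merged

combined = merge_metrics(["/snake-breeds", "/snake-types"],
                         link_metrics, search_data)
```

Each URL now carries all of its metrics in one row, which is exactly the shape you want sitting next to the Decision column in your sheet.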
Step 3 – Analyze and Decide.
Tool to use: Patience.
Decide whether you can consolidate a page with another page, 301 redirect the page to a more relevant page with higher value, or send a 410 gone response code. You can also keep the page as the canonical version of that content, or rewrite the content to make it distinct and unique from other content on your site.
This can be tedious depending on how much duplicate content you have — but if it is a long process, that probably means a lot of duplicate content is negatively affecting your site’s health, which makes the work all the more valuable. We focused on a few routes you can take when deciding what to do with a single URL. Here is an explanation of each decision option using a made-up website (snakesarecool.com):
Consolidation Page:
I have selected this option for the Snake Breeds Page because both this page and its near-duplicate, the Snake Types Page, have some unique and valuable content that would be richer if consolidated into a single page. Using Moz Link Explorer to look at the URLs’ inbound links, I found that /snake-breeds has more high-quality inbound links than /snake-types, so I will make /snake-breeds the consolidation page.
Consolidate Into:
I have selected this option for the Snake Types Page because a similar page, the Snake Breeds Page, has a higher value than this page. I will be taking the relevant unique content from this page and adding it into the higher-value page.
301 Redirect:
I have selected this option for snakesarecool.com/blog/anaheim-reptile-expo-2013 because this page does not have a lot of content, and the information is outdated. The URL I will redirect to is snakesarecool.com/blog/anaheim-reptile-expo. That page is topically similar to the Reptile Expo 2013 page, contains rich evergreen content, and has 15 inbound links from other reptile blogs and companies with relatively high domain authority.
Note: The page you choose to redirect to must fulfill a similar need as the original page. You would not want to redirect /blog/anaheim-reptile-expo-2013 to /blog/snake-breeds. If a user on a Top 10 Places to Get Reptiles blog clicked a link to /blog/anaheim-reptile-expo-2013 and was taken to /blog/snake-breeds instead, they would feel misled. It is also tempting to redirect to the homepage, since marking a page as gone can feel drastic, but we advise against blanket-redirecting URLs to your homepage.
410 Status Code:
I have selected this option for snakesarecool.com/blog/2014-vegas-meet-up-times because this page does not have a lot of rich information and is outdated, there is no similar content to redirect it to, and it covers a topic that’s no longer relevant to the site or its users. This status code tells Google that this page is gone and not to come back to see if there are changes.
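In practice, the 301 and 410 decisions above end up as server configuration, which your development team will implement in your web server or CMS. The underlying logic is just a lookup table, sketched here in Python using the made-up example URLs:

```python
# Map each retired URL to its fate: a 301 target, or a 410 Gone response.
# Paths are from the fictional snakesarecool.com example.
REDIRECTS = {
    "/blog/anaheim-reptile-expo-2013": "/blog/anaheim-reptile-expo",  # 301
}
GONE = {
    "/blog/2014-vegas-meet-up-times",  # 410
}

def resolve(path):
    """Return the (status_code, location) a server should send for a path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the consolidation page
    if path in GONE:
        return 410, None             # tells crawlers the page is gone for good
    return 200, None                 # serve the page normally
```

A 301 passes visitors (and link equity) to the new URL, while a 410 tells crawlers not to keep re-checking the old one.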
Keep the Page:
I have selected this option for Top 5 Large Snakes That Don’t Bite because the similar content on other pages can be rewritten to be more unique. In this case, I will keep the Top 5 Large Snakes That Don’t Bite page as-is and rewrite the other, similar pages on the topic of large snakes as pets. Search for your topic (like large snakes as pets) in Google, scan the search results, and look at features like People Also Ask and the Related Searches section at the bottom. Use this information to decide how to rewrite those pages so they are distinct from the other content on your site.
Use a Canonical Tag:
I have selected this option for the Where to Buy Snake Food Directory pages that are duplicate content because I want to keep all of these pages. I am going to set a canonical tag on the URLs with similar content, pointing to the main Where to Buy Snake Food Directory page — this tells search engines which version of the content to index while keeping every page live for users.
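The tag itself is a single line in the <head> of each duplicate page, pointing at the version you want indexed (the URL here is from our made-up example site):

```html
<!-- Placed in the <head> of each near-duplicate directory page, this tells
     search engines to treat the main directory page as the canonical version. -->
<link rel="canonical" href="https://snakesarecool.com/where-to-buy-snake-food/" />
```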
Step 4 – Take Action & Assign Tasks.
Tools to use: Teamwork and developer resources.
Assign tasks to the appropriate team or team members. Ensure that any content consolidated into another page is also assigned a 301 redirect to the consolidation page.
To help organize assignments, consider adding columns for the names or teams responsible for each task, a task-completion column, and a progress-notes column.
Completing the 301 redirects and 410 status codes is a straightforward task for your development team. When it comes to consolidating two pieces of content, here is the process we follow:
Take each piece of content and outline its bones to get an idea of where the “Consolidate Into” page’s content can fit into the consolidation page. The easiest way to build the outline is to list the headings, but you can also include other elements like text blocks and buttons. Using our snakesarecool.com example, here is how that would look:
/snake-breeds:
<h1> Popular Snake Breeds </h1>
<h2> Corn Snake </h2>
<h3> Corn Snake Diet </h3>
<h2> Ball Python </h2>
<h3> Ball Python Diet </h3>
<h2> Rainbow Boa </h2>
<h3> Rainbow Boa Diet </h3>
<h2> Milk Snake </h2>
<h3> Milk Snake Diet </h3>

/snake-types:
<h1> Common Types of Snakes and How to Care for Them </h1>
<h2> Ball Python </h2>
<h3> Ball Python Diet </h3>
<h3> Best Substrate for Ball Pythons </h3>
<h2> Rosy Boa </h2>
<h3> Rosy Boa Diet </h3>
<h3> Best Substrate for Rosy Boas </h3>
<h2> Gopher Snake </h2>
<h3> Gopher Snake Diet </h3>
<h3> Best Substrate for Gopher Snakes </h3>
<h2> Corn Snake </h2>
<h3> Corn Snake Diet </h3>
<h3> Best Substrate for Corn Snakes </h3>
Once you have the bones of each one laid out, it is simple to combine the two. You can see certain breeds of snakes and information on the best substrate for each snake included in /snake-types but not in /snake-breeds — this content would be an excellent addition to the Snake Breeds page and would negate the need for the Snake Types page entirely.
Once you know what you want to add, copy and paste the content from your consolidation page into a document and begin rewriting as needed to add the new information in a way that’s seamless for the user and adds value to the page’s content.
After you pull the valuable content from /snake-types and build it out into /snake-breeds, set up a 301 redirect from /snake-types to the newly updated /snake-breeds page.
Step 5 – Follow Best Practices.
Resource: SEMrush SEO Best Practices to Avoid Duplicate Content
Moving forward, follow a best-practices guide in your content creation process to avoid creating duplicate content in the future. Create your own or use one built out by other experts.
After all of this doable (maybe time-consuming) work, you probably will want to avoid doing this process again for the same site in 5 years. In the SEMrush Best Practices Guide to Avoid Creating Duplicate Content, they summarize this step as “focusing on creating unique quality content for your site.” They go on to explain that the best way to do this is to “think carefully about site structure and focus your users and their journey onsite.”
Additionally, Google recommends publishing only pages with rich content and reducing repeated blocks of content included in every page’s template, like descriptive copyright text.
Summary of the Process:
- Gather Your Content – You need Screaming Frog and an Excel or Google Sheet.
- Pull in Relevant Data – We recommend a tool like Moz’s Link Explorer, access to Google Search Console and Google Analytics, and some Excel or Google Sheets formulas.
- Analyze and Decide – You need the sheet you have built out and some patience (optional coffee, tea, or energy drink).
- Implement Decisions & Delegate Tasks – You need your team and a way to keep track of task completions.
- Steer Clear of Creating Duplicate Content – Use best practices already built out, or create a guide for your content writer(s) to avoid duplicate content.
This is one process that has worked for us. We would be glad to hear if you have any tips or questions!