
How to Find & Fix Duplicate Content on Your Website
Duplicate content can be bad. Using the same content, in whole or in part, across your website leads to a poor user experience and raises a red flag in Google’s search algorithm. In the old days of SEO, duplicate content was often used as a cheap trick to get more keywords and more content onto a website, so Google evolved a system to weed out the spammers who violated best practices this way. Today, if you’re caught using duplicate content, your domain authority could suffer and your keyword rankings could drop.

In this post we discuss:
What is duplicate content? Why is it bad?
Content syndication & duplicate content
What other content production tools can cause duplicate content?
Types of duplicate content: which are benign, which are toxic
How does Generative AI (artificial intelligence) content fit into the mix?
How to avoid and/or clean up duplicate content

Duplicate Content Defined
In the vast majority of cases, duplicate content is non-malicious and simply a product of whichever CMS (content management system) the website happens to be running on. For example, WordPress (the industry-standard CMS) automatically creates “category” and “tag” pages, which list all blog posts within certain categories or tags and so repeat content that already lives elsewhere on the site. Or the www and non-www versions of a site may not be redirected properly, so the same content is served from multiple URLs. Either way, the result is multiple pages or URL variants within the domain that contain the same content.

1) Google may decide to let me off with a “warning” and simply choose not to index 99 of my 100 duplicate posts, but keep one of them indexed. NOTE: This doesn’t mean my website’s search rankings would be affected in any way by the duplicate content.

2) Google may decide it’s such a blatant attempt at gaming the system that it completely de-indexes my entire website from all search results. This means that, even if you searched directly for “Example.com,” Google would return no results.
So, one of those two scenarios is guaranteed to happen. Which one it is depends on how egregious Google determines your blunder to be. In Google’s own words:

Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the pages with duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a canonical version of the content to show in a given search result.

This type of non-malicious duplication is fairly common, especially since many CMSs don’t handle this well by default. So when people say that having this type of duplicate content can affect your site, it’s not because you’re likely to be penalized; it’s simply due to the way that web sites and search engines work.

Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate content and documents so that users experience less redundancy.

So, what happens when a search engine crawler detects duplicate content? (from https://searchengineland.com/search-illustrated-how-a-search-engine-determines-duplicate-content-13980)

How to Find Duplicate Content
Fixing duplicate content is relatively easy. Finding duplicate content is the hard part. As I mentioned above, duplicate content can be tricky to detect: just because you don’t have any repeated content from a user experience perspective doesn’t mean you don’t have repeated content from a search algorithm’s perspective.

Your first step is a manual one: go through your site and see if there are any obvious repetitions of content. For example, do you have an identical paragraph concluding each of the pages on your site? Rewrite it. Did you re-use a section of a past blog post in a new post? Rework one of them so the two are distinct.
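If your site has more pages than you can comfortably read through, the manual check above can be partly automated. The following is a minimal sketch in Python, not a definitive tool: it assumes you have already exported each page’s plain text (with paragraphs separated by blank lines) into a dictionary keyed by URL. The function names, the `min_words` threshold, and the example URLs are all hypothetical and only illustrate the idea of flagging paragraphs that appear on more than one page.

```python
import hashlib
import re
from collections import defaultdict


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def find_repeated_paragraphs(pages: dict[str, str], min_words: int = 8) -> dict[str, list[str]]:
    """Return {normalized paragraph: sorted URLs} for paragraphs found on 2+ pages.

    `pages` maps a URL to that page's plain text (an assumed input format).
    Paragraphs shorter than `min_words` words are skipped to reduce noise.
    """
    urls_by_digest = defaultdict(set)  # paragraph hash -> URLs containing it
    text_by_digest = {}                # paragraph hash -> normalized paragraph text

    for url, text in pages.items():
        # Treat one or more blank lines as a paragraph boundary.
        for raw in re.split(r"\n\s*\n", text):
            para = normalize(raw)
            if len(para.split()) < min_words:
                continue
            digest = hashlib.sha256(para.encode("utf-8")).hexdigest()
            urls_by_digest[digest].add(url)
            text_by_digest.setdefault(digest, para)

    return {
        text_by_digest[digest]: sorted(urls)
        for digest, urls in urls_by_digest.items()
        if len(urls) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data; replace with text exported from your own site.
    sample = {
        "https://example.com/services": (
            "We offer a wide range of services for small business owners.\n\n"
            "Contact us today to learn more about our services and pricing options."
        ),
        "https://example.com/about": (
            "Our team has worked with clients in many different industries.\n\n"
            "Contact us today to learn more about our services and pricing options."
        ),
    }
    for paragraph, urls in find_repeated_paragraphs(sample).items():
        print(f"{len(urls)} pages share: {paragraph[:60]}")
        for url in urls:
            print(f"  {url}")
```

This only catches exact repeats after whitespace and case are normalized. Lightly reworded copies will slip through, so treat the output as a starting list for the manual review rather than a replacement for it.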
Once you’ve completed this initial manual scan, there are a few tools you can use to find more, better-hidden instances of duplicate content.

Perform Your Own Search
First, you can perform a search to see your site through Google’s eyes. Use the site: operator to restrict your search to your site only, and follow it with the intitle: operator to search for a specific phrase. It should look a little something like this:

site:thisisyoursite.com intitle:"thisisyourtargetphrase"

This search returns all the pages on your site whose titles match your chosen phrase. If you see multiple identical results, you know you have a duplicate content problem.

Check Google Search Console (GSC)
A simpler way to check for duplicate content is to use Google Search Console (formerly Google Webmaster Tools) to crawl your site and report back on any errors. Once you’ve created and verified your Search Console account, head to the Search Appearance tab and click on “HTML Improvements” (this report lives in the classic interface; newer versions of Search Console may not include it). Here, you’ll be able to see and download a list of duplicate meta descriptions and title tags. These are common and easily fixable issues that just require a bit of time to rewrite.

To determine whether a sample of duplicate content is going to pull down your rankings, you first have to ask why you are publishing that content in the first place. It all boils down to your purpose. If your goal is to game the system by using a piece of content that has been published elsewhere, you’re bound to get penalized. The purpose is clearly deceptive and intended to manipulate search results. This is what Google has to say about this sort of behavior:

Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.

Copyscape
For 5 cents per search, you can have Copyscape vet an entire piece for you. But if your budget won’t allow that kind of expenditure, you can still use Copyscape for free.
The catch with free Copyscape is that you’ll have to