Google’s Duplicate Content webmaster guide defines duplicate content (for purposes of search engine optimization) as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar”.
Google’s guide goes on to list the following as examples of non-malicious duplicate content:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
Search engines need to penalize some instances of duplicate content that are designed to spam their search index, such as:
- Scraper sites that copy content wholesale
- Simplistic article-spinning techniques that generate “new” content by selectively replacing words in existing content
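Google does not publish how it decides that two pages are “appreciably similar”. A common textbook way to put a number on similarity is w-shingling with Jaccard similarity: split each text into overlapping k-word sequences (shingles) and divide the number of shared shingles by the total number of distinct shingles. The sketch below assumes k = 3 and a simple lowercase word tokenizer. It only illustrates the idea and is not Google's algorithm.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat all the words as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(doc_a: str, doc_b: str, k: int = 3) -> float:
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


original = "The quick brown fox jumps over the lazy dog near the river bank."
# Same words, different case and punctuation (like a printer-only view).
printer_view = "THE QUICK brown fox jumps over the lazy dog near the river bank!"
# One word swapped, as a simplistic article spinner would do.
spun = "The fast brown fox jumps over the lazy dog near the river bank."
unrelated = "Search engines index pages and rank them by relevance."

print(similarity(original, printer_view))     # 1.0
print(round(similarity(original, spun), 3))   # 0.692
print(similarity(original, unrelated))        # 0.0
```

Swapping a single word only breaks the few shingles that contain it, so the spun copy still scores far closer to the original than unrelated text does.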
When search engines find duplicate content, they may:
- Penalize an entire site that contains duplicate content (when spammy)
- Pick one page as the canonical source of the content and either give the other pages with the duplicated content lower priority or not index them at all (common)
- Take no punitive action and index multiple copies of the content (rare)
Avoiding Internal Duplication
When asked about duplicate content, Google’s Matt Cutts said that it should only hurt you if it looks spammy. However, many webmasters employ the following techniques to avoid unnecessary content duplication:
- Ensure that content is only accessible under one canonical URL
- If your site must return the same content under multiple URLs (e.g. for a “print view” page), specify the canonical URL manually with a link element (rel="canonical") in the document head
- In cases where your site returns similar content based upon parameters encoded in the URL (e.g. sorting a product catalog), exclude the URL parameters in Google Webmaster Tools
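The first two techniques can be sketched as a function that maps every URL variant of a page to one canonical URL and then emits the matching link element for the document head. This is a minimal sketch under stated assumptions: the set of parameters to ignore (IGNORED_PARAMS below) is hypothetical and depends entirely on which parameters on your site change presentation rather than content, and the URLs are assumed not to contain a user:password@ part. Search engines treat rel="canonical" as a hint, not a guarantee.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters that change presentation (sorting, print view)
# or tracking, but not which content is shown. Adjust for your own site.
IGNORED_PARAMS = {"sort", "order", "print", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    # Keep only parameters that select different content, sorted so that
    # ?a=1&b=2 and ?b=2&a=1 collapse to the same URL.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # host names are case-insensitive
        parts.path or "/",
        urlencode(kept),
        "",  # drop the #fragment; browsers never send it to the server
    ))


def canonical_link_element(url: str) -> str:
    """Link element to place in the <head> of every variant of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_link_element("https://Example.com/catalog?sort=price&page=2#top"))
# <link rel="canonical" href="https://example.com/catalog?page=2">
```

Because the sorted, filtered URL is the same for every variant, a sorted view, a print view, and a tracked link to the same catalog page all point search engines at one address.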
Publishing content on your site that has been published elsewhere is called content syndication. Creating duplicate content through content syndication can be OK as long as:
- You have permission to do so
- You tell your users what the content is and where it came from
- You link to the original source (a direct deep link to the original content from the page with the copy, not just a link to the home page of the site where the original can be found)
- Your users find it useful
- You have something to add to that content such that users would rather find it on your site than elsewhere (commentary or critique, for example)
- You have enough original content on your site as well (at least 50% original, but ideally 80% original)
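The 50% and 80% figures are a rule of thumb rather than a threshold Google publishes, and the answer does not say how to measure them. The sketch below assumes the share is measured in words and that each page is either wholly original or wholly syndicated; the Page type and the example site are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    word_count: int
    syndicated: bool  # True if the page's main content was republished from elsewhere


def original_share(pages: list[Page]) -> float:
    """Fraction of the site's words that are original (not syndicated)."""
    total = sum(p.word_count for p in pages)
    if total == 0:
        return 0.0
    original = sum(p.word_count for p in pages if not p.syndicated)
    return original / total


def rating(share: float) -> str:
    # Rule-of-thumb thresholds: at least 50% original, ideally 80%.
    if share >= 0.8:
        return "ideal"
    if share >= 0.5:
        return "acceptable"
    return "too little original content"


site = [
    Page("/guide", 1200, False),
    Page("/news/reprint-1", 800, True),
    Page("/review", 600, False),
    Page("/news/reprint-2", 400, True),
]
share = original_share(site)
print(f"{share:.0%} original: {rating(share)}")  # 60% original: acceptable
```

Here 1,800 of 3,000 words are original, which clears the 50% minimum but falls short of the 80% ideal.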
While Google doesn’t penalize every instance of duplicated content, even unpenalized duplicate content may not help you get visitors:
- You are competing with all the other copies that are out there
- Google will likely prefer the original source of the content and the most reputable copy of the content.
Google will penalize duplicate content published on your website from other sources if:
- It appears to be scraped or stolen (especially without attribution).
- Users don’t react well to it (especially if they click back to Google after visiting your site).
- There are so many copies of it out there that there is no reason to send users to your copy of it.
- Your copy isn’t the original, the most reputable, or the most usable copy, and it doesn’t add any commentary or critique.
- Your site doesn’t have enough original content to balance all the republished content.
- You duplicate pages so often within your own site that Googlebot has trouble crawling the full site.
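To find out whether your own site serves the same page under many URLs, you can group crawled pages by a fingerprint of their text. This minimal sketch assumes you already have a mapping from URL to HTML (the crawled dictionary is hypothetical); it strips tags with a crude regular expression, normalizes whitespace and case, and hashes the result with SHA-256. It only catches exact duplicates after that normalization, so pages that differ in navigation, ads, or timestamps will not be grouped.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html_text: str) -> str:
    """SHA-256 of the page text with tags, whitespace, and case differences removed."""
    text = re.sub(r"<[^>]+>", " ", html_text)  # crude tag stripping, fine for a sketch
    text = " ".join(text.split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose normalized text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html_text in pages.items():
        groups[content_fingerprint(html_text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


crawled = {
    "/product/42": "<h1>Blue Widget</h1><p>Sturdy and blue.</p>",
    "/catalog/widgets/42": "<h1>Blue Widget</h1>\n<p>Sturdy and   blue.</p>",
    "/product/43": "<h1>Red Widget</h1><p>Sturdy and red.</p>",
}
print(duplicate_groups(crawled))
# [['/catalog/widgets/42', '/product/42']]
```

Each group with more than one URL is a candidate for a redirect to, or a canonical link element pointing at, a single preferred URL.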
Internationalization and Geo Targeting
Content localization is one area in which duplicating content can be beneficial for SEO. It is perfectly fine to publish the same content on sites targeted at different countries that share a language. For example, you may have a US site, a UK site, and an Australian site, all with the same content.
With a site for each country, it is usually possible to rank better for users in that country. In addition, you can cater specifically to users in each country with minor spelling differences, pricing in the local currency, or country-specific shipping options. For more information on setting up geo-targeted websites, see “How should I structure my URLs for both SEO and localization?”
Dealing with Content Scrapers
Other sites that steal your content and republish it without permission can occasionally cause duplicate content problems for your site. Search engines work hard to make it difficult for scraper sites to benefit from duplicating your content. If a scraper site is causing problems for you, it may be possible to get its copies of your content removed from the Google index by filing a DMCA request with Google.