The truth about duplicate content and Google
by Brad Callen
A concern for many webmasters out there is duplicate content. With all you read today, I am sure it has crossed your mind from time to time, as it has mine.
The idea of Google being the 800-lb gorilla is a long-standing one. The thought that they somehow have the ability to screen the whole internet for duplicate content on the fly sounds CRAZY, but we keep perpetuating it, because paranoia is often much easier to believe than fact.
I mean, think about this for a second just in terms of computing power:
1. Google shows a cached version of your website in its results, not the live site. This means that Google ACTUALLY STORES information about your site's pages and updates it on the fly.
2. Google’s index likely contains BILLIONS of webpages.
So what this means is that for Google to determine whether your site is an exact copy of another person's website, it has to store your pages' content and screen that content against its ENTIRE active index… which sounds nearly impossible as far as I am concerned. To boot, you have to consider that this would be happening at the same time as the active crawling, indexing and re-indexing of new pages in Google.
I hope this puts the situation in a little bit of perspective for you. Let's take the concept further. I will take an article from Ezinearticles.com:
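To put rough numbers on that, here is a minimal sketch of the brute-force arithmetic. The index size of one billion pages is purely an assumption for illustration (all I said above is "billions"), and the function names are made up for this example. It only counts naive comparisons and says nothing about how Google actually works internally.

```python
def comparisons_for_one_page(index_size: int) -> int:
    """Comparisons needed to check one new page against every indexed page."""
    return index_size


def comparisons_for_all_pairs(index_size: int) -> int:
    """Comparisons needed to check every indexed page against every other one."""
    return index_size * (index_size - 1) // 2


# Assumption for illustration only, not a real Google figure.
ASSUMED_INDEX_SIZE = 1_000_000_000

print(f"{comparisons_for_one_page(ASSUMED_INDEX_SIZE):,}")   # 1,000,000,000
print(f"{comparisons_for_all_pairs(ASSUMED_INDEX_SIZE):,}")  # 499,999,999,500,000,000
```

Even under this conservative assumption, checking every page against every other page runs to hundreds of quadrillions of comparisons, before counting any re-crawling.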
http://ezinearticles.com/?Cocoa-Beans-and-the-Fierce-Competition-in-the-Chocolate-Industry&id=1674880
Now let’s assume that there is a duplicate content filter. If that is the case, Google will eliminate all duplicates of this article from its index, and a phrase match for the article title will return only 1 result, the one it considers to be the best, right?
http://www.google.com/search?rlz=1C1GGLS_enUS291US304&sourceid=chrome&ie=UTF-8&q="Cocoa+Beans+and+the+Fierce+Competition+in+the+Chocolate+Industry"
Yet somehow there are 162 results for that title. You can repeat this with any article title you like. Try it for yourself… it works, and it is the reason article marketing still works.
Now this comes on the heels of an interesting article from Google, which should dispel any remaining myths you may have:
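If you want to run the same phrase-match test on your own article titles, a small sketch like this builds the search URL for you. The function name `phrase_match_url` is hypothetical, and I am assuming the bare `q` parameter is enough (the browser-specific parameters in my URL above are left out).

```python
from urllib.parse import urlencode


def phrase_match_url(title: str) -> str:
    """Build a Google search URL that looks for the exact title phrase."""
    # Wrapping the title in double quotes asks for a phrase match.
    query = f'"{title}"'
    return "http://www.google.com/search?" + urlencode({"q": query})


print(phrase_match_url(
    "Cocoa Beans and the Fierce Competition in the Chocolate Industry"
))
```

Paste the printed URL into your browser and look at the number of results reported for the title.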
http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html
As SEO experts have been telling us for years, there is NO DUPLICATE CONTENT PENALTY!
OK… that is a little bit of a lie.
There are 2 types of duplicate content: external and internal. Let’s take the first of the two.
1. External duplicate content.
External duplicate content is very similar to the example I just posted above, where you see multiple copies of the same content on DIFFERENT WEBSITES. That last part is very important, as it separates what could be mistaken for a penalty from what is in reality just competition.
So how is it that 162 sites can list the exact same article and Google allows it?
The quick answer is that they really don’t care. Yeah, it may be muddying up the index, but if a site or page is still showing in the index, it is because it has enough authority or link clout to be there. Pages will be removed from the index if they get stale or have no links pointing to them, for example, but not for duplicating content.
The real issue becomes: how can you soar to the top with a site that has repurposed content? I am inclined to say it’s not easy and, more importantly, it’s just not a good idea. If you are using content to build your website AND to promote it, be sure that:
1. Your site’s content is unique to your website
2. Your promotional content (PRs, articles, etc.) lives outside of your website
This makes your life a lot easier. IT IS NOT ABOUT A DUPLICATE CONTENT PENALTY, THOUGH. It’s to avoid competing with everyone else who has the same content. If you must republish content from another website, make sure you add some commentary or additional content around it. This can include images or other kinds of media, and you’re doing this for your visitors’ sake. You need to stand out, not repeat what others are doing.
Bottom line: with external duplicate content, the site with the most links and authority in Google will always win. Same as with keyword competition.
2. Internal duplicate content.
OK, this one is as real as it gets, but for the same reason. It’s a filter that keeps you from competing with yourself, and it’s a lot simpler to handle than you think.
This is important if:
a. You are running a CMS out of the box
b. You are running an eCommerce platform
c. You are pulling in pages and having them autogenerated
These are just a few examples of people who have had issues with internal duplication.
When Google indexes your pages, it looks at your title tags and meta descriptions, and one of the factors it uses to decide whether to actually list your pages is the uniqueness of those tags. If my shopping cart system has 10 products which are all “shoes”, and that is what I use as the page title for all of them, and the meta description for all of them is “tennis shoes”, Google will look at all 10 and simply choose the best 1 to list.
This is a big win and an easy fix, and something I would advise for both usability and regular old good SEO.
Every page NEEDS a unique title tag
Every page SHOULD HAVE a unique description
(or at the very least enough unique content to differentiate it from the other pages on your site)
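If you want to check your own site for this, here is a rough sketch that groups pages sharing the same title tag and meta description. It uses only Python's standard library. The names `HeadTagParser` and `find_duplicate_tags` are made up for this example, and it assumes you already have each page's HTML as a string keyed by URL. It is a quick audit helper, not a picture of how Google evaluates your pages.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def find_duplicate_tags(pages):
    """Return {(title, description): [urls]} for tags used by more than one page."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        key = (parser.title.strip().lower(), parser.description.lower())
        groups[key].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Hypothetical shop pages, mirroring the "shoes" example above.
pages = {
    "/product/1": '<title>shoes</title><meta name="description" content="tennis shoes">',
    "/product/2": '<title>shoes</title><meta name="description" content="tennis shoes">',
    "/product/3": '<title>Red trail shoes</title><meta name="description" content="Trail shoes in red">',
}
print(find_duplicate_tags(pages))
```

Any group this reports is a set of pages that need their own title and description.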
So rest easy. There is NOT a duplicate content penalty from Google. If your site is not ranking, I can assure you it’s because of a lack of links targeting your core keywords, so go out and start link building!