The Great Deal Of Duplicate Content.

Sunday, December 24, 2006

Someone is stealing your content.

Over time we have talked a lot about content. Content is important and plays a key role in the optimization of our blogs and websites. The web has even evolved around the concept of user-developed content. Different formats of content have evolved on the same medium, the Internet, like never before.

In this pool of Internet content, it is even more important for our content to be unique and fresh. Much of the content on the Internet is not original; it is either duplicate content or near-duplicate content.

What is Duplicate Content?

We all know about viruses and how they propagate across the web. Content propagates across the web in the same way, and along the way many copies of it are formed. Content published on the net that adds no new information or value to the original is known as duplicate content. Sometimes this is done with credit given to the original website, and sometimes the content is simply ripped off.

What are different types of Duplicate Content?

Content that is syndicated and re-branded for different users and different markets, such as Private Label Articles.
Websites and blogs built simply by aggregating content from different websites around the web.
Plagiarism: copying from freely available sources such as Wikipedia and Project Gutenberg.
Web press releases, which are very often duplicated across many blogs and other web media sites.
Businesses that, in their quest to build a brand image and protect their trademark ownership, register every domain related to their main domain and build near-duplicate or similar websites with no new content.
Auto-generated content from different types of content extractor software.
Many registered domains, either containing keywords or optimized for different keywords, that redirect to the same website. This type of optimization is dangerous, as the content of the main website can be taken as duplicate content.

Business compulsions aside, most duplicate pages are created to build content-rich websites that generate passive income from contextual ads, to manipulate search engines into granting high rankings, and to optimize the copier's main website, all while taking unauthorized credit.

Most of the time this is done without providing any valuable, unique content for the visitor, without giving credit to the real owner of the content, and without adding any useful input to it.

Duplicate content may be good for many big firms, in that their name stays attached to the content. Even though the content is duplicated, there is no harm to the real owner or source as long as the copies provide inbound links to the owner's main site. But this is not always the case.

Why should we be concerned?

Most often, even when we write about the latest news or information, chances are it has already been discussed on many other websites. Search engines try to ignore, or sometimes even de-rank, sites that have very little original content.

Using 301 redirects is also an important way of telling search engines that a page has moved, instead of developing a similar page. But using too many redirects to a particular page may lead search engines to treat your website's content as duplicate.
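To make the redirect idea concrete, here is a minimal Python sketch. It assumes a small table of moved pages, and the paths and the `redirect_for` helper are hypothetical names made up for illustration. A real site would normally configure this in its web server or blogging platform rather than in hand-written code.

```python
from typing import Optional, Tuple

# Hypothetical table of pages that have moved: old path -> new path.
MOVED_PAGES = {
    "/old-post.html": "/2006/12/duplicate-content.html",
    "/about-us.html": "/about.html",
}


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_path) if the page moved permanently, else None."""
    new_path = MOVED_PAGES.get(path)
    if new_path is None:
        return None  # Page has not moved; serve it normally.
    # 301 is the HTTP status code for "Moved Permanently".
    return 301, new_path
```

Calling `redirect_for("/old-post.html")` returns `(301, "/2006/12/duplicate-content.html")`, while a path that is not in the table returns `None`. Each old address points to exactly one new address, so no second copy of the page has to exist.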

The latest trend of page jacking is hitting new and small websites hard. Sometimes owners of authority websites with good PageRank take content from smaller sites and blogs and use it on their own websites. Being whitelisted, they already have more relevant content on their websites, so the ripped content looks more natural there. This may make search engines think that the original website is the one with duplicate content, and it may face the consequences.

You now don't have to worry about things like http://domain.com and http://www.domain.com

How do Search engines look for Duplicate Content?

Google and AltaVista (now owned by Yahoo) hold many patented technologies for finding duplicate content. These do not match content word by word but look for similarities.

Google looks for subsets of other content sources. It checks how recently the information was published and also checks the existing authority of the blog or website.
Yahoo compares the outbound links from the content.
Public domain websites and other high-authority sites are crawled very thoroughly by these search engines. Content-optimized websites whose content is similar to, or simply copied from, these websites are easy prey for the search engines.
Similar title tags and meta description tags, combined with similar content, are easily identified by search engines and are noticeable traces of content produced by automated systems.
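To illustrate what "finding similarities" rather than matching word by word can look like, here is a minimal Python sketch of one well-known general technique: word shingling with Jaccard similarity. This is an assumption chosen for illustration, not a description of what Google or Yahoo actually run. The function names and the shingle size of 3 are likewise hypothetical choices.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # Two empty texts are trivially identical.
    return len(sa & sb) / len(sa | sb)
```

Two identical texts score 1.0 and two texts with no three-word run in common score 0.0. A copy with a few words swapped lands somewhere in between, which is why changing a word here and there does not hide a ripped article from this kind of comparison.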

How to find if your content is duplicated?

Take any 3-4 lines at random, or important phrases from the article, and Google them in quotation marks.
Use the plagiarism-checking service provided by www.copyscape.com.
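The first check above can be partly automated. The following Python sketch picks a few sentences at random and builds an exact-phrase search link for each. The helper names, the five-word minimum, and the simple sentence splitting are assumptions for illustration, and the links are meant to be opened by hand in a browser.

```python
import random
import re
from urllib.parse import quote_plus


def pick_phrases(article: str, count: int = 3, seed=None) -> list:
    """Pick up to `count` sentences at random from the article."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", article)
    # Skip very short sentences; they match too many unrelated pages.
    sentences = [s.strip() for s in parts if len(s.split()) >= 5]
    rng = random.Random(seed)
    return rng.sample(sentences, min(count, len(sentences)))


def exact_match_url(phrase: str) -> str:
    """Build a search URL that looks for the phrase in quotation marks."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')
```

For example, `exact_match_url("hello world")` gives `https://www.google.com/search?q=%22hello+world%22`, where `%22` is the encoded quotation mark. If pages other than your own show up for several of your phrases, the article has probably been copied.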

More information and protection laws can be found at CreativeCommons.org and also at DisclosurePolicy.org.

It's really important for us to make information easily accessible while also adding valuable content to the web. Sooner or later, most attempts at stealing content will be countered by search engines with ever-improving technology.

To your Success,
Divya Uttam
Happy Blogging.

Further Reading on Google Blog:- Deftly dealing with duplicate content

Related Posts:-
Neither Content Nor Links
Blog Optimization- Is Content the Only Real King online?
Blog Optimization- Content and its Supremacy.

Technorati Tags:- Blogging,Duplicate Content, Blogging Optimization,Optimize Blogging

Labels: Content Management

posted by Divya Uttam, 5:52 AM

2 Comments:

Thanks for a thoughtful article.

Makes me think there is an opening for a new web standard.

One like PDF being human readable, which is hard to copy automatically. The difference would be that search engines could read and categorise it.

This would kill the automated duplicate content & just leave plain old manual copy methods.

commented by

BrianP, 12:23 AM

Hi Divya,

An interesting article, but I don't believe there is a Duplicate Content Penalty Filter at Google.

They simply don't need to have one because they just return the most relevant search result to their users based on Pagerank... which is hard to manipulate.

See my blog for a movie debunking the duplicate penalty myth.

Neil Shearing

commented by

Anonymous, 4:37 AM

