Friday, August 04, 2006
Much current SEO thinking on what works and what doesn't is largely speculation and informed guesswork. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization.
The following, though, are some of the considerations search engines could be building into their algorithms, and the list of Google patents may give some indication as to what is in the pipeline:
• Age of site
• Length of time domain has been registered
• Age of content
• Regularity with which new content is added
• Age of link and reputation of linking site
• Standard on-site factors
• Negative scoring for on-site factors (for example, a dampening for sites with extensive keyword meta tags indicative of having been SEO-ed)
• Uniqueness of content
• Related terms used in content (the terms the search engine associates as being related to the main content of the page)
• Google PageRank (used only in Google's algorithm)
• External links, the anchor text in those external links and in the sites/pages containing those links
• Citations and research sources (indicating the content is of research quality)
• Stem-related terms in the search engine's database (e.g., finance/financing)
• Incoming backlinks and anchor text of incoming backlinks
• Negative scoring for some incoming backlinks (perhaps those coming from low-value pages, reciprocated backlinks, etc.)
• Rate of acquisition of backlinks: too many too fast could indicate "unnatural" link buying activity
• Text surrounding outward links and incoming backlinks: a link following the words "Sponsored Links" could be ignored
• Use of rel="nofollow" to suggest that the search engine should ignore the link
• Depth of document in site
• Metrics collected from other sources, such as monitoring how frequently users hit the back button when SERPs send them to a particular page
• Metrics collected from sources like the Google Toolbar, Google AdWords/AdSense programs, etc.
• Metrics collected in data-sharing arrangements with third parties (like providers of statistical programs used to monitor site traffic)
• Rate of removal of incoming links to the site
• Use of sub-domains, use of keywords in sub-domains and volume of content on sub-domains… and negative scoring for such activity
• Semantic connections of hosted documents
• Rate of document addition or change
• IP of hosting service and the number/quality of other sites hosted on that IP
• Other affiliations of linking site with the linked site (do they share an IP? have a common postal address on the "contact us" page?)
• Technical matters like using a 301 redirect for moved pages, returning a 404 status code rather than a 200 for pages that don't exist, and proper use of robots.txt
• Hosting uptime
• Whether the site serves different content to search engine spiders than to human visitors (cloaking)
• Broken outgoing links not rectified promptly
• Unsafe or illegal content
• Quality of HTML coding, presence of coding errors
• Actual click-through rates observed by the search engines for listings displayed on their SERPs
• Hand ranking by humans of the most frequently accessed SERPs
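The technical item about redirects and status codes is the one factor in the list above that a site owner fully controls. Here is a minimal sketch of the server side, assuming Python's standard http.server module. The PAGES and MOVED dictionaries, the paths in them and the resolve helper are hypothetical. The sketch only shows what "301 for moved pages, 404 for missing pages" means at the HTTP level. It says nothing about how any search engine scores it.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site content: pages that exist, and old paths that have moved.
PAGES = {
    "/": "<h1>Home</h1>",
    "/articles/seo-factors": "<h1>SEO factors</h1>",
}
MOVED = {"/old-seo-article": "/articles/seo-factors"}


def resolve(path):
    """Return (status, headers, body) for a request path.

    Moved pages get a permanent 301 redirect pointing at the new location,
    missing pages get a genuine 404 status (not a 200 "not found" page),
    and existing pages get a 200.
    """
    html = {"Content-Type": "text/html; charset=utf-8"}
    if path in MOVED:
        return 301, {"Location": MOVED[path]}, ""
    if path in PAGES:
        return 200, html, PAGES[path]
    return 404, html, "<h1>Page not found</h1>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when looking the page up.
        path = self.path.split("?", 1)[0]
        status, headers, body = resolve(path)
        data = body.encode("utf-8")
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

You can serve it locally with HTTPServer(("127.0.0.1", 8000), Handler).serve_forever() from the same module. A request for /old-seo-article should then come back as a 301 with a Location header, and /no-such-page as a real 404 instead of a "soft" 200 error page.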
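Two items in the list above, rel="nofollow" and robots.txt, are signals a site sends to crawlers. Here is a minimal sketch of how a well-behaved crawler could honour both, assuming Python's standard html.parser and urllib.robotparser modules. LinkCollector, crawlable_links, the ROBOTS_TXT contents and the "ExampleBot" user agent are hypothetical illustrations. The sketch is not how Google or any other engine actually processes links. It only separates nofollow links from followed ones and drops URLs that robots.txt disallows.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets, split by whether rel contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical robots.txt for the site being crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
# can_fetch() denies everything until a fetch time is recorded; mark it now.
robots.modified()


def crawlable_links(html, user_agent="ExampleBot"):
    """Return links that are neither nofollow nor blocked by robots.txt."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [url for url in collector.followed if robots.can_fetch(user_agent, url)]
```

Suppose a page links to /articles/seo-factors, /private/page and a nofollow advertiser URL. Only /articles/seo-factors survives: the second link is disallowed by robots.txt, and the third is set aside in the nofollow list.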