The Art of SEO, 2nd Edition

Cover of The Art of SEO, 2nd Edition by Eric Enge... Published by O'Reilly Media, Inc.

Auditing an Existing Site to Identify SEO Problems

Auditing an existing site is one of the most important tasks that SEO professionals encounter. SEO is still a relatively new field, and many of the limitations of search engine crawlers are nonintuitive. In addition, many web developers, unfortunately, are not well versed in SEO. Even more unfortunately, some stubbornly refuse to learn, or, worse still, have learned the wrong things about SEO. This includes those who have developed CMS platforms, so there is a lot of opportunity to find problems when conducting a site audit.

Elements of an Audit

As we will discuss in Chapter 6, your website needs to be a strong foundation for the rest of your SEO efforts to succeed. An SEO site audit is often the first step in executing an SEO strategy.

The following sections identify what you should look for when performing a site audit.


Usability

Although this may not be seen as a direct SEO issue, it is a very good place to start. Usability affects many factors, including conversion rate as well as the propensity of people to link to a site.


Accessibility/spiderability

Make sure the site is friendly to search engine spiders. We discuss this in detail in “Making Your Site Accessible to Search Engines” and “Creating an Optimal Information Architecture (IA)” in Chapter 6.

Search engine health check

Here are some quick health checks:

  • Perform a search in the search engines to check how many of your pages appear to be in the index. Compare this to the number of unique pages you believe you have on your site.

  • Test a search on your brand terms to make sure you are ranking for them (if not, you may be suffering from a penalty).

  • Check the Google cache to make sure the cached versions of your pages look the same as the live versions of your pages.

  • Check to ensure major search engine “tools” have been verified for the domain (Google and Bing currently offer site owner validation to “peek” under the hood of how the engines view your site).

Keyword health checks

Are the right keywords being targeted? Does the site architecture logically flow from the way users search on related keywords? Does more than one page target the same exact keyword (a.k.a. keyword cannibalization)? We will discuss these items in “Keyword Targeting” in Chapter 6.

Duplicate content checks

The first thing you should do is to make sure the non-www versions of your pages 301-redirect to the www versions of your pages, or vice versa (this is often called the canonical redirect). While you are at it, check that you don’t have https: pages that are duplicates of your http: pages. You should check the rest of the content on the site as well.
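As a rough aid for this check, the following Python sketch lists the scheme and hostname variants of a page so that each one can be run through a header checker. The function name host_variants and the example.com address are our own illustrative assumptions, not part of any tool mentioned in this chapter.

```python
from urllib.parse import urlsplit, urlunsplit


def host_variants(url):
    """Return the http/https and www/non-www variants of a URL.

    Hypothetical helper for illustration: it only builds the URLs to test,
    it does not fetch anything.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme in ("http", "https"):
        for name in (bare, "www." + bare):
            variants.append(
                urlunsplit((scheme, name, parts.path or "/", parts.query, ""))
            )
    return variants


for variant in host_variants("http://www.example.com/page"):
    print(variant)
```

Ideally only one of these variants serves the page itself and the others redirect to it.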

The easiest way to do this is to take unique strings from each of the major content pages on the site and search on them in Google. Make sure you enclose the string inside double quotes (e.g., “a phrase from your website that you are using to check for duplicate content”) so that Google will search for that exact string.

If your site is monstrously large and this is too big a task, make sure you check the most important pages, and have a process for reviewing new content before it goes live on the site.

You can also use commands such as inurl: and intitle: (see Table 2-1) to check for duplicate content. For example, if you have URLs for pages that have distinctive components to them (e.g., “1968-mustang-blue” or “1097495”), you can search for these with the inurl: command and see whether they return more than one page.

Another duplicate content task to perform is to make sure each piece of content is accessible at only one URL. This probably trips up more big commercial sites than any other issue. The issue is that the same content is accessible in multiple ways and on multiple URLs, forcing the search engines (and visitors) to choose which is the canonical version, which to link to, and which to disregard. No one wins when sites fight themselves—make peace, and if you have to deliver the content in different ways, rely on cookies so that you don’t confuse the spiders.
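If you already have the text of your pages on hand (from a crawl, for example), a quick way to spot content that lives at more than one URL is to group pages by a fingerprint of their text. The sketch below is a minimal illustration under that assumption; the function name, the sample URLs, and the choice of an exact-match hash (which will miss near-duplicates) are ours.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """Group URLs whose body text is identical after basic normalization.

    pages: dict mapping URL -> extracted body text (assumed to be
    collected separately, e.g., by a crawler).
    """
    by_hash = defaultdict(list)
    for url, text in pages.items():
        # Collapse whitespace and ignore case so trivial differences
        # do not hide a duplicate.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_hash[digest].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


pages = {
    "/mustang": "1968 Mustang,  blue",
    "/mustang?sort=price": "1968 mustang, blue",
    "/camaro": "1969 Camaro, red",
}
print(duplicate_groups(pages))
```

Each group returned is a set of URLs that should probably be consolidated into one canonical URL.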

URL check

Make sure you have clean, short, descriptive URLs. Descriptive means keyword-rich but not keyword-stuffed. You don’t want parameters appended (have a minimal number if you must have any), and you want them to be simple and easy for users (and spiders) to understand.
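A simple script can flag URLs that deserve a closer look. The following sketch is only illustrative: the length and parameter limits are assumptions we picked for the example, not thresholds published by any search engine, and judging whether a URL is descriptive still takes a human.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_URL_LENGTH = 100  # assumed limit, for illustration only
MAX_PARAMS = 2        # assumed limit, for illustration only


def url_issues(url):
    """Return a list of potential problems with a URL."""
    issues = []
    if len(url) > MAX_URL_LENGTH:
        issues.append("long URL")
    query = urlsplit(url).query
    if len(parse_qsl(query, keep_blank_values=True)) > MAX_PARAMS:
        issues.append("too many parameters")
    return issues


print(url_issues("http://example.com/snowboards/burton"))
print(url_issues("http://example.com/p?id=1&cat=2&sort=3"))
```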

Title tag review

Make sure the title tag on each page of the site is unique and descriptive. If you want to include your company brand name in the title, consider putting it at the end of the title tag, not at the beginning, as placement of keywords at the front of the title tag brings ranking benefits. Also check to make sure the title tag is fewer than 70 characters long.
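The uniqueness and length checks lend themselves to automation. Here is a minimal sketch, assuming you have already collected each page's title into a dictionary; the function name and sample titles are hypothetical.

```python
from collections import Counter

MAX_TITLE_LENGTH = 70  # titles should be fewer than 70 characters


def title_issues(titles):
    """titles: dict mapping URL -> title text. Returns URL -> list of problems."""
    counts = Counter(t.strip().lower() for t in titles.values())
    issues = {}
    for url, title in titles.items():
        text = title.strip()
        problems = []
        if not text:
            problems.append("missing")
        elif counts[text.lower()] > 1:
            problems.append("duplicate")
        if len(text) >= MAX_TITLE_LENGTH:
            problems.append("too long")
        if problems:
            issues[url] = problems
    return issues


titles = {
    "/snowboards": "Snowboards | Acme Sports",
    "/snowboards?page=2": "Snowboards | Acme Sports",
    "/boots": "Snowboard Boots | Acme Sports",
}
print(title_issues(titles))
```

Whether a title is descriptive is still a judgment call that a script like this cannot make.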

Content review

Do the main pages of the site have enough content? Do these pages all make use of header tags? A subtler variation of this is making sure the number of pages on the site with little content is not too high compared to the total number of pages on the site.
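To put a number on the last point, you can compute the share of pages that fall below a word-count threshold. This is a rough sketch; the 250-word cutoff is an assumption chosen for the example, and the right value depends on the type of site.

```python
def thin_page_ratio(word_counts, min_words=250):
    """word_counts: dict mapping URL -> number of words of body content.

    Returns the fraction of pages with fewer than min_words words.
    The default threshold is an illustrative assumption.
    """
    if not word_counts:
        return 0.0
    thin = sum(1 for count in word_counts.values() if count < min_words)
    return thin / len(word_counts)


counts = {"/": 900, "/about": 40, "/contact": 120, "/snowboards": 600}
print(thin_page_ratio(counts))
```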

Meta tag review

Check for a meta robots tag on the pages of the site. If you find one, you may have already spotted trouble. An unintentional NoIndex or NoFollow tag (we define these in “Content Delivery and Search Spider Control”) could really mess up your search ranking plans.

Also make sure every page has a unique meta description. If for some reason that is not possible, consider removing the meta description altogether. Although the meta description tags are generally not a significant factor in ranking, they may well be used in duplicate content calculations, and the search engines frequently use them as the description for your web page in the SERPs; therefore, they affect click-through rate.
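Both meta tag checks can be scripted with Python's standard html.parser module. The sketch below pulls the robots directives and the meta description out of a page's HTML; the class and function names are our own, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Collects meta robots directives and the meta description."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name == "robots":
            self.robots.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif name == "description":
            self.description = content.strip()


def audit_meta(html):
    parser = MetaAudit()
    parser.feed(html)
    # "none" is shorthand for noindex plus nofollow.
    blocking = [d for d in parser.robots if d in ("noindex", "nofollow", "none")]
    return {"blocking_directives": blocking, "description": parser.description}


html = '<head><meta name="robots" content="noindex, follow"></head>'
print(audit_meta(html))
```

Run this across all pages and compare the descriptions to find duplicates, in the same way as for title tags.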

Sitemaps file and robots.txt file verification

Use the Google Webmaster Tools “Test robots.txt” verification tool to check your robots.txt file. Also verify that your Sitemaps file is identifying all of your (canonical) pages.
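As a complementary first pass, you can cross-check the two files yourself: any URL listed in your Sitemaps file that robots.txt blocks is worth investigating. The sketch below uses Python's standard urllib.robotparser; its matching rules may differ in detail from those of the search engines, so treat the result as a hint rather than a verdict. The function name and sample data are assumptions.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


robots_txt = "User-agent: *\nDisallow: /private/\n"
sitemap_urls = [
    "http://example.com/",
    "http://example.com/private/report.html",
]
print(blocked_urls(robots_txt, sitemap_urls))
```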

Redirect checks

Use a server header checker such as Live HTTP Headers to check that all the redirects used on the site return a 301 HTTP status code. Check all redirects this way to make sure the right thing is happening. This includes checking that the canonical redirect is properly implemented.

Unfortunately, given the nonintuitive nature of why the 301 redirect is preferred, you should verify that this has been done properly even if you have provided explicit direction to the web developer in advance. Mistakes do get made, and sometimes the CMS or the hosting company makes it difficult to use a 301.
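Once you have recorded the status codes from your header checker, reviewing them can be automated. The following sketch assumes each redirect has been noted as a (URL, status code) pair in the order it was followed; the function name and sample data are hypothetical.

```python
def redirect_issues(hops):
    """hops: list of (url, status_code) pairs in the order they were followed.

    Flags every redirect (3xx response) that is not a 301.
    """
    issues = []
    for url, status in hops:
        if 300 <= status < 400 and status != 301:
            issues.append(f"{url}: {status} instead of 301")
    return issues


hops = [
    ("http://example.com/old-page", 302),
    ("http://www.example.com/old-page", 301),
    ("http://www.example.com/new-page", 200),
]
print(redirect_issues(hops))
```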

Internal linking checks

Look for pages that have excessive links. Google advises 100 per page as a maximum, although it is OK to increase that on more important and heavily linked-to pages.

Make sure the site makes good use of anchor text in its internal links. This is a free opportunity to inform users and search engines what the various pages of your site are about. Don’t abuse it, though. For example, if you have a link to your home page in your global navigation (which you should), call it “Home” instead of picking your juiciest keyword. The search engines view that particular practice as spammy, and it does not engender a good user experience. Furthermore, the anchor text of internal links to the home page is not helpful for rankings anyway. Keep using that usability filter through all of these checks!
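Both internal linking checks start from a list of the links on a page and their anchor text. Here is a minimal sketch using Python's standard html.parser; the class and function names are ours, the 100-link default mirrors the guideline above, and the sample markup is invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def too_many_links(html, limit=100):
    return len(collect_links(html)) > limit


html = '<a href="/">Home</a> <a href="/snowboards">Burton <b>snowboards</b></a>'
print(collect_links(html))
```

Reviewing the collected anchor text by eye is usually enough to spot generic or keyword-stuffed internal links.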


Note

A brief aside about hoarding PageRank: many people have taken this to an extreme and built sites where they refused to link out to other quality websites, because they feared losing visitors and link juice. Ignore this idea! You should link out to quality websites. It is good for users, and it is likely to bring you ranking benefits (through building trust and relevance based on what sites you link to). Just think of your human users and deliver what they are likely to want. It is remarkable how far this will take you.

Avoidance of unnecessary subdomains

The engines may not apply the entirety of a domain’s trust and link juice weight to subdomains. This is largely due to the fact that a subdomain could be under the control of a different party, and therefore in the search engine’s eyes it needs to be separately evaluated. In the great majority of cases, subdomain content can easily go in a subfolder.


Geolocation

If the domain is targeting a specific country, make sure the guidelines for country geotargeting outlined in “Best Practices for Multilanguage/Country Targeting” in Chapter 6 are being followed. If your concern is primarily about ranking for “chicago pizza” because you own a pizza parlor in Chicago, IL, make sure your address is on every page of your site. You should also check your results in Google Local to see whether you have a problem there. Additionally, you will want to register with Google Places, which is discussed in detail in Chapter 9.

External linking

Check the inbound links to the site. Use a backlinking tool such as Open Site Explorer or Majestic SEO to collect data about your links. Look for bad patterns in the anchor text, such as 87% of the links having the critical keyword for the site in them. Unless the critical keyword happens to also be the name of the company, this is a sure sign of trouble. This type of distribution is quite likely the result of link purchasing or other manipulative behavior.

On the flip side, make sure the site’s critical keyword is showing up a fair number of times. A lack of the keyword usage in inbound anchor text is not good either. You need to find a balance.
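Given an export of inbound anchor text from your backlinking tool, the keyword share is easy to compute. The sketch below assumes the anchor texts are available as a plain list; the function name and sample data are illustrative, and deciding what counts as a healthy balance is left to the auditor.

```python
def keyword_anchor_share(anchor_texts, keyword):
    """Return the fraction of anchor texts that contain the keyword."""
    if not anchor_texts:
        return 0.0
    needle = keyword.lower()
    hits = sum(1 for text in anchor_texts if needle in text.lower())
    return hits / len(anchor_texts)


anchors = ["buy snowboards", "Acme", "Acme Sports", "cheap Snowboards", "click here"]
print(keyword_anchor_share(anchors, "snowboards"))
```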

Also look to see that there are links to pages other than the home page. These are often called deep links and they will help drive the ranking of key sections of your site. You should look at the links themselves, too. Visit the linking pages and see whether the links appear to be paid for. They may be overtly labeled as sponsored, or their placement may be such that they are clearly not natural endorsements. Too many of these are another sure sign of trouble.

Lastly, check how the link profile for the site compares to the link profiles of its major competitors. Make sure that there are enough external links to your site, and that there are enough high-quality links in the mix.

Page load time

Is the page load time excessive? Too long a load time may slow down crawling and indexing of the site. However, to be a factor, this really does need to be excessive—certainly longer than five seconds, and perhaps even longer than that.

Image alt tags

Do all the images have relevant, keyword-rich image alt attribute text and filenames? Search engines can’t easily tell what is inside an image, and the best way to provide them with some clues is with the alt attribute and the filename of the image. These can also reinforce the overall context of the page itself.
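Finding images with no alt text is straightforward to script. This is a minimal sketch using Python's standard html.parser; the names and sample markup are our own. It reports images whose alt attribute is absent or empty, which you should then review by hand, since an empty alt can be deliberate for purely decorative images.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Records the src of every image lacking alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src") or "")


def images_missing_alt(html):
    audit = ImageAltAudit()
    audit.feed(html)
    return audit.missing_alt


html = '<img src="1968-mustang-blue.jpg" alt="Blue 1968 Mustang"><img src="img001.jpg">'
print(images_missing_alt(html))
```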

Code quality

Although W3C validation is not something the search engines require, checking the code itself is a good idea. Poor coding can have some undesirable impacts. You can use a tool such as SEO Browser to see how the search engines see the page.

The Importance of Keyword Reviews

Another critical component of an architecture audit is a keyword review. Basically, this involves the following steps.

Step 1: Keyword research

It is vital to get this done as early as possible in any SEO process. Keywords drive on-page SEO, so you want to know which ones to target. You can read about this in more detail in Chapter 5.

Step 2: Site architecture

Coming up with a site architecture can be very tricky. At this stage, you need to look at your keyword research and the existing site (to make as few changes as possible). You can think of this in terms of your site map.

You need a hierarchy that leads site visitors to your money pages (i.e., the pages where conversions are most likely to occur). Obviously, a good site hierarchy allows the parents of your money pages to rank for relevant keywords (which are likely to be shorter tail).

Most products have an obvious hierarchy they fit into, but when you start talking in terms of anything that naturally has multiple hierarchies, it gets incredibly tricky. The trickiest hierarchies, in our opinion, occur when there is a location involved. In London alone there are London boroughs, metropolitan boroughs, Tube stations, and postcodes. London even has a city (“The City of London”) within it.

In an ideal world, you will end up with a single hierarchy that is natural to your users and gives the closest mapping to your keywords. Whenever there are multiple ways in which people search for the same product, establishing a hierarchy becomes challenging.

Step 3: Keyword mapping

Once you have a list of keywords and a good sense of the overall architecture, start mapping the major relevant keywords to URLs (not the other way around). When you do this, it is a very easy job to spot pages that you were considering creating that aren’t targeting a keyword (perhaps you might skip creating these), and, more importantly, keywords that don’t have a page.

It is worth pointing out that between step 2 and step 3 you will remove any wasted pages.

If this stage is causing you problems, revisit step 2. Your site architecture should lead naturally to a mapping that is easy to use and includes your keywords.
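The mapping check described in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `audit_keyword_map`, the keyword list, and the URLs are all hypothetical, and in practice the data would come from your own keyword research and planned site map.

```python
def audit_keyword_map(keyword_to_url, planned_urls, keywords):
    """Return (keywords without a page, planned pages without a keyword)."""
    # Keywords from the research that have not been mapped to any URL.
    unmapped_keywords = sorted(k for k in keywords if k not in keyword_to_url)
    # Planned pages that no keyword has been mapped to.
    targeted_urls = set(keyword_to_url.values())
    untargeted_pages = sorted(u for u in planned_urls if u not in targeted_urls)
    return unmapped_keywords, untargeted_pages


# Hypothetical data for illustration only.
keywords = ["snowboards", "snowboard boots", "snowboard bindings"]
planned_urls = ["/snowboards/", "/snowboards/boots/", "/about-our-warehouse/"]
keyword_to_url = {
    "snowboards": "/snowboards/",
    "snowboard boots": "/snowboards/boots/",
}

missing_pages, pages_without_keywords = audit_keyword_map(
    keyword_to_url, planned_urls, keywords
)
print(missing_pages)           # ['snowboard bindings']
print(pages_without_keywords)  # ['/about-our-warehouse/']
```

Here the keyword without a page is the more important finding, and the page without a keyword is a candidate for the wasted pages mentioned above.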

Step 4: Site review

Once you are armed with your keyword mapping, the rest of the site review becomes a lot easier. Now when you are looking at title tags and headings, you can refer back to your keyword mapping and see not only whether the heading is in an <h1> tag, but also whether it includes the right keywords.
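As a rough sketch of how part of this review could be automated, the following Python uses the standard library's `html.parser` module to pull out the title and any <h1> headings, then checks them against the keyword mapped to that page. The class and function names and the sample markup are hypothetical, and the simple substring match is an assumption; a real review would also want human judgment.

```python
from html.parser import HTMLParser


class TitleAndH1Parser(HTMLParser):
    """Collect the text inside <title> and every <h1> element."""

    def __init__(self):
        super().__init__()
        self._current = None  # "title" or "h1" while inside that element
        self.title = ""
        self.h1s = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "h1":
            self._current = "h1"
            self.h1s.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1s[-1] += data


def review_page(html, keyword):
    """Report whether the mapped keyword appears in the title and an <h1>."""
    parser = TitleAndH1Parser()
    parser.feed(html)
    kw = keyword.lower()
    return {
        "keyword_in_title": kw in parser.title.lower(),
        "has_h1": bool(parser.h1s),
        "keyword_in_h1": any(kw in h.lower() for h in parser.h1s),
    }


# Hypothetical page: the heading uses <h2>, so the review flags a missing <h1>.
sample = (
    "<html><head><title>Snowboard Boots | Example Store</title></head>"
    "<body><h2>Snowboard Boots</h2><p>Our range.</p></body></html>"
)
report = review_page(sample, "snowboard boots")
print(report)
```

Running the check for every URL in the keyword mapping gives a quick list of pages whose headings need attention.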

Keyword Cannibalization

Keyword cannibalization typically starts when a website’s information architecture calls for the targeting of a single term or phrase on multiple pages of the site. This is often done unintentionally, but it can result in several or even dozens of pages that have the same keyword target in the title and header tags. Figure 4-5 shows the problem.

Figure 4-5. Example of keyword cannibalization

Search engines will spider the pages on your site and see 4 (or 40) different pages, all seemingly relevant to one particular keyword (in the example in Figure 4-5, the keyword is snowboards). For clarity’s sake, Google doesn’t interpret this as meaning that your site as a whole is more relevant to snowboards or should rank higher than the competition. Instead, it forces Google to choose among the many versions of the page and pick the one it feels best fits the query. When this happens, you lose out on a number of rank-boosting features:

Internal anchor text

Since you’re pointing to so many different pages with the same subject, you can’t concentrate the value of internal anchor text on one target.

External links

If four sites link to one of your pages on snowboards, three sites link to another of your snowboard pages, and six sites link to yet another snowboard page, you’ve split up your external link value among three pages, rather than consolidating it into one.

Content quality

After three or four pages of writing about the same primary topic, the value of your content is going to suffer. You want the best possible single page to attract links and referrals, not a dozen bland, repetitive pages.

Conversion rate

If one page is converting better than the others, it is a waste to have multiple lower-converting versions targeting the same traffic. If you want to do conversion tracking, use a multiple-delivery testing system (either A/B or multivariate).

So, what’s the solution? Take a look at Figure 4-6.

Figure 4-6. Solution to keyword cannibalization

The difference in this example is that instead of every page targeting the single term snowboards, the pages are focused on unique, valuable variations and all of them link back to an original, canonical source for the single term. Google can now easily identify the most relevant page for each of these queries. This isn’t just valuable to the search engines; it also represents a far better user experience and overall information architecture.

What should you do if you’ve already got a case of keyword cannibalization? Employ 301s liberally to eliminate pages competing with each other, or figure out how to differentiate them. Start by identifying all the pages in the architecture with this issue and determine the best page to point them to, and then use a 301 from each of the problem pages to the page you wish to retain. This ensures not only that visitors arrive at the right page, but also that the link equity and relevance built up over time are directing the engines to the most relevant and highest-ranking-potential page for the query.
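The identification step can be illustrated with a small Python sketch. Everything here is hypothetical: the page inventory, the field names `target_keyword` and `external_links`, and above all the rule for choosing which page to keep. Picking the page with the most external links is just one assumption made for the example; you may prefer to keep the best-converting or best-written page instead. The output is only a plan of which URL should 301 to which, not server configuration.

```python
from collections import defaultdict


def find_cannibalized(pages):
    """Group URLs by the keyword they target; keep keywords with 2+ URLs."""
    by_keyword = defaultdict(list)
    for url, info in pages.items():
        by_keyword[info["target_keyword"]].append(url)
    return {kw: urls for kw, urls in by_keyword.items() if len(urls) > 1}


def plan_redirects(pages, cannibalized):
    """Map each competing URL to the URL to retain for its keyword.

    Assumption: the page with the most external links is the one to keep.
    """
    redirects = {}
    for urls in cannibalized.values():
        keep = max(urls, key=lambda u: pages[u]["external_links"])
        for url in urls:
            if url != keep:
                redirects[url] = keep
    return redirects


# Hypothetical inventory: three pages compete for "snowboards".
pages = {
    "/snowboards/": {"target_keyword": "snowboards", "external_links": 6},
    "/shop/snowboards.html": {"target_keyword": "snowboards", "external_links": 4},
    "/gear/snowboards/": {"target_keyword": "snowboards", "external_links": 3},
    "/snowboards/boots/": {"target_keyword": "snowboard boots", "external_links": 2},
}

cannibalized = find_cannibalized(pages)
redirects = plan_redirects(pages, cannibalized)
for source, target in redirects.items():
    print(f"301: {source} -> {target}")
```

With this data, the two weaker snowboard pages are redirected to /snowboards/, which consolidates the links that were previously split among three pages.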

Example: Fixing an Internal Linking Problem

Enterprise sites range between 10,000 and 10 million pages in size. For many of these types of sites, an inaccurate distribution of internal link juice is a significant problem. Figure 4-7 shows how this can happen.

Figure 4-7. Link juice distribution on a very large site

Figure 4-7 is an illustration of the link juice distribution issue. Imagine that each of the tiny pages represents between 5,000 and 100,000 pages in an enterprise site. Some areas, such as blogs, articles, tools, popular news stories, and so on, might be receiving more than their fair share of internal link attention. Other areas—often business-centric and sales-centric content—tend to fall by the wayside. How do you fix this problem? Take a look at Figure 4-8.

The solution is simple, at least in principle: have the link-rich pages spread the wealth to their link-bereft brethren. As easy as this may sound, in execution it can be incredibly complex. Inside the architecture of a site with several hundred thousand or a million pages, it can be nearly impossible to identify link-rich and link-poor pages, never mind adding code that helps to distribute link juice equitably.

The answer, sadly, is labor-intensive from a programming standpoint. Enterprise site owners need to develop systems to track inbound links and/or rankings and build bridges (or, to be more consistent with Figure 4-8, spouts) that funnel juice between the link-rich and link-poor.
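To make the idea of such a tracking system more concrete, here is a minimal Python sketch. It assumes you already have a crawl of your own site in the form of a dictionary mapping each page to the pages it links to; the function names, the sample URLs, and the `poor_max` and `rich_min` thresholds are all hypothetical. It counts only internal links, whereas a real system would also weigh external inbound links and rankings, as noted above.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count internal inbound links per page from {source: [targets]}."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


def split_rich_and_poor(counts, poor_max=1, rich_min=3):
    """Return (link-rich pages, link-poor pages) using assumed thresholds."""
    rich = sorted(p for p, n in counts.items() if n >= rich_min)
    poor = sorted(p for p, n in counts.items() if n <= poor_max)
    return rich, poor


# Hypothetical crawl of a tiny site.
link_graph = {
    "/": ["/blog/", "/products/"],
    "/blog/": ["/blog/post-1/", "/blog/post-2/", "/"],
    "/blog/post-1/": ["/blog/", "/blog/post-2/", "/"],
    "/blog/post-2/": ["/blog/", "/blog/post-1/", "/"],
    "/products/": ["/products/widget/", "/"],
    "/products/widget/": ["/products/"],
}

counts = inbound_link_counts(link_graph)
rich, poor = split_rich_and_poor(counts)

# Suggest a cross-link from each link-rich page to each link-poor page
# it does not already link to.
suggested = [(r, p) for r in rich for p in poor if p not in link_graph.get(r, [])]
print(rich)       # ['/', '/blog/']
print(poor)       # ['/products/widget/']
print(suggested)
```

On a site with millions of pages the same counting would have to run over crawl data in a database rather than in memory, but the logic of finding where to add cross-links is the same.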

An alternative is simply to build a very flat site architecture that relies on relevance or semantic analysis. This strategy is more in line with the search engines’ guidelines (though slightly less perfect) and is certainly far less labor-intensive.

Interestingly, the massive increase in weight given to domain authority over the past two to three years appears to be an attempt by the search engines to overrule potentially poor internal link structures (as designing websites for PageRank flow doesn’t always serve users particularly well), and to reward sites that have great authority, trust, and high-quality inbound links.

Figure 4-8. Using cross-links to push link juice where you want it

Server and Hosting Issues

Thankfully, only a handful of server or web hosting dilemmas affect the practice of search engine optimization. However, when overlooked, they can spiral into massive problems, and so are worthy of review. The following are some server and hosting issues that can negatively impact search engine rankings:

Server timeouts

If a search engine makes a page request that isn’t served within the bot’s time limit (or that produces a server timeout response), your pages may not make it into the index at all, and will almost certainly rank very poorly (as no indexable text content has been found).

Slow response times

Although this is not as damaging as server timeouts, it still presents a potential issue. Not only will crawlers be less likely to wait for your pages to load, but surfers and potential linkers may choose to visit and link to other resources because accessing your site is problematic.

Shared IP addresses

Basic concerns include speed, the potential for having spammy or untrusted neighbors sharing your IP address, and potential concerns about receiving the full benefit of links to your IP address.

Blocked IP addresses

As search engines crawl the Web, they frequently find entire blocks of IP addresses filled with nothing but egregious web spam. Rather than blocking each individual site, engines do occasionally take the added measure of blocking an IP address or even an IP range. If you’re concerned, search for your IP address at Bing using the ip:address query.

Bot detection and handling

Some system administrators go a bit overboard with protection and restrict file access for any single visitor making more than a certain number of requests in a given time frame. This can be disastrous for search engine traffic, as it will constantly limit the spiders’ crawling ability.

Bandwidth and transfer limitations

Many servers have set limitations on the amount of traffic that can run through to the site. This can be potentially disastrous when content on your site becomes very popular and your host shuts off access. Not only are potential linkers prevented from seeing (and thus linking to) your work, but search engines are also cut off from spidering.

Server geography

This isn’t necessarily a problem, but it is good to be aware that search engines do use the location of the web server when determining where a site’s content is relevant from a local search perspective. Since local search is a major part of many sites’ campaigns and it is estimated that close to 40% of all queries have some local search intent, it is very wise to host in the country (it is not necessary to get more granular) where your content is most relevant.
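The bot detection and handling issue in the list above is easier to see with a small example. The following Python sketch shows a sliding-window request limit of the kind a system administrator might apply, with an exemption hook for crawlers you trust. The class name, the limits, and the IP addresses (taken from ranges reserved for documentation) are hypothetical, and how you decide that a visitor really is a search engine spider is deliberately left to the `is_trusted_crawler` function you supply, since a user-agent string alone can be faked.

```python
from collections import defaultdict, deque


class RequestThrottle:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, is_trusted_crawler):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.is_trusted_crawler = is_trusted_crawler
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def allow(self, client_ip, user_agent, now):
        # Trusted crawlers bypass the limit so that crawling is not cut short.
        if self.is_trusted_crawler(client_ip, user_agent):
            return True
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical allowlist of crawler addresses you have verified yourself.
VERIFIED_CRAWLER_IPS = {"192.0.2.10"}

throttle = RequestThrottle(
    max_requests=3,
    window_seconds=60,
    is_trusted_crawler=lambda ip, ua: ip in VERIFIED_CRAWLER_IPS,
)

# Five requests in five seconds from an ordinary visitor and from a crawler.
visitor = [throttle.allow("198.51.100.7", "Mozilla/5.0", t) for t in range(5)]
crawler = [throttle.allow("192.0.2.10", "ExampleBot/1.0", t) for t in range(5)]
print(visitor)  # [True, True, True, False, False]
print(crawler)  # [True, True, True, True, True]
```

Without the exemption, the crawler in this example would be blocked after its third request, which is exactly the situation that limits the spiders' crawling ability.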
