Why aren’t search engines indexing my content?


There are a number of reasons your content may not appear in search
engine results. Note, however, that a search engine’s index may
contain pages that it doesn’t display on its results page.

How to tell if your content is actually indexed

It can be difficult to tell whether your content is indexed. Try the following methods:

  • Search for all the documents from your site and see how many are listed:
    • Google: enter site:example.com (where example.com is your domain; there must not be any space after the colon)
    • Bing: enter site:example.com
    • Yahoo: enter site:example.com (or use the advanced search form)
  • Search for a specific document by picking a unique sentence of eight to twelve words from it and searching for that sentence in quotes. For example, to find this document, you might search for “number of reasons your content may not appear in search engine results”
  • In addition, search for keywords using the inurl: and intitle: operators. For example, keyword with another keyword inurl:example.com brings up only the indexed pages from the specified domain.
  • Log into webmaster tools to see stats from the search engine itself about how many pages are indexed from the site:
    • Google Webmaster Tools: information is available under “Health” » “Index Status”. If you have submitted site maps, you can also see how many documents in each site map file have been indexed.
    • Bing Webmaster Tools

In some cases, one of these methods may suggest that documents are
not indexed even though another method finds them in the index. For
example, webmaster tools may report that few documents are indexed
even when you can search for sentences from those documents and find
them on the search engine. In such a case, the documents are
actually indexed.
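
The search queries described above can be assembled programmatically
if you need to check many domains or documents. The following is a
minimal sketch: the function names are hypothetical, and the base
search URL passed to search_url is an assumption you should replace
with whatever your search engine actually uses.

```python
from urllib.parse import quote_plus


def site_query(domain: str) -> str:
    """Query listing the documents indexed for a domain."""
    # There must be no space between the colon and the domain.
    return f"site:{domain.strip()}"


def exact_sentence_query(sentence: str) -> str:
    """Wrap a unique sentence in double quotes for an exact-phrase search."""
    # Drop embedded quotes and collapse runs of whitespace.
    cleaned = " ".join(sentence.replace('"', " ").split())
    return f'"{cleaned}"'


def keyword_query(keywords: list[str], domain: str) -> str:
    """Keywords restricted to one domain with the inurl: operator."""
    return " ".join(keywords + [f"inurl:{domain.strip()}"])


def search_url(base: str, query: str) -> str:
    """Build a results URL; `base` is an assumed search endpoint."""
    return f"{base}?q={quote_plus(query)}"
```

For example, site_query("example.com") produces site:example.com,
which you can paste into the search box or pass to search_url.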

How content becomes indexed

Before search engines index content, they must find it using a
web crawler. You should check your webserver’s logs to see whether
search engines’ crawlers (identified by their user agent, e.g.
Googlebot, Bingbot/MSNbot) are visiting your site.
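
One way to check the logs is to count requests whose user agent
mentions a known crawler. The sketch below assumes the common
"combined" access log format, in which the user agent is the last
double-quoted field on each line; the token list is illustrative, not
exhaustive. Keep in mind that user agents can be spoofed, so treat
the counts as an indication rather than proof.

```python
import re
from collections import Counter

# Substrings that commonly appear in crawler user-agent strings.
# Illustrative assumption; extend it for the engines you care about.
CRAWLER_TOKENS = ("Googlebot", "bingbot", "msnbot")

# In the combined log format the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(log_lines):
    """Count requests per crawler token found in the user-agent field."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits
```

To run it against a real log, pass an open file object, for example
count_crawler_hits(open("access.log")), where access.log stands for
your server's actual log path.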

Larger search engines like Google and Bing typically crawl sites
frequently, but the crawler may not know about a new site. You can
notify search engines of the existence of your site by
registering as its webmaster (Google Webmaster Tools, Bing Webmaster
Tools) or, if the search engine does not provide this facility,
submitting a link to its crawlers (e.g. Yahoo).

How long has your site/content been online?

Search engines may index content very quickly after it has been
found; however, these updates are occasionally delayed. Smaller
search engines can also be much less responsive and take weeks to
index new content.

If your content has only been online for several days and does not have
any links from other sites (or its links come from sites which
crawlers do not visit frequently), it has probably not been indexed yet.
If your site hasn’t been live for more than a few months, the search engines
may not trust it enough to index much content from it yet.

Has the content been excluded by the webmaster?

This step is especially important if you are taking over a site from
someone else and there is an issue with a specific page or directory:
check for robots.txt and META robots exclusions and remove them if
you want crawlers to index the content being excluded.
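
Both kinds of exclusion can be checked with Python's standard
library. This is a minimal sketch rather than a complete audit: the
helper names are hypothetical, it expects you to supply the
robots.txt text and the page HTML yourself, and it only looks at
META robots tags (not, for example, HTTP headers).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots_txt(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class MetaRobotsParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def blocks_indexing(html: str) -> bool:
    """Return True if a META robots tag contains noindex (or none)."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return any(d in ("noindex", "none") for d in parser.directives)
```

A page that is disallowed in robots.txt, or whose HTML makes
blocks_indexing return True, is being excluded deliberately and will
stay out of the index until the exclusion is removed.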

Is there a technical issue preventing your content from being indexed?

If you have an established site but specific content is not being
indexed (there are no web crawler hits on the URLs where the content
resides), the webmaster tools provided by Google and Bing may provide
useful diagnostic information.

Google’s Crawl Errors documentation provides extensive background on
common problems for web crawlers which prevent content from being
indexed and, if you use Google Webmaster Tools, you will receive an
alert if any of these issues are detected on your site.

Correct errors and misconfigurations as quickly as possible to ensure
that all of your site’s content is indexed.
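
Your own access logs can complement the webmaster tools here. The
sketch below reports which of the URLs you expect to be indexed have
never been requested by a crawler, and which crawler requests ended
in an error response. It assumes the "combined" log format and an
illustrative list of crawler tokens; the function name and the
expected_paths argument are hypothetical.

```python
import re

# Request path, status code and trailing user agent of a log entry.
LOG_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$'
)

# Illustrative, not exhaustive.
CRAWLER_TOKENS = ("googlebot", "bingbot", "msnbot")


def crawl_report(log_lines, expected_paths):
    """Return (never_crawled, errors).

    never_crawled: expected paths with no crawler request in the log.
    errors: (path, status) pairs where a crawler got a 4xx or 5xx response.
    """
    crawled = set()
    errors = []
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        agent = match["agent"].lower()
        if not any(token in agent for token in CRAWLER_TOKENS):
            continue  # not a crawler request
        path, status = match["path"], int(match["status"])
        crawled.add(path)
        if status >= 400:
            errors.append((path, status))
    never_crawled = [p for p in expected_paths if p not in crawled]
    return never_crawled, errors
```

Paths in never_crawled point to a discovery problem (no links, or an
exclusion), while entries in errors point to a server or
configuration problem on pages the crawler did find.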

Is the content low quality?

Search engines don’t index most pages they crawl. They only index the highest quality content. Search engines will not index content if:

  • It is spam, gibberish, or nonsense.
  • It is found elsewhere. When search engines find duplicate content, they choose only one of the duplicates to index. Usually that is the original that has more reputation and links.
  • It is thin. A page needs more than a couple of lines of original text, and preferably much more. Automatically created pages with little content, such as a page for each of your users, are unlikely to get indexed.
  • It doesn’t have enough reputation or links. A page may be buried too deep in your site to rank. Any page without external links and more than a few clicks from the home page is unlikely to get indexed.
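
The last point, how many clicks a page is from the home page, can be
measured with a breadth-first search over your site's internal
links. This is a rough sketch: the link graph is a hypothetical
dictionary you would build by crawling your own site, and the
default limit of three clicks is only an assumed reading of "a few
clicks", not a threshold published by any search engine.

```python
from collections import deque


def click_depths(links, home):
    """Fewest clicks needed to reach each page from the home page.

    `links` maps a page to the list of pages it links to.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(links, home, max_clicks=3):
    """Pages deeper than max_clicks, plus pages unreachable from home."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(p for p, d in depths.items() if d > max_clicks)
    unreachable = sorted(all_pages - set(depths))
    return too_deep, unreachable
```

Pages reported by deep_pages are candidates for better internal
linking, for example from a category page or a site map page.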

Is some of your content indexed, but not all?

If your site has hundreds of pages, Google will almost never choose to index every single page. If your site has tens of thousands of pages, it is very common for Google to choose to index only a small portion of those pages.

Google chooses the number of pages to index from a site based on the site’s overall reputation and the quality of the content. Google typically indexes a larger percentage of a site over time as the site’s reputation grows.