
How To Remove URLs From Google Search (5 Methods)

There are many ways to remove URLs from Google, but there is no “one size fits all” approach. It all depends on the circumstances.

This is an important point to understand. Not only does using the wrong method sometimes result in pages not being removed from the index as intended, but it can also have a negative impact on SEO.

To help you quickly decide which method of removal is best for you, we've created a flowchart so you can jump to the appropriate section of the article.

Flowchart to help you decide how to remove your pages from Google.

In this post you will learn:

  • How to check if a URL is indexed
  • Five ways to remove URLs from Google
  • How to prioritize removals
  • Common removal mistakes to avoid
  • How to remove images from Google
  • What to do when the content isn't on a website you own

How to check if a URL is indexed

I often see SEOs using the "site:" search operator to check whether content is indexed in Google (e.g., site:ahrefs.com). "site:" queries can be useful for identifying pages or sections of a website that could be problematic if they show up in search results. Be careful, however: these are not normal queries and won't actually tell you whether a page is indexed. They can show pages that are known to Google, but that doesn't mean those pages are eligible to appear in normal search results without the "site:" operator.

For example, the "site:" search may still show pages that redirect or are canonicalized to another URL. If you query a specific domain, Google may show a page from that domain with the content, title, and description of a different domain. Take moz.com, which used to be seomoz.org. All regular user searches that land on pages of moz.com show moz.com in the SERPs, while site:seomoz.org shows seomoz.org in the SERPs, as shown below.

The reason this distinction matters is that it can trick SEOs into making mistakes, such as actively blocking or removing URLs for the old domain from the index, which prevents the consolidation of signals like PageRank. I've seen many domain migrations where people believed they had made a mistake because these pages still showed up for site:old-domain.com, and they ended up actively damaging their website while trying to "fix" the problem.

The better way to check indexing is to use the Index Coverage report in Google Search Console, or the URL Inspection tool for a single URL. These tools tell you whether a page is indexed and provide additional information on how Google is treating it. If you don't have access to Search Console, simply search Google for the full URL of the page.

Screenshot of the URL Inspection tool in Google Search Console.

In Ahrefs, if you can find the page in our "Top pages" report or ranking for organic keywords, it usually means we saw it ranking in normal searches, which is a good indication that the page was indexed. Note that the page was indexed when we saw it, but that may have changed. Check the date we last saw the page for a search query.

If there is a problem with a particular URL and it needs to be removed from the index, follow the flowchart at the beginning of the article to find the correct removal option, then skip to the appropriate section below.

Removal option 1: Delete the content

If you delete the page and it returns either a 404 (not found) or a 410 (gone) status code, it will be removed from the index shortly after it is crawled again. Until then, the page may still appear in search results. And even if the page itself is no longer available, a cached version may remain temporarily accessible.
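How you return these status codes depends on your server. As a minimal sketch, an Apache rule along these lines answers a deleted URL with a 410; the path is a placeholder, and the equivalent on other servers will look different:

# Hypothetical Apache (.htaccess) sketch: respond to a removed URL with 410 Gone.
# "/old-page/" is a placeholder path for the page you deleted.
Redirect gone /old-page/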

If you might need another option:

  • I need immediate removal. See the URL Removal Tool section.
  • I need to consolidate signals like links. See the section on canonicalization.
  • I need the page available to users. Check whether the Noindex or Restricted access sections are appropriate for your situation.

Removal option 2: Noindex

A noindex meta robots tag or an x-robots header response instructs search engines to remove a page from the index. The meta robots tag works for HTML pages, while the x-robots response works for pages as well as additional file types such as PDFs. For these tags to be seen, a search engine needs to be able to crawl the pages, so make sure they are not blocked in robots.txt. Also note that removing pages from the index can prevent the consolidation of links and other signals.

Example of a meta robots noindex:

<meta name="robots" content="noindex">

Example of an x-robots noindex tag in the HTTP header response:

HTTP/1.1 200 OK
X-Robots-Tag: noindex
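If files such as PDFs are served by Apache, a configuration sketch along these lines (assuming mod_headers is enabled) would attach the noindex header to every PDF; adapt it to your own server:

# Hypothetical Apache sketch: send X-Robots-Tag: noindex for all PDF files.
# Requires mod_headers; place it in the server config or an .htaccess file.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>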

If you might need another option:

  • I don't want users to access these pages. See the section on restricted access.
  • I need to consolidate signals like links. See the section on canonicalization.

Removal option 3: Restrict access

If you want the page to be accessible to some users but not search engines, then one of these three options is likely to work for you:

  1. a kind of registration system (login);
  2. HTTP authentication (requiring a password to access);
  3. IP whitelisting (which only allows certain IP addresses to access the pages).

This type of setup is best for things like internal networks, member-only content, or staging, test, or development sites. It allows a group of users to access the pages, but search engines cannot access them and will not index them.
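As a rough sketch, HTTP authentication on Apache could look like this; the realm name and the AuthUserFile path are placeholders, and the password file would be created with the htpasswd utility:

# Hypothetical Apache (.htaccess) sketch: require a password to reach this area.
# "/var/www/.htpasswd" is a placeholder path to a file created with htpasswd.
AuthType Basic
AuthName "Restricted area"
AuthUserFile /var/www/.htpasswd
Require valid-user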

If you might need another option:

  • I need immediate removal. See the URL removal tool section. If the content you're trying to hide has already been cached and you need to stop users from seeing it, you may want the more immediate removal that the tool provides.

Removal option 4: URL removal tool

The name of this Google tool is slightly misleading, as what it actually does is temporarily hide the content. Google will still see and crawl the content, but the pages won't be shown to users. This temporary effect lasts six months on Google; Bing has a similar tool that works for three months. These tools should be used only in the most extreme cases, for things like security issues, data leaks, personally identifiable information (PII), etc. For Google, use the Removals tool; for Bing, see how to block URLs.

You need to use another method along with the removal tool to actually remove the pages for the long term (noindex or delete) or to prevent users from accessing the content if they still have the links (delete or restrict access). The tool simply gives you a faster way to hide the pages while the removal has time to be processed. The request can take up to a day to process.

Removal option 5: Canonicalization

If you have multiple versions of a page and want to consolidate signals such as links into a single version, the best practice is some form of canonicalization. The main purpose of this is to prevent duplicate content while simultaneously consolidating multiple versions of a page into a single indexed URL.

You have several options for canonicalization:

  • Canonical tag. This specifies another URL as the canonical version, i.e. the version that should be shown (see the examples after this list). It works well when the pages are duplicates or very similar. If the pages are too different, the canonical may be ignored, as it is a hint rather than a directive.
  • Redirects. A redirect takes a user and a search bot from one page to another. A 301 is the redirect most commonly used by SEOs; it tells search engines that the destination URL is the one that should appear in search results and where signals should be consolidated. A 302, or temporary redirect, tells search engines that the original URL should stay in the index and that signals should be consolidated there.
  • URL parameter handling. A parameter is appended to the end of a URL and usually includes a question mark, like ahrefs.com?this=parameter. This Google tool lets you tell Google how to handle URLs with specific parameters, for example whether a parameter changes the page content or is only used for tracking.
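To illustrate the first two options, here is roughly what a canonical tag and a 301 redirect look like; the URLs and paths are placeholders.

Example of a canonical tag in the <head> of a duplicate page, pointing at the preferred URL:

<link rel="canonical" href="https://ahrefs.com/blog/">

Example of a 301 redirect in an Apache .htaccess file:

Redirect 301 /old-page/ https://ahrefs.com/new-page/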

How to prioritize removals

If you need to remove multiple pages from the Google index, they should be prioritized accordingly.

Highest priority: These pages are usually security-related or involve confidential data. That includes content containing personally identifiable information (PII), customer data, or copyrighted material.

Medium priority: This is usually content intended for a specific group of users: company intranets or employee portals, members-only content, and staging, test, or development environments.

Low priority: These pages usually involve duplicate content of some kind. Examples include pages served from multiple URLs, URLs with parameters, and, again, staging, test, or development environments.

Common mistakes to avoid when removing URLs

I want to cover some of the URL removal errors that I see a lot and what happens in each scenario so that people can understand why they don't work.

Noindex in robots.txt

While Google unofficially supported noindex in robots.txt for a time, it was never an official standard, and support has now been formally removed. Many of the sites that relied on it were doing so incorrectly and were harming themselves.
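For reference, the unofficial directive looked roughly like this. Don't use it: Google no longer processes it, and it is shown here only as an example of what not to do (the path is a placeholder):

# Formerly unofficially supported by Google, now ignored.
User-agent: *
Noindex: /page-to-remove/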

Block crawling in robots.txt

Crawling is not the same as indexing. Even if Google is blocked from crawling a page, the page can still be indexed if there are internal or external links pointing to it. Google won't know what's on the page, since it can't crawl it, but it knows the page exists and will even write a title to show in search results based on signals such as the anchor text of links to the page.
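For example, a disallow rule like this sketch (placeholder path) only blocks crawling; the URL can still end up in the index if other pages link to it:

# Hypothetical robots.txt rule: blocks crawling of the path, but does not prevent indexing.
User-agent: *
Disallow: /private-page/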

Nofollow

This is often confused with noindex, and some people use it at the page level expecting the page not to be indexed. Nofollow is a hint, and while it originally prevented the links on a page, and individual links with the nofollow attribute, from being crawled, that is no longer the case. Google can now crawl these links if it chooses to. Nofollow was also used on individual links to try to keep Google from crawling specific pages, and for PageRank sculpting. That no longer works either, since nofollow is a hint. If the page had another link pointing to it at some point, Google can still discover it via this alternate crawl path.
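For clarity, these are the two forms of nofollow referred to above: a page-level meta tag and a link-level attribute (the link URL is a placeholder). Neither keeps the page itself out of the index:

<!-- Page-level nofollow: a hint not to crawl the links on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level nofollow: a hint that applies to this single link -->
<a href="https://example.com/some-page/" rel="nofollow">Example link</a>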

Note that you can find nofollowed pages in bulk using this filter in the Page Explorer in the Ahrefs Site Audit tool.

Since it rarely makes sense to nofollow every link on a page, the number of results should be zero or close to it. If there are matching results, I'd urge you to check whether the nofollow directive was added by mistake instead of noindex and, if so, choose a more appropriate removal method.

You can also find individual links that are marked with nofollow using this filter in the Link Explorer.

Noindex and canonical to another URL

These signals contradict each other. Noindex says the page should be removed from the index, while the canonical says another page is the version that should be indexed. This can actually work for consolidation, as Google typically ignores the noindex and uses the canonical as the main signal. However, that behavior isn't guaranteed. An algorithm is involved, and there is a risk that the noindex tag is the signal that gets counted. If that happens, the pages won't be consolidated properly.
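For illustration, this is the kind of conflicting combination described above; the canonical URL is a placeholder, and the point is to avoid sending both signals from the same page:

<!-- noindex asks for the page to be removed from the index... -->
<meta name="robots" content="noindex">
<!-- ...while the canonical points at a different URL as the version to index -->
<link rel="canonical" href="https://ahrefs.com/preferred-page/">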

Note that you can find noindexed pages with non-self-referencing canonicals using the filters below in the Page Explorer in the Site Audit tool:

Noindex, wait for Google to crawl, then block crawling

This is usually done in several ways:

  1. Pages are already blocked from crawling but get indexed anyway. People add a noindex tag, lift the block so Google can crawl the pages and see the noindex, and then block them from crawling again.
  2. People add noindex tags to the pages they want removed and, once Google has crawled and processed the noindex, block the pages from crawling.

Either way, the end result is that the pages are blocked from crawling. If you remember, we talked earlier about how crawling isn't the same as indexing. Even if these pages are blocked, they can still end up in the index.

What if it's your content but not on a website that is yours?

If you own content that is being used on another website, you may be able to make a claim under the Digital Millennium Copyright Act (DMCA). You can use Google's copyright removal tool to file what is known as a DMCA takedown, which requests the removal of copyrighted material.

What if there is content about you but not on a website that you own?

If you live in the EU, you can have content that contains personal information about you removed, thanks to a court ruling on the right to be forgotten. You can request removal of personal information using the EU Privacy Removal form.

How to remove images from Google

The easiest way to remove images from Google is with robots.txt. While unofficial support for removing pages via robots.txt has been dropped (as mentioned earlier), disallowing the crawling of images is the correct way to remove them.

For a single image:

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

For all images:

User-agent: Googlebot-Image
Disallow: /

Final thoughts

How you remove URLs is pretty situational. We've covered several options, but if you're still unsure which one is right for you, take another look at the flowchart at the beginning.

You can also go through the legal troubleshooter provided by Google to remove content.

Do you have any questions? Let me know on Twitter.
