Everybody wants search engine attention for their public content. But sometimes something improper leaks to the public by mistake, and then we try to reduce the harm caused by the incident.
That happened to one of our clients, so I got a request to remove a particular page from Google's index and cache. Of course, it wouldn't help much for content that had been hosted for about a year and a half: if people found it interesting, there is practically no way to remove it from the Internet. But most people only look at Google results, so the decision was to target Google.
It seemed simple and obvious. The possible methods are:
- filtering the page with robots.txt
- robots meta tags in the page HTML
- a 404 HTTP response
Google's support pages describe a detailed procedure for removing individual pages from the index and from the cache.
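The procedure relies on the page itself carrying one of those removal signals. As a quick sanity check before filing a request, here is a minimal sketch (Python standard library only; the helper names are my own, not part of any Google tooling) that detects a robots noindex meta tag in a page's HTML:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of any <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                # Normalize for matching; content may be e.g. "NOINDEX, NOFOLLOW"
                self.directives.append((d.get("content") or "").lower())

def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in c for c in finder.directives)
```

Run this over the page you are about to submit: if it returns False and the page is neither blocked by robots.txt nor returning 404, the removal request is likely to be rejected.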
There are obvious assumptions that are easy to forget. Your site needs either a valid robots.txt file or no robots.txt at all. If the Google crawler suspects that something is wrong with your robots.txt, it won't even go further to update your site's index or cache. After some tweaking the case seemed closed: I registered an urgent removal request in Webmaster Tools and waited for the crawlers.
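It's also worth confirming that your robots.txt actually blocks the URL the way Googlebot would interpret it. A minimal sketch using Python's standard `urllib.robotparser` (the rule shown is an illustrative example, not the client's real file):

```python
import urllib.robotparser

# Example robots.txt content blocking one leaked page for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/leaked-page.html
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def is_blocked(url):
    """True if Googlebot is disallowed from fetching this URL."""
    return not rp.can_fetch("Googlebot", url)
```

A malformed file can silently fail to block anything, which is exactly the "something is wrong with your robots.txt" situation above, so checking the rule programmatically is cheap insurance.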
Everything should have worked as planned, right? Wrong!
After a few days my "urgent URL removal request" was denied. The reason?
"The content you submitted for cache removal appears on the live third-party page. ....
As you may know, information in our search results is actually located on third-party, publicly available webpages. Even if we removed this page from our index, the content in question would still be available on the web"
No clue about what that third-party page is. Probably the crawler engine can't cope with multihosting on my server combined with a couple of DNS names pointing at the requested site. I'm still trying to get more information about the problem.
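If the multihosting theory is right, the same content is reachable under more than one hostname, and Google treats one copy as a live "third-party" page. A quick way to spot that situation is to check whether the suspect DNS names resolve to the same address. A small sketch (standard library only; the function name is my own):

```python
import socket
from collections import defaultdict

def group_by_address(hostnames):
    """Group hostnames by the IPv4 address they resolve to.

    Hostnames sharing an address are candidates for serving
    duplicate copies of the same content.
    """
    groups = defaultdict(list)
    for host in hostnames:
        try:
            groups[socket.gethostbyname(host)].append(host)
        except socket.gaierror:
            groups["unresolved"].append(host)
    return dict(groups)
```

Feeding it every DNS name you know for the server shows at a glance which ones collide; any colliding duplicate then needs its own removal request (or a redirect) so Google stops seeing a live copy.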
UPDATE
Adding the second domain to Google Webmaster Tools helped. This time the request to remove the page from the cache went through, and the indexed URL is up to date as well.