Google indexing pages despite being explicitly blocked in robots.txt

UPDATE: I had always (wrongly) thought that Google wouldn’t index pages that were blocked in robots.txt. But John Mueller clarified this for me (thank you, John): robots.txt controls crawling BUT NOT indexing. A good explanation of that is here:


Google is now sometimes indexing pages DESPITE explicitly being blocked in robots.txt:

Google Search Console report showing blocked pages indexed anyways

The screenshot above is from the new Search Console index coverage report, and shows that Google is choosing to index 36 pages that it can see are explicitly blocked in robots.txt.
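To make the crawl-versus-index distinction concrete, here is a minimal Python sketch. It assumes a made-up robots.txt and made-up example.com URLs, and the helper names (can_crawl, blocks_indexing) are mine, not anything from Google. The first helper answers the only question robots.txt can answer ("may a bot fetch this URL?"), and the second checks for the signal that actually addresses indexing, a noindex directive in the X-Robots-Tag response header.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows this user agent to FETCH the URL.

    This says nothing about whether the URL may appear in the index.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def blocks_indexing(headers: Mapping[str, str]) -> bool:
    """Return True if the X-Robots-Tag header asks engines not to index.

    Simplified: assumes the header key is spelled exactly "X-Robots-Tag"
    and ignores user-agent-specific forms such as "googlebot: noindex".
    """
    value = headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in value.split(",")}
    return "noindex" in directives or "none" in directives


if __name__ == "__main__":
    blocked = "https://example.com/private/page.html"
    # Crawling is disallowed, but that alone does not keep the URL
    # out of the index if other pages link to it.
    print("crawl allowed:", can_crawl(ROBOTS_TXT, "Googlebot", blocked))
    # Indexing is controlled separately, for example with a header.
    print("noindex sent:", blocks_indexing({"X-Robots-Tag": "noindex"}))
```

One catch worth keeping in mind: a crawler has to fetch the page to see a noindex header or meta tag, so a page that is disallowed in robots.txt never gets the chance to show it.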

Hacking Google MyMaps

One of my clients discovered this latest dirty trick for ranking, and it’s being used by dirtbags who have pirated content from legit publishers.

If you do this search, you’ll see that 3 of the top organic listings are from a very trusted domain…

The problem is, each of these is a user-generated MyMaps page, and it’s just a crappy page with a link to the download and some text.

Google MyMaps hack

I’m reporting this to Google now, and the publisher of the legit content is submitting a DMCA takedown.

Lesson of the day: never assume a damn thing

On my website, I display a map of where my clients are, all over the world. Mostly because I like messing around with Google Maps via the Maps JavaScript API–it’s really powerful and you can do some really cool things with it.

All of a sudden, my map went wonky. The markers would all show, but no background map or imagery would show at all!

I spent HOURS today trawling through StackOverflow, forums, etc., and finally discovered the problem.
