How to get Google to crawl your Angular application

Description

I first discovered there was a problem when I went to Google’s Webmaster Tools -> Fetch as Google and submitted a page from one of my sites to be crawled. The URL had a hash “#” in it, since hash-based routing is the default for $routeProvider in an app scaffolded with Yeoman. To my surprise, Google would not crawl that page. After some research I found that Google ignores everything after a “#” when indexing, unless you do something kind of crazy to trick it.



Step 1: Enable HTML 5 routes in angular with bang prefix

Inside my app.js, I needed to enable HTML5 mode and set the hash prefix to “!”:

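  .config(function ($routeProvider, $locationProvider) {
    $routeProvider
      // ... route definitions (.when(...) calls) omitted here ...
      .otherwise({
        redirectTo: '/'
      });
    // Use "#!" instead of "#" in hash URLs, e.g. host.com/#!/foo
    $locationProvider.hashPrefix('!');
    // Use HTML5 history URLs (host.com/foo) where the browser supports them
    $locationProvider.html5Mode(true);
  });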

This now takes all my URLs from host.com/#/foo to host.com/foo. The biggest caveat to this approach is that if I browse directly to host.com/foo, I get a 404. The server looks for a resource at /foo, finds nothing, and never serves index.html, so the app’s JavaScript never loads. I would have to go to the main page and click through to each page I want. This simply won’t work. What production-level site on the internet does this?
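To make the direct-visit 404 more concrete, here is a minimal sketch of the decision a static server has to make once the app uses HTML5-mode URLs. It is only an illustration: the function name resolveStaticPath is hypothetical, and the rule "no file extension means client-side route" is an assumption that will not fit every project.

```javascript
// Hypothetical helper: decide which file a static server should send back
// for a request path when the app uses html5Mode client-side routes.
function resolveStaticPath(requestPath) {
  // Drop any query string so "/foo?x=1" is treated like "/foo".
  var pathOnly = requestPath.split('?')[0];

  // Assumption: a last segment with a file extension (app.js, main.css, ...)
  // is a real asset, so it is passed through unchanged.
  var lastSegment = pathOnly.substring(pathOnly.lastIndexOf('/') + 1);
  if (lastSegment.indexOf('.') !== -1) {
    return pathOnly;
  }

  // Everything else is assumed to be a client-side route such as /foo,
  // so the server sends index.html and lets Angular's router take over.
  return '/index.html';
}

console.log(resolveStaticPath('/foo'));
console.log(resolveStaticPath('/scripts/app.js'));
```

Whatever serves the files (nginx in production, grunt serve in development) would need some equivalent of this fallback for deep links like host.com/foo to load.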

Step 2: nginx magic

It turns out that if you go to host.com/#!/foo, the page loads and then redirects you to host.com/foo with everything in place. Well, I don’t want people to see that happen in their browsers. The solution? Add some rewrite logic to the nginx conf file to handle these kinds of redirects for me.

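  # Crawler requests arrive with an _escaped_fragment_ query parameter.
  # Capture the route that follows it, minus an optional leading slash.
  if ($args ~ "_escaped_fragment_=/?(.+)") {
    set $path $1;
    # Internally rewrite the request to the matching path under /snapshots/.
    rewrite ^ /snapshots/$path;
  }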
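For anyone wondering where _escaped_fragment_ comes from, the sketch below mirrors both halves of the exchange in plain JavaScript. It is an illustration, not part of my setup: both function names are hypothetical, the first function reflects my understanding of how Google's AJAX crawling scheme turns a hashbang URL into a query parameter (ignoring the percent-escaping of special characters), and the second applies the same regular expression as the nginx rule.

```javascript
// Hypothetical: how a crawler following the AJAX crawling scheme is assumed
// to translate a hashbang URL before requesting it (escaping omitted).
function toEscapedFragmentUrl(url) {
  var bang = url.indexOf('#!');
  if (bang === -1) {
    return url; // no hashbang, nothing to translate
  }
  var base = url.substring(0, bang);
  var fragment = url.substring(bang + 2);
  // Append to an existing query string if there is one.
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + fragment;
}

// Hypothetical: JavaScript equivalent of the nginx condition and rewrite.
function snapshotPathFor(queryString) {
  // Same pattern as the nginx rule: capture what follows
  // "_escaped_fragment_=", skipping one optional leading slash.
  var match = /_escaped_fragment_=\/?(.+)/.exec(queryString);
  if (!match) {
    return null; // not a crawler request; serve the normal page
  }
  return '/snapshots/' + match[1];
}

console.log(toEscapedFragmentUrl('http://host.com/#!/foo'));
console.log(snapshotPathFor('_escaped_fragment_=/foo'));
```

Note that the pattern needs at least one character after the optional slash, so a request with an empty _escaped_fragment_= value falls through to the normal page.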




Conclusion

My production site, which uses Angular’s routing, now has prettier URLs, and I can go directly to the page I need. I still need to figure out how to get grunt serve to do the same magic as my nginx configuration; every time livereload happens I get a 404. For more information on why I used a hashbang “#!” and _escaped_fragment_, Google has some good documentation for developers explaining why its search crawler acts that way. After making the change, I went back to the Fetch as Google section and submitted my page with no hash marks in the URL. Google will finally be able to index all the pages of my Angular app.
