বুধবার, ৩১ মে, ২০১৭

Optimizing AngularJS Single-Page Applications for Googlebot Crawlers

Optimizing AngularJS Single-Page Applications for Googlebot Crawlers http://ift.tt/2rTk79h

Posted by jrridley

It’s almost certain that you’ve encountered AngularJS on the web somewhere, even if you weren’t aware of it at the time. Here’s a list of just a few sites using Angular:

  • Upwork.com
  • Freelancer.com
  • Udemy.com
  • Youtube.com

Any of those look familiar? If so, it’s because AngularJS is taking over the Internet. There’s a good reason for that: Angular- and other React-style frameworks make for a better user and developer experience on a site. For background, AngularJS and ReactJS are part of a web design movement called single-page applications, or SPAs. While a traditional website loads each individual page as the user navigates the site, including calls to the server and cache, loading resources, and rendering the page, SPAs cut out much of the back-end activity by loading the entire site when a user first lands on a page. Instead of loading a new page each time you click on a link, the site dynamically updates a single HTML page as the user interacts with the site.

image001.png

Image c/o Microsoft

Why is this movement taking over the Internet? With SPAs, users are treated to a screaming fast site through which they can navigate almost instantaneously, while developers have a template that allows them to customize, test, and optimize pages seamlessly and efficiently. AngularJS and ReactJS use advanced Javascript templates to render the site, which means the HTML/CSS page speed overhead is almost nothing. All site activity runs behind the scenes, out of view of the user.

Unfortunately, anyone who’s tried performing SEO on an Angular or React site knows that the site activity is hidden from more than just site visitors: it’s also hidden from web crawlers. Crawlers like Googlebot rely heavily on HTML/CSS data to render and interpret the content on a site. When that HTML content is hidden behind website scripts, crawlers have no website content to index and serve in search results.

Of course, Google claims they can crawl Javascript (and SEOs have tested and supported this claim), but even if that is true, Googlebot still struggles to crawl sites built on a SPA framework. One of the first issues we encountered when a client first approached us with an Angular site was that nothing beyond the homepage was appearing in the SERPs. ScreamingFrog crawls uncovered the homepage and a handful of other Javascript resources, and that was it.

SF Javascript.png

Another common issue is recording Google Analytics data. Think about it: Analytics data is tracked by recording pageviews every time a user navigates to a page. How can you track site analytics when there’s no HTML response to trigger a pageview?

After working with several clients on their SPA websites, we’ve developed a process for performing SEO on those sites. By using this process, we’ve not only enabled SPA sites to be indexed by search engines, but even to rank on the first page for keywords.

5-step solution to SEO for AngularJS

  1. Make a list of all pages on the site
  2. Install Prerender
  3. “Fetch as Google”
  4. Configure Analytics
  5. Recrawl the site

1) Make a list of all pages on your site

If this sounds like a long and tedious process, that’s because it definitely can be. For some sites, this will be as easy as exporting the XML sitemap for the site. For other sites, especially those with hundreds or thousands of pages, creating a comprehensive list of all the pages on the site can take hours or days. However, I cannot emphasize enough how helpful this step has been for us. Having an index of all pages on the site gives you a guide to reference and consult as you work on getting your site indexed. It’s almost impossible to predict every issue that you’re going to encounter with an SPA, and if you don’t have an all-inclusive list of content to reference throughout your SEO optimization, it’s highly likely you’ll leave some part of the site un-indexed by search engines inadvertently.

One solution that might enable you to streamline this process is to divide content into directories instead of individual pages. For example, if you know that you have a list of storeroom pages, include your /storeroom/ directory and make a note of how many pages that includes. Or if you have an e-commerce site, make a note of how many products you have in each shopping category and compile your list that way (though if you have an e-commerce site, I hope for your own sake you have a master list of products somewhere). Regardless of what you do to make this step less time-consuming, make sure you have a full list before continuing to step 2.

2) Install Prerender

Prerender is going to be your best friend when performing SEO for SPAs. Prerender is a service that will render your website in a virtual browser, then serve the static HTML content to web crawlers. From an SEO standpoint, this is as good of a solution as you can hope for: users still get the fast, dynamic SPA experience while search engine crawlers can identify indexable content for search results.

Prerender’s pricing varies based on the size of your site and the freshness of the cache served to Google. Smaller sites (up to 250 pages) can use Prerender for free, while larger sites (or sites that update constantly) may need to pay as much as $200+/month. However, having an indexable version of your site that enables you to attract customers through organic search is invaluable. This is where that list you compiled in step 1 comes into play: if you can prioritize what sections of your site need to be served to search engines, or with what frequency, you may be able to save a little bit of money each month while still achieving SEO progress.

3) "Fetch as Google"

Within Google Search Console is an incredibly useful feature called “Fetch as Google.” “Fetch as Google” allows you to enter a URL from your site and fetch it as Googlebot would during a crawl. “Fetch” returns the HTTP response from the page, which includes a full download of the page source code as Googlebot sees it. “Fetch and Render” will return the HTTP response and will also provide a screenshot of the page as Googlebot saw it and as a site visitor would see it.

This has powerful applications for AngularJS sites. Even with Prerender installed, you may find that Google is still only partially displaying your website, or it may be omitting key features of your site that are helpful to users. Plugging the URL into “Fetch as Google” will let you review how your site appears to search engines and what further steps you may need to take to optimize your keyword rankings. Additionally, after requesting a “Fetch” or “Fetch and Render,” you have the option to “Request Indexing” for that page, which can be handy catalyst for getting your site to appear in search results.

4) Configure Google Analytics (or Google Tag Manager)

As I mentioned above, SPAs can have serious trouble with recording Google Analytics data since they don’t track pageviews the way a standard website does. Instead of the traditional Google Analytics tracking code, you’ll need to install Analytics through some kind of alternative method.

One method that works well is to use the Angulartics plugin. Angulartics replaces standard pageview events with virtual pageview tracking, which tracks the entire user navigation across your application. Since SPAs dynamically load HTML content, these virtual pageviews are recorded based on user interactions with the site, which ultimately tracks the same user behavior as you would through traditional Analytics. Other people have found success using Google Tag Manager “History Change” triggers or other innovative methods, which are perfectly acceptable implementations. As long as your Google Analytics tracking records user interactions instead of conventional pageviews, your Analytics configuration should suffice.

5) Recrawl the site

After working through steps 1–4, you’re going to want to crawl the site yourself to find those errors that not even Googlebot was anticipating. One issue we discovered early with a client was that after installing Prerender, our crawlers were still running into a spider trap:

As you can probably tell, there were not actually 150,000 pages on that particular site. Our crawlers just found a recursive loop that kept generating longer and longer URL strings for the site content. This is something we would not have found in Google Search Console or Analytics. SPAs are notorious for causing tedious, inexplicable issues that you’ll only uncover by crawling the site yourself. Even if you follow the steps above and take as many precautions as possible, I can still almost guarantee you will come across a unique issue that can only be diagnosed through a crawl.

If you’ve come across any of these unique issues, let me know in the comments! I’d love to hear what other issues people have encountered with SPAs.

Results

As I mentioned earlier in the article, the process outlined above has enabled us to not only get client sites indexed, but even to get those sites ranking on first page for various keywords. Here’s an example of the keyword progress we made for one client with an AngularJS site:

Also, the organic traffic growth for that client over the course of seven months:

All of this goes to show that although SEO for SPAs can be tedious, laborious, and troublesome, it is not impossible. Follow the steps above, and you can have SEO success with your single-page app website.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Optimizing AngularJS Single-Page Applications for Googlebot Crawlers

Optimizing AngularJS Single-Page Applications for Googlebot Crawlers http://ift.tt/2rTk79h

Posted by jrridley

It’s almost certain that you’ve encountered AngularJS on the web somewhere, even if you weren’t aware of it at the time. Here’s a list of just a few sites using Angular:

  • Upwork.com
  • Freelancer.com
  • Udemy.com
  • Youtube.com

Any of those look familiar? If so, it’s because AngularJS is taking over the Internet. There’s a good reason for that: Angular- and other React-style frameworks make for a better user and developer experience on a site. For background, AngularJS and ReactJS are part of a web design movement called single-page applications, or SPAs. While a traditional website loads each individual page as the user navigates the site, including calls to the server and cache, loading resources, and rendering the page, SPAs cut out much of the back-end activity by loading the entire site when a user first lands on a page. Instead of loading a new page each time you click on a link, the site dynamically updates a single HTML page as the user interacts with the site.

image001.png

Image c/o Microsoft

Why is this movement taking over the Internet? With SPAs, users are treated to a screaming fast site through which they can navigate almost instantaneously, while developers have a template that allows them to customize, test, and optimize pages seamlessly and efficiently. AngularJS and ReactJS use advanced Javascript templates to render the site, which means the HTML/CSS page speed overhead is almost nothing. All site activity runs behind the scenes, out of view of the user.

Unfortunately, anyone who’s tried performing SEO on an Angular or React site knows that the site activity is hidden from more than just site visitors: it’s also hidden from web crawlers. Crawlers like Googlebot rely heavily on HTML/CSS data to render and interpret the content on a site. When that HTML content is hidden behind website scripts, crawlers have no website content to index and serve in search results.

Of course, Google claims they can crawl Javascript (and SEOs have tested and supported this claim), but even if that is true, Googlebot still struggles to crawl sites built on a SPA framework. One of the first issues we encountered when a client first approached us with an Angular site was that nothing beyond the homepage was appearing in the SERPs. ScreamingFrog crawls uncovered the homepage and a handful of other Javascript resources, and that was it.

SF Javascript.png

Another common issue is recording Google Analytics data. Think about it: Analytics data is tracked by recording pageviews every time a user navigates to a page. How can you track site analytics when there’s no HTML response to trigger a pageview?

After working with several clients on their SPA websites, we’ve developed a process for performing SEO on those sites. By using this process, we’ve not only enabled SPA sites to be indexed by search engines, but even to rank on the first page for keywords.

5-step solution to SEO for AngularJS

  1. Make a list of all pages on the site
  2. Install Prerender
  3. “Fetch as Google”
  4. Configure Analytics
  5. Recrawl the site

1) Make a list of all pages on your site

If this sounds like a long and tedious process, that’s because it definitely can be. For some sites, this will be as easy as exporting the XML sitemap for the site. For other sites, especially those with hundreds or thousands of pages, creating a comprehensive list of all the pages on the site can take hours or days. However, I cannot emphasize enough how helpful this step has been for us. Having an index of all pages on the site gives you a guide to reference and consult as you work on getting your site indexed. It’s almost impossible to predict every issue that you’re going to encounter with an SPA, and if you don’t have an all-inclusive list of content to reference throughout your SEO optimization, it’s highly likely you’ll leave some part of the site un-indexed by search engines inadvertently.

One solution that might enable you to streamline this process is to divide content into directories instead of individual pages. For example, if you know that you have a list of storeroom pages, include your /storeroom/ directory and make a note of how many pages that includes. Or if you have an e-commerce site, make a note of how many products you have in each shopping category and compile your list that way (though if you have an e-commerce site, I hope for your own sake you have a master list of products somewhere). Regardless of what you do to make this step less time-consuming, make sure you have a full list before continuing to step 2.

2) Install Prerender

Prerender is going to be your best friend when performing SEO for SPAs. Prerender is a service that will render your website in a virtual browser, then serve the static HTML content to web crawlers. From an SEO standpoint, this is as good of a solution as you can hope for: users still get the fast, dynamic SPA experience while search engine crawlers can identify indexable content for search results.

Prerender’s pricing varies based on the size of your site and the freshness of the cache served to Google. Smaller sites (up to 250 pages) can use Prerender for free, while larger sites (or sites that update constantly) may need to pay as much as $200+/month. However, having an indexable version of your site that enables you to attract customers through organic search is invaluable. This is where that list you compiled in step 1 comes into play: if you can prioritize what sections of your site need to be served to search engines, or with what frequency, you may be able to save a little bit of money each month while still achieving SEO progress.

3) "Fetch as Google"

Within Google Search Console is an incredibly useful feature called “Fetch as Google.” “Fetch as Google” allows you to enter a URL from your site and fetch it as Googlebot would during a crawl. “Fetch” returns the HTTP response from the page, which includes a full download of the page source code as Googlebot sees it. “Fetch and Render” will return the HTTP response and will also provide a screenshot of the page as Googlebot saw it and as a site visitor would see it.

This has powerful applications for AngularJS sites. Even with Prerender installed, you may find that Google is still only partially displaying your website, or it may be omitting key features of your site that are helpful to users. Plugging the URL into “Fetch as Google” will let you review how your site appears to search engines and what further steps you may need to take to optimize your keyword rankings. Additionally, after requesting a “Fetch” or “Fetch and Render,” you have the option to “Request Indexing” for that page, which can be handy catalyst for getting your site to appear in search results.

4) Configure Google Analytics (or Google Tag Manager)

As I mentioned above, SPAs can have serious trouble with recording Google Analytics data since they don’t track pageviews the way a standard website does. Instead of the traditional Google Analytics tracking code, you’ll need to install Analytics through some kind of alternative method.

One method that works well is to use the Angulartics plugin. Angulartics replaces standard pageview events with virtual pageview tracking, which tracks the entire user navigation across your application. Since SPAs dynamically load HTML content, these virtual pageviews are recorded based on user interactions with the site, which ultimately tracks the same user behavior as you would through traditional Analytics. Other people have found success using Google Tag Manager “History Change” triggers or other innovative methods, which are perfectly acceptable implementations. As long as your Google Analytics tracking records user interactions instead of conventional pageviews, your Analytics configuration should suffice.

5) Recrawl the site

After working through steps 1–4, you’re going to want to crawl the site yourself to find those errors that not even Googlebot was anticipating. One issue we discovered early with a client was that after installing Prerender, our crawlers were still running into a spider trap:

As you can probably tell, there were not actually 150,000 pages on that particular site. Our crawlers just found a recursive loop that kept generating longer and longer URL strings for the site content. This is something we would not have found in Google Search Console or Analytics. SPAs are notorious for causing tedious, inexplicable issues that you’ll only uncover by crawling the site yourself. Even if you follow the steps above and take as many precautions as possible, I can still almost guarantee you will come across a unique issue that can only be diagnosed through a crawl.

If you’ve come across any of these unique issues, let me know in the comments! I’d love to hear what other issues people have encountered with SPAs.

Results

As I mentioned earlier in the article, the process outlined above has enabled us to not only get client sites indexed, but even to get those sites ranking on first page for various keywords. Here’s an example of the keyword progress we made for one client with an AngularJS site:

Also, the organic traffic growth for that client over the course of seven months:

All of this goes to show that although SEO for SPAs can be tedious, laborious, and troublesome, it is not impossible. Follow the steps above, and you can have SEO success with your single-page app website.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

মঙ্গলবার, ৩০ মে, ২০১৭

No, Paid Search Audiences Won’t Replace Keywords

No, Paid Search Audiences Won’t Replace Keywords http://ift.tt/2r6mRPa

Posted by PPCKirk

I have been chewing on a keyword vs. audience targeting post for roughly two years now. In that time we have seen audience targeting grow in popularity (as expected) and depth.

“Popularity” is somewhat of an understatement here. I would go so far as to say that I've heard it lauded in messianic-like “thy kingdom come, thy will be done” reverential awe by some paid search marketers. as if paid search were lacking a heartbeat before the life-giving audience targeting had arrived and 1-2-3-clear’ed it into relevance.

However, I would argue that despite audience targeting’s popularity (and understandable success), we have also seen the revelation of some weaknesses as well. It turns out it’s not quite the heroic, rescue-the-captives targeting method paid searchers had hoped it would be.

The purpose of this post is to argue against the notion that audience targeting can replace the keyword in paid search.

Now, before we get into the throes of keyword philosophy, I’d like to reduce the number of angry comments this post receives by acknowledging a crucial point.

It is not my intention in any way to set up a false dichotomy. Yes, I believe the keyword is still the most valuable form of targeting for a paid search marketer, but I also believe that audience targeting can play a valuable complementary role in search bidding.

In fact, as I think about it, I would argue that I am writing this post in response to what I have heard become a false dichotomy. That is, that audience targeting is better than keyword targeting and will eventually replace it.

I disagree with this idea vehemently, as I will demonstrate in the rest of this article.

One seasoned (age, not steak) traditional marketer’s point of view

The best illustration I've heard on the core weakness of audience targeting was from an older traditional marketer who has probably never accessed the Keyword Planner in his life.

“I have two teenage daughters.” He revealed, with no small amount of pride.

“They are within 18 months of each other, so in age demographic targeting they are the same person.”

“They are both young women, so in gender demographic targeting they are the same person.”

“They are both my daughters in my care, so in income demographic targeting they are the same person.”

“They are both living in my house, so in geographical targeting they are the same person.”

“They share the same friends, so in social targeting they are the same person.”

“However, in terms of personality, they couldn’t be more different. One is artistic and enjoys heels and dresses and makeup. The other loves the outdoors and sports, and spends her time in blue jeans and sneakers.”

If an audience-targeting marketer selling spring dresses saw them in his marketing list, he would (1) see two older high school girls with the same income in the same geographical area, (2) assume they are both interested in what he has to sell, and (3) only make one sale.

The problem isn’t with his targeting, the problem is that not all those forced into an audience persona box will fit.

In September of 2015, Aaron Levy (a brilliant marketing mind; go follow him) wrote a fabulously under-shared post revealing these weaknesses in another way: What You Think You Know About Your Customers’ Persona is Wrong

In this article, Aaron first bravely broaches the subject of audience targeting by describing how it is far from the exact science we all have hoped it to be. He noted a few ways that audience targeting can be erroneous, and even *gasp* used data to formulate his conclusions.

It’s OK to question audience targeting — really!

Let me be clear: I believe audience targeting is popular because there genuinely is value in it (it's amazing data to have… when it's accurate!). The insights we can get about personas, which we can then use to power our ads, are quite amazing and powerful.

So, why the heck am I droning on about audience targeting weaknesses? Well, I’m trying to set you up for something. I’m trying to get us to admit that audience targeting itself has some weaknesses, and isn’t the savior of all digital marketing that some make it out to be, and that there is a tried-and-true solution that fits well with demographic targeting, but is not replaced by it. It is a targeting that we paid searchers have used joyfully and successfully for years now.

It is the keyword.

Whereas audience targeting chafes under the law of averages (i.e., “at some point, someone in my demographic targeted list has to actually be interested in what I am selling”), keyword targeting shines in individual-revealing user intent.

Keyword targeting does something an audience can never, ever, ever do...

Keywords: Personal intent powerhouses

A keyword is still my favorite form of targeting in paid search because it reveals individual, personal, and temporal intent. Those aren’t just three buzzwords I pulled out of the air because I needed to stretch this already obesely-long post out further. They are intentional, and worth exploring.

Individual

A keyword is such a powerful targeting method because it is written (or spoken!) by a single person. I mean, let’s be honest, it’s rare to have more than one person huddled around the computer shouting at it. Keywords are generally from the mind of one individual, and because of that they have frightening potential.

Remember, audience targeting is based off of assumptions. That is, you're taking a group of people who “probably” think the same way in a certain area, but does that mean they cannot have unique tastes? For instance, one person preferring to buy sneakers with another preferring to buy heels?

Keyword targeting is demographic-blind.

It doesn’t care who you are, where you’re from, what you did, as long as you love me… err, I mean, it doesn’t care about your demographic, just about what you're individually interested in.

Personal

The next aspect of keywords powering their targeting awesomeness is that they reveal personal intent. Whereas the “individual” aspect of keyword targeting narrows our targeting from a group of people to a single person, the “personal” aspect of keyword targeting goes into the very mind of that individual.

Don’t you wish there was a way to market to people in which you could truly discern the intentions of their hearts? Wouldn’t that be a powerful method of targeting? Well, yes — and that is keyword targeting!

Think about it: a keyword is a form of communication. It is a person typing or telling you what is on their mind. For a split second, in their search, you and they are as connected through communication as Alexander Graham Bell and Thomas Watson on the first phone call. That person is revealing to you what's on her mind, and that's a power which cannot be underestimated.

When a person tells Google they want to know “how does someone earn a black belt,” that is telling your client — the Jumping Judo Janes of Jordan — this person genuinely wants to learn more about their services and they can display an ad that matches that intent (Ready for that Black Belt? It’s Not Hard, Let Us Help!). Paid search keywords officiate the wedding of personal intent with advertising in a way that previous marketers could only dream of. We aren’t finding random people we think might be interested based upon where they live. We are responding to a person telling us they are interested.

Temporal

The final note of keyword targeting that cannot be underestimated, is the temporal aspect. Anyone worth their salt in marketing can tell you “timing is everything”. With keyword targeting, the timing is inseparable from the intent. When is this person interested in learning about your Judo classes? At the time they are searching, NOW!

You are not blasting your ads into your users lives, interrupting them as they go about their business or family time hoping to jumpstart their interest by distracting them from their activities. You are responding to their query, at the very time they are interested in learning more.

Timing. Is. Everything.

The situation settles into stickiness

Thus, to summarize: a “search” is done when an individual reveals his/her personal intent with communication (keywords/queries) at a specific time. Because of that, I maintain that keyword targeting trumps audience targeting in paid search.

Paid search is an evolving industry, but it is still “search,” which requires communication, which requires words (until that time when the emoji takes over the English language, but that’s okay because the rioting in the streets will have gotten us first).

Of course, we would be remiss in ignoring some legitimate questions which inevitably arise. As ideal as the outline I've laid out before you sounds, you're probably beginning to formulate something like the following four questions.

  • What about low search volume keywords?
  • What if the search engines kill keyword targeting?
  • What if IoT monsters kill search engines?
  • What about social ads?

We’ll close by discussing each of these four questions.

Low search volume terms (LSVs)

Low search volume keywords stink like poo (excuse the rather strong language there). I’m not sure if there is any data on this out there (if so, please share it below), but I have run into low search volume terms far more in the past year than when I first started managing PPC campaigns in 2010.

I don’t know all the reasons for this; perhaps it’s worth another blog post, but the reality is it’s getting harder to be creative and target high-value long-tail keywords when so many are getting shut off due to low search volume.

This seems like a fairly smooth way being paved for Google/Bing to eventually “take over” (i.e., “automate for our good”) keyword targeting, at the very least for SMBs (small-medium businesses) where LSVs can be a significant problem. In this instance, the keyword would still be around, it just wouldn’t be managed by us PPCers directly. Boo.

Search engine decrees

I’ve already addressed the power search engines have here, but I will be the first to admit that, as much as I like keyword targeting and as much as I have hopefully proven how valuable it is, it still would be a fairly easy thing for Google or Bing to kill off completely. Major boo.

Since paid search relies on keywords and queries and language to work, I imagine this would look more like an automated solution (think DSAs and shopping), in which they make keyword targeting into a dynamic system that works in conjunction with audience targeting.

While this was about a year and a half ago, it is worth noting that at Hero Conference in London, Bing Ads’ ebullient Tor Crockett did make the public statement that Bing at the time had no plans to sunset the keyword as a bidding option. We can only hope this sentiment remains, and transfers over to Google as well.

But Internet of Things (IoT) Frankenstein devices!

Finally, it could be that search engines won’t be around forever. Perhaps this will look like IoT devices such as Alexa that incorporate some level of search into them, but pull traffic away from using Google/Bing search bars. As an example of this in real life, you don’t need to ask Google where to find (queries, keywords, communication, search) the best price on laundry detergent if you can just push the Dash button, or your smart washing machine can just order you more without a search effort.

Image source

On the other hand, I still believe we're a long way off from this in the same way that the freak-out over mobile devices killing personal computers has slowed down. That is, we still utilize our computers for education & work (even if personal usage revolves around tablets and mobile devices and IoT freaks-of-nature… smart toasters anyone?) and our mobile devices for queries on the go. Computers are still a primary source of search in terms of work and education as well as more intensive personal activities (vacation planning, for instance), and thus computers still rely heavily on search. Mobile devices are still heavily query-centered for various tasks, especially as voice search (still query-centered!) kicks in harder.

The social effect

Social is its own animal in a way, and why I believe it is already and will continue to have an effect on search and keywords (though not in a terribly worrisome way). Social definitely pulls a level of traffic from search, specifically in product queries. “Who has used this dishwasher before, any other recommendations?” Social ads are exploding in popularity as well, and in large part because they are working. People are purchasing more than they ever have from social ads and marketers are rushing to be there for them.

The flip side of this: a social and paid search comparison is apples-to-oranges. There are different motivations and purposes for using search engines and querying your friends.

Audience targeting works great in a social setting since that social network has phenomenally accurate and specific targeting for individuals, but it is the rare individual curious about the ideal condom to purchase who queries his family and friends on Facebook. There will always be elements of social and search that are unique and valuable in their own way, and audience targeting for social and keyword targeting for search complement those unique elements of each.

Idealism incarnate

Thus, it is my belief that as long as we have search, we will still have keywords and keyword targeting will be the best way to target — as long as costs remain low enough to be realistic for budgets and the search engines don’t kill keyword bidding for an automated solution.

Don’t give up, the keyword is not dead. Stay focused, and carry on with your match types!

I want to close by re-acknowledging the crucial point I opened with.

It has not been my intention in any way to set up a false dichotomy. In fact, as I think about it, I would argue that I am writing this in response to what I have heard become a false dichotomy. That is, that audience targeting is better than keyword targeting and will eventually replace it…

I believe the keyword is still the most valuable form of targeting for a paid search marketer, but I also believe that audience demographics can play a valuable complementary role in bidding.

A prime example that we already use is remarketing lists for search ads, in which we can layer on remarketing audiences in both Google and Bing into our search queries. Wouldn’t it be amazing if we could someday do this with massive amounts of audience data? I've said this before, but were Bing Ads to use its LinkedIn acquisition to allow us to layer on LinkedIn audiences into our current keyword framework, the B2B angels would surely rejoice over us (Bing has responded, by the way, that something is in the works!).

Either way, I hope I've demonstrated that far from being on its deathbed, the keyword is still the most essential tool in the paid search marketer’s toolbox.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

সোমবার, ২৯ মে, ২০১৭

Evidence of the Surprising State of JavaScript Indexing

Evidence of the Surprising State of JavaScript Indexing http://ift.tt/2qxWjmk

Posted by willcritchlow

Back when I started in this industry, it was standard advice to tell our clients that the search engines couldn’t execute JavaScript (JS), and anything that relied on JS would be effectively invisible and never appear in the index. Over the years, that has changed gradually, from early work-arounds (such as the horrible escaped fragment approach my colleague Rob wrote about back in 2010) to the actual execution of JS in the indexing pipeline that we see today, at least at Google.

In this article, I want to explore some things we've seen about JS indexing behavior in the wild and in controlled tests and share some tentative conclusions I've drawn about how it must be working.

A brief introduction to JS indexing

At its most basic, the idea behind JavaScript-enabled indexing is to get closer to the search engine seeing the page as the user sees it. Most users browse with JavaScript enabled, and many sites either fail without it or are severely limited. While traditional indexing considers just the raw HTML source received from the server, users typically see a page rendered based on the DOM (Document Object Model) which can be modified by JavaScript running in their web browser. JS-enabled indexing considers all content in the rendered DOM, not just that which appears in the raw HTML.

There are some complexities even in this basic definition (answers in brackets as I understand them):

  • What about JavaScript that requests additional content from the server? (This will generally be included, subject to timeout limits)
  • What about JavaScript that executes some time after the page loads? (This will generally only be indexed up to some time limit, possibly in the region of 5 seconds)
  • What about JavaScript that executes on some user interaction such as scrolling or clicking? (This will generally not be included)
  • What about JavaScript in external files rather than in-line? (This will generally be included, as long as those external files are not blocked from the robot — though see the caveat in experiments below)

For more on the technical details, I recommend my ex-colleague Justin’s writing on the subject.

A high-level overview of my view of JavaScript best practices

Despite the incredible work-arounds of the past (which always seemed like more effort than graceful degradation to me) the “right” answer has existed since at least 2012, with the introduction of PushState. Rob wrote about this one, too. Back then, however, it was pretty clunky and manual and it required a concerted effort to ensure both that the URL was updated in the user’s browser for each view that should be considered a “page,” that the server could return full HTML for those pages in response to new requests for each URL, and that the back button was handled correctly by your JavaScript.

Along the way, in my opinion, too many sites got distracted by a separate prerendering step. This is an approach that does the equivalent of running a headless browser to generate static HTML pages that include any changes made by JavaScript on page load, then serving those snapshots instead of the JS-reliant page in response to requests from bots. It typically treats bots differently, in a way that Google tolerates, as long as the snapshots do represent the user experience. In my opinion, this approach is a poor compromise that's too susceptible to silent failures and falling out of date. We've seen a bunch of sites suffer traffic drops due to serving Googlebot broken experiences that were not immediately detected because no regular users saw the prerendered pages.

These days, if you need or want JS-enhanced functionality, more of the top frameworks have the ability to work the way Rob described in 2012, which is now called isomorphic (roughly meaning “the same”).

Isomorphic JavaScript serves HTML that corresponds to the rendered DOM for each URL, and updates the URL for each “view” that should exist as a separate page as the content is updated via JS. With this implementation, there is actually no need to render the page to index basic content, as it's served in response to any fresh request.

I was fascinated by this piece of research published recently — you should go and read the whole study. In particular, you should watch this video (recommended in the post) in which the speaker — who is an Angular developer and evangelist — emphasizes the need for an isomorphic approach:

Resources for auditing JavaScript

If you work in SEO, you will increasingly find yourself called upon to figure out whether a particular implementation is correct (hopefully on a staging/development server before it’s deployed live, but who are we kidding? You’ll be doing this live, too).

To do that, here are some resources I’ve found useful:

Some surprising/interesting results

There are likely to be timeouts on JavaScript execution

I already linked above to the ScreamingFrog post that mentions experiments they have done to measure the timeout Google uses to determine when to stop executing JavaScript (they found a limit of around 5 seconds).

It may be more complicated than that, however. This segment of a thread is interesting. It's from a Hacker News user who goes by the username KMag and who claims to have worked at Google on the JS execution part of the indexing pipeline from 2006–2010. It’s in relation to another user speculating that Google would not care about content loaded “async” (i.e. asynchronously — in other words, loaded as part of new HTTP requests that are triggered in the background while assets continue to download):

“Actually, we did care about this content. I'm not at liberty to explain the details, but we did execute setTimeouts up to some time limit.

If they're smart, they actually make the exact timeout a function of a HMAC of the loaded source, to make it very difficult to experiment around, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.”

What that means is that although it was initially a fixed timeout, he’s speculating (or possibly sharing without directly doing so) that timeouts are programmatically determined (presumably based on page importance and JavaScript reliance) and that they may be tied to the exact source code (the reference to “HMAC” is to do with a technical mechanism for spotting if the page has changed).

It matters how your JS is executed

I referenced this recent study earlier. In it, the author found:

Inline vs. External vs. Bundled JavaScript makes a huge difference for Googlebot

The charts at the end show the extent to which popular JavaScript frameworks perform differently depending on how they're called, with a range of performance from passing every test to failing almost every test. For example here’s the chart for Angular:

Slide5.PNG

It’s definitely worth reading the whole thing and reviewing the performance of the different frameworks. There's more evidence of Google saving computing resources in some areas, as well as surprising results between different frameworks.

CRO tests are getting indexed

When we first started seeing JavaScript-based split-testing platforms designed for testing changes aimed at improving conversion rate (CRO = conversion rate optimization), their inline changes to individual pages were invisible to the search engines. As Google in particular has moved up the JavaScript competency ladder through executing simple inline JS to more complex JS in external files, we are now seeing some CRO-platform-created changes being indexed. A simplified version of what’s happening is:

  • For users:
    • CRO platforms typically take a visitor to a page, check for the existence of a cookie, and if there isn’t one, randomly assign the visitor to group A or group B
    • Based on either the cookie value or the new assignment, the user is either served the page unchanged, or sees a version that is modified in their browser by JavaScript loaded from the CRO platform’s CDN (content delivery network)
    • A cookie is then set to make sure that the user sees the same version if they revisit that page later
  • For Googlebot:
    • The reliance on external JavaScript used to prevent both the bucketing and the inline changes from being indexed
    • With external JavaScript now being loaded, and with many of these inline changes being made using standard libraries (such as JQuery), Google is able to index the variant and hence we see CRO experiments sometimes being indexed

I might have expected the platforms to block their JS with robots.txt, but at least the main platforms I’ve looked at don't do that. With Google being sympathetic towards testing, however, this shouldn’t be a major problem — just something to be aware of as you build out your user-facing CRO tests. All the more reason for your UX and SEO teams to work closely together and communicate well.

Split tests show SEO improvements from removing a reliance on JS

Although we would like to do a lot more to test the actual real-world impact of relying on JavaScript, we do have some early results. At the end of last week I published a post outlining the uplift we saw from removing a site’s reliance on JS to display content and links on category pages.

odn_additional_sessions.png

A simple test that removed the need for JavaScript on 50% of pages showed a >6% uplift in organic traffic — worth thousands of extra sessions a month. While we haven’t proven that JavaScript is always bad, nor understood the exact mechanism at work here, we have opened up a new avenue for exploration, and at least shown that it’s not a settled matter. To my mind, it highlights the importance of testing. It’s obviously our belief in the importance of SEO split-testing that led to us investing so much in the development of the ODN platform over the last 18 months or so.

Conclusion: How JavaScript indexing might work from a systems perspective

Based on all of the information we can piece together from the external behavior of the search results, public comments from Googlers, tests and experiments, and first principles, here’s how I think JavaScript indexing is working at Google at the moment: I think there is a separate queue for JS-enabled rendering, because the computational cost of trying to run JavaScript over the entire web is unnecessary given the lack of a need for it on many, many pages. In detail, I think:

  • Googlebot crawls and caches HTML and core resources regularly
  • Heuristics (and probably machine learning) are used to prioritize JavaScript rendering for each page:
    • Some pages are indexed with no JS execution. There are many pages that can probably be easily identified as not needing rendering, and others which are such a low priority that it isn’t worth the computing resources.
    • Some pages get immediate rendering – or possibly immediate basic/regular indexing, along with high-priority rendering. This would enable the immediate indexation of pages in news results or other QDF results, but also allow pages that rely heavily on JS to get updated indexation when the rendering completes.
    • Many pages are rendered async in a separate process/queue from both crawling and regular indexing, thereby adding the page to the index for new words and phrases found only in the JS-rendered version when rendering completes, in addition to the words and phrases found in the unrendered version indexed initially.
  • The JS rendering also, in addition to adding pages to the index:
    • May make modifications to the link graph
    • May add new URLs to the discovery/crawling queue for Googlebot

The idea of JavaScript rendering as a distinct and separate part of the indexing pipeline is backed up by this quote from KMag, who I mentioned previously for his contributions to this HN thread (direct link) [emphasis mine]:

“I was working on the lightweight high-performance JavaScript interpretation system that sandboxed pretty much just a JS engine and a DOM implementation that we could run on every web page on the index. Most of my work was trying to improve the fidelity of the system. My code analyzed every web page in the index.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.”

This was the situation in 2010. It seems likely that they have moved a long way towards the headless browser in all cases, but I’m skeptical about whether it would be worth their while to render every page they crawl with JavaScript given the expense of doing so and the fact that a large percentage of pages do not change substantially when you do.

My best guess is that they're using a combination of trying to figure out the need for JavaScript execution on a given page, coupled with trust/authority metrics to decide whether (and with what priority) to render a page with JS.

Run a test, get publicity

I have a hypothesis that I would love to see someone test: That it’s possible to get a page indexed and ranking for a nonsense word contained in the served HTML, but not initially ranking for a different nonsense word added via JavaScript; then, to see the JS get indexed some period of time later and rank for both nonsense words. If you want to run that test, let me know the results — I’d be happy to publicize them.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Evidence of the Surprising State of JavaScript Indexing

Evidence of the Surprising State of JavaScript Indexing http://ift.tt/2qxWjmk

Posted by willcritchlow

Back when I started in this industry, it was standard advice to tell our clients that the search engines couldn’t execute JavaScript (JS), and anything that relied on JS would be effectively invisible and never appear in the index. Over the years, that has changed gradually, from early work-arounds (such as the horrible escaped fragment approach my colleague Rob wrote about back in 2010) to the actual execution of JS in the indexing pipeline that we see today, at least at Google.

In this article, I want to explore some things we've seen about JS indexing behavior in the wild and in controlled tests and share some tentative conclusions I've drawn about how it must be working.

A brief introduction to JS indexing

At its most basic, the idea behind JavaScript-enabled indexing is to get closer to the search engine seeing the page as the user sees it. Most users browse with JavaScript enabled, and many sites either fail without it or are severely limited. While traditional indexing considers just the raw HTML source received from the server, users typically see a page rendered based on the DOM (Document Object Model) which can be modified by JavaScript running in their web browser. JS-enabled indexing considers all content in the rendered DOM, not just that which appears in the raw HTML.

There are some complexities even in this basic definition (answers in brackets as I understand them):

  • What about JavaScript that requests additional content from the server? (This will generally be included, subject to timeout limits)
  • What about JavaScript that executes some time after the page loads? (This will generally only be indexed up to some time limit, possibly in the region of 5 seconds)
  • What about JavaScript that executes on some user interaction such as scrolling or clicking? (This will generally not be included)
  • What about JavaScript in external files rather than in-line? (This will generally be included, as long as those external files are not blocked from the robot — though see the caveat in experiments below)

For more on the technical details, I recommend my ex-colleague Justin’s writing on the subject.

A high-level overview of my view of JavaScript best practices

Despite the incredible work-arounds of the past (which always seemed like more effort than graceful degradation to me) the “right” answer has existed since at least 2012, with the introduction of PushState. Rob wrote about this one, too. Back then, however, it was pretty clunky and manual and it required a concerted effort to ensure both that the URL was updated in the user’s browser for each view that should be considered a “page,” that the server could return full HTML for those pages in response to new requests for each URL, and that the back button was handled correctly by your JavaScript.

Along the way, in my opinion, too many sites got distracted by a separate prerendering step. This is an approach that does the equivalent of running a headless browser to generate static HTML pages that include any changes made by JavaScript on page load, then serving those snapshots instead of the JS-reliant page in response to requests from bots. It typically treats bots differently, in a way that Google tolerates, as long as the snapshots do represent the user experience. In my opinion, this approach is a poor compromise that's too susceptible to silent failures and falling out of date. We've seen a bunch of sites suffer traffic drops due to serving Googlebot broken experiences that were not immediately detected because no regular users saw the prerendered pages.

These days, if you need or want JS-enhanced functionality, more of the top frameworks have the ability to work the way Rob described in 2012, which is now called isomorphic (roughly meaning “the same”).

Isomorphic JavaScript serves HTML that corresponds to the rendered DOM for each URL, and updates the URL for each “view” that should exist as a separate page as the content is updated via JS. With this implementation, there is actually no need to render the page to index basic content, as it's served in response to any fresh request.

I was fascinated by this piece of research published recently — you should go and read the whole study. In particular, you should watch this video (recommended in the post) in which the speaker — who is an Angular developer and evangelist — emphasizes the need for an isomorphic approach:

Resources for auditing JavaScript

If you work in SEO, you will increasingly find yourself called upon to figure out whether a particular implementation is correct (hopefully on a staging/development server before it’s deployed live, but who are we kidding? You’ll be doing this live, too).

To do that, here are some resources I’ve found useful:

Some surprising/interesting results

There are likely to be timeouts on JavaScript execution

I already linked above to the ScreamingFrog post that mentions experiments they have done to measure the timeout Google uses to determine when to stop executing JavaScript (they found a limit of around 5 seconds).

It may be more complicated than that, however. This segment of a thread is interesting. It's from a Hacker News user who goes by the username KMag and who claims to have worked at Google on the JS execution part of the indexing pipeline from 2006–2010. It’s in relation to another user speculating that Google would not care about content loaded “async” (i.e. asynchronously — in other words, loaded as part of new HTTP requests that are triggered in the background while assets continue to download):

“Actually, we did care about this content. I'm not at liberty to explain the details, but we did execute setTimeouts up to some time limit.

If they're smart, they actually make the exact timeout a function of a HMAC of the loaded source, to make it very difficult to experiment around, find the exact limits, and fool the indexing system. Back in 2010, it was still a fixed time limit.”

What that means is that although it was initially a fixed timeout, he’s speculating (or possibly sharing without directly doing so) that timeouts are programmatically determined (presumably based on page importance and JavaScript reliance) and that they may be tied to the exact source code (the reference to “HMAC” is to do with a technical mechanism for spotting if the page has changed).

It matters how your JS is executed

I referenced this recent study earlier. In it, the author found:

Inline vs. External vs. Bundled JavaScript makes a huge difference for Googlebot

The charts at the end show the extent to which popular JavaScript frameworks perform differently depending on how they're called, with a range of performance from passing every test to failing almost every test. For example here’s the chart for Angular:

Slide5.PNG

It’s definitely worth reading the whole thing and reviewing the performance of the different frameworks. There's more evidence of Google saving computing resources in some areas, as well as surprising results between different frameworks.

CRO tests are getting indexed

When we first started seeing JavaScript-based split-testing platforms designed for testing changes aimed at improving conversion rate (CRO = conversion rate optimization), their inline changes to individual pages were invisible to the search engines. As Google in particular has moved up the JavaScript competency ladder through executing simple inline JS to more complex JS in external files, we are now seeing some CRO-platform-created changes being indexed. A simplified version of what’s happening is:

  • For users:
    • CRO platforms typically take a visitor to a page, check for the existence of a cookie, and if there isn’t one, randomly assign the visitor to group A or group B
    • Based on either the cookie value or the new assignment, the user is either served the page unchanged, or sees a version that is modified in their browser by JavaScript loaded from the CRO platform’s CDN (content delivery network)
    • A cookie is then set to make sure that the user sees the same version if they revisit that page later
  • For Googlebot:
    • The reliance on external JavaScript used to prevent both the bucketing and the inline changes from being indexed
    • With external JavaScript now being loaded, and with many of these inline changes being made using standard libraries (such as JQuery), Google is able to index the variant and hence we see CRO experiments sometimes being indexed

I might have expected the platforms to block their JS with robots.txt, but at least the main platforms I’ve looked at don't do that. With Google being sympathetic towards testing, however, this shouldn’t be a major problem — just something to be aware of as you build out your user-facing CRO tests. All the more reason for your UX and SEO teams to work closely together and communicate well.

Split tests show SEO improvements from removing a reliance on JS

Although we would like to do a lot more to test the actual real-world impact of relying on JavaScript, we do have some early results. At the end of last week I published a post outlining the uplift we saw from removing a site’s reliance on JS to display content and links on category pages.

odn_additional_sessions.png

A simple test that removed the need for JavaScript on 50% of pages showed a >6% uplift in organic traffic — worth thousands of extra sessions a month. While we haven’t proven that JavaScript is always bad, nor understood the exact mechanism at work here, we have opened up a new avenue for exploration, and at least shown that it’s not a settled matter. To my mind, it highlights the importance of testing. It’s obviously our belief in the importance of SEO split-testing that led to us investing so much in the development of the ODN platform over the last 18 months or so.

Conclusion: How JavaScript indexing might work from a systems perspective

Based on all of the information we can piece together from the external behavior of the search results, public comments from Googlers, tests and experiments, and first principles, here’s how I think JavaScript indexing is working at Google at the moment: I think there is a separate queue for JS-enabled rendering, because the computational cost of trying to run JavaScript over the entire web is unnecessary given the lack of a need for it on many, many pages. In detail, I think:

  • Googlebot crawls and caches HTML and core resources regularly
  • Heuristics (and probably machine learning) are used to prioritize JavaScript rendering for each page:
    • Some pages are indexed with no JS execution. There are many pages that can probably be easily identified as not needing rendering, and others which are such a low priority that it isn’t worth the computing resources.
    • Some pages get immediate rendering – or possibly immediate basic/regular indexing, along with high-priority rendering. This would enable the immediate indexation of pages in news results or other QDF results, but also allow pages that rely heavily on JS to get updated indexation when the rendering completes.
    • Many pages are rendered async in a separate process/queue from both crawling and regular indexing, thereby adding the page to the index for new words and phrases found only in the JS-rendered version when rendering completes, in addition to the words and phrases found in the unrendered version indexed initially.
  • The JS rendering also, in addition to adding pages to the index:
    • May make modifications to the link graph
    • May add new URLs to the discovery/crawling queue for Googlebot

The idea of JavaScript rendering as a distinct and separate part of the indexing pipeline is backed up by this quote from KMag, who I mentioned previously for his contributions to this HN thread (direct link) [emphasis mine]:

“I was working on the lightweight high-performance JavaScript interpretation system that sandboxed pretty much just a JS engine and a DOM implementation that we could run on every web page on the index. Most of my work was trying to improve the fidelity of the system. My code analyzed every web page in the index.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.”

This was the situation in 2010. It seems likely that they have moved a long way towards the headless browser in all cases, but I’m skeptical about whether it would be worth their while to render every page they crawl with JavaScript given the expense of doing so and the fact that a large percentage of pages do not change substantially when you do.

My best guess is that they're using a combination of trying to figure out the need for JavaScript execution on a given page, coupled with trust/authority metrics to decide whether (and with what priority) to render a page with JS.

Run a test, get publicity

I have a hypothesis that I would love to see someone test: That it’s possible to get a page indexed and ranking for a nonsense word contained in the served HTML, but not initially ranking for a different nonsense word added via JavaScript; then, to see the JS get indexed some period of time later and rank for both nonsense words. If you want to run that test, let me know the results — I’d be happy to publicize them.


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

শুক্রবার, ২৬ মে, ২০১৭

Should SEOs Care About Internal Links? - Whiteboard Friday

Should SEOs Care About Internal Links? - Whiteboard Friday http://ift.tt/2r3W0DH

Posted by randfish

Internal links are one of those essential SEO items you have to get right to avoid getting them really wrong. Rand shares 18 tips to help inform your strategy, going into detail about their attributes, internal vs. external links, ideal link structures, and much, much more in this edition of Whiteboard Friday.

Should SEOs Care About Internl Links?

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're going to chat a little bit about internal links and internal link structures. Now, it is not the most exciting thing in the SEO world, but it's something that you have to get right and getting it wrong can actually cause lots of problems.

Attributes of internal links

So let's start by talking about some of the things that are true about internal links. Internal links, when I say that phrase, what I mean is a link that exists on a website, let's say ABC.com here, that is linking to a page on the same website, so over here, linking to another page on ABC.com. We'll do /A and /B. This is actually my shipping routes page. So you can see I'm linking from A to B with the anchor text "shipping routes."

The idea of an internal link is really initially to drive visitors from one place to another, to show them where they need to go to navigate from one spot on your site to another spot. They're different from internal links only in that, in the HTML code, you're pointing to the same fundamental root domain. In the initial early versions of the internet, that didn't matter all that much, but for SEO, it matters quite a bit because external links are treated very differently from internal links. That is not to say, however, that internal links have no power or no ability to change rankings, to change crawling patterns and to change how a search engine views your site. That's what we need to chat about.



1. Anchor text is something that can be considered. The search engines have generally minimized its importance, but it's certainly something that's in there for internal links.

2. The location on the page actually matters quite a bit, just as it does with external links. Internal links, it's almost more so in that navigation and footers specifically have attributes around internal links that can be problematic.

Those are essentially when Google in particular sees manipulation in the internal link structure, specifically things like you've stuffed anchor text into all of the internal links trying to get this shipping routes page ranking by putting a little link down here in the footer of every single page and then pointing over here trying to game and manipulate us, they hate that. In fact, there is an algorithmic penalty for that kind of stuff, and we can see it very directly.



We've actually run tests where we've observed that jamming this type of anchor text-rich links into footers or into navigation and then removing it gets a site indexed, well let's not say indexed, let's say ranking well and then ranking poorly when you do it. Google reverses that penalty pretty quickly too, which is nice. So if you are not ranking well and you're like, "Oh no, Rand, I've been doing a lot of that," maybe take it away. Your rankings might come right back. That's great.



3. The link target matters obviously from one place to another.

4. The importance of the linking page, this is actually a big one with internal links. So it is generally the case that if a page on your website has lots of external links pointing to it, it gains authority and it has more ability to sort of generate a little bit, not nearly as much as external links, but a little bit of ranking power and influence by linking to other pages. So if you have very well-linked two pages on your site, you should make sure to link out from those to pages on your site that a) need it and b) are actually useful for your users. That's another signal we'll talk about.



5. The relevance of the link, so pointing to my shipping routes page from a page about other types of shipping information, totally great. Pointing to it from my dog food page, well, it doesn't make great sense. Unless I'm talking about shipping routes of dog food specifically, it seems like it's lacking some of that context, and search engines can pick up on that as well.

6. The first link on the page. So this matters mostly in terms of the anchor text, just as it does for external links. Basically, if you are linking in a bunch of different places to this page from this one, Google will usually, at least in all of our experiments so far, count the first anchor text only. So if I have six different links to this and the first link says "Click here," "Click here" is the anchor text that Google is going to apply, not "Click here" and "shipping routes" and "shipping." Those subsequent links won't matter as much.

7. Then the type of link matters too. Obviously, I would recommend that you keep it in the HTML link format rather than trying to do something fancy with JavaScript. Even though Google can technically follow those, it looks to us like they're not treated with quite the same authority and ranking influence. Text is slightly, slightly better than images in our testing, although that testing is a few years old at this point. So maybe image links are treated exactly the same. Either way, do make sure you have that. If you're doing image links, by the way, remember that the alt attribute of that image is what becomes the anchor text of that link.

Internal versus external links

A. External links usually give more authority and ranking ability.

That shouldn't be surprising. An external link is like a vote from an independent, hopefully independent, hopefully editorially given website to your website saying, "This is a good place for you to go for this type of information." On your own site, it's like a vote for yourself, so engines don't treat it the same.

B. Anchor text of internal links generally have less influence.

So, as we mentioned, me pointing to my page with the phrase that I want to rank for isn't necessarily a bad thing, but I shouldn't do it in a manipulative way. I shouldn't do it in a way that's going to look spammy or sketchy to visitors, because if visitors stop clicking around my site or engaging with it or they bounce more, I will definitely lose ranking influence much faster than if I simply make those links credible and usable and useful to visitors. Besides, the anchor text of internal links is not as powerful anyway.



C. A lack of internal links can seriously hamper a page's ability to get crawled + ranked.

It is, however, the case that a lack of internal links, like an orphan page that doesn't have many internal or any internal links from the rest of its website, that can really hamper a page's ability to rank. Sometimes it will happen. External links will point to a page. You'll see that page in your analytics or in a report about your links from Moz or Ahrefs or Majestic, and then you go, "Oh my gosh, I'm not linking to that page at all from anywhere else on my site." That's a bad idea. Don't do that. That is definitely problematic.

D. It's still the case, by the way, that, broadly speaking, pages with more links on them will send less link value per link.

So, essentially, you remember the original PageRank formula from Google. It said basically like, "Oh, well, if there are five links, send one-fifth of the PageRank power to each of those, and if there are four links, send one-fourth." Obviously, one-fourth is bigger than one-fifth. So taking away that fifth link could mean that each of the four pages that you've linked to get a little bit more ranking authority and influence in the original PageRank algorithm.

Look, PageRank is old, very, very old at this point, but at least the theories behind it are not completely gone. So it is the case that if you have a page with tons and tons of links on it, that tends to send out less authority and influence than a page with few links on it, which is why it can definitely pay to do some spring cleaning on your website and clear out any rubbish pages or rubbish links, ones that visitors don't want, that search engines don't want, that you don't care about. Clearing that up can actually have a positive influence. We've seen that on a number of websites where they've cleaned up their information architecture, whittled down their links to just the stuff that matters the most and the pages that matter the most, and then seen increased rankings across the board from all sorts of signals, positive signals, user engagement signals, link signals, context signals that help the engine them rank better.

E. Internal link flow (aka PR sculpting) is rarely effective, and usually has only mild effects... BUT a little of the right internal linking can go a long way.

Then finally, I do want to point out that what was previous called — you probably have heard of it in the SEO world — PageRank sculpting. This was a practice that I'd say from maybe 2003, 2002 to about 2008, 2009, had this life where there would be panel discussions about PageRank sculpting and all these examples of how to do it and software that would crawl your site and show you the ideal PageRank sculpting system to use and which pages to link to and not.



When PageRank was the dominant algorithm inside of Google's ranking system, yeah, it was the case that PageRank sculpting could have some real effect. These days, that is dramatically reduced. It's not entirely gone because of some of these other principles that we've talked about, just having lots of links on a page for no particularly good reason is generally bad and can have harmful effects and having few carefully chosen ones has good effects. But most of the time, internal linking, optimizing internal linking beyond a certain point is not very valuable, not a great value add.

But a little of what I'm calling the right internal linking, that's what we're going to talk about, can go a long way. For example, if you have those orphan pages or pages that are clearly the next step in a process or that users want and they cannot find them or engines can't find them through the link structure, it's bad. Fixing that can have a positive impact.


Ideal internal link structures

So ideally, in an internal linking structure system, you want something kind of like this. This is a very rough illustration here. But the homepage, which has maybe 100 links on it to internal pages. One hop away from that, you've got your 100 different pages of whatever it is, subcategories or category pages, places that can get folks deeper into your website. Then from there, each of those have maybe a maximum of 100 unique links, and they get you 2 hops away from a homepage, which takes you to 10,000 pages who do the same thing.



I. No page should be more than 3 link "hops" away from another (on most small-->medium sites).

Now, the idea behind this is that basically in one, two, three hops, three links away from the homepage and three links away from any page on the site, I can get to up to a million pages. So when you talk about, "How many clicks do I have to get? How far away is this in terms of link distance from any other page on the site?" a great internal linking structure should be able to get you there in three or fewer link hops. If it's a lot more, you might have an internal linking structure that's really creating sort of these long pathways of forcing you to click before you can ever reach something, and that is not ideal, which is why it can make very good sense to build smart categories and subcategories to help people get in there.

I'll give you the most basic example in the world, a traditional blog. In order to reach any post that was published two years ago, I've got to click Next, Next, Next, Next, Next, Next through all this pagination until I finally get there. Or if I've done a really good job with my categories and my subcategories, I can click on the category of that blog post and I can find it very quickly in a list of the last 50 blog posts in that particular category, great, or by author or by tag, however you're doing your navigation.



II. Pages should contain links that visitors will find relevant and useful.

If no one ever clicks on a link, that is a bad signal for your site, and it is a bad signal for Google as well. I don't just mean no one ever. Very, very few people ever and many of them who do click it click the back button because it wasn't what they wanted. That's also a bad sign.

III. Just as no two pages should be targeting the same keyword or searcher intent, likewise no two links should be using the same anchor text to point to different pages. Canonicalize!

For example, if over here I had a shipping routes link that pointed to this page and then another shipping routes link, same anchor text pointing to a separate page, page C, why am I doing that? Why am I creating competition between my own two pages? Why am I having two things that serve the same function or at least to visitors would appear to serve the same function and search engines too? I should canonicalize those. Canonicalize those links, canonicalize those pages. If a page is serving the same intent and keywords, keep it together.

IV. Limit use of the rel="nofollow" to UGC or specific untrusted external links. It won't help your internal link flow efforts for SEO.

Rel="nofollow" was sort of the classic way that people had been doing PageRank sculpting that we talked about earlier here. I would strongly recommend against using it for that purpose. Google said that they've put in some preventative measures so that rel="nofollow" links sort of do this leaking PageRank thing, as they call it. I wouldn't stress too much about that, but I certainly wouldn't use rel="nofollow."

What I would do is if I'm trying to do internal link sculpting, I would just do careful curation of the links and pages that I've got. That is the best way to help your internal link flow. That's things like...



V. Removing low-value content, low-engagement content and creating internal links that people actually do want. That is going to give you the best results.

VI. Don't orphan! Make sure pages that matter have links to (and from) them. Last, but not least, there should never be an orphan. There should never be a page with no links to it, and certainly there should never be a page that is well linked to that isn't linking back out to portions of your site that are of interest or value to visitors and to Google.

So following these practices, I think you can do some awesome internal link analysis, internal link optimization and help your SEO efforts and the value visitors get from your site. We'll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!

Should SEOs Care About Internal Links? - Whiteboard Friday

Should SEOs Care About Internal Links? - Whiteboard Friday http://ift.tt/2r3W0DH

Posted by randfish

Internal links are one of those essential SEO items you have to get right to avoid getting them really wrong. Rand shares 18 tips to help inform your strategy, going into detail about their attributes, internal vs. external links, ideal link structures, and much, much more in this edition of Whiteboard Friday.

Should SEOs Care About Internl Links?

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're going to chat a little bit about internal links and internal link structures. Now, it is not the most exciting thing in the SEO world, but it's something that you have to get right and getting it wrong can actually cause lots of problems.

Attributes of internal links

So let's start by talking about some of the things that are true about internal links. Internal links, when I say that phrase, what I mean is a link that exists on a website, let's say ABC.com here, that is linking to a page on the same website, so over here, linking to another page on ABC.com. We'll do /A and /B. This is actually my shipping routes page. So you can see I'm linking from A to B with the anchor text "shipping routes."

The idea of an internal link is really initially to drive visitors from one place to another, to show them where they need to go to navigate from one spot on your site to another spot. They're different from internal links only in that, in the HTML code, you're pointing to the same fundamental root domain. In the initial early versions of the internet, that didn't matter all that much, but for SEO, it matters quite a bit because external links are treated very differently from internal links. That is not to say, however, that internal links have no power or no ability to change rankings, to change crawling patterns and to change how a search engine views your site. That's what we need to chat about.



1. Anchor text is something that can be considered. The search engines have generally minimized its importance, but it's certainly something that's in there for internal links.

2. The location on the page actually matters quite a bit, just as it does with external links. Internal links, it's almost more so in that navigation and footers specifically have attributes around internal links that can be problematic.

Those are essentially when Google in particular sees manipulation in the internal link structure, specifically things like you've stuffed anchor text into all of the internal links trying to get this shipping routes page ranking by putting a little link down here in the footer of every single page and then pointing over here trying to game and manipulate us, they hate that. In fact, there is an algorithmic penalty for that kind of stuff, and we can see it very directly.



We've actually run tests where we've observed that jamming this type of anchor text-rich links into footers or into navigation and then removing it gets a site indexed, well let's not say indexed, let's say ranking well and then ranking poorly when you do it. Google reverses that penalty pretty quickly too, which is nice. So if you are not ranking well and you're like, "Oh no, Rand, I've been doing a lot of that," maybe take it away. Your rankings might come right back. That's great.



3. The link target matters obviously from one place to another.

4. The importance of the linking page, this is actually a big one with internal links. So it is generally the case that if a page on your website has lots of external links pointing to it, it gains authority and it has more ability to sort of generate a little bit, not nearly as much as external links, but a little bit of ranking power and influence by linking to other pages. So if you have very well-linked two pages on your site, you should make sure to link out from those to pages on your site that a) need it and b) are actually useful for your users. That's another signal we'll talk about.



5. The relevance of the link, so pointing to my shipping routes page from a page about other types of shipping information, totally great. Pointing to it from my dog food page, well, it doesn't make great sense. Unless I'm talking about shipping routes of dog food specifically, it seems like it's lacking some of that context, and search engines can pick up on that as well.

6. The first link on the page. So this matters mostly in terms of the anchor text, just as it does for external links. Basically, if you are linking in a bunch of different places to this page from this one, Google will usually, at least in all of our experiments so far, count the first anchor text only. So if I have six different links to this and the first link says "Click here," "Click here" is the anchor text that Google is going to apply, not "Click here" and "shipping routes" and "shipping." Those subsequent links won't matter as much.

7. Then the type of link matters too. Obviously, I would recommend that you keep it in the HTML link format rather than trying to do something fancy with JavaScript. Even though Google can technically follow those, it looks to us like they're not treated with quite the same authority and ranking influence. Text is slightly, slightly better than images in our testing, although that testing is a few years old at this point. So maybe image links are treated exactly the same. Either way, do make sure you have that. If you're doing image links, by the way, remember that the alt attribute of that image is what becomes the anchor text of that link.

Internal versus external links

A. External links usually give more authority and ranking ability.

That shouldn't be surprising. An external link is like a vote from an independent, hopefully independent, hopefully editorially given website to your website saying, "This is a good place for you to go for this type of information." On your own site, it's like a vote for yourself, so engines don't treat it the same.

B. Anchor text of internal links generally have less influence.

So, as we mentioned, me pointing to my page with the phrase that I want to rank for isn't necessarily a bad thing, but I shouldn't do it in a manipulative way. I shouldn't do it in a way that's going to look spammy or sketchy to visitors, because if visitors stop clicking around my site or engaging with it or they bounce more, I will definitely lose ranking influence much faster than if I simply make those links credible and usable and useful to visitors. Besides, the anchor text of internal links is not as powerful anyway.



C. A lack of internal links can seriously hamper a page's ability to get crawled + ranked.

It is, however, the case that a lack of internal links, like an orphan page that doesn't have many internal or any internal links from the rest of its website, that can really hamper a page's ability to rank. Sometimes it will happen. External links will point to a page. You'll see that page in your analytics or in a report about your links from Moz or Ahrefs or Majestic, and then you go, "Oh my gosh, I'm not linking to that page at all from anywhere else on my site." That's a bad idea. Don't do that. That is definitely problematic.

D. It's still the case, by the way, that, broadly speaking, pages with more links on them will send less link value per link.

So, essentially, you remember the original PageRank formula from Google. It said basically like, "Oh, well, if there are five links, send one-fifth of the PageRank power to each of those, and if there are four links, send one-fourth." Obviously, one-fourth is bigger than one-fifth. So taking away that fifth link could mean that each of the four pages that you've linked to get a little bit more ranking authority and influence in the original PageRank algorithm.

Look, PageRank is old, very, very old at this point, but at least the theories behind it are not completely gone. So it is the case that if you have a page with tons and tons of links on it, that tends to send out less authority and influence than a page with few links on it, which is why it can definitely pay to do some spring cleaning on your website and clear out any rubbish pages or rubbish links, ones that visitors don't want, that search engines don't want, that you don't care about. Clearing that up can actually have a positive influence. We've seen that on a number of websites where they've cleaned up their information architecture, whittled down their links to just the stuff that matters the most and the pages that matter the most, and then seen increased rankings across the board from all sorts of signals, positive signals, user engagement signals, link signals, context signals that help the engine them rank better.

E. Internal link flow (aka PR sculpting) is rarely effective, and usually has only mild effects... BUT a little of the right internal linking can go a long way.

Then finally, I do want to point out that what was previous called — you probably have heard of it in the SEO world — PageRank sculpting. This was a practice that I'd say from maybe 2003, 2002 to about 2008, 2009, had this life where there would be panel discussions about PageRank sculpting and all these examples of how to do it and software that would crawl your site and show you the ideal PageRank sculpting system to use and which pages to link to and not.



When PageRank was the dominant algorithm inside of Google's ranking system, yeah, it was the case that PageRank sculpting could have some real effect. These days, that is dramatically reduced. It's not entirely gone because of some of these other principles that we've talked about, just having lots of links on a page for no particularly good reason is generally bad and can have harmful effects and having few carefully chosen ones has good effects. But most of the time, internal linking, optimizing internal linking beyond a certain point is not very valuable, not a great value add.

But a little of what I'm calling the right internal linking, that's what we're going to talk about, can go a long way. For example, if you have those orphan pages or pages that are clearly the next step in a process or that users want and they cannot find them or engines can't find them through the link structure, it's bad. Fixing that can have a positive impact.


Ideal internal link structures

So ideally, in an internal linking structure system, you want something kind of like this. This is a very rough illustration here. But the homepage, which has maybe 100 links on it to internal pages. One hop away from that, you've got your 100 different pages of whatever it is, subcategories or category pages, places that can get folks deeper into your website. Then from there, each of those have maybe a maximum of 100 unique links, and they get you 2 hops away from a homepage, which takes you to 10,000 pages who do the same thing.



I. No page should be more than 3 link "hops" away from another (on most small-->medium sites).

Now, the idea behind this is that basically in one, two, three hops, three links away from the homepage and three links away from any page on the site, I can get to up to a million pages. So when you talk about, "How many clicks do I have to get? How far away is this in terms of link distance from any other page on the site?" a great internal linking structure should be able to get you there in three or fewer link hops. If it's a lot more, you might have an internal linking structure that's really creating sort of these long pathways of forcing you to click before you can ever reach something, and that is not ideal, which is why it can make very good sense to build smart categories and subcategories to help people get in there.

I'll give you the most basic example in the world, a traditional blog. In order to reach any post that was published two years ago, I've got to click Next, Next, Next, Next, Next, Next through all this pagination until I finally get there. Or if I've done a really good job with my categories and my subcategories, I can click on the category of that blog post and I can find it very quickly in a list of the last 50 blog posts in that particular category, great, or by author or by tag, however you're doing your navigation.



II. Pages should contain links that visitors will find relevant and useful.

If no one ever clicks on a link, that is a bad signal for your site, and it is a bad signal for Google as well. I don't just mean no one ever. Very, very few people ever and many of them who do click it click the back button because it wasn't what they wanted. That's also a bad sign.

III. Just as no two pages should be targeting the same keyword or searcher intent, likewise no two links should be using the same anchor text to point to different pages. Canonicalize!

For example, if over here I had a shipping routes link that pointed to this page and then another shipping routes link, same anchor text pointing to a separate page, page C, why am I doing that? Why am I creating competition between my own two pages? Why am I having two things that serve the same function or at least to visitors would appear to serve the same function and search engines too? I should canonicalize those. Canonicalize those links, canonicalize those pages. If a page is serving the same intent and keywords, keep it together.

IV. Limit use of the rel="nofollow" to UGC or specific untrusted external links. It won't help your internal link flow efforts for SEO.

Rel="nofollow" was sort of the classic way that people had been doing PageRank sculpting that we talked about earlier here. I would strongly recommend against using it for that purpose. Google said that they've put in some preventative measures so that rel="nofollow" links sort of do this leaking PageRank thing, as they call it. I wouldn't stress too much about that, but I certainly wouldn't use rel="nofollow."

What I would do is if I'm trying to do internal link sculpting, I would just do careful curation of the links and pages that I've got. That is the best way to help your internal link flow. That's things like...



V. Removing low-value content, low-engagement content and creating internal links that people actually do want. That is going to give you the best results.

VI. Don't orphan! Make sure pages that matter have links to (and from) them. Last, but not least, there should never be an orphan. There should never be a page with no links to it, and certainly there should never be a page that is well linked to that isn't linking back out to portions of your site that are of interest or value to visitors and to Google.

So following these practices, I think you can do some awesome internal link analysis, internal link optimization and help your SEO efforts and the value visitors get from your site. We'll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com


Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!