January 22, 2005

rel="nofollow"

By now, you’ve surely heard about the announcement. Google (and other leading search engines) will now respect a new attribute to hyperlinks. There has been a deluge of comment, both pro and con. Phil Ringnalda has a nice list of links in his piece. I’m too lazy to add more links, but I bet you know how you could find some.

I also don’t think I have anything clever or original to add to the discussion, but I want to explain what we’re doing here at Musings and why.

Previously, you could tell a search engine spider that the links on an entire page are not to be followed by adding a

<meta name="robots" content="nofollow" />

to the <head> of the page. Now you have more fine-grained control. You can write

<a href="..." rel="nofollow">

on the level of individual hyperlinks.
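
To make the two levels of control concrete, here is a minimal Python sketch of how a well-behaved spider might decide which links on a page to follow. It is an illustration only: the class and function names are made up for this example, it relies on nothing but the standard-library HTML parser, and it makes no claim about how the Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect the hrefs a polite spider would follow (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="nofollow">
        self.links = []             # (href, tagged_nofollow) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content may hold several comma-separated directives
            tokens = [t.strip().lower() for t in (attrs.get("content") or "").split(",")]
            if "nofollow" in tokens:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            # rel is a space-separated list of tokens
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))

    def followable(self):
        # The page-level switch wins; otherwise drop individually tagged links.
        if self.page_nofollow:
            return []
        return [href for href, tagged in self.links if not tagged]


def followable_links(html):
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.followable()
```

Feeding it a page with one plain link and one link tagged rel="nofollow" returns only the plain one; adding the meta element to the head returns nothing at all.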

To understand what this is all about, let’s step back a moment. When you do a search for “X,” the search engine attempts to return to you the most relevant hits about “X.” To determine what’s relevant, Google harnesses the collective wisdom of web authors: if lots of them link to something, it must be important. The flaw is in the notion of “web author.” With the advent of guestbooks, wikis and, most importantly, blogs, the distinction between “surfers” and “authors” got blurred. Suddenly, visitors could add their own links to your site. Inevitably, some of those visitors turn out to be robots hawking Viagra and porn sites.
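
For the curious, the link-counting idea can be sketched in a few lines of Python. This is a textbook power iteration, not whatever Google actually runs; the page names, the damping factor of 0.85 and the iteration count are all assumptions picked for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank over a dict {page: [pages it links to]} (illustrative only)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


blogs = ["blog1", "blog2", "blog3"]

# Only the blog authors choose the links.
authored = {b: ["widgets"] for b in blogs}
authored["pills"] = []  # the page exists, but no author links to it

# Visitors (or robots) have added a link to "pills" on every blog.
spammed = {b: ["widgets", "pills"] for b in blogs}

before = pagerank(authored)
after = pagerank(spammed)
```

In this toy web, the visitor-added links lift "pills" from the bottom of the heap to level with "widgets", and "widgets" loses rank in the process.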

So now we have a way to tell the Googlebot, “These links are important, but those links don’t count.” The theory is that weblog software vendors will programmatically add this attribute to links in comments and trackbacks. It’s not clear whether this will change the incentive structure for comment-spammers, or change it in the direction we want. And it has other problems.

  1. Nothing restricts rel="nofollow" to its intended use: links added by visitors to your site. Instead, it’s yet another tool in the hands of those interested in gaming the system and manipulating the PageRanks of their own sites versus those of their competitors. Self-important jerks are positively gleeful at the thought of diddling with the PageRanks of the sites they link to. SEO “experts” are, I’m sure, licking their chops, too.
  2. If used as intended, links in comment text will be automatically tagged with rel="nofollow". The whole point of allowing comments in the first place was that they are often insightful (sometimes more so than the original post), and hence the links therein are relevant to the topic at hand, and should be followed.

If I’ve gone to the trouble of eliminating robotic comment spammers, what’s left are my “real” commenters, and a trickle of easily-dealt-with manual spammers. The latter have taken the approach of trying to “blend in,” offering relevant-sounding comments, and saving their spammish URLs for the Commenter-URL link.

The Commenter-URL link was never really relevant to the topic at hand. At best, it’s relevant to the link-text (your name). But, in most circumstances, the resulting boost in PageRank leads only to bizarre effects. With or without the boost from the Commenter-URL link, no one should ever have trouble finding your blog. So, whereas I’m reluctant to add rel="nofollow" to the links in the body of your comment, I have little compunction about automatically adding a rel="nofollow" to your Commenter-URL link.

Trackback-URL links are a place where I am totally defenceless against spam. There, I really feel that the best available defence is to defang them with a rel="nofollow". If you really have linked to one of my posts, you will automatically get a link back on my sidebar, thanks to Technorati. Since the link appears both on my main page and on all my monthly archive pages, that’s much more Googlejuice than you “lost” when I added a rel="nofollow" to the Trackback-URL link on the individual archive page.

So, in summary, Commenter-URL and Trackback-URL links get rel="nofollow" added to them. The links in the body of your comments do not. I think that strikes the desired balance between quashing spam and respecting the value that commenters bring to these pages.
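
As a rough sketch of that policy (in Python, with hypothetical function names; this is not the code that actually runs this weblog), the only difference between the three kinds of link is whether the attribute gets added when the link is rendered:

```python
from html import escape


def link(url, text, nofollow):
    """Render one hyperlink, optionally tagged rel="nofollow"."""
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


def commenter_link(name, url):
    # Commenter-URL links: always tagged.
    return link(url, name, nofollow=True)


def trackback_link(title, url):
    # Trackback-URL links: always tagged.
    return link(url, title, nofollow=True)


def body_link(text, url):
    # Links in the body of a comment: left alone, so spiders may follow them.
    return link(url, text, nofollow=False)
```

For example, `commenter_link("Evan", "http://example.com/")` yields `<a href="http://example.com/" rel="nofollow">Evan</a>`, while `body_link` on the same URL yields the same anchor without the rel attribute.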

Update (2/5/2005):

I’ve tweaked this policy a bit. If you PGP-sign your comment, then it does not get rel="nofollow"ed.

Posted by distler at January 22, 2005 1:27 PM

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/498

10 Comments & 2 Trackbacks

Re: rel="nofollow"

I’m currently beta testing some code on my blog for dealing with trackback spam: An Automatic Solution to Referral / Track-Back Spam.

You can see the automatically moderated trackbacks here.

So far, so good :-)

Posted by: Richard@Home on January 24, 2005 6:11 AM

Trackback spam

I don’t see how looking at Referer strings distinguishes legitimate trackbacks from spam. Of the hundreds of trackbacks I have received, only a handful had a Referer string other than “-”:

“http://www.anomy.net/pingbuddy/”
“http://Dottext.com/Services/default.htm”

The legitimate and the illegitimate trackbacks did not differ in the slightest, with respect to their Referer strings.

In fact, looking at the page you linked to, I’m not sure we are even talking about the same thing. Do you mean referrals (links) to your page, or do you mean trackbacks?

Posted by: Jacques Distler on January 24, 2005 1:01 PM | PGP Sig

Gaming the system?

Nothing restricts rel="nofollow" to its intended use: links added by visitors to your site. Instead, it’s yet another tool in the hands of those interested in gaming the system and manipulating the PageRanks of their own sites versus those of their competitors. Self-important jerks are positively gleeful at the thought of diddling with the PageRanks of the sites they link to.

Jacques, could you clarify what you mean by this?

Last year, there was no way to link to some obnoxious site without giving them at least a tiny boost in PageRank. But this year, I can link to that site without giving them my implicit “approval”. Perhaps I am not imaginative or devious enough… but I’m not seeing how this constitutes gaming the system. At most, a company could choose to not give their competitor PageRank – PageRank that they didn’t want to grant in the first place.

The real harm would come if, say, the creator of a general-purpose CMS snuck in some code that automatically tagged links to competitors, or political sites that the author didn’t like, or something like that. But of course anyone who tried this would be laughed out of business. (In theory, this technique might appeal to certain repressive regimes… but repressive regimes have far nastier ways of keeping the lid on “dangerous ideas” than farting around with PageRank.)

Posted by: Evan on January 24, 2005 7:59 PM

Re: Gaming the system?

Jacques, could you clarify what you mean by this?

If, amid a discussion of subject “X”, you link to web page “Y”, then, in a search about “X”, “Y” is more likely to appear with a higher ranking. It will, it’s true, also appear more highly-ranked in other, unrelated searches. But that effect is less strong.

Google has made it its business to fine-tune the algorithm that underlies this. If, indeed, page “Y” is relevant to the subject “X”, then we (the surfing public), if not necessarily you, want “Y” to appear higher-ranked in our searches on that topic.

Say you are in the business of selling widgets. You have a web site. And, naturally, aside from advertising your particular product line, you’ve heard it’s a good idea to put up some information pertaining to the proper selection of widgets, the care and maintenance of widgets and so forth. Useful for your existing customers and (even more important) likely to attract new customers looking for information on “widgets.”

You are not alone. Your competitors also have web sites and they, too, are thinking about putting up some useful general information about widgets to enhance their sites.

You could, of course, go through with your plan of putting up useful content on the subject (thereby increasing the Web’s net store of useful information about widgets). Or you could just link to your competitor’s page of useful information (using frames, perhaps, to maintain your own “branding”).

Previously, you might not have wanted to do that. But now you can, without hurting your relative PageRank, by using rel=nofollow.

The net result is overall less useful information about widgets on the Web and lower PageRank (than otherwise would be the case) for those sites actually containing useful information.

I’m sure that the folks at Google will, eventually, adjust to compensate for the latter effect. Hopefully, this will mitigate the former effect.

Posted by: Jacques Distler on January 24, 2005 11:14 PM | PGP Sig

Re: Gaming the system?

Previously, if you were a certain well-known encyclopedia written as a wiki, you would generate a lot of PR just by having half a million pages, and that PR, along with any from external links, would be distributed around the site by internal links, and would also be distributed to any external sites you felt were important enough to link from your encyclopedia articles.

Now, as a byproduct of antispamming, every single bit of PR that each page has to distribute will be distributed only internally, to your own other pages, since all external links are nofollowed. Net result: you are more popular than before, the sites that you pretend to call important are less popular. You will tend to come out higher in search results, they will tend to come out lower.

Posted by: Phil Ringnalda on January 25, 2005 12:10 AM | PGP Sig

Wikipedia

Thank you, Phil, for bringing a real example to the table.

Boy, they (who ain’t getting a link from me today) worked quickly, didn’t they?

Now all they need to do is introduce the concept of “premium links” (for $5, we won’t rel=nofollow your page) and the cycle will be complete.

Posted by: Jacques Distler on January 25, 2005 12:53 AM | PGP Sig

Re: Wikipedia

Pretty soon we’re going to be charging for all comments, just because maintaining comments is becoming such a pain in the ass.

Thanks for the explanation, guys.

Posted by: Evan on January 25, 2005 10:01 AM

Re: rel="nofollow"

I don’t need nofollow, because I have this (must be) unknown invention called a spam filter. I make a point, however, of adding it to any links I make to nofollow users. This POS is just to help Google (since the popularity of blogging is forcing out commercial links).

Posted by: David Russell on May 20, 2005 3:58 AM

Re: rel="nofollow"

All these months later, is there any consensus on this? You realize that in the future Google could just start quietly following the “nofollow” links? There is no law or legal principle I know of (unlike with the implied ideas of trespass in the robots.txt instructions) that would require Google to obey nofollow.

Meanwhile, I have to say comment spam is a big problem, and it’s rewarding the worst kind of behavior on the web. If this disrupts the industry even for a little while, and forces a lot of people to get new jobs, maybe it will have a harder time putting itself back together again when nofollow is abandoned, and we’ll all have a precious few months of civility.

Posted by: Joel on August 18, 2005 11:14 PM

Read the post Full Disclosure
Weblog: Musings
Excerpt: A serious MovableType security vulnerability.
Tracked: January 5, 2007 6:13 PM

Re: rel="nofollow"

I think that the search engines need to explain their position on using rel=nofollow on internal links. I’ve noticed a strange trend of webmasters using rel=nofollow on their own web pages, on links to their own web pages, not just on outgoing links. Many seem to think that they can control the flow of PageRank on their site by doing this, even if it’s not a blog comment. Many people use rel=nofollow on links to pages on their site like “privacy policy” and “about us.”

This practice seems counter-productive. It might also ruin your site. Some webmasters believe that this attribute can be used for SEO, and some SEOs even advocate this policy. Thoughts?

Posted by: jay-jet on June 16, 2007 6:59 PM

Read the post Ist für Google NoFollow die Lösung für Alles?! (“Is NoFollow Google’s Solution for Everything?!”)
Weblog: martinwaiss.com
Excerpt: “My personal Twitter page has 1,700 links and 1,500+ followers, contains over 7,000 tweets, and has a PR of 5. I entered all of these links. I wrote all of these tweets. All of these people follow me as a person. I have my...
Tracked: September 15, 2008 7:55 AM
