Saturday, September 22, 2007

Very funny Matt Cutts gag from my friend Dave at www.huomah.com

If the domain name is funny, then you know you're in for an f'ing treat! Check out the "Matt Cutts Link Spam Assassin" software over on Dave's blog:

Spam Assassin by Matt Cutts

Very funny stuff! Don't forget to check out his SEO Rants home page for some really good stuff. The current rant is entitled "Your SEO Sucks", and it's a don't-miss.

Google AdWords/AdSense becoming overloaded?

Many websites I've visited lately have been taking an overly long time to load. The page content shows up quickly, but the indicator icon shows that the page is still attempting to load. Looking at the status bar at the bottom of the browser, I notice that the component still trying to load is pagead2.googlesyndication.com. As far as I know, this is the host that serves AdSense ads.

Often, the pagead2.googlesyndication.com component of the page never finishes loading, and I can wait for minutes for it to complete. It's as if the connection between my browser and pagead2.googlesyndication.com is locked, and whatever request is being made between the two can never complete. I know that the page never fully loads (even though I can see the content) because the PageRank value on the toolbar doesn't update from the value shown for the previous page I had visited.

The fix, as a user, is simple enough - I hit F5 and refresh the page (sometimes two or three times), but is it fair to suck up that website's bandwidth with multiple refreshes because a Google process is either bogged down or broken?

I've only noticed this in the last 2 or 3 months, but over the last several weeks, it has gotten a lot worse. What's puzzling is that some of the pages loading the pagead2.googlesyndication.com component don't have AdSense on them at all, at least that I can see. In Google Analytics, here's what the big G has to say about it:

"Referrals from pagead2.googlesyndication.com are clicks on your AdWords ads showing on the content network - specifically, ads showing on publisher sites in the AdSense program."

So what gives - if the page has no obvious AdSense, is Google still tracking AdWords data even though I hit the page through a natural link? That doesn't seem right.

Friday, September 21, 2007

I'm predicting a large Google algorithm change this weekend or next

I created a little tool for myself that helps me track the keywords I'm optimizing for, and the websites associated with those optimization campaigns. It's an offline tool, and I have to enter data into it manually. It works well for gauging the progress of some of my optimization campaigns. I wrote it myself using Visual Basic 6, and it has some nifty features.

One thing I use it for is to track keywords that I no longer perform optimization for. The SERPs for these keywords are fairly stable, moving up and down through the various Google dance steps. In general, they trend down, simply because I no longer focus on them. I can generally tell when Google is testing an algo tweak because these keywords bounce around in the SERPs, then settle back to roughly their previous positions.

I created a formula based on these keywords (there are about 100 of them) that shows the variance in SERPs over time. The formula produces a number that gives me a basic idea of the current volatility of the Google search results. The number can be anywhere between -100 and 100, with zero meaning absolutely no change from the previous measurement. I normally add the updated SERP data manually once per week, but the formula works equally well on a daily, hourly, monthly, or yearly basis. The shorter the time period, however, the more pronounced any small changes become.

Normally, the variance figure is between 5 and 10 in either direction (that is, 5 to 10 or -5 to -10). (*Note - I'm not a math whiz, so if my terminology is incorrect, you know where you can take yourself.) During some of the obvious algo tweaks, the variance goes as high as 15 or 20, but usually drops after one week. Since the formula uses about 100 keywords, a severe change for one keyword doesn't alter the end result as severely as it would if I only used 5 or 6 keywords. When the variance hits 15 or 20, that means I've had some fairly significant change in SERPs for my keywords. The change may be up or down; the direction of the SERP change isn't as important to me as the actual amount of increase or decrease.
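For readers who want to experiment with a similar number, here is a minimal sketch in Python rather than VB6. The actual formula isn't spelled out in this post, so everything in it is an assumption for illustration: the function name `serp_volatility`, the `depth` parameter (how many positions are tracked per keyword), and the choice to report the average absolute rank change as a percentage of `depth`, signed by the net direction of movement. It does respect the properties described above: the result stays between -100 and 100, zero means no change, and one wild keyword among 100 barely moves the result.

```python
from statistics import mean


def serp_volatility(previous, current, depth=100):
    """Hypothetical volatility index in the range -100..100.

    previous, current: dicts mapping keyword -> rank (1 = top result).
    depth: assumed number of positions tracked per keyword.
    """
    shared = previous.keys() & current.keys()
    if not shared:
        raise ValueError("no keywords in common between the two snapshots")

    # Positive delta = keyword moved up (a smaller rank number is better).
    # Each delta is clipped so a single keyword cannot exceed +/- depth.
    deltas = [max(-depth, min(depth, previous[k] - current[k])) for k in shared]

    # Magnitude: average absolute movement as a percentage of depth.
    magnitude = 100 * mean([abs(d) for d in deltas]) / depth

    # Sign: net direction of all movement (negative = mostly downward).
    return -magnitude if sum(deltas) < 0 else magnitude


if __name__ == "__main__":
    last_week = {"blue widgets": 4, "red widgets": 12, "widget repair": 30}
    this_week = {"blue widgets": 9, "red widgets": 10, "widget repair": 41}
    print(serp_volatility(last_week, this_week))
```

Because the magnitude is an average over all tracked keywords, sampling more often or tracking fewer keywords will make the number jumpier, which matches the caveats above.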

Two weeks ago, I started entering SERP values as often as possible (often once per day). I've been paying close attention to the variance, looking for signs that Google is doing algo testing. Twice in the last two weeks, I've seen major SERP changes that showed variances of up to 35. Even allowing for the extra movement that comes from measuring nearly daily, that shows at least a minor algo change has taken place. My opinion is that those were instances of Google nerds testing algo changes in preparation for a larger algorithm change.

I may just be reading the tea leaves incorrectly, but I'm expecting the algorithm changes that have been tested over the last month or so to be implemented either this weekend or next. If it doesn't happen this weekend, and next week is fairly quiet regarding SERP movements, watch out - we may be in for a major algorithm change.

(For those who are going to ask the obvious question: no, I don't see a toolbar PR update happening for quite a while still.)

Thursday, September 20, 2007

Spam and virus sites infesting the Google SERPs in several categories - has the mighty Google been hacked?

It appears that a spammer has found out how to infiltrate the Google index without being caught. Here's what is happening in a nutshell:

  • Some searches (very specific phrases, and I won't list any of them right now - Google knows which they are) return results with a large number of .cn (Chinese) sites.
  • The .cn sites are often scraped content from legitimate U.S. websites.
  • The legitimate sites are being ranked below the scam .cn sites for these competitive keywords.
  • When a user clicks on one of the .cn sites returned in the result set, the user is redirected to an entirely different page which attempts to install one or more pieces of malware on the user's computer. If the user is not protected, they become infected - I don't know the specifics of the infection as I AM well protected.
  • The .cn sites don't appear to be hosted ANYWHERE. They are simply redirected domain names. How they got ranked in Google in such a short period of time for fairly competitive keywords is a mystery. Google's index even shows legitimate content for the .cn sites.
  • It appears that the faked sites are redirecting the Googlebot to a location where content can be indexed, while at the same time recognizing normal users and redirecting them to a site that includes the malware mentioned earlier. This is an obvious violation of Google's guidelines, but the spammers have found ways to circumvent the rule and hide it from the Googlebot.
  • These sites number in the millions for many different keywords and phrases, and appear to be generated automatically. Because of privacy laws, it's hard to track down who owns the domain names - Google has the power to do so, but there has been exactly zero information from Google about the problem so far, and even many SEO experts and webmasters are not picking up on it.
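The Googlebot-versus-visitor behavior described in the list above is usually called cloaking. A rough way to test for the simplest form of it is to request the same URL twice, once with Googlebot's user-agent string and once with a browser's, then compare where each request ends up and what comes back. The sketch below is an illustration under assumptions, not a description of how these particular sites work: the helper names, the browser user-agent string, and the 0.5 similarity threshold are all made up for the example, and a site that cloaks by IP address rather than by user-agent will not be caught. Given the malware involved, only run the fetch step from a machine you don't mind sacrificing.

```python
import difflib
import urllib.request

# Googlebot's published user-agent string, and an arbitrary browser-like one.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch(url, user_agent, timeout=10):
    """Return (final_url, body) for url, following any HTTP redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body


def similarity(a, b):
    """Similarity ratio between two strings: 0.0 (nothing shared) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def looks_cloaked(bot_view, user_view, threshold=0.5):
    """Flag a page if the two views end at different URLs or differ heavily.

    bot_view, user_view: (final_url, body) tuples as returned by fetch().
    threshold: assumed cut-off; pages less similar than this are flagged.
    """
    bot_url, bot_body = bot_view
    user_url, user_body = user_view
    return bot_url != user_url or similarity(bot_body, user_body) < threshold


def check(url):
    """Fetch url as Googlebot and as a browser, then compare the two views."""
    return looks_cloaked(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA))
```

This only looks at HTTP-level redirects and raw HTML. A redirect done in JavaScript after the page loads would need a real browser to detect.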

What Does This Actually Mean?
So what does all this mean? First, don't click on a .cn domain name returned from Google.com. If you need to search for a Chinese site, use Google.cn instead of Google.com. Second, watch your own SERPs and see if you are suddenly dropping below sites with a .cn TLD. If you find that happening, report it here. Third, don't panic - Google is remaining mum on this for a number of reasons. Were the public to stop trusting Google, it could cause major upheavals in the search engine business - if the problem were just spam, the public wouldn't even notice. However, since malware is involved, this is something that could hit the major media with a giant bang and cause a panic. That could affect traffic to some sites in a major way - especially those specifically optimized for the Google search engine.
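If you already keep a list of result URLs for your keywords, the second check is easy to automate. Here is a small sketch, assuming you have the results for one search as an ordered list of full URLs (best result first). The function name `cn_results_above` and the sample domains are hypothetical.

```python
from urllib.parse import urlparse


def cn_results_above(results, my_domain):
    """Return the .cn result URLs that rank above my_domain.

    results: result URLs in ranked order, each including its scheme (http://...).
    my_domain: your own domain, e.g. "example.com"; subdomains also match.
    If my_domain never appears, every .cn result in the list is returned.
    """
    my_domain = my_domain.lower()
    above = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            break  # reached our own listing; everything collected so far outranks us
        if host.endswith(".cn"):
            above.append(url)
    return above


if __name__ == "__main__":
    serp = [
        "http://widgets.example.cn/page",
        "http://www.example.com/widgets",
        "http://another.example.cn/",
    ]
    for url in cn_results_above(serp, "example.com"):
        print("Outranked by:", url)
```

A non-empty result for a keyword where you used to rank well is the pattern worth reporting.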

A Major Infrastructure Problem?
If a smart spammer has really found a way to game the Google search results with spoofed or cloaked sites, and Google still doesn't have a fix, this could be a major issue with the underlying infrastructure of the entire Google operation. I've seen hints that a significant infrastructure change is taking place; is this spam issue the reason? Could that mean that Google was actually hacked, rather than someone simply spamming the index? If so, webmasters may be waiting a long time for the expected PageRank update while Google fixes the leaks.

Time to Worry?
This is the first time that I've ever been worried that Google's own index has been hacked. The obvious and blatant circumvention of a guideline that the Googlebot normally enforces quickly is worrisome. A normal website pulling this would be banned almost instantly. The fact that none of the sites have real content and don't appear to even be hosted anywhere is even scarier. How did millions of sites get indexed if they don't exist?

Some Guesses
The fact that the SERPs have been so volatile lately shows that the Google algorithm is being updated and tested - often. That, coupled with the fact that Google's normal quarterly Toolbar PageRank update didn't occur at the beginning of August, suggests that Google is making some major changes. It's not a giant leap of logic to assume that Google may be trying to figure out a way to stop the spamming of its index, and is looking for some sort of heuristic formula to identify the sites without hurting legitimate U.S. and European websites. The length of time it's taking is scary, but I'd rather they fix the issue than put a Band-Aid on the problem (Microsoft, are you paying attention?) and hope it will go away.

If anyone has any other observations on this problem, post them here.

Wednesday, September 19, 2007

Google becomes a "space case" with announcement of $30 million moon landing prize

Google is offering $30 million to the first team that can land an unmanned spacecraft on the moon. The craft must be able to traverse the moon's surface and broadcast video back to Earth, plus seek out relics left by the Apollo program.

You know what this is? A huge link-bait scheme by Google, paid for with a $30 million prize (a drop in the bucket compared to the total Google market cap and cash on hand). It's nice that Google is doing something to push the scientific boundaries in the U.S., especially considering how poor our next generation of scientists, mathematicians and researchers are going to be. But I see right through it, and it comes down to Google using this as a marketing strategy, just like online link-bait pages are used to draw attention to a product, service, or business.

In the past, Google has spoken out about the overt use of non-natural link building techniques. And while Matt Cutts seems to promote link-bait schemes, I can see Google penalizing link-baiting in the future on the grounds that it is "tainting the search results".

Link-baiting is nothing more than smart marketing, and is VERY similar to the $30 million Google is throwing at this moon landing project.