#
# (C) Copy and Copyright 2007 Lumination Aps
User-agent: *
Disallow:
# [:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:]
#
# ----------------------------=*> Lumination <*=---------------------------
#
# [:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:]
#
# ===========================================================================
#
# Can Google Predict The Future?
#
# ===========================================================================
#
# Google posted an AdSense blog entry this week stating that most
# misclicks by AdSense vendors were automatically discounted.
# http://adsense.blogspot.com/2007/05/accidents-happen.html
#
# "...chances are we've already detected your clicks on your ads and
# discounted them."
#
# I was surprised to see a few noted SEOs weigh in with naive
# responses of a kind I have rarely seen in the SEO space.
#
# I would guess that Google can ID your click with 95% accuracy. The
# other 5% of the clicks, they can throw out on pure guesstimation.
# They don't have to be 100% certain to be able to toss a click out
# with a high degree of confidence that the click was errant.
#
# How does Google know it is you clicking on your ads?
#
# 1- cookies.
#
# 2- your IP matches IPs that have logged into the AdSense control
# panel. Or a login to the panel matches a previous click on your site.
#
# 3- your page view behavior matches an owner's page view behavior.
# This is by far the most common method used by Google. It is easy to
# ID the owner of a site after very few page views. Google simply
# tracks your IP behavior as you view your own site and ads are served
# to you. Read some of the recent stuff on click fraud - it is pretty
# clear this is the top way Google is tracking bad clicks.
#
# #4: Additionally, the majority of IPs on the cable networks are
# dynamic, but dynamic within a block.
# Thus, it is reasonable to deduce that if Bob's ISP is Comcast, a
# Comcast address has viewed 200 pages on his site, the same C block
# logged into his control panel, and the same D block is on the
# cookie, then - given his path behavior - it is a pretty safe bet we
# can throw out those clicks.
#
# #5: Here is another one: let's say you are using a stock piece of
# blog software or a blog service. Many of those pieces of software
# allow one template and one template only. So you serve Google ad
# code even to your blog admin panel. Google sees an attempt to load
# an ad from a restricted URL on your site - presto, it has you. The
# number of blind URLs Google would have to check against would be
# less than 10 to match 90+% of the major blogging software out there.
#
# #6: Two words: Google Toolbar
#
# Long story short - yes Virginia, Google knows who you are from your
# click. That's not the question - the question is, even if they know
# it is you, how many do they let fly by without discounting them?
#
# Everything talked about so far is child's play that any
# knowledgeable webmaster can duplicate. Now let's get a little more
# advanced:
#
# Often overlooked is the widespread usage of Google AdSense code.
# That code is living on millions (maybe billions) of pages. If you
# surf a lot of sites in a day, you are loading that code hundreds to
# thousands of times a day. As you load it, you are leaving a trail.
# Every time you load that code, you are leaving information on
# Google's ad servers. Sooner or later, those bits of information add
# up into a pattern that can be used to identify you with a high
# degree of accuracy.
#
# For example, if Bob starts his typical morning run by surfing:
#
# 1- foosite.com news.
# 2- bigsite.com blog.
# 3- fooweather.com weather.
# 4- bobs-site.com/wordpress/
#
# Most people do something similar.
# A few to a dozen favorite sites and pages make up the average
# morning run for most internet users (especially webmasters). Even
# if Bob switches user agents, IPs, and even some of the sequence of
# his daily habits, there is little doubt Google could ID Bob out of
# millions of users, simply from his click and advertising behavior.
#
# Deja vu? Any of this sound familiar? It is the same type of pattern
# recognition search engines use to find duplicate content on
# websites.
#
# Every time you load that AdSense code, on any site, you are leaving a
# bread crumb trail of information.
#
# Again... dig up some of what Google has talked about recently at
# conferences in reference to ClickBot detection. It is fascinating to
# see just how far Google has gone in detecting users/bots/mischief.
#
# That's exactly where we were headed, Hobbs.
#
# So we have gone from basic to advanced detection. Now, let's get
# leading edge by looking further at heuristic methods of prediction.
#
# If there is AdSense code on a few associated keyword sites, Google
# already knows:
#
# 1- the path most users take when viewing those sites (due to
# toolbar and AdSense data).
# 2- what sites most of those visitors visit.
# 3- how long most users stay on those particular pages and sites.
# 4- what type of advertising behavior those pages show.
# 5- what language is on those sites.
# 6- the income range of the audience.
# 7- the sex and the age of the audience.
# 8- the general psychographic makeup of the website audience, etc.
#
# Essentially, Google knows that Bob likes roses, daisies, orchids,
# and wild flowers. Therefore, it is a good bet that he likes tulips
# as well.
#
# So what if Bob is on vacation in Paris:
#
# - he visits a public internet cafe.
#
# - he surfs a few of his favorite sites (not necessarily any of
# those from his morning run, but sites that run AdSense).
#
# - he surfs a fresh new site that he has never seen before in this
# space.
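#
# To make the pattern idea concrete, here is a rough sketch in
# Python. It is an illustration only and not a description of
# anything Google actually runs: it scores a browsing session against
# stored profiles by plain set overlap (Jaccard similarity). The
# function names, the profile data, and every domain other than Bob's
# morning run sites are made up for the example.
#

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two site sets: shared sites divided by all sites seen."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(session: set[str], profiles: dict[str, set[str]]) -> tuple[str, float]:
    """Return the (hypothetical) profile whose sites overlap the session most."""
    scored = ((name, jaccard(session, sites)) for name, sites in profiles.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical stored profiles: sites where each user has loaded ad code.
profiles = {
    "bob": {"foosite.com", "bigsite.com", "fooweather.com", "bobs-site.com"},
    "alice": {"bigsite.com", "example-recipes.com", "example-travel.com"},
}

# A session from an unknown machine: two familiar sites, one brand new.
cafe_session = {"bigsite.com", "fooweather.com", "example-new-site.com"}

name, score = best_match(cafe_session, profiles)
print(name, round(score, 2))
```

# Even with a new IP and one never-before-seen site, the session
# overlaps Bob's profile more than anyone else's. A real system would
# presumably weigh far more signals than a bare list of sites.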
#
# Now, here is the fun part. Google knows it is Bob. How?
#
# I don't know the official name for this type of prediction, but it
# is a subset of psychographic behavioral targeting (Click
# Prediction?).
#
# After you dig into this line of thinking, you have to start to
# conclude that:
#
# 1- Google is much further along the path than this.
#
# 2- Google's ability to "predict" user behavior is now a thousand fold
# what we are talking about here.
#
# 3- Google's ability to track, interpret, predict, and act upon
# information is now in the scary all-seeing, all-knowing range.
#
# Number three is the most interesting to me. How good has Google
# become at predicting events? Think about all the web data Google can
# synthesize.
#
# - news tracking.
# - stock tracking around the world.
# - web site tracking.
# - trend tracking.
# - event tracking.
# - gmail email reading.
# - blogs.
#