The Engine

If you stare at the lines in the road, you’ll never see where you’re going.

While the Devil might be in the details, if you want to know where he’ll be you need to look for the pattern. Pattern recognition is one of the trickiest pieces of programming, partly because we know very little about how our brains work, and partly because humans make it look so damned easy. Our brains are designed to spot and record patterns.

Sure, it takes a long time. Eventually, enough memories gel to let us look back and assign some confidence to a correlation (if not causation).

Red sky at night, sailor’s delight. Red sky at morning, sailor take warning.

That won’t hold up against the very best prediction engines we have, but it got us down that path. You go back and examine what happened before an event to determine what contributed.

Meanwhile, 200,000 people died at the end of 2004 because no one made the correlation that a big underwater earthquake might pose a problem. There was no warning for the tourists who were trapped by the wall of water. There was a warning for the Indonesian natives who knew that a sudden and unexpected ebb tide was a sign of a huge wave coming in. Many of the animals knew it, too.

It always seems far simpler when you know what the trigger is, instead of drowning yourself in a myriad of probably inconsequential details.

Mapping Ourselves

This post is more straightforward than I wanted it to be. I originally envisioned it as a short story describing the greatest effort in the fictional history of programming. Only I am not so sure it is fictional. It is too possible not to be probable, and world domination is potentially at stake.

I rush this essay into publication because of a fractured discussion with Steve Rubel, who tracks all things technology. Google has announced it is shutting down several of its free web services, and Rubel mentioned in passing that Google Reader might not survive the next round without showing some value.

Reader is an RSS feed reading application. You ‘subscribe’ to blogs and content that you like online, and Reader ‘delivers’ it to you. Instead of clicking links and folders and bookmarks, the Web you like comes to you in one quick and tidy place. You can also subscribe to feeds of news searches, or mix and match your own sources.

Google Reader offers a couple of very useful features. First, all the feeds I subscribe to are searchable using Google’s indexing and search algorithm. I don’t have to ‘file’ things in my Reader; I can always find them later. If I choose, I can add whatever tags and descriptors I deem relevant. Second, I can share links with people I know: real-life friends, business acquaintances, or those I network with online. I actually get ‘feeds’ of what my network of people found interesting, and we can add notes to each other pointing out key facts or summaries.

Soup to Nuts

What makes this interesting is Google’s role throughout. Over the life of a piece of content, Google plays several key roles. It is not a monopoly, but it is a player in:

Content creation (Blogger, YouTube)
Content delivery (Feedburner)
Content aggregation (Reader)
Content discovery (Search)
Content sharing (Shared links in Reader)

No one else is a significant-enough player in all these aspects of the Information Age to track what is said to whom, and when it happens.

If you’re Google, you have Willy Wonka’s Golden Ticket: the map to the influencers. There’s a huge debate raging among the marketers and the public relations people and the politicos over just who has influence, and how you locate them.

Finding Influencers is important, because it allows you to target your message or your plea to only those people who really matter for a given function or moment. Those Influencers can change over time or with a different objective, but locating them is the key.

From the instant someone creates a video or a blog post, Google knows what is in it. (Again, there are other services, but Google gets enough of the video and blog business to make this scale.) Google also knows:

when it arrives in your Reader
when you read it
how you mark it
when you share it
when your friends read it
when they act on it…

…and the cycle continues. From Soup to Nuts, Google can know which people start the online tremors that lead to popularity of content. ‘Viral’ is no longer a marketing mystery – Google has the data to find the epidemiology.
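To make the ‘epidemiology’ idea concrete, here is a minimal sketch of how you might credit people for the sharing they set off. Everything in it is my own assumption for illustration: the SHARES log, its field layout, and the function names are hypothetical, and nothing here reflects how Google actually stores or analyzes Reader data.

```python
from collections import defaultdict

# Hypothetical share log: (item, sharer, source, hour).
# "source" is the person the sharer got the item from,
# or None if the sharer posted it first.
SHARES = [
    ("post-1", "ana", None, 0),
    ("post-1", "ben", "ana", 2),
    ("post-1", "cy", "ana", 3),
    ("post-1", "dee", "ben", 5),
    ("post-2", "ben", None, 1),
    ("post-2", "cy", "ben", 4),
]


def downstream_reach(shares):
    """Count, for each person, how many later shares trace back to them."""
    # Remember who each sharer got each item from.
    parent = {}
    for item, sharer, source, _hour in shares:
        parent[(item, sharer)] = source

    reach = defaultdict(int)
    for item, _sharer, source, _hour in shares:
        # Walk up the chain and credit every upstream person once.
        seen = set()
        while source is not None and source not in seen:
            reach[source] += 1
            seen.add(source)
            source = parent.get((item, source))
    return dict(reach)


def rank_influencers(shares):
    """Sort people by downstream reach, largest first (ties broken by name)."""
    reach = downstream_reach(shares)
    return sorted(reach.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for person, count in rank_influencers(SHARES):
        print(person, count)
```

In this toy log, the person at the top of the ranking is the one whose shares set off the most downstream sharing, which is the ‘start of the tremor’ described above. A real version would also weigh the timestamps, since how quickly something moves matters as much as how far.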

The Ticket is the Beginning

Willy Wonka’s Golden Ticket wasn’t the prize itself – it was the step you had to clear to get to the prize.

Sure, Google could sell some of those results. It could offer up premium information to advertisers, or even aim targeted ads directly at the most influential Influencers of all. We’re not talking about the Pete Cashmores or Steve Rubels of the world – we want the people who are more likely to seed them with inspiration and information.

But even that isn’t the prize. Stay with me here.

Get several million people on Blogger, a large contingent of content producers. Get a couple of million more on Google Reader, and then sit back for about five years while they share data like no one’s business.

Except it is your business. You need to understand who the Nodes are, and how much time elapses between certain events. You need to learn how to look beyond the tiny pieces of data as individual bits, and instead look at the whole. Big picture, a bunch of water droplets becomes a cloud. And under certain atmospheric conditions, that cloud looks red.

Google isn’t going to drop Reader, because it needs us to keep feeding the data beast. It will take a good five years of collection, and maybe a couple more spent on the data visualization to make it feasible. But isn’t that why Google hired all those engineers and algorithm people?

Once you know what a ripple looks like, and the content of that ripple, you can track it. And you start to see the others. And eventually, you start to identify the ripples that preceded a discrete event instead of the ones that followed.

Making Waves

Google is building the world’s largest prediction engine. It’s now in a learning phase, and an early one at that. It’s building a new Vocabulary of Influence, not to sell us products but instead to tell the future. All you need is a series of similar events that you can compare, and look for correlating ripples that came before. Certain punctuated events would have no meaning, like outcomes of Super Bowls. But something like, say, a quarterly stock report, would be easy to parse.

It would be regular enough (and have a large enough data mine of its own) that you could put the most powerful computers to work just looking for the pattern. And as the owner of the ONLY data set that traces complete ripples of influence, there is no break in the chain to cloud the data.
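Here is a rough sketch of what ‘just looking for the pattern’ could mean at its very simplest: slide one series against the other and see at which lead the ripples best line up with the outcomes. The function names and the toy numbers are my own assumptions, and a plain Pearson correlation is a stand-in for whatever a real prediction engine would use. A high score here is still correlation, not causation.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(vx * vy)


def best_lead(ripples, outcomes, max_lead):
    """Find the lead (in time steps) at which ripples best track outcomes.

    A lead of k compares ripples at time t - k with outcomes at time t.
    Returns (lead, correlation).
    """
    best = (0, pearson(ripples, outcomes))
    for lead in range(1, max_lead + 1):
        score = pearson(ripples[:-lead], outcomes[lead:])
        if score > best[1]:
            best = (lead, score)
    return best


if __name__ == "__main__":
    # Toy data: the outcome series simply echoes the ripples two steps later.
    ripples = [1, 5, 2, 8, 3, 9, 1, 4]
    outcomes = [0, 0, 1, 5, 2, 8, 3, 9]
    print(best_lead(ripples, outcomes, max_lead=3))
```

With the toy data, the best match is at a lead of two steps, because the outcomes were built as a two-step echo of the ripples. Real ripples would be far noisier, which is why the regularity of something like a quarterly report matters: it gives you many comparable events to test the same lead against.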

Maybe the predictions will come with just a few hours’ notice, like a tsunami warning system. Maybe it will evolve into a longer-range forecasting tool for economics or finance. What could Larry and Sergey do with the Ginormous Google Gigawatt Crystal Ball? Other than promise us they Won’t Be Evil?

The truth is right there in your Google Reader, and the Devil is in your details. And that is why the High Holy Priests of Mountain View will never bring Reader to the sacrificial altar.