The technology world is awash in buzzwords: cloud computing, NoSQL, platform as a service.
And of course, big data.
This particular buzzword usually refers to the massive amounts of data created and ingested by companies like Google or Apple – and, when the interlocutor is technically savvy, to the underlying software and infrastructure used to acquire, store, and process it.
Some understand big data as the apex of corporate surveillance. Others see it as the ultimate tool for solving many of humanity’s problems. Although everyone seems to disagree in one way or another, there is one point where all voices rise in absolute harmony: big data is effective and is here to stay.
Big data is in fact so effective that entire businesses have emerged over the last 15 years to track, sort, store, and make sense of an amount of data never before seen in human history.
These businesses span fields as diverse as biology, pharma, security, military, financial services, logistics, communications, and social media, to name a few. In those fields, big data analysis techniques are used to discover new drugs, optimize supply chains, simulate nuclear detonations, and serve feline photos to billions of people.
The Scale of Today’s Data
But how big is big data nowadays? Let’s put this into perspective…
Google estimates that nearly 130 million books have been published in modern history. If the average book runs to 500,000 characters, that translates to roughly 65 terabytes of text.
While that sounds like a whole lot of data, it pales in comparison to Walmart’s customer transactions, which generate roughly 2.5 petabytes – nearly 40 times that amount – in a single hour.
Yahoo’s Hadoop cluster capacity is rated at over 455 petabytes. Just for fun, CERN’s Large Hadron Collider can generate over 500 exabytes of raw data per day – more than 200 times the combined capacity of all other sources in the world, and several million times our humble 65 terabytes of books.
Not bad for a day’s work.
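These comparisons are simple enough to sanity-check with a back-of-envelope calculation. The Python sketch below just multiplies and divides the rough estimates quoted above (books published, characters per book, Walmart’s hourly volume, the LHC’s daily raw output); every figure is an assumption carried over from the text, not a measurement:

```python
# Back-of-envelope scale comparison, using the rough figures quoted above.
BOOKS = 130_000_000        # books published in modern history (Google estimate)
CHARS_PER_BOOK = 500_000   # ~1 byte per character of text
TB = 10**12
PB = 10**15
EB = 10**18

all_books_bytes = BOOKS * CHARS_PER_BOOK
print(all_books_bytes / TB)                # ~65 terabytes of book text

walmart_per_hour = 2.5 * PB                # Walmart transactions, per hour
print(walmart_per_hour / all_books_bytes)  # dozens of "all books", every hour

lhc_per_day = 500 * EB                     # LHC raw data, per day
print(lhc_per_day / all_books_bytes)       # millions of "all books", every day
```

Swap in your own estimates for characters per book or bytes per character and the totals shift, but the orders-of-magnitude gap between them does not.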
When Size Doesn’t Matter
It’s definitely BIG. But is being big good enough? It depends on your definition of the word.
Let’s take the online advertising space: there are likely millions of people viewing ads at any given moment. They are watching videos, clicking banners, buying products and services. This generates a respectable flow of impression data and human-interaction data (e.g., clicks) on which to base current and future campaign performance.
Everything seems perfect, right? Well, not exactly.
Traditional online advertising uses what we call ‘pixels’ to track impressions. Those are transparent 1×1 images that are typically displayed when an advertisement is shown. The pixels then generate server-side log entries that are examined to extract impression and click data.
Using this technique, we can capture data points such as IP addresses, timestamps, cookies, device IDs, user agents, and more.
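As a sketch of how that extraction works, here is a minimal Python example that parses a single pixel-request log entry. The log line, its URL parameters, and the Apache-style combined log format are illustrative assumptions, not any particular vendor’s format:

```python
import re

# Hypothetical server-side log entry produced when a 1x1 tracking pixel
# is requested (Apache combined log format is assumed here).
LOG_LINE = (
    '203.0.113.7 - - [21/Mar/2016:10:12:08 +0000] '
    '"GET /pixel.gif?campaign=1234&cb=8731 HTTP/1.1" 200 43 '
    '"http://publisher.example/article" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X)"'
)

# Named groups pull out the fields an ad server typically cares about.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

impression = PATTERN.match(LOG_LINE).groupdict()
print(impression["ip"])          # 203.0.113.7
print(impression["user_agent"])  # the device's browser signature
```

Real pipelines do this at scale on clusters rather than with a single regex, but the fields they extract are essentially these.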
Data in the Real World
While this is valuable information and can be used effectively to track ad impressions, it falls short in one emerging concern in the mobile advertising industry: viewability.
Viewability can be defined as the ability to tell whether one particular impression was seen by an actual human being.
In a world of rampant digital fraud, improperly integrated apps, and spiders and bots, this turns out to be quite a feat – and since companies and marketing agencies increasingly want their ads shown to actual potential customers, it’s a feat demanded with growing urgency and frequency.
This is where the big data of traditional online advertising companies usually fails.
While they succeed in having more than enough data points, they do not provide the richness and context around individual impressions that is required to solve the viewability problem.
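To make the problem concrete, here is a toy Python sketch of one widely cited display viewability rule (the MRC standard: at least 50% of the ad’s pixels in view for at least one continuous second). The function names and the sampling scheme are illustrative assumptions:

```python
def visible_fraction(ad, viewport):
    """Fraction of the ad rectangle overlapping the viewport.

    Rectangles are (left, top, right, bottom) in page coordinates.
    """
    left = max(ad[0], viewport[0])
    top = max(ad[1], viewport[1])
    right = min(ad[2], viewport[2])
    bottom = min(ad[3], viewport[3])
    if right <= left or bottom <= top:
        return 0.0
    overlap = (right - left) * (bottom - top)
    ad_area = (ad[2] - ad[0]) * (ad[3] - ad[1])
    return overlap / ad_area

def is_viewable(samples, min_fraction=0.5, min_seconds=1.0):
    """samples: list of (seconds_observed, visible_fraction) measurements."""
    run = 0.0
    for duration, fraction in samples:
        if fraction >= min_fraction:
            run += duration
            if run >= min_seconds:
                return True
        else:
            run = 0.0  # visibility dropped; the continuous clock resets
    return False

# A 300x250 banner half-scrolled into a 1280x800 viewport:
ad = (100, 700, 400, 950)
viewport = (0, 0, 1280, 800)
print(visible_fraction(ad, viewport))  # 0.4 - below the 50% threshold
```

The hard part in practice is collecting trustworthy geometry and timing samples from inside apps and browsers – exactly the richness that raw impression logs alone lack.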
Navigating Uncharted Territory
So where should advertisers and ad tech vendors begin?
A complete analysis and understanding of the problem is a great first step. Industry trade organizations are beginning to align, not only on the definition of the problem, but also on solutions that may help fix it.
- We can filter the good placements from the bad ones.
- We can detect fraudulent activity and respond to it in a timely manner.
- We can help our customers deliver their ads to people who matter instead of to some shady server in Eastern Europe.
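As a hypothetical illustration of the second point, a first-pass fraud filter can be as simple as matching user-agent strings against known crawler signatures. Production systems rely on maintained lists such as the IAB/ABC International Spiders & Bots List plus behavioral signals; the markers and sample records below are made up for illustration:

```python
# Illustrative crawler markers only; real filters use maintained bot lists.
BOT_MARKERS = ("bot", "spider", "crawler", "curl", "phantomjs")

def looks_automated(user_agent: str) -> bool:
    """Flag a user agent whose string advertises an automated client."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

impressions = [
    {"ip": "203.0.113.7", "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X)"},
    {"ip": "198.51.100.2", "ua": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
]

# Keep only impressions that do not self-identify as automated traffic.
humans = [i for i in impressions if not looks_automated(i["ua"])]
print(len(humans))  # 1
```

Self-identifying crawlers are the easy case; the sophisticated fraud that drives the viewability problem requires exactly the richer, per-impression context this article argues for.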
Others have noticed the need for richness, and for finding deeper meaning in data, too. Netflix boasts (rightfully) about its big data and social-driven successes. Google’s push into deep learning helps power things like Android’s speech recognition service and Google+ photo search. Tim Berners-Lee, the inventor of the World Wide Web, argues that “if a computer collated data from your doctor, your credit card company, your smart home, your social networks, and so on, it could get a real overview of your life”.
More Than Just Numbers
When most people think about big data, they simply see a huge amount of data points.
But what truly matters most of the time is how those data points are related and work together. The richness of data is what gives us true insight into the hidden reality behind it, and is what teaches us how to effectively apply it to solve real world problems.
So next time somebody tells you they work with big data, ask them: “Cool, but how rich is it?”